Created attachment 2159 [details] spec file

Squid's memory usage grows gradually until it eventually runs out of physical memory and swap.
Created attachment 2160 [details] squid.conf
This is OK; I configured it with 768 MB:

Cache information for squid:
        Storage Mem size:       777320 KB
        Storage Mem capacity:   99.8% used, 0.2% free

But here... it's already using 1.7 GB:

Memory usage for squid via mallinfo():
        Total space in arena:   1704836 KB
        Ordinary blocks:        1629355 KB 341896 blks
        Holding blocks:          280440 KB   2048 blks
        Free Ordinary blocks:     75480 KB
        Total in use:           1909795 KB 96%
        Total free:               75480 KB 4%
        Total size:             1985276 KB
Your Squid may be leaking. Please collect cache manager mgr:mem output samples, for example twice per hour for a few hours. Something like the loop below may work:

while squidclient mgr:mem >> mem.out; do sleep 1800; done

Terminate it after at least 2 hours and post mem.out here, compressed if needed. More samples would make it easier to identify the leak, but not all leaks can be identified that way. Thank you.

N.B. Please note that cache_mem does not limit total Squid memory usage.
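If you want each sample tagged with the time it was taken, a variant like the following should also work (this assumes date and squidclient are on the PATH and the cache manager answers on the default local port):

while date >> mem.out && squidclient mgr:mem >> mem.out; do sleep 1800; done

The timestamps make it easier to correlate the snapshots with load changes later.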
(In reply to comment #3)
> Your Squid may be leaking. Please collect cache manager mgr:mem output samples.
> [...]
> N.B. Please note that cache_mem does not limit total Squid memory usage.

OK, I'm doing this right away.

I just had to kill squid on one proxy because it was using 2.2 GB of RAM (and still increasing) and squid was using 99% of the CPU. Squid was working, but very slowly... even squidclient mgr:mem didn't respond. Normally squid doesn't use more than 5-10% of the CPU.

This is the second time it has happened. I'm not sure whether this issue is memory related.
(In reply to comment #4)
> OK, I'm doing this right away.
> [...]
> This is the second time it has happened. I'm not sure whether this issue is memory related.

More details: when the problem occurred today, the bandwidth (in both directions) was at 0 Mb/s and CPU usage was at 99-100%... so basically, nothing was happening. Requests were not being served.
If sudden 100% CPU usage with no traffic is your symptom, see bug #1956 and especially comment thirteen there. Let's leave this bug specific to slow memory leaks.
(In reply to comment #6)
> If sudden 100% CPU usage with no traffic is your symptom, see bug #1956 and
> especially comment thirteen there.
>
> Let's leave this bug specific to slow memory leaks.

Could my slow memory leak be caused by the above-mentioned bug?

memory_pools is set to off (and in the other bug report, he mentions that he does not see the problem when memory_pools is turned off... which is not my case).
(In reply to comment #7)
> Could my slow memory leak be caused by the above-mentioned bug?
> [...]

OK, this is definitely the same problem I'm experiencing. I set cache_mem to 512 MB, but squid ends up using 2.1-2.2 GB and then finally stops working, with 100% CPU usage and no bandwidth at all. I seem to be able to reproduce this problem once per week on a single-CPU server.

I have never seen the problem (up to now) on an SMP server with 4 CPUs and more memory. That Squid is configured with a 256 MB cache_mem setting and never seems to consume more than 440-450 MB. Is 256 MB a magic number? The SMP server is also using a disk cache (which the non-SMP one is not)... or maybe the extra memory only delays the fatal crash of the squid process?
(In reply to comment #8)
> Could my slow memory leak be caused by the above-mentioned bug?

It is possible (since the true cause of that bug is not known yet), but unlikely.

> OK, this is definitely the same problem I'm experiencing. I set cache_mem to
> 512 MB, but squid ends up using 2.1-2.2 GB and then finally stops working,
> with 100% CPU usage and no bandwidth at all.

If it is the same problem as bug #1956, you should not be updating this bug :-). For sudden 100% CPU utilization without load, use bug #1956. For slow memory leaks, use this bug #2927. Each bug has its own instructions on what to do next to triage it. You may be suffering from both bugs, but that does not change the triage instructions.
Created attachment 2164 [details] mgr:mem output
(In reply to comment #10)
> Created an attachment (id=2164) [details]
> mgr:mem output

Assuming your Squid load was more or less steady during the captures, the following structures may be leaking:

cbdata_ErrorState_(27) +26 +14 +20 +20 +13 +15 +5 +8 +6 +12 +41 +9 +5 +1 +1 +2 +0 +0 +0 +1 +0 +0 +1 +0 +0 +1 +0 +1 +1 +1 +0 +0 +1 +0 +1 +1 +4 +0 +0 +24 +190 +175 +143 +0 +721 +605 +565 +459 +346 +144 +9 +37 +0 +1

HttpHdrRangeSpec +5 +2 +13 +11 +17 +6 +0 +146 +156 +13 +6 +0 +7 +0 +0 +0 +21 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +53 +0 +0 +0 +0 +0 +0 +0 +4 +2 +8 +4 +0 +3 +1 +0 +5 +3 +4

HttpHdrRange +3 +2 +12 +11 +17 +6 +0 +144 +156 +13 +6 +0 +7 +0 +0 +0 +21 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +53 +0 +0 +0 +0 +0 +0 +0 +4 +2 +8 +4 +0 +3 +1 +0 +3 +3 +4

Here +N is the increase in the number of these structures from one mgr:mem snapshot to the next (i.e., every 30 minutes). None of these increases are huge, so it is possible they are just noise and the leaking structures are not accounted for. Nevertheless, they should be checked. I found one ErrorState leak already (I will attach a patch), but there may be more.

Have you ever run Squid under valgrind? Do you think you can do that?
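If you want to eyeball the trend for one suspect structure yourself, a simple grep over the collected file is usually enough (the pool names below are just the ones flagged above; adjust as needed for whatever mgr:mem reports):

grep -E 'ErrorState|HttpHdrRange' mem.out

That prints the matching mgr:mem lines from every snapshot in order, so you can watch how the counts on those lines change over time.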
Created attachment 2165 [details] one ErrorState leak fix Partial bug #2927 fix: Squid-3.1.3 consumes all memory gradually ErrorState object was not destroyed if the failed request to the origin server or peer was retried. This bug was present since r9398 (2009-02-01). If that change was ported from Squid2, there may be a similar bug there.
Created attachment 2166 [details] Range leak fix

[I thought I posted this patch a month ago, but apparently not.]

Prevent memory leaks when cloning Range requests. The HttpRequest::range field was set to a new HttpHdrRange object twice: once in HttpRequest::clone() and once in HttpRequest::hdrCacheInit(), which is called from clone().

Also polished HttpReply::clone() to make sure HttpReply::hdrCacheInit() does not use the uninitialized HttpReply::sline field, and to prevent a benign double-initialization of HttpReply::keep_alive.
Please apply both posted patches. If the leak is still there, please collect and post fresh mgr:mem output. If you can also add valgrind memory check output that could be very helpful.
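In case the mechanics are the issue, applying them from the top of the Squid source tree usually looks like the sketch below; the patch file names are placeholders for the two attachments, and the -p level depends on how the patches were generated (try -p1 if -p0 does not match):

cd squid-3.1.3
patch -p0 --dry-run < errorstate-leak-fix.patch
patch -p0 < errorstate-leak-fix.patch
patch -p0 < range-leak-fix.patch
make && make install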
(In reply to comment #14)
> Please apply both posted patches. If the leak is still there, please collect
> and post fresh mgr:mem output.

I can't seem to apply the patch to HttpReply.cc...

Is the patch meant for 3.1.3?
Created attachment 2167 [details] Failed to apply the range leak patch.
(In reply to comment #15)
> I can't seem to apply the patch to HttpReply.cc...

Can we delete the "Failed to apply the range leak patch" attachment and all comments related to this failure? I successfully applied all patches to all the patched files. I'm compiling now and will deploy this to the worst-affected proxy.
Do not worry about wrong comments. We will just ignore them.
Created attachment 2168 [details] Crash of today after patch.
Created attachment 2169 [details] Valgrind output.
(In reply to comment #19)
> Created an attachment (id=2168) [details]
> Crash of today after patch.

The above mgr:mem output looks more like Squid overload than a slow memory leak:

HttpHeaderEntry +66 +698 +162 +26 +50 +256 +14148 +24770 +23176 +24764 +60854 +5
StoreEntry +8 +69 +19 +4 +7 +28 +1553 +2736 +2588 +2624 +6293 +219 +470
mem_node +31 +160 +27 +3 +15 +15 +3674 +11825 +7662 +17721 +23682 -185 -245
MemObject +8 +69 +19 +4 +7 +28 +1553 +2736 +2588 +2624 +6293 +219 +470
Short_Strings +132 +1556 +299 +88 +109 +541 +31171 +54516 +50956 +54073 +130682
HttpReply +8 +69 +19 +4 +7 +28 +1558 +2732 +2589 +2625 +6293 +219 +471
ipcache_entry +12 +56 +7 +1 +3 +11 +178 +210 +258 +150 +3 -5 +1
HttpHdrCc +8 +53 +13 +4 +5 +26 +1019 +1924 +1595 +1576 +3914 +1172 +108
Medium_Strings +16 +20 +9 +8 +11 +9 +239 +458 +462 +732 +3326 +600 +0
cbdata_MemBuf_(8) +206 +162 +93 +124 +106 +159 +1689 +2887 +2642 +2634 +6302 +29
MD5_digest +8 +69 +19 +4 +7 +28 +1553 +2736 +2588 +2624 +6293 +219 +470

This looks very different from the mgr:mem output you have posted before. Was Squid under more-or-less steady load during this log collection? Did you observe a slow memory usage increase?

(In reply to comment #20)
> Created an attachment (id=2169) [details]
> Valgrind output.

Valgrind does not show any leaks as far as I can tell. I presume this is the result of fixing the memory leaks with the previously attached patches.
(In reply to comment #21)
> The above mgr:mem output looks more like Squid overload than a slow memory leak:
> [...]
> Was Squid under more-or-less steady load during this log collection? Did you
> observe a slow memory usage increase?

The mgr:mem output is from the patched version; it eventually reached 2.5 GB of memory and died in a very short period of time. The proxy was under heavy use... you know, google, facebook, etc.

The valgrind output is from the unpatched version... I had to roll back because the patched version died faster. Tomorrow I will have no choice but to roll back to 3.0... I never noticed any bug with 3.0.

What's strange is that we have 3 proxies running 3.1.3 and only one has a problem. They are almost all using the same 64-bit hardware.

One of them is used a bit more and is pretty stable, so I can't figure out why this is happening.

I noticed, though, that the one with the most problems has many "0 0" entries in the logfiles ("http_error_code size"), as if some connections were failing and thus not returning a 404/200/etc. and a size.
Oh, and that one uses delay_pools... the others do not.
Created attachment 2170 [details] Another valgrind output.
(In reply to comment #22)
> The mgr:mem output is from the patched version; it eventually reached 2.5 GB of
> memory and died in a very short period of time. The proxy was under heavy
> use... you know, google, facebook, etc.

It is not clear to me whether your proxy ran out of RAM because it was overloaded or because it was leaking very fast. Mgr:mem output cannot distinguish these two cases.

> The valgrind output is from the unpatched version...

Did you ./configure Squid --with-valgrind-debug, or did you disable memory pools in squid.conf? Without one of those two actions, valgrind will not see some of the leaks, including the leaks fixed by the patches.

> I had to roll back because the patched version died faster.

The posted patches should be valid and needed, but may not be sufficient. I do not think they can speed leaks up.

> Tomorrow I will have no choice but to roll back to 3.0... I never noticed any
> bug with 3.0.

This is your call, of course. Just keep in mind that we may not be able to find and fix the leak soon without your help, especially if it is specific to your environment.

> What's strange is that we have 3 proxies running 3.1.3 and only one has a problem.
> [...]
> I noticed, though, that the one with the most problems has many "0 0" entries in
> the logfiles ("http_error_code size"), as if some connections were failing and
> thus not returning a 404/200/etc. and a size.

One of the fixed leaks was the error page that could be created under such conditions but, again, there could be other leaks.

Also, if only one proxy has many "0 0" entries, then that proxy may receive special/different traffic than the others. Do those "0 0" entries have something in common?
(In reply to comment #23) > Oh, and that one use a delay_pools... the others are not. Delay pools can easily overload a proxy under the "right" conditions because you are essentially telling the proxy to be the bottleneck between clients and servers. They also used to contain quite a few bugs but I am not sure whether that is still true. Did you use delay pools with Squid 3.0 as well?
(In reply to comment #26)
> Did you use delay pools with Squid 3.0 as well?

Yes. And with 2.6 too. Never had any problems. So we'll forget this. :)
(In reply to comment #25)
> It is not clear to me whether your proxy ran out of RAM because it was
> overloaded or because it was leaking very fast. Mgr:mem output cannot
> distinguish these two cases.

It's a virtual machine. It has 4 GB of memory.

> Did you ./configure Squid --with-valgrind-debug, or did you disable memory pools
> in squid.conf? Without one of those two actions, valgrind will not see some of
> the leaks, including the leaks fixed by the patches.

memory_pools is set to off (see the attached config file)... Are you talking about comment #13 in bug 1956? If I recompile squid with --with-valgrind-debug, will it be slower?

> The posted patches should be valid and needed, but may not be sufficient. I do
> not think they can speed leaks up.

So this version is better; I'll try to keep it running a bit longer.

> Just keep in mind that we may not be able to find and fix the leak soon without
> your help, especially if it is specific to your environment.

See above :)

> Also, if only one proxy has many "0 0" entries, then that proxy may receive
> special/different traffic than the others. Do those "0 0" entries have
> something in common?

Nope. They don't seem to have anything in common.
I've added the "top" output for squid to the mgr_mem.out file.

TOP:
13463 squid  15  0  398m 343m 4644 S  0.0  9.1  1:19.68 (squid) -s
13461 root   20  0 57412 2784  408 S  0.0  0.1  0:00.00 squid -s

You can look at it too if it helps. I started this squid 10 minutes ago and it's growing slowly. Lunch time has passed, so usage is back to normal... but we can see it grows and grows and grows.

I'll attach another mgr_mem.out shortly. I'm compiling squid with valgrind debug enabled, so I may be able to provide something else later.
(In reply to comment #28)
> It's a virtual machine. It has 4 GB of memory.

OK, but that does not tell us whether you are overloading or Squid is leaking.

> memory_pools is set to off (see the attached config file)... Are you talking
> about comment #13 in bug 1956?

I was not talking about the other bug. If memory_pools are off, then valgrind should detect leaks, and it did not find any. This is an indication that you are overloading the proxy rather than leaking.

Please note that your mgr:mem output indicated that memory pools were enabled, but perhaps that was collected before you disabled them.

> If I recompile squid with --with-valgrind-debug, will it be slower?

I do not think you will notice a significant performance difference, but I do not know for sure. You probably do not need --with-valgrind-debug if your memory pools are off.

> > Also, if only one proxy has many "0 0" entries, then that proxy may receive
> > special/different traffic than the others. Do those "0 0" entries have
> > something in common?
>
> Nope. They don't seem to have anything in common.

I suspect they are related to transactions that died due to delay pools. Can you turn delay pools off and see what happens?
(In reply to comment #30)
> Please note that your mgr:mem output indicated that memory pools were enabled,
> but perhaps that was collected before you disabled them.

They have always been turned off. How could they show as turned on if I turned them off? I have the following in my squid.conf: "memory_pools off"

> You probably do not need --with-valgrind-debug if your memory pools are off.

OK, next time it crashes, I'll restart it with the new RPM I just finished building.

> I suspect they are related to transactions that died due to delay pools. Can
> you turn delay pools off and see what happens?

I'll try this. Next time it crashes! :)
Created attachment 2171 [details] mgr_mem with memory_pools set to off in the configuration file and normal usage.
(In reply to comment #27)
> Yes. And with 2.6 too. Never had any problems. So we'll forget this. :)

Sorry, I never actually tried 3.0 on that server. Only 2.6, and it was working fine.

But I'm beginning to think that the problem is with delay_pools... bug #1956 uses delay pools and reports the same problem as I have... I'm pretty sure it has exactly the same symptoms.
> > Please note that your mgr:mem output indicated that memory pools were enabled,
> > but perhaps that was collected before you disabled them.
>
> They have always been turned off. How could they show as turned on if I turned
> them off? I have the following in my squid.conf: "memory_pools off"

Apparently, setting memory_pools to "off" does not turn memory pools off unless you also perform a secret dance under the full moon on Thursday. Sorry. I am not going to explain all those dance moves here. I hardly know them myself.

If you suspect leaks, your best way forward is to run with --with-valgrind-debug and see if that can pinpoint any new leaks beyond those we already patched. You can leave "memory_pools off" in squid.conf.

If you suspect overload, turn delay pools off and see if it helps.
(In reply to comment #34)
> Apparently, setting memory_pools to "off" does not turn memory pools off [...]

For the memory_pools, do I need to compile squid with malloc, as the FAQ says? I just looked at the 2.6 squid I have and it seems to show the same "memory pools" information in mgr:mem...
It has been discovered that it is not very easy to disable memory pools in squid-3. The squid.conf setting is largely ineffective, but the following two methods should work:

a) Build squid with the --disable-mempools configure option.
b) Set the environment variable MEMPOOLS=0 when starting squid.

The MEMPOOLS=0|1 environment variable can always be used to configure this at runtime. All the --disable-mempools configure option does is flip the default setting from 1 to 0.
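For example, assuming squid is started from a shell (if an init script starts it, export the variable in that script's environment instead):

MEMPOOLS=0 squid -f /etc/squid/squid.conf

or, at build time:

./configure --disable-mempools ...    # plus your usual options

The config file path and the rest of the configure line are just placeholders for whatever you normally use.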
Forgot to mention that running under valgrind also disables memory pools by default if compiled with valgrind support.
(In reply to comment #37)
> Forgot to mention that running under valgrind also disables memory pools by
> default if compiled with valgrind support.

It seems we didn't crash with delay_pools disabled... The memory footprint is really small today. What's next?
3.1.4 is released fixing the difficulties with tuning memory pools mentioned earlier. But the memory pools issue should not be relevant to this bug. It's good to know that you do not experience the problem with delay pools turned off. This means that either your Squid was overloading due to the connection slowdown induced by delay pools, or you hit some unknown bug in the delay pools. Next step is to use valgrind again on a Squid compiled with --with-valgrind-debug as suggested earlier. Then try to trigger the leak and visit the cachemgr Memory Utilization page to have valgrind look for any leaks at runtime. This leak check is much more reliable than the automatic leak check on shutdown which gives many false reports.
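A rough sketch of that procedure, assuming the paths used earlier in this report (the valgrind options mirror the ones you already used; -N keeps Squid in the foreground so valgrind watches the worker process itself rather than a forked child):

./configure --with-valgrind-debug ...    # plus your usual options
make && make install
valgrind --tool=memcheck --leak-check=full --show-reachable=yes squid -N -f /etc/squid/squid.conf

# later, in another shell, after the leak has had time to build up:
squidclient mgr:mem > mem-under-valgrind.out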
(In reply to comment #39)
> Next step is to use valgrind again on a Squid compiled with
> --with-valgrind-debug as suggested earlier. Then try to trigger the leak and
> visit the cachemgr Memory Utilization page to have valgrind look for any leaks
> at runtime.

OK, I already tried 3.1.4 with delay_pools and memory grows out of bounds again. I turned delay_pools off and everything seems to be working fine.

As I write this, I'm compiling it with valgrind debug enabled.

Will keep you posted.
(In reply to comment #40)
> As I write this, I'm compiling it with valgrind debug enabled.

I seem to have an issue with valgrind:

root   7150  0.0  1.1 137912  42964 ?  Ss  15:27  0:00 valgrind --tool=memcheck --leak-check=full --show-reachable=yes squid
squid  7152  0.8  3.0 167736 116636 ?  S   15:27  0:15  \_ (squid)
squid  7153  0.0  0.0  15608    952 ?  S   15:27  0:00  \_ (unlinkd)

Will it report the memory leaks from the child processes?
Created attachment 2174 [details] New valgrind output.
(In reply to comment #42)
> Created an attachment (id=2174) [details]
> New valgrind output.

I'm sorry, that one wasn't built with --with-valgrind-debug... I just restarted squid with the proper binaries. It will take a little while before I can provide a new output.
Created attachment 2175 [details] valgrind output from a leaking delay_pools
Created attachment 2176 [details] mgr:mem output from a fast-leaking delay_pools
From the valgrind output there do not appear to be any formal leaks, so it's something that builds up internally in some queue or similar. From the mgr:mem output it's obvious the problem is related to "cbdata CbDataList".
Here are the top (squid) lines from attachment #2176 [details]:

11255 squid  15  0 70736  18m 4620 R  0.0  0.5  0:01.08 (squid)
11255 squid  15  0 86852  31m 4656 S  0.0  0.8  0:02.53 (squid)
11255 squid  15  0  106m  55m 4656 S  0.0  1.5  0:04.51 (squid)
11255 squid  15  0  109m  59m 4656 S  0.0  1.6  0:06.63 (squid)
11255 squid  15  0  118m  67m 4656 S  0.0  1.8  0:07.75 (squid)
11255 squid  15  0  142m  91m 4656 S  0.0  2.4  0:09.64 (squid)
11255 squid  15  0  173m 121m 4656 S  0.0  3.2  0:13.85 (squid)
11255 squid  15  0  209m 159m 4656 S  0.0  4.2  0:23.63 (squid)
11255 squid  15  0  209m 159m 4656 S  0.0  4.2  0:24.78 (squid)
11255 squid  15  0  209m 159m 4656 S  2.0  4.2  0:26.49 (squid)
11255 squid  15  0  209m 159m 4656 S  2.0  4.2  0:28.86 (squid)
11255 squid  15  0  209m 159m 4656 S  0.0  4.2  0:29.80 (squid)
11255 squid  15  0  209m 159m 4656 S  0.0  4.2  0:32.18 (squid)
11255 squid  15  0  964m 911m 4656 S  4.0 24.1  2:09.16 (squid)
11255 squid  18  0 4114m 3.5g 3316 R 85.3 96.0  5:58.60 (squid)
11255 squid  18  0 4370m 3.3g 3200 R  5.8 89.6  6:31.18 (squid)

What happened between 17:06:55 GMT and 17:11:56 GMT, when memory usage jumped from a stable level of 209 MB to 964 MB? There is also a CPU usage jump. Did Squid start receiving a lot more traffic during those 5 minutes? Or did some "interesting" request, perhaps related to delay pools, come in?

I cannot use the leak-finding script against the mgr:mem format without memory pools, but the sudden memory usage jump as reported by top looks surprising.
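To watch that jump without wading through the whole file, filtering the top lines and printing just the memory columns is usually enough (this assumes the top output was appended to the mgr_mem file as described earlier, with the usual column order: VIRT fifth, RES sixth):

grep -F '(squid)' mgr_mem.out | awk '{print $5, $6}'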
(In reply to comment #47)
> What happened between 17:06:55 GMT and 17:11:56 GMT, when memory usage jumped
> from a stable level of 209 MB to 964 MB? There is also a CPU usage jump. Did
> Squid start receiving a lot more traffic during those 5 minutes? Or did some
> "interesting" request, perhaps related to delay pools, come in?

Nothing unusual... We got 1679 requests from 47 users during that time. The next day, for the same time span, we got 3876 requests.

Total transfer size: 22303941
Total requests: 1679
Average size: 13284.1
Minimum size: 0
Maximum size: 4877501

I can't account for what was streaming at that time (youtube, radio, etc.) because those requests are not logged until the connection ends.
(In reply to comment #48)
> Nothing unusual... We got 1679 requests from 47 users during that time.
> [...]

And just before it jumped:

Total transfer size: 39968945
Total requests: 1476
Average size: 27079.2
Minimum size: 0
Maximum size: 5968049
Created attachment 2190 [details] Squid 3.1.4 ram usage. Munin.
Created attachment 2191 [details] Squid 3.1.4 requests. Munin.
Created attachment 2192 [details] Squid 3.1.4 traffic. Munin.
It seems I have this issue too, with squid 3.1.1 - 3.1.4; squid 2.7 didn't have this issue for me.

I have ~60-70 users and ~3 users who have the right to download files. I use (and only compiled) NTLM authentication: /usr/bin/ntlm_auth --helper-protocol=squid-2.5-ntlmssp

I use 3 delay pools with class 3. Gentoo x86 stable here, and squid 3.1.4 for now.
With --disable-delay-pools, 3.1.5 still eats memory. My full build configuration:

# squid -v
Squid Cache: Version 3.1.5
configure options: '--prefix=/usr' '--build=i686-pc-linux-gnu' '--host=i686-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--sysconfdir=/etc/squid' '--libexecdir=/usr/libexec/squid' '--localstatedir=/var' '--with-pidfile=/var/run/squid.pid' '--datadir=/usr/share/squid' '--with-logdir=/var/log/squid' '--enable-auth=ntlm' '--enable-removal-policies=lru,heap' '--enable-external-acl-helpers=wbinfo_group,' '--enable-useragent-log' '--enable-cache-digests' '--disable-delay-pools' '--enable-referer-log' '--with-large-files' '--with-filedescriptors=8192' '--disable-strict-error-checking' '--enable-follow-x-forwarded-for' '--disable-ident-lookups' '--with-aio' '--with-default-user=squid' '--with-dl' '--enable-caps' '--disable-ipv6' '--enable-snmp' '--enable-ssl' '--disable-icap-client' '--disable-ecap' '--disable-zph-qos' '--enable-storeio=diskd,aufs' '--enable-linux-tproxy' '--enable-epoll' 'build_alias=i686-pc-linux-gnu' 'host_alias=i686-pc-linux-gnu' 'CC=i686-pc-linux-gnu-gcc' 'CFLAGS=-O2 -march=pentium4 -pipe --param l1-cache-size=8 --param l2-cache-size=512 --param l1-cache-line-size=64 -mmmx -msse -msse2' 'LDFLAGS=-Wl,-O1' 'CXXFLAGS=-O2 -march=pentium4 -pipe --param l1-cache-size=8 --param l2-cache-size=512 --param l1-cache-line-size=64 -mmmx -msse -msse2' --with-squid=/var/tmp/portage/net-proxy/squid-3.1.5/work/squid-3.1.5 --enable-ltdl-convenience

Right now it uses RES=490 MB, VIRT=560 MB, and continues to grow.
(In reply to comment #54) > With --disable-delay-pools 3.1.5 still eat memory. Please do not use this bug report for suspected leaks unrelated to delay pools. Open a new bug if you are convinced that your Squid is leaking. You will need to apply posted patches and collect more information about memory growth. See comment #3 for this bug and bug #2971 for some typical questions. See also bug #2964, especially if you have many failing transactions in your workload.
Created attachment 2219 [details] Additional range leak fix

(In reply to comment #13)
> Created attachment 2166 [details]
> Range leak fix

With 3.1.4 and this patch, I still see leaking HttpHdrRangeSpec and HttpHdrRange objects. The additional patch seems to fix that (even though Alex's mail to squid-dev seemed to imply it would only affect 3.0).

(I am attaching this patch to this bug because the other range fix is also attached here. It has nothing to do with delay pools, though.)
> The additional patch seems to fix that (even though Alex's mail to squid-dev
> seemed to imply it would only affect 3.0).

I do not remember why I added "3.0 only" to that post subject. Perhaps I did not check whether 3.1 had the same problem, but it was clearly a poor choice of words! I can confirm that the same leak appears to be present in 3.1.
FYI: All previously posted fixes (attachment #2165 [details], attachment #2166 [details], and an equivalent of attachment #2219 [details]) have been committed to trunk.
Observing the same problem with a squid 3.1.8.1 x86_64 (Fedora 13) build with a simple delay pools setup:

delay_pools 1
delay_class 1 1
delay_access 1 allow localnet1
delay_access 1 deny all
delay_parameters 1 16000/16000

Squid consumes more and more RAM, with the following output when stopped:

Maximum Resident Size: 6382448 KB
Page faults with physical i/o: 14927
Memory usage for squid via mallinfo():
        total space in arena:  1571032 KB
        Ordinary blocks:         51905 KB  21467 blks
        Small blocks:                0 KB      6 blks
        Holding blocks:          33864 KB      5 blks
        Free Small blocks:           0 KB
        Free Ordinary blocks:  1519126 KB
        Total in use:            85769 KB 5%
        Total free:            1519126 KB 97%
3.1.8-20101024 is still leaking fast with one class 3 pool.
(In reply to comment #60)
> 3.1.8-20101024 is still leaking fast with one class 3 pool.

Bug 3096 could be related to this bug. Can you check whether the patch from that bug helps:
http://bugs.squid-cache.org/attachment.cgi?id=2310
(In reply to comment #61)
> Bug 3096 could be related to this bug. Can you check whether the patch from
> that bug helps:
> http://bugs.squid-cache.org/attachment.cgi?id=2310

3.1.9-20101102 breaks within minutes, without any notice in cache.log. I did not try to investigate further. No leak was visible in that time.
(In reply to comment #61) > (In reply to comment #60) > > 3.1.8-20101024 still fast leaking with one class 3 pool. > > Bug 3096 could be related to this bug. Can you try if the patch from that bug > helps: > http://bugs.squid-cache.org/attachment.cgi?id=2310 I saw the same problem reported with this bug and the attachment from Bug 3096 fixed it for me (using one class 1 pool). Haven't seen any sign of leaks since.
Thank you for confirming that. This is expected to be fixed in the Squid 3.1.10 release then. *** This bug has been marked as a duplicate of bug 3068 ***
oops. *** This bug has been marked as a duplicate of bug 3096 ***
Hi,

(In reply to comment #63)
> I saw the same problem reported with this bug and the attachment from Bug 3096
> fixed it for me (using one class 1 pool). Haven't seen any sign of leaks since.

FWIW: I just applied the patch on squid 3.1.9, but I still see increasing memory consumption when running squid with delay pools (class 1 and class 3). With the patch, the rate of increase seems to be quite a bit lower than before.

Cheers,
Michael