Bug 3237 - seq fault in free() from rfc1035RRDestroy at rfc1035.c:488
Summary: seq fault in free() from rfc1035RRDestroy at rfc1035.c:488
Status: RESOLVED FIXED
Alias: None
Product: Squid
Classification: Unclassified
Component: other (show other bugs)
Version: 3.1
Hardware: All All
: P2 critical
Assignee: SQUID BUGS ALIAS
URL:
Depends on:
Blocks:
 
Reported: 2011-06-06 03:29 UTC by M.A.Young
Modified: 2012-03-05 08:04 UTC (History)
5 users (show)

See Also:
Browser: ---
Fixed Versions:
Needs:


Attachments
Proposed patch by Jörg Lehrke (664 bytes, application/octet-stream)
2011-09-22 04:48 UTC, Andreas Kobara
Details
Proposed patch by Jörg Lehrke (664 bytes, patch)
2011-09-22 04:48 UTC, Andreas Kobara
Details
Patch to fix the alignment issue in dns_internal.cc (2.33 KB, patch)
2011-09-22 09:03 UTC, Joerg Lehrke
Details
Fix DNS answer handling (1.85 KB, patch)
2011-09-22 14:25 UTC, Joerg Lehrke
Details
Fix DNS answer handling (1.84 KB, patch)
2011-09-26 07:22 UTC, Joerg Lehrke
Details
Fix DNS answer handling (1.43 KB, patch)
2011-09-27 01:34 UTC, Joerg Lehrke
Details

Note You need to log in before you can comment on or make changes to this bug.
Description M.A.Young 2011-06-06 03:29:59 UTC
We got the following traceback on one of our caches running 3.1.12.1-20110523

#0  0x0000003b87230265 in raise () from /lib64/libc.so.6
#1  0x0000003b87231d10 in abort () from /lib64/libc.so.6
#2  0x00000000005445f5 in death (sig=<value optimized out>) at tools.cc:398
#3  <signal handler called>
#4  0x0000003b8727288e in free () from /lib64/libc.so.6
#5  0x00000000005a4517 in rfc1035RRDestroy (rr=0x2732f988, n=1)
    at rfc1035.c:488
#6  0x00000000005a4571 in rfc1035MessageDestroy (msg=0x7fff26d4a440)
    at rfc1035.c:534
#7  0x00000000004be7d2 in idnsGrokReply (buf=<value optimized out>,
    sz=<value optimized out>) at dns_internal.cc:1144
#8  0x00000000004bfece in idnsRead (fd=11, data=<value optimized out>)
    at dns_internal.cc:1229
#9  0x00000000004ae1a5 in comm_select (msec=<value optimized out>)
    at comm_epoll.cc:307
#10 0x000000000055c24d in CommSelectEngine::checkEvents (
    this=<value optimized out>, timeout=321) at comm.cc:2687
#11 0x00000000004c440e in EventLoop::checkEngine (this=0x7fff26d4a650,
    engine=0x7fff26d4a6b0, primary=160) at EventLoop.cc:50
#12 0x00000000004c4563 in EventLoop::runOnce (this=0x7fff26d4a650)
    at EventLoop.cc:124
#13 0x00000000004c4658 in EventLoop::run (this=0x7fff26d4a650)
    at EventLoop.cc:94
#14 0x000000000050adef in SquidMain (argc=<value optimized out>,
    argv=0x7fff26d4a7d8) at main.cc:1416
#15 0x000000000050b3f6 in SquidMainSafe (argc=841888563, argv=0x2)
    at main.cc:1176
#16 main (argc=841888563, argv=0x2) at main.cc:1168
Comment 1 Amos Jeffries 2011-06-20 03:01:22 UTC
This appears to be rare. But segfault when handling DNS replies is a major vulnerability if it can be replicated and tracked to a particular packet sequence. Is there any way you are able to repeat and/or identify what the cause is please?
Comment 2 M.A.Young 2011-06-20 03:28:07 UTC
Isolating the cause would be difficult given how busy the caches are. I do have the cores for the crash above and a second crash (which occurred earlier today). 
I don't have the logs for the first crash, but cache.log entries around the second crash are

2011/06/20 03:25:09| TunnelStateData::Connection::error: FD 113: read/write failure: (110) Connection timed out
2011/06/20 04:55:30| ipcacheParse: No Address records in response to 'l15.sphotos.l3.fbcdn.net'
2011/06/20 04:55:30| ipcacheParse: No Address records in response to 'l15.sphotos.l3.fbcdn.net'
FATAL: Received Segment Violation...dying.
2011/06/20 04:55:30| storeDirWriteCleanLogs: Starting...
2011/06/20 04:55:30| WARNING: Closing open FD   19

The trace from the second crash is
#0  0x0000003b87230265 in raise () from /lib64/libc.so.6
#1  0x0000003b87231d10 in abort () from /lib64/libc.so.6
#2  0x00000000005445f5 in death (sig=<value optimized out>) at tools.cc:398
#3  <signal handler called>
#4  0x0000003b8727288e in free () from /lib64/libc.so.6
#5  0x00000000005a4517 in rfc1035RRDestroy (rr=0x2a554268, n=1)
    at rfc1035.c:488
#6  0x00000000005a4571 in rfc1035MessageDestroy (msg=0x7fff8d563340)
    at rfc1035.c:534
#7  0x00000000004be7d2 in idnsGrokReply (buf=<value optimized out>,
    sz=<value optimized out>) at dns_internal.cc:1144
#8  0x00000000004bfece in idnsRead (fd=8, data=<value optimized out>)
    at dns_internal.cc:1229
#9  0x00000000004ae1a5 in comm_select (msec=<value optimized out>)
    at comm_epoll.cc:307
#10 0x000000000055c24d in CommSelectEngine::checkEvents (
    this=<value optimized out>, timeout=404) at comm.cc:2687
#11 0x00000000004c440e in EventLoop::checkEngine (this=0x7fff8d563550,
    engine=0x7fff8d5635b0, primary=64) at EventLoop.cc:50
#12 0x00000000004c4563 in EventLoop::runOnce (this=0x7fff8d563550)
    at EventLoop.cc:124
#13 0x00000000004c4658 in EventLoop::run (this=0x7fff8d563550)
    at EventLoop.cc:94
#14 0x000000000050adef in SquidMain (argc=<value optimized out>,
    argv=0x7fff8d5636d8) at main.cc:1416
#15 0x000000000050b3f6 in SquidMainSafe (argc=33, argv=0x2) at main.cc:1176
#16 main (argc=33, argv=0x2) at main.cc:1168
Comment 3 Amos Jeffries 2011-06-20 05:01:15 UTC
Hmm. Seems to be when combining two no-IP result sets. I'll try and replicate that here. Meanwhile keep those cores, the packet(s) should be inside the GrokReply buffer details if we need to check anything.
Comment 4 Andreas Kobara 2011-09-22 02:15:15 UTC
The troublesome DNS-Records seem to be a CNAME poiting to another CNAME which is pointing to an empty A-Record.

$ dig l15.sphotos.l3.fbcdn.net

; <<>> DiG 9.2.4 <<>> l15.sphotos.l3.fbcdn.net
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 11625
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;l15.sphotos.l3.fbcdn.net.      IN      A

;; ANSWER SECTION:
l15.sphotos.l3.fbcdn.net. 7032  IN      CNAME   l15.sphotos.l3.fbcdn.net.c.footprint.net.
l15.sphotos.l3.fbcdn.net.c.footprint.net. 62 IN CNAME s1048.dnssecure.nsatc.net.

;; AUTHORITY SECTION:
nsatc.net.              9       IN      SOA     admin.nsatc.net. dl-cdn_infrastructure.level3.com. 1315346467 10800 2700 3600000 900

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Sep 22 10:09:50 2011
;; MSG SIZE  rcvd: 203

$ dig l15.sphotos.l3.fbcdn.net.c.footprint.net

; <<>> DiG 9.2.4 <<>> l15.sphotos.l3.fbcdn.net.c.footprint.net
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 37263
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;l15.sphotos.l3.fbcdn.net.c.footprint.net. IN A

;; ANSWER SECTION:
l15.sphotos.l3.fbcdn.net.c.footprint.net. 57 IN CNAME s1048.dnssecure.nsatc.net.

;; AUTHORITY SECTION:
nsatc.net.              4       IN      SOA     admin.nsatc.net. dl-cdn_infrastructure.level3.com. 1315346467 10800 2700 3600000 900

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Sep 22 10:09:55 2011
;; MSG SIZE  rcvd: 168

$ dig s1048.dnssecure.nsatc.net

; <<>> DiG 9.2.4 <<>> s1048.dnssecure.nsatc.net
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10019
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;s1048.dnssecure.nsatc.net.     IN      A

;; AUTHORITY SECTION:
nsatc.net.              899     IN      SOA     admin.nsatc.net. dl-cdn_infrastructure.level3.com. 1315346467 10800 2700 3600000 900

;; Query time: 6 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Sep 22 10:10:01 2011
;; MSG SIZE  rcvd: 117
Comment 5 Andreas Kobara 2011-09-22 02:38:51 UTC
I can reproduce this crash every time on a 32bit and 64bit build of squid 3.1.8 and 3.1.14 by simply entering the URL http://l15.sphotos.l3.fbcdn.net in a browser using this proxy.

Please consider this a critical DoS potential bug.


Starting Squid:

2011/09/22 10:30:30| Starting Squid Cache version 3.1.8 for i386-redhat-linux-gnu...
2011/09/22 10:30:30| Process ID 24890
2011/09/22 10:30:30| With 16383 file descriptors available
2011/09/22 10:30:30| Initializing IP Cache...
2011/09/22 10:30:30| DNS Socket created at [::], FD 7
2011/09/22 10:30:30| DNS Socket created at 0.0.0.0, FD 8
2011/09/22 10:30:30| Adding nameserver 127.0.0.1 from /etc/resolv.conf
2011/09/22 10:30:30| User-Agent logging is disabled.
2011/09/22 10:30:30| Referer logging is disabled.
2011/09/22 10:30:31| Unlinkd pipe opened on FD 13
2011/09/22 10:30:31| Store logging disabled
2011/09/22 10:30:31| Swap maxSize 163840000 + 131072 KB, estimated 12613159 objects
2011/09/22 10:30:31| Target number of buckets: 630657
2011/09/22 10:30:31| Using 1048576 Store buckets
2011/09/22 10:30:31| Max Mem  size: 131072 KB
2011/09/22 10:30:31| Max Swap size: 163840000 KB
2011/09/22 10:30:31| Version 1 of swap file with LFS support detected...
2011/09/22 10:30:31| Rebuilding storage in /var/spool/squid (CLEAN)
2011/09/22 10:30:31| Using Least Load store dir selection
2011/09/22 10:30:31| Set Current Directory to /var/spool/squid
2011/09/22 10:30:31| Loaded Icons.
2011/09/22 10:30:31| Accepting  HTTP connections at 127.0.0.1:3128, FD 16.
2011/09/22 10:30:31| HTCP Disabled.
2011/09/22 10:30:31| Accepting SNMP messages on 127.0.0.1:3401, FD 17.
2011/09/22 10:30:31| Squid modules loaded: 0
2011/09/22 10:30:31| Adaptation support is off.
2011/09/22 10:30:31| Ready to serve requests.
2011/09/22 10:30:31| Store rebuilding is 6.84% complete
2011/09/22 10:30:31| Done reading /var/spool/squid swaplog (59840 entries)
2011/09/22 10:30:31| Finished rebuilding storage from disk.
2011/09/22 10:30:31|     59840 Entries scanned
2011/09/22 10:30:31|         0 Invalid entries.
2011/09/22 10:30:31|         0 With invalid flags.
2011/09/22 10:30:31|     59840 Objects loaded.
2011/09/22 10:30:31|         0 Objects expired.
2011/09/22 10:30:31|         0 Objects cancelled.
2011/09/22 10:30:31|         0 Duplicate URLs purged.
2011/09/22 10:30:31|         0 Swapfile clashes avoided.
2011/09/22 10:30:31|   Took 0.68 seconds (87973.87 objects/sec).
2011/09/22 10:30:31| Beginning Validation Procedure
2011/09/22 10:30:31|   Completed Validation Procedure
2011/09/22 10:30:31|   Validated 119705 Entries
2011/09/22 10:30:31|   store_swap_size = 1305852
2011/09/22 10:30:32| storeLateRelease: released 0 objects


Entering http://l15.sphotos.l3.fbcdn.net in browser:


2011/09/22 10:30:34| ipcacheParse: No Address records in response to 'l15.sphotos.l3.fbcdn.net'
2011/09/22 10:30:34| ipcacheParse: No Address records in response to 'l15.sphotos.l3.fbcdn.net'
FATAL: Received Segment Violation...dying.
2011/09/22 10:30:34| storeDirWriteCleanLogs: Starting...
2011/09/22 10:30:34| WARNING: Closing open FD   16
2011/09/22 10:30:34|   Finished.  Wrote 59840 entries.
2011/09/22 10:30:34|   Took 0.03 seconds (2373943.75 entries/sec).
CPU Usage: 0.747 seconds = 0.312 user + 0.435 sys
Maximum Resident Size: 83680 KB
Page faults with physical i/o: 9
Memory usage for squid via mallinfo():
        total space in arena:    9460 KB
        Ordinary blocks:         9314 KB     18 blks
        Small blocks:               0 KB      1 blks
        Holding blocks:         11352 KB      9 blks
        Free Small blocks:          0 KB
        Free Ordinary blocks:     145 KB
        Total in use:           20666 KB 218%
        Total free:               145 KB 2%



Squid is restarting (not always successful):


2011/09/22 10:30:37| Starting Squid Cache version 3.1.8 for i386-redhat-linux-gnu...
2011/09/22 10:30:37| Process ID 24893
2011/09/22 10:30:37| With 16383 file descriptors available
2011/09/22 10:30:37| Initializing IP Cache...
2011/09/22 10:30:37| DNS Socket created at [::], FD 7
2011/09/22 10:30:37| DNS Socket created at 0.0.0.0, FD 8
2011/09/22 10:30:37| Adding nameserver 127.0.0.1 from /etc/resolv.conf
2011/09/22 10:30:37| User-Agent logging is disabled.
2011/09/22 10:30:37| Referer logging is disabled.
2011/09/22 10:30:38| Unlinkd pipe opened on FD 13
2011/09/22 10:30:38| Store logging disabled
2011/09/22 10:30:38| Swap maxSize 163840000 + 131072 KB, estimated 12613159 objects
2011/09/22 10:30:38| Target number of buckets: 630657
2011/09/22 10:30:38| Using 1048576 Store buckets
2011/09/22 10:30:38| Max Mem  size: 131072 KB
2011/09/22 10:30:38| Max Swap size: 163840000 KB
2011/09/22 10:30:38| Version 1 of swap file with LFS support detected...
2011/09/22 10:30:38| Rebuilding storage in /var/spool/squid (CLEAN)
2011/09/22 10:30:38| Using Least Load store dir selection
2011/09/22 10:30:38| Set Current Directory to /var/spool/squid
2011/09/22 10:30:38| Loaded Icons.
2011/09/22 10:30:38| Accepting  HTTP connections at 127.0.0.1:3128, FD 16.
2011/09/22 10:30:38| HTCP Disabled.
2011/09/22 10:30:38| Accepting SNMP messages on 127.0.0.1:3401, FD 17.
2011/09/22 10:30:38| Squid modules loaded: 0
2011/09/22 10:30:38| Adaptation support is off.
2011/09/22 10:30:38| Ready to serve requests.
2011/09/22 10:30:38| Store rebuilding is 6.84% complete
2011/09/22 10:30:38| Done reading /var/spool/squid swaplog (59840 entries)
2011/09/22 10:30:38| Finished rebuilding storage from disk.
2011/09/22 10:30:38|     59840 Entries scanned
2011/09/22 10:30:38|         0 Invalid entries.
2011/09/22 10:30:38|         0 With invalid flags.
2011/09/22 10:30:38|     59840 Objects loaded.
2011/09/22 10:30:38|         0 Objects expired.
2011/09/22 10:30:38|         0 Objects cancelled.
2011/09/22 10:30:38|         0 Duplicate URLs purged.
2011/09/22 10:30:38|         0 Swapfile clashes avoided.
2011/09/22 10:30:38|   Took 0.60 seconds (99200.13 objects/sec).
2011/09/22 10:30:38| Beginning Validation Procedure
2011/09/22 10:30:38|   Completed Validation Procedure
2011/09/22 10:30:38|   Validated 119705 Entries
2011/09/22 10:30:38|   store_swap_size = 1305852
2011/09/22 10:30:39| storeLateRelease: released 0 objects
Comment 6 Andreas Kobara 2011-09-22 04:48:35 UTC
Created attachment 2508 [details]
Proposed patch by Jörg Lehrke

added record data length check for memcpy and free clauses
Comment 7 Andreas Kobara 2011-09-22 04:48:43 UTC
Created attachment 2509 [details]
Proposed patch by Jörg Lehrke

added record data length check for memcpy and free clauses
Comment 8 Andreas Kobara 2011-09-22 06:23:59 UTC
Comment on attachment 2509 [details]
Proposed patch by Jörg Lehrke

tested on 3 systems, only working on 1 (don't knwo why), therefore patch removed.
Comment 9 Amos Jeffries 2011-09-22 06:58:09 UTC
Squid x*alloc() functions all perform the input validation and accept negative and zero sizes. xmalloc() is guaranteed to return at least one byte of allocated memory or to abort the Squid process if the system hits OOM.
Comment 10 Joerg Lehrke 2011-09-22 09:03:40 UTC
Created attachment 2510 [details]
Patch to fix the alignment issue in dns_internal.cc

The crash is caused by using memcpy for a complete array of structs. On most current architectures this will cause misaligned memory content. While freeing the structs Squid crashed.
Comment 11 Joerg Lehrke 2011-09-22 13:11:42 UTC
Comment on attachment 2510 [details]
Patch to fix the alignment issue in dns_internal.cc

Does not fix the issue, sorry!
Comment 12 Joerg Lehrke 2011-09-22 14:25:27 UTC
Created attachment 2512 [details]
Fix DNS answer handling

I guess I found the problem within dns_internal.cc now. message->ancount was not updated properly in idnsGrokReply().
Comment 13 Henrik Nordstrom 2011-09-26 05:49:53 UTC
The first two chunks of the patch are wrong.

The first chunk is not needed and I think it may cause unexpected effects on other code. As Amos already explained xmalloc handles zero length allocations fine and always returns some memory.

The first part of the second chunk is wrong (free). If the rr is assigned then it needs to be freed even if the length is 0.

The second part of the second chunk (while loop) is correct but should not be needed unless other code is buggy (see below). I find n-- > 0 easier to read however. The original implicit != 0 is not a very good loop condition.


If n == 0 at start then the while loop does nothing. The change in while condiiton only makes a difference if n should ever start out < 0 due to some other bug. Perhaps there should even be an assertion that n >= 0 here to trap any such bugs.


For the second half of the patch dealing with ancount I am not sure what it really does or when it makes a difference but it do smell like you may be on to something there. The way n and ancount is managed is a little odd where n MAY be less than ancount under some conditions and n is the actual number of answers in populated in the array and ancount gives the total size of the answer array.

But ancount is originally assigned in rfc1035MessageUnpack() when decoding the DNS message. It should only be adjusted by idnsGrokReply if it changes the answer array size.
Comment 14 Joerg Lehrke 2011-09-26 07:22:34 UTC
Created attachment 2513 [details]
Fix DNS answer handling

(In reply to comment #13)

Since you can't call xmalloc(0), you would need the first chunk of my patch, don't you?

The second chunk addresses cases where rr is allocated (!) but the data array length is not greater than zero. In these cases rr->data was never allocated (properly). You would need this for e.g. with my first chunk. idnsGrokReply() does call rfc1035RRDestroy() with negativ length, too.

But your are right, both these changes are not directly related to the issue. I stumbled over this stuff during my review.

The parts belonging to dns_intern.cc are fixing the issue, though. But you are right ancount should only by modified if the array was modified. I considered this with my new version of the patch.

I hope I could clarify my patch and I you see the benefit now.
Comment 15 Henrik Nordstrom 2011-09-26 14:32:53 UTC
+++ lib/rfc1035.c	2011-09-26 15:04:17.000000000 +0200
@@ -430,8 +430,10 @@
         break;
     case RFC1035_TYPE_A:
     default:
-        RR->rdata = (char*)xmalloc(rdlength);
-        memcpy(RR->rdata, buf + (*off), rdlength);
+	if (rdlength) {
+            RR->rdata = (char*)xmalloc(rdlength);
+            memcpy(RR->rdata, buf + (*off), rdlength);
+	}
         break;
     }


this is not needed. Fixed by the change in the first if in rfc1035RRDestroy. Would prefer RR->rdata is allocated even if 0, at least for now.



-    while (n--) {
-        if ((*rr)[n].rdata)
+    while (n-- > 0) {
+        if ((*rr)[n].rdata && (*rr)[n].rdlength)
             xfree((*rr)[n].rdata);
     }

This looks wrong. If rdata is set then it is allocated even if rdlength is 0, and needs to be freed.




+++ src/dns_internal.cc	2011-09-26 15:07:18.000000000 +0200
@@ -1126,16 +1126,17 @@
         /* free the RR object without freeing its child strings (they are now taken by the copy above) */
         safe_free(message->answer);
 
-        message->answer = result;
-        message->ancount += q->initial_AAAA.count;
         n += q->initial_AAAA.count;
-        q->initial_AAAA.count=0;
+        q->initial_AAAA.count = 0;
+        message->answer = result;
+        message->ancount = n;


Hmm.. this still looks somewhat wrong. ancount and n is not entirely the same thing. Same confusion also applies to initial_AAAA.count which is really ancount and not n from the AAAA response.

n is the total number of actual entries in the answer, while ancount is the expected number of entries and also the size of the array. n <= ancount

Perhaps both n and ancount should be saved in initial_AAAA. But in reality it's just n that we are interested in, not the ancount. Any entries in the answer array above n is all empty with no data to be used or freed.

From a quick code review there is also a related memory leak outside your changes. There is no code cleaning up initial_AAAA on aborted queries. Or perhaps even the complete answer message + n should be saved which eases cleanup.

The easiest way to add a cleanup hook is to add a free callback to the idns_query type by using CBDATA_INIT_TYPE_FREECB instead of CBDATA_INIT_TYPE. 

Another alternative would be to convert it to a CBDATA_CLASS class with a destructor taking care of the cleanup. But the usage API is then quite different and much more changes is needed to dns_internal.cc.
Comment 16 Joerg Lehrke 2011-09-27 00:11:40 UTC
(In reply to comment #15)
> +++ lib/rfc1035.c    2011-09-26 15:04:17.000000000 +0200
> @@ -430,8 +430,10 @@
>          break;
>      case RFC1035_TYPE_A:
>      default:
> -        RR->rdata = (char*)xmalloc(rdlength);
> -        memcpy(RR->rdata, buf + (*off), rdlength);
> +    if (rdlength) {
> +            RR->rdata = (char*)xmalloc(rdlength);
> +            memcpy(RR->rdata, buf + (*off), rdlength);
> +    }
>          break;
>      }
> 
> 
> this is not needed. Fixed by the change in the first if in rfc1035RRDestroy.
> Would prefer RR->rdata is allocated even if 0, at least for now.
>
Please check the xmalloc() man page. You are not supposed to call xmalloc(0).  
> 
> 
> -    while (n--) {
> -        if ((*rr)[n].rdata)
> +    while (n-- > 0) {
> +        if ((*rr)[n].rdata && (*rr)[n].rdlength)
>              xfree((*rr)[n].rdata);
>      }
> 
> This looks wrong. If rdata is set then it is allocated even if rdlength is 0,
> and needs to be freed.
Since xmalloc(0) did not allocate a rdatda buffer you should not try to free something here. Since e.g. idnsGrokReply() combines two replies, you have to expect an empty entry in the middle of your array.
> 
> 
> 
> 
> +++ src/dns_internal.cc    2011-09-26 15:07:18.000000000 +0200
> @@ -1126,16 +1126,17 @@
>          /* free the RR object without freeing its child strings (they are now
> taken by the copy above) */
>          safe_free(message->answer);
> 
> -        message->answer = result;
> -        message->ancount += q->initial_AAAA.count;
>          n += q->initial_AAAA.count;
> -        q->initial_AAAA.count=0;
> +        q->initial_AAAA.count = 0;
> +        message->answer = result;
> +        message->ancount = n;
> 
> 
> Hmm.. this still looks somewhat wrong. ancount and n is not entirely the same
> thing. Same confusion also applies to initial_AAAA.count which is really
> ancount and not n from the AAAA response.
Both buffers are freed after the entries were copied (and with them the references to the stings). All you need to free in the end are the string buffers and the new array. With using q->initial_AAAA.count and n you can ensure that. 
> 
> n is the total number of actual entries in the answer, while ancount is the
> expected number of entries and also the size of the array. n <= ancount
> 
If you worry about the empty entries, you could first copy the n entries from the actual message and append the initial_AAAA stuff afterwards. For the sake of this bug it does not matter and I didn't wanted to change the original flow to much.
> Perhaps both n and ancount should be saved in initial_AAAA. But in reality it's
> just n that we are interested in, not the ancount. Any entries in the answer
> array above n is all empty with no data to be used or freed.
Too handle these cases correctly you would need my changes of rfc1035.c, wouldn't you?
> 
> From a quick code review there is also a related memory leak outside your
> changes. There is no code cleaning up initial_AAAA on aborted queries. Or
> perhaps even the complete answer message + n should be saved which eases
> cleanup.
> 
> The easiest way to add a cleanup hook is to add a free callback to the
> idns_query type by using CBDATA_INIT_TYPE_FREECB instead of CBDATA_INIT_TYPE. 
> 
> Another alternative would be to convert it to a CBDATA_CLASS class with a
> destructor taking care of the cleanup. But the usage API is then quite
> different and much more changes is needed to dns_internal.cc.
I consider this but as as severe DoS-vulnerability. It should be fixes asap. A complete code redesign seems not the appropriate action to me. I'm quite sure that the code does not leave any memory leak in this area with my patch applied and the functionality is preserved. Please think it though again and I'm confident you will find it appropriate.
Comment 17 Joerg Lehrke 2011-09-27 00:14:40 UTC
(In reply to comment #15)


> From a quick code review there is also a related memory leak outside your
> changes. There is no code cleaning up initial_AAAA on aborted queries. Or
> perhaps even the complete answer message + n should be saved which eases
> cleanup.
Yes there is cbdataFree(q).
Comment 18 Joerg Lehrke 2011-09-27 00:15:07 UTC
(In reply to comment #15)


> From a quick code review there is also a related memory leak outside your
> changes. There is no code cleaning up initial_AAAA on aborted queries. Or
> perhaps even the complete answer message + n should be saved which eases
> cleanup.
Yes there is: cbdataFree(q).
Comment 19 Amos Jeffries 2011-09-27 00:26:23 UTC
(In reply to comment #16)
> Please check the xmalloc() man page. You are not supposed to call xmalloc(0). 

There is no xmalloc() man page. see comment 9
Comment 20 Joerg Lehrke 2011-09-27 01:34:24 UTC
Created attachment 2514 [details]
Fix DNS answer handling

(In reply to comment #19)

Ok, I skip almost all changes to rfc1035.c. The remaining patch for rfc1035.c is necessary because you want to free rr regardless of n.
Comment 21 Amos Jeffries 2011-10-06 06:58:40 UTC
(In reply to comment #20)
> Created attachment 2514 [details]

This one looks better. A quick look over fails to find anything obviously problematic.  How much testing has this had now?
Comment 22 Joerg Lehrke 2011-10-06 07:02:56 UTC
(In reply to comment #21)
We use this modification on our productive instances without any issue for almost two weeks now (about 400 users).
Comment 23 Amos Jeffries 2011-10-08 23:16:55 UTC
Applied to Squid-3.
Comment 24 Amos Jeffries 2011-10-10 20:08:47 UTC
Applied to 3.2 and 3.1.
Comment 25 Ray 2012-03-05 08:04:24 UTC
(In reply to comment #24)
> Applied to 3.2 and 3.1.

Is there any patches for 2.7? Thanks.