Bug 3806 - Caching responses with "Vary" header
Caching responses with "Vary" header
Status: RESOLVED FIXED
Product: Squid
Classification: Unclassified
Component: other
3.2
PC x86_64 (64-bit) Linux - All
: P2 major
: 3.3
Assigned To: SQUID BUGS ALIAS
: 3799 (view as bug list)
Depends on:
Blocks: 4525
  Show dependency treegraph
 
Reported: 2013-03-13 19:50 UTC by scott.harris
Modified: 2016-05-27 13:17 UTC (History)
7 users (show)

See Also:
Browser: ---
Fixed Versions:
Needs:


Attachments
Set body size 0 on the vary marker object (696 bytes, patch)
2013-03-25 04:54 UTC, Amos Jeffries
Details
proposed temporary fix for trunk (6.22 KB, patch)
2013-11-08 22:41 UTC, Alex Rousskov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description scott.harris 2013-03-13 19:50:48 UTC
When objects retrieved from web server return with a vary header, object is never cached and all future requests are always logged as a MISS.

Disabling the vary header on web server causes the same object to be cached as expected.

Seems to be same issues as discussed here : http://www.squid-cache.org/mail-archive/squid-dev/201207/0100.html
Comment 1 Amos Jeffries 2013-03-25 04:48:31 UTC
*** Bug 3799 has been marked as a duplicate of this bug. ***
Comment 2 Amos Jeffries 2013-03-25 04:54:29 UTC
Created attachment 2854 [details]
Set body size 0 on the vary marker object

I am suspecting that the recent changes to cache max-size is related to this. The internal Vary marker object we use in the cache has no body but was specifying 'unknown' body length instead of none.
 Please test this patch.
Comment 3 Stephen Baynes 2013-03-26 10:39:50 UTC
(In reply to comment #2)
> Created attachment 2854 [details]
> Set body size 0 on the vary marker object
> 
> I am suspecting that the recent changes to cache max-size is related to this.
> The internal Vary marker object we use in the cache has no body but was
> specifying 'unknown' body length instead of none.
>  Please test this patch.

Just tried the patch on Squid 3.2 in the context of bug 3799 and it did not make any difference. Does it require a higher version of Squid?
Comment 4 scott.harris 2013-03-26 16:06:45 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > Created attachment 2854 [details]
> > Set body size 0 on the vary marker object
> > 
> > I am suspecting that the recent changes to cache max-size is related to this.
> > The internal Vary marker object we use in the cache has no body but was
> > specifying 'unknown' body length instead of none.
> >  Please test this patch.
> 
> Just tried the patch on Squid 3.2 in the context of bug 3799 and it did not
> make any difference. Does it require a higher version of Squid?

I also have tested with squid 3.3.3 and made no difference, what debugging can I turn on and logs to provide to help diagnose?

Thanks

Scott
Comment 5 Priyanka 2013-04-04 00:35:00 UTC
Please find my analysis on this issue.
Squid 3.2.1
I have tracked this down to the way hash keys are generated and retrieved for requests with vary header.  If you go to the squid source code file src/store_key_md5.cc (around line 100):

const cache_key *
storeKeyPublicByRequestMethod(HttpRequest * request, const HttpRequestMethod& method)
{
    static cache_key digest[SQUID_MD5_DIGEST_LENGTH];
    unsigned char m = (unsigned char) method.id();
    const char *url = urlCanonical(request);
    SquidMD5_CTX M;
    SquidMD5Init(&M);
    SquidMD5Update(&M, &m, sizeof(m));
    SquidMD5Update(&M, (unsigned char *) url, strlen(url));

    if (request->vary_headers)
SquidMD5Update(&M, (unsigned char *) request->vary_headers, strlen(request->vary_headers));
    SquidMD5Final(digest, &M);

    return digest;
}

you can see that MD5-based keys are generated based on request method id, request URL, and request vary header.  The problem seems to be that when hash keys are generated during retrieval, the vary header field is empty and therefore although the resource is cached it cannot be retrieved because of the incorrect hash key.

If you comment out two lines from above as follows:

    //if (request->vary_headers)
//SquidMD5Update(&M, (unsigned char *) request->vary_headers, strlen(request->vary_headers));

and recompile squid you will see that such resources get served from cache indicating that the culprit is the vary header.

The part of the code that seems to be generating the hash keys for retrieval is inside src/store.cc (around line 625):

StoreEntry *
storeGetPublicByRequestMethod(HttpRequest * req, const HttpRequestMethod& method)
{
    return Store::Root().get(storeKeyPublicByRequestMethod(req, method));
}

Here the vary header of the http request object req is empty. Therefore storeKeyPublicByRequestMethod() returns the incorrect hash key for resource retrieval.  Based on my investigation of the code the vary header during request gets stored in mem_obj->request->vary_headers based on response header (mem_obj is a member of StorEentry).  For some reason during cache query the request member of mem_obj does not have valid vary_headers (it is empty). 


An example test case  is:

http://l1.yimg.com/zz/combo?os/mit/media/p/common/prime-dns-min-1137430.js&yui:3.8.1/build/yui/yui-min.js&ss/rapid-3.2.js&os/mit/media/m/base/imageloader-min-875847.js&os/mit/media/m/base/imageloader-bootstrap-min-815727.js&os/mit/media/m/base/viewport-loader-min-993847.js&os/mit/media/p/common/rmp-min-1127070.js

which has the vary header: “Vary:accept-encoding”.  If you go this location in your browser, empty the browser cache, and then try to reload it you will see a cache miss in squid.  If you comment out the two lines mentioned above and rebuild squid you will see that this becomes a cache hit.
Comment 6 Marco Andreatta 2013-05-23 03:50:51 UTC
I just want to report that this bug affects my client's e-commerce. I believe this is pretty serious: it's a common practice to compress js/css and "vary" header is mandatory in this case.
I'm unsure if wait for a fix or revert to 2.x version of squid.
Please, let me know if you plan to fix it asap.
Comment 7 Alex Rousskov 2013-05-23 10:01:36 UTC
(In reply to comment #5)
 
> you can see that MD5-based keys are generated based on request method id,
> request URL, and request vary header.  The problem seems to be that when hash
> keys are generated during retrieval, the vary header field is empty and
> therefore although the resource is cached it cannot be retrieved because of the
> incorrect hash key.

I am not intimately familiar with this code, but I think empty vary_headers are expected on the first lookup. Looking at request alone, Squid does not yet know what request headers are subject to Vary control. To know that, Squid needs to retrieve the "master object" from the cache, find Vary header information there, fill vary_headers based on that information, do a second lookup, and retrieve the matching response from the cache (if any). Only during that second lookup the vary_headers member will be filled.

If your test case exposes the bug, then somebody needs to find out why the "master object" is not found in the cache during the first lookup (the one with empty vary_headers). To do that, one needs to investigate whether that "master object" was cached in the first place (when the original Vary response was received) and, if it was cached, why it was purged later (if it was purged) or why it was stored using a different key (also resulting in a miss).
Comment 8 Eliezer Croitoru 2013-06-12 03:38:48 UTC
(In reply to comment #7)
> (In reply to comment #5)
> 
> > you can see that MD5-based keys are generated based on request method id,
> > request URL, and request vary header.  The problem seems to be that when hash
> > keys are generated during retrieval, the vary header field is empty and
> > therefore although the resource is cached it cannot be retrieved because of the
> > incorrect hash key.
> 
> I am not intimately familiar with this code, but I think empty vary_headers are
> expected on the first lookup. Looking at request alone, Squid does not yet know
> what request headers are subject to Vary control. To know that, Squid needs to
> retrieve the "master object" from the cache, find Vary header information
> there, fill vary_headers based on that information, do a second lookup, and
> retrieve the matching response from the cache (if any). Only during that second
> lookup the vary_headers member will be filled.
> 
> If your test case exposes the bug, then somebody needs to find out why the
> "master object" is not found in the cache during the first lookup (the one with
> empty vary_headers). To do that, one needs to investigate whether that "master
> object" was cached in the first place (when the original Vary response was
> received) and, if it was cached, why it was purged later (if it was purged) or
> why it was stored using a different key (also resulting in a miss).


So we need couple cases:
1. request that has vary header in it(multi or single) and should have in the reply.
2. request that has a vary response in it without having one in the request.
3. request that has vary header in it(multi or single)  and will not have vary header in the response.

The above cases will show how squid react in any situation that exists with vary headers on the request and response.
A vary header in the response should be equal to the request and if not still it should satisfy the request.
In the case which there is a vary header in the response but not in the request it should be ignored since HTTP should work by concept of "one request one response" and this is how a cache should cache the requests in http1.x.

if you do think of another cases put them here.

Eliezer
Comment 9 Alex Rousskov 2013-11-08 22:41:56 UTC
Created attachment 2969 [details]
proposed temporary fix for trunk

This patch contains several Vary caching fixes that make Vary caching work in my limited trunk tests. 

One of the fixes is temporary -- it disables Vary response caching in shared memory cache (other caches should still work with these fixes). The same temporary change may also remove excessive "Vary object loop!" warnings in cache.log for folks using a shared memory cache.

The same patch may apply to earlier Squid versions. If you want to experiment, try patching with --fuzz=20 (if needed) and see whether the patch applies and the patched code compiles.
Comment 10 Amos Jeffries 2013-12-11 22:44:35 UTC
(In reply to comment #9)
> Created attachment 2969 [details]
> proposed temporary fix for trunk

The startWriting() piece seems to be the core bug here and this fixes it.
Applied to Squid-3 for further testing, absent any more progress this will be in 3.5.
Comment 11 Amos Jeffries 2013-12-15 06:00:55 UTC
Applied to 3.4 and 3.3
Comment 12 Eliezer Croitoru 2015-11-23 21:34:05 UTC
(In reply to Amos Jeffries from comment #11)
> Applied to 3.4 and 3.3

To test the issue we can try to use the tiny server I wrote not so long ago.
What do you think?
Comment 13 support 2016-05-27 11:44:39 UTC
Hi All,
We're using the 3.5.19 and the problem is still present.

Traces:
  HTTP/1.1 200 OK
  Date: Fri, 27 May 2016 11:25:54 GMT
  Server: Apache/2.4.6 (CentOS) PHP/7.0.6
  X-Powered-By: PHP/7.0.6
  Content-Disposition: attachment; filename="cover_256.png"
  Vary: User-Agent
  Last-Modified: Sat, 05 Dec 2015 20:57:30 GMT
  ETag: "198be-5262cdf4cb480"
  Content-Length: 104638
  Content-Type: image/png
  X-Cache: MISS from mysquid
  X-Cache-Lookup: MISS from mysquid

URL to reproduce:
http://mld.iranapps.ir/resource/0d7423c0d744c8ec87b127860184716678dd7a35/com.uplayonline.strikersoccereuro_lite/32/cover_256.png

Here the main problem is crazy:
if the "minimum_object_size" is 0 (zero) so the object is cached correctly.
But if the "minimum_object_size" > 0 (ex: 7 KB), so the object is not cached, as you can see from traces (above) the object is around 104kb.
It seems Squid first creates an small object (around 377 bits) then create the complete object. But the first object is smaller than the "minimum_object_size" then the object cannot be cached.

sample of the 377 bit object:
Server: squid
Mime-Version: 1.0
Date: Tue, 24 May 2016 11:01:18 GMT
Content-Type: x-squid-internal/vary
Expires: Wed, 25 May 2016 14:47:58 GMT
Vary: Accept-Encoding
<eof>

Can you have a look, thanks in advance :o)

bye Fred
Comment 14 Amos Jeffries 2016-05-27 13:11:25 UTC
The small object is the cache entry we call a "variant marker object" that tells Squid how to use the Vary (or other) header to construct a cache lookup key that is more complex than just the URL / Store-ID value. It is behind the vary_headers value being set as mentioned in comment 5 and 6.

If you configure Squid such that those objects cannot be cached, then there is no way for Squid to identify the actual response variants in cache. This has the outward appearance of things not being cached (always MISS), even though they actually are stored. It is just that Squid cannot find where they are.

So ... this is not really a bug when you raise the minimum object size as decscribed. Though we should probably make sure these markers get stored somewhere regardless of administrative imits.

You can workaround it for now by allowing either memory cache, or one of the disk caches to store small objects.
Comment 15 Amos Jeffries 2016-05-27 13:17:28 UTC
I've cloned bug 4525 to track the issue of vary marker object havign limits applied to it.
Re-closing this bug report for the already fixed issue.