We have a fairly busy phpBB-based forum. I found that it is heavily loaded by bots (Google, AdSense, Yahoo; the AdSense bot seems to duplicate every user's request). I added a configuration that caches all dynamic requests and serves the cached pages to bots only (see the config below). But I ran into a strange issue: when a page is returned from the cache, nginx doesn't send any headers back.
So my questions are:
* Is there anything wrong with the response being treated as HTTP/0.9?
* Is it possible to configure nginx to return headers when a page is served from the cache, so that the response is treated as HTTP/1.1?
* It seems the cached copy is refreshed every time the page is requested by a user (not a bot). Is it possible to configure nginx not to refresh it while it is still valid (72h in the config), to avoid unnecessary IO?
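For the last question, one possible approach (a sketch, not verified against this setup): `proxy_cache_bypass` only stops nginx from *reading* the cache, so a user's request still goes to the backend and the backend response can still be written to the cache, which would match the refresh-on-every-user-hit behavior. The separate `proxy_no_cache` directive controls whether a response is *saved*. Reusing the `$crawlernocache` variable from the config sample for both should leave the crawler copy in place until `proxy_cache_valid` expires it:

```nginx
location / {
    # ... proxy_set_header, proxy_cache_key, etc. as in the config sample ...

    # Regular users ($crawlernocache = 1) skip the cache lookup...
    proxy_cache_bypass $crawlernocache;
    # ...and their responses are not written to the cache either,
    # so the copy cached for crawlers is not overwritten on every user hit.
    proxy_no_cache     $crawlernocache;

    proxy_cache        crawlercache;
    proxy_cache_valid  200 72h;
}
```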
Here is how it looks in the logs:
{{{ - - [03/Mar/2011:12:02:51 +0200] "GET /viewtopic.php?f=112&t=73799&p=2363346 HTTP/1.1" 009 175820 "-" "Mediapartners-Google"
Here is how the response looks with wget -d --user-agent="Mediapartners-Google"...:
HTTP request sent, awaiting response...
---response begin---
---response end---
200 No headers, assuming HTTP/0.9
Length: unspecified
Saving to: `index.php.3'
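To check whether the `009` entries in the access log line up with cache hits, the cache status can be logged next to each request. A sketch, assuming the current log uses nginx's standard `combined` layout (the log line above matches it); the format name `cachelog` and the log path are hypothetical placeholders:

```nginx
# http context: combined-style format plus the cache status of each request
# (HIT, MISS, BYPASS, EXPIRED, ...; "-" when the cache was not involved)
log_format cachelog '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" cache=$upstream_cache_status';

access_log /var/log/nginx/access.log cachelog;
```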
Config sample:
set $crawlernocache 1;            # default: regular visitor, bypass the cache
if ($http_user_agent ~ ".*Google.*") {
    set $crawlernocache 0;        # crawler: may be served from the cache
}
if ($http_user_agent ~ ".*Yandex.*") {
    set $crawlernocache 0;
}
location / {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host $http_host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Accept-Encoding "";
    proxy_send_timeout 300;
    set $delimeter +;
    proxy_ignore_headers "Expires" "X-Accel-Expires" "Cache-Control";
    # a value other than "" or "0" makes nginx skip the cache lookup
    proxy_cache_bypass $crawlernocache;
    proxy_cache crawlercache;
    # the key uses only the f, t, start and p query arguments
    proxy_cache_key $host$uri$arg_f$delimeter$arg_t$delimeter$arg_start$delimeter$arg_p;
    proxy_cache_valid 200 72h;
}
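The same crawler detection can also be written with `map`, which is the usual nginx idiom for deriving a variable from a request header instead of `if` blocks. A sketch, assuming these directives go in the http block; the cache path, zone size and `max_size` are placeholders rather than values from this setup, and the `X-Cache-Status` header name is arbitrary. Note `inactive=72h`: by default, cached entries not accessed for 10 minutes are removed regardless of `proxy_cache_valid`, so a 72h validity alone may not keep pages cached that long.

```nginx
# http context: hypothetical cache zone; path and sizes are placeholders
proxy_cache_path /var/cache/nginx/crawler levels=1:2
                 keys_zone=crawlercache:50m max_size=1g inactive=72h;

# 0 for crawlers (may be served from cache), 1 for everyone else
map $http_user_agent $crawlernocache {
    default   1;
    ~Google   0;
    ~Yandex   0;
}

server {
    location / {
        # ... proxy_set_header / proxy_cache_key as in the config sample ...
        proxy_cache        crawlercache;
        proxy_cache_bypass $crawlernocache;
        proxy_cache_valid  200 72h;

        # Report HIT/MISS/BYPASS/EXPIRED on responses that carry headers
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```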