Can nginx handle/cache this robot handling case?

Ian Evans
June 24, 2013 03:46AM
Hi everyone.

First, some background: I'm trying to integrate the method Pixabay uses
to handle Google Image Search's new design, which makes it very easy
(one button click) for visitors to view an image outside of the site's
context. This has significantly cut many sites' traffic and income.

This is how Pixabay got nginx to hijack that button so the image can
still be seen in the site's context:

"Hotlinking protection and watermarking for Google Images":
(http://pixabay.com/en/blog/posts/hotlinking-protection-and-watermarking-for-google-32/)

Part of the method uses what they call "trap URLs", i.e. appending "?i"
to the img URLs in the source when the page is served to a human. Bots
like Googlebot never see the "?i", so nginx handles their requests
differently:

if ($args = "i") {
    set $watermark 0;
}
if ($watermark = 1) {
    add_header Cache-Control "no-cache, must-revalidate";
    # rewrite "IMAGE_URL_REGEX" WATERMARK_URL last;  <- optional serving
    # of a watermarked version of the image
}
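
(For context, I believe those lines sit inside a location block matching
the image URLs, with $watermark defaulting to 1. A minimal sketch of my
understanding — the extension regex and the watermark rewrite path are
my own placeholders, not Pixabay's actual config:

location ~* \.(jpg|jpeg|png|gif)$ {
    # Assume every request is a bot/hotlinker until proven otherwise.
    set $watermark 1;

    # Humans browsing the site carry the "?i" trap argument.
    if ($args = "i") {
        set $watermark 0;
    }

    if ($watermark = 1) {
        add_header Cache-Control "no-cache, must-revalidate";
        # Optionally serve a watermarked copy instead, e.g.:
        # rewrite ^/images/(.+)$ /watermarked/$1 last;
    }
}
)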


Pixabay creates the "?i" addendum to the img tags in their templates. I
was looking for a method that was a little more caching/performance
friendly, so I suggested using jQuery to append the "?i" at the
document-ready stage: bots would never get the "?i", and the page could
still be cached, because the "?i" is added on the client side, e.g.:

var imgs = $("img.myimg");
// The function form gives each matched image its own src + "?i"
// (the plain getter/setter would copy the first image's src to all).
imgs.attr("src", function (i, src) { return src + "?i"; });

The only problem is that the browser now hits the server twice for each
image: once for /the/file/location/img.jpg as the page loads, and again
for /the/file/location/img.jpg?i after jQuery changes the src, since the
query string makes it a different URL.

It was suggested that I could add Varnish to my stack and strip the "?i"
from the URLs for bots there, but I didn't want to add something else to
the stack.

Can nginx and, say, the fastcgi cache (which I use) handle this
situation natively? Let's say all pages already have the "?i" at the end
of their image URLs, so it's there for the majority (human traffic). Is
there an efficient way for nginx, upon detecting a bot user agent, to
strip the "?i" (perhaps with http://wiki.nginx.org/HttpSubsModule), then
gzip and cache that version, while serving/caching the original version
to browsers?
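
Something like the following is what I'm imagining, though I've never
used the sub module, and the bot regex, cache zone name, socket path and
$trap variable below are just my own guesses:

# In the http{} block: bots get an empty $trap, everyone else keeps "?i".
map $http_user_agent $trap {
    default                         "?i";
    "~*(googlebot|bingbot|yandex)"  "";
}

location ~ \.php$ {
    include       fastcgi_params;
    fastcgi_pass  unix:/var/run/php-fpm.sock;

    # One cached copy of the page for everyone; the trap URLs are
    # rewritten on the way out, after the cache.
    fastcgi_cache      MYCACHE;
    fastcgi_cache_key  "$scheme$request_method$host$request_uri";

    # For bots, ".jpg?i" becomes ".jpg"; for humans it's replaced with
    # itself, i.e. left alone. Needs --with-http_sub_module, and only
    # one sub_filter per location, so other extensions get awkward.
    sub_filter      '.jpg?i' '.jpg$trap';
    sub_filter_once off;
}

If that works, there'd be no need for separate bot/human cache entries,
since the rewrite happens after the cache, and gzip should still apply —
as long as PHP isn't compressing its own output, since sub_filter can't
rewrite an already-gzipped body.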

It's a long post (!), but I have an inkling that my fave server can
handle this; I just don't have the experience to configure it.

Thanks for any insight.
