RE: nginx doesn't handle different URL encodings well

Pierre-Marie Baty
October 21, 2010 10:32AM
> Date: Thu, 21 Oct 2010 11:29:24 +0400
> From: mdounin@mdounin.ru
> To: nginx@nginx.org
> Subject: Re: nginx doesn't handle different URL encodings well
>
> Hello!
>
> On Thu, Oct 21, 2010 at 03:23:46AM +0200, Pierre-Marie Baty wrote:
>
> >
> > Hello Igor, hello all,
> >
> > Congratulations on your fantastic and neatly programmed web
> > server. It's a pleasure to use it.
> >
> > I have a problem with nginx not serving files with accented
> > characters when the submitted URL is UTF-8 encoded.
> >
> > Here is my nginx.conf : http://nginx.pastebin.com/aB7XRLM3 It's
> > a home webserver that is primarily used to serve stuff like
> > holiday photos.
> >
> > For example, I have a file called "été-2008.jpg" on my
> > webserver. When I request http://myserver/été-2008.jpg,
> > depending on whether the "Always send URLs as UTF-8" checkbox is
> > checked or not in the Internet Explorer advanced options, the
> > file is correctly served, or not.
> >
> > When the URL is Latin-1 encoded, the request sent is: GET
> > /%e9t%e9-2008.jpg ----> nginx resolves this to "été-2008.jpg",
> > and the file is served. OK.
> > When the URL is UTF-8 encoded, the request sent is: GET
> > /%C3%A9t%C3%A9-2008.jpg ----> nginx resolves this to
> > "Ã©tÃ©-2008.jpg", and the file is not served (file not found).
> >
> > Shouldn't a fallback mechanism be implemented so that when a
> > file isn't found after a URL has been decoded, a second try is
> > made with another encoding? I believe two RFCs are involved:
> > RFC 2396 and RFC 3986 (info given by PiotrSikora on IRC). IMO,
> > nginx shouldn't assume that the URLs it receives always follow
> > the same RFC. From what I know, this ambiguity is resolved in
> > Apache. Maybe they have that sort of fallback mechanism.
>
> The only difference between RFC2396 and RFC3986 that is relevant
> to the question is that the latter recommends using UTF-8 for new
> URI schemes. There is no ambiguity between the two: the character
> set for non-US-ASCII characters in http URLs isn't defined (though
> most browsers nowadays use UTF-8 by default).
>
> The only solution is to provide correct URLs, i.e. already
> encoded ones.
>
> If you think that a "fallback mechanism" is a good idea, you may
> implement one with the "try_files" directive and the embedded perl
> module to do the recoding between Latin1 and UTF-8. Note, though,
> that this may lead to unexpected results: "/%C3%A9" may be Latin1
> "/Ã©" as well as UTF-8 "/é".
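A minimal Python sketch (standard library only) of the byte-level view behind this exchange. It assumes, as the thread implies, that the file name on disk is stored as Latin-1 bytes, and that nginx compares the percent-decoded bytes with that name without any charset conversion. The encoding step imitates the two Internet Explorer settings; the decoding step shows why only one of the two requests matches, and why "/%C3%A9" is ambiguous.

```python
from urllib.parse import quote, unquote_to_bytes

# Encoding side: how a client turns the same name into a URL path.
print(quote("/été-2008.jpg"))                      # /%C3%A9t%C3%A9-2008.jpg (UTF-8)
print(quote("/été-2008.jpg", encoding="latin-1"))  # /%E9t%E9-2008.jpg (Latin-1)

# Decoding side: percent-decoding yields raw bytes; no charset is applied.
latin1_request = unquote_to_bytes("/%e9t%e9-2008.jpg")
utf8_request = unquote_to_bytes("/%C3%A9t%C3%A9-2008.jpg")
print(latin1_request)  # b'/\xe9t\xe9-2008.jpg'
print(utf8_request)    # b'/\xc3\xa9t\xc3\xa9-2008.jpg'

# Assumption: the name on disk is stored as Latin-1 bytes.
on_disk = "/été-2008.jpg".encode("latin-1")
print(latin1_request == on_disk)  # True
print(utf8_request == on_disk)    # False

# The ambiguity: the same two bytes are valid in both character sets.
ambiguous = unquote_to_bytes("/%C3%A9")
print(ambiguous.decode("latin-1"))  # /Ã©
print(ambiguous.decode("utf-8"))    # /é
```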

Yes, that makes sense. But shouldn't nginx assume UTF-8 encoding instead of Latin-1, since in the future all URIs will adopt this encoding method? IMO a request like GET /%C3%A9t%C3%A9-2008.jpg should translate to /été-2008.jpg, and not the other way around, as is currently the case.

Currently nginx assumes URLs are encoded in Latin1, whereas it should assume they're UTF-8 first. Don't you think?
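The "UTF-8 first, then the raw bytes" ordering proposed here can be sketched in Python. This is neither how nginx behaves nor an nginx API: "candidate_names" and "resolve" are hypothetical helpers, and the set "files" stands in for a directory whose names are stored as Latin-1 bytes. Inside nginx, something similar would have to be assembled along the lines Maxim describes, with "try_files" plus a recoding step in the embedded perl module.

```python
from __future__ import annotations

from urllib.parse import unquote_to_bytes


def candidate_names(raw: bytes) -> list[bytes]:
    """Hypothetical UTF-8-first lookup order for a Latin-1 file system.

    1. Read the decoded bytes as UTF-8 and re-encode them as Latin-1.
    2. Fall back to the bytes exactly as decoded.
    """
    candidates = []
    try:
        candidates.append(raw.decode("utf-8").encode("latin-1"))
    except (UnicodeDecodeError, UnicodeEncodeError):
        pass  # not valid UTF-8, or not representable in Latin-1
    if raw not in candidates:  # pure ASCII yields the same bytes twice
        candidates.append(raw)
    return candidates


def resolve(uri: str, files: set[bytes]) -> bytes | None:
    """Return the first candidate present in `files`, try_files style."""
    for name in candidate_names(unquote_to_bytes(uri)):
        if name in files:
            return name
    return None


files = {"/été-2008.jpg".encode("latin-1")}  # hypothetical directory listing
print(resolve("/%C3%A9t%C3%A9-2008.jpg", files))  # b'/\xe9t\xe9-2008.jpg'
print(resolve("/%e9t%e9-2008.jpg", files))        # b'/\xe9t\xe9-2008.jpg'
```

Both requests now reach the same file. The catch is the one quoted above: if a directory holds both a file whose Latin-1 name is "é" (byte E9) and one whose name is the bytes C3 A9 ("Ã©" in Latin-1), then "/%C3%A9" resolves to the "é" file under this ordering, and the other one is shadowed for that URL.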

--
Pierre-Marie Baty