
Danger to Nginx from raw unicode in paths?

David
January 25, 2015 08:08PM
I was recently wondering whether I should filter URLs by character, allowing
only the characters that applications normally use.

Letters, numbers, and a couple of other characters [.-_/\]. We know that the
set of characters allowed in URLs and domain names is really just a subset of
ASCII (see http://perishablepress.com/stop-using-unsafe-characters-in-urls/).
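For comparison, here is a minimal Python sketch of the strict ASCII-only rule. It assumes the standard meant here is RFC 3986's path grammar (unreserved characters, sub-delimiters, `:`, `@`, `/`, and `%XX` escapes). The function name `is_rfc3986_path` is hypothetical. Note that under this grammar a raw backslash is not allowed, while `~` and several punctuation characters are.

```python
import re
import string

# RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = string.ascii_letters + string.digits + "-._~"
# Other characters RFC 3986 allows in a path segment: sub-delims, ":" and "@"
PCHAR_EXTRA = "!$&'()*+,;=:@"
# "/" separates segments
ALLOWED = set(UNRESERVED + PCHAR_EXTRA + "/")

# A percent-encoded octet: "%" followed by exactly two hex digits
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def is_rfc3986_path(path: str) -> bool:
    """True if every character of `path` is legal in an RFC 3986 path."""
    # Drop valid %XX escapes; a stray "%" left behind is not in ALLOWED
    stripped = PCT_ENCODED.sub("", path)
    return all(ch in ALLOWED for ch in stripped)


print(is_rfc3986_path("/docs/a-b_c.html"))  # True
print(is_rfc3986_path("/%E4%B8%8E"))        # True: the same character, percent-encoded
print(is_rfc3986_path("/与"))               # False: raw non-ASCII
```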

However, I'm not sure what nginx does when I pass it a raw "µ".
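A short Python sketch of what a client actually sends for that character, assuming the "µ" here is U+00B5 MICRO SIGN (it could also be U+03BC GREEK SMALL LETTER MU, which encodes to different bytes). The character becomes two UTF-8 bytes, which go out either raw or as `%XX` escapes.

```python
from urllib.parse import quote, unquote

ch = "\u00b5"  # assumed to be U+00B5 MICRO SIGN

# The bytes on the wire if the client sends the path unescaped
raw = ch.encode("utf-8")
print(raw)  # b'\xc2\xb5'

# The same bytes as RFC 3986 percent-escapes ("/" is left alone by default)
escaped = quote("/" + ch)
print(escaped)  # /%C2%B5

# Decoding the escapes (as UTF-8, the default) gives back the character
print(unquote(escaped) == "/" + ch)  # True
```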

I came up with a simple regular expression to match something that isn't
one of those:

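# as a location of its own (return 403 is just one possible action):
location ~* "(*UTF8)([^\p{L}\p{N}/\.\-\%\\\]+)" { return 403; }
# or as a test on $uri inside an existing location:
if ($uri ~* "(*UTF8)([^\p{L}\p{N}/\.\-\%\\\]+)" ) { return 403; }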
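The same test can be sketched in Python for experimenting outside nginx. Python's built-in `re` module does not support `\p{L}`/`\p{N}`, so this sketch substitutes `unicodedata.category()` (letters are categories `L*`, numbers `N*`); `first_disallowed` is a hypothetical helper name, and `EXTRA_ALLOWED` is assumed to mirror the regex's extra characters `/ . - %` and backslash. One consequence worth noticing: both "µ" and "与" count as letters, so neither is rejected; only punctuation, symbols, whitespace, and control characters outside the extra set are.

```python
import unicodedata

# Non-letter, non-number characters the nginx character class lets through:
# / . - % and backslash
EXTRA_ALLOWED = set("/.-%\\")


def first_disallowed(path: str):
    """Return the first character the [^...] class would match, or None."""
    for ch in path:
        # Unicode general categories: "L*" = letters (\p{L}), "N*" = numbers (\p{N})
        if unicodedata.category(ch)[0] in ("L", "N"):
            continue
        if ch in EXTRA_ALLOWED:
            continue
        return ch
    return None


print(first_disallowed("/images/photo-1.jpg"))  # None
print(first_disallowed("/\u00b5/\u4e0e"))       # None: both are letters
print(first_disallowed("/a<script>"))           # <
print(first_disallowed("/a_b"))                 # _
```

Like the regex, this flags `_` even though it appears in the list at the top of the post; adding it to `EXTRA_ALLOWED` (and to the regex) would change that.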

However, I'm wondering whether I actually need UTF-8 matching at all, since
clients should default to URL-encoding (%20) or hex-encoding (\x23) such
bytes, and the actual transfer is binary anyway.
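On the encoding question: according to the nginx documentation, `location` matching is performed against a normalized URI, after `%XX` escapes are decoded (and `.`/`..` segments and repeated slashes are handled), and `$uri` holds that normalized form. So a client percent-encoding a character does not hide it from the regex. The Python sketch below models only the `%XX`-decoding step; `normalize_path` is a hypothetical name, and decoding to a `str` as UTF-8 is an assumption made for convenience (nginx itself works on bytes).

```python
from urllib.parse import unquote_to_bytes


def normalize_path(raw_path: str) -> str:
    """Decode %XX escapes to bytes, then read the bytes as UTF-8 (strict).

    Hypothetical model of only the %XX step of nginx's URI normalization."""
    return unquote_to_bytes(raw_path).decode("utf-8")


escaped = "/%E4%B8%8E"  # what a client that escapes the path sends
raw = "/\u4e0e"         # the raw character, 与

print(normalize_path(escaped) == normalize_path(raw))  # True
```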

Here is an example test in which I sent almost all of the ~65,000 Unicode
code points to nginx via curl:

https://gist.github.com/Xeoncross/acca3f09c5aeddac8c9f
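A rough Python sketch of how such a list of test URLs can be generated. This is not the script in the gist; `bmp_test_paths` and the `http://localhost/` base are assumptions. It covers the Basic Multilingual Plane from U+0020 up, skips the surrogate range U+D800 to U+DFFF (not valid scalar values, so they cannot be encoded as UTF-8), and percent-encodes each character where needed.

```python
from urllib.parse import quote

# UTF-16 surrogates are not valid scalar values and cannot be encoded as UTF-8
SURROGATES = range(0xD800, 0xE000)


def bmp_test_paths(base: str = "http://localhost/"):
    """Yield (code point, URL) for each BMP code point from U+0020 up,
    skipping surrogates and percent-encoding the character where needed."""
    for cp in range(0x20, 0x10000):
        if cp in SURROGATES:
            continue
        # safe="" so that "/", "?" and "#" are escaped too
        yield cp, base + quote(chr(cp), safe="")


paths = list(bmp_test_paths())
print(len(paths))  # 63456
print(paths[0])    # (32, 'http://localhost/%20')
```

Each URL could then be fetched with curl or `urllib.request` while watching the nginx error log.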

For example: $ curl -v http://localhost/与

Basically, is there any point in watching URLs for non-standard sequences to
catch possible attacks?
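One way to approach this is to look for specific byte-level tricks rather than for non-ASCII characters as such. The Python sketch below is a hypothetical heuristic, not an nginx feature and not a complete filter: it flags double-encoded escapes (`%25XX`), NUL bytes, `%XX` sequences that do not decode to valid UTF-8 (Python's strict decoder rejects overlong forms such as `%C0%AE`, a two-byte encoding of `.`), and `..` path segments. The name `suspicious_reasons` is an assumption.

```python
import re
from urllib.parse import unquote_to_bytes

# "%25" followed by two hex digits decodes to yet another %XX escape
DOUBLE_ENCODED = re.compile(r"%25[0-9A-Fa-f]{2}")


def suspicious_reasons(raw_path: str) -> list:
    """Return a list of reasons the raw request path looks suspicious.

    Hypothetical heuristics for illustration; not a complete attack filter."""
    reasons = []
    if DOUBLE_ENCODED.search(raw_path):
        reasons.append("double-encoded escape")
    data = unquote_to_bytes(raw_path)
    if b"\x00" in data:
        reasons.append("NUL byte")
    try:
        # Strict UTF-8 decoding rejects invalid bytes and overlong forms
        # such as C0 AE (a two-byte encoding of ".")
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        reasons.append("invalid or overlong UTF-8")
        return reasons
    if ".." in text.split("/"):
        reasons.append("path traversal segment")
    return reasons


print(suspicious_reasons("/%E4%B8%8E"))                # []
print(suspicious_reasons("/%C0%AE%C0%AE/etc/passwd"))  # ['invalid or overlong UTF-8']
print(suspicious_reasons("/a/../etc/passwd"))          # ['path traversal segment']
```

It inspects the raw path, before any server-side normalization, because the point is to catch encodings that a later decoding step might turn into something else.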

(FYI: I posted more details that led to this question here:
http://stackoverflow.com/questions/28055909/does-nginx-support-raw-unicode-in-paths)
_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx