I ran into some problems with URI encoding. The problem is that there are
multiple, different implementations of escaping and unescaping URIs,
and that the libraries of different programming languages use
different ways of encoding.
unescape_uri:
https://github.com/phusion/nginx/blob/master/src/core/ngx_string.c#L1336
The set-misc and lua module versions linked below seem to be the same
as each other. They differ from the core one by unescaping '+' to ' '.
I guess nginx conforms to RFC 3986, while the external modules try to
be compatible with other programs such as PHP, .NET and Java.
https://github.com/agentzh/set-misc-nginx-module/blob/master/src/ngx_http_set_unescape_uri.c#L46
https://github.com/chaoslawful/lua-nginx-module/blob/master/src/ngx_http_lua_util.c#L1328
PHP's urlencode encodes ' ' as '+':
http://php.net/manual/en/function.urlencode.php
.NET Framework 4's HttpUtility.UrlEncode encodes ' ' as '+':
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx
Java's URLEncoder (at least in Java 5-7) encodes ' ' as '+':
http://download.oracle.com/javase/7/docs/api/java/net/URLEncoder.html
There is a way to consolidate unescape_uri: add a new type to the core
version, then add version checks to the modules so that they use the
core version with the proper type, and extend the modules to handle the
different types. A patch for nginx is attached:
0001-application-x-www-form-urlencoded-compatible-mode.patch
escape_uri:
I guess there was a need for different implementations, but it might
be possible to consolidate the external modules after this changeset:
http://trac.nginx.org/nginx/changeset/4193/nginx
https://github.com/phusion/nginx/blob/master/src/core/ngx_string.c#L1505
The set-misc and lua module versions linked below seem to be the same
as each other. They differ somewhat from the core: the core version of
uri_component is almost the same as the modules' uri (the differences
are in !$*(),@`), and args also differ slightly (;&).
https://github.com/agentzh/set-misc-nginx-module/blob/master/src/ngx_http_set_escape_uri.c#L57
https://github.com/chaoslawful/lua-nginx-module/blob/master/src/ngx_http_lua_util.c#L1179
Would it be possible for the set-misc and lua modules to use the nginx
core versions of uri_component and args?
The memc module's version, linked below, is almost the same as the
nginx core version of uri_component. There are a couple of differences
( *~), and the hex is uppercase. The commit message hints that the new
encoding was needed for Java.
https://github.com/yaoweibin/memc-nginx-module/blob/master/src/ngx_http_memc_request.c#L8
I guess this is for a special need and does not need to be considered further.
--
Markus Linnala, Chief Systems Architect
Cybercom Finland
Pakkahuoneenaukio 2 A; 33100 Tampere
Mobile +358 40 5919 735
Markus.Linnala@cybercom.com
www.cybercom.fi | www.cybercom.com
From ca8aab7ac68c0d58ab7e7ac736cf6d7d21e80b67 Mon Sep 17 00:00:00 2001
From: Markus Linnala <Markus.Linnala@cybercom.com>
Date: Mon, 7 Nov 2011 20:49:15 +0200
Subject: [PATCH] application/x-www-form-urlencoded compatible mode
Unescape '+' to ' '. Needed if the encoding is application/x-www-form-urlencoded compatible.
---
src/core/ngx_string.c | 7 +++++++
src/core/ngx_string.h | 1 +
2 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/src/core/ngx_string.c b/src/core/ngx_string.c
index 29f8e0d..39add87 100644
--- a/src/core/ngx_string.c
+++ b/src/core/ngx_string.c
@@ -1536,6 +1536,13 @@ ngx_unescape_uri(u_char **dst, u_char **src, size_t size, ngx_uint_t type)
                 break;
             }
 
+            if (ch == '+'
+                && (type & (NGX_UNESCAPE_FORM_URL)))
+            {
+                *d++ = ' ';
+                break;
+            }
+
             *d++ = ch;
             break;
 
diff --git a/src/core/ngx_string.h b/src/core/ngx_string.h
index 2b9c59a..6158c40 100644
--- a/src/core/ngx_string.h
+++ b/src/core/ngx_string.h
@@ -199,6 +199,7 @@ u_char *ngx_utf8_cpystrn(u_char *dst, u_char *src, size_t n, size_t len);
 
 #define NGX_UNESCAPE_URI       1
 #define NGX_UNESCAPE_REDIRECT  2
+#define NGX_UNESCAPE_FORM_URL  4
 
 uintptr_t ngx_escape_uri(u_char *dst, u_char *src, size_t size,
     ngx_uint_t type);
--
1.7.6.4
_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel