Francis Daly
August 02, 2019 11:06AM
On Fri, Aug 02, 2019 at 03:11:05PM +0200, Vincent M. wrote:

Hi there,

> So I tried in http with empty charset_map:
>         charset_map iso-8859-1 utf-8 { }
> But special characters like é are displayed with ?

It seems to work for me as-is. What is different for you?

"work for me" means "the utf-8 character é becomes the 6 characters
&#233;, which the html-viewer is expected to display as LATIN SMALL
LETTER E WITH ACUTE".
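
To make that replacement concrete, here is a small sketch. It uses Python's own "xmlcharrefreplace" error handler, not nginx's code, and the function name is made up for the illustration. It shows how a character with no entry in the target map ends up as a decimal numeric character reference:

```python
def to_charrefs(text: str) -> bytes:
    """Encode to ASCII, turning every non-ASCII character into &#NNN;.

    This only imitates the visible effect of an empty charset_map;
    it is Python's behaviour, not nginx's implementation.
    """
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    # U+00E9 is decimal 233 and U+00C9 is decimal 201.
    print(to_charrefs("little e: \u00e9; big E: \u00c9"))
```

The reference for e-acute is six ASCII characters long, which is where the "6 characters" above comes from.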

nginx.conf:
===
http {
    charset_map iso-8859-1 utf-8 { }
    server {
        listen 9876;
        charset utf-8;
    }
    server {
        listen 9877;
        charset iso-8859-1;
        override_charset on;
        location /x/ {
            proxy_pass http://127.0.0.1:9876/;
        }
    }
}
===

$ cat html/a/index.html
little e: é; big E: É
$ od -bc html/a/index.html
0000000 154 151 164 164 154 145 040 145 072 040 303 251 073 040 142 151
          l   i   t   t   l   e       e   :     303 251   ;       b   i
0000020 147 040 105 072 040 303 211 040 012
          g       E   :     303 211      \n
0000031

$ curl -i http://127.0.0.10:9876/a/ # headers edited
HTTP/1.1 200 OK
Server: nginx/1.17.2
Content-Type: text/html; charset=utf-8

little e: é; big E: É

$ curl -i http://127.0.0.10:9877/x/a/ # headers edited
HTTP/1.1 200 OK
Server: nginx/1.17.2
Content-Type: text/html; charset=iso-8859-1

little e: &#233;; big E: &#201;


And when I change nginx.conf to include a partial "correct" charset map:

===
charset_map iso-8859-1 utf-8 {
    E9 C3A9;
}
===

$ curl -i http://127.0.0.10:9877/x/a/
HTTP/1.1 200 OK
Server: nginx/1.17.2
Content-Type: text/html; charset=iso-8859-1

little e: �; big E: &#201;

$ curl -i http://127.0.0.10:9877/x/a/ | tail -n 1 | od -bc
0000000 154 151 164 164 154 145 040 145 072 040 351 073 040 142 151 147
          l   i   t   t   l   e       e   :     351   ;       b   i   g
0000020 040 105 072 040 046 043 062 060 061 073 040 012
              E   :       &   #   2   0   1   ;      \n
0000034

The utf-8 e-acute was changed to the correct iso-8859-1 octet (octal
351/hex e9/decimal 233), which my terminal renders as "unknown" because
it is invalid utf-8.
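
If you want to check those numbers yourself, a minimal sketch (plain Python, nothing nginx-specific; the helper name is just for this example) shows the two encodings of e-acute and why a lone E9 octet upsets a utf-8 terminal:

```python
E_ACUTE = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

utf8_octets = E_ACUTE.encode("utf-8")         # two octets: C3 A9
latin1_octets = E_ACUTE.encode("iso-8859-1")  # one octet: E9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the octets decode cleanly as utf-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(utf8_octets.hex(), [oct(b) for b in utf8_octets])
    print(latin1_octets.hex(), oct(latin1_octets[0]), latin1_octets[0])
    print(is_valid_utf8(utf8_octets), is_valid_utf8(latin1_octets))
```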

> Where to find a charset_map?

It should not be necessary, according to the nginx docs: when converting
from utf-8, a character missing from the map is replaced with an html
numeric reference (&#XXXX;). But if you want one, you can find-or-create one.

Basically, every octet from A0 to BF maps to the utf-8 pair C2A0 to C2BF,
and every octet from C0 to FF maps to the utf-8 pair C380 to C3BF.

The format matches the three example charset-map files that nginx
provides.
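
If you would rather create one than find one, something like the following sketch could generate it. The function name and the default A0 to FF range are my own choices for the example. It relies on Python's iso-8859-1 and utf-8 codecs rather than a hand-written table, and you should compare the output against the nginx example files before using it:

```python
def latin1_to_utf8_map(first: int = 0xA0, last: int = 0xFF) -> str:
    """Build charset_map text with one 'XX YYYY;' entry per octet."""
    lines = ["charset_map iso-8859-1 utf-8 {"]
    for octet in range(first, last + 1):
        # Decode the single iso-8859-1 octet, then re-encode it as utf-8.
        utf8 = bytes([octet]).decode("iso-8859-1").encode("utf-8")
        lines.append(f"    {octet:02X} {utf8.hex().upper()};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(latin1_to_utf8_map())
```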

Oh - as one other wrinkle -- it is possible that the visual character
e-acute is *not* sent as the octets C3A9; but is instead sent as the
octets 65CC81 (e, followed by a combining acute accent) -- and off-hand,
I don't know how nginx will convert that. Possibly e&#769;, which might not
render very nicely in your html viewer.
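
To see the two forms side by side, here is a short sketch using Python's standard unicodedata module. Whether you can normalize the content before it reaches nginx is an assumption about your setup, not something I have checked:

```python
import unicodedata

# NFC composes 'e' + U+0301 into the single code point U+00E9.
composed = unicodedata.normalize("NFC", "e\u0301")
# NFD splits U+00E9 into 'e' followed by the combining acute accent.
decomposed = unicodedata.normalize("NFD", "\u00e9")

if __name__ == "__main__":
    print(composed.encode("utf-8").hex())    # octets of the composed form
    print(decomposed.encode("utf-8").hex())  # octets of the decomposed form
    # What a numeric-reference fallback would leave for the decomposed form:
    print(decomposed.encode("ascii", errors="xmlcharrefreplace"))
```

Only the composed form has a single-octet iso-8859-1 equivalent; the combining accent on its own does not exist in that charset.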

But before you worry about that extra wrinkle, see what octets are sent,
and see where the problem comes in that makes something show as the ?

Cheers,

f
--
Francis Daly francis@daoine.org