Maxim Dounin
November 08, 2022 04:52AM
Hello!

On Sun, Nov 06, 2022 at 11:50:38PM +0100, Alejandro Colomar wrote:

> On 11/5/22 03:39, Maxim Dounin wrote:
> > The ngx_memcpy() macro was introduced in 107:b5be4b0448d3
> > (nginx-0.0.1-2003-07-01-19:00:03 import) as an alias to memcmp(),
> > with the following explanation why it's just an alias:
> >
> > +/* msvc and icc compile memcmp() to inline loop */
> > +#define ngx_memcmp memcmp
> >
> > This was along with the first use of memcmp in nginx, so clearly
> > the "ngx_" prefix is not an accident.
>
> Hmm, interesting point.
>
> >
> > Indeed, the prefix is used in multiple places to provide optimized
> > or instrumented functions, even if normally they map to generally
> > available functions such as memcmp(). For example, ngx_memcpy()
> > usually maps to memcpy(), but instrumented to detect large copies
> > if NGX_MEMCPY_LIMIT is defined.
>
> Yeah, I can understand a wrapper over memcpy() and some others for those reasons.
>
> These days of heavily optimized assembly specializations in libc, it's not so
> easy to outperform the libc implementation, but it might still be interesting to
> check when there are large bottlenecks.
>
> memcmp(3) is likely to be less of a candidate for instrumenting and even less
> for optimization, but might be useful to have it around if its already there. I
> would not write a wrapper if it didn't exist, but probably I wouldn't remove it
> now that it exists. That makes some sense to me.
>
>
> > Even if the particular function
> > is not instrumented, the prefix is important to make it possible
> > to change things without huge modifications of the code, like the
> > one in your patch. Further, the prefix is also important for code
> > consistency.
>
> You may be interested in replacing these few that I just found, for added
> consistency:
>
> src/core/ngx_proxy_protocol.c:397:
> memcpy(&src_sockaddr.sockaddr_in.sin_addr, in->src_addr, 4);
> src/core/ngx_proxy_protocol.c:401:
> memcpy(&dst_sockaddr.sockaddr_in.sin_addr, in->dst_addr, 4);
> src/core/ngx_proxy_protocol.c:424:
> memcpy(&src_sockaddr.sockaddr_in6.sin6_addr, in6->src_addr, 16);
> src/core/ngx_proxy_protocol.c:428:
> memcpy(&dst_sockaddr.sockaddr_in6.sin6_addr, in6->dst_addr, 16);

Sure, seems to be slipped in in the 7251:416953ef0428 commit
introducing PROXY protocol v2 support. I've submitted a patch to
fix this.

> > And, last but not least, it makes it possible to add
> > casts if needed on some platforms
> > (or remove them if they are
> > considered wrong).
> >
> > In particular, casts to (const char *) for ngx_memcmp() were added
> > in 2007 by this commit (nginx 0.6.18):
> >
> > changeset: 1648:89a47f19b9ec
> > user: Igor Sysoev <igor@sysoev.ru>
> > date: Fri Nov 23 17:00:11 2007 +0000
> > summary: update ngx_memcmp()
> >
> > Unfortunately, this commit contains no details about the
> > particular system which needed the cast, though it looks like
> > there were some at least at that time.
>
> That date can make sense. 'void *' implementations were added after ANSI/ISO
> C89. Systems designed before that year, would mostly have 'char *'. Since some
> systems live up to 20+ years, by 2007 there would still be some pre-C89 systems
> around, which nginx had to support with the casts.
>
> Now in 2022 it's been 33 years after ISO C89, so all systems that lack 'void *'
> are officially dead. Nginx might still support some dead systems, though, so it
> merits a bit more investigation to confirm.
>
> Since memcmp(3) comes implemented together with the other mem...(3) functions,
> checking those should help:
>
> $ grepc ngx_memset
> ./src/core/ngx_string.h:89:
> #define ngx_memset(buf, c, n) (void) memset(buf, c, n)
> $ grepc ngx_cpymem
> ./src/core/ngx_string.h:97:
> #define ngx_cpymem(dst, src, n) (((u_char *) ngx_memcpy(dst, src, n)) + (n))
>
>
> ./src/core/ngx_string.h:107:
> #define ngx_cpymem(dst, src, n) (((u_char *) memcpy(dst, src, n)) + (n))
> $ grepc ngx_memmove
> ./src/core/ngx_string.h:143:
> #define ngx_memmove(dst, src, n) (void) memmove(dst, src, n)
>
>
> These have no casts, so we can bet that if those work without a cast, it's
> because they are implemented in the system with 'void *'. However, let's
> confirm that; after all, it could be that we only call them with types
> compatible with 'char *'.
>
> `grep -rn ngx_memcpy.*data.*len` should find cases where ngx_str_t is being used
> with ngx_str_t. It finds, among others, the following line:
>
> src/stream/ngx_stream_upstream_zone_module.c:300:
> ngx_memcpy(dst->server.data, src->server.data, src->server.len);
>
> ngx_str_t uses the incompatible 'u_char *', so a 'char *' memcpy(3) wouldn't
> work without a cast, so memcpy(3) definitely must be implemented with 'void *',
> which implies that all mem...(3) functions, including memcmp(3), are, with a
> very strong certainty, implemented following ISO C prototypes (i.e., they use
> 'void *').
>
> Just to confirm, both 'dst' and 'src' are of type
> 'ngx_stream_upstream_rr_peer_t', which is implemented as:
>
> $ grepc ngx_stream_upstream_rr_peer_t
> ./src/stream/ngx_stream_upstream_round_robin.h:17:
> typedef struct ngx_stream_upstream_rr_peer_s ngx_stream_upstream_rr_peer_t;
>
> $ grepc ngx_stream_upstream_rr_peer_s
> ./src/stream/ngx_stream_upstream_round_robin.h:19:
> struct ngx_stream_upstream_rr_peer_s {
> [...]
> ngx_str_t server;
>
> [...]
> };
>
> So that 'server' is definitely a ngx_str_t, which confirms that there's u_char*
> involved.

The "stream" module isn't a good example, given that it's not
built by default. But I generally agree that given no casts in
ngx_memcpy() it is highly unlikely that casts in ngx_memcmp() will
do anything meaningful, at least with current nginx code. Even if
these casts were needed on some platforms in 2007.

Another possibility is that these casts were simply added based on
the ngx_str*() macro definitions and weren't actually needed.
Especially given that ngx_memcmp() was only used by ngx_memn2cmp()
at that time, which is actually a string function.

> > Similar casts are used by almost all nginx string macro
> > definitions, since most C library functions accept "char *", while
> > nginx uses "u_char *" for strings, and this results in useless
> > warnings.
>
> That's true of str...(3) functions from <string.h>, but the mem...(3) functions
> from <string.h> use 'void *', so conversions are automatic, and casts are
> unnecessary (after ANSI/ISO C89; previously, K&R C did use 'char *').

Sure. The point is: similar casts are used by ngx_str*() macro
definitions, so it might be copied from there even if not really
needed. And another point is: avoiding casts in ngx_memcmp()
won't change anything, since there are lots of places with similar
casts.

> > While it might be seen as something better to avoid,
>
> Even if it's a bit off-topic, I'm very curious about the reason for using
> u_char. It definitely requires a lot of extra work compared to 'char *': casts,
> type-safety, reviewing that code just works when workarounding/disabling the
> compiler warnings. I'm guessing it was also some workaround for broken old
> implementations and it has just continued like that for consistency, but am
> curious if there are other better reasons. Certainly, ASCII characters behave
> well (at least nowadays) independently of the signedness of char, and usually
> one doesn't do arithmetic with characters in strings.

Using signed chars for strings simply does not work as long as you
consider 8-bit strings. It results in wrong sorting unless you do
care to compare characters as unsigned, requires careful handling
of all range comparisons such as "ch <= 0x20", does not permit
things like "ch < 0x80" or "c >= 0xc0", makes impossible to use
table lookups such as "basis64[s[0]]" (all snippets are from nginx
code).

The fact that signedness of "char" is not known adds even more
fun: you can't really do anything without casting it to either
unsigned char or signed char.

In general, using "char" for strings is a well known source of
troubles at least in the Cyrillic world. Writing the code which
works with arbitrary chars is tricky and error-prone as long as
you are doing anything more complex than just calling libc
functions. On the other hand, casts for external functions can be
easily abstracted in most cases, and always trivial.

> > the only alternative would be to provide wrapper functions, which
> > might not be a good idea for performance and readability reasons.
>
> Yeah, GNU always_inline or C99 inline functions could be better than macros for
> performance, but those are not always available for systems supported by nginx,
> so macros are the way to go here.
>
> > On the other hand, macro definitions with casts are easily
> > readable, have no impact on performance, and provide limited to
> > no effect on the code quality as long as proper coding, testing
> > and review process is implemented.
>
> Yes, they are the least evil, or even the only way, so that makes sense.
>
> >
> > Summing the above, certainly removing the "ngx_" prefix is a bad
> > idea.
>
> It can make sense, especially considering that nginx has always had it.
>
> > Removing casts from the particular macro might be
> > considered, though in general it looks like a quest to prove these
> > are not needed (or no longer needed) given the wide range of
> > platforms nginx supports.
>
> I don't know how far that support goes. Maybe Windows XP? Is that range even
> documented?

Not really. We certainly do support Windows XP, but nginx also
happened to work on HP-UX, AIX, and even on QNX at some point
(though there were warnings due to unsigned time_t, and likely
associated bugs). Some platforms are documented here:

http://nginx.org/en/#tested_os_and_platforms

And we certainly at least try to support anything reasonable /
usable.

On the other hand, it might be good enough to require -Wno-error
or an equivalent on ancient platforms.

> However, as far as the quest, I think the rationale above might be enough to
> prove it even without investigating specific systems. It could certainly be
> possible that some system had a very brain-damaged <string.h> where memcmp(3)
> used a 'char *' parameter and all other mem...(3) functions used 'void *', but I
> don't think that's likely at all.

I tend to agree that we can drop casts from the ngx_memcmp()
macro.

Note though that it's mostly nop change for the reasons given
above.

--
Maxim Dounin
http://mdounin.ru/
_______________________________________________
nginx-devel mailing list -- nginx-devel@nginx.org
To unsubscribe send an email to nginx-devel-leave@nginx.org
Subject Author Views Posted

[PATCH] Removed the unsafe ngx_memcmp() wrapper for memcmp(3)

Alejandro Colomar 517 November 04, 2022 11:26AM

Re: [PATCH] Removed the unsafe ngx_memcmp() wrapper for memcmp(3)

Alejandro Colomar 118 November 04, 2022 11:30AM

Re: [PATCH] Removed the unsafe ngx_memcmp() wrapper for memcmp(3)

Maxim Dounin 120 November 04, 2022 10:40PM

Re: [PATCH] Removed the unsafe ngx_memcmp() wrapper for memcmp(3)

Alejandro Colomar 107 November 06, 2022 05:52PM

Re: [PATCH] Removed the unsafe ngx_memcmp() wrapper for memcmp(3)

Maxim Dounin 104 November 08, 2022 04:52AM

Re: [PATCH] Removed the unsafe ngx_memcmp() wrapper for memcmp(3)

Alejandro Colomar 149 November 08, 2022 05:50AM

[PATCH v2] Removed the casts within ngx_memcmp()

Alejandro Colomar 98 November 08, 2022 05:58AM

Re: [PATCH v2] Removed the casts within ngx_memcmp()

Maxim Dounin 97 November 09, 2022 10:04AM

Re: [PATCH v2] Removed the casts within ngx_memcmp()

Maxim Dounin 116 November 29, 2022 11:24PM

Re: [PATCH v2] Removed the casts within ngx_memcmp()

Alex Colomar 85 November 30, 2022 05:50AM

Re: [PATCH v2] Removed the casts within ngx_memcmp()

Maxim Dounin 132 November 30, 2022 08:22PM

Re: [PATCH v2] Removed the casts within ngx_memcmp()

Sergey Kandaurov 113 November 30, 2022 07:18AM

Re: [PATCH v2] Removed the casts within ngx_memcmp()

Maxim Dounin 90 November 30, 2022 08:08PM

u_char vs char (was: [PATCH] Removed the unsafe ngx_memcmp() wrapper for memcmp(3))

Alejandro Colomar 101 November 08, 2022 06:18AM



Sorry, you do not have permission to post/reply in this forum.

Online Users

Guests: 161
Record Number of Users: 8 on April 13, 2023
Record Number of Guests: 421 on December 02, 2018
Powered by nginx      Powered by FreeBSD      PHP Powered      Powered by MariaDB      ipv6 ready