Welcome! Log In Create A New Profile

Advanced

Re: How does Nginx look-up cached resource?

Maxim Dounin
September 07, 2015 12:20PM
Hello!

On Mon, Sep 07, 2015 at 03:34:27PM +0200, Sergey Brester wrote:

> On 06.09.2015 02:08, Maxim Dounin wrote:
>
> >Well, not, I don't confuse anything. For sure, brute force attack
> >on a 128 bit hash requires approximately 2^64 attempts.
>
> >That is, a single nginx instance with 2^64 cached resources will
> >likely show up a collision. But that's not a number of resources
> >you'll be able to store on a single node - in particular, because
> >64-bit address space wouldn't be enough to address that many
> >cached items.
>
> >To obtain a collision of a 128-bit hash with at least 1%
> >probability, you'll need more than 10^18 resources cached on a
> >single node, which is not even close to a something possible as
> >well.
>
> >Assuming 1 billion of keys (which is way more than a single nginx
> >node can handle now, and will require about 125G of memory for a
> >cache keys zone), probability of a collision is less than 10^(-20).
>
> >Quoting https://en.wikipedia.org/wiki/Birthday_attack [2]:
>
> >For comparison, 10^(-18) to 10^(-15) is the uncorrectable bit
> >error rate of a typical hard disk.
>
> 1) I will try to explain you, that is not quite true with a small
> approximation: let our hash value be exact one byte large (8 bit), it can
> obtain 2^8 = 256 different hash values (and let it be perfect distributed).
> The relative frequency to encounter a collision - same hash for any random
> another inside this interval (Hv0 - Hv255) will be also 256, because circa
> each 256-th char sequence will obtain the same hash value (Hv0).

Sure.

> Will be such hash safe? Of course not, never.

That depends on the use case, actually, but it's certainly not a
good hash to use for identification of multiple documents.

> But if we will hash any character sequences with max length 16 bytes, we
> will have 256^16 (~= 10^38) different variations of binary string (keys).
> The relation (and the probability (Pc) to have a collision for two any
> random strings) would be only (10^38/256 - 1)/10^38 * (10^38/256 - 2)/(10^38
> - 1) ~= 0.000015.

As long as you have two random 16-byte (128-bit) strings, they
will collide with probablility 1/(2^128): one string can't
collide with itself, and another one will collide if happens to be
the same as the first one.

Hash values of these two strings will collide with probablility of
1/256. It only depends on the size of the hash output, not sizes
of input strings.

I have no idea how did you get numbers you are claiming, they look
wrong.

> Small, right? But the relative frequency (Hrc) is still 256! This can be
> explained with really large count of different sequences, and so with large
> count of hash values (Hv1-Hv255) that are not equal with (Hv0).

See above, hash values of two random strings will collide with
probability of 1/256.

> But let us resolve the approximation: the hash value obtain 2^128 (~=
> 10^38), but how many values maximum should be hashed? It's unknown. Let our
> keys contain maximum 100 bytes, the count of variations of all possible
> strings will be 256^100 (~= 10^240). The probability to encounter of a
> collision and the relative frequency to encounter a collision will be a
> several order smaller (1e-78), but the relation between Hrc and Pc is
> comparable to example above (in the sense of relation between of both). And
> in this relation is similar (un)safe. Yes less likely (well 8 vs 128 bits)
> but still "unsafe".

The Wikipedia article about "Birthday attack" I linked
explains how to calculate probabilities of collisions depending on
hash output size and number of values hashed. Please read it
carefully:

https://en.wikipedia.org/wiki/Birthday_attack

Trying to say that numbers calculated are "unknown" isn't
constructive.

[...]

> 2) For the *caching* it's at all not required to have such "safe" hash
> functions:
> - The hash function should create reasonably perfect distributed values;
> - The hash function should be fast as possible (we can get MurmurHash or
> something similar, significantly faster than md5);
> - We should always compare the keys, after cache entry with hash value was
> found, to be sure exact the same key was found; But that does not make our
> cache slower, because the generation of hash value can be faster through
> algorithms faster as md5 (for example the MMH3 is up to 90% faster as MD5);
> - In the very less likely case of collision we will just forget the cached
> entry with previous key or save it as array for serial access (not really
> expected by caching and large hash value, because rare and that is a cache -
> not a database that should always hold an entry).
>
> I want implement that and post a PR, and can make it configurable (something
> like `fastcgi_cache_safe = on`) then you can compare the performance of it -
> would be faster as your md5/crc32 implementation.

Apart from changing md5 to "something faster", this approach is
identical to just adding the full key check. I already explained
why this isn't needed.

--
Maxim Dounin
http://nginx.org/

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel
Subject Author Views Posted

How does Nginx look-up cached resource?

Shuxin Yang 942 September 03, 2015 09:40PM

Re: How does Nginx look-up cached resource?

Maxim Dounin 306 September 04, 2015 09:24AM

Re: How does Nginx look-up cached resource?

Sergey Brester 348 September 04, 2015 11:38AM

Re: How does Nginx look-up cached resource?

Maxim Dounin 338 September 04, 2015 02:12PM

Re: How does Nginx look-up cached resource?

Sergey Brester 447 September 04, 2015 02:58PM

Re: How does Nginx look-up cached resource?

Maxim Dounin 297 September 04, 2015 03:44PM

Re: How does Nginx look-up cached resource?

Sergey Brester 334 September 04, 2015 05:02PM

Re: How does Nginx look-up cached resource?

Maxim Dounin 342 September 05, 2015 08:10PM

Re: How does Nginx look-up cached resource?

Sergey Brester 385 September 07, 2015 09:36AM

Re: How does Nginx look-up cached resource?

Maxim Dounin 361 September 07, 2015 12:20PM

Re: How does Nginx look-up cached resource?

Sergey Brester 302 September 07, 2015 12:34PM

Re: How does Nginx look-up cached resource?

Gena Makhomed 363 September 04, 2015 05:22PM

Re: How does Nginx look-up cached resource?

Maxim Dounin 327 September 05, 2015 09:58PM

Re: How does Nginx look-up cached resource?

Gena Makhomed 307 September 07, 2015 10:46AM

Re: How does Nginx look-up cached resource?

Maxim Dounin 293 September 07, 2015 01:00PM

Re: How does Nginx look-up cached resource?

Gena Makhomed 431 September 07, 2015 03:30PM

Re: How does Nginx look-up cached resource?

Sergey Brester 405 September 07, 2015 05:24PM

Re: How does Nginx look-up cached resource?

Gena Makhomed 558 September 07, 2015 07:20PM

Re: How does Nginx look-up cached resource?

Maxim Dounin 478 September 07, 2015 09:42PM

Re: How does Nginx look-up cached resource?

Gena Makhomed 441 September 08, 2015 05:08PM

Re: How does Nginx look-up cached resource?

Maxim Dounin 467 September 09, 2015 01:18PM

Re: How does Nginx look-up cached resource?

Sergey Brester 338 September 10, 2015 05:58AM

Re: How does Nginx look-up cached resource?

Sergey Brester 332 September 10, 2015 08:56AM

Re: How does Nginx look-up cached resource?

Maxim Dounin 357 September 10, 2015 10:48AM

Re: How does Nginx look-up cached resource?

Sergey Brester 319 September 10, 2015 11:08AM

Re: How does Nginx look-up cached resource?

Maxim Dounin 322 September 10, 2015 11:34AM

Re: How does Nginx look-up cached resource?

Sergey Brester 332 September 10, 2015 11:56AM

Re: How does Nginx look-up cached resource?

Maxim Dounin 346 September 10, 2015 01:00PM

Re: How does Nginx look-up cached resource?

Sergey Brester 611 September 10, 2015 04:54PM



Sorry, you do not have permission to post/reply in this forum.

Online Users

Guests: 247
Record Number of Users: 8 on April 13, 2023
Record Number of Guests: 421 on December 02, 2018
Powered by nginx      Powered by FreeBSD      PHP Powered      Powered by MariaDB      ipv6 ready