Hello,
Several companies have made zlib significantly faster on x86-64 by using newer CPU instructions: typically the SSE 4.2 CRC32 instruction, carry-less multiplication (PCLMULQDQ), other SSE 4.x instructions, and possibly some changes to window rolling.
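To make the hardware side concrete, here is a minimal sketch of how a build could check for those two instruction sets at run time. It assumes GCC or Clang on x86 (it uses `__get_cpuid` from `<cpuid.h>`); the struct, enum, and function names are made up for illustration and are not taken from any of the forks.

```c
#include <stdbool.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* CPUID leaf 1, ECX register: feature bits for the two instruction sets. */
#define ECX_PCLMULQDQ (1u << 1)   /* carry-less multiplication */
#define ECX_SSE4_2    (1u << 20)  /* includes the CRC32 instruction */

/* Hypothetical container for the features this sketch cares about. */
struct cpu_features {
    bool sse42;
    bool pclmul;
};

/* Hypothetical names for the two code paths a build might choose between. */
enum deflate_path {
    DEFLATE_GENERIC,
    DEFLATE_ACCELERATED
};

/* Query the CPU; reports "no features" on compilers or CPUs we don't know. */
struct cpu_features detect_cpu_features(void)
{
    struct cpu_features f = { false, false };
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse42 = (ecx & ECX_SSE4_2) != 0;
        f.pclmul = (ecx & ECX_PCLMULQDQ) != 0;
    }
#endif
    return f;
}

/* Assumption for this sketch: use the fast path only if both are present. */
enum deflate_path choose_deflate_path(struct cpu_features f)
{
    return (f.sse42 && f.pclmul) ? DEFLATE_ACCELERATED : DEFLATE_GENERIC;
}
```

Calling `choose_deflate_path(detect_cpu_features())` once at startup would be enough to pick a path. Whether a given fork detects features at run time or at compile time is something to check in that fork's source.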
We're having trouble using these zlib forks with nginx because nginx hardcodes assumptions about the legacy zlib's internal structures. You can see a summary of the issues on this thread: https://community.centminmod.com/threads/please-add-single-threaded-gzip-optimization.8166/
Maxim helped with some issues someone had using the Intel fork here: https://forum.nginx.org/read.php?2,252113,252113#msg-252113
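To illustrate the kind of hardcoding I mean, here is a rough sketch based on my reading of nginx's gzip filter, so treat the 8192-byte constant and the overall shape as assumptions rather than a quote of the source. The `(1 << (windowBits + 2)) + (1 << (memLevel + 9))` part is the deflate memory requirement documented in zlib's zconf.h. The function names are hypothetical.

```c
#include <stddef.h>

/* Deflate memory requirement as documented in zlib's zconf.h. */
size_t zlib_documented_deflate_memory(int wbits, int memlevel)
{
    return ((size_t) 1 << (wbits + 2)) + ((size_t) 1 << (memlevel + 9));
}

/*
 * Assumed shape of the nginx estimate: the documented zlib figure plus a
 * fixed 8K allowance for zlib's internal deflate state structure.
 */
size_t gzip_prealloc_estimate(int wbits, int memlevel)
{
    return 8192 + zlib_documented_deflate_memory(wbits, memlevel);
}

/*
 * A zlib build that asks for more than the estimate (for example because
 * it uses larger internal tables) no longer fits the preallocated buffer.
 */
int prealloc_is_sufficient(size_t estimate, size_t requested)
{
    return requested <= estimate;
}
```

With zlib's default `windowBits` of 15 and `memLevel` of 8, the estimate is 8192 + 131072 + 131072 = 270336 bytes. Any fork whose allocations add up to more than that would not fit a buffer sized this way.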
Cloudflare fork: https://github.com/cloudflare/zlib
Intel fork: https://github.com/jtkukunas/zlib
zlib-ng: https://github.com/Dead2/zlib-ng (I think zlib-ng integrated both the Cloudflare and Intel forks, where they didn't overlap, and has done more work beyond those)
Intel post and paper: https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data
Cloudflare post: https://blog.cloudflare.com/cloudflare-fights-cancer/
The CPU hardware required for these patches is quite conservative and pretty old: Intel's Westmere (2010), the generation before Sandy Bridge, and AMD's Bulldozer (2011). It might make sense to integrate an optimized zlib into nginx at this point. At the very least, it would help if nginx were updated to make integration with the optimized forks easier.
Thanks!