I am looking for a way to select every Nth request arriving at a very busy proxy from within the nginx configuration. The proxy receives POSTs to a single URI, and an approach like sharding by client IP would not give us the kind of traffic sample we’re after.
The long-term goal is to replay a small fraction (around 0.05%) of requests into a separate test environment. Currently I’m logging every request in full to a ramdisk, running a Python log-rotation script every minute to pick out the small proportion of requests I need, and then replaying them against the separate environment with Python ‘requests’. This works, but the proxy noticeably underperforms its neighbors in the DNS pool, and the RAM requirement is too high for this to be sustainable long-term.
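To make the sampling step concrete, here is a minimal sketch of the every-Nth selection, not my exact script. The function name sample_every_nth and the constant SAMPLE_INTERVAL are illustrative, and the interval of 2000 is simply 0.05% expressed as 1 request in 2000.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Illustrative value: 0.05% of traffic is 1 request in every 2000.
SAMPLE_INTERVAL = 2000


def sample_every_nth(records: Iterable[T], n: int) -> Iterator[T]:
    """Return an iterator over every n-th record (the n-th, 2n-th, ...)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Start at index n-1 and step by n, so exactly 1 in n records is kept.
    return islice(iter(records), n - 1, None, n)


if __name__ == "__main__":
    # Stand-in for logged requests; real input would be parsed log entries.
    logged = (f"request-{i}" for i in range(1, 10001))
    picked = list(sample_every_nth(logged, SAMPLE_INTERVAL))
    print(len(picked), picked[0])
```

The selection itself is cheap. The cost in my setup comes from nginx having to write every request to the ramdisk first so that the script has something to select from.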
I’d much prefer to have nginx log only the data that is necessary. I’ve seen that the very recently released ngx_http_mirror_module (the mirror directive) is nearly perfect for my needs, but that still leaves the problem of mirroring only a percentage of the traffic.
Thanks for your suggestions.