Thanks for the response.
I understand what you are saying about the worker processes. We have only a single worker process.
I have 2 upstream servers.
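
For context, the setup is roughly the following. This is a sketch, not my exact config: the hostnames and ports are placeholders, and least_conn is listed because least-connection routing is the behavior I'm expecting.

worker_processes 1;

events {
    worker_connections 1024;
}

http {
    upstream backend {
        # least_conn should prefer the server with the fewest
        # active connections instead of plain round-robin.
        least_conn;
        server server1.example.com:8080;
        server server2.example.com:8080;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }
}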
To validate:
1. I send a request for a LARGE file. I see it hit server1, and server1 then spends the next couple of minutes serving that request.
2. I send a request for a very tiny file. I see it hit server2, and it finishes quickly (server1 is still processing request #1).
3. I send another request for a very tiny file. I see it hit server1, even though server1 is still serving request #1 and server2 is serving nothing.
4. I repeat that over and over, and every subsequent request alternates strictly between the two servers: server1, then server2, then server1, and so on.
5. If I then submit another LARGE file request, and the last request happened to go to server2, the new LARGE request lands on server1, so server1 is now serving two LARGE requests at once.
6. Any further requests continue to be distributed evenly between server1 and server2, even though server1 still has two active requests in flight.
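
For anyone trying to reproduce this: nginx can record which upstream served each request via the $upstream_addr variable. A minimal sketch, with the format name and log path being arbitrary examples (these two lines go inside the http block):

    # Log which upstream address actually handled each request.
    log_format upstream_routing '$remote_addr "$request" -> $upstream_addr';
    access_log /var/log/nginx/upstream_routing.log upstream_routing;

Tailing that log while repeating the steps above makes the alternation easy to see.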
Is there some sort of 'fudge factor' or threshold, i.e. does one server have to be handling some number n more active requests than the other before the balancer stops picking it? I wouldn't think so, but I'm at a loss.