Running some experiments, I noticed that the Gunicorn server distributes incoming requests to workers without checking if those workers have free threads.
For example, with 2 workers, 1 thread per worker, and worker_class=gthread, I can send two concurrent requests that take 10s each. It's possible for both requests to land on the same worker, in which case processing them takes 20s instead of 10s while the other worker sits idle.
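
For reference, this is roughly the setup I used to reproduce it; the module/app names are placeholders and the handler just sleeps to simulate the slow requests:

```python
# app.py -- minimal WSGI app used to simulate the slow requests
import time

def app(environ, start_response):
    time.sleep(10)  # simulate a request that takes ~10s to process
    body = b"done\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Started with something like:
#   gunicorn --workers=2 --threads=1 --worker-class=gthread app:app
# Two concurrent requests sometimes finish after ~10s (one per worker)
# and sometimes after ~20s (both accepted by the same worker).
```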
After watching the server's listen socket, I realized that the workers immediately accept() connections from the backlog even when all of their threads are busy.
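
To make sure I understand the mechanism, this is the pattern I think is at play (a simplified sketch, not Gunicorn's actual source): the worker accept()s as soon as the listener is readable and only then queues the connection to its thread pool, so a fully busy worker can still claim a connection that an idle worker could have served.

```python
# Simplified sketch of a gthread-style worker loop (not Gunicorn's code):
# the connection is accept()ed first and only then handed to the pool,
# so a worker with no free threads can still take work off the backlog.
import socket
from concurrent.futures import ThreadPoolExecutor

def handle(conn: socket.socket) -> None:
    # read the request, run the (slow) app, write the response, then close
    conn.close()

def worker_loop(listener: socket.socket, threads: int = 1) -> None:
    pool = ThreadPoolExecutor(max_workers=threads)
    while True:
        conn, _addr = listener.accept()  # happens even if all threads are busy
        pool.submit(handle, conn)        # request now waits inside this worker
```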
Is there a way to prevent this behavior and prioritize workers with free threads?