Tuesday, February 22, 2011

Day 25 - testing BIG request bodies, request body handler

Remember, on "Day 19 - Echo module and where nginx stops with posts (or not)", I pointed to a forum thread explaining how to retrieve POSTed data. Basically, it boils down to:
  1. In your content handler, when the request you are processing is a POST, register a request body handler with something like:
    ngx_http_read_client_request_body(r, ngx_http_rrd_body_received);
  2. Once the body has been retrieved, your new request body handler (ngx_http_rrd_body_received in my example above) will be called and you can do whatever you want with it.

nginx being asynchronous and all, that left me with a question: is the request body handler called once, when the body has been fully loaded, or is it called many times, as the "chunks" become available? And I know only one way to figure it out: test it.

Testing this specific case has proven to be more difficult than I thought and I spent quite some time on this. I'll try to walk you through my thought process and the results I got.
  1. I started simple, just by POSTing some meaningful value. I quickly realized that the body was so small that header+body fit in one IP packet and nginx read everything in one gulp. As a consequence, in this situation, the call to ngx_http_read_client_request_body() directly calls the request body handler (ngx_http_rrd_body_received). This is nice but not really what I had in mind.
  2. So, I figured that if I increased the size of the body, things would start to get interesting. I created a BIG request body and posted it. And I got stuck with an HTTP status of 413 (I had to look up the translation on Google; it means "Request Entity Too Large"). Didn't even know that was possible. But after some searching, I discovered that there is even a directive in nginx to control it: client_max_body_size (1m by default). I had no reason to change it, so I did not.
  3. Instead, I went for a POSTed entity that would be BIG (bigger than an IP packet) but still fit in client_max_body_size. And... I saw the exact same behavior that I had seen with small entities: ngx_http_read_client_request_body() directly calls the request body handler (ngx_http_rrd_body_received here). I guess this is because everything is local and there is no network latency to slow things down. So, I went for a coffee under my fig tree... ;)
  4. And I came back with an interesting idea: what if I create the latency myself by waiting between sending the header and the entity? That led me to dig a bit more into the Python libraries, and I ended up with some testing code like this:
    # Python 2 test method body; needs "import httplib, time, urllib" at module level.
    conn = httplib.HTTPConnection("localhost", 8000, timeout=20)
    params = urllib.urlencode({'value': 'N:12:34:56:78' * 20000})
    conn.putrequest("POST", "/tutu")
    conn.putheader('Content-Type', "application/x-www-form-urlencoded")
    # The body is sent in two halves, so announce twice the length of one half.
    conn.putheader('Content-Length', str(len(params) * 2))
    conn.putheader('Accept', "text/plain")
    conn.endheaders()
    time.sleep(4)  # let nginx process the header before any body byte arrives
    conn.send(params)
    time.sleep(4)  # second pause, between the two halves of the body
    conn.send(params)
    response = conn.getresponse()
    self.assertEqual(response.status, 200)
    data = response.read()
    conn.close()
    self.assertRegexpMatches(data, ".*Robin.*")
    The two sleeps are really to figure out whether the handler is called once or twice. With that, I managed to test what I wanted.
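
If you are on Python 3, where httplib became http.client, the same delayed-send trick can be sketched roughly as below. The names post_in_chunks and build_chunks are mine, not part of the original test, and the connection is passed in so the helper can also be exercised against a stub. Take it as a sketch under those assumptions rather than the test I actually ran.

```python
import http.client
import time
import urllib.parse


def build_chunks(repeat=20000):
    """Build the same two-part body as the test above (hypothetical helper)."""
    params = urllib.parse.urlencode({"value": "N:12:34:56:78" * repeat})
    encoded = params.encode("ascii")
    return [encoded, encoded]


def post_in_chunks(conn, path, chunks, delay=4.0):
    """POST the byte chunks over conn, sleeping before each one.

    conn is expected to behave like http.client.HTTPConnection.
    Returns (status, body) of the response.
    """
    conn.putrequest("POST", path)
    conn.putheader("Content-Type", "application/x-www-form-urlencoded")
    # Announce the full size up front so the server keeps waiting for the rest.
    conn.putheader("Content-Length", str(sum(len(c) for c in chunks)))
    conn.putheader("Accept", "text/plain")
    conn.endheaders()  # the header goes out on its own
    for chunk in chunks:
        time.sleep(delay)  # artificial latency before each piece of the body
        conn.send(chunk)
    response = conn.getresponse()
    return response.status, response.read()


if __name__ == "__main__":
    # Same host, port and path as the original test; adjust to your setup.
    connection = http.client.HTTPConnection("localhost", 8000, timeout=20)
    try:
        status, data = post_in_chunks(connection, "/tutu", build_chunks())
        print(status, len(data))
    finally:
        connection.close()
```

Passing the connection in keeps the timing logic separate from the network setup, which makes it easy to shorten the delay or swap in a fake connection.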


Now, you can run it yourself and figure out how many times the request body handler is called. No, I'm kidding, I'll tell you. Here is the sequence as I saw it in my favorite debugger:
  1. The header arrives, nginx fires the content handler.
  2. The content handler registers the request body handler and returns.
  3. The first chunk of the entity arrives: neither of the two handlers is called.
  4. The second (and last) chunk arrives.
  5. The request body handler is called.
Conclusion: the request body handler is called only once...

This is fairly different from the behavior with small entities:
  1. The whole request arrives, nginx fires the content handler.
  2. The content handler registers the request body handler.
  3. As a consequence, ngx_http_read_client_request_body() calls the request body handler right away, from inside the content handler.
  4. The request body handler returns.
  5. The content handler returns.
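
To make the two sequences easier to compare, here is a toy model in Python of what the debugger showed. It is not nginx code and says nothing about nginx internals; ToyBodyReader and its methods are names I made up for illustration. The only thing it mirrors is the observed contract: the handler fires exactly once, either immediately at registration (body already complete) or later, when the last chunk arrives.

```python
class ToyBodyReader:
    """Toy model of the observed behavior; NOT nginx's actual implementation."""

    def __init__(self, content_length, already_received=b""):
        self.content_length = content_length
        # Bytes that came in together with the header, if any.
        self.buffer = bytearray(already_received)
        self.handler = None

    def read_client_request_body(self, handler):
        """Register the body handler; call it at once if the body is complete."""
        self.handler = handler
        self._fire_if_complete()

    def on_chunk(self, data):
        """Called by the (imaginary) event loop when more body bytes arrive."""
        self.buffer.extend(data)
        self._fire_if_complete()

    def _fire_if_complete(self):
        if self.handler is not None and len(self.buffer) >= self.content_length:
            # Clear the handler first so it can never be called a second time.
            handler, self.handler = self.handler, None
            handler(bytes(self.buffer))


if __name__ == "__main__":
    calls = []

    # Big entity: header first, then two chunks.
    big = ToyBodyReader(content_length=8)
    big.read_client_request_body(calls.append)  # registers, nothing fires yet
    big.on_chunk(b"1234")                       # first chunk: still nothing
    big.on_chunk(b"5678")                       # last chunk: handler fires

    # Small entity: the whole body arrived with the header.
    small = ToyBodyReader(content_length=4, already_received=b"abcd")
    small.read_client_request_body(calls.append)  # fires synchronously

    print(calls)
```

The practical consequence of the second case is that the body handler may run before the content handler has returned, so the content handler should not touch state the body handler may already have consumed.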

Now, something weird about this request body handler is that its signature does not let you return anything: it returns void. This is surprising, as the "regular" content handler expects you to return an ngx_int_t (which is pretty much supposed to be the one returned by ngx_http_output_filter()). The only way to ship back an error is by writing the appropriate header/body yourself. The problem is: what can you do if ngx_http_output_filter() fails? Or returns NGX_AGAIN? There is no way to guarantee that a module will not return NGX_AGAIN.

That's where we are going tomorrow...
