nginx discovery journey: Day 28 - POST body buffers and first look at logging

You remember, yesterday I mentioned that when you are coding a request body handler you should loop through all the request->request_body->bufs. What I did not tell you is what is in there and why there is more than one buffer. It all boils down to the fact that nginx tries to have the most minimal impact on the planet by saving resources and doing as little as it can. That plus the fact that the network does pretty much what it wants.

To make my point as clear as I can I decided to walk you through an example of what happens when you force the network to do things unnatural. To do so, I wrote a test case:

def test_POST_show_buffers(self):
    conn = httplib.HTTPConnection("localhost", 8000, None, 20)
    params = urllib.urlencode({'value' : 'N:12345:678'+'678'*300})
    conn.putrequest("POST", "/tutu")
    conn.putheader('Content-Type', "application/x-www-form-urlencoded")
    conn.putheader('Content-Length', str(len(params)))
    conn.putheader('Accept', "text/plain")
    conn.endheaders()
    conn.send(params[0:6])
    time.sleep(1)
    conn.send(params[6:15])
    time.sleep(1)
    conn.send(params[15:])
    response = conn.getresponse()
    self.assertEqual(response.status, 200)
    data = response.read();
    conn.close()
    self.assertNotRegexpMatches(data, "Robin");

You got the idea: I want to send the body value=N:12345:678678678... with 678 being repeated 301 times (that's actually value=N%3A12345%3A678678678... after encoding the parameters) in three parts with a little pause between them to make sure the network layer sends them separately.

First part is value= and goes with the headers
Second part is N%3A12345
Third and last part is N%3A678678678...

My nginx being configured with maximum logging (i.e.

error_log  logs/error.log  debug;

), I'll walk you through the interesting parts of the log...

22:16:20 [debug] 5222#0: *22 accept: 127.0.0.1 fd:3

nginx accepted the connection. Note the *22 which is an identifier assigned to this connection. This is what you'll use to keep track of what's going on for this customer in the rest of the log. This is very convenient. I guess Igor coming from a sysadmin background thought of making sysadmins' lives easier, a thing developers tend not to do (as a counterpart, he did not make developers' lives easier but that's another story). So, with this *22 we look a bit further:

22:16:20 [debug] 5222#0: *22 http process request line
22:16:20 [debug] 5222#0: *22 recv: fd:3 168 of 1024
22:16:20 [debug] 5222#0: *22 http request line: "POST /tutu HTTP/1.1"
22:16:20 [debug] 5222#0: *22 http uri: "/tutu"
22:16:20 [debug] 5222#0: *22 http args: ""
22:16:20 [debug] 5222#0: *22 http exten: ""
22:16:20 [debug] 5222#0: *22 http process request header line
22:16:20 [debug] 5222#0: *22 http header: "Host: localhost:8000"
22:16:20 [debug] 5222#0: *22 http header: "Accept-Encoding: identity"
22:16:20 [debug] 5222#0: *22 http header: "Content-Type:
application/x-www-form-urlencoded"
22:16:20 [debug] 5222#0: *22 http header: "Content-Length: 921"
22:16:20 [debug] 5222#0: *22 http header: "Accept: text/plain"
22:16:20 [debug] 5222#0: *22 http header done

And we see that nginx read 168 bytes (for those of you who don't know, recv is the system call to read data from a file descriptor: file descriptor 3 in our case which is the one for the socket resulting of the accept system call we saw earlier). 168 bytes were enough to get the request line plus all the headers. Now, looking even further, we find:

22:16:20 [debug] 5222#0: *22 http client request body preread 6
22:16:20 [debug] 5222#0: *22 http read client request body
22:16:20 [debug] 5222#0: *22 recv: fd:3 -1 of 915
22:16:20 [debug] 5222#0: *22 recv() not ready (11: Resource
temporarily unavailable)
22:16:20 [debug] 5222#0: *22 http client request body recv -2
22:16:20 [debug] 5222#0: *22 http client request body rest 915

nginx knows it has already read 6 bytes. Which is exactly the size of the first part of the body (value=). And, as you can see it knows that it still needs 915 more bytes to read the rest of the body. This is when it figures out that it won't be able to fit the rest of the 1024 buffer it allocated for processing the request line plus its headers. But it would be a shame to throw away all this good memory. So, it keeps the first 6 bytes as the first buffer for the request body and creates a new one for the remaining 915 bytes. A bit later:

22:16:21 [debug] 5222#0: *22 http run request: "/tutu?"
22:16:21 [debug] 5222#0: *22 http read client request body
22:16:21 [debug] 5222#0: *22 recv: fd:3 9 of 915
22:16:21 [debug] 5222#0: *22 http client request body recv 9
22:16:21 [debug] 5222#0: *22 http client request body rest 906

This little web server is good at math: it received the second part (9 bytes), copied them in the second buffer and knows that it needs only 906 more to be done. And finally:

22:16:22 [debug] 5222#0: *22 http run request: "/tutu?"
22:16:22 [debug] 5222#0: *22 http read client request body
22:16:22 [debug] 5222#0: *22 recv: fd:3 906 of 906
22:16:22 [debug] 5222#0: *22 http client request body recv 906
22:16:22 [debug] 5222#0: *22 http client request body rest 0

We are done. And with your favorite debugger you can see that for once request->request_body->bufs is not NULL.

As you noticed, I had to go through quite some effort to exercise this situation, but it can happen. And it gave me an opportunity to walk you through some logging...

Friday, February 25, 2011

Day 28 - POST body buffers and first look at logging

No comments:

Post a Comment