Thursday, February 24, 2011

Day 27 - POST body in files and ngx_unescape_uri

I told you yesterday that I would talk about retrieving values from a POST for processing. As I'm a stand-up guy, here we go.

Back in the days when I started this journey, I told you buffers (ngx_buf_t) were interesting beasts that were the foundation of this little web server. It's even more true than I thought, and I don't think I'm at the end of my surprises with them. They are used for almost anything that involves processing data streams (and you do a lot of that in a web server). POST methods are no exception. Whenever nginx is reading data from a client and making an HTTP request (ngx_http_request_t) from it, it is using buffers. And roughly, it all goes like this:
  1. Accept the connection from the client (the TCP SYN, SYN-ACK, ACK handshake, if you see what I mean).
  2. Start reading data from the client.
  3. Parse the client request (method, uri, etc.) and the header lines as they arrive.
  4. Store all that in the ngx_http_request_t structure.
  5. When all headers are processed, call the content handler (there is actually a lot more going on at this point but if we start digging here we'll get lost, plus there is a lot I still don't know).
  6. At this point, the header_in field of the ngx_http_request_t structure points to a buffer that contains the "raw" headers as they were read from the client.

This holds true for all HTTP requests (GET, POST, HEAD, etc.). Now, as you know, some HTTP methods accept bodies. This is the case for POST. The HTTP protocol, being nice and everything, has a header to tell the server how much data will be in the body. nginx makes good use of this Content-Length header and continues its processing with:
  1. Read all data in the body.
  2. Then trigger the request body handler (I told you at length about this guy yesterday).
  3. At this point, the body is available in request->request_body->bufs where request is a pointer to the ngx_http_request_t structure.
The attentive reader will have noticed the s at the end of bufs. Yes, there can be many buffers. And there are really two situations:
  • Either the body is really small and you have only one buffer.
  • Or it is big and you have more than one.
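To make the two cases concrete, here is a small sketch of walking such a chain and adding up the body size. It is only an illustration: demo_buf_t and demo_chain_t are simplified stand-ins I made up for this example, mirroring just a handful of the fields of nginx's ngx_buf_t and ngx_chain_t, and demo_body_size is a hypothetical helper, not an nginx function. The in_file flag matters for the file-backed buffers discussed just after this.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for nginx's ngx_buf_t / ngx_chain_t (assumption:
 * the real structures have many more fields; only those needed here
 * are mirrored). */
typedef struct {
    unsigned char *pos;       /* start of the data when it is in memory */
    unsigned char *last;      /* one past the end of the data in memory */
    long           file_pos;  /* start offset when the data is in a file */
    long           file_last; /* end offset when the data is in a file */
    unsigned       in_file;   /* non-zero: data lives in a file */
} demo_buf_t;

typedef struct demo_chain_s demo_chain_t;
struct demo_chain_s {
    demo_buf_t   *buf;
    demo_chain_t *next;
};

/* Walk the whole chain and return the total body size in bytes.
 * Memory buffers are measured with last - pos, file buffers with
 * file_last - file_pos; pos/last are never touched for file buffers.
 * If file_bufs is not NULL it receives the number of file buffers seen. */
static size_t
demo_body_size(const demo_chain_t *chain, size_t *file_bufs)
{
    size_t              total = 0, files = 0;
    const demo_chain_t *cl;

    for (cl = chain; cl != NULL; cl = cl->next) {
        assert(cl->buf != NULL);

        if (cl->buf->in_file) {
            total += (size_t) (cl->buf->file_last - cl->buf->file_pos);
            files++;
        } else {
            total += (size_t) (cl->buf->last - cl->buf->pos);
        }
    }

    if (file_bufs != NULL) {
        *file_bufs = files;
    }

    return total;
}
```

In a real module the loop would start from request->request_body->bufs, and the branch on in_file is exactly where the code has to decide whether it can read memory directly or has to go to the file.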
The point is: your code should loop through this chain and process all the data there. There is one more thing you should be aware of: buffers can be files. And if your body is big enough, you can be sure they will be. I'm telling you this because I read about it, then completely forgot, and looked stupid when I realised my nginx segfaulted because I was trying to access the memory of a buffer which was a "file buffer".

So, you remember, all this was to retrieve data encoded as "application/x-www-form-urlencoded". Of course, the data has to be urldecoded. nginx is nice enough to help with the decoding (and the encoding) by providing a pair of functions: ngx_unescape_uri and ngx_escape_uri. Now, that's all good, except that they operate on u_char *, not on buffers. This means the request body MUST be loaded in memory.

So, I went ahead and used ngx_read_file(). Unfortunately, from what I can see, this is blocking. This should not be a problem in my specific scenario because most of the time the data will fit in memory and never go to disk. But it goes against the principles of nginx. So, I made a note in my code (with a TODO) and will have a look at it another day. Maybe I'll find some inspiration in Valery's nginx upload module.
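To give an idea of what decoding "application/x-www-form-urlencoded" data involves once the body is in memory, here is a stand-alone sketch in plain C. It is not ngx_unescape_uri and makes no claim to behave like it: demo_form_decode and demo_hex are hypothetical names invented for this example. In particular, the sketch turns '+' into a space because form encoding uses it that way; whether ngx_unescape_uri does the same is something to check against the nginx source.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int
demo_hex(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode len bytes of form-urlencoded data from src into dst.
 * dst needs room for at least len bytes (decoding never grows the data)
 * and may be the same pointer as src for in-place decoding.
 * Malformed escapes such as "%zz" or a trailing "%2" are copied as-is.
 * Returns the number of bytes written; no terminating NUL is added. */
static size_t
demo_form_decode(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t i = 0, n = 0;

    assert(dst != NULL && src != NULL);

    while (i < len) {
        if (src[i] == '+') {
            /* form encoding represents a space as '+' */
            dst[n++] = ' ';
            i++;

        } else if (src[i] == '%' && i + 2 < len
                   && demo_hex(src[i + 1]) >= 0
                   && demo_hex(src[i + 2]) >= 0)
        {
            /* "%XY" becomes the single byte 0xXY */
            dst[n++] = (unsigned char)
                       (demo_hex(src[i + 1]) * 16 + demo_hex(src[i + 2]));
            i += 3;

        } else {
            dst[n++] = src[i++];
        }
    }

    return n;
}
```

Like the nginx functions, this works on a plain pointer and a length, which is why the bytes of the request body have to be in memory first. A body spread over several buffers would also need care when an escape like "%21" is split across two of them.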
