Tuesday, April 19, 2011

Day 46 - how to debug location in nginx

Last time I told you about the different kinds of locations in nginx. You've been a good student, understood everything and started putting together the perfect configuration for your perfect web site. Now you have this huge configuration file and you don't get why this damn location is not being hit. The coffee machine ran out of coffee 3 hours ago and, unlike me, you don't have a fig tree to sit under and think about it (by the way, you should really consider buying one... ;)). Instead, take a deep breath and read this post.

If you ask for help on the mailing list, the response you'll get is in two steps:

  1. Show us your full config (you cannot imagine the number of people who send just a snippet and at the end realize the problem is "somewhere else").
  2. Look at the debug logs.

And we are going to do exactly that, with a bit of commenting... For those of you who want to experience it (and not just read lazily from their couch), here is how you turn on debugging log. And here is the configuration we are going to play with (server part is left as an exercise to the reader or just copy/paste this into the default config, it should work):

    location /foo {
        echo foo;
    }
    location /fuu {
        echo fuu;
    }
    location /fuu/bar {
        echo fuu-bar;
    }
    location = /foo {
        echo foo-eq;
    }
    location ~ /foo/bar {
        echo foo-bar-regexp;
    }
    location @/foo/bari {
        echo foo-bar-internal;
    }
    location ^~ /foo/bar {
        echo foo-bar-starts-before-regexp;
    }

Now, let's see what happens with GET /foo/bari. The logs look something like:

18:58:48 [debug] 12569#0: *1 http request line: "GET /foo/bari HTTP/1.1"
18:58:48 [debug] 12569#0: *1 http uri: "/foo/bari"
18:58:48 [debug] 12569#0: *1 http args: ""
[...]
18:58:48 [debug] 12569#0: *1 test location: "/fuu"
18:58:48 [debug] 12569#0: *1 test location: "/foo"
18:58:48 [debug] 12569#0: *1 test location: "/bar"
18:58:48 [debug] 12569#0: *1 using configuration "/foo/bar"

There is already a whole lot of information here so let's have a closer look. The most obvious is of course using configuration "/foo/bar". nginx tells you which configuration it is using (/foo/bar). First thing to notice is that it does not tell you which type of location it is using. This is kind of annoying actually (yes, I'm a big nginx fan but positive criticism is allowed, even for a russian web server ;)). In our example we have two "/foo/bar" locations. Which one is nginx referring to ? Well, here it's fairly easy to tell: the one it did test. ;)

That's where the test location lines become handy. Let's have a look at them. nginx started testing the URI against the "/fuu" location and figured it did not match. So, it went for the "/foo" location which matches. But it did not stop and went for the "/bar" location. Hold on, there is no "/bar" location ! You see, you should buy a fig tree... ;) More seriously (although this has to do with trees), what you are seeing here is actually nginx walking down the red-black (or binary) tree where it stores the locations. So, here is what our configuration translates into, tree-wise:

 [/fuu](s)--*
   |  \\    
   |   \\
   |   [/bar](s)--*
   |     |  \\
   |     |   \\
   |     *    *
   |
 [/foo](s,e)---*
   |  \\     
   |   \\
   *   [/bar](s)---*
         |  \\
         |   \\
         *    *

First, I would like to take a moment to admire the beauty of this. It deserves to enter the hall of fame of ascii-art (at least ;)). Trust me, it looks better than the (open|libre)office presentation. But it still deserves some kind of comments/legend:

  • The [] denotes the "test" for a node in the tree. If you're much into reading code, this is the name field of the ngx_http_location_tree_node_s structure. Based on the result of comparing the URI with this name, you will end-up going left, going right or staying at this node in the tree (may be going down the sub-tree).
  • The () denotes that this node is also a location. It can be of two different types:
    • s for "starts with", also known as inclusive (which is a pointer to the location configuration structure) in the source code.
    • e for "exact", also known as exact (a pointer to the location configuration structure) in the source code
  • - and | denote respectively the left and right branches that grow out of the node. Basically, if what you are testing is "before" the content of [], you go right (or |). If it is "after", you go left (or -).
  • \\ denotes the path (also known as the tree in the source code) you should take if you matched exactly and still have some URI left-over.
  • * denotes an empty node. Think of it as /dev/null, or better: don't think at all. ;)

Now, first thing of interest is that there are only 4 nodes and 5 locations here (/foo is both "starts with" and "exact"). Now, this makes perfect sense and confirms that this red-black tree is used only for static locations (no "regexp locations" and no "named locations"). Also note that there is no special mark for "starts with before regexp" locations. It also means that there is no way to have a location with the same value be both "starts with" and "starts with before regexp" (if you don't trust me, try it for yourself).

Now let's get back to looking at our logs and at the walk down the tree for GET /foo/bari:

  1. nginx took right at [/fuu]
  2. Then it tried [/foo] which happened to be a match.
  3. So, it went straight (neither left nor right). Unfortunately there is no indicator of this in the logs, so you have to look at the test and the URI to figure it out. :(
  4. First node in the "sub-tree" is [/bar] which is tested (logs say so). And it matches.
  5. So, nginx tries to go straight but the "sub-tree" is empty and it stops here (hence the "using configuration /foo/bar").

Easy, no? ;) So, let's have a look at other requests for the fun of it. GET /foo gives the following logs:

18:58:48 [debug] 12569#0: *2 test location: "/fuu"
18:58:48 [debug] 12569#0: *2 test location: "/foo"
18:58:48 [debug] 12569#0: *2 using configuration "=/foo"
  1. Test "/fuu". It doesn't work, so take a right.
  2. Test "/foo". It matches.
  3. There is nothing left in the URI and there is an "exact" location (denoted e) at this node. So, it's a final match.

Notice how nginx is a good boy and tells us it found an exact location (the = before /foo).

GET /foo/ba is another interesting case:

18:58:48 [debug] 12569#0: *3 test location: "/fuu"
18:58:48 [debug] 12569#0: *3 test location: "/foo"
18:58:48 [debug] 12569#0: *3 test location: "/bar"
18:58:48 [debug] 12569#0: *3 test location: ~ "/foo/bar"
18:58:48 [debug] 12569#0: *3 using configuration "/foo"
  1. Test "/fuu" and take a right as usual (by now you should know the way ;)).
  2. Test "/foo" and go straight.
  3. Test "/bar" which does not match. So, stop here.
  4. Give a shot at the "regexp" location (which fails).

First regexp location test. Now, the real question is "why haven't we seen it before?". Well, GET /foo was an exact location match. In such a situation there is no point in trying regexp. GET /foo/bari was a "starts with before regexp" location, so once it is found to be a valid location for the request there is no point in trying the "regexp" locations. You've noticed how the "before regexp" part doesn't show in the tree above. This truly reflects the code: it is a property of the location configuration structure (namely the noregex field) that regexp should not even be tried if the location matches. Now that you know why we haven't seen it before you also know why we are seeing it now: the "/bar" test did not work, so the matched location after walking the tree is the "/foo" location which is a simple "starts with" (noregex = 0). Therefore, nginx tries the regexp locations. A useless attempt here, so the configuration actually used is "/foo", just like the logs tell us.

To try the regexp and see what it does, let's use GET /not/foo/bar:

18:58:48 [debug] 12569#0: *4 test location: "/fuu"
18:58:48 [debug] 12569#0: *4 test location: ~ "/foo/bar"
18:58:48 [debug] 12569#0: *4 using configuration "/foo/bar"
  1. Test "/fuu" and for once, let's take a left.
  2. Ooops, it's a dead-end. So we stop walking the static locations tree. And we have not found a matching location. :(
  3. nginx gives a shot at the "regexp" location which, lucky us, matches.

Bottom line, the logs tell us which configuration is used: "/foo/bar". Too bad there is no way to figure out this is the "regexp" location and not the "starts with" one... :( Yes, it is very unlikely for regular expression to look like a "normal" URI. But one is enough to have you spend hours trying to figure out what happens.

Hope you enjoyed this and for those of you who care, I have a little exercise: what is the algorithmic complexity of finding the location that applies to a request? Feel free to post your responses so we can start arguing who's right ans who's wrong. ;)

No comments:

Post a Comment