Tuesday, March 29, 2011

Day 43 - module config and auto/feature

Looking at other people's modules gave me an opportunity to look closer at the config file they provide. Most of them (especially the modules written by Igor himself) use the file auto/feature to figure out whether a library is available on your system and to figure out the directories where it's deployed. This way, it will work under the various flavours of Linux (Fedora/Redhat/CentOS or Debian/Ubuntu) or under the BSDs of the world.

Good thing is that I have my rrd-nginx-module handy. And I did not do a very good job on the config file. So, now is a good time to review/refactor this and do something nicer. Let's first have a look at what I have in there for now:

ngx_addon_name=rrd-nginx-module
HTTP_MODULES="$HTTP_MODULES ngx_http_rrd_module"
NGX_ADDON_SRCS="$NGX_ADDON_SRCS $ngx_addon_dir/ngx_http_rrd_module.c"
CORE_LIBS="$CORE_LIBS -lrrd_th"
CFLAGS="${CFLAGS/-Werror/-Wno-deprecated-declarations -Werror}"

Yes, this is ugly. But it works (at least on my Fedora Core 14) and there is even worse than that: it's not in the right place. It's in the src directory whereas the "standard" is to put it in the root of the module. I told you: I never paid much attention to this part of the process... But before we start improving this, let me justify/explain what I'm doing:

  • ngx_addon_name appears during the configuration process to let you know that your configuration script was invoked:
    configuring additional modules
    adding module in ../rrd-nginx-module/
     + rrd-nginx-module was configured
    
  • HTTP_MODULES is where you should add the name of the ngx_module_t variable defining your module that is exported by your code.
  • NGX_ADDON_SRCS is fairly easy: where your source code is.
  • CORE_LIBS adds the extra library to link with. I use rrd_th which is the thread-safe version of the RRDtool library.
  • CFLAGS are altered because I use the rrd_init, rrd_open, rrd_close and rrd_free which are deprecated. They are deprecated but the replacement function (rrd_info_r) does a lot more than what I need. I might change my mind about this (and use the non-deprecated function) but for the beauty of it, let's assume there is no other way and the CFLAGS MUST be altered.

Now, my first discovery was to realise that the nginx distribution comes with a file named auto/feature which is used by Igor's modules to display the lines like the following:

checking for system md library ... not found
checking for system md5 library ... not found
checking for OpenSSL md5 crypto library ... found

This little piece of code will actually try to compile and run a piece of code you provide to it and based on the result will set variables so that you can decide what to do. But the best is probably to look at an example:

ngx_feature='RRD library'
ngx_feature_name=
ngx_feature_run=yes
ngx_feature_incs='#include <rrd.h>'
ngx_feature_path=
ngx_feature_libs='-lrrd_th'
ngx_feature_test='rrd_info_r("/tmp/invalid_rrd");'
. auto/feature

if [ $ngx_found = yes ]; then
    CORE_LIBS="$CORE_LIBS $ngx_feature_libs"
else
    cat << END

$0: error: the RRD module requires the RRD multi-threaded
library. You can either not enable the module or install the library.

END
    exit 1
fi

High-level, this creates a "temporary" C program with the content provided and tries to compile and run it. If everything went smoothly, the ngx_found variable is set to yes and the ngx_feature_libs is set to the appropriate options for linking with the library. All this is nice and nifty but if, let's say you want your module to be available for your buddy who's a "FreeBSD-only" dude it won't be enough. Why? Just because the way the RRD tool is packaged on FreeBSD is to put the rrd.h file in /usr/local/include. Which means that we should handle this in our script with something like:

if [ $ngx_found = no ]; then
    # FreeBSD port
    ngx_feature="RRD library in /usr/local/"
    ngx_feature_path="/usr/local/include"

    if [ $NGX_RPATH = YES ]; then
        ngx_feature_libs="-R/usr/local/lib -L/usr/local/lib -lrrd_th"
    else
        ngx_feature_libs="-L/usr/local/lib -lrrd_th"
    fi
    . auto/feature
fi

That works. But if you have a lot of friends, you have to do the same for NetBSD, MacOS, etc. A guy who probably had a lot of friends actually wrote a script to do that "automatically" for you. His name is Marcus Clyne and his tool is called the Nginx Auto Lib module (although this is not a module in the traditional sense of the term). I strongly recommend you RTFM (or rather you RTF README_AUTO_LIB): it is very good and it even documents all the ngx_feature_* variables supported by auto/config and by ngx_auto_lib_core.

I was honestly about to move to this method. The only problem is that it does not work with my 64-bit Fedora. It checks too many things. In particular, it checks the path of the library that was actually used to build the test program and if it's not the one it expects, it fails. So, in our case, the program compiles (and runs) because gcc looks for libraries in /usr/lib and /usr/lib64 (that's where it is found). ngx_auto_lib issues the following compilation command (reformatted for the pleasure of the eye):

gcc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -I /usr/include \
    -o objs/autotest objs/autotest.c -L/usr/lib -lrrd_th \
    -Wl,--rpath -Wl,/usr/lib ;
for l in rrd_th; do 
    o=\`ldd objs/autotest | grep /usr/lib/lib\$l\\.so\`;
    if [ ! \"\$o\" ]; then
        chmod -x $NGX_AUTOTEST;
        echo Linker does not link to correct version
    else
        chmod +x $NGX_AUTOTEST;
    fi
done

And if you do just the ldd, you'll realise that the RRD library is included as /usr/lib64/librrd_th.so.4 (on my system):

librrd_th.so.4 => /usr/lib64/librrd_th.so.4 (0x0000003a95200000)
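
To see why the check trips here, this is a minimal C sketch of the same kind of test that the grep in the generated script performs. The function name ldd_line_matches is hypothetical (it is not part of ngx_auto_lib), and I simplified the regular expression into a plain substring search:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if a line of ldd output refers to
 * <libdir>/lib<name>.so, 0 otherwise. This mimics the
 * "grep /usr/lib/lib$l\.so" test with a plain substring search.
 */
static int
ldd_line_matches(const char *line, const char *libdir, const char *name)
{
    char  pattern[256];
    int   n;

    assert(line != NULL && libdir != NULL && name != NULL);

    n = snprintf(pattern, sizeof(pattern), "%s/lib%s.so", libdir, name);
    if (n < 0 || (size_t) n >= sizeof(pattern)) {
        return 0;   /* pattern did not fit: treat it as "no match" */
    }

    return strstr(line, pattern) != NULL;
}
```

Fed with the ldd line from my system, the helper says "no" for /usr/lib and "yes" for /usr/lib64, which is exactly the mismatch that makes the script print "Linker does not link to correct version".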

Till Marcus fixes the problem, I won't make friends that are on the "exotic" platforms.

Day 42 - Setting up ngx_openresty : the missing directives

I told you, I'm trying to set up a test environment for Test::Nginx and to do so I tried to get the tests from all the modules in ngx_openresty to pass. In the previous episode we managed to get a lot through using the magic TEST_NGINX_IGNORE_MISSING_DIRECTIVES=1. Now, let's get rid of it and try to take care of the missing directives... First one to break is:

[emerg]: unknown directive "eval_subrequest_in_memory" in
.../ngx_openresty-0.8.54.2/t/servroot/conf/nginx.conf:30

So, we need to add the EvalRequestModule http://github.com/agentzh/nginx-eval-module (originally written by Valery, forked by agentzh).

cd bundle
git clone https://github.com/agentzh/nginx-eval-module.git
cd ..
./configure --add-module=../nginx-eval-module/
make

Unfortunately, even with this, testing fails with (only the first failure, there are quite a few others):

#   Failed test 'TEST 1: eval - response_body - response is expected
# @@ -1,2 +1 @@
# -!!! [BEFORE
# +!!! [hi]
# -hi]
# '
#   at /[...]/lib/Test/Nginx/Socket.pm line 398.

It appears that for the eval module to work correctly, you need to add it in a certain order (i.e. before the modules you use in the eval directives). And by default when you add it manually it's added at the end. The trick is therefore to disable the echo module (used in the eval directive) and to add it again after the eval one:

./configure --without-http_echo_module \
            --add-module=../nginx-eval-module/ \
            --add-module=../echo-nginx-module-0.36rc2
make  # From now on, I won't tell you to run make just after configure

This solves the problem with the tests of the eval module. Next one to break is iconv. Actually if you look at the config file in the ngx_openresty-0.8.54.2 directory, you'll notice the module is 'disabled'. All you have to do is to force it in:

./configure --without-http_echo_module \
            --with-http_iconv_module \
            --add-module=../nginx-eval-module/ \
            --add-module=../echo-nginx-module-0.36rc2

The module that caused me the most trouble was actually ngx_postgres. Not due to the module itself but due to the fact that it was my first PostgreSQL install and that their default authentication model is quite different from the ones (Oracle, MySQL, SQLServer) I'm used to (see below). Since you might be in the same situation, here is the recipe for Fedora Core 14 (should work on pretty much everything yum-based):

yum install postgresql postgresql-devel postgresql-server
service postgresql initdb
service postgresql start

Now, for the module tests to work, you need to create a ngx_test database on which the ngx_test user (with same password) has all permissions. With the standard installation of PostgreSQL the only way to log into the database server is to use the postgres user that was created by the package installation. And luckily this user can create other users and databases. So, from root, it looks something like:

su - postgres
psql # Should start the Postgres SQL client
create database ngx_test;
create user ngx_test with password 'ngx_test';
grant all privileges on database ngx_test to ngx_test;

Now, the database and user are created but there is no way you can connect as this user. A psql -U ngx_test -d ngx_test -h 127.0.0.1 will systematically return a:

psql: FATAL:  Ident authentication failed for user "ngx_test"

If you really want to, you can even look at the logs from nginx: it does not work. The problem here is that by default PostgreSQL authorizes only one authentication method: ident. Basically, we have set up a user with a password but we cannot connect with it because the database server does not accept password-based authentication. Luckily, this can be changed in the /var/lib/pgsql/data/pg_hba.conf file by changing the last field of the line for "IPv4 local connections" from ident to password. If you're interested, the explanations on the file format are here: The pg_hba.conf file. And if you're not, here is the mantra to make it work:

sed -i.bkp 's/\(127.0.0.1\/32 \+\)ident/\1password/' \
                 /var/lib/pgsql/data/pg_hba.conf
service postgresql restart

As a side note: at this point, all connections going through TCP/IP from localhost must be authenticated using password and not ident. As the postgres user has no password, you will never be able to connect to the database with something like psql -U postgres -h 127.0.0.1. Luckily, the simpler psql issued by the postgres unix user uses domain sockets and will get you through (just in case you would have to perform some admin tasks on the database).

At this point, things look better but the tests from ngx_postgres complain about:

[emerg]: unknown "remote_passwd" variable

This is actually provided by the ngx_coolkit module. So, from the bundle directory:

git clone https://github.com/FRiCKLE/ngx_coolkit.git
cd ..
./configure --without-http_echo_module \
            --with-http_iconv_module \
            --with-http_postgres_module \
            --add-module=../nginx-eval-module/ \
            --add-module=../echo-nginx-module-0.36rc2 \
            --add-module=../ngx_coolkit

At this point, the only failing tests are the tests from Maxim Dounim (which were not written using Test::Nginx). You can get rid of them with this:

rm -rf bundle/auth-request-nginx-module-0.2/t/ \
       bundle/upstream-keepalive-nginx-module-0.3/t/

And see the following command proudly return the "All tests successful" we have all been looking for.

PATH="$PATH:build/nginx-0.8.54/objs/" TEST_NGINX_NO_SHUFFLE=1 \
      TEST_NGINX_LUA_PACKAGE_CPATH="$(pwd)/bundle/lua-yajl/?.so" \
      prove -r bundle/*/t

Friday, March 25, 2011

Day 41 - Setting up ngx_openresty (WAS: testing Test::Nginx)

A bit of background first. As I told you in yesterday's post, I started working on changing Test::Nginx (as a result of my frustration trying to move the tests I wrote to test my RRD module from my original "let's use python" approach to a more community-friendly "let's work with agentzh's Test::Nginx" approach). But I was scared to break everything (especially with one of the features I want to introduce that changes the way requests and expected responses work). So, I did the smart thing (for once): tried to figure out a way to test what I was about to do. And more specifically to perform regression testing. But to do so, I needed more tests written with Test::Nginx than the ones I wrote myself for my RRD module.

I had read at some point in the mailing list that agentzh packages a bunch of "3rd party nginx modules" into ngx_openresty : Turning Nginx into a Full-fledged Web App Server. So, I figured agentzh would have all his modules there with all the tests. That sounded like a good place to look for tests for Test::Nginx. Actually, he confirmed it would be «the easiest approach though some of the dependencies may be missing». He also said (in a follow-up email): «But we ourselves usually use the util/build.sh script to build separate nginx instance for the current module».

As dependencies scare neither me nor my good old pal yum I went for the first approach that would give me a simple way to perform a lot of tests (the ones from all the modules) in one go.

It all started with:

wget 'http://agentzh.org/misc/nginx/ngx_openresty-0.8.54.2.tar.gz' 
tar xvzf ngx_openresty-0.8.54.2.tar.gz
cd ngx_openresty-0.8.54.2/
./configure
make

Of course, I had to install a few things:

yum install readline-devel
(libdrizzle was originally part of this command: see comments for why this is deleted)

At this point everything pretty much looked fine (which honestly surprised me). So, I went for the ultimate in testing:

LD_LIBRARY_PATH="build/libdrizzle-root/usr/local/openresty/libdrizzle/lib:build/lua-root/usr/local/openresty/lua/lib" \
    PATH="build/nginx-0.8.54/objs/:$PATH" TEST_NGINX_NO_SHUFFLE=1 \
    prove -I ~/nginx/test-nginx/lib/ -r bundle/*/t

Let me explain a little bit what is going on here:

  • PATH="build/nginx-0.8.54/objs/:$PATH" makes sure you are using the nginx we just built (i.e. the openresty one).
  • TEST_NGINX_NO_SHUFFLE=1 is my personal favorite (I hate when I can't predict the order in which tests are run: it makes my head spin).
  • -I ~/nginx/test-nginx/lib/ is to use the Test::Nginx I'm working on (not a CPAN installed one). You might not need this if using the CPAN version installed normally.
  • -r bundle/*/t because it's where all the tests I'm interested in are (and some of them have subdirectories, so I want to recurse through that).
  • LD_LIBRARY_PATH="build/..." tells where to look for the drizzle and lua libraries (which are part of openresty). You might want to look at the comments to understand why I need this. This line being way too long, I won't mention it anymore.

At this point, the thing blew up in my face and I had to "fix" the obvious with: mkdir -p t/servroot and set TEST_NGINX_IGNORE_MISSING_DIRECTIVES to 1 to avoid the problems with the uninstalled modules. So, I started using something like:

PATH="build/nginx-0.8.54/objs/:$PATH" TEST_NGINX_NO_SHUFFLE=1 \
    TEST_NGINX_IGNORE_MISSING_DIRECTIVES=1 \
    prove -I ~/nginx/test-nginx/lib/ -r bundle/*/t

At this point, I started having tests that worked (not all of them, but most of them). I still needed a few more things:

yum install memcached redis
service mysqld start
service memcached start
service redis start

Also make sure you log into your MySQL DB and create the ngx_test user and database (for the tests of the drizzle module to work):

create user 'ngx_test'@localhost identified by 'ngx_test';
create database ngx_test;
grant all on ngx_test.* to 'ngx_test'@localhost;

That's when things got hairy: tests for ngx_lua refused to work. Lua is packaged with openresty but not everything in lua. Namely, you don't have yajl-lua (Yet Another JSON Library for LUA). So, as yajl-lua is not packaged as an rpm on my platform, I went (wait before you do the same):

yum install yajl yajl-devel cmake
cd bundle
git clone https://github.com/brimworks/lua-yajl.git
cd lua-yajl
LUA_DIR=../lua-5.1.4/src cmake .

Unfortunately this will not get you anywhere as it's version 1.0.7 of yajl that is packaged for Fedora Core 14 and you need at least 1.0.9... :( So, I also had to get yajl compiled:

cd .. #now you should be in bundle
git clone git://github.com/lloyd/yajl
cd yajl
cmake .
make
cd ../lua-yajl
LUA_DIR=../lua-5.1*/src CMAKE_PREFIX_PATH=../yajl/yajl-1.0.* cmake .
make

Still, test bundle/ngx_lua-0.1.6rc2/t/005-exit.t (and quite a few others) was failing because LUA could not find the yajl library. The only way I managed to fix this was by adding a new TEST_NGINX_LUA_PACKAGE_CPATH variable and changing the test file itself:

sed -i.bkp s/lua_package_cpath.\*\$/lua_package_cpath\ \'\$TEST_NGINX_LUA_PACKAGE_CPATH\'\;/g bundle/ngx_lua-0.1.6rc2/t/005-exit.t
sed -i.bkp s/lua_package_cpath.\*\$/lua_package_cpath\ \'\$TEST_NGINX_LUA_PACKAGE_CPATH\'\;/g bundle/ngx_lua-0.1.6rc2/t/*/exit.t

Yes, the line gets out of the blog, so get a bigger screen. ;) And yes, this is a lot of backslashes, but I never manage to remember what double-quote escaping does in bash. So, for me it's either single-quote escaping (and no expansion happens but you cannot have single-quotes) or single character escaping (with backslash) and as soon as a character looks weird, I escape it... YMMV, of course.

From this point on, I was running my tests with:

PATH="$PATH:build/nginx-0.8.54/objs/" TEST_NGINX_NO_SHUFFLE=1 \
     TEST_NGINX_LUA_PACKAGE_CPATH="$(pwd)/bundle/lua-yajl/?.so" \
    TEST_NGINX_IGNORE_MISSING_DIRECTIVES=1 \
     prove -I ~/nginx/test-nginx/lib/ -r bundle/*/t/

And to be completely honest, at this point, I suspect I'm running the LUA that comes installed with Fedora Core 14 (not the one from openresty). I'll try to check that at some point.

The good news is that at this point, there are not many tests that fail. The most obvious ones are the tests from auth-request and upstream-keepalive. So, I looked at the perl code and it doesn't look at all like tests for Test::Nginx, although the tests have a use Test::Nginx;. They are not using the data-driven paradigm and they invoke Test::Nginx->new() (and there has never been such a thing as new on Test::Nginx). As both modules were developed by Maxim Dounim, I guess he is using a "different" Test::Nginx. Could be fun to rewrite those tests with agentzh's Test::Nginx just to see if its expressive power is enough for what Maxim wants to test...

So, that will be it for today (that was one hell of a long post) as we are at a point where the only tests that fail use a different Test::Nginx or are for unknown directives. Next step, I'll try to get rid of the TEST_NGINX_IGNORE_MISSING_DIRECTIVES=1 in my prove.

Thursday, March 24, 2011

Day 40 - Test::Nginx new features

I told you about my frustration with Test::Nginx in previous posts (Day 33 and Day 32) and managed to discuss them with agentzh (the module maintainer/author).

Here are the evolutions I offered to contribute:

  1. Be able to have multiple requests in each test. As a replacement for this, I had to use pipelined_requests and matching on the response.
  2. Better control over the stop/start sequence. The idea here is to restart the nginx server only when the config changes between two tests. With a TEST_NGINX_FORCE_RESTART_ON_TEST variable to preserve the current behavior.
  3. Improve the documentation of the possible test sections.

And I started working on the second one because the first one is likely to have a HUGE impact on the code base and I'm scared to break everything. From what I can tell, all the modules developed by agentzh and his friends at Taobao/Alibaba extensively rely on Test::Nginx for testing. This includes the famous Echo module (very useful for debugging configurations) and heavy-lifting modules like the LUA module (embeds a LUA scripting engine in nginx) or the Drizzle module (non-blocking access to your MySQL DB, a must have if you want to scale like nobody else). And I wouldn't want to break their work.

So, I started easy with just adding the following:

  1. Support for environment variable TEST_NGINX_NO_NGINX_MANAGER (defaults to 0) which disables the nginx management code (stop/config/start). Very useful when you want to run tests on an already running NGINX (set TEST_NGINX_NO_NGINX_MANAGER to 1 and TEST_NGINX_CLIENT_PORT to the port your running nginx is listening on). Of course, this could be abused in every conceivable way (for example to test another web server ;)). As far as I'm concerned, the main purpose of this was to be able to run my tests on an nginx that I had started with debug on (plus a few breakpoints).
  2. Support for environment variable TEST_NGINX_FORCE_RESTART_ON_TEST (defaults to 1). If you don't provide any config section in your test (or if it does not change between two successive tests), the nginx server will not be restarted (if TEST_NGINX_FORCE_RESTART_ON_TEST is explicitly set to 0, of course). This way, using TEST_NGINX_FORCE_RESTART_ON_TEST=0 with TEST_NGINX_NO_SHUFFLE=1 and having a config section only on the first test, you can have all tests use the same configuration. I find this very useful to avoid one of the annoying features of Test::Nginx, namely that it removes all logs between two runs. Hopefully, at some point in the future, I'll convince agentzh to have shuffling and force restart turned off by default.
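
For the record, the restart decision described in the second item boils down to something like the sketch below. The real code lives in Test::Nginx and is written in Perl; this is only my C rendering of the rule, the function name sketch_needs_restart is hypothetical, and treating "no previous config" as "nginx is not started yet" is an assumption I made for the sketch:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical decision helper (not the actual Test::Nginx code).
 * force_restart: value of TEST_NGINX_FORCE_RESTART_ON_TEST (0 or 1).
 * prev_config:   config used by the previous test, NULL if none yet.
 * cur_config:    config section of the current test, NULL if absent.
 * Returns 1 when nginx must be (re)started before running the test.
 */
static int
sketch_needs_restart(int force_restart, const char *prev_config,
                     const char *cur_config)
{
    assert(force_restart == 0 || force_restart == 1);

    if (force_restart) {
        return 1;               /* default behaviour: always restart */
    }

    if (prev_config == NULL) {
        return 1;               /* assumption: nothing is running yet */
    }

    if (cur_config == NULL) {
        return 0;               /* no config section: keep the server */
    }

    /* restart only when the configuration really changed */
    return strcmp(prev_config, cur_config) != 0;
}
```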

While I was working on this, I realised the code doing the stop/config/start could use some refactoring/rewriting. So, I think this is one more thing to add to my TODO list regarding Test::Nginx.

For those of you who are interested, my commits are on the devel branch of Test::Nginx. Feedback and testing quite welcome (as usual with FOSS)...

Wednesday, March 23, 2011

Day 39 - GET parameters from your module

First, Google Analytics says that I got a visitor from Islamabad. So, I want to say hello to all our fellow Pakistani nginx lovers. OK, there is only one for now, but I wasn't counting on seeing any visitor from there. It looks like nginx is getting truly international...


My RRD module is working fine now and I'm getting plenty of nice graphs. Unfortunately, they are all pretty boring. Looking something like that:

Most of the time I'm only interested in what happened in the last hour or so and that might not be very easy to read if the values from the last hour are very different from the ones in the last 24 hours. Like in the picture above: the values at around 1:30-2:00 make the latest values completely impossible to read. So, I decided to use a start parameter to be able to easily change the starting point on the time axis of my graph. That's already a parameter supported by the rrdtool graph command and all I needed was to parse it from the query string.


First thing to know is that nginx Core module already parses the request and makes the values available as $arg_PARAMETER (see arg_PARAMETER section of the HttpCoreModule documentation). So, really there is no point in doing the job twice. But there are unfortunately a couple of places to look at before you get the right way of doing this (or at least what I consider so far the best way to do it: I might change my mind at any point in the future):

  • request->variables sounds like a perfectly reasonable place to look at. Well, well, well. You see, the variables in here are not the kind of variables you are looking for. They are less "variable" variables ;) than the ones we are looking for. Basically the variables in this list are all the variables that exist regardless of what the request is. Things like $scheme (also called the protocol: http vs. https) or $is_args (are there arguments in the query string). It does not include the dynamic variables like $arg_PARAMETER or $cookie_COOKIE because before parsing the request, it doesn't even know what the variables are going to be.
  • ngx_http_variable_argument function. Except that it's static and you cannot access it from your module. So, not a good candidate although it does exactly what we want.
  • ngx_http_arg is a good choice. Except that this is really low-level and just does the parsing. You have to do all the allocations required around it and I got sick of doing the memory allocations by myself (and even more of checking the results, but that's a different story). So, I did not go for this.
  • I went for ngx_http_get_variable which does probably too much for what I want to do (the variable can be one of the "dynamic" variables but it can also be one of the more static ones, it will use the right way to extract it). But it offers a nice simple interface easy to remember:
    ngx_http_variable_value_t *
    ngx_http_get_variable(ngx_http_request_t *r, ngx_str_t *name,
                          ngx_uint_t key)
    

So, I was left pondering what the key parameter could be. After a little bit of research I found out it is a hash of the name parameter. The two are made distinct probably for optimisation reasons. This way, the code inside ngx_http_get_variable doesn't have to recompute the hash on every call. As my name is really a constant, I figured out that I would play nice with this optimization and keep both the name and the key as constants:

static ngx_str_t ARG_START = ngx_string("arg_start");
static ngx_uint_t ARG_START_KEY = ngx_hash_key(ARG_START.data,
                                               ARG_START.len);

And that is exactly when the compiler started barking at me with initializer element is not constant. Not a nice guy this compiler, I tell you. Everything is constant from a logical standpoint but it doesn't like the call to a function in the initialization (it cannot optimize it to compile-time). Please note that it does not complain at ngx_string("arg_start") because this is actually a macro expanded by the pre-processor. So, I had to go for something like this:

static ngx_str_t ARG_START = ngx_string("arg_start");
static ngx_uint_t ARG_START_KEY;

static ngx_int_t ngx_http_rrd_init_process(ngx_cycle_t *cycle) {
    ARG_START_KEY = ngx_hash_key(ARG_START.data, ARG_START.len);
[...]
    return NGX_OK;
}
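
If you wonder what the key actually looks like, here is a standalone sketch that builds without the nginx headers. I am assuming that ngx_hash_key multiplies the running key by 31 and adds each byte (check src/core/ngx_hash.c to be sure); the names sketch_uint_t, sketch_hash_key, sketch_init and arg_start_key are mine, not nginx's:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for ngx_uint_t so that the sketch builds without nginx. */
typedef unsigned long sketch_uint_t;

/*
 * Hypothetical re-implementation of the hash: start from 0 and, for
 * each byte, multiply the running key by 31 and add the byte. This is
 * assumed (not guaranteed) to match what ngx_hash_key computes.
 */
static sketch_uint_t
sketch_hash_key(const unsigned char *data, size_t len)
{
    sketch_uint_t  key = 0;
    size_t         i;

    assert(data != NULL);

    for (i = 0; i < len; i++) {
        key = key * 31 + data[i];
    }

    return key;
}

static const unsigned char  ARG_START_NAME[] = "arg_start";
static sketch_uint_t        arg_start_key;   /* filled in at run time */

/* Plays the role of the init_process hook: compute the key only once. */
static void
sketch_init(void)
{
    /* sizeof - 1 drops the trailing '\0' of the string literal */
    arg_start_key = sketch_hash_key(ARG_START_NAME,
                                    sizeof(ARG_START_NAME) - 1);
}
```

The point is the same as in my module: a function call cannot appear in a static initializer in C, so the key is computed once in an init function and then reused on every request.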

And here is the result with an extra ?start=now-5h (after fixing a few stupid bugs, of course... ;)):

Looks good, no?

Tuesday, March 22, 2011

Day 38 - reading code is good for you

I am no Richard Stallman (I shave, he does not ;)), but there is one good thing about open-source: you can read other people's code. You can read it before you write your own and copy/paste. Or you can read it after, fishing for good ideas or better ways to do things. It's kind of like tutorials: you can read them before learning something or after. The benefit you get is not the same: if you read first, you'll get to the point where you want to be faster. If you read after, you might feel like you lost time trying to do things a certain way which wasn't the best one. The point is: there is always benefit to reading source code from others.

So, today I decided to have a look at two things:


mod_rrd_graph. Evan's approach to the integration of rrd and nginx is completely different from mine. In a few words:

  • My module assumes there is a RRD setup on a server (that might change in the future but that's another story) and gives you write and read access to it. With a very basic read access where you cannot specify anything in terms of data selected, colors, etc. My module tries to figure out something that makes sense.
  • Evan's module also assumes there is a RRD setup on a server and provides a completely customizable read access. Basically, you have as much power at your fingertips as you would with the command line: rrdtool graph.

The approaches are very different and when I realized it I thought "well, it's so different, there is not much I'll get from reading his code". And I was wrong. You know why? Because I did not RTFM well enough. Evan is using the rrd_graphv function where I was using rrd_graph. The main difference is that with rrd_graphv rrd won't necessarily write to a temporary file. And that was quite a revelation to me. So, I spent most of my day getting rid of code: the code to implement the rrd_image_temp_path directive, code to create the temporary file, code to retrieve the temporary file information and make it a ngx_buf_t. Quite a lot of code flushed down the toilet. So, I was pretty happy to get rid of all this stuff but of course, I was not happy with myself for not reading the RRD manual more carefully.

nginx-upload-module. All that did not leave me much time to read Valery's work. But enough to realize that this module is very peculiar in the sense that it completely bypasses (or rewrites) the standard nginx code to read a POST entity. My module (like most modules using POST, I guess) waits for nginx to tell it that it is done with reading the body from the client and that the data is available in the request->request_body buffers. Valery's module, after receiving the headers, does not hand control back to the http core module of nginx; instead it hands over to the event module. This is really hardcore, but the direct benefit (which is the whole point of this module) is that this way it can be notified as data packets arrive and take appropriate action (store it in the appropriate file and manage partial uploads and resumes). And I must say I find it pretty awesome that you can do something like that with nginx: completely rewrite a part of the thing. It's a proof that the design is very modular. Now, I suspect this might break some other modules but at some point you have to make a choice (and that's fairly easy since the configuration mechanisms pretty much let you turn on and off modules at the location level).

Monday, March 21, 2011

Day 37 - the end of the last buffer or should it be the other way round ?

One of the mysteries of nginx I still have not figured out (and I think there are a lot of them) is buffers. Those little things called ngx_buf_t. I got the basics: it's a way to point at a zone of memory (or file) that already exists. The main objective (as far as I understand it) is to avoid copying stuff (especially big chunks of memory) around. If you look at the Development of modules for nginx I translated you will realise the important fields in the structure are:

    u_char          *pos;
    off_t            file_pos;
    u_char          *last;
    off_t            file_last;

    u_char          *start;         /* start of buffer */
    u_char          *end;           /* end of buffer */

Most of the time, all you'll do is:

  1. Allocate memory (let's call the pointer to this zone p)
  2. Create a buffer pointing to this zone: pos=start=p and last=end=p+size_needed

This is so common, there is even ngx_create_temp_buf to do the allocation and set up the pointers for you (it leaves last at start and you move it forward as you fill the zone, so once the zone is full last and end are the same). So, most of the time you don't even realize that last and end are not the same and the naming probably doesn't help: there is usually nothing left after the end and nobody after the last one... ;) The thing is: with nginx, after the last comes the end.
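
To make the difference between last and end concrete, here is a toy version of the four memory pointers. It is only a sketch: sketch_buf_t, sketch_create_buf and sketch_buf_append are names I made up, and it uses malloc where nginx would use a pool. The buffer starts empty (last at start) and last only catches up with end once the zone is full:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stripped-down stand-in for ngx_buf_t: only the four memory pointers. */
typedef struct {
    unsigned char  *pos;    /* first byte that was not consumed yet */
    unsigned char  *last;   /* one past the last byte of valid data */
    unsigned char  *start;  /* start of the allocated zone */
    unsigned char  *end;    /* one past the end of the allocated zone */
} sketch_buf_t;

/* Allocate a zone of size bytes (size > 0) and point an empty buffer at it. */
static int
sketch_create_buf(sketch_buf_t *b, size_t size)
{
    assert(b != NULL && size > 0);

    b->start = malloc(size);
    if (b->start == NULL) {
        return -1;
    }

    b->pos = b->start;
    b->last = b->start;         /* nothing written yet */
    b->end = b->start + size;

    return 0;
}

/* Copy at most len bytes into the free room between last and end. */
static size_t
sketch_buf_append(sketch_buf_t *b, const void *data, size_t len)
{
    size_t  room;

    assert(b != NULL && data != NULL);

    room = (size_t) (b->end - b->last);
    if (len > room) {
        len = room;             /* never write past the end */
    }

    memcpy(b->last, data, len);
    b->last += len;

    return len;                 /* number of bytes actually stored */
}
```

The valid data is always the range from pos to last, and the room still available is the range from last to end.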

Now, there is only one situation I saw end != last and that was with very specific buffers: the ones I crafted on Day 28 - POST body buffers... to show you that the body of a request could be split in two buffers. In this very specific situation, the first buffer is actually the one pre-allocated by nginx to read the request method, uri, headers, etc. So, the buffer end is set even before the request starts arriving and its last is determined by how much was read from the network. Hence, you end up with a buffer in which there is room between the last and the end.

Before I let you go with this revelation, I want to tell you about one more thing that surprised me recently regarding buffers. So, I was trying to migrate from my python testing to agentzh's Test::Nginx. As there is no support for multiple requests (in the traditional sense) per test in the current version of Test::Nginx (we're working on it and this might actually be my first contribution to this project, but that's another story) I used pipelined_requests to simulate this. pipelined_requests were intended to test HTTP/1.1 so they send all the requests in pretty much one go. And that caused my rrd module to crash. Why, you ask? Pretty simple: I was assuming the body of the request to end with the last of request_body->bufs->buf. And I was wrong! Here the buffer goes all the way up to the last byte read from the network, which in my test happened to be the end of the second pipelined request. So, my module was basically considering the body of request 1 to be: body of request 1 + request line of request 2 + headers of request 2 + body of request 2. Needless to say, inserting this data in my RRD did not work. Of course, I fixed this by refusing to look beyond r->headers_in.content_length_n (the length of the body as announced in the request header for those of you who are not familiar with the nginx request attributes yet).
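
The fix amounts to clamping what the buffer offers to what the header announced. Here is a standalone sketch of that idea, not the actual code of my module: sketch_body_len is a hypothetical name, it only handles a body sitting in a single buffer, and returning 0 when no length was announced is a choice I made for the sketch:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: how many bytes of the buffer really belong to
 * the body of the current request. pos and last play the role of the
 * ngx_buf_t fields; content_length plays the role of
 * r->headers_in.content_length_n (a negative value is taken here to
 * mean that no length was announced).
 */
static size_t
sketch_body_len(const unsigned char *pos, const unsigned char *last,
                long long content_length)
{
    size_t  available;

    assert(pos != NULL && last != NULL && last >= pos);

    available = (size_t) (last - pos);

    if (content_length < 0) {
        return 0;               /* sketch choice: take nothing */
    }

    if ((unsigned long long) content_length < available) {
        /* the buffer also holds bytes of the next pipelined request */
        return (size_t) content_length;
    }

    return available;           /* the rest of the body is elsewhere */
}
```

With pipelined requests, the bytes between pos + content_length and last belong to the next request and must be left alone.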

One more thing (not related to buffers at all): today I tried to setup my nginx to run some php. I found tons of posts/articles/forums entries saying that PHP had to be recompiled for fastcgi to be enabled on a Fedora 14 (and others as well). I even found people pointing at repositories just for that. And, you know what? The standard install (yum install php) comes with fastcgi installed. Yes, you have to look at the configuration used by the packagers with php -i. Yes, you have to Read The Fucking Manual. But if you do so, all you have to do to run your php as a fastcgi server is: php-cgi -b 127.0.0.1:9000. And no need to say that this works perfectly with the nginx FastCGI module.

Friday, March 18, 2011

Day 36 - nginx debug and valgrind from Eclipse

Today I'm breaking one of the rules of the blog and not telling you exactly what I did but rather telling you about something I have been doing over time, a little bit at a time. But I still think this is valuable to anybody who's really interested in learning the internals of nginx: debugging the code and running it through valgrind.

As I'm a lazy kind of guy, I use my favorite IDE (you know it's Eclipse) and that's what I'm going to tell you about. However, the principles apply if you are using more "standard" tools like gdb or the plain valgrind command. As a matter of fact, Eclipse integration of those tools is pretty minimal and quite often it is merely a preferences dialog box and a log parser.

When you want to put nginx under the microscope, there are two ways: the easy way and the hard way.

The easy way. I guess at some point Igor got sick of having to follow generation after generation of nginx processes to pursue a bug and decided that for development it would be easier to have only one process. Or maybe that is not the reason why, but the fact of the matter is that if you set the following lines in your nginx.conf:
daemon  off;
master_process  off;
then the process that you start by typing nginx will actually be serving HTTP requests (or mail requests if you configured it so, but let's not even get there).
  • The daemon directive tells nginx whether to act as a daemon. When it is on, the first nginx process spawns a copy of itself, detaches it and commits suicide. This is the usual way of daemonizing a process.
  • The master_process directive tells nginx whether to keep a "master process". This master process spawns the worker processes (the ones actually handling the requests) and watches them. If one of them dies, it will receive a signal and start a new one. As is mentioned in the documentation: this should always be "on" in production.
This being said, debugging or valgrinding with this configuration under Eclipse is nothing but pressing the big buttons. nginx acts like the good old "Hello world!" program.

The hard way. Sometimes (thank god, not too often) you are interested in what really happens in the forking/signal-handling process of the real thing. For example, I wanted to figure out if the init_process callback of the ngx_module_t was actually called in the first process, master process or worker process. Of course, you cannot figure this out if both daemon and master_process are set to off. So, you set them back to on and that's when things get dirty...

First of all, don't even think about using valgrind from Eclipse: as the valgrind page on the Linux Tools Project explains, the "Child silent after fork" cannot be unset. So, you are good to go back to running valgrind manually. This is probably no big deal: the leaks in code that is executed only when processes get forked are usually not big enough and don't happen often enough to give anybody any discomfort. Not a big loss, I would say.

Now, debugging is kind of tricky. By default, Eclipse will not follow forked children of a process you are debugging. However, since it uses gdb as its backend and gdb lets you change this behavior you can do it. Unfortunately the process is not that obvious (even after reading how to do it). So, I'll give it my own shot:
  1. Start your debug configuration like usual by pressing the little green bug.
  2. Eclipse stops at the beginning of the main function.
  3. This is usually a good time to tell gdb that you want to follow future children of the current process instead of sticking with the father.
  4. Click on the gdb process in the Debug View.
  5. In the console (which is now the GDB console), type set follow-fork-mode child and hit enter.
  6. You can confirm the setting with show follow-fork-mode
  7. From now on, whenever there is a call to fork or vfork the debugger will follow the child (and not the parent).

Of course, if you are only interested in the processing of an http request you usually don't need to go the hard way. If you still want to, you can attach your debugging session to the right nginx process (this tends to be the one with the highest PID, the other one usually being the master process). And unlike when you are trying to debug an Apache mod you don't have to figure out where your request is going to land as there is only one process handling your requests (of course you could change that in your configuration too, but given nginx resource consumption it's not something you're likely to do any time soon ;) ).

Now, all you have to figure out is when the init_process callback/handler is called.

Have fun.

Thursday, March 17, 2011

Day 35 - memory management, buffers and why you should use ngx_calloc

I don't remember my first C lesson very well, but I'm sure they told me then "watch out and initialize your variables". Most of the time I'm a good boy and do that. Sometimes you are just doing something without knowing exactly what you are doing. And today was one of those days (as Linus would put it, I probably had forgotten to take my medication).

So, let me tell you about it and may this be a lesson you learn, remember and (unlike me), recognize even when it's in disguise.

So, it all started with good news as I got my first RRD graph served by nginx in my browser. I would have made a screenshot (or just saved the file) but there is nothing really interesting there: just one red dot on a so typical RRD graph. It's funny how technology can fail to be impressive...

I was so happy with my nice graph and started putting more data in the RRD (I am trying to stop my bad habit to talk about RRD database: the D in RRD already means database...) to transform this magnificent dot into a red line. So, I put data in there, wait 5 minutes (the minimum step in my RRD), put some more data, etc. Then, I go back to my favorite browser, find the tab with my red dot and hit the Ctrl+R: blank page. I try a couple of times: same result. Now, I'm really pissed off: I go to my location bar, hit the Enter key and...I got the magnificent red line. At this point I felt like Alexander The Great after undoing the Gordian Knot. And just like him, I still did not know how to untie it. And I hate it when I don't understand things.

So I took a deep breath and tried to confirm the scenario that was causing the problem. So, I tried to run the same request twice with curl and wget. With both of them, things were fine. So, my browser was doing something special. A little wiresharking later, I found that on the second request my browser was sending an extra header:
Cache-Control:max-age=0
So, I tried with curl and this extra header, and managed to reproduce the problem: I was on the right track. All I had to do was to start nginx under a debugger and see what was the root of all evil. High-level, here is what happens:
  1. Browser sends its first request.
  2. ngx_http_rrd module gets invoked, creates the graph (in a temporary file), creates a buffer pointing to the temporary file.
  3. nginx sends the content of the temporary file, removes the file and makes all the memory that was used during the processing of this request (it was all allocated from the same pool) available for subsequent processing.
  4. I hit Ctrl+R
  5. Browser sends the request with the extra Cache-Control header
  6. ngx_http_rrd module gets invoked, creates the graph (in a temporary file), creates a buffer pointing to the temporary file but does not initialize it.
  7. nginx tries to send what is indicated in the buffer but this is corrupted and it ends up sending nothing

Now, why is the initialization problem showing only with the extra header? On the first request, memory is clean (i.e. full of \x0). The request fills some of this memory with its content, then the buffer allocation uses another "chunk" that has never been used before (and is therefore full of \x0). So, on the first request everybody is fine. If you replay exactly the same request, you are still fine as you set the same memory bytes as you did on the first run (metaphorically, you are walking in your own steps). The extra header (or actually pretty much anything else) breaks this nice balance and the buffer is allocated slightly higher in memory, ending up in a zone that is not full of \x0, which ends up corrupting the buffer logic and producing the unwanted result.

Now, I told you how stupid of me it was not to initialize correctly the buffer, but nginx API is not really making things easier: it is ngx_calloc_buf you should call, not ngx_alloc_buf. One little 'c' is all the difference there is...
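The effect is easy to reproduce outside of nginx. The sketch below uses a toy bump allocator (toy_pool_t, toy_palloc, toy_pcalloc and toy_file_buf_t are all invented for this illustration and are much simpler than the real ngx_pool_t and ngx_buf_t). It only assumes what the story above says: memory is handed out again after a request without being wiped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_ALIGN 16  /* keep every allocation suitably aligned */

/* Toy pool: one block, handed out from the bottom up, reusable after a reset. */
typedef struct {
    unsigned char *mem;
    size_t size;
    size_t used;
} toy_pool_t;

/* Toy buffer descriptor: the kind of fields nginx would trust blindly. */
typedef struct {
    int    in_file;
    size_t file_pos;
    size_t file_last;
} toy_file_buf_t;

int toy_pool_init(toy_pool_t *p, size_t size)
{
    p->mem = calloc(1, size);   /* fresh memory is clean, like on request 1 */
    p->size = size;
    p->used = 0;
    return p->mem != NULL ? 0 : -1;
}

/* End of a request: the memory becomes available again, but is NOT wiped. */
void toy_pool_reset(toy_pool_t *p) { p->used = 0; }

void toy_pool_destroy(toy_pool_t *p) { free(p->mem); p->mem = NULL; }

/* Like the "alloc" flavour: returns whatever bytes happen to be there. */
void *toy_palloc(toy_pool_t *p, size_t n)
{
    size_t rounded = (n + TOY_ALIGN - 1) & ~(size_t) (TOY_ALIGN - 1);
    void *m;

    if (rounded > p->size - p->used) {
        return NULL;
    }
    m = p->mem + p->used;
    p->used += rounded;
    return m;
}

/* Like the "calloc" flavour: same thing, but zeroed before being returned. */
void *toy_pcalloc(toy_pool_t *p, size_t n)
{
    void *m = toy_palloc(p, n);

    if (m != NULL) {
        memset(m, 0, n);
    }
    return m;
}
```

Fill 64 bytes of the pool with junk, reset it, and a toy_file_buf_t obtained with toy_palloc comes back with a non-zero in_file, while the one obtained with toy_pcalloc is all zeros.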

This actually goes back to one of my pet topics when designing an API: if you cannot avoid publishing a "dangerous" API (or function), make sure the name conveys the "dangerosity" of this. In our example, I would have called the functions ngx_alloc_buf and ngx_alloc_buf_not_initialized, making the person using this function well aware of what he/she is doing. The first example of this kind of design I saw and loved was the _dont_implement_Matcher___instead_extend_BaseMatcher_ method of the org.hamcrest.Matcher class and this was a revelation to me (and not the only one from the guys at org.hamcrest).

Wednesday, March 16, 2011

Day 34 - rrd_graph needs ngx_create_temp_file, right values for ngx_command_t

So, to draw my pictures (the RRD graph) I need to call the rrd_graph function with the arguments you usually give rrd on the command line. So, I need to create a temporary file for this. Luckily enough, nginx seems to have a nice function handy for that:
ngx_int_t
ngx_create_temp_file(ngx_file_t *file, ngx_path_t *path, ngx_pool_t *pool,
    ngx_uint_t persistent, ngx_uint_t clean, ngx_uint_t access)
  • The access argument is simple: it's what you would use as argument to chmod the temporary file.
  • The clean argument tells nginx whether it should clean the file when it does not need it anymore (which is usually at the end of the request). Interestingly enough the cleanup mechanism relies on the pool-cleanup feature and the pool argument. So, the cleanup could happen pretty much at any time you see fit provided you hand over the right pool.
  • persistent seems to be a trick by which a file becomes effectively invisible to everybody else but its creator if this creator calls unlink just after calling open with O_CREAT.
  • ngx_path_t is probably some kind of path but I actually got a headache trying to figure out how to create a ngx_path_t. So, instead of trying to figure it out, I decided to use the ngx_conf_set_path_slot function that lets you configure a path in a directive (the brand new rrd_image_temp_path directive in my case). I love it when everything falls in place...

While I was at it, I decided to review my good old rrd configuration directive. So far, it had been relying on ngx_conf_set_str_slot to handle the string that indicates where the rrd db is. As a result the object stored in the ngx_http_rrd_loc_conf_t is a ngx_str_t. Unfortunately, most of the time, I need this object as a C-style (null terminated) string to call the rrd functions. Therefore, I decided to stop using ngx_conf_set_str_slot and to perform the "conversion" to c-style once and for all at configuration time.
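The "conversion" itself is nothing fancy. Here is a minimal sketch of it in plain C: toy_str_t mirrors the len/data layout of ngx_str_t, and toy_str_to_cstr is a hypothetical helper that uses malloc where a real module would rather allocate from the configuration pool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same layout idea as ngx_str_t: a length and a pointer, no terminator. */
typedef struct {
    size_t         len;
    unsigned char *data;
} toy_str_t;

/*
 * Copy a counted string into a freshly allocated '\0'-terminated one.
 * Returns NULL if the allocation fails; the caller owns the result.
 */
char *toy_str_to_cstr(const toy_str_t *s)
{
    char *out = malloc(s->len + 1);

    if (out == NULL) {
        return NULL;
    }
    if (s->len > 0) {
        memcpy(out, s->data, s->len);
    }
    out[s->len] = '\0';
    return out;
}
```

Doing this once at configuration time means the request handlers can pass the resulting char * straight to the rrd functions.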

After that I tried to put together the new directive. And, once again, I tried to be smarter than I should have. Once again, it backfired... :( Here is the story...

The new directive had no reason to be limited to the location scope. It could as well be in the server or even the main scope. So, I decided to configure the corresponding ngx_command_t accordingly:
{ ngx_string("rrd_image_temp_path"),
      NGX_HTTP_MAIN_CONF|NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_TAKE1234,
      ngx_conf_set_path_slot,
Note, that pretty much summarizes all I said so far: directive is rrd_image_temp_path; it is allowed at main, server and location scopes; it takes 1 to 4 arguments which are handled by ngx_conf_set_path_slot (what I call the "ngx_path black magic").

And then, I had to figure out what should be the other fields of the structure:
      ngx_uint_t            conf;
      ngx_uint_t            offset;
      void                 *post;
  • post was easy: no post-processing, so use NULL.
  • offset was the offset where the variable would be in my configuration object. So something like: offsetof(ngx_http_rrd_loc_conf_t, rrd_image_temp_path).
  • conf was a bit trickier and got me wondering. As the directive could be up to the main scope, I set it to NGX_HTTP_MAIN_CONF_OFFSET. And as Yoda would put it "That is why you fail"...

But before realising I had failed, I wrote my nice little configuration merging function:
static char *
ngx_http_rrd_merge_loc_conf(ngx_conf_t *cf, void *parent, void *child) {
    ngx_http_rrd_loc_conf_t* plcf = parent;
    ngx_http_rrd_loc_conf_t* clcf = child;

    if (ngx_conf_merge_path_value(cf, &clcf->rrd_image_temp_path,
                              plcf->rrd_image_temp_path,
                              &ngx_http_rrd_temp_path)
        != NGX_OK)
    {
        return NGX_CONF_ERROR;
    }
    return NGX_CONF_OK;
}
Pretty easy as long as you use the "out-of-the-box" tools of nginx like ngx_conf_merge_path_value.

Except that after getting the thing to compile, nothing was working. My nginx server would not even start. And after quite a bit of research I figured the culprit was the conf. What nginx expects here has absolutely nothing to do with where the directive is allowed (this is handled in one place only: the type field of ngx_command_t). It has to do with where the data is stored: the location configuration structure, the server configuration structure or the main configuration structure. In our example, as we want to be able to set this at the location level in certain situations, the data must be in the location configuration structure. So, the right value is NGX_HTTP_LOC_CONF_OFFSET. So, with conf and offset, nginx is able to figure out where exactly it should put the data it read from the configuration file.
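To picture how conf and offset work together, here is a deliberately simplified model. All the toy_* names are invented for this sketch; the real context holds one array of pointers per level (indexed by module) and the real slot is a ngx_path_t *, not a string. The two-step lookup is the part that matters: conf picks the structure, offset picks the field inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One configuration structure per level (hypothetical contents). */
typedef struct { const char *other_setting; } toy_main_conf_t;
typedef struct { const char *other_setting; } toy_srv_conf_t;
typedef struct { const char *rrd_image_temp_path; } toy_loc_conf_t;

/* The context the parser carries around: one pointer per level. */
typedef struct {
    void *main_conf;
    void *srv_conf;
    void *loc_conf;
} toy_conf_ctx_t;

/* Toy equivalents of NGX_HTTP_MAIN_CONF_OFFSET and friends. */
#define TOY_MAIN_CONF_OFFSET offsetof(toy_conf_ctx_t, main_conf)
#define TOY_SRV_CONF_OFFSET  offsetof(toy_conf_ctx_t, srv_conf)
#define TOY_LOC_CONF_OFFSET  offsetof(toy_conf_ctx_t, loc_conf)

/* The two fields of the command that say where the value must be stored. */
typedef struct {
    size_t conf;    /* which structure: an offset into the context */
    size_t offset;  /* which field: an offset into that structure */
} toy_command_t;

/* Step 1: follow conf to the structure. Step 2: add offset to reach the field. */
const char **toy_locate_slot(toy_conf_ctx_t *ctx, const toy_command_t *cmd)
{
    void *conf = *(void **) ((char *) ctx + cmd->conf);

    return (const char **) ((char *) conf + cmd->offset);
}
```

With conf set to TOY_LOC_CONF_OFFSET the slot is the rrd_image_temp_path field of the location structure. With TOY_MAIN_CONF_OFFSET, the very same offset lands inside a structure that was never meant to hold it, which is the toy version of my server refusing to start.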

That's all folks.

Tuesday, March 15, 2011

Day 33 - Test::Nginx, pipelined_requests, raw_request and other testing tricks

Hello and welcome back to this terrific blog... ;) First and foremost for those who care: snow was not good but weather was awesome. Now, back to the treacherous slopes of Nginx and more specifically one of its minions: Test::Nginx.

Previously on nginx discovery, I told you I was about to move from my python-based testing to agentzh's Test::Nginx. I was half-way through and the other half was not the easiest one, so I figured I would share with you the pitfalls I ran into (just to spare you reading through the perl code like I had to).

URL encoding. Yes, I am a lazy guy and I refuse to commit to memory the hexa code of : (that is if I remember that : should be url encoded, which I tend to forget as there is only one in urls, just after the http). So, when I write a test, I like to keep my arguments simple (e.g. N:12345), not crippled with percent signs (e.g. N%3A12345). Which is not that simple when your test framework is purely declarative. Now, that's where the dynamic aspects of a language like perl come in handy: you can use a perl expression in your data (instead of a regular constant). To be correct, this is actually possible because it is supported by both the language and the testing framework. Here is what it looks like:
=== TEST 4: POST
The main case (when everything submitted is GOOD and the
DB should be updated).
--- config
    location /rrd/taratata {
        rrd /var/rrd/taratata.rrd;
    }
--- more_headers
Content-type: application/x-www-form-urlencoded
--- request eval
use URI::Escape;
"POST /rrd/taratata
value=".uri_escape("N:12345")
--- response_body_like: go round.*Robin
--- error_code: 200
The magic lies in request eval which basically says that the content of request should be passed through (or filtered by, as the FILTERS section of Test::Base would say) the eval function before being handed over to the test runner. This is actually a handy feature that I had to dig out of Test::Base. You can apply any kind of perl function on this. It gives you the power of perl without breaking the "declarative" paradigm.
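For those who wonder what uri_escape actually does to N:12345, here is a rough C equivalent. toy_uri_escape is a hypothetical helper written for this post, and it assumes the usual rule of keeping letters, digits and - . _ ~ as they are and percent-encoding everything else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters that are left untouched (the "unreserved" set). */
static int toy_is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
}

/*
 * Percent-encode in into out (out_size bytes, terminator included).
 * Returns the length of the result, or (size_t) -1 if out is too small.
 */
size_t toy_uri_escape(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char) *in;
        size_t need = toy_is_unreserved(c) ? 1 : 3;

        /* Always keep one byte in reserve for the terminator. */
        if (n + need + 1 > out_size) {
            return (size_t) -1;
        }
        if (need == 1) {
            out[n++] = (char) c;
        } else {
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    if (n + 1 > out_size) {
        return (size_t) -1;
    }
    out[n] = '\0';
    return n;
}
```

Calling it on N:12345 gives back N%3A12345, the crippled version I did not want to type in my tests.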

One test, multiple requests. As I mentioned before, one of my python tests was specifically built to show that once a RRD request was in error, all the following requests would be too. It went like:
  1. POST with a correct value: response is OK
  2. POST with a bad value: server barks out with a message
  3. POST with the same value as in step 1: response is OK
The only way I found to do so is to use the pipelined_requests. Here is the relevant snippet:
--- more_headers
Content-type: application/x-www-form-urlencoded
--- pipelined_requests eval
use URI::Escape;
["POST /rrd/taratata
value=".uri_escape("N:12345"), "POST /rrd/taratata
value=".uri_escape("N:whatever"), "POST /rrd/taratata
value=".uri_escape("N:12345")]
--- response_body_like: Robin.*Problem.*Robin
--- error_code: 200
This is not strictly equivalent to what I was doing for a few reasons:
  1. The requests are all performed on the same TCP connection (from what I can see, pipelined_requests was intended to test HTTP/1.1 cases).
  2. Checking the results is not as easy as it was in python (or as one could hope), so you end up testing only the last error_code and using a pattern Robin.*Problem.*Robin to make sure the responses were OK/KO/OK.
  3. The client sends all the requests in one gulp whereas with python I was waiting for request one to complete before sending request 2 etc. This is not a problem in our case because there is only one nginx process handling all the requests and they are handled sequentially. But this is not as natural as what I used to do with python.

Controlling the buffers. I talked about the buffer problems you have to face/understand when dealing with POST requests and I even made a post about this: Post body buffers. There is a way to finely control what is sent by the test and when. But it requires you to craft the request manually, something most people don't like to do because it requires a good understanding of the HTTP protocol. On the other hand, if you are trying to test nginx buffering special cases, there is a good chance HTTP doesn't scare you:
=== TEST 10: POST show buffers
Request specially crafted to produce 2 buffers in the body received
handler (one with the beginning of the entity read with the headers
and one with the rest of the body).
--- config
    location /rrd/taratata {
        rrd /var/rrd/taratata.rrd;
    }
--- raw_request eval
use URI::Escape;
my $val="value=".uri_escape("N:12345:678").("678"x300);
["POST /rrd/taratata HTTP/1.1\r
Host: localhost\r
Connection: Close\r
Content-Type: application/x-www-form-urlencoded\r
Content-Length:".length($val)."\r\n\r\n",
substr($val, 0, 6),
substr($val, 6, 15),
substr($val, 21)]
--- raw_request_middle_delay
1
--- response_body_like: Problem .*updating
--- error_code: 500
The two things worth noting here are the raw_request and raw_request_middle_delay. The first one provides the "chunks" that constitute the request whereas the second one indicates how long the test should wait between two chunks.

Now, I managed to move all my testing to Test::Nginx. And there are still things I think should be improved. But one of the things I did not like in my previous post (namely the inability to run only one test) is solved by Test::Base (using --- ONLY to have it run only the specified test).

I'll try to chat with agentzh to see if there is any chance we could improve things.

Wednesday, March 2, 2011

Day 32 - Moving to Test::Nginx

I told you yesterday I was going to move to Test::Nginx (a perl module contributed by agentzh) to test my RRD module. So far, I had been testing it with Python and just wanted to move my tests (10 or so) to this testing framework.

First things first, I tried to do a relatively clean install of the module for my distro. The problem is, of course, that there is no Test::Nginx packaged for my Fedora Core 14 but there are a lot of packages for other CPAN modules Test::Nginx needs. So, it all ended up like that (as user root or with sudo, whatever suits you):
yum install perl-CPAN perl-Test-Base perl-Test-LongString
yum install perl-List-MoreUtils
perl -MCPAN -e 'install Test::Nginx'

And then, the nightmare started. I haven't touched perl since 1994. I don't remember very well, but I think that back then perl 5 was brand new and kind of the hot thing (a bit like Ruby nowadays). Just to tell you that I have been out of touch and that might explain some of my difficulties. But not all...

The first thing that took me off-guard is that there are actually 2 modules in Test::Nginx:
  1. Test::Nginx::LWP
  2. Test::Nginx::Socket
Except for the fact that Socket is non-blocking and that LWP is based on LWP, you cannot really tell the differences. So, I didn't even know which one to use. Piotr Sikora pointed me in the right direction with this: description of Test::Nginx in the source file. That is in the git repository but not in the CPAN yet.

I'll try to list the problems I ran into and how I fixed them.

How do you send a body? Test::Nginx::LWP has a request_body parameter but Test::Nginx::Socket does not. Some requests (like PUT) must have one. The trick is to actually put the body in the request_eval parameter. Maybe there is another trick but at least this one seems to work:
=== TEST 1: PUT is not allowed
--- config
    location /rrd/taratata {
        rrd /var/rrd/taratata.rrd;
    }
--- request_eval
"PUT /rrd/taratata

name=daniel"
--- response_body_like: support.*GET.*POST
--- error_code: 405

Headers. Second problem I fought with: how do you set a specific header? In my case I wanted to test that POST requests with anything but application/x-www-form-urlencoded failed with the appropriate message. There, you have to use the more_headers parameter in something like:
=== TEST 3: POST bad content type
--- config
    location /rrd/taratata {
        rrd /var/rrd/taratata.rrd;
    }
--- more_headers
Content-Type: text/plain
--- request
POST /rrd/taratata
--- response_body_like: content type
--- error_code: 405

All this is pretty much lack of documentation of Test::Nginx and I'll get used to it and maybe even give agentzh a hand in documenting the stuff.

Now, a few things that really annoyed me:
  1. I could not find a way to run only a specific test. Right now my file has about 7 tests but I cannot find a way to tell it to test only one of them with prove. I tried redirecting the input in 10 different ways but it looks to me like all the tests in the file are run whatever you do. It was so easy in python (sniff): python test.py TestMethods.test_POST_BIG
  2. I have the same nginx config for all my tests. I would really love to write it only once (as opposed to putting it in every test). I cannot complain as I did not have this functionality in python. Room for improvement on the module.
  3. The nginx server stops and starts each time. Now, this plus the fact that the log file gets reset at each restart + the fact that you cannot select which test to run + the fact that tests are run in random order makes figuring out things a bit time-consuming. Here again, there is room for improvement, I think.

Now, there are still a few things I need to figure out:
  • How do you introduce delay when you want to send data as chunks ?
  • How do you get multiple requests in one test ? You see, I had this problem where running any request after an error resulted in an error. I was not calling rrd_clear_error, stupid me. But the fact of the matter is that I wrote a test for this in python and don't want to let it go in the migration.

So, the migration was not a breeze and I don't know if it was worth it. I guess I'll get used to it and it would really be not "open-source" to do my own testing thing in python when people already invested time and effort in another tool. So, I'll try to help improving this one.

Tuesday, March 1, 2011

Day 31 - ngx_str_t vs. char * requiem

That's quite a title, isn't it? I wonder how the google indexer is going to like this. Anyway, that's not the point. The point is that I already wanted to tell you about this yesterday but got distracted by agentzh's hints telling me to review my development chain...

As already mentioned before here (or at least in my translation of Valery's work: Development of modules for nginx), nginx uses its own version of strings. It can be tracked to a Pascal-like version of the thing because a string is basically a "chunk" of data and a length. So, in C, it gives something like:
typedef struct {
    size_t      len;
    u_char     *data;
} ngx_str_t;

On the other side of the ring, the current champion: the good old C-style '\x0' terminated chunk of data:
char *;

Now, why would Igor go through so much pain to reinvent the concept of string? I personally see two possible reasons:
  1. Save memory
  2. Save cpu cycles

God save the memory. I see eyebrows rising: is he out of his mind ? A pointer plus a size_t is always going to be bigger than just a pointer. Yes, but one bull is heavier than two frogs, isn't it ? Let me spell it out for you... Let's say you are parsing a file (it works with data coming out of a socket, too). If you are a bit into optimization you actually loaded the file in memory with a mmap or something similar. Now, if you are using C-style strings, each time you want to store a configuration directive (or a header and its value) you must make a copy of the original data and add a '\x0' at the end of it. At this point, you end up with a process that is storing two versions of the same string in memory: one from the file and one with the trailing '\x0'. On the other side, with ngx_str_t you can point to the same memory area that you used to read the file and "limit" the size with the length parameter of your structure. It all comes down to how you are planning to use it. And when you are building a web server, being able to reduce the number of copies of the processed data is definitely a good idea.
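Here is a minimal sketch of the "point into the original data" idea. toy_str_t copies the len/data layout of ngx_str_t and toy_split_header is a hypothetical helper of mine, not something nginx provides: it splits a header line into a name and a value without copying a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counted string: a view on bytes that live somewhere else. */
typedef struct {
    size_t               len;
    const unsigned char *data;
} toy_str_t;

/*
 * Split "Name: value" in place. Both results point inside line, so the only
 * memory used is the two small structures. Returns 0 on success, -1 if there
 * is no colon in the line.
 */
int toy_split_header(const char *line, size_t line_len,
                     toy_str_t *name, toy_str_t *value)
{
    const char *colon = memchr(line, ':', line_len);
    const char *end = line + line_len;
    const char *v;

    if (colon == NULL) {
        return -1;
    }

    name->data = (const unsigned char *) line;
    name->len = (size_t) (colon - line);

    /* Skip the spaces that follow the colon. */
    for (v = colon + 1; v < end && *v == ' '; v++) {
        /* nothing */
    }

    value->data = (const unsigned char *) v;
    value->len = (size_t) (end - v);
    return 0;
}
```

With C-style strings the same split would need two allocations and two copies (or would have to overwrite the colon with a '\x0' in the original data).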

God save the cpu cycles. I don't know about you, but I can tell you that I have seen a lot of C crippled with strlen which figures out the size of any given string at least 2 or 3 times without even noticing. And the buffer overflow exploits have made it worse. Once people realise they are using non-safe versions of the functions, they switch to the safe version: I'll grep all my strcmp and replace by strncmp. Don't get me wrong, it's the right thing to do but most of the time, you end up with a bunch of extra calls to strlen as a result... You could argue that this is true of beginners and we all know Igor is no beginner. But he knows that one of the principles of the HTTP protocol is basically to "say" how long data is going to be before sending it. As there is a lot of length manipulation, I think having the length handy all the time when working with a string is definitely a good idea.

On the other hand, of course, whenever you have to interact with "traditional" C code, you have to convert it. But: this is a small price to pay. And this is also the reason why most of the string manipulation functions you know and love have their ngx_* counterpart. This way you don't have to convert before comparing two strings.
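To show what "not having to convert before comparing" looks like, here is a small sketch of a length-aware comparison. toy_str_t and toy_str_eq are my own illustration names (the real thing would be ngx_str_t and the ngx_* string functions), and TOY_STR is a hypothetical macro that builds a counted string from a literal with sizeof, so that no strlen is ever called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t               len;
    const unsigned char *data;
} toy_str_t;

/* Build a counted string from a literal: the length is known at compile time. */
#define TOY_STR(literal) \
    { sizeof(literal) - 1, (const unsigned char *) (literal) }

/* Equality test: different lengths settle it without reading a single byte. */
int toy_str_eq(const toy_str_t *a, const toy_str_t *b)
{
    if (a->len != b->len) {
        return 0;
    }
    return a->len == 0 || memcmp(a->data, b->data, a->len) == 0;
}
```

The method of a request line such as POST /rrd/taratata can then be compared with TOY_STR("POST") directly inside the receive buffer, without terminating or copying anything.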

So, ngx_str_t looks like a winner (at least for the work at hand). But...

One Ring to bring them all and in the darkness light bind them. Where did you expect a geek to find his references talking to other geeks? ;) My personal favorite solution would have been to avoid deciding and bring together the best of the two worlds with an object that could present a c-style interface and a Pascal-like one. I haven't looked recently at the implementation of the string object in C++, but almost 15 years ago, they already had reference counting, length lazy-caching and copy-on-write. But, for some reason (don't ask me why, I have no clue), nginx is C although a lot of the time, with the callbacks and handlers and modules, it feels object-like.