Tuesday, February 1, 2011

Day 10 - nginx and regular expressions - a design failure to me

Today, I worked on the "regex" (regular expressions, of course) chapter and I am definitely disappointed. I was expecting it to have more "meat". As a matter of fact, nginx regexes are built on top of the PCRE library (if you have been reading since the beginning you could have guessed, because my post Day 2 - building nginx... mentions it as a requirement). That nginx uses PCRE is fine with me. What I find disappointing is that nginx wraps an existing library in its own API without adding anything to it. I can think of several explanations for how this happened, but none of them satisfies me:
  • Igor thought he could do an awesome job on this subject too but, after spending a few months working on a better implementation, he realised it is even more hardcore than developing the fastest web server around. Regexes have been around for a long time. A lot of people have worked on them, and a lot of computer science/mathematical theory hides behind the problem and the implementation. If you don't believe me, just have a look at the Wikipedia article on regular expressions and explain to me what a Kleene star is and how it relates to regexes. So, at some point he realised he was not going to do a better job than PCRE and decided to stick with it. In that case he should have gone one step further and completely removed ngx_regex_t (which now just adds complexity to the code without bringing any value).
  • Igor wanted to add some flexibility and be able to switch from one regex library to another. If that was his original intention, then I consider the API, as designed, to have a problem: it fails to abstract away some of the concepts that are tied to PCRE. More specifically, the ngx_regex_exec function has a captures argument which is clearly a copy of the ovector argument of the pcre_exec family of functions.
  • Igor wanted to "hide" the complexity of the PCRE library (the man page for pcreapi is almost 2000 lines long). That, I can buy. But if that was his original intent, he failed to provide a simple API for ngx_regex_exec: the captures argument should have been "translated" into something like an array of ngx_str_t. Just understanding how this parameter works took me at least half an hour.
Or... I completely missed something, and in a few days/weeks/months/years I will publicly apologize. But, at this point, I feel that the regex API provided by nginx is not bringing any value, and that getting rid of it would make the world a better place...
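
To make the complaint about the captures argument concrete, here is a minimal sketch of the kind of "translation" I have in mind. It is not nginx code: sketch_str_t is a hypothetical stand-in for ngx_str_t (a length plus a pointer, not NUL-terminated), and sketch_captures_to_strs and sketch_str_eq are names made up for illustration. The only assumption is that captures follows the PCRE ovector convention: pair i is stored at captures[2*i] (start offset) and captures[2*i+1] (end offset), pair 0 is the whole match, and a group that did not participate is marked with -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ngx_str_t: length + pointer, no NUL terminator. */
typedef struct {
    size_t      len;
    const char *data;
} sketch_str_t;

/*
 * Translate `n` offset pairs (ovector layout) into `n` strings that
 * point into `subject`. Nothing is copied; `out` must have room for
 * `n` entries. Unset groups (offset -1) become { 0, NULL }.
 */
static void
sketch_captures_to_strs(const char *subject, const int *captures, int n,
                        sketch_str_t *out)
{
    int  i, start, end;

    assert(subject != NULL && captures != NULL && out != NULL);

    for (i = 0; i < n; i++) {
        start = captures[2 * i];
        end = captures[2 * i + 1];

        if (start < 0 || end < start) {
            /* group did not participate in the match */
            out[i].len = 0;
            out[i].data = NULL;
            continue;
        }

        out[i].len = (size_t) (end - start);
        out[i].data = subject + start;
    }
}

/* Compare a translated capture with a NUL-terminated literal. */
static int
sketch_str_eq(const sketch_str_t *s, const char *literal)
{
    size_t  n = strlen(literal);

    if (s->len != n) {
        return 0;
    }

    return n == 0 || memcmp(s->data, literal, n) == 0;
}
```

For the subject "GET /index.html" and offsets { 0, 15, 0, 3, 4, 15 }, entry 0 covers the whole match, entry 1 is "GET" and entry 2 is "/index.html". A caller that gets its captures in this form never has to think about offset pairs again, which is what I would expect from a wrapper that claims to hide PCRE.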
