Regex Limits, or, Should You Read Mastering Regular Expressions?

By Xah Lee.

Should you read O'Reilly's Mastering Regular Expressions? Every tech geeker will probably tell you that you should run to read it. It's an excellent book, and regex is a widely used tool, but there are hundreds of languages and tools out there today. A busy programmer only has time to read a few books a year. Should it be this book?

David wrote:

Go read O'Reilly's Mastering Regular Expressions by Jeffrey Friedl. … good price, and explained a great deal.

I read the first edition in 1999. [see Perl Book Reviews]

The latest is the 3rd edition, from 2006. It dropped coverage of emacs regex.

In general, i don't recommend the book if all you need is to master regex for practical coding. I recommend the book highly if regex research is part of your job, for example, if you need to implement regex, or understand its history, theory, and available implementations.

The book gives an intro to the history of regex and a bit of its original theory, but the large part is a practical intro to regex engines, as in unix grep, Perl, PHP, Java, “.NET”.

Regex is useful for matching simple words or phrases. When your need for text pattern matching is slightly more complex than phrases, such as parsing snippets of computer language source code, it quickly goes beyond what regex is capable of. This happens, for example, if your language contains nesting, as in lisp, HTML, or XML, or if you frequently need to match a chunk of text that spans multiple lines, or if you need to CORRECTLY match a pattern with many variations, such as an email address.
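To make the nesting problem concrete, here is a minimal Python sketch. The pattern `ONE_LEVEL` and the helper `is_balanced` are made-up names for illustration, not from any library. The regex is written for exactly one level of parentheses and fails as soon as a second level appears, while a plain depth counter handles any depth.

```python
import re

# Matches one pair of parentheses with no parentheses inside.
ONE_LEVEL = re.compile(r"\([^()]*\)")

def is_balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before its '('.
                return False
    return depth == 0

flat = "(a b)"
nested = "(a (b c))"

print(ONE_LEVEL.fullmatch(flat) is not None)    # True
print(ONE_LEVEL.fullmatch(nested) is not None)  # False
print(is_balanced(nested))                      # True
```

You can extend the regex to 2 levels, then 3, but each level must be spelled out by hand. The counter does not care how deep the nesting goes.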

I've also come across an article that heavily criticizes the book and shows another regex engine that's much faster. [Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, etc) By Russ Cox. At http://swtch.com/~rsc/regexp/regexp1.html , accessed on 2012-08-30 ] (i haven't verified it or read it in depth.) Quote:

Finally, any discussion of regular expressions would be incomplete without mentioning Jeffrey Friedl's book Mastering Regular Expressions, perhaps the most popular reference among today's programmers. Friedl's book teaches programmers how best to use today's regular expression implementations, but not how best to implement them. What little text it devotes to implementation issues perpetuates the widespread belief that recursive backtracking is the only way to simulate an NFA. Friedl makes it clear that he neither understands nor respects the underlying theory.
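For a rough idea of what matching without backtracking can look like, here is a minimal Python sketch. It is not code from the article or from any regex engine. The pattern format (a list of `(char, optional)` pairs) and the function name `match_states` are assumptions made up for this illustration. Instead of trying one path and backing up on failure, it keeps the set of all pattern positions that are possible so far and advances them together, one input character at a time.

```python
def match_states(items, text):
    """Match text against items, a list of (char, optional) pairs.

    Hypothetical mini pattern format: ('a', True) means "a?" and
    ('a', False) means "a". Returns True if the whole text matches.
    """
    def closure(states):
        # From any position, an optional item may be skipped.
        result = set()
        stack = list(states)
        while stack:
            i = stack.pop()
            if i in result:
                continue
            result.add(i)
            if i < len(items) and items[i][1]:
                stack.append(i + 1)
        return result

    states = closure({0})
    for ch in text:
        # Advance every live position that accepts this character.
        advanced = {i + 1 for i in states
                    if i < len(items) and items[i][0] == ch}
        states = closure(advanced)
        if not states:
            return False
    # Success means some path consumed the whole pattern.
    return len(items) in states

n = 25
# Same shape as the regex a?a?...a? aa...a, with n of each.
pattern = [("a", True)] * n + [("a", False)] * n
print(match_states(pattern, "a" * n))  # True
```

Each input character is looked at once, and the set never holds more positions than the pattern has, so the work grows with pattern length times text length. A backtracking matcher on the same pattern shape may have to try a very large number of skip-or-take combinations before it finds the one that works.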

Also, today there are lots of new techniques and tools for matching text patterns. One i recommend is Parsing Expression Grammar. There are 2 emacs draft implementations (on emacswiki.org), but both are hard to use and lack documentation. (The “regular expression” we know today, since unix grep of the 1970s, is derived by happenstance from decades-old theory on parsing, based on the then so-called theories of so-called automata.)
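To give a feel for the Parsing Expression Grammar style, here is a minimal hand-written Python sketch. It is not one of the emacs implementations and does not use any PEG library. The function names and the tiny grammar are assumptions chosen for illustration. The grammar is `sexp <- atom / '(' sexp* ')'`. The alternatives are tried in order, and the rule refers to itself, which is what lets it follow nesting.

```python
import re

# An atom is a run of characters that are not whitespace or parentheses.
ATOM = re.compile(r"[^\s()]+")

def skip_ws(text, pos):
    """Return the index of the first non-whitespace char at or after pos."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos

def parse_sexp(text, pos=0):
    """sexp <- atom / '(' sexp* ')'

    Returns (value, next_pos) on success, or None on failure.
    """
    pos = skip_ws(text, pos)
    # First alternative: an atom.
    m = ATOM.match(text, pos)
    if m:
        return m.group(), m.end()
    # Second alternative: a parenthesized list of sexp.
    if pos < len(text) and text[pos] == "(":
        pos += 1
        items = []
        while True:
            result = parse_sexp(text, pos)
            if result is None:
                break
            value, pos = result
            items.append(value)
        pos = skip_ws(text, pos)
        if pos < len(text) and text[pos] == ")":
            return items, pos + 1
    return None

print(parse_sexp("(a (b c) d)"))  # (['a', ['b', 'c'], 'd'], 11)
print(parse_sexp("(a (b c"))      # None
```

The result is a nested list that mirrors the nesting of the input, which is something a single regex match does not give you.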

If you need to use regex in emacs frequently, i just recommend reading the emacs info page on its regex in detail.

Similarly, if you need to use regex well in Perl, Python, PHP, i recommend their documentation. I have rewritten the Python one here: Python: Regex Reference.

If you wish to know some basic history and theory out of curiosity, i recommend Wikipedia: Regular expression.

For some discussion on the limits of regex, see: Pattern Matching vs Grammar Specification.