Regular expressions are a wonderful tool, but like many other powerful tools they tend to be misused. If I had a penny for every time I recommended against using regular expressions for parsing HTML fragments … yeah, you know what I mean.
This time it made me think a bit about my own use of regular expressions. Do I ever fall into the trap of misusing regular expressions myself? Skimming my current code base I couldn’t find any glaring misuses, but to be sure I would need a list of all the regular expressions in my current project.
Regular expressions to the rescue? No, that would probably be quite a horrible adventure. Luckily I have a tool that parses Perl: PPI to the rescue.
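A minimal sketch of how such a listing could be produced with PPI, not necessarily the script I actually ran. The helper name find_regexps is made up for this example; it collects match, substitution and transliteration tokens (PPI::Token::Regexp and its subclasses) as well as qr// tokens (PPI::Token::QuoteLike::Regexp), and assumes the files to scan are passed on the command line.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical helper: returns one [line, source text] pair per regexp token.
sub find_regexps {
    my ($source) = @_;    # a file name, or a reference to a string of Perl code

    my $doc = PPI::Document->new($source)
        or die "Could not parse: " . PPI::Document->errstr . "\n";

    # m//, s///, tr/// are PPI::Token::Regexp; qr// is a QuoteLike token.
    my $tokens = $doc->find(sub {
        my (undef, $element) = @_;
        return $element->isa('PPI::Token::Regexp')
            || $element->isa('PPI::Token::QuoteLike::Regexp') ? 1 : 0;
    });

    # find() returns a false value when nothing matches.
    return map { [ $_->line_number, $_->content ] } @{ $tokens || [] };
}

# Print file:line: regexp for every file named on the command line.
for my $file (@ARGV) {
    printf "%s:%d: %s\n", $file, @$_ for find_regexps($file);
}
```

Because PPI parses the document instead of pattern matching on it, a slash inside a string or a division operator is not mistaken for a regular expression, which is exactly the trap a regex-based search would fall into.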
The result: a few hundred regular expressions. Most of them just match simple substrings (in some cases case-insensitively) or are shorthand for testing a handful of equalities in one go. The only zero-width assertion we use is the word boundary, and quantifiers are mostly applied to simple character classes like \d, \s, [0-9a-f] and a few “anything but these one or two characters” classes ([^>]).
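To make those categories concrete, here is one invented example of each. None of these patterns or inputs come from the actual project, and the ^ and $ anchors in the equality shorthand are my own assumption about how such a test would be written.

```perl
use strict;
use warnings;

# Hypothetical input, made up for illustration only.
my $line = 'Status: OK, 42 items <a href="/cafe">menu</a>';
my $type = 'jpeg';

# Simple substring match, here case insensitive.
my $has_status = $line =~ /status/i ? 1 : 0;

# Shorthand for testing a handful of equalities in one go.
my $is_image = $type =~ /^(?:gif|jpeg|png)$/ ? 1 : 0;

# The word boundary assertion.
my $has_ok = $line =~ /\bOK\b/ ? 1 : 0;

# Quantifiers on simple character classes.
my ($number) = $line =~ /OK,\s+(\d+)/;
my ($hex)    = $line =~ /href="\/([0-9a-f]+)"/;

# "Anything but this character": everything up to the closing bracket.
my ($tag) = $line =~ /<([^>]+)>/;

print "$has_status $is_image $has_ok $number $hex [$tag]\n";
```

Nothing here needs backtracking-heavy constructs or lookarounds; each pattern could be replaced by a few lines of index, lc and eq without much pain.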
Lessons learned: 1) PPI is cool. 2) I really do use regular expressions the way I preach.