Reviewing my own regular expressions

Regular expressions are a wonderful tool, but like many other powerful tools they tend to be misused. If I had a penny for every time I recommended against using regular expressions for parsing HTML fragments … yeah, you know what I mean.

This time it made me think a bit about my own use of regular expressions. Do I ever fall into the trap of misusing regular expressions myself? Looking at my current code base I couldn’t find any glaring misuses. If only I could get a list of all the regular expressions in my current project.

Regular expressions to the rescue? No, that would probably be quite a horrible adventure. Luckily I have tools to parse Perl: PPI to the rescue.
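A minimal sketch of such a script, not necessarily the one I actually ran. It assumes File::Next for walking the directory tree and PPI's find method for picking out the regexp tokens; the helper name regexps_in is made up for illustration. Note that PPI::Token::Regexp also covers tr///, which is not really a regular expression.

```perl
use strict;
use warnings;
use File::Next;
use PPI;

# Hypothetical helper: return every regexp token PPI finds in a source.
# $source is a file name or a reference to a string of Perl code.
sub regexps_in {
    my ($source) = @_;
    my $doc = PPI::Document->new($source) or return;
    my $found = $doc->find( sub {
        my ( $top, $element ) = @_;
        # m//, s/// and tr/// all inherit from PPI::Token::Regexp;
        # qr// lives in a separate class.
        return $element->isa('PPI::Token::Regexp')
            || $element->isa('PPI::Token::QuoteLike::Regexp');
    } );
    return $found ? @$found : ();
}

# Walk the directories given on the command line and list what we find.
if (@ARGV) {
    my $files = File::Next::files(@ARGV);
    while ( defined( my $file = $files->() ) ) {
        next unless $file =~ /\.(?:pm|pl|t)$/;
        print "$file: ", $_->content, "\n" for regexps_in($file);
    }
}
```

Run it with the project root as its argument; each output line is a file name followed by one regexp exactly as it appears in the source.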

The result: a few hundred regular expressions. Most of them just match simple substrings (in some cases case-insensitively) or are shorthand for testing a handful of equalities in one go. The only zero-width assertion we use is the word boundary, and quantifiers are mostly used on simple character classes like \d, \s, [0-9a-f] and a few “anything but these one or two characters” classes ([^>]).

Lessons learned: 1) PPI is cool. 2) I really do use regular expressions as I preach.

2 Comments »

  1. Peter Makholm said,

    April 11, 2013 @ 10:16 am

    Improved script at https://gist.github.com/pmakholm/5362247 – still missing some documentation.

    Remember to search for Token::QuoteLike::Regexp too

  2. Andy Lester said,

    April 15, 2013 @ 12:55 pm

    That looks pretty handy. Thumbs up on code auditing.

    In case your readers aren’t aware of it, File::Next has a file_filter argument that does the filtering at the iterator level, so you don’t have to deal with filtering in the while loop.

    In your case, I would do this as:

    my $files = File::Next::files( { file_filter => sub { /\.(?:p[ml]|t)$/ } }, $basedir );

    and then remove the line that says

    next unless $file =~ /\.(?:pm|pl|t)$/;
