Creating a negative match RegExp object

A frequently asked Perl question is how to write a regular expression that matches anything but “something”. The usual answer is that it can’t be done, or that you should use the !~ operator instead of the =~ operator. But when you are using APIs that pass regular expressions around as objects, neither answer is very helpful.

By digging around a bit in the scarier parts of the perlre manual page, I worked out this rather Perl-specific solution: qr/(?:(?:something(*COMMIT)(*FAIL))|.)*/.
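A minimal sketch of the pattern in use. Wherever “something” starts, (*COMMIT) followed by (*FAIL) makes the whole match fail, and the engine does not retry at a later start position. Everywhere else, . consumes one character and the loop continues. Two assumptions are not from the original: the /s flag is added so that . also matches newlines, and accepts() is a hypothetical stand-in for an API that only takes a regexp object. The backtracking control verbs need Perl 5.10 or later.

```perl
use strict;
use warnings;

# Negative-match object: succeeds only if "something" occurs nowhere in the
# string. The /s flag is an addition so that "." also crosses newlines.
my $not_something = qr/(?:(?:something(*COMMIT)(*FAIL))|.)*/s;

# Hypothetical API function that only accepts a regexp object.
sub accepts {
    my ($re, $string) = @_;
    return $string =~ $re ? 1 : 0;
}

for my $s ("", "foo", "some thing", "something", "a something b") {
    printf "%-16s %s\n", "'$s'",
        accepts($not_something, $s) ? "match" : "no match";
}
```

The first three strings should match and the last two should not. Note that “anything but something” is interpreted here as “does not contain something”. The pattern does not merely exclude the exact string.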

I am not sure that it works in all possible corner cases, and I certainly don’t want to write the much-needed comment that would make code using this regexp maintainable.
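One corner case is easy to demonstrate. Without the /s flag, . does not match a newline. The pattern exactly as written therefore stops consuming at the first newline and reports success, even when “something” follows that newline. The sketch below compares that pattern with a variant that adds /s and uses /x to hold the kind of explanatory comment mentioned above. The comment wording is only an illustration.

```perl
use strict;
use warnings;

# The pattern exactly as written in the post (no /s flag).
my $plain = qr/(?:(?:something(*COMMIT)(*FAIL))|.)*/;

# The same idea written with /x so it can carry its own comments.
my $documented = qr/
    (?:
        (?: something      # the forbidden substring starts here...
            (*COMMIT)      # ...so never retry from a later start position
            (*FAIL)        # and fail; backtracking into COMMIT aborts the match
        )
      | .                  # otherwise consume one character (any, thanks to s)
    )*
/xs;

my $tricky = "foo\nsomething";
print "plain:      ", ($tricky =~ $plain      ? "match" : "no match"), "\n";
print "documented: ", ($tricky =~ $documented ? "match" : "no match"), "\n";
```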

In the end it might be possible, but it is just one of those “Now you have two problems” scenarios.

5 Comments »

  1. Philipp Hahn said,

    September 3, 2012 @ 12:22 pm

    In Python (and vim) there are also (negative) zero-width assertions:
    Python/Perl: ^(?!something).
    vim: ^\(something\)\@!.
    They have some limitations of their own, but they make some regular expressions a lot easier to write.
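    A quick Perl check of one such limitation. ^(?!something). only looks at the start of the string. It rejects a string that begins with “something” but still accepts one where “something” appears later. It also needs at least one character, so the empty string does not match. The test strings are arbitrary examples.

```perl
use strict;
use warnings;

# The look-ahead pattern from the comment: "something" must not occur at the
# very start of the string, and at least one character must follow.
my $not_at_start = qr/^(?!something)./;

for my $s ("something else", "xsomething", "other", "") {
    printf "%-16s %s\n", "'$s'", $s =~ $not_at_start ? "match" : "no match";
}
```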

  2. Peter Pentchev said,

    September 3, 2012 @ 12:41 pm

    Hm, if you have looked at the perlre page and you are using the Perl-specific “?:” extension anyway, why not just bite the bullet and use “?!” to do it? qr/(?!something)/ should do the trick…

    Or is “?:” more portable than I’m aware of?

  3. Peter Makholm said,

    September 3, 2012 @ 1:18 pm

    Actually the regexp qr/(?!something)/ matches the string “something”. Try it out:

    $ perl -E '"something" =~ /(?!something)/g and say "pos: " . length($`)'
    pos: 1
    $

    The zero-width negative look-ahead assertion matches at any position that is not followed by the string “something”. In this case it matches right after the “s”, because that position is obviously not followed by “something”.

    So in any case we need a heavier hammer for that nail.

    Replacing the non-capturing parentheses with ordinary capturing parentheses is not important, and most modern regexp engines support the basic look-around assertions. But I don’t think there is widespread support for anything like the verbs Perl provides to control backtracking.

    A solution using only simple look-around would be preferable, because it would be fairly portable apart from small syntax variations. But I don’t think I have seen such a solution.
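    One look-around-only candidate is \A(?:(?!something).)*\z with the s flag. This is offered as a sketch and has not been checked against every engine. At each position the pattern first asserts that “something” does not start there and then consumes one character. The anchors force that check across the entire string. Portability is still limited: \A and \z are not available in every dialect (JavaScript, for example, has neither), so those anchors may need to be swapped for ^ and $ with the appropriate flags.

```perl
use strict;
use warnings;

# Look-around-only negative match: at every position, first assert that
# "something" does not start here, then consume one character. The \A and \z
# anchors make the check cover the entire string; /s lets "." match newlines.
my $lookaround_only = qr/\A(?:(?!something).)*\z/s;

for my $s ("", "foo", "foo\nbar", "something", "xsomething", "foo\nsomething") {
    (my $shown = $s) =~ s/\n/\\n/g;    # print newlines visibly as \n
    printf "%-18s %s\n", "'$shown'",
        $s =~ $lookaround_only ? "match" : "no match";
}
```

Like the (*COMMIT) version, this reads “anything but something” as “does not contain something”.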

  4. Alexander Hartmaier (abraxxa) said,

    September 8, 2012 @ 7:49 am

    Why not !~ /something/ ?

  5. Peter Makholm said,

    September 9, 2012 @ 9:00 am

    Because “!~ /something/” is not an object you can pass around in an API.

    This just shows that regular expressions are a bad primitive for defining APIs. But in reality, many APIs use regular expressions as the standard way to match strings.
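    One way around the limitation is a hypothetical API design, not anything from an existing module, that accepts either a Regexp object or a code reference as the matcher. A negative match then becomes an ordinary closure, sub { $_[0] !~ /something/ }, which can be passed around like any other value without the backtracking-verb trick. The names make_matcher and grep_with are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise a matcher argument into a code reference.
# Regexp objects are wrapped in a closure; code references are used as-is.
sub make_matcher {
    my ($m) = @_;
    return $m if ref $m eq 'CODE';
    return sub { $_[0] =~ $m } if ref $m eq 'Regexp';
    die "matcher must be a Regexp or CODE reference\n";
}

# Hypothetical API function: return the strings accepted by the matcher.
sub grep_with {
    my ($matcher, @strings) = @_;
    my $test = make_matcher($matcher);
    return grep { $test->($_) } @strings;
}

my @input = ("foo", "something", "a something b", "bar");

# Positive match with a plain regexp object.
print join(", ", grep_with(qr/something/, @input)), "\n";

# Negative match expressed as a closure around !~.
print join(", ", grep_with(sub { $_[0] !~ /something/ }, @input)), "\n";
```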
