Archive for Perl

Private methods in Perl5

It is common knowledge that you can’t have private functions and methods in Perl5. But it turns out that you can; one way is to use namespace::clean. Using this module you can either declare all the names of private functions at the top or use a series of non-obvious “use namespace::clean” and “no namespace::clean” calls.

Wouldn’t it be much nicer just to be able to write:


sub foo :Private {
    ...
}

You can, with my brand new Sub::Private module. It is actually quite simple:


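use strict;
use warnings;

use Attribute::Handlers;

use namespace::clean     qw();
use B::Hooks::EndOfScope qw(on_scope_end);
use Sub::Identify        qw(get_code_info);

# Handler for the :Private attribute on any sub in any package.
sub UNIVERSAL::Private :ATTR(CODE,BEGIN) {
    my ($package, $symbol, $referent, $attr, $data) = @_;

    # Once the enclosing scope has finished compiling, remove the
    # sub's name from its package. get_code_info() returns the
    # package and name the code reference was compiled under.
    on_scope_end {
        namespace::clean->clean_subroutines( get_code_info( $referent ) );
    };
}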

Putting the attribute handler in the UNIVERSAL namespace isn’t nice. I have to find a solution for that for the next version.


Regular Expressions: Beyond the fundamentals

At this year’s YAPC::EU I’m giving a talk about some of the more advanced features of regular expressions.

Slides are available at http://hacking.dk/talks/yapceu2009/ – There is a pretty regexp validating dates (including February 29th) in it!


Shell hacks: bash functions in files

Over the years I have collected a few shell snippets that have to be sourced or called as shell functions to work correctly. Having all these function definitions in a monolithic file sourced from my .bashrc isn’t cool, so for some time I’ve had the following in my .bashrc:

for i in "$HOME"/.lib.sh/* ; do
    source "$i"
done

This has at least one obvious downside. Each time I change a function definition I have to reload the function in each and every open shell if I want a consistent work environment. Today I came up with a nifty solution. Each file in ~/.lib.sh which previously contained a function definition is edited to only contain the function body. Then I add the following new function called load_functions:

FUNCTIONS=$HOME/.lib.sh

eval "$(
    find "$FUNCTIONS" -type f |
        while read -r file ; do
            echo "function $(basename "$file")() { source \"$file\"; }"
        done
)"

For each file under ~/.lib.sh a new function is automatically defined which just sources the real file. This means I can edit the files without having to reload the function in every open shell. ~/.lib.sh/load_files itself is just sourced from my .bashrc.

One of my functions wraps the wonderful Perl5 module local::lib. This module makes it easy to manage multiple sandboxes with locally installed Perl modules. It works by setting a couple of environment variables including $PERL5LIB. Instead of using local::lib directly I used a function called perllibs:

# Defaults:
PERL_LOCAL_LIB="home"
DIR=$HOME/.perl

if [ -n "$1" ]; then
    PERL_LOCAL_LIB=$1
fi

export PERL_LOCAL_LIB

case $PERL_LOCAL_LIB in
    alpha)
        DIR=$HOME/projects/alpha/ ;;
    mercury)
        DIR=$HOME/subversion/mercury/perl ;;
    tmp)
        mkdir -p /tmp/makholm/perl
        DIR=/tmp/makholm/perl ;;
esac

eval $( perl -Mlocal::lib=$DIR )

Adding new projects to this list would be a hassle if I had to reload the file in multiple shells each time.


Why write an editor in Perl?

I’m a happy vim-user and I don’t see that changing any time soon. But recently I have been hacking around with Padre, an editor written in Perl and for Perl programmers. And even though I like my non-GUI editor I think Padre has one advantage: It is quite hackable in Perl – my favourite language for hacking.

One thing I wanted to play with was debugging from my editor. It is probably possible to write it as vim-perl scripts, but having easy access to all the internals without language barriers made the task of writing a debugger plugin much easier and, more importantly, much more fun.

So if you have ever thought about nice features you wanted while editing Perl, then please join the Padre project and keep hacking Perl while improving your tools.


The evil of a global $_

I often write code like this:

    $_->store() for @objects;

and quite often it actually works. But suddenly a piece of code doing this in a loop broke with this error message: Can’t call method “store” without a package or object reference. And quite right, sometimes @objects would contain plain integers.

Unfortunately it wasn’t easy to track down the relevant change, so enter the perl debugger. Declaring a watch on ‘@objects’ isn’t useful as it triggers each time @objects enters or leaves scope. But saving a reference to @objects in $my::objects and then watching $my::objects->[0] worked.

I had reimplemented the store() method using Data::Walk for walking some structure instead of doing it by hand. And Data::Walk sets $_ to the current node. Due to the aliasing implied by the for modifier, each element in my @objects array was overwritten.

Two solutions: The store() method can localize $_ by adding local $_; before using Data::Walk. This works in legacy perl interpreters. The routine looping through @objects can make $_ a lexical variable by adding my $_; before the loop. This is a new feature in Perl 5.10, but it was marked experimental in 5.18 and removed in 5.24, so on a newer perl use a named loop variable (for my $object (@objects) { ... }) instead.

An even more robust solution would be to have Data::Walk localize $_ itself AND use a lexical $_ in code where $_ is aliased to important data. See RT #47309.
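The aliasing problem is easy to reproduce without Data::Walk. The following is only a minimal sketch: careless_store() and careful_store() are hypothetical stand-ins for a store() method that assigns to the global $_ (as the tree walker did), without and with local $_. The third loop uses a named loop variable, which sidesteps the aliasing without relying on my $_.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a routine that assigns to the global $_
# without localizing it first.
sub careless_store { $_ = 42; return }

# The same routine, but protecting the caller's $_ before touching it.
sub careful_store { local $_; $_ = 42; return }

my @clobbered = qw(a b c);
careless_store() for @clobbered;    # $_ aliases each element in turn

my @localized = qw(a b c);
careful_store() for @localized;     # the callee's local $_ shields the alias

my @named = qw(a b c);
for my $object (@named) {           # $_ is never aliased to the elements
    careless_store();
}

print "clobbered: @clobbered\n";    # 42 42 42
print "localized: @localized\n";    # a b c
print "named:     @named\n";        # a b c
```

Only the first array is damaged, which matches the symptom above: the loop itself looks innocent, and the culprit is whatever the called method does to $_.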


New version of the Padre debugger plugin

I’ve just uploaded a new version of Padre-Plugin-Debugger to CPAN. It has been stuck in github for a while as I meant to write some documentation and start calling it version 1.0. But as I haven’t written any documentation yet, it is still just a puny 0.3 version.

The major update is how the interpreter is called. Now it will actually find the modules you are using, if you add the correct directories to ‘Edit -> Preferences -> Run Parameters’


Benchmarking serialization modules

First you have to optimize for correctness, then you can optimize for speed. At the moment I’m working on a project where I do a lot of serialization, and to ensure that I could debug the correctness I have chosen an easily readable serialization format: YAML.

But running Devel::NYTProf on my code showed that an awful lot of time was spent in the YAML code. Changing a few lines to use Storable instead showed an improvement in speed going from 1000s to 200s for my test data. This is a significant win even while developing, but of course I shouldn’t have used the old YAML-implementation to begin with.

Mentioning this to the local Perl Mongers group I got referred to a benchmark of different serialization modules by Christian Hansen (I think). I’ve updated the script and the interesting part is that JSON::XS outruns Storable on both tests.

Thank you Devel::NYTProf for showing me that 80% of my run time was spent in old pure perl serialization code.


More fun with DESTROY

When I wrote about exception handling in Perl I mentioned briefly that DESTROY() methods should localize $? too.

Try this script:

#!/usr/bin/perl

package Foo;

sub new {
    my $class = shift;
    open my $fh, "-|", "cat /dev/zero";
    return bless { fh => $fh }, $class;
}

sub DESTROY {
    my $self = shift;
    close $self->{fh}
}

package main;

my $foo = Foo->new();
exit 42;

You would expect it to return with status code 42, but when I run it the status code is 13. Localizing $? in the DESTROY() method gives the expected 42 status code.

The most common usage of $? is explained in the perlvar manual page:

The status returned by the last pipe close, backtick (``) command, successful call to wait() or waitpid(), or from the system() operator.

And the above destructor makes a ‘pipe close’ changing the value of $?. But the perlvar manual also mentions a secondary usage of $?:

Inside an “END” subroutine $? contains the value that is going to be given to “exit()”. You can modify $? in an “END” subroutine to change the exit status of your program.

But this doesn’t just hold for END blocks, it is true for any code run after the exit() invocation – including destructors.

And then to a mystery. Given the above Foo package it isn’t very surprising that this script has the status code 13:
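The fix can be watched in-process as well. This is a sketch under a few assumptions: it reuses the Foo package from above with local $? added to DESTROY, it needs a Unix-like system with cat and /dev/zero, and instead of calling exit it sets $? by hand and destroys the object at the end of a block, so the value can be printed afterwards.

```perl
use strict;
use warnings;

package Foo;

sub new {
    my $class = shift;
    open my $fh, '-|', 'cat /dev/zero' or die "Cannot start cat: $!";
    return bless { fh => $fh }, $class;
}

sub DESTROY {
    my $self = shift;
    local $?;               # the pipe close below overwrites $?
    close $self->{fh};
}

package main;

$? = 42;                    # stands in for a status we still care about
{
    my $foo = Foo->new();
}                           # $foo goes out of scope: DESTROY closes the pipe

my $status_after = $?;
print "status after destruction: $status_after\n";
```

Removing the local $?; line lets the status from the pipe close leak out of the destructor, which is the same effect that turns exit 42 into 13 in the script above.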

use Foo;
my $foo = Foo->new();

But why does the following change make the script have the status code 0?

use Foo;
my $foo = Foo->new();
$? = 42;


When perfection isn’t good enough

One of the many possible problems with using CPAN modules is that they often just implement the parts of the problems the author needed a solution for. But once in a while you run into the opposite problem: The module implements parts of a standard that you just don’t want.

Some time ago I had to parse parts of the WebDAV protocol. WebDAV properties are transmitted using XML with namespaces, which is one thing I think XML::Simple is particularly bad at. So I turned to XML::LibXML (which is becoming my XML module of choice either way).

So, the WebDAV RFC has examples like this:

     <?xml version="1.0" encoding="utf-8" ?>
     <D:propfind xmlns:D="DAV:">

       <D:prop xmlns:R="http://ns.example.com/boxschema/">
         <R:bigbox/>
         <R:author/>
         <R:DingALing/>
         <R:Random/>
       </D:prop>

     </D:propfind>

Unfortunately XML::LibXML insists that namespace URIs conform to the URI specification, which DAV: doesn’t. Due to XML::LibXML’s perfection I’m not able to just use it. Solution:

sub escapeNamespace {
    $_[0] =~ s/(xmlns(?::\w+)?)="(?!urn|http)([^"]+)"/$1="urn:xxx:$2"/g;
    $_[0] =~ s/(xmlns(?::\w+)?)=""/$1="urn:xxx:nonamespace"/g;
}

I’m not quite sure that the second substitution is needed by the standard, but the Litmus webdav test suite needs it…
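To see what the two substitutions do, here is a small usage sketch. The sub is copied from above; the input string is a made-up, shortened variant of the RFC example with an extra xmlns="" added to exercise the second substitution.

```perl
use strict;
use warnings;

# Copied from the post: rewrites namespace URIs in place in its argument.
sub escapeNamespace {
    $_[0] =~ s/(xmlns(?::\w+)?)="(?!urn|http)([^"]+)"/$1="urn:xxx:$2"/g;
    $_[0] =~ s/(xmlns(?::\w+)?)=""/$1="urn:xxx:nonamespace"/g;
}

# Hypothetical input, not taken verbatim from the RFC.
my $xml = join '',
    '<D:propfind xmlns:D="DAV:">',
    '<D:prop xmlns:R="http://ns.example.com/boxschema/" xmlns="">',
    '<R:bigbox/>',
    '</D:prop>',
    '</D:propfind>';

escapeNamespace($xml);
print "$xml\n";
```

After the call, xmlns:D points at urn:xxx:DAV:, the http namespace is left alone, and the empty default namespace becomes urn:xxx:nonamespace. Any later namespace lookup on the parsed document presumably has to use the escaped URI (urn:xxx:DAV:) rather than DAV:.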


Benchmarking is hard

Benchmarking is hard to do right. Recently Jason Switzer found an old benchmark of the smart match operator written by Michael Schwern. Jason Switzer comments:

If I were to publish a paper with such a gaping hole like that, it would never be taken serious. In each test, he’s generating a random number and a random character and storing the result! That’s not part of the test. The $needles should each be generated outside the timing loop. This is adding tons of additional instructions that are not considered part of the test. These results are basically useless and should be redone.

After doing his test, Jason concludes: “Those results are astonishing!”. Yes, it clearly shows that the grep solutions are 25% faster than using first. This is astonishing and surprising — and probably warrants an explanation.

Jason’s methodology consists of generating a single random array and needle for each piece of code he wants to benchmark. Running his benchmark on my machine shows that first is 15 times faster than smart match and about 28 times faster than grep. Of course, my random arrays might have been biased in favour of ‘first’ by randomly placing the needle near the start.

So Jason’s tests are not to be taken seriously either. Even when benchmarking with random data, we have to benchmark all our code pieces against the same data and not just on one data set.

Update: I’ve been running some more benchmarks in the background. It seems that the general pattern is that ‘first’ from List::Util and ‘any’ from List::MoreUtils are about 25% slower than grep, but with some spikes where ‘first’/‘any’ is blazingly fast. Is being a native opcode that much faster than XS-code?
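One way to set that up is sketched below. It is not the script used for the numbers in this post. The data set count and array size are arbitrary assumptions, and only grep and first are compared. The point is that every data set is generated once, outside the timed code, and that both candidates search exactly the same arrays for the same needles.

```perl
use strict;
use warnings;
use Benchmark  qw(cmpthese);
use List::Util qw(first);

# Arbitrary sizes, chosen only for illustration.
my $datasets = 20;
my $size     = 1_000;

# Build all data sets up front so data generation is never timed and
# every candidate sees identical input.
my @cases = map {
    my @haystack = map { int rand 1_000_000 } 1 .. $size;
    +{ haystack => \@haystack, needle => $haystack[ rand @haystack ] };
} 1 .. $datasets;

# Counts the data sets in which grep finds the needle.
sub with_grep {
    my $found = 0;
    for my $case (@cases) {
        my $needle = $case->{needle};
        $found++ if grep { $_ == $needle } @{ $case->{haystack} };
    }
    return $found;
}

# Counts the data sets in which first finds the needle.
sub with_first {
    my $found = 0;
    for my $case (@cases) {
        my $needle = $case->{needle};
        my $hit    = first { $_ == $needle } @{ $case->{haystack} };
        $found++ if defined $hit;
    }
    return $found;
}

# Run each candidate for at least one CPU second and print a comparison.
cmpthese( -1, { grep => \&with_grep, first => \&with_first } );
```

Because the needle positions vary across the data sets, a single lucky array where the needle sits at the front can no longer decide the result on its own.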

