Archive for Open source (english)

Splicing two sockets in Perl (request for help)

A couple of times I have written code which basically does this:

  1. accept() an incoming connection
  2. do some magic to get another socket
  3. start passing bytes between the two sockets
  4. perform a correct close down sequence

An example use case is a proxy implementing the HTTP CONNECT method which, for some known hostnames, logs a message and mangles the hostname before proceeding. This has been used as a legacy fallback solution while changing a network setup. But my uses are not restricted to HTTP proxies. I have done the same for a few legacy protocols where the magic has been of varying complexity.

The two final steps are quite general and it would be nice to have a module doing just that. Take two sockets and make it easy (or even automatic) to pass bytes between them.

The naïve non-blocking solution would use a scalar string buffer for each direction and perform a select loop while maintaining the write vector depending on which buffers contain data. I have written this code multiple times. In development this is usually quite successful, in production less so. While Perl might be quite suited for the magic in step 2, the naïve way of passing bytes has quite an overhead for the buffer management.
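To make the naïve approach concrete, here is a minimal sketch of such a select loop, using only core modules. The function name relay() and the 64 KB read size are my own choices for illustration, and the sketch assumes both handles are connected stream sockets. It is deliberately naïve: the buffers can grow without bound if one side writes faster than the other side reads.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# Naive relay (hypothetical helper): one scalar buffer per direction,
# driven by a select loop. Returns when both sides have reached EOF
# and all buffered bytes have been written.
sub relay {
    my ($left, $right) = @_;
    local $SIG{PIPE} = 'IGNORE';    # get EPIPE from syswrite instead of a signal
    $_->blocking(0) for $left, $right;

    my %peer = ( fileno($left) => $right, fileno($right) => $left );
    my %buf  = ( fileno($left) => '',     fileno($right) => '' );   # bytes waiting to go TO that handle
    my %src_eof;                    # handles whose source side has reached EOF

    my $readset  = IO::Select->new($left, $right);
    my $writeset = IO::Select->new();

    while ( $readset->count or $writeset->count ) {
        my ($can_read, $can_write) = IO::Select->select($readset, $writeset, undef)
            or die "select failed: $!";

        for my $fh (@$can_read) {
            my $dst = $peer{ fileno($fh) };
            my $id  = fileno($dst);
            my $n   = sysread($fh, my $chunk, 64 * 1024);
            if ( !defined $n ) {
                next if $!{EAGAIN} or $!{EWOULDBLOCK} or $!{EINTR};
                die "sysread failed: $!";
            }
            if ( $n == 0 ) {        # EOF: stop reading, half-close the peer once flushed
                $readset->remove($fh);
                $src_eof{$id} = 1;
                shutdown($dst, 1) if $buf{$id} eq '';
                next;
            }
            $buf{$id} .= $chunk;    # string append; unbounded growth is part of the naivety
            $writeset->add($dst);
        }

        for my $fh (@$can_write) {
            my $id = fileno($fh);
            next if $buf{$id} eq '';
            my $n = syswrite($fh, $buf{$id});
            if ( !defined $n ) {
                next if $!{EAGAIN} or $!{EWOULDBLOCK} or $!{EINTR};
                die "syswrite failed: $!";
            }
            substr($buf{$id}, 0, $n, '');   # drop the written bytes from the front
            if ( $buf{$id} eq '' ) {
                $writeset->remove($fh);
                shutdown($fh, 1) if $src_eof{$id};
            }
        }
    }
    return;
}
```

The substr() call that trims the front of the buffer and the string append on every read are the buffer management overhead mentioned above.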

A less naïve way would use an array of strings for buffers, but I’m not quite sure if this would be a win in all cases. You might be able to get away with fewer string operations on the read side of the buffer, but it might be more expensive on the write side. I have not benchmarked this.
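A rough sketch of what an array-of-strings buffer could look like follows. The package name ChunkBuffer and its methods are hypothetical, made up for this illustration. Appending on the read side is a plain push, while a partial write on the write side still needs a substr() copy of the remaining part of the first chunk.

```perl
use strict;
use warnings;

{
    package ChunkBuffer;    # hypothetical name, not an existing CPAN module

    sub new { my $class = shift; return bless { chunks => [], offset => 0 }, $class }

    # Read side: no string concatenation, just remember the chunk.
    sub append {
        my ($self, $data) = @_;
        push @{ $self->{chunks} }, $data if length $data;
        return;
    }

    sub is_empty { my $self = shift; return !@{ $self->{chunks} } }

    # Write side: the unwritten part of the first chunk, to hand to syswrite().
    sub next_chunk {
        my $self = shift;
        return '' unless @{ $self->{chunks} };
        return substr( $self->{chunks}[0], $self->{offset} );
    }

    # Record that $n bytes of next_chunk() were written ($n must not exceed its length).
    sub consume {
        my ($self, $n) = @_;
        $self->{offset} += $n;
        if ( $self->{offset} >= length $self->{chunks}[0] ) {
            shift @{ $self->{chunks} };
            $self->{offset} = 0;
        }
        return;
    }
}

my $buffer = ChunkBuffer->new;
$buffer->append('abc');
$buffer->append('defg');
$buffer->consume(2);                    # pretend syswrite() only took two bytes
print $buffer->next_chunk, "\n";        # the rest of the first chunk
```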

Most of the time I don’t care about Perl level IO handles. I know that there is a real C level file descriptor beneath. So an even better POSIX compliant solution might be to use XS to have plain C strings and use readv()/writev() with an iovec structure as buffer.

Can we do even better? At least on Linux we can. With the Linux-specific splice() system call it is possible to use a pipe as buffer and never have to copy data to and from user space.

I have not been able to find any off the shelf solution on CPAN. So I think I need to write it myself, but what would the nice and general API be? I guess the basic interface would be something like:

    my $chain = IO::Splice->new($fh1, $fh2);

    $chain->pump(); # read and write from both handles if possible and needed
    $chain->read($fh1);  # read to buffer from one specific handle
    $chain->write($fh2); # write from buffer to one specific handle
    $chain->can_write(); # returns the handles it needs to write to

but it might be simpler to have two callbacks for setting a file handle in write or no-write state:

    my $readset  = IO::Select->new( $fh1, $fh2);
    my $writeset = IO::Select->new();
    my $chain = IO::Splice->new( $fh1, $fh2,
        writable => sub {  $writeset->add( shift ) },
        unwritable => sub { $writeset->remove( shift ) }
    );

    while ( ... select ... ) {
        $chain->pump();
    }

As said, I think I have plenty of implementations of the naïve way, but before releasing some code it would be nice to get some input on the API. The best feedback would be a module that already has a usable API but might not implement the Linux-specific way. That would allow me to steal the interface…


Wrapping mod_perl in Plack

Plack is one of these wonderful adventures in the modern Perl world that makes it fun to write Web applications in Perl again. But I have a few applications written with Apache/mod_perl and they are not fun to work with. So what would you do?

One option would be to take the long road and port these apps to use Plack instead of messing around with Apache2::RequestRec. For this to work you might need to review the full code base before seeing any signs of progress.

Another option is to mock your Apache2::RequestRec object using Plack. This is the road explored by Plack::App::WrapApacheReq. With very little work this enables you to run your mod_perl application with any Plack handler you want. You can run your application as a standalone server or serve it with nginx through FastCGI.

But the fun doesn’t stop here. Debugging and profiling mod_perl have always been a PITA, but with Plack::App::WrapApacheReq it is easy. Just take the generic.psgi example and you can easily run the Perl debugger or use NYTProf on your request handling code.

The initial ideas for writing Plack::App::WrapApacheReq came from my mocking code to test a legacy mod_perl application. Even this gets more fun by using Plack. I haven’t tested it yet, but with Plack::Test and Plack::Test::ExternalServer it should be trivial to run the same set of tests directly in a single process test suite or against your deployed server.

Plack::App::WrapApacheReq is still in its infancy. It only mocks as much of Apache2::RequestRec as I need to run a single legacy application and to run Plack::Handler::Apache2 (as an absurd example, but yes we are self hosting). But I think that with very little work we should be able to run most mod_perl applications. Take it out for a ride, and if it complains about a missing method please report it with CPAN’s Request Tracker. Patches would be appreciated (or pull requests on GitHub), but just a list of unimplemented methods you need would be excellent.

Plack is fun, now working with mod_perl applications might become fun too.


Reading Twitter with generic RSS reader

For a long time I have mainly been reading Twitter with a generic RSS reader (newsbeuter). Recently this stopped working when twitter.com disabled ‘HTTP Basic Auth’ in favor of OAuth. Now I can say a lot of good and bad things about OAuth, but to keep a short story short: It doesn’t work with generic RSS readers.

Instead of getting the RSS feed from a URL, my RSS reader can run a script that prints an RSS feed. So what if I could just write a script that does the OAuth dance to convince twitter.com to give me an RSS feed?

Perl to the rescue! Or rather CPAN to the rescue: Net::Twitter::Lite has a nice simple example for setting up OAuth with a desktop application. It also provides access to most of Twitter’s REST API, but no method for retrieving generic URLs.

Using some Jedi mind tricks and reading the source I found a private and undocumented method _oauth_authenticated_request which does exactly what I need. It is quite simple:

  1. Get my script from gist.github.com/585710 and install the dependencies (Net::Twitter::Lite).
  2. Register your own app at dev.twitter.com to get a Consumer Key and Consumer Secret.
  3. Run the script once to set up access and get an Access Token and Access Token Secret.
  4. Run the script with an RSS feed URL as parameter to get the RSS feeds.

The most interesting feeds to follow are probably http://api.twitter.com/1/statuses/home_timeline.atom, which shows the same as the home page on twitter.com would show you, and http://api.twitter.com/1/statuses/mentions.atom, which shows any tweet that mentions you.


New Module: Benchmark::Serialize

Tim Bunce mentioned my blog post about benchmarking serialization modules in a post on the perl5-porters mailing list. He wished that someone would make that benchmark into a distribution on CPAN.

How can I refuse? So here it is: Benchmark::Serialize has just been uploaded to CPAN. (It might be some time before it appears.)

Besides making the script into a module I also added a list of the size of the serialized data to the output. A replacement of the original script is available in the examples directory.

(I planned on naming the module Benchmark::Serialization, but my fingers slipped. Should I rename it?)


Benchmarking some ORLite variants

It would be nice if I could get my ORLite performance boost without really changing the (undocumented) API. So I got one more idea: Do the slicing in the ORLite generated code. It’s available in a new branch on GitHub.

To benchmark all three solutions I used a variant of CPANDB::Dependency::csv():

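use Benchmark qw(cmpthese);

# Build one tab-separated line per dependency edge and throw it away;
# only the cost of select() and the accessors is of interest here.
sub csv {
    my $class = shift;
    for my $edge ( $class->select ) {
        my $foo = $edge->distribution . "\t" . $edge->dependency . "\n";
    }
}

# Open a transaction in each of the three ORLite variants.
My::Plain->begin;
My::Unsliced->begin;
My::SelfSlice->begin;

# Run each variant for at least 30 CPU seconds and compare the rates.
cmpthese( -30, {
    plain     => sub { csv("My::Plain::Dependency") },
    unsliced  => sub { csv("My::Unsliced::Dependency") },
    selfslice => sub { csv("My::SelfSlice::Dependency") },
});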

Unfortunately it seems like both having DBI doing the slicing and doing it myself costs roughly the same:

            Rate selfslice     plain  unsliced
selfslice 1.61/s        --       -1%      -41%
plain     1.64/s        2%        --      -40%
unsliced  2.71/s       68%       66%        --

So I will probably end up making some sort of ORLite subclass as Adam Kennedy suggested in a comment.


Optimizing ORLite.pm

While profiling some code making heavy use of Adam Kennedy’s ORLite module for accessing a SQLite database I found that most of my time was spent in DBI.pm.

Sorted by inclusive time (i.e. including time spent in subroutines) two non-XS functions stood out: selectall_arrayref and fetchall_arrayref. Looking at the code, both of these functions had a comment stating that a C implementation existed in Driver.xst, and the comment at selectall_arrayref further said that the Perl version is used as a fallback if a slice is given.

So could I get away with the slicing?

It turned out to be quite easy. Just use array refs as objects instead of the usual hash refs.
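As an illustration of the idea (not the code ORLite generates), here is a minimal sketch of the same row class built on a hash ref and on an array ref. The package names Row::AsHash and Row::AsArray and the two columns are made up for the example. With array refs a fetched row can be blessed as it is, so DBI does not need to be asked for a slice.

```perl
use strict;
use warnings;

{
    package Row::AsHash;     # hypothetical: the usual hash ref based object
    sub new          { my ($class, %field) = @_; return bless {%field}, $class }
    sub distribution { $_[0]{distribution} }
    sub dependency   { $_[0]{dependency} }
}

{
    package Row::AsArray;    # hypothetical: array ref based object, columns by position
    use constant { DISTRIBUTION => 0, DEPENDENCY => 1 };
    sub new          { my ($class, @row) = @_; return bless [@row], $class }
    sub distribution { $_[0][DISTRIBUTION] }
    sub dependency   { $_[0][DEPENDENCY] }
}

# Both classes expose the same accessors, so calling code does not change.
my $hash_row  = Row::AsHash->new( distribution => 'Plack', dependency => 'Try::Tiny' );
my $array_row = Row::AsArray->new( 'Plack', 'Try::Tiny' );
print $hash_row->dependency, " ", $array_row->dependency, "\n";
```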

The run time of the select() method provided by ORLite went from 344µs/call to 162µs/call on average. And as my test data makes roughly 100000 select() calls this is a quite noticeable speedup. All in all, my running time (under Devel::NYTProf) improved from 400 seconds to 340 seconds.

Unfortunately I have to be able to update the state of my ORLite generated objects. This was easy while the objects were blessed hash refs. Array refs are not as easy to update. The easy solution was to make the simple non-fk accessors lvalue subroutines.
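A small sketch of what such lvalue accessors can look like on an array ref based object follows. The package name LvalueRow and its columns are hypothetical and this is not the generated ORLite code. The :lvalue attribute lets the accessor be used on the left-hand side of an assignment.

```perl
use strict;
use warnings;

{
    package LvalueRow;       # hypothetical name for illustration
    use constant { DISTRIBUTION => 0, DEPENDENCY => 1 };

    sub new { my $class = shift; return bless [@_], $class }

    # The last expression is the array element itself, so assigning to
    # the method call stores straight into the object.
    sub distribution :lvalue { $_[0][DISTRIBUTION] }
    sub dependency   :lvalue { $_[0][DEPENDENCY] }
}

my $row = LvalueRow->new( 'Plack', 'HTTP::Body' );
$row->dependency = 'Try::Tiny';      # update through the accessor
print $row->distribution, " depends on ", $row->dependency, "\n";
```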


HTTP::Engine is great

For simple web-based services I usually just use Perl and CGI.pm. After having read a bit about HTTP::Engine I tried it for a simple project yesterday. Besides my own logic it only took a few lines of code to have a standalone HTTP server for my service.

My colleague needed it to be served from the same Apache server as the rest of his web application. Some tiny changes and my standalone server was transformed into a plain CGI script. When we deploy the script, I’m guessing we will make some tiny changes and have it running as a mod_perl module.

Try it for your next project! Even if you usually just use CGI.pm.


Private methods in Perl5

It is common knowledge that you can’t have private functions and methods in Perl5. But it turns out that you can. One way is to use namespace::clean. Using this module you can either declare all the names of private functions at the top or use a series of non-obvious “use namespace::clean”, “no namespace::clean” calls.

Wouldn’t it be much nicer just to be able to write:


sub foo :Private {
    ...
}

You can, with my brand new Sub::Private module. It is actually quite simple:


use Attribute::Handlers;

use namespace::clean     qw();
use B::Hooks::EndOfScope qw(on_scope_end);
use Sub::Identify        qw(get_code_info);

sub UNIVERSAL::Private :ATTR(CODE,BEGIN) {
    my ($package, $symbol, $referent, $attr, $data) = @_;

    on_scope_end {
        namespace::clean->clean_subroutines( get_code_info( $referent ) );
    }
}

Putting the attribute handler in the UNIVERSAL namespace isn’t nice. I have to find a solution for that for the next version.


Awesome is awesome!

At work I have a dual-head setup with two screens. Previously it seemed to me that you had to choose:

  • Either you could set the screens up as separate entities in xorg.conf, which means separate workspaces but not being able to move windows from one screen to the other
  • or you set it up as one big virtual screen with your workspace spanning both screens.

I like workspaces and use them a lot on my primary screen, but on my secondary screen I mostly want my browser and my instant messaging (IRC). Some solve this by making the windows on the secondary screen sticky, so they appear on every workspace. That doesn’t work for me: I still want my workspaces on the secondary screen, I just want them to behave separately from the workspaces on the primary screen.

For a long time I thought this impossible, so I used the first option with separate entries in my xorg.conf. I haven’t really missed moving windows from screen to screen, but Firefox is a bother. I want to be able to have Firefox windows on both screens without having to keep two profiles in sync.

Awesome3 to the rescue. I finally upgraded my window manager to awesome3, and after using xrandr to configure my screens, awesome works out of the box as I need. X only provides one display, which means that I can move windows freely around and Firefox can open windows on both screens, but Awesome automatically gives me separate workspaces on the two screens.

Of course I had to make some changes to the standard configuration for awesome: other menus, click to focus (remove the awful.hooks.mouse_enter.register call), and an enforced floating layout with proper window placement (some changes in the awful.hooks.manage.register call). View my configuration on GitHub Gists.


Regular Expressions: Beyond the fundamentals

At this year’s YAPC::EU I’m giving a talk about some of the more advanced features of regular expressions.

Slides are available at http://hacking.dk/talks/yapceu2009/ – There is a pretty regexp validating dates (including February 29th) in it!
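To give a flavour of what such a pattern can look like, here is a sketch of a regexp validating YYYY-MM-DD dates including February 29th. This is not the pattern from the slides. It is an illustration written for this page, and the helper name is_valid_date() is made up. It assumes the Gregorian leap year rule for all four-digit years.

```perl
use strict;
use warnings;

# Leap years: divisible by 4 but not by 100, or divisible by 400.
my $leap_year = qr/(?:\d\d(?:0[48]|[2468][048]|[13579][26])|(?:[02468][048]|[13579][26])00)/;

my $valid_date = qr/
    \A
    (?:
        \d{4} - (?:0[13578]|1[02]) - (?:0[1-9]|[12]\d|3[01])   # months with 31 days
      | \d{4} - (?:0[469]|11)      - (?:0[1-9]|[12]\d|30)      # months with 30 days
      | \d{4} - 02                 - (?:0[1-9]|1\d|2[0-8])     # February 1st to 28th
      | $leap_year - 02 - 29                                   # February 29th in leap years
    )
    \z
/x;

# Hypothetical helper: true if the string is a valid YYYY-MM-DD date.
sub is_valid_date { return $_[0] =~ $valid_date ? 1 : 0 }

for my $date (qw( 2008-02-29 2009-02-29 2000-02-29 1900-02-29 2009-04-31 )) {
    printf "%s %s\n", $date, is_valid_date($date) ? 'valid' : 'invalid';
}
```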

