Splicing two sockets in Perl (request for help)

A couple of times I have written code which basically does this:

  1. accept() an incoming connection
  2. do some magic to get another socket
  3. start passing bytes between the two sockets
  4. perform a correct shutdown sequence

An example use case is a proxy implementing the HTTP CONNECT method that, for some known hostnames, logs a message and mangles the hostname before proceeding. This has been used as a legacy fallback solution while changing a network setup. My uses are not restricted to HTTP proxies, though: I have done the same for a few legacy protocols where the magic has been of varying complexity.

The two final steps are quite general and it would be nice to have a module doing just that. Take two sockets and make it easy (or even automatic) to pass bytes between them.

The naïve non-blocking solution would use a scalar string buffer for each direction and run a select loop, maintaining the write set depending on which buffers contain data. I have written this code multiple times. In development this is usually quite successful, in production less so. While Perl might be quite suited for the magic in step 2, the naïve way of passing bytes has quite an overhead for buffer management.
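A minimal sketch of that naïve approach, assuming both handles are stream sockets. The name `pump_naive` is made up for illustration and is not taken from any CPAN module. It ignores half-close, out-of-band data and buffer limits; it simply copies in both directions until both sides have reported EOF and the buffers are drained.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# Hypothetical helper: naive bidirectional copy with one scalar buffer per direction.
sub pump_naive {
    my ($fh_a, $fh_b) = @_;
    local $SIG{PIPE} = 'IGNORE';          # report a closed peer as a write error instead
    $_->blocking(0) for $fh_a, $fh_b;

    # Both hashes are keyed by file descriptor.
    # %peer maps a handle to the handle its data is forwarded to;
    # %out holds the bytes still waiting to be written to a handle.
    my %peer = ( fileno($fh_a) => $fh_b, fileno($fh_b) => $fh_a );
    my %out  = ( fileno($fh_a) => '',    fileno($fh_b) => '' );

    my $readers = IO::Select->new($fh_a, $fh_b);
    my $writers = IO::Select->new();

    while ($readers->count || $writers->count) {
        my ($can_read, $can_write) = IO::Select->select($readers, $writers, undef);

        for my $fh (@{ $can_read || [] }) {
            my $key = fileno($peer{ fileno($fh) });
            # Append to the scalar buffer of the opposite handle.
            my $n = sysread($fh, $out{$key}, 65536, length $out{$key});
            if (!defined $n) {
                next if $!{EAGAIN} || $!{EWOULDBLOCK} || $!{EINTR};
                die "read failed: $!";
            }
            $readers->remove($fh) if $n == 0;     # EOF on this side
            $writers->add($peer{ fileno($fh) }) if length $out{$key};
        }

        for my $fh (@{ $can_write || [] }) {
            my $key = fileno($fh);
            my $n = syswrite($fh, $out{$key});
            if (!defined $n) {
                next if $!{EAGAIN} || $!{EWOULDBLOCK} || $!{EINTR};
                die "write failed: $!";
            }
            # This is the buffer management overhead: after a partial write
            # the remainder of the string has to be shuffled down.
            substr($out{$key}, 0, $n, '');
            $writers->remove($fh) unless length $out{$key};
        }
    }
    return;
}
```

Note that the write set only contains handles with pending data; selecting for writability on an idle socket would make the loop spin.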

A less naïve way would use an array of strings for buffers, but I’m not quite sure if this would be a win in all cases. You might be able to get away with fewer string operations on the read side of the buffer, but it might be more expensive on the write side. I have not benchmarked this.
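As a rough illustration of the array-of-strings idea, here is a sketch with two made-up helpers, `buffer_push` and `buffer_flush_one`. The queue is a plain array reference; nothing here has been benchmarked.

```perl
use strict;
use warnings;

# Hypothetical chunk queue: the read side pushes whole chunks,
# the write side consumes from the head of the array.
sub buffer_push {
    my ($queue, $chunk) = @_;
    push @$queue, $chunk if length $chunk;
    return;
}

# Try to write the head chunk to $fh.
# Returns the number of bytes written (0 for an empty queue), or undef on error.
sub buffer_flush_one {
    my ($queue, $fh) = @_;
    return 0 unless @$queue;
    my $n = syswrite($fh, $queue->[0]);
    return unless defined $n;
    if ($n == length $queue->[0]) {
        shift @$queue;                     # whole chunk sent: just drop it
    }
    else {
        substr($queue->[0], 0, $n, '');    # partial write: trim only the head chunk
    }
    return $n;
}
```

The read side never touches existing data, but the write side now issues one syswrite per chunk, which may be where the extra cost shows up.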

Most of the time I don’t care about Perl level IO handles. I know that there is a real C level file descriptor beneath. So an even better POSIX compliant solution might be to use XS to have plain C strings and use readv()/writev() with an iovec structure as buffer.

Can we do even better? At least on Linux we can. With the Linux-specific splice() system call it is possible to use a pipe as buffer and never have to copy data to and from user space.

I have not been able to find any off-the-shelf solution on CPAN, so I think I need to write it myself. But what would a nice and general API be? I guess the basic interface would be something like:

    my $chain = IO::Splice->new($fh1, $fh2);

    $chain->pump(); # read and write from both handles if possible and needed
    $chain->read($fh1);  # read to buffer from one specific handle
    $chain->write($fh2); # write from buffer to one specific handle
    $chain->can_write(); # returns the handles it needs to write to

but it might be simpler to have two callbacks for setting a file handle in write or no-write state:

    my $readset  = IO::Select->new( $fh1, $fh2);
    my $writeset = IO::Select->new();
    my $chain = IO::Splice->new( $fh1, $fh2, 
        writable => sub {  $writeset->add( shift ) },
        unwritable => sub { $writeset->remove( shift ) }
    );

    while ( ... select ... ) {
        $chain->pump();
    }

As said, I have plenty of implementations of the naïve way, but before releasing any code it would be nice to get some input on the API. The best feedback would be a module that already has a usable API but does not implement the Linux-specific way. That would allow me to steal the interface…

4 Comments

  1. Salva said,

    October 18, 2011 @ 9:14 am

    IO::Socket::Forwarder does that proxy thing though it does not use anything like splice, mostly because it also supports SSL sockets that can’t be handled at the kernel level.

    If I had to do it nowadays I would probably do it on top of AnyEvent.

  2. Peter Makholm said,

    October 18, 2011 @ 12:43 pm

    I’ve pushed some code to https://github.com/pmakholm/IO-Splice-perl with a naïve implementation.

    Based on my short time looking at IO::Socket::Forwarder, it looks like an implementation of the simple approach, but with an internal select loop. This means that it requires a process per open connection.

    I’m still a bit undecided about how much I like AnyEvent. Nevertheless, I think that it would be quite easy to use my IO::Splice code inside the AnyEvent loop instead of a select loop.

  3. Paul "LeoNerd" Evans said,

    October 18, 2011 @ 2:54 pm

    I too would be quite interested in having a Perl-level wrapping of splice(2); this would make the simple stream-to-stream copy case a lot nicer in IO::Async. I’ve been pondering making it nicely OS-portable by providing a stream-to-stream copy method on the Loop object itself, which can be provided at the base level as a simple read/writeability test plus sysread/syswrite. OSes can then provide nicer solutions using e.g. splice(2) on Linux. So having this wrapped nicely by Perl is a good start to that.

  4. Steffen Ullrich said,

    October 22, 2011 @ 9:12 pm

    Getting it right is not that easy, even with AnyEvent.

    You have to handle one-sided shutdown (quite common with HTTP: the client shuts down writing after the request but still waits for the response). The sockets can only be fully closed after you have seen shutdown in both directions.

    You might want to handle RST from one side and forward it to the other side (some broken FTP servers do this sometimes).

    You might need to handle out-of-band data (when forwarding telnet or ftp).

    You need to handle the case where writing is slow but reading is fast. You have to stop reading until you can write, or you might run out of memory and crash.

    And there are probably some more cases.
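Two of the points in the last comment, one-sided shutdown and a slow writer, can be sketched together. This is only an illustration under stated assumptions: `pump_careful` and the `HIGH_WATER` limit are made-up names, both handles are assumed to be stream sockets, and RST forwarding and out-of-band data are not handled.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

use constant HIGH_WATER => 64 * 1024;   # assumed per-direction buffer limit

# Hypothetical helper: bidirectional copy with half-close forwarding and backpressure.
sub pump_careful {
    my ($fh_a, $fh_b) = @_;
    local $SIG{PIPE} = 'IGNORE';
    $_->blocking(0) for $fh_a, $fh_b;

    # One record per direction: source, destination, pending bytes, state flags.
    my @dirs = (
        { src => $fh_a, dst => $fh_b, buf => '', eof => 0, done => 0 },
        { src => $fh_b, dst => $fh_a, buf => '', eof => 0, done => 0 },
    );

    while (grep { !$_->{done} } @dirs) {
        my $readers = IO::Select->new();
        my $writers = IO::Select->new();
        for my $d (grep { !$_->{done} } @dirs) {
            # Backpressure: stop asking for input while the buffer is over the limit.
            $readers->add($d->{src}) if !$d->{eof} && length($d->{buf}) < HIGH_WATER;
            $writers->add($d->{dst}) if length $d->{buf};
        }
        my ($can_read, $can_write) = IO::Select->select($readers, $writers, undef);
        my %readable = map { fileno($_) => 1 } @{ $can_read  || [] };
        my %writable = map { fileno($_) => 1 } @{ $can_write || [] };

        for my $d (grep { !$_->{done} } @dirs) {
            if ($readable{ fileno($d->{src}) }) {
                my $n = sysread($d->{src}, $d->{buf}, 16384, length $d->{buf});
                if (defined $n) {
                    $d->{eof} = 1 if $n == 0;
                }
                elsif (!($!{EAGAIN} || $!{EWOULDBLOCK} || $!{EINTR})) {
                    die "read failed: $!";
                }
            }
            if ($writable{ fileno($d->{dst}) } && length $d->{buf}) {
                my $n = syswrite($d->{dst}, $d->{buf});
                if (defined $n) {
                    substr($d->{buf}, 0, $n, '');
                }
                elsif (!($!{EAGAIN} || $!{EWOULDBLOCK} || $!{EINTR})) {
                    die "write failed: $!";
                }
            }
            if ($d->{eof} && !length $d->{buf}) {
                # Source is finished and everything is flushed:
                # forward the half-close, but keep the other direction alive.
                shutdown($d->{dst}, 1);
                $d->{done} = 1;
            }
        }
    }

    # Only now, with both directions shut down, are the sockets fully closed.
    close $_ for $fh_a, $fh_b;
    return;
}
```

A direction that has seen EOF keeps flushing its buffer before the shutdown is forwarded, so a client that half-closes after sending its request still receives the full response.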
