Fahhem's Blog – Cap'n proto alternative RPC API in the works

Cap'n proto alternative RPC API in the works

Having started a new project that requires RPC, but without the benefits (and restrictions) imposed by my previous employer, I've gone on a detour towards building another RPC layer. However, knowing that building the entire layer from scratch would both waste my time and add noise to any community I sent it to, I decided to create only the missing parts and rely on existing technologies wherever I could.

Excuses

Now, I have many reasons for creating my own (read: yet another) RPC layer, but I'm sure you don't want to read them all, so feel free to skip this section.

First, there's the RPC schema declaration syntax itself. Looking around, I saw three major contenders for RPC systems in general: Protocol Buffers (google.protobuf), Thrift, and a newcomer, Cap'n Proto (sometimes capnproto). When I saw that capnproto's 'interface' was a first-class citizen (i.e., it could be specified and passed back and forth across the wire), I was very intrigued. While it makes some big claims about being infinitely faster than protobuf and about capabilities and pipelining making computing travel back in time (intended to be tongue-in-cheek), I saw the real benefit as being the ability to send classes (and their methods) around.

I've written a post before on sending functions around, but that was both solely in Python and highly insecure (pickling code!). In capnproto, you can send an 'interface' around without pickling the actual code; you send only enough information for the receiver to call it back on the sending machine. I saw many uses for this and actually started to think about my project's separate pieces in this manner. While the project doesn't strictly need this feature, it's a natural fit and I'd strongly prefer to have it.

Secondly, there's the actual serialization format. There are many of these around, likely hundreds of serious ones spread across all the languages I've heard of and tens of thousands of ad-hoc formats. My constraints were that it had to already exist (there are seriously too many for me to write my own), it had to be reasonable in speed and space efficiency (sending text-format JSON isn't reasonably space efficient, for instance), it had to support Python at a minimum (C/C++ would be nice, just in case), and it had to support some manner of sending arbitrary objects (for capnp's interfaces). Obviously, there's capnp's own format, but after mucking around with the Python implementation and the RPC code, I was highly put off. Doing a simple RPC call was both very difficult and verbose, even in Python (since it was as close to a straight port of the C++ as possible). Therefore, I strayed away from it and haven't looked back.

I decided that a Python RPC API shouldn't be that hard to use. I also didn't see a reason to lock anybody into any particular IO loop. Tornado, gevent, asyncio, etc. are all great contenders and have their own pros/cons, but the biggest pro/con for any library is whether it supports your loop or at least allows you to use it. If my project is using gevent and suddenly I need a new library and I see a great one that uses tornado, I'm either going to try porting the library or look for another. Converting a project to a new IO loop, especially if that wasn't part of the design initially, is almost as time-consuming as changing languages, so I'm not going to do it.

Lastly, there's the connection setup and network layer. There are quite a few of these as well; even capnproto implemented its own. I decided I would like something like zeromq (or nanomsg, yami4, etc.) that would allow high performance and zero copy if I ever needed it, since the project would eventually need to be high speed in certain areas. capnproto's network layer might have been acceptable if it were accessible, but it imposed its own IO loop and forced plenty of other things on you.

Implementation

If you skipped my excuses section, this is where you should land.

I eventually landed on using these libraries and some glue code to put them together:

- capnproto's schema language
- pseud as the connectivity and network layer
  - tornado IO loop, no real reason for it over gevent, except that I've never used tornado before and pseud lets me switch easily in case I need to.
  - zeromq and msgpack as bundled with pseud, allowing me to focus on the schema stuff.

msgpack 2.0 and above include an ExtType object that is encoded as a signed 8-bit type code (negative values are reserved, so only 0-127 are available to applications) followed by some binary data. I intend to use this to ship interfaces across the wire: the data is an id that references the interface on the sender, and the receiver can then send that id back along with the method's name/id and arguments to make a call.
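To make the idea concrete, here is a minimal sketch of shipping an interface by reference. It is not cara's code: `InterfaceRegistry`, `decode_interface_ref`, `Greeter`, and the type code `1` are all hypothetical names chosen for illustration, and the ext framing is written by hand with `struct` (following the msgpack "ext 8" layout) so the sketch runs without msgpack installed.

```python
import struct

# Hypothetical application-specific ext type code (msgpack leaves 0-127 to applications).
INTERFACE_EXT_CODE = 1


class InterfaceRegistry:
    """Sender-side table of live objects that have been shipped by reference."""

    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def export(self, obj):
        """Register obj and return msgpack 'ext 8' bytes carrying its id."""
        ref_id = self._next_id
        self._next_id += 1
        self._objects[ref_id] = obj
        payload = struct.pack(">I", ref_id)  # 4-byte big-endian id
        # ext 8 layout: 0xc7, payload length (uint8), type code (int8), payload.
        # A real packer would likely pick the shorter fixext 4 form for 4 bytes.
        return struct.pack(">BBb", 0xC7, len(payload), INTERFACE_EXT_CODE) + payload

    def call(self, ref_id, method, args):
        """Dispatch a call that the receiver sent back for an exported object."""
        # A real implementation should check method against the schema first.
        return getattr(self._objects[ref_id], method)(*args)


def decode_interface_ref(data):
    """Receiver side: recover the id from the ext 8 bytes."""
    marker, length, code = struct.unpack(">BBb", data[:3])
    if marker != 0xC7 or code != INTERFACE_EXT_CODE:
        raise ValueError("not an interface reference")
    (ref_id,) = struct.unpack(">I", data[3:3 + length])
    return ref_id


class Greeter:
    def greet(self, name):
        return "hello, " + name


registry = InterfaceRegistry()
wire_bytes = registry.export(Greeter())             # what the sender ships
ref_id = decode_interface_ref(wire_bytes)           # what the receiver unpacks
result = registry.call(ref_id, "greet", ["world"])  # the receiver's call, run on the sender
```

No code crosses the wire here, only the id, which is what separates this from the pickling approach. In a real setup the `call` request would itself travel back over the network layer rather than being invoked directly.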

By gluing capnproto's schema onto pseud, I get the flexibility of pseud (and its underlying libraries) along with the schema specification from capnproto. This allows me to use the schema to compress my data (struct fields become integers, so msgpack can encode them into single bytes in most cases) and to tell my sending/receiving code what to do with classes.
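The field-to-integer compression can be sketched roughly as follows. The `PERSON_FIELDS` table stands in for the ordinals a capnp struct would assign; it and the helper names are assumptions for illustration, not part of cara or pseud. `msgpack_key_size` covers only the two msgpack cases relevant here: a positive fixint (0-127) takes one byte, and a fixstr of up to 31 bytes takes one header byte plus the text.

```python
# Hypothetical schema: field names mapped to the ordinals a capnp struct would assign.
PERSON_FIELDS = {"name": 0, "email": 1, "age": 2}
PERSON_NAMES = {ordinal: name for name, ordinal in PERSON_FIELDS.items()}


def to_wire(record, fields=PERSON_FIELDS):
    """Replace field names with their ordinals before serializing."""
    return {fields[name]: value for name, value in record.items()}


def from_wire(packed, names=PERSON_NAMES):
    """Restore field names on the receiving side."""
    return {names[ordinal]: value for ordinal, value in packed.items()}


def msgpack_key_size(key):
    """Bytes msgpack needs for a map key (only the two short cases used here)."""
    if isinstance(key, int) and 0 <= key <= 127:
        return 1  # positive fixint
    if isinstance(key, str) and len(key.encode("utf-8")) <= 31:
        return 1 + len(key.encode("utf-8"))  # fixstr header plus the text
    raise ValueError("case not covered by this sketch")


record = {"name": "Ada", "email": "ada@example.com", "age": 36}
packed = to_wire(record)

named_cost = sum(msgpack_key_size(key) for key in record)    # keys sent as strings
ordinal_cost = sum(msgpack_key_size(key) for key in packed)  # keys sent as ordinals
```

For this record the keys alone shrink from 15 bytes to 3, and the saving repeats for every struct sent. Both sides need the same schema, since the ordinals mean nothing without it.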

As part of this effort, I wanted to automatically generate the API from the capnp file, so I built a [small project that encodes the traversing of the schema in a header file](https://github.com/chainreactionmfg/capnp_generic_gen "Capnproto generic generator"). I can use it in my code without worrying about /how/ I got to a type or a value, just that I did and that I will eventually traverse the entire schema. Now that that's done, I've been working on cara (Capnproto Alternative RPC API) itself and hope to have it working by week's end. Since it's Thanksgiving week, I'll be doing some family stuff that will take most of my time (I'm writing this around 4am), but I still have hope!

Posted on: Wed 26 November 2014

Category: Software – Tags: capnproto, cara, python, rpc, schema