More PGP internals - Partial Body Lengths

related: pgp, gpg, encryption, libsimplepgp

Today we shall continue our saga into the dark depths of implementing PGP.

The OpenPGP spec, RFC 4880, is full of little oddities and interesting design choices.  Each is probably backed by heaping mounds of discussion among big players in cryptographic circles, but those explanations are pretty hard to find now.  One decision in particular is the subject of today’s post:  Partial Body Length Packets.

First, a quick explanation: OpenPGP is a standard that describes “messages”, and each message is made up of a bunch of “packets”, and the packets have a type, and the type of the packet determines the data within it.  Basic stuff.

Each packet also has a length.  Again, basic stuff.  You determine the length of the packet from a few bytes in its header, and then you know where it ends.  UNLESS, that is, it’s a partial body length packet.
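To make the "few bytes in its header" part concrete, here is a minimal Python sketch of decoding a new-format length header, following the byte ranges in RFC 4880 section 4.2.2.  The function name and the shape of its return value are made up for illustration (this is not code from libsimplepgp or GnuPG), and old-format packets, which encode their lengths differently, are ignored.

```python
def read_new_format_length(buf: bytes, pos: int = 0):
    """Decode one new-format length header starting at buf[pos].

    Returns (length, is_partial, next_pos).  Hypothetical helper for
    illustration; it does no bounds checking beyond what indexing gives.
    """
    first = buf[pos]
    if first < 192:
        # One-octet length: the value is the octet itself.
        return first, False, pos + 1
    if first < 224:
        # Two-octet length: covers 192 through 8383.
        return ((first - 192) << 8) + buf[pos + 1] + 192, False, pos + 2
    if first < 255:
        # Partial body length: a power of two, 2**0 through 2**30.
        return 1 << (first & 0x1F), True, pos + 1
    # 0xFF: the next four octets hold a big-endian length.
    return int.from_bytes(buf[pos + 1:pos + 5], "big"), False, pos + 5


# The example encodings given in the RFC:
print(read_new_format_length(b"\x64"))                  # (100, False, 1)
print(read_new_format_length(b"\xc5\xfb"))              # (1723, False, 2)
print(read_new_format_length(b"\xff\x00\x01\x86\xa0"))  # (100000, False, 5)
print(read_new_format_length(b"\xfe"))                  # (1073741824, True, 1)
```

The `is_partial` flag is the interesting bit: when it is set, the length only covers the next chunk, not the whole packet.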

A partial body length tells the parser: “I know there are at least N more bytes in this packet.  After N more bytes, there will be another header to tell you how many more bytes to read.”  The idea being, I guess, that you can encrypt a stream of data as it comes in without having to know when it ends.  Maybe you are PGP encrypting a speech, or some off-the-air TV.  I don’t know.  It can be infinite length: you can just keep throwing more partial body length headers in there.  Each one covers a power-of-two number of bytes, up to a gigabyte (2^30 bytes), and the packet finally ends when a regular, non-partial length header shows up.  Every gigabyte it informs the parser: “yeah, there’s more coming!”
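Seen from the writing side, the scheme looks roughly like the Python sketch below.  It assumes a single fixed chunk size of 2**power bytes and takes the whole payload as a bytes object so it is easy to test; a real writer would pull from a stream instead.  The names `frame_body` and `encode_definite_length` are hypothetical.  The lower bound of 9 on `power` reflects the RFC 4880 rule that the first partial chunk must be at least 512 bytes.

```python
def encode_definite_length(n: int) -> bytes:
    """Encode a regular (non-partial) new-format length header."""
    if n < 192:
        return bytes([n])
    if n < 8384:
        n -= 192
        return bytes([(n >> 8) + 192, n & 0xFF])
    return b"\xff" + n.to_bytes(4, "big")


def frame_body(data: bytes, power: int = 9) -> bytes:
    """Frame data as partial chunks of 2**power bytes plus a final chunk.

    Illustrative sketch only: the last chunk always gets a definite
    length header, because a packet may not end on a partial one.
    """
    if not 9 <= power <= 30:
        raise ValueError("power must be between 9 and 30")
    size = 1 << power
    out = bytearray()
    pos = 0
    # Emit partial chunks while more than one chunk's worth remains.
    while len(data) - pos > size:
        out.append(224 + power)          # partial body length octet
        out += data[pos:pos + size]
        pos += size
    rest = data[pos:]
    out += encode_definite_length(len(rest)) + rest
    return bytes(out)


framed = frame_body(b"a" * 1200, power=9)
print(len(framed))   # 1203: 1200 data bytes plus three one-byte headers
```

Note where those three headers end up: at offsets 0, 513 and 1026, two of them right in the middle of what used to be one contiguous payload.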

That alone is not a problem.  The problem is that the headers it inserts are stuck in the middle of the packet’s data.  And that probably wouldn’t be too big of a problem either, except that packets are nested.  Yes, packets contain packets which contain more packets.

As it turns out, GnuPG uses partial body length even for encrypting small files.  It goes like this: the contents of a file are placed in a Literal Data Packet, which is compressed into a Compressed Data Packet, which is encrypted into an Encrypted Data Packet.  The Encrypted Data Packet has a partial body length header, and the Compressed Data Packet may have them, too.

This is a pain in the ass to reverse.  It means you can’t just decompress the contents of the Encrypted Data Packet, because somewhere in the middle of it are these damned partial body length headers.  They aren’t part of the compressed data, and they will screw everything up if you don’t take care of them.

Since packets can be nested quite deeply within each other, it makes sense that the higher level packets would take care of removing their own partial body length headers.  Otherwise each packet, as it is being interpreted by the parser, would have to consider how many levels deep it is, and find all higher level packets that used partial body lengths, and determine if any of its parents have stuck headers in the middle of it.  Nasty.
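As a rough illustration of a packet cleaning up after itself, the Python sketch below walks one packet body, drops every interleaved length header, and hands back contiguous bytes that a nested parser can treat as if the partial lengths never existed.  The function name is hypothetical and this is not how libsimplepgp or GnuPG actually do it.  It works on a buffer already in memory, and it does not enforce the rule that the first partial chunk be at least 512 bytes, which keeps the example below short.

```python
def strip_partial_headers(buf: bytes, pos: int = 0):
    """Return (body, next_pos) for the packet body whose first length
    octet is at buf[pos], with all partial length headers removed."""
    body = bytearray()
    while True:
        first = buf[pos]
        partial = False
        if first < 192:                       # one-octet length
            n, pos = first, pos + 1
        elif first < 224:                     # two-octet length
            n = ((first - 192) << 8) + buf[pos + 1] + 192
            pos += 2
        elif first < 255:                     # partial body length
            n, pos, partial = 1 << (first & 0x1F), pos + 1, True
        else:                                 # five-octet length
            n = int.from_bytes(buf[pos + 1:pos + 5], "big")
            pos += 5
        if pos + n > len(buf):
            raise ValueError("truncated packet body")
        body += buf[pos:pos + n]              # copy the chunk, skip the header
        pos += n
        if not partial:
            # A definite length header marks the final chunk.
            return bytes(body), pos


# Tiny chunks (2 bytes, 1 byte, then a final 2 bytes) followed by other data.
raw = b"\xe1ab" + b"\xe0c" + b"\x02de" + b"NEXT"
body, next_pos = strip_partial_headers(raw)
print(body, next_pos)   # b'abcde' 8
```

The returned body is a second copy of the data, which is exactly the RAM cost of doing it the simple way.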

So this needs to be taken care of at the level of the packet that’s using them.  Your options for dealing with this seem to be one of:

  • Make duplicate copies of packet data in RAM.  This is simple, but it restricts the maximum size of data you can handle.
  • Implement some sort of fancy stream I/O system that has an input stream and an output stream, and interpret as much data as you can as it streams in – including reaching into nested packets before you have finished parsing their outer container packets.

A solution that treats both inputs and outputs as streams and doesn’t wait for packets to be complete before it starts parsing them seems to be the only way to handle large files, such as encrypted disk images, or these theoretical infinite streams of encrypted data.

GnuPG apparently implements a complex streaming I/O system.  From what I can tell, it pulls data out of an input stream and either pushes it out to the output stream if it can, or pushes it back onto the input stream with *filters* – it implements decryption, encryption, and decompression as stream filters.  Very fancy.
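I have not traced GnuPG's code closely enough to reproduce it, but the general filter idea can be sketched with Python generators: one stage strips the partial body length headers as bytes arrive, and the next stage decompresses whatever it is handed.  Everything here is an assumption for illustration: the names are made up, the input is assumed to be a buffered file-like object whose `read(n)` returns n bytes unless it hits end of file, and the payload is treated as plain zlib data.  A real message would also have a decryption stage in the chain and an algorithm octet at the start of the Compressed Data Packet body.

```python
import io
import zlib


def _read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated packet")
    return data


def iter_body_chunks(stream, max_read=65536):
    """Yield the body of one new-format packet piece by piece,
    silently consuming the length headers between chunks."""
    while True:
        first = _read_exact(stream, 1)[0]
        partial = False
        if first < 192:
            n = first
        elif first < 224:
            n = ((first - 192) << 8) + _read_exact(stream, 1)[0] + 192
        elif first < 255:
            n, partial = 1 << (first & 0x1F), True
        else:
            n = int.from_bytes(_read_exact(stream, 4), "big")
        while n:
            piece = _read_exact(stream, min(n, max_read))
            n -= len(piece)
            yield piece
        if not partial:
            return                      # definite length: last chunk done


def inflate(chunks):
    """Decompress a stream of zlib-format pieces incrementally."""
    d = zlib.decompressobj()
    for chunk in chunks:
        out = d.decompress(chunk)
        if out:
            yield out
    tail = d.flush()
    if tail:
        yield tail


# Chain the filters: at no point is the whole packet held in memory.
def decoded_pieces(stream):
    return inflate(iter_body_chunks(stream))
```

Usage is just iteration, for example `for piece in decoded_pieces(open(path, "rb")): ...` with the file positioned at the packet's first length octet.  Memory use is bounded by `max_read` plus the decompressor's own state rather than by the size of the packet.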

I don’t implement streams in libsimplepgp, so you’re restricted to what fits in RAM.  For encrypted+compressed content, you need a copy of the full packet, plus a copy of the decompressed packet.  No disk images, please.