Exploring Serialization and Deserialization Methods in C++

I recently spent a few hours learning about serialization and want to share my current understanding of how binary serialization and deserialization protocols are designed and discovered.

For a custom serialization format, it comes down to one of two approaches:

  • Define dynamic boundaries: encode, along with the data, where each variable's bytes begin and end and how many variables there are. This adds complexity but is much more flexible, supporting different data types and any number of values without a ton of wasted bytes.
  • Define static boundaries: fixed-size slots that the data fits into, so I don't have to think as hard about variable sizes because every field occupies roughly the same space. This is easier to implement but less flexible, and it wastes a ton of bytes on padding.
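To make the two approaches concrete, here is a minimal C++ sketch that writes the same string both ways. The 32-byte slot size, the 2-byte big-endian length prefix, and the function names `writeFixed` and `writeLengthPrefixed` are illustrative assumptions, not part of any standard format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Static boundaries: every string occupies a fixed slot (hypothetical size).
constexpr std::size_t kSlotSize = 32;

void writeFixed(std::vector<std::uint8_t>& out, const std::string& s) {
    assert(s.size() <= kSlotSize);                   // a static layout cannot grow
    out.insert(out.end(), s.begin(), s.end());
    out.insert(out.end(), kSlotSize - s.size(), std::uint8_t{0});  // padding = wasted bytes
}

// Dynamic boundaries: a 2-byte length prefix tells the reader where the string ends.
void writeLengthPrefixed(std::vector<std::uint8_t>& out, const std::string& s) {
    assert(s.size() <= 0xFFFF);                      // must fit in the 16-bit prefix
    out.push_back(static_cast<std::uint8_t>(s.size() >> 8));    // high byte first
    out.push_back(static_cast<std::uint8_t>(s.size() & 0xFF));  // low byte
    out.insert(out.end(), s.begin(), s.end());
}
```

For the string "hi", the fixed slot costs 32 bytes (30 of them padding), while the length-prefixed form costs 4.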

The first problem: taking a stream of bytes and working out, dynamically, where each piece of data begins and ends is hard.

Discerning what is in a stream of bytes and rebuilding the original classes from those byte chunks is easy to talk about and difficult to implement well. Nevertheless, it is a cool topic, especially for binary serialization, which typically produces much smaller output than text-based formats such as XML or JSON.
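One common way to make byte boundaries discoverable is a type-length-value (TLV) layout, where every field carries its own tag and size. The sketch below assumes a hypothetical `Person` class, hypothetical tag values, and a record shape of a 1-byte tag plus a 2-byte big-endian length; it shows a reader walking the stream, rebuilding the object, and skipping fields it doesn't recognize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical tags for a tiny TLV format.
enum Tag : std::uint8_t { kName = 1, kAge = 2 };

struct Person {
    std::string name;
    std::uint32_t age = 0;
};

// Each record: 1-byte tag, 2-byte big-endian length, then `length` value bytes.
// Returns nullopt if the buffer is truncated or a known field is malformed.
std::optional<Person> parsePerson(const std::vector<std::uint8_t>& buf) {
    Person p;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        if (buf.size() - pos < 3) return std::nullopt;        // truncated record header
        std::uint8_t tag = buf[pos];
        std::size_t len = (std::size_t{buf[pos + 1]} << 8) | buf[pos + 2];
        pos += 3;
        if (buf.size() - pos < len) return std::nullopt;      // truncated value
        const std::uint8_t* value = buf.data() + pos;
        switch (tag) {
            case kName:
                p.name.assign(reinterpret_cast<const char*>(value), len);
                break;
            case kAge:
                if (len != 4) return std::nullopt;            // age is a 4-byte integer
                p.age = (std::uint32_t{value[0]} << 24) | (std::uint32_t{value[1]} << 16) |
                        (std::uint32_t{value[2]} << 8) | value[3];
                break;
            default:
                break;                                        // unknown field: skip it
        }
        pos += len;
    }
    return p;
}
```

Because every field announces its own length, a reader built against an older version of this hypothetical format can still step over fields that were added later.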

I could easily write binary serialization for data sets whose variables always appear in the same order (4 bytes, then 16 bytes, then 4 bytes, then the rest is a string). That would be simple, but it is tightly coupled to that one layout and can't be reused.
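For comparison, the fixed-layout version looks something like this. `Record`, its field names, and the big-endian byte order are assumptions for illustration; the point is that every offset is hard-coded, so any change to the layout breaks both the writer and the reader.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record matching the fixed layout:
// [4-byte id][16-byte uuid][4-byte flags][name: rest of the buffer]
struct Record {
    std::uint32_t id = 0;
    std::array<std::uint8_t, 16> uuid{};
    std::uint32_t flags = 0;
    std::string name;
};

void putU32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));  // big-endian
}

std::uint32_t getU32(const std::uint8_t* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | p[3];
}

std::vector<std::uint8_t> serialize(const Record& r) {
    std::vector<std::uint8_t> out;
    putU32(out, r.id);
    out.insert(out.end(), r.uuid.begin(), r.uuid.end());
    putU32(out, r.flags);
    out.insert(out.end(), r.name.begin(), r.name.end());
    return out;
}

// Offsets are hard-coded: this only works for exactly this layout.
Record deserialize(const std::vector<std::uint8_t>& buf) {
    assert(buf.size() >= 24);  // 4 + 16 + 4 fixed bytes
    Record r;
    r.id = getU32(buf.data());
    std::copy(buf.begin() + 4, buf.begin() + 20, r.uuid.begin());
    r.flags = getU32(buf.data() + 20);
    r.name.assign(buf.begin() + 24, buf.end());
    return r;
}
```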

Take the DNS protocol as an example of a binary protocol… it has static boundaries for the header and dynamic boundaries for the questions (the header gives the number of elements, and each element marks its own boundaries). I like that and am considering that way of thinking as well: a fixed header would describe what is in a dynamic body.
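Here is a rough sketch of reading the question section of a DNS message, following the layout in RFC 1035: a fixed 12-byte header with the question count (QDCOUNT) at bytes 4 and 5, then each question as a name made of length-prefixed labels ending in a zero byte, followed by two fixed 16-bit fields (QTYPE and QCLASS). The names `DnsQuestion`, `parseQuestions`, and `getU16` are hypothetical, and the sketch deliberately rejects name compression pointers instead of following them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct DnsQuestion {
    std::string name;  // dotted form, e.g. "example.com"
    std::uint16_t qtype = 0;
    std::uint16_t qclass = 0;
};

static std::uint16_t getU16(const std::vector<std::uint8_t>& b, std::size_t pos) {
    return static_cast<std::uint16_t>((b[pos] << 8) | b[pos + 1]);  // network byte order
}

// Static part: the 12-byte header. Dynamic part: QDCOUNT questions of varying length.
std::optional<std::vector<DnsQuestion>> parseQuestions(const std::vector<std::uint8_t>& msg) {
    if (msg.size() < 12) return std::nullopt;
    std::uint16_t qdcount = getU16(msg, 4);
    std::size_t pos = 12;
    std::vector<DnsQuestion> questions;
    for (std::uint16_t i = 0; i < qdcount; ++i) {
        DnsQuestion q;
        // QNAME: length-prefixed labels terminated by a zero-length label.
        while (true) {
            if (pos >= msg.size()) return std::nullopt;
            std::uint8_t len = msg[pos++];
            if (len == 0) break;
            if (len & 0xC0) return std::nullopt;  // compression pointer: not handled here
            if (msg.size() - pos < len) return std::nullopt;
            if (!q.name.empty()) q.name += '.';
            q.name.append(msg.begin() + pos, msg.begin() + pos + len);
            pos += len;
        }
        // QTYPE and QCLASS: fixed 16-bit fields after the name.
        if (msg.size() - pos < 4) return std::nullopt;
        q.qtype = getU16(msg, pos);
        q.qclass = getU16(msg, pos + 2);
        pos += 4;
        questions.push_back(q);
    }
    return questions;
}
```

The header's count is what makes the body walkable: the parser knows how many variable-length elements to expect, and each element tells it where it ends.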

I need a solution that supports 100% dynamic data.

Google’s Protobuf is pretty dope for binary serialization. It makes implementation easy once I get over the learning curve of how to serialize things properly. Cereal also looks fairly awesome. However, I’m not considering Boost serialization because I don’t want to add Boost as a dependency alongside Qt 5.
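To peek at what Protobuf does behind the scenes, here is a hand-rolled sketch of its base-128 varint encoding and its field key (the field number shifted left by 3, OR'd with the wire type, where 0 means varint), as described in Protobuf's encoding documentation. This is not the Protobuf library API; `writeVarint`, `readVarint`, and `writeVarintField` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Base-128 varint: 7 payload bits per byte, least significant group first,
// high bit set on every byte except the last.
void writeVarint(std::vector<std::uint8_t>& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(v | 0x80));  // low 7 bits + "more follows"
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));  // final byte, high bit clear
}

// Decodes one varint starting at `pos` and advances `pos` past it.
// Returns nullopt on truncation or if the varint runs past 10 bytes.
std::optional<std::uint64_t> readVarint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) return std::nullopt;
        std::uint8_t byte = in[pos++];
        result |= std::uint64_t{byte & 0x7Fu} << shift;
        if ((byte & 0x80) == 0) return result;
    }
    return std::nullopt;  // too many continuation bytes
}

// Field key = (field number << 3) | wire type; wire type 0 is varint.
void writeVarintField(std::vector<std::uint8_t>& out, std::uint32_t fieldNumber, std::uint64_t v) {
    constexpr std::uint64_t kWireTypeVarint = 0;
    writeVarint(out, (std::uint64_t{fieldNumber} << 3) | kWireTypeVarint);
    writeVarint(out, v);
}
```

Encoding field 1 with the value 150 produces the three bytes `08 96 01`. Small numbers stay small on the wire, which is a big part of why this style of encoding is compact.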

Anyway, I want a compact binary serialization protocol, so I need to do my due diligence on how serialization works and how different libraries have implemented it. The lazy route is just grabbing a turnkey solution without considering what is actually happening behind the scenes.

Qt 5’s QDataStream is pretty dope, but its format is effectively specific to Qt. I’d have to write custom C and C# code to encode and decode it, so I might as well design my own format and handle byte order and the rest of that junk myself. Still, I’d rather use someone else’s solution, since they have likely figured out the gotchas and worked through the bugs.
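Handling byte order myself mostly means picking one order for the wire and converting explicitly. QDataStream writes big-endian by default, so a C or C# decoder would have to read that same order. A minimal sketch, assuming big-endian on the wire and unsigned fields; `writeBigEndian` and `readBigEndian` are hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Writes an unsigned integer most significant byte first (big-endian).
// Shifts operate on values, not memory, so the output is identical on
// little- and big-endian hosts.
template <typename T>
void writeBigEndian(std::vector<std::uint8_t>& out, T value) {
    static_assert(std::is_unsigned_v<T>, "use unsigned types for wire fields");
    for (std::size_t i = sizeof(T); i-- > 0;)
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Reads sizeof(T) bytes starting at p, most significant byte first.
template <typename T>
T readBigEndian(const std::uint8_t* p) {
    static_assert(std::is_unsigned_v<T>, "use unsigned types for wire fields");
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        value = static_cast<T>((value << 8) | p[i]);
    return value;
}
```

Shifting values instead of `memcpy`-ing raw structs avoids depending on the host's endianness and on compiler-specific struct padding, which is exactly what a reader in another language can't see.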

Note: I didn’t spend a long time on this, but I was surprised at how interesting the serialization topic is. The magic isn’t mystical once you look under the hood. It is roughly what you’d imagine if you implemented it from scratch… sort of… perhaps not exactly. Big brain serialization energy.