### Yet another way to design binary protocols (Python/C++)

Posted: Thu Oct 09, 2014 6:23 pm
I've been quite obsessed with building and memory-mapping binary data structures in C/C++ for some time, and I've written quite a lot about the topic in the past. So it's no wonder I'm always eager to try new approaches in this area.

This time, I've come across a Python library called Construct, described as "a powerful declarative parser (and builder) for binary data". It quickly proved to be a very powerful and expressive tool: I was able to rewrite an older binary schema (for Geometry) in no time, without needing any of the advanced techniques Construct offers.

The code snippet below shows the schema. Note the build_geometry wrapper, which builds the data in two passes: the first pass exists only to obtain the total data size and the individual offsets, which the second pass then stores in the size field and the jump table. It's not the most elegant solution I can imagine, but it's pretty straightforward. I especially like the concept of Anchor, which captures the current stream position, so one doesn't need to calculate offsets manually, which tends to be quite error-prone. This is a huge help!

``````
#!/usr/bin/env python

# /*
# (c) 2014 +++ Filip Stoklas, aka FipS, http://www.4FipS.com +++
# ARTICLE URL: http://forums.4fips.com/viewtopic.php?f=3&t=1205
# */

from construct import *

Geometry = Struct("geometry",
    Struct("header",
        ULInt32("size"),
        Const(Bytes("magic", 6), "FS-GEO"),
        ULInt8("major_ver"),
        ULInt8("minor_ver"),
    ),
    Struct("jump_table",
        ULInt32("vertex_format"),
        ULInt32("vertex_data"),
    ),
    Anchor("_anchor_vertex_format"),
    Struct("vertex_format",
        ULInt8("num_elems"),
        Array(lambda ctx: ctx.num_elems,
            Struct("elems",
                Enum(ULInt8("type"),
                    float_4 = 0,
                    float_3 = 1,
                    float_2 = 2,
                    float_1 = 3,
                    uint8_4 = 4,
                    uint8_3 = 5,
                    uint8_2 = 6,
                    uint8_1 = 7,
                ),
                Enum(ULInt8("semantics"),
                    position = 0,
                    color = 1,
                    normal = 2,
                    texcoord0 = 3,
                    texcoord1 = 4,
                ),
            ),
        ),
    ),
    Anchor("_anchor_vertex_data"),
    Struct("vertex_data",
        ULInt32("num_bytes"),
        Bytes("bytes", lambda ctx: ctx.num_bytes),
    ),
    Anchor("_anchor_end"),
)

def build_geometry(container):
    # 1st pass: build with placeholder values so the anchors capture stream positions:
    geom_data = Geometry.build(container)
    geom = Geometry.parse(geom_data)
    # 2nd pass: set size & offsets:
    geom.header.size = geom._anchor_end
    geom.jump_table.vertex_format = geom._anchor_vertex_format
    geom.jump_table.vertex_data = geom._anchor_vertex_data
    geom_data = Geometry.build(geom)
    return geom_data

geom = build_geometry(Container(
    header = Container(
        size = 0, # set in the 2nd pass
        magic = "FS-GEO",
        major_ver = 1,
        minor_ver = 0,
    ),
    jump_table = Container(
        vertex_format = 0, # set in the 2nd pass
        vertex_data = 0, # set in the 2nd pass
    ),
    _anchor_vertex_format = 0, # capture stream pos, set automatically
    vertex_format = Container(
        num_elems = 2,
        elems = [
            Container(type = "float_3", semantics = "position"),
            Container(type = "uint8_4", semantics = "color"),
        ]
    ),
    _anchor_vertex_data = 0, # capture stream pos, set automatically
    vertex_data = Container(
        num_bytes = 7,
        bytes = "data...",
    ),
    _anchor_end = 0, # capture stream pos, set automatically
))

print Geometry.parse(geom)
print "\nbinary dump:"
print " ".join("{:02X}".format(ord(c)) for c in geom)
``````
and the corresponding output:

``````
Container:
    header = Container:
        size = 36
        magic = 'FS-GEO'
        major_ver = 1
        minor_ver = 0
    jump_table = Container:
        vertex_format = 20
        vertex_data = 25
    vertex_format = Container:
        num_elems = 2
        elems = [
            Container:
                type = 'float_3'
                semantics = 'position'
            Container:
                type = 'uint8_4'
                semantics = 'color'
        ]
    vertex_data = Container:
        num_bytes = 7
        bytes = 'data...'

binary dump:
24 00 00 00 46 53 2D 47 45 4F 01 00 14 00 00 00 19 00 00 00 02 01 00 04 01 07 00 00 00 64 61 74 61 2E 2E 2E
``````
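To make the offsets in the dump easier to check by hand, here is a minimal sketch of the same layout using only Python 3's standard `struct` module. It is not how Construct works internally, and the names `PREFIX`, `pack_geometry` and `unpack_geometry` are hypothetical helpers introduced just for this illustration. The offsets are computed as running stream positions, which is the bookkeeping the Anchor fields take care of in the schema above.

```python
import struct

# Fixed-size prefix: size, magic, major_ver, minor_ver, followed by the two
# jump-table offsets (vertex_format, vertex_data).
# "<" selects little-endian byte order without padding.
PREFIX = struct.Struct("<I6sBBII")

def pack_geometry(elems, payload):
    """Hypothetical helper: serialize (type, semantics) pairs and raw vertex bytes."""
    # Serialize the variable-length sections first so their sizes are known.
    vertex_format = bytes([len(elems)]) + b"".join(bytes(e) for e in elems)
    vertex_data = struct.pack("<I", len(payload)) + payload
    # Running stream positions, i.e. the values the Anchor fields capture.
    off_format = PREFIX.size
    off_data = off_format + len(vertex_format)
    size = off_data + len(vertex_data)
    prefix = PREFIX.pack(size, b"FS-GEO", 1, 0, off_format, off_data)
    return prefix + vertex_format + vertex_data

def unpack_geometry(blob):
    """Hypothetical helper: follow the jump table instead of scanning sequentially."""
    size, magic, major, minor, off_format, off_data = PREFIX.unpack_from(blob, 0)
    num_elems = blob[off_format]
    first = off_format + 1
    elems = [tuple(blob[first + 2 * i : first + 2 * i + 2]) for i in range(num_elems)]
    (num_bytes,) = struct.unpack_from("<I", blob, off_data)
    payload = blob[off_data + 4 : off_data + 4 + num_bytes]
    return size, magic, (major, minor), elems, payload

# float_3 = 1 / position = 0 and uint8_4 = 4 / color = 1, as in the schema's enums.
blob = pack_geometry([(1, 0), (4, 1)], b"data...")
print(" ".join("{:02X}".format(b) for b in blob))
print(unpack_geometry(blob))
```

For the two elements and the 7-byte payload used above, this should print the same 36 bytes as the binary dump, with the jump table pointing at offsets 20 and 25.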
Just for the sake of completeness, here's a basic C++ implementation of a view class, which provides a convenient read-only interface to the underlying binary data.

``````
#include <cstdint>

// bytes_ref and array_ref<T> are not shown here.
class geometry_view
{
 public:

    struct header
    {
        uint32_t size;
        uint8_t magic[6]; // "FS-GEO"
        uint8_t major_ver;
        uint8_t minor_ver;
    };

    struct jump_table
    {
        uint32_t vertex_format; // offset of the vertex format section
        uint32_t vertex_data; // offset of the vertex data section
    };

    struct vertex_element
    {
        enum type : uint8_t
        {
            float_4, float_3, float_2, float_1,
            uint8_4, uint8_3, uint8_2, uint8_1,
        };

        enum semantics : uint8_t
        {
            position, color, normal, texcoord0, texcoord1
        };

        type type;
        semantics sema;
    };

    explicit geometry_view(bytes_ref data);

    const jump_table & jump_table() const { return *_jump_table; }
    array_ref<vertex_element> vertex_elements() const { return _vertex_elems; }
    bytes_ref vertex_bytes() const { return _vertex_bytes; }

 private:

    struct vertex_format
    {
        uint8_t num_elems;
        const vertex_element elems; // first of num_elems elements
    };

    struct vertex_data
    {
        uint32_t num_bytes;
        uint8_t bytes; // first of num_bytes bytes
    };

    const struct jump_table *_jump_table;
    const vertex_format *_vertex_format;
    array_ref<vertex_element> _vertex_elems;
    const vertex_data *_vertex_data;
    bytes_ref _vertex_bytes;
};
``````
As you can see, it's pretty straightforward. Actually, the only piece of code that's worth mentioning is the constructor, which makes use of the jump table in order to initialize all the nested pointers.

``````geometry_view::geometry_view(bytes_ref data):