HiveBrain v1.2.0

Protocol code to encode/decode various data types

Submitted by: @import:stackexchange-codereview
Tags: protocol, decode, types, code, encode, data, various

Problem

I need to create a protocol for sending data of various types over a socket connection, so I need to serialise the data to a byte stream. I only need signed and unsigned 32-bit integers, 64-bit integers, strings, and binary data. It is only a start, but if anyone can review this code I would really appreciate it.

```
/* Custom TLV protocol: byte 1 = type, bytes 2-5 = size, remaining bytes = payload.
   Supported types:
     int32_t  - signed 32-bit integer
     uint32_t - unsigned 32-bit integer
     uint64_t - unsigned 64-bit integer
     string   - char / int8_t
     byte     - unsigned char, anything binary

   Everything uses TLV; each message must contain the type as its first byte.

   Currently a message is one TLV, but this should be expanded so that a
   message is a linked list of TLVs. */

#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* include <stdint.h> instead; as this is a demo, just use these here */
typedef unsigned char uint8_t;
typedef signed int int32_t;
typedef unsigned int uint32_t;

#ifdef _WIN32
typedef unsigned __int64 uint64_t;
#else
typedef unsigned long long uint64_t;
#endif

enum data_type { DTYPE_S32 = 0, DTYPE_U32 = 1, DTYPE_U64 = 2, DTYPE_STRING = 3, DTYPE_BINARY = 4 };

struct tlv_msg
{
    data_type datatype;        /* one of the 5 data types */
    /* payload stored in a union */
    union {
        int32_t s32val;        /* 1: signed 32-bit int */
        uint32_t u32val;       /* 2: unsigned 32-bit int */
        uint64_t u64val;       /* 3: unsigned 64-bit int */
        char* strval;          /* 4: strings */
        unsigned char* binval; /* 5: any binary data */
    };

    uint32_t bytelen;          /* number of bytes in the union/data part */
};

size_t tlv_encode(data_type type, uint64_t input_length, tlv_msg* inputdata,
                  unsigned char** outputdata, size_t* output_length);

/* allocation/de-allocation of memory */
size_t alloc_encode(data_type type, unsigned char** outputdata, size_t datasize = 0);
size_t alloc_decode(unsigned char* inputdata, tlv_msg** msg);

void free_encode(unsigned char** outputdata);
void free_decode(tlv_msg** msg);

/* prototype truncated in the original; reconstructed to mirror alloc_decode */
size_t tlv_decode(unsigned char* inputdata, tlv_msg** msg);
```

Solution

The only point I can see in converting data types into messages is to be able to send those messages across the network without having to perform any further interpretation of their internal structure. However, your scheme produces messages that still require interpretation before they can be sent: the type of the message has to be examined in order to figure out how to transmit its payload, since the message contains the actual data in the case of fixed-length types, but a pointer to the data in the case of variable-length types.

I think it would make a lot more sense to construct your messages in such a way that they can be sent as binary blocks, without having to know anything more than their length.

A message should be nothing but two things: a length and a body of data. When you are thinking at the message level and writing functions that send and receive messages, you should not be concerned with the internal structure of the data contained within the messages. Therefore, the notion of a "data type" is not pertinent at this level.
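As a sketch of that separation (the name frame_message is mine, not from the original code), the message layer can prepend a 4-byte big-endian length to an otherwise opaque body:

```cpp
#include <cstdint>
#include <vector>

// Frame an opaque body as [4-byte big-endian length][body bytes].
// This layer knows nothing about the body's internal structure.
std::vector<unsigned char> frame_message(const unsigned char* body, std::uint32_t len)
{
    std::vector<unsigned char> out;
    out.reserve(4u + len);
    out.push_back(static_cast<unsigned char>((len >> 24) & 0xFF));
    out.push_back(static_cast<unsigned char>((len >> 16) & 0xFF));
    out.push_back(static_cast<unsigned char>((len >> 8) & 0xFF));
    out.push_back(static_cast<unsigned char>(len & 0xFF));
    out.insert(out.end(), body, body + len);
    return out;
}
```

The receiving side can do the mirror image: read exactly 4 bytes, compute the length, then read exactly that many body bytes before handing the block to the next layer.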

At the level where you have just received a block of data of a known length, you can now start interpreting it in order to extract the payload from it. The first byte can contain the data type enum, while the remaining bytes can be the actual value.

- In the case of fixed-length data types, assertions can be in place to ensure that the length of the data equals the size of the value plus one for the first byte, which contains the data type enum.

- For variable-length data types, the length of the value is the length of the data minus one for that first byte.
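A minimal sketch of that decoding step for one fixed-length type (decode_s32 is an illustrative name, not from the original code), including the length assertion described above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum data_type { DTYPE_S32 = 0, DTYPE_U32, DTYPE_U64, DTYPE_STRING, DTYPE_BINARY };

// Decode a signed 32-bit value from a received block: byte 0 carries the
// type tag, the remaining bytes carry the big-endian value.
std::int32_t decode_s32(const unsigned char* data, std::size_t len)
{
    // fixed-length type: length must be value size plus one for the tag
    assert(len == 1 + sizeof(std::int32_t));
    assert(data[0] == DTYPE_S32);
    std::uint32_t v = (static_cast<std::uint32_t>(data[1]) << 24) |
                      (static_cast<std::uint32_t>(data[2]) << 16) |
                      (static_cast<std::uint32_t>(data[3]) << 8)  |
                       static_cast<std::uint32_t>(data[4]);
    return static_cast<std::int32_t>(v);
}
```

For DTYPE_STRING or DTYPE_BINARY, the same function shape would instead take the remaining len - 1 bytes verbatim as the value.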

An added advantage of doing things this way is that you only have to make one memory allocation per message instead of two (one for the message struct and one for the variable-length data it points to). Fewer allocations mean less memory fragmentation, which can be extremely important if your application runs for a long time.
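One way to get a single allocation per message (a sketch; the trailing-array idiom and the names here are my own) is to place the variable-length payload directly after the header in the same block:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Header and variable-length payload share one allocation, instead of a
// struct plus a separately allocated data buffer.
struct msg {
    std::uint32_t len;        // number of payload bytes
    unsigned char payload[1]; // actually 'len' bytes (trailing-array idiom)
};

msg* make_msg(const unsigned char* data, std::uint32_t len)
{
    msg* m = static_cast<msg*>(std::malloc(offsetof(msg, payload) + len));
    if (m != nullptr) {
        m->len = len;
        std::memcpy(m->payload, data, len);
    }
    return m;
}
```

A single std::free then releases the whole message, header and payload together.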

Another consideration you should bear in mind is whether your underlying transport mechanism offers efficient buffering or not. If, by any chance, it does offer efficient buffering, then you might gain performance by reading/writing bytes directly to/from the transport mechanism, instead of allocating and filling message structures.

So, I do not have specific points to address in your code, because in my opinion it needs to be completely re-written.

Next time you write it, please refrain from using hard-coded constants: use sizeof(int32_t) where you now have 4, sizeof(int64_t) where you have 8 in some places, and a manifest constant called BITS_PER_BYTE (or BPB for brevity) where you have 8 in others, and so on. The reason we use manifest constants is not that their values might change, but that their names document our intentions and clarify the code.

Also, for fixed-length data types, it is a good idea to heed @Loki Astari's advice and pay due attention to the endianness of your data: always use a specific, known endianness for data that is meant to be shared between different machines, and what better choice is there than Network Byte Order?
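For example (a sketch; put_u64 and get_u64 are illustrative names), a 64-bit value can be written and read in network byte order using shifts, which behave identically regardless of host endianness, with sizeof in place of a hard-coded 8:

```cpp
#include <cstddef>
#include <cstdint>

// Write a 64-bit value in network byte order (big-endian); shifting is
// independent of the host's own endianness.
void put_u64(unsigned char* out, std::uint64_t v)
{
    for (std::size_t i = 0; i < sizeof(v); ++i)
        out[i] = static_cast<unsigned char>(v >> (8 * (sizeof(v) - 1 - i)));
}

// Read a 64-bit big-endian value back from the wire format.
std::uint64_t get_u64(const unsigned char* in)
{
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < sizeof(v); ++i)
        v = (v << 8) | in[i];
    return v;
}
```

On POSIX systems, htonl/ntohl serve the same purpose for 32-bit values; the loop above just avoids any platform dependency in this sketch.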

Context

StackExchange Code Review Q#6315, answer score: 3
