I’ve had two Java projects (simple multiplayer games) that relied on a byte-based, connection-oriented protocol for communication.
In both cases I was unhappy with the implementation of the communication layer, since I couldn’t come up with an intelligent, non-verbose and object-oriented way of writing and, especially, parsing the bytes.
For writing I had something like
```java
ProtocolDataUnitX pdux = new ProtocolDataUnitX("MyName", 2013);
byte[] bytes = pdux.getBytes();
out.write(bytes); // surrounded with try/catch etc.
```
That was acceptable to some extent, since I had an AbstractPDU class with some byte-conversion convenience methods. But I had to define the getBytes() method for every protocol data unit (PDU).
My approach for parsing the incoming byte stream lacked even more innovation.
```java
private InputStream in;
// ...
@Override
public void run() {
    try {
        int c;
        while ((c = in.read()) != -1) {
            if (c == 0x01) {
                // 0x01 means we have PDU #1 and can continue reading,
                // since we know what is coming.
                // After we have all bytes and know the PDU,
                // we can determine the parameters. I.e., every PDU has a
                // reverse constructor: bytes -> pdu
            }
        }
    } catch (IOException e) {
        // handle broken connection
    }
}
```
QUESTION
How do you handle these situations? What are the best practices here? Some protocols have the total length field encoded, some do not. Some protocol data units have variable length. Is there a reasonable approach here? Maybe some kind of schema definition? I don’t want to produce ugly and confusing code any longer for this matter.
Answer
Summary: best practice is to use an existing, mature protocol compiler. Google protobufs is a popular choice.
Over the years, many protocol definition systems have been developed. Most of these include compilers which take a protocol description and produce client and server code, often in multiple languages. The existence of such a compiler is very helpful in projects which are not restricted to a single client (or server) implementation, since it allows other teams to easily create their own clients or servers using the standard PDU definitions. Also, as you’ve observed, making a clean object-oriented interface is non-trivial, even in a language like Java which has most of the features you would need.
The question of whether PDUs should have explicit length or be self-delimiting (say, with an end-indicator) is interesting. There are a lot of advantages to explicit length: for one thing, it is not necessary to have a complete parser in order to accept the PDU, which can make for much better isolation of deserialization from transmission. If a transmission consists of a stream of PDUs, the explicit length field makes error recovery simpler, and allows early dispatch of PDUs to handlers. Explicit length fields also make it easier to embed a PDU inside another PDU, which is often useful, particularly when parts of the PDU must be encrypted.
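To make the trade-off concrete, here is a minimal sketch of length-prefixed framing in Java. The class name `Framing` and the `[type byte][4-byte length][payload]` layout are my own illustration, not something from the question; real protocols vary in field widths and byte order.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of length-prefixed framing: [type byte][4-byte length][payload].
// The explicit length lets the receiver collect a whole PDU before any
// parsing happens, isolating deserialization from transmission.
public class Framing {

    public static byte[] frame(int type, byte[] payload) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(type);
            out.writeInt(payload.length); // explicit length field
            out.write(payload);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen on a byte array
        }
    }

    // Reads exactly one PDU payload; readFully blocks until 'length' bytes arrive.
    public static byte[] readPayload(DataInputStream in) {
        try {
            int type = in.readUnsignedByte(); // dispatch key, unused in this sketch
            int length = in.readInt();
            byte[] payload = new byte[length];
            in.readFully(payload);
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note how `readPayload` never needs to understand the payload: the length field alone tells it when the PDU is complete, so dispatch to a per-type handler can happen early.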
On the other hand, explicit length fields require that the entire PDU be assembled in memory before transmission, which is awkward for large PDUs and impossible for streaming with a single PDU. If the length field itself is of variable length, which is almost always necessary, then it becomes awkward to create PDU components unless the final length is known at the start. (One solution to this problem is to create the serialized string backwards, but that is also awkward, and doesn’t work for streaming.)
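A common variable-length length encoding is the base-128 varint used by protobuf: seven value bits per byte, with the high bit marking “more bytes follow”. The sketch below (class name `Varint` is my own) shows why it is awkward to reserve space for such a field before the final length is known: the field’s own size depends on the value.

```java
import java.io.ByteArrayOutputStream;

// Base-128 varint: 7 value bits per byte, high bit set on all but the
// last byte. Small values take 1 byte, values up to 16383 take 2, etc.
public class Varint {

    public static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // continuation bit set
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
        return out.toByteArray();
    }

    public static int decode(byte[] bytes) {
        int result = 0, shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) break; // last byte reached
        }
        return result;
    }
}
```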
By and large, the balance has been in favour of explicit length fields, although some systems allow “chunking”. A simple form of chunking is to define a maximum chunk size, and concatenate successive chunks of the maximum size along with the first following chunk with a size less than the maximum. (It’s important to be able to specify 0-length chunks, in case the PDU is an even multiple of the maximum size.) This is a reasonable compromise and it allows streaming (with some work), but it’s a lot more engineering effort and it creates a lot of corner cases which need to be tested and debugged.
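The chunking scheme described above can be sketched as follows. The class name `Chunker` and the tiny maximum of 4 bytes are mine, chosen only to make the corner cases visible; note the mandatory 0-length terminator when the payload is an exact multiple of the maximum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Splits a payload into chunks of at most MAX bytes. A chunk shorter
// than MAX terminates the sequence, so a payload whose length is an
// exact multiple of MAX must be followed by a 0-length chunk.
public class Chunker {
    static final int MAX = 4; // tiny, for demonstration only

    public static List<byte[]> split(byte[] payload) {
        List<byte[]> chunks = new ArrayList<>();
        int offset = 0;
        while (payload.length - offset >= MAX) {
            chunks.add(Arrays.copyOfRange(payload, offset, offset + MAX));
            offset += MAX;
        }
        // Final short chunk; 0-length when payload.length is a multiple of MAX.
        chunks.add(Arrays.copyOfRange(payload, offset, payload.length));
        return chunks;
    }
}
```

The 0-length terminator is exactly the kind of corner case the paragraph above warns about: forget it and a receiver can block forever waiting for a short chunk that never comes.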
One important maxim in designing PDU formats is that every option is a potential information leak. To the extent possible, try to make any given internal object have only a single possible serialization. Also, remember that redundancy has a cost: anywhere there is duplication, it implies a test for validity. Keeping tests to a minimum is the key to efficiency, particularly on deserialization. Skipping a validity test is an invitation to security attacks.
In my opinion, making an ad hoc protocol parser is not usually a good idea. For one thing, it’s a lot of work. For another, there are lots of subtle issues, and it’s better to use a system that has already dealt with them.
While I’m personally a fan of ASN.1, which is widely used particularly in the telecommunications industry, it is not an easy technology to fit into a small project. The learning curve is pretty steep and there are not as many open-source tools as one might like.
Currently, probably the most popular option is Google protobufs, which is available for C++, Java and Python (and a number of other languages through contributed plugins). It’s simple, reasonably easy to use, and open source.
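To give a flavour of what this looks like, here is a hypothetical protobuf schema for the PDU from the question (the field names and numbers are my guesses at what `ProtocolDataUnitX("MyName", 2013)` might carry):

```protobuf
// Hypothetical schema; protoc generates Java classes with
// serialization and parsing built in.
syntax = "proto3";

message ProtocolDataUnitX {
  string name = 1;
  int32 year = 2;
}
```

From this, `protoc` generates a Java class with a builder (`ProtocolDataUnitX.newBuilder().setName("MyName").setYear(2013).build().toByteArray()`) and a matching `parseFrom(byte[])`, which is precisely the “reverse constructor” you were writing by hand. Note that protobuf messages are not self-delimiting on a stream, so for a connection-oriented protocol you would still pair them with a length prefix (the Java runtime offers `writeDelimitedTo`/`parseDelimitedFrom` for this).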