Serial Programming/Forming Data Packets

Just about every idea for communicating between computers involves "data packets", especially when more than 2 computers are involved.

The idea is very similar to putting a check in an envelope to mail to the electricity company. We take the data (the "check") we want to send to a particular computer, and we place it inside an "envelope" that includes the address of that particular computer.

A packet of data starts with a preamble, followed by some address information and other transmission-related information, then the raw data, and finishes with a few more bytes of transmission-related error-detection information -- often a Fletcher-32 checksum. We will talk more about what we do with this error-detection information in the next chapter, Serial Programming/Error Correction Methods.

The accountant at the electricity company throws away the envelope when she gets the check. She already knows the address of her own company. Does this mean the "overhead" of the envelope is useless? No.

In a similar way, once a computer receives a packet, it immediately throws away the preamble. If the computer sees that the packet is addressed to itself, and has no errors, then it discards the wrapper and keeps the data.
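The envelope analogy can be sketched in a few lines of Python. This is only an illustration under assumed details: the two-byte preamble, the one-byte address, the two-byte length field, and the names `build_packet` and `receive_packet` are all hypothetical and not taken from any particular protocol. The Fletcher-32 variant shown sums little-endian 16-bit words modulo 65535.

```python
import struct

PREAMBLE = b"\x55\x55"   # hypothetical preamble bytes
MY_ADDRESS = 0x12        # hypothetical address of the receiving node


def fletcher32(data):
    """Fletcher-32 over little-endian 16-bit words, zero-padded to even length."""
    if len(data) % 2:
        data += b"\x00"
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


def build_packet(dest, payload):
    """Put the data in its 'envelope': preamble, address, length, data, checksum."""
    header = struct.pack(">BH", dest, len(payload))
    check = fletcher32(header + payload)
    return PREAMBLE + header + payload + struct.pack(">I", check)


def receive_packet(packet, my_address=MY_ADDRESS):
    """Return the payload if the packet is intact and addressed to us, else None."""
    if not packet.startswith(PREAMBLE):
        return None
    body = packet[len(PREAMBLE):]              # throw away the preamble
    if len(body) < 3 + 4:
        return None
    dest, length = struct.unpack(">BH", body[:3])
    if len(body) != 3 + length + 4:
        return None
    (check,) = struct.unpack(">I", body[3 + length:])
    if check != fletcher32(body[:3 + length]):
        return None                            # transmission error detected
    if dest != my_address:
        return None                            # addressed to some other computer
    return body[3:3 + length]                  # discard the wrapper, keep the data
```

With these assumptions, `receive_packet(build_packet(0x12, b"check"))` returns `b"check"`, while a packet addressed to another node, or one with a corrupted byte, yields `None`.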

Unfortunately, there are dozens of slightly different, incompatible protocols for data packets, because people pick slightly different ways to represent the address information and the error-detection information.

... gateways between incompatible protocols ...

Packet size tradeoffs

Protocol designers pick a maximum and minimum packet size based on many tradeoffs.

Start-of-packet and transparency tradeoffs

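Unfortunately, it is impossible for any communication protocol to have all of these nice-to-have features at once: transparency, "simple copy", "unique start", and "8-bit". Each protocol has to give up at least one of them, as described below.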

Some communication protocols break transparency, requiring extra complexity elsewhere -- requiring higher network layers to implement work-arounds such as binary-to-text encoding or else suffer mysterious errors, as with the Time Independent Escape Sequence.

Some communication protocols break "8-bit" -- i.e., in addition to the 256 possible bytes, they have "extra symbols". Some communication protocols have just a few extra non-data symbols -- such as the "long pause" used as part of the Hayes escape sequence; the "long break" used as part of the SDI-12 protocol; "command characters" or "control symbols" in 4B5B coding and 8b/10b encoding; etc. Other systems, such as 9-bit protocols,[5][6][7][8][9][10][11] transmit 9-bit symbols. Typically the first 9-bit symbol of a packet has its high bit set to 1, waking up all nodes; then each node checks the destination address of the packet, and all nodes other than the addressed node go back to sleep. The rest of the data in the packet (and the ACK response) is transmitted as 9-bit symbols with the high bit cleared to 0, effectively 8-bit values, which are ignored by the sleeping nodes. (This is similar to the way that all data bytes in a MIDI message are effectively 7-bit values; the high bit is set only on the first byte in a MIDI message.) Alas, some UARTs make it awkward,[12][13] difficult, or impossible to send and receive such 9-bit characters.
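The wake-up behavior of a 9-bit protocol can be modeled in software. The sketch below is a simulation only: it represents each 9-bit symbol as a Python integer, and the `Node` class, `frame_9bit` function, and one-byte node addresses are hypothetical names and choices made for illustration, not part of any of the cited protocols.

```python
ADDRESS_FLAG = 0x100  # ninth bit: set on the address symbol, clear on data symbols


def frame_9bit(dest, payload):
    """Return a list of 9-bit symbols: one address symbol, then the data symbols."""
    return [ADDRESS_FLAG | dest] + list(payload)


class Node:
    """A simulated bus node that sleeps unless the last address symbol named it."""

    def __init__(self, address):
        self.address = address
        self.awake = False
        self.received = bytearray()

    def on_symbol(self, symbol):
        if symbol & ADDRESS_FLAG:
            # An address symbol wakes every node; only the addressed one stays awake.
            self.awake = (symbol & 0xFF) == self.address
        elif self.awake:
            self.received.append(symbol & 0xFF)
        # Sleeping nodes ignore data symbols entirely.


# Usage: two nodes share one bus and see every symbol.
nodes = [Node(1), Node(2)]
for symbol in frame_9bit(2, b"hi") + frame_9bit(1, b"yo"):
    for node in nodes:
        node.on_symbol(symbol)
```

After the loop, node 1 has collected only `b"yo"` and node 2 only `b"hi"`, even though both saw all six symbols.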

Some communication protocols break "unique start" -- i.e., they allow the no-longer-unique start-of-packet symbol to occur elsewhere -- most often because we are sending a file that includes that byte, and "simple copy" puts that byte in the data payload. When a receiver is first turned on, or when cables are unplugged and later reconnected, or when noise corrupts what was intended to be the real start-of-packet symbol, the receiver will incorrectly interpret that data byte as the start-of-packet. Even though the receiver usually recognizes that something is wrong (checksum failure), a single such noise glitch may lead to a cascade of many lost packets, as the receiver goes back and forth between (incorrectly) interpreting that data byte in the payload as a start-of-packet, and then (incorrectly) interpreting a real start-of-packet symbol as payload data. Even worse, such common problems may cause the receiver to lose track of where characters begin and end. Early protocol designers believed that once synchronization has been lost, there must be a unique start-of-packet character sequence required to regain synchronization.[14] Later protocol designers have designed a few protocols, such as CRC-based framing,[15] that not only break "unique start" (they allow the data payload to contain the same byte sequence as the start-of-packet, supporting simple-copy transparency) but do not even need a fixed, unchanging start-of-packet character sequence.
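One way a receiver can cope with a non-unique start byte is to treat every occurrence as a mere candidate and accept a packet only when its checksum verifies. The sketch below illustrates that idea; it is not the CRC-based framing scheme cited above, and the frame layout (start byte `0x7E`, one length byte, payload, CRC-32) and the function names are assumptions made for the example. A false candidate can still pass the CRC by chance, so this recovers synchronization with high probability rather than with certainty.

```python
import zlib

START = 0x7E  # hypothetical start-of-packet byte; it may also appear in payloads


def frame(payload):
    """Build START + length + payload + CRC-32, copying the payload unchanged."""
    body = bytes([len(payload)]) + payload
    return bytes([START]) + body + zlib.crc32(body).to_bytes(4, "big")


def extract_frames(stream):
    """Hunt through a byte stream and return the payloads of all valid frames."""
    payloads = []
    i = 0
    while i + 6 <= len(stream):                 # the smallest frame is 6 bytes
        if stream[i] == START:
            length = stream[i + 1]
            end = i + 2 + length + 4
            if end <= len(stream):
                body = stream[i + 1:i + 2 + length]
                crc = int.from_bytes(stream[end - 4:end], "big")
                if zlib.crc32(body) == crc:
                    payloads.append(body[1:])   # checksum confirms a real frame
                    i = end
                    continue
        i += 1                                  # false start; keep hunting
    return payloads
```

If the receiver joins in the middle of a packet whose payload happens to contain `0x7E`, the bogus candidate fails its CRC check and the hunt continues one byte later, so the following packets are still recovered. A real streaming receiver would also have to wait for more bytes when a candidate frame runs past the end of its buffer, which this sketch does not attempt.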

In order to keep the "unique start" feature, many communication protocols break "simple copy". This requires a little extra software and a little more time per packet than simply copying the data -- which is usually insignificant with modern processors. The awkwardness comes from three requirements: (a) the entire process -- the transmitter encoding/escaping a chunk of raw data into a packet payload that must not include the start-of-packet byte, and the receiver decoding/unescaping the packet payload into a chunk of raw data -- must be completely transparent to any possible sequence of raw data bytes, even if those bytes include one or more start-of-packet bytes; (b) since the encoded/escaped payload data inevitably requires more bytes than the raw data, we must make sure we don't overflow any buffers even with the worst possible expansion; and (c) unlike "simple copy", where a constant bitrate of payload data bits results in the same constant goodput of raw data bits, the system must be designed to handle the variations in payload data bitrate or raw data bit goodput or both. Some of this awkwardness can be reduced by using consistent-overhead byte stuffing (COBS)[16] rather than variable-overhead byte stuffing techniques such as the one used by SLIP.
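To make the variable overhead concrete, here is a minimal sketch of SLIP-style byte stuffing. The four special byte values are the ones defined in RFC 1055; the function names are our own, and the sketch only appends a terminating END byte (it omits the optional leading END that RFC 1055 also suggests).

```python
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD  # values from RFC 1055


def slip_encode(data):
    """Escape END and ESC bytes in data, then terminate the frame with END."""
    out = bytearray()
    for b in data:
        if b == END:
            out += bytes([ESC, ESC_END])
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(b)
    out.append(END)
    return bytes(out)


def slip_decode(frame):
    """Undo slip_encode; stops at the first END byte."""
    out = bytearray()
    i = 0
    while i < len(frame) and frame[i] != END:
        b = frame[i]
        if b == ESC:
            i += 1
            if i >= len(frame) or frame[i] not in (ESC_END, ESC_ESC):
                raise ValueError("malformed SLIP escape")
            b = END if frame[i] == ESC_END else ESC
        out.append(b)
        i += 1
    return bytes(out)


# Ten "harmless" bytes grow by 1 byte; ten END bytes grow by 11 bytes.
best = len(slip_encode(b"\x00" * 10))    # 11
worst = len(slip_encode(b"\xc0" * 10))   # 21
```

The worst case roughly doubles the payload, which is why buffers for this kind of encoding need to be sized for about twice the largest raw data chunk.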

Calculate the CRC and append it to the packet *before* encoding both the raw data and the CRC with COBS.[17]
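The ordering matters because the CRC bytes can themselves contain the reserved byte, so they need stuffing just like the data. The following sketch shows COBS encoding and decoding together with that ordering. The use of `0x00` as the frame delimiter follows the usual COBS convention; the choice of CRC-32 and the names `make_frame` and `read_frame` are assumptions made for this example.

```python
import zlib


def cobs_encode(data):
    """Encode data so the result contains no 0x00 bytes."""
    out = bytearray()
    block = bytearray()
    ended_on_full_block = False
    for b in data:
        ended_on_full_block = False
        if b == 0:
            out.append(len(block) + 1)      # code byte: distance to this zero
            out += block
            block.clear()
        else:
            block.append(b)
            if len(block) == 254:           # longest run one code byte can cover
                out.append(0xFF)
                out += block
                block.clear()
                ended_on_full_block = True
    if not ended_on_full_block:
        out.append(len(block) + 1)
        out += block
    return bytes(out)


def cobs_decode(encoded):
    """Reverse cobs_encode; raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        block = encoded[i + 1:i + code]
        if code == 0 or len(block) != code - 1 or 0 in block:
            raise ValueError("malformed COBS data")
        out += block
        i += code
        if code != 0xFF and i < len(encoded):
            out.append(0)                   # the zero this code byte stood for
    return bytes(out)


def make_frame(payload):
    """Append the CRC first, then COBS-encode data and CRC together."""
    crc = zlib.crc32(payload).to_bytes(4, "big")
    return cobs_encode(payload + crc) + b"\x00"   # 0x00 marks the end of the frame


def read_frame(frame):
    """Return the payload of one delimited frame, or None if the CRC fails."""
    body = cobs_decode(frame.rstrip(b"\x00"))
    payload, crc = body[:-4], body[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None
```

For example, `cobs_encode(bytes([0x11, 0x22, 0x00, 0x33]))` gives `bytes([0x03, 0x11, 0x22, 0x02, 0x33])`. The overhead is one byte for short inputs and at most one extra byte per 254 bytes of input, regardless of content.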

Preamble

Two popular approaches to preambles are:

Automatic baud rate detection


For further reading

  1. Andy McFadden. "Designing File Formats"
  2. Wikipedia: elementary stream
  3. Gabriel Bouvigne. "MPEG Audio Layer I/II/III frame header"
  4. Predrag Supurovic. "MPEG Audio Frame Header"
  5. uLan: 9-bit message oriented communication protocol, which is transferred over RS-485 link.
  6. Pavel Pisa. "uLan RS-485 Communication Driver" "9-bit message oriented communication protocol, which is transferred over RS-485 link."
  7. Peter Gasparik. "9-bit data transfer format"
  8. Stephen Byron Cooper. "9-Bit Serial Protocol".
  9. "Use The PC's UART With 9-Bit Protocols". 1998.
  10. Wikipedia: multidrop bus (MDB) is a 9-bit protocol used in many vending machines.
  11. ParitySwitch_9BitProtocols: manipulate parity to emulate a 9 bit protocol
  12. "Use The PC's UART With 9-Bit Protocols". Electronic Design. 1998-December.
  13. Thomas Lochmatter. "Linux and MARK/SPACE Parity". 2010.
  14. J. Robinson. RFC 935. "Reliable link layer protocols". 1985. quote: "once a header error has been detected, the count field must be assumed to be invalid, and so there must be a unique character sequence that introduces the next header in order that the receiver can regain synchronization with the sender."
  15. Wikipedia: CRC-based framing
  16. "Consistent Overhead Byte Stuffing" by Stuart Cheshire and Mary Baker, 1999.
  17. Jason Sachs. "Help, My Serial Data Has Been Framed: How To Handle Packets When All You Have Are Streams". 2011.
"CMX-MicroNet is the first system that allows TCP/IP and other protocols to be run natively on small processors ... [including] AVR, PIC 18, M16C."
This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.