bryanenglish.com

Butts, and The Internet

23 Oct 2019

Note: This article first appeared on dev.to

First… Butts

Let’s talk about butts. Specifically horse butts.

There’s a story that’s been going around the Internet, probably since its inception, about how the dimensions of spaceship parts are indirectly derived from the width of a horse’s butt. The short version is something like this:

Roman chariots were typically pulled by two horses. Therefore chariots were about two horse butts wide. When the Romans built stone roads, they put ruts in to keep the carriages aligned. They used the existing widths of ruts as a guideline for the stone ruts, and then carriages throughout Europe were built for those new stone ruts. Fast forward to the industrial age and you find that European railroad builders used the ruts in the stone carriageways as a guide for how wide to build the rails (i.e. the gauge). This meant a whole bunch of trains in Europe being built approximately two horse butts wide. When America started building railroads, European engineers were brought over and used the same measurements. A final fast forward to the space age, and you’ve got rocket boosters for the Space Shuttle being transported by rail through rail tunnels, and so they need to be skinny enough to fit through those tunnels.

A typical two-horse-butt train tunnel.

There are some obvious problems with this. The United States didn’t have a common track gauge until after the Civil War, and it was only chosen because it happened to be the only one used in the North. Even more glaring here is the fact that train tunnels are quite a bit wider than train tracks, and in fact were not an issue they had to design around for the shuttle rocket boosters. Snopes does a great job of tearing this one down.

As it turns out, simple boring happenstance is the main reason these things seem to line up as they do.

CR+LF

Way back in the 1960s, the ISO and ANSI (then called ASA) were in the process of standardizing character sets. Part of any character set is a way to indicate that text needs to continue on a new line. The two contenders were the two-byte CR+LF combo and a single LF on its own. In C-like languages, these are written as \r\n and \n respectively. The ISO drafts allowed either CR+LF or LF. The ASA draft only used CR+LF. Either way, both standards supported a two-character sequence for producing the new-line effect.

But why? Surely one character ought to do it. Indeed, in most use cases today, we only use LF, so what was the need?

Teletype Model 33
This thing is why we have CR+LF

As it turns out, a lot of computing at the time was done using Teletype Model 33 ASR machines as terminals for input and output. These machines required both instructions in order to bring the printer head back to the start of the line, using CR (“carriage return”), and down one line, using LF (“line feed”).
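To make the two motions concrete, here is a toy simulation of a printer that treats CR and LF as separate instructions. It's only a sketch: the function name teletype_print and the width parameter are made up for illustration, and a real Model 33 had plenty of behavior this ignores.

```python
def teletype_print(stream, width=20):
    """Toy model of a printer whose head position and paper feed move independently."""
    page = [[" "] * width]  # the paper, one row to start with
    row, col = 0, 0         # current paper row and head column
    for ch in stream:
        if ch == "\r":
            # Carriage return: the head slides back to the left margin.
            col = 0
        elif ch == "\n":
            # Line feed: the paper advances one row; the head stays put.
            row += 1
            if row == len(page):
                page.append([" "] * width)
        elif col < width:
            # Printable character: strike it, then step the head right.
            # (Simplification: anything past the right margin is dropped.)
            page[row][col] = ch
            col += 1
    return ["".join(r).rstrip() for r in page]


for label, text in [("CR+LF", "AB\r\nCD"), ("LF only", "AB\nCD"), ("CR only", "AB\rCD")]:
    print(label, teletype_print(text))
```

With LF alone the second line starts wherever the head happened to be, and with CR alone the second line gets typed over the first. You need both to land at the start of a fresh line.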

We no longer use Teletype machines, and haven’t for some time. That hasn’t stopped the various twists and turns of history from keeping the CR+LF alive, long after its original technical need had been obsoleted.

When Unix arrived on the scene in the 1970s, it used the LF character alone to denote a new line in text files, taking the shorter option in the ISO specification. Despite this efficiency, later operating systems like MS-DOS and Windows preferred the CR+LF line delimiter, adhering to both standards.
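The practical upshot is that the same text file has different bytes depending on where it was written. A minimal sketch of converting between the two conventions, working on raw bytes so nothing gets translated behind our backs (the helper names to_lf and to_crlf are hypothetical):

```python
CR, LF = b"\r", b"\n"  # byte values 13 and 10


def to_lf(data):
    """Convert CR+LF line endings to the Unix-style single LF."""
    return data.replace(CR + LF, LF)


def to_crlf(data):
    """Convert to DOS/Windows-style CR+LF, without doubling existing CRs."""
    # Normalize to LF first so an existing CR+LF doesn't become CR+CR+LF.
    return to_lf(data).replace(LF, CR + LF)


mixed = b"first\r\nsecond\nthird\r\n"
print(to_lf(mixed))
print(to_crlf(mixed))
```

This deliberately leaves a lone CR untouched, since neither convention described here uses it as a line ending.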

In 1989, the earliest version of the World Wide Web was born, and with it, HTTP. As in any other text format, newlines needed to be represented. From HTTP/0.9 straight through to HTTP/1.1, the CR+LF was used to denote the end of an HTTP message, and in the case of later versions to delimit headers. Part of the reason the two-character form of newlines was used was the differences between operating system text formats. HTTP/2 and HTTP/3 now use a compressed binary header format that does not make use of the CR+LF to delimit headers, but since only about 41% of websites use HTTP/2, and HTTP/3 isn’t standardized yet, you’re still likely using CR+LF under the hood in 2019, regardless of which operating system you use.
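Here is roughly what that looks like on the wire for HTTP/1.1. This is a simplified sketch rather than a real HTTP implementation: build_request and parse_head are hypothetical helpers, and they skip everything a proper parser has to worry about (repeated headers, size limits, malformed input).

```python
CRLF = "\r\n"


def build_request(method, path, headers):
    """Assemble the head of an HTTP/1.1 request as bytes."""
    lines = ["{} {} HTTP/1.1".format(method, path)]
    lines += ["{}: {}".format(name, value) for name, value in headers.items()]
    # Each line ends in CR+LF, and an empty line marks the end of the headers.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


def parse_head(raw):
    """Split a raw message into (start line, headers dict, remaining body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("ascii").split(CRLF)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw = build_request("GET", "/", {"Host": "example.com"})
print(raw)
print(parse_head(raw))
```

Every \r\n in that output is a carriage return and a line feed, sent to a machine that has neither a carriage nor paper.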

Much like with horse butts, there’s a cool story of how some obsolete technology makes a surprise appearance in modern technology. Unlike the horse butts, we don’t need to squint in order to see it right there in the technology we’re using. In both cases, it’s still a matter of boring happenstance that caused an old weird technical decision to last many years longer than it should have.