I recently received an e-mail from one Zeno Luiz Iensen Nadal,
a worker for Siemens in Brazil. He asked "My Algorythms teacher
asked me and my colleagues 'Why a byte has eight bits?' Is there
a technical answer for that?"
Of course I could not resist a reply to someone named Zeno, after
that teacher of ancient times. Some people copied on the reply
thought it a useful document, so (having done the hard work already)
I add it to my site as a further bite of history.
I am way behind in my work, but I just cannot resist trying to
answer your question on why a "byte" has eight bits.
The answer is that some do, and some don't. But that takes
explaining, as follows:
If computers worked entirely in binary (and some did a long time
ago), and did nothing but calculations with binary numbers, there
would be no bytes.
But to use and manipulate character information we must have
encodings for those symbols. And much of this was already known
from punch card days.
The punch card of IBM (others existed) had 12 rows and 80
columns. Each column was assigned to a symbol, a term I use here
although they have fancier names nowadays because computers have
been used in so many new ways.
The rows, going down, starting from the top, were
12-11-0-1-2-3-4-5-6-7-8-9. A punch in the 0 to 9 rows signified
the digits 0-9. A group of columns could be called a "field",
and a number in such a field could carry a plus sign for the
number (an additional punch in top row 12 of the units position
of the number), or a minus sign (an additional punch in row 11
just under that).
Then they started to need alphabets. This was accomplished by
adding the 12 punch to the digits 1-9 to make letters A through
I, the 11 punch to make letters J through R. For S through Z
they added the 0 punch to the digits 2 through 9 (the 0-1
combination was skipped -- 3x9=27, but the English alphabet has
only 26 letters). The 12, 11, and 0 punches were called "zones",
and you'll notice them today lurking in the high-order 4 bits.
Remember that this was much prior to binary representations of
those same characters.
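
As a rough illustration of the zone-and-digit scheme just described, here is
a minimal Python sketch. The function name and the (zone, digit) tuple it
returns are assumptions made for this example only; it covers just the 26
upper-case letters and says nothing about how any real card equipment was
built.

```python
# Illustrative sketch of the zone-plus-digit letter scheme described above.
# The function name and return format are assumptions for this example.
def letter_punches(letter):
    """Return the (zone row, digit row) punched for an upper-case letter A-Z."""
    index = ord(letter) - ord("A")
    if not 0 <= index < 26:
        raise ValueError("only A-Z are covered by this sketch")
    if index < 9:
        # A through I: 12 punch plus digits 1-9
        return (12, index + 1)
    if index < 18:
        # J through R: 11 punch plus digits 1-9
        return (11, index - 9 + 1)
    # S through Z: 0 punch plus digits 2-9 (the 0-1 combination is skipped)
    return (0, index - 18 + 2)


for ch in "AIJRSZ":
    print(ch, letter_punches(ch))
```

Under these assumptions "A" comes out as (12, 1), "R" as (11, 9), and "S" as
(0, 2), matching the three groups of letters in the text.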
The first bonus was that the 12 and 11 punches without any 0-9
punch gave us the characters + and -. But no other punctuation
was represented then, not even a period (dot, full stop) in IBM
or telecommunication equipment. One can see this in early
telegrams, where one said "I MISS YOU STOP COME HOME STOP".
"STOP" stood for the period the machine did not have.
Then punctuation and other marks had combinations of punches
assigned, but there had to be 3 punches in a column to do this.
In most cases the third punch was an extra "8".
In this way, with 10 digits, 26 alphabetic, and 11 others, IBM
got to 47 characters. UNIVAC, with different punch cards (round
holes, not rectangles, and 90 columns, not 80) got to about 54.
But most of these were commercial characters. When FORTRAN came
along, they needed, for example, a "divide" symbol, and an "="
symbol, and others not in the commercial set. So they had to use
an alternate set of rules for scientific and mathematical work. A
set of FORTRAN cards would cause havoc in payroll!
With many early computers these punch cards were used as input
and output, and inasmuch as the total number of characters
representable did not exceed 64, why not use just 6 bits each to
represent them? The same applied to 6-track punched tape for
teletypes.
In this period I came to work for IBM, and saw all the confusion
caused by the 64-character limitation. Especially when we started
to think about word processing, which would require both upper
and lower case. Add 26 lower case letters to 47 existing, and
one got 73 -- 9 more than 6 bits could represent.
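
The arithmetic behind that limit can be checked with a short Python sketch.
The helper name bits_needed is an assumption for this example; it simply
asks how many bits a fixed-width code needs to give every character its own
value.

```python
# Sketch of the character-count arithmetic above; bits_needed is a
# hypothetical helper name, not something from the original text.
def bits_needed(n_symbols):
    """Smallest number of bits that gives n_symbols distinct codes."""
    if n_symbols < 1:
        raise ValueError("need at least one symbol")
    return max(1, (n_symbols - 1).bit_length())


existing = 10 + 26 + 11        # digits, letters, others: 47 characters
with_lower = existing + 26     # add lower case: 73 characters

print(existing, bits_needed(existing))       # 47 fits in 6 bits
print(with_lower, bits_needed(with_lower))   # 73 needs 7 bits
print(with_lower - 2 ** 6)                   # 9 more than 6 bits can hold
```

So the 47-character set sits comfortably inside 6 bits, while adding lower
case pushes the count past 64 and forces at least a seventh bit.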
I even made a proposal (in view of STRETCH, the very first computer
I know of with an 8-bit byte) that would extend the number of punch
card character codes to 256 [1]. Some folks took it seriously.
I thought of it as a spoof.
So some folks started thinking about 7-bit characters, but this
was ridiculous. With IBM's STRETCH computer as background,
handling 64-bit words divisible into groups of 8 (I
designed the character set for it, under the guidance of Dr.
Werner Buchholz, the man who DID coin the term "byte" for an
8-bit grouping) [2], it seemed reasonable to make a universal
8-bit character set, handling up to 256. In those days my mantra
was "powers of 2 are magic". And so the group I headed developed
and justified such a proposal [3].
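
One way to see why a power of 2 looked attractive is to check how evenly
different character widths pack into a machine word. The Python sketch below
takes a 64-bit word as its working assumption; the constant and function
names are invented for the illustration.

```python
# Sketch only: WORD_BITS = 64 is an assumed word length, and fit() is a
# hypothetical helper, not part of any historical design document.
WORD_BITS = 64


def fit(word_bits, char_bits):
    """Return (characters per word, leftover bits)."""
    return divmod(word_bits, char_bits)


for char_bits in (6, 7, 8):
    count, spare = fit(WORD_BITS, char_bits)
    print(f"{char_bits}-bit characters: {count} per word, {spare} bits left over")
```

With these assumptions, 6-bit characters leave 4 bits unused in each word and
7-bit characters leave 1, while 8-bit characters fill the word exactly.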
That was a little too much progress when presented to the
standards group that was to formalize ASCII, so they stopped short
for the moment with a 7-bit set, or else an 8-bit set with the
upper half left for future work.
The IBM 360 used 8-bit characters, although not ASCII directly.
Thus Buchholz's "byte" caught on everywhere. I myself did not
like the name for many reasons. The design had 8 bits moving around
in parallel. But then came a new IBM part, with 9 bits for
self-checking, both inside the CPU and in the tape drives. I
exposed this 9-bit byte to the press in 1973. But long before
that, when I headed software operations for Cie. Bull in France
in 1965-66, I insisted that "byte" be deprecated in favor of
"octet".
You can notice that my preference then is now the preferred term.
It is justified by new communications methods that can carry 16, 32,
64, and even 128 bits in parallel. But some foolish people now
refer to a "16-bit byte" because of this parallel transfer, which
is visible in the UNICODE set. I'm not sure, but maybe this
should be called a "hextet".
But you will notice that I am still correct. Powers of 2
are still magic!
REFERENCES
[1] R.W.Bemer, "A proposal for a generalized card code of 256 characters",
    Commun. ACM 2, No. 9, 19-23, 1959 Sep
    -- Computing Reviews 00025
    Early public hint of 8-bit bytes to come.
[2] R.W.Bemer, W.Buchholz, "An extended character set standard",
    IBM Tech. Pub. TR00.18000.705, 1960 Jan, rev. TR00.721, 1960 Jun
    -- Computing Reviews 00813
[3] R.W.Bemer, H.J.Smith, Jr., F.A.Williams,
    "Design of an improved transmission/data processing code",
    Commun. ACM 4, No. 5, 212-217, 225, 1961 May
    -- Computer Abstracts 61-1920
    ASCII in its original form.