Life as Digital Information
James Watson, one of the two scientists credited with the
discovery of the structure of DNA, was once asked by a journalist to summarize the
significance of his discovery in a single sentence. Watson thought hard for a
moment, and then replied, "All life is digital information."
Living in the 21st century, we are deeply
embedded in a digital world. From
playing music on our MP3 players to checking our e-mail, we have become
increasingly dependent on digital information. When we check e-mail, we take
for granted what is happening behind the scenes within the buzzing box on our
desk. Recall that a computer relies on
two essential components: information, and a way to interpret that information
and translate it into function. A computer stores information in bits, each of
which is a simple distinction between "on" and
"off," or "yes" and "no." With a
sufficient number of bits, a computer can represent and store large numbers,
images, or words in a language. That information is
useless, however, unless the computer also has rules for interpreting and manipulating
it. As users, we install software, or programs, on our computers to
supply those rules.
Language is just one way to convey information. In the
English language, the smallest unit of information is a letter. However, the smallest unit of useful information is
a word. Combining our simple list of twenty-six letters in a variety of ways
will make many different words, but not all of the possible combinations of
letters are allowed. In fact, the
average person uses only 3,500 to 4,000 words to communicate. Just as random letters don’t always create a meaningful word, a string of words doesn’t necessarily
convey information. It is the rules of
language that dictate how words can be combined. The results can be as beautiful and varied as
a haiku poem or a long novel.
Life, by nature, is very complex. Just as a novel consists of thousands of
words that are created from a limited alphabet of letters, the instructions for
life are written using an alphabet of just four letters. These letters are simply molecules that we
call bases: adenine (A), cytosine (C),
guanine (G), and thymine (T). Each base
is combined with a sugar molecule and a phosphate molecule to make a larger
molecule that is called a nucleotide.
The nucleotides bind together to make a long string. Two of these strings can then line up side by
side. The bases of the nucleotides in
one string bind to the bases in the other string, like the rungs that join the
two sides of a ladder together. The
structure of the DNA molecule, this double-stranded ladder that twists like a
spiral staircase, was the major discovery credited to James Watson and Francis
Crick in the 1950s.
This simple alphabet of just four letters can easily be
represented digitally as bits. For
instance, if we had just two letters, we could represent each of them
with a single bit: 0 for one letter and 1 for the other. However, in this case we have four
letters, so we need two bits for each letter. We can use pairs of 1s
and 0s to represent the four letters of our alphabet as follows:
A = 1 1
T = 1 0
C = 0 1
G = 0 0
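To see the table in action, here is a minimal Python sketch that turns a string of bases into bits and back again. The function names encode and decode are our own, chosen only for illustration, and the example sequence is made up.

```python
# Two-bit code for each base, taken from the table above.
BASE_TO_BITS = {"A": "11", "T": "10", "C": "01", "G": "00"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(sequence):
    """Turn a string of bases into a string of 0s and 1s."""
    return "".join(BASE_TO_BITS[base] for base in sequence.upper())


def decode(bits):
    """Turn a string of 0s and 1s back into bases, two bits at a time."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    print(encode("GATTACA"))          # 00111010110111
    print(decode("00111010110111"))   # GATTACA
```

Seven letters become fourteen bits, and nothing is lost on the way back.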
Now that we have an alphabet, we need a set of rules to help
make sense of the alphabet. The genetic
code that governs all life has only a couple of rules. The first rule concerns
the actual structure of the molecules that make up DNA. The shape of the molecules A, T, G, and C
means that they can only combine to make the rungs of the ladder in a few ways
— adenine and thymine can combine with each other, and cytosine and guanine can
combine with each other. Other pairings are highly unlikely. So the four
possible rungs of the DNA ladder are AT, TA, CG, and GC. This means that one
side of the ladder is complementary to the other; you can predict the string of
molecules comprising one side of the ladder if you know the other side. The other rule relates to the way the
information contained in a DNA molecule is read. If we were to read only one
side of the ladder at a time, from top to bottom, we wouldn’t be sure where one
word ends or another begins. Life
eliminates this problem by reading the ladder three nucleotides at a time; each
word consists of only three letters. We
call each three-letter word a codon.
With only four letters, there are 4 × 4 × 4 = 64 ways that they can
combine in groups of three. Remember that each letter can be translated into
two bits of information. If there are
three letters in each codon, then a single codon represents 2 × 3 = 6 bits of
information.
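Both rules are simple enough to try out in a few lines of Python. The sketch below is only an illustration under simplifying assumptions: the names complement and codons are our own, the example strands are made up, and the code ignores details the text does not cover, such as the direction in which each strand is read and the signals that mark where reading starts.

```python
from itertools import product

# Pairing rule from the text: A pairs with T, and C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Predict the opposite side of the ladder, rung by rung."""
    return "".join(PAIR[base] for base in strand.upper())


def codons(strand):
    """Split a strand into three-letter words, dropping any leftover bases."""
    usable = len(strand) - len(strand) % 3
    return [strand[i:i + 3] for i in range(0, usable, 3)]


# Every possible three-letter word over the four-letter alphabet.
ALL_CODONS = ["".join(letters) for letters in product("ATCG", repeat=3)]

if __name__ == "__main__":
    print(complement("ATGCCG"))   # TACGGC
    print(codons("ATGCCGTA"))     # ['ATG', 'CCG']
    print(len(ALL_CODONS))        # 64
```

Listing every codon confirms the count of 64, and at two bits per letter each three-letter word carries six bits.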
We learned in basic biology that genes are the molecules in
our cells that serve as the blueprint for life.
Genes are made up of DNA, and can therefore be thought of as a sequence
of codons. We can think of a gene as
analogous to a paragraph in the English language. Each gene contains several
thousand bits of information. A human
has roughly 20,000 to 25,000 genes. Many simple organisms have far fewer. The total number of
base pairs (rungs of the ladder) in our DNA is about 3 billion, or 3 × 10^9. If we wanted to, we could calculate the
digital information content of the human genetic code. Each base is two bits,
so the information content is 2 × 3 × 10^9 = 6 × 10^9 bits, or 6 billion bits. Recall that 8 bits
equal one byte. So the information content is (6 × 10^9)/8 = 7.5 × 10^8 bytes, or
750 megabytes, about the information contained in 500 books or a little over an hour of
music on a compact disc. You could easily store the information of the human
genetic code on a personal computer or small memory card… in your pocket!
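The arithmetic is easy to check for yourself. This short Python sketch repeats the calculation, assuming the round figure of 3 billion base pairs used above and counting one megabyte as one million bytes; the function name genome_size_bytes is our own.

```python
BITS_PER_BASE = 2            # four letters need two bits each
BITS_PER_BYTE = 8
BASE_PAIRS = 3 * 10**9       # approximate figure quoted in the text


def genome_size_bytes(base_pairs=BASE_PAIRS):
    """Bytes needed to store one side of the ladder at two bits per base."""
    return base_pairs * BITS_PER_BASE // BITS_PER_BYTE


if __name__ == "__main__":
    total_bits = BASE_PAIRS * BITS_PER_BASE
    size = genome_size_bytes()
    print(total_bits)        # 6000000000
    print(size)              # 750000000
    print(size / 10**6)      # 750.0 (megabytes, taking 1 MB = 10**6 bytes)
```

Only one side of the ladder needs to be stored, because the pairing rule lets us rebuild the other side.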