Natural language processing, the CARMEL toolkit, and the Zodiac Killer

You may have heard about the “computer that was taught to think like the Zodiac Killer,” which was featured on a History Channel series about the eponymous murderer and has received a fair amount of science journalism coverage. Incorrectly referred to as a “supercomputer” in several articles, it is actually a software package called “Carmel,” developed by Jonathan Graehl, Kevin Knight, and co-workers at the University of Southern California’s Information Sciences Institute, though I suppose they are running Carmel on some very powerful computers for the large-scale cipher decryption jobs that have attracted so much attention from the press. The Carmel toolkit was designed to address natural language processing problems, which we can think of as “computer hearing” or “computer reading” problems, in some ways analogous to the computer vision topics I’ve covered in previous blog entries. You can download the toolkit for non-commercial purposes here.

Technically, Carmel is a “finite state transducer” package. But what the heck does that mean? First, we should define a finite state machine, which can be either an abstract “machine” in the form of a piece of software or a real, physical machine. A finite state machine exists in exactly one state at a time and “transitions” to another state when given the appropriate input. That may sound abstract, but finite state machines are all around us. One classic example is a vending machine. Given the right amount of money and button presses, all in a specific sequence, the machine transitions from a resting state into a state in which it is vending a product. After vending, the machine transitions again, back to its resting state. Other common finite state machines include subway turnstiles, which transition from locked to open after a token is inserted and back to locked after the passenger passes through, and combination locks, which transition from locked to open following a specific series of numerical inputs.
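
To make this concrete, here is a minimal sketch of the turnstile example in Python. This is my own illustrative code (the state and input names are invented for the example); Carmel itself is not involved:

```python
# A minimal sketch of a finite state machine: the subway turnstile
# described above. Illustrative code only, not part of Carmel.

TRANSITIONS = {
    # (current state, input) -> next state
    ("locked", "coin"): "open",
    ("locked", "push"): "locked",  # pushing a locked turnstile does nothing
    ("open", "push"): "locked",    # passenger passes, turnstile re-locks
    ("open", "coin"): "open",      # an extra coin is simply wasted
}

def run(inputs, state="locked"):
    """Feed a sequence of inputs to the turnstile and trace its states."""
    for symbol in inputs:
        state = TRANSITIONS[(state, symbol)]
        print(f"{symbol} -> {state}")
    return state

run(["coin", "push", "push"])  # open, locked, locked
```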

Finite state machines can be “acceptors” or “transducers.” Acceptors, like our vending machine example above, produce an output only if a given input is accepted by the current state of the machine: the machine vends only if the correct inputs are provided; otherwise it just sits there. Transducers go beyond simply accepting or rejecting an input; instead, they convert, or transduce, an input signal into a different output signal. Finite state transducers find practical application in language processing jobs like automatic translation. The example below, taken from Kevin Knight and Yaser Al-Onaizan’s primer on finite-state software, shows some simple finite state machines for language processing. On the left is an acceptor that accepts three simple inputs: “he saw me,” “he ran home,” and “she talked.” In the middle is a representation of a transducer that changes the input “big dog” to “small dog.” Finally, on the right is a representation of a transducer that might be used as (a very small) part of a speech-to-text application, where the phonetic sounds represented by the capital letters SH, AE, and K are converted into (possible) letter equivalents.

Schematic representation of example finite state machines. Left, a simple finite state acceptor; middle, a finite state transducer that converts the phrase “big dog” to “small dog;” right, a finite state transducer that converts phonemes (capital letters) to letters (lowercase). The symbol *e* indicates an empty transition label and is necessary here so that SH is translated only to “sh” and “s” alone is disallowed.

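Here is a toy Python sketch of the middle diagram: a transducer that rewrites “big dog” as “small dog.” The state names (q0, q1, q2) are my own labels for illustration, not Carmel’s actual file format:

```python
# A toy finite state transducer mirroring the middle diagram above.
# It rewrites "big dog" as "small dog" and rejects everything else.

FINAL = "q2"
ARCS = {
    # (state, input word) -> (output word, next state)
    ("q0", "big"): ("small", "q1"),
    ("q1", "dog"): ("dog", FINAL),
}

def transduce(words, state="q0"):
    out = []
    for w in words:
        if (state, w) not in ARCS:
            return None  # input rejected: no matching arc
        o, state = ARCS[(state, w)]
        out.append(o)
    return out if state == FINAL else None

print(transduce("big dog".split()))  # ['small', 'dog']
print(transduce("big cat".split()))  # None (rejected)
```

Note that the same machine acts as an acceptor and a transducer at once: inputs that don’t match an arc are rejected outright, while accepted inputs come out transformed.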

The examples above are “unweighted” machines. To address practical language processing problems, “weighted” or “probabilistic” machines are needed. Simply put, probabilistic acceptors and transducers are unweighted machines with weights added to the transitions. If these weights reflect the probability of words appearing in specific sequences in a natural language like English, then we have begun to create a “language model.” A language model that predicts the relative probability of specific words occurring in a given sequence is created by first building a vocabulary from a large body of the target language in normal use and syntax. The model can then be trained to recognize how frequently specific word combinations occur. This is useful in voice-recognition algorithms: for example, a program might distinguish the phonetically similar “I scream” from “ice cream” based on the surrounding words. The training process is conceptually akin to the training of neural networks that I have covered in previous posts.
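
As a hedged illustration of that bigram idea, the sketch below estimates P(next word | current word) from a tiny made-up corpus and uses those estimates to compare the two readings. Real systems use enormous corpora and smoothing; this shows only the core counting-and-scoring step:

```python
# A toy bigram language model: count adjacent word pairs in a small
# corpus, then score two phonetically confusable transcriptions.

from collections import Counter

corpus = (
    "i scream when i see a spider . "
    "we all love ice cream . "
    "she ate ice cream after dinner ."
).split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """P(w2 | w1), estimated by maximum likelihood."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

def phrase_score(words):
    """Product of bigram probabilities across the phrase."""
    score = 1.0
    for w1, w2 in zip(words, words[1:]):
        score *= bigram_prob(w1, w2)
    return score

# Which reading better fits the context "ate ... after"?
print(phrase_score("ate ice cream after".split()))  # 0.5
print(phrase_score("ate i scream after".split()))   # 0.0
```

Given the surrounding words “ate … after,” the “ice cream” reading gets a nonzero score while “I scream” scores zero, which is exactly the disambiguation described above.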

Another useful application of trained, linked, and nested networks of finite state machines like those that can be created with Carmel is cipher decryption. In 2011, Carmel was used to successfully decrypt a manuscript known as the Copiale cipher, which consists of 75,000 handwritten characters in a substitution cipher and had remained undeciphered for more than 260 years. One of the initial steps was to identify the language of the plaintext version of the manuscript. This was accomplished by comparing character usage patterns in the manuscript to those of known languages, together with a hint in the form of one of the few unencrypted words in the manuscript, “Phillipp,” a German spelling. After determining the original language, and with some intuitive guidance from Kevin Knight, the Copiale cipher was decoded in a matter of weeks using a “brute force” computational method based on a network of finite state machines trained on German letter and word usage patterns and managed by the Carmel package.
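
To give a flavor of what “brute force” decipherment means here, the sketch below scores candidate substitution keys with a character-bigram language model and hill-climbs toward better ones. This is a simplified stand-in of my own: Carmel actually expresses the problem as cascades of weighted finite state machines, and a real attack needs richer models and many random restarts.

```python
# A sketch of substitution-cipher decipherment: propose a key, score
# the resulting plaintext with a character-level language model, and
# keep changes that improve the score.

import math
import random
import string
from collections import Counter

def train_bigram_model(text):
    """Log-probabilities of letter pairs, with add-one smoothing."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(zip(letters, letters[1:]))
    total = sum(counts.values())
    vocab = 26 * 26
    return {
        pair: math.log((counts.get(pair, 0) + 1) / (total + vocab))
        for pair in ((a, b) for a in string.ascii_lowercase
                     for b in string.ascii_lowercase)
    }

def score(text, model):
    # Assumes text contains only lowercase letters.
    return sum(model[pair] for pair in zip(text, text[1:]))

def decrypt(ciphertext, key):
    """Apply a candidate cipher-to-plain letter mapping."""
    return ciphertext.translate(str.maketrans(string.ascii_lowercase, key))

def hill_climb(ciphertext, model, iters=20000):
    key = list(string.ascii_lowercase)
    random.shuffle(key)
    best = score(decrypt(ciphertext, "".join(key)), model)
    for _ in range(iters):
        i, j = random.sample(range(26), 2)
        key[i], key[j] = key[j], key[i]      # swap two key letters
        s = score(decrypt(ciphertext, "".join(key)), model)
        if s > best:
            best = s
        else:
            key[i], key[j] = key[j], key[i]  # revert the swap
    return "".join(key)
```

The Copiale solution followed the same broad logic, with the language model trained on German once the plaintext language had been identified.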

That brings us to the Zodiac Killer, who was active in northern California in the late 1960s and early 1970s. The Zodiac Killer is probably most famous for the taunting letters he sent to the press. In addition to taunts and threats, these letters contained apparently encoded text. The first Zodiac cipher was decrypted within days by a schoolteacher and his wife, but three other Zodiac ciphers have remained unsolved to this day. In an attempt to decode the remaining ciphers, Carmel was used to train a network of finite state machines on the specific writing style of the Zodiac Killer, who sent many letters in plaintext to newspapers and the police while active. Although the ciphers appear not to have been solved yet, an interesting side effect has been the creation of an algorithm that can recreate the verbal style of the Zodiac Killer. Knight and co-workers have even created a tool you can use to make your own (bad) creepy Zodiac poetry (although I haven’t been able to get this to work in my browser…). Here’s an example, with a rough sketch of the general idea after the poem.

Mourning moaning mournful whispers weeping,
Together through the sleepless strangers slumber,
Alone and lonely lovers sleeping thieving,
A helpless dead forgotten former lover.

Through the taxi and the prison break,
Alone and angry at a brutal murder.
Surrounded by an artificial lake,
Never a convicted murderer…
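
As promised, here is a crude sketch of this kind of style mimicry: a word-level Markov chain that samples each next word from bigram statistics of a source text. Knight’s generator is far more sophisticated than this (and, judging from the rhymes above, also handles rhyme and meter), but the sketch conveys the flavor; the sample text here is just a stand-in for a real training corpus:

```python
# A crude word-level Markov chain for style mimicry: record which
# words follow which, then sample a chain of followers at random.

import random
from collections import defaultdict

def build_chain(text):
    chain = defaultdict(list)
    words = text.split()
    for w1, w2 in zip(words, words[1:]):
        chain[w1].append(w2)
    return chain

def generate(chain, start, length=8):
    word, out = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: no observed follower
        word = random.choice(followers)
        out.append(word)
    return " ".join(out)

# Hypothetical usage: in practice you would train on the killer's
# plaintext letters; any text works for demonstration.
sample = "alone and lonely lovers sleeping alone and angry at a brutal murder"
print(generate(build_chain(sample), "alone"))
```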