Markov chains and WonderTao

Three years ago I wrote a small Python script as a bit of fun. The code combines Alice’s Adventures in Wonderland with the Tao Te Ching using a Markov chain algorithm to generate pseudo-profound snippets of nonsense. These get tweeted to the world through the @WonderTao account.

Markov chains are kind of amazing. The underlying principle is straightforward, but it can be hard to wrap your head around at first. From Wikipedia:

Markov chain […] is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of ‘memorylessness’ is called the Markov property. Markov chains have many applications as statistical models of real-world processes.

The key to understanding how Markov chains work is the line, “the next state depends only on the current state and not on the sequence of events that preceded it.” The basic idea is that you pull in a whole bunch of sequence data, split it into a series of small components, examine the likelihood of each part preceding every other part and then store the results in a sort of frequency dictionary.

The input sequence data might be a collection of discrete observations, like a series of weather conditions (e.g. “clear”, “clear”, “cloudy”, “showers”, “rain”, “rain”…), or perhaps words from a text (e.g. “Alice was beginning to get very tired of sitting by her sister on the bank…”). You take the input sequences, split them into a series of token keys that constitute all possible states, then establish the probability that each token will precede every other element.
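That frequency dictionary can be sketched in a few lines of Python. This is a minimal toy, not the actual WonderTao code: it takes the hypothetical weather observations above and records, for each state, which states were seen to follow it.

```python
from collections import defaultdict

# A toy sequence of weather observations (hypothetical data).
observations = ["clear", "clear", "cloudy", "showers", "rain", "rain", "clear"]

# Map each state to the list of states observed to follow it.
# Repeats in the list naturally encode the transition frequencies.
transitions = defaultdict(list)
for current, following in zip(observations, observations[1:]):
    transitions[current].append(following)

print(dict(transitions))
# "clear" was followed by "clear" once and "cloudy" once, and so on.
```

Storing followers as a plain list (with repeats) is a lazy but convenient trick: picking uniformly at random from the list automatically weights the choice by observed frequency.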

To create @WonderTao I used pairs of words as token keys. So, for example, the opening words of Alice’s Adventures in Wonderland, “Alice was beginning to get very tired of sitting by her sister on the bank…”, are split into (“Alice”, “was”), (“was”, “beginning”), (“beginning”, “to”), (“to”, “get”), (“get”, “very”) and so on. Between the two source texts, there are eight possible words that follow the token key (“Alice”, “was”):

  1. “Alice was beginning…”,
  2. “Alice was not…”,
  3. “Alice was soon…”,
  4. “Alice was more…”,
  5. “Alice was just…”,
  6. “Alice was a…”,
  7. “Alice was very…”,
  8. “Alice was too…”.
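Building that two-word-key table looks something like the following sketch (again a simplified illustration, not the repository code), using just the opening sentence as input:

```python
from collections import defaultdict

text = ("Alice was beginning to get very tired of sitting "
        "by her sister on the bank")
words = text.split()

# Map each two-word token key to the words observed to follow it.
chain = defaultdict(list)
for w1, w2, w3 in zip(words, words[1:], words[2:]):
    chain[(w1, w2)].append(w3)

print(chain[("Alice", "was")])
# With only this one sentence as input, the sole follower is "beginning";
# feeding in both full source texts yields the eight candidates above.
```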

My implementation ignores probabilities and simply picks a following word at random. In this case, let’s imagine that the algorithm picks “not”. The sentence becomes “Alice was not…” and the current two-word token key is shuffled to the final two words: (“was”, “not”). This new key is associated with the words “a”, “here”, “marked”, “easy” and “vainly”. Picking a word at random again, the sentence might become “Alice was not here…”.
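The generation step described above can be sketched like this. The tiny hand-built table and the `generate` helper are illustrative inventions, not the WonderTao source; the real table is built from both books:

```python
import random

# A tiny hand-built chain (real keys from the example above,
# follower lists abridged for illustration).
chain = {
    ("Alice", "was"): ["not", "beginning", "soon"],
    ("was", "not"): ["a", "here", "marked", "easy", "vainly"],
}

def generate(chain, key, max_words=10):
    """Walk the chain: append a random follower, then shuffle the
    two-word key along to the final two words of the sentence."""
    words = list(key)
    while key in chain and len(words) < max_words:
        next_word = random.choice(chain[key])
        words.append(next_word)
        key = (key[1], next_word)  # shift key to the last two words
    return " ".join(words)

sentence = generate(chain, ("Alice", "was"))
print(sentence)  # e.g. "Alice was not here"
```

The walk stops when it reaches a key with no recorded followers, which is why short seed tables like this one produce short sentences.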

Hmmm… I guess that is still a bit confusing.

Here is what I like about @WonderTao. I have almost no control over what comes out the other end. Most of the output is garbage. However, it frequently generates text that is amusing and sometimes even profound. When I find something that appeals to me, I queue it up in a tweet scheduler service and eventually it gets sent out to the world. The algorithm plays with my expectations. Markov chains are simple but unpredictable mathematical constructs. The reason I like this silly thing is because, even though I understand everything about it, I have no idea what it will do. And there is delight in novel surprises.


The WonderTao code is available on GitHub. I have not included any source texts in the repository, but they can easily be sourced from places like Project Gutenberg or elsewhere on the web.
