I’m developing some code to work with Markov models, and as an early test I created a model of Shakespeare’s sonnets. Then I had the program generate some verse from the model. For example:
If they in thou, thy love, that old,
Thought, that which yet do the heat did excess
My love concord never saw my woeful taste
At ransoms you for that so unprovident:
For thereby beauty show’st I am fled,
Althoughts to the dead, that care
With should thus shall by their fair,
Where too much my jade,
Her loves, my heart.
The Markov model generates text one character at a time. The next character is selected based on the distribution of characters in the original source text, and also on the “order” of the model, which is the number of preceding characters taken into account. In an order 0 model, character selection is based entirely on the frequency of characters in the source text. If the letter ‘e’ appears 12% of the time in the source text, then it will appear approximately 12% of the time in the output. Output from an order 0 model is just a weighted random sampling of the characters in the source text. For example:
oaa,stpr iyt eimhr rttntnettohe Tt
nnnfIhwste aek oppe
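To make the order 0 case concrete, here is a minimal sketch of frequency-weighted sampling. It is not the program that produced the samples in this post; the function name `order0_sample` and the `seed` parameter are made up for illustration, and the source string is just a stand-in for the full text of the sonnets.

```python
import random
from collections import Counter

def order0_sample(source, length, seed=None):
    """Draw `length` characters, each weighted by its frequency in `source`.

    Hypothetical helper for illustration; assumes `source` is non-empty.
    """
    rng = random.Random(seed)
    counts = Counter(source)          # character -> number of occurrences
    chars = list(counts.keys())
    weights = list(counts.values())   # same order as `chars`
    # random.choices samples with replacement, proportionally to the weights.
    return "".join(rng.choices(chars, weights=weights, k=length))

# Stand-in source text; a real run would read the sonnets from a file.
print(order0_sample("shall i compare thee to a summer's day", 40, seed=1))
```

Because each character is drawn independently, nothing in the output depends on what came before it, which is why the result looks like noise.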
An order 1 model bases character selection on the frequency of characters that follow the current character. That is, given the current character, the next character selected is guaranteed to be one that followed that character in the source text. The model cannot generate a bigram (sequence of two characters) that did not exist in the source. So, for example, the sequence ‘qz’ can’t appear in the generated text unless it was also in the source. Verse generated from an order 1 model is slightly better than what you’ll get from an order 0 model:
Myouivey inn f th peroeethe,
Touss s o by thy!
Ar temye corste d w.
Be wh s t, n rverose d th ms t,
I s s Ifiththin ay past lll s farenoulofan r, t thonard areall ngain,
Lorowor d thoodoref pathe,
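The order 1 rule can be sketched as a table that maps each character to the characters that followed it in the source. This is an illustrative sketch rather than the code behind the verse above; `build_order1` and `generate_order1` are hypothetical names, and starting from the first character of the source is an arbitrary choice.

```python
import random
from collections import defaultdict

def build_order1(source):
    """Map each character to the list of characters that followed it."""
    followers = defaultdict(list)
    for current, nxt in zip(source, source[1:]):
        # Duplicates are kept on purpose so that random.choice
        # picks each follower in proportion to how often it occurred.
        followers[current].append(nxt)
    return followers

def generate_order1(source, length, seed=None):
    """Generate up to `length` characters; assumes `source` is non-empty."""
    rng = random.Random(seed)
    followers = build_order1(source)
    current = source[0]               # arbitrary starting point
    out = [current]
    while len(out) < length:
        choices = followers.get(current)
        if not choices:               # dead end: nothing ever followed this character
            break
        current = rng.choice(choices)
        out.append(current)
    return "".join(out)

# Stand-in source text; a real run would use the full sonnets.
print(generate_order1("from fairest creatures we desire increase", 60, seed=3))
```

Every adjacent pair in the output was read out of the table, so each one is a bigram that occurred in the source. That is the guarantee described above.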
Things begin to improve quickly as you increase the order of the model. Order 2 bases character selection on the two preceding characters, and you start to see recognizable English words, although overall it looks more like some freaky foreign language:
Thich Must I winter,
The I do loodye ming to lin
So me lie stat so hou faut forse trom fore.
That, all shart and’s mor thee my swed thy head.
And thawarit, a strand sabudge,
To ale whal wart,
Order 3 verse contains primarily recognizable English words. At order 5 (the first sample above), almost all words are valid and you begin to see word pairs that existed in the original source text. The resulting verse is reminiscent of a drunk and confused dramatics major trying to impress a group of girls with his knowledge of Shakespeare.
Order 10 verse is like Shakespeare in a blender:
Thou blind fool Love, what dost thou abuse,
The bounteous largess given thee to give?
Profitless usurer why dost thou that I do fawn upon,
Nay if thou live remembered
My deepest sense, how hard true sorrow hits,
And sealed false bonds of love doth well denote,
Love’s fire heats water, water cools not love
Which alters when it alteration finds,
Or bends with thee, wherever I abide,
The guilty goddess of my passion,
A woman’s face with nature’s truth,
And delves the parallels in beauty dyed?
Both truth and beauty should look so.
Here you sometimes see complete sonnet lines. Other times you’ll see the first half of one line morph into the second half of another, or just pure randomness.
Things get less interesting beyond order 10, and by order 15 the thing starts spitting out complete sonnets exactly as they were fed in.
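One way to see why high orders end up copying the source is to count how many contexts leave the model no choice at all. The sketch below generalizes the lookup table to any order and measures the fraction of contexts with exactly one distinct successor. The names `build_model` and `forced_fraction` are hypothetical, and this is an assumption about how such a model could be built, not a description of my program.

```python
from collections import defaultdict

def build_model(source, order):
    """Map each `order`-character context to the characters that followed it."""
    model = defaultdict(list)
    for i in range(len(source) - order):
        context = source[i:i + order]
        model[context].append(source[i + order])
    return model

def forced_fraction(source, order):
    """Fraction of contexts that have exactly one distinct successor."""
    model = build_model(source, order)
    if not model:                     # source is too short for this order
        return 0.0
    forced = sum(1 for nxt in model.values() if len(set(nxt)) == 1)
    return forced / len(model)

# Tiny stand-in source; a real run would use the full sonnets.
text = "abcabd"
for order in (1, 2, 3):
    print(order, forced_fraction(text, order))
```

When that fraction gets close to 1, almost every step has only one possible next character, so the generator can do little except replay the text it was given. I would expect the sonnets to reach that point somewhere around the orders mentioned above, but I have not measured it with this sketch.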
I probably spent entirely too much time “testing” my program: generating verse and laughing at the results. But it was a welcome bit of humor in an otherwise somewhat dull day.