In previous blogs, we’ve looked at some interesting integer sequences. We’ve looked at the Fibonacci sequence, the Look-and-say sequence, Golomb’s sequence, Van Eck’s sequence and the Lazy Caterer’s sequence. Today, we’re looking at the Kolakoski Sequence.
This fun sequence can be found on the Online Encyclopedia of Integer Sequences here.
The Kolakoski Sequence
The sequence is named after recreational mathematician William Kolakoski, who described it in 1965, when he was only 20 years old. Kolakoski was also an artist. However the sequence is sometimes named the Oldenburger-Kolasinski sequence, after another American mathematician Rufus Oldenburger who was also associated with the sequence.
How to generate the sequence
The Kolakoski Sequence, much like Golomb’s sequence, is a self-describing sequence. The sequence consists of only 1s and 2s, and begins with 1. Each term in the sequence describes the length of the next run of the same number (either 1s or 2s). That probably didn’t make sense (as it didn’t to me the first time I read it), so let me show you what I mean…
The sequence begins with 1. So this means the first run of the same digit only contains 1 number. Since the first number is 1, this means there must only be one 1 in this run. And by definition, because this run must only contain one 1, the next number must be 2.
Because the second number is 2, that means that there are 2 of the same number in the next run. So the sequence will next contain another 2. And this term will be the end of this run of the same number.
Because the third term is 2, we know that the next run of the same number will have length 2. Because we know the third term was the end of the last run, the fourth term has to be 1.
The fifth term will also be 1, as this run has to have length 2.
Hopefully it’s making more sense now. But if not, maybe this graphic will help your understanding:
I’ve put the sequence on two separate lines. The top line will focus on the lengths of the next run of the same number. The bottom line will show these runs, and they will be colour-coded so you can see which numbers correspond.

So the black number in the top line shows how many black numbers appear in the bottom line, the red number in the top line shows how many red numbers appear in the bottom line, and so on for each of the colours.
Another effective way of displaying the sequence is below. Each run of numbers is underlined, and the length of that run is displayed below the run. What you will notice is the numbers below the sequence follow also follow the sequence perfectly!

But it gets even better than that. What if we did the same for the second row…

You’ll notice that we see the sequence again on the third row!
And if we did it again?

You guessed it. The sequence appears yet again. This will go on forever, and you can do this as many times as you like – it will always generate the same numbers in that same order. I know that maths being cool is relative, but that’s incredibly cool isn’t it?
Characteristics of the sequence
Obviously we already know that there are only 1s and 2s in the sequence, as this is how the sequence is defined. By this logic, it follows that we will never see more than two of the same number in a row – otherwise we would have a number larger than 2 in the sequence, which is not allowed.
The obvious question to ask about the Kolakoski sequence is does it contain more 1s or more 2s?
The answer to this question, as often is the case with these fascinating integer sequences, is that we don’t know (yet). Although mathematicians are working on it. They hypothesise that it tends to an exact 50/50 split.
What we can do, however is make a decent guess based on the evidence we have. Using some brilliantly written Python code I found on Stack Exchange, I generated 4 million terms in the sequence and counted how many 1s and 2s appeared. Click here for the code I used.
The table below shows the numbers (and percentages) of 1s and 2s the sequence generates in the first n terms:
| n | 1s | 2s | % 1s | % 2s | % difference |
| 100 | 49 | 51 | 49.0000% | 51.0000% | 2.0000% |
| 1,000 | 502 | 498 | 50.2000% | 49.8000% | 0.4000% |
| 10,000 | 4,996 | 5,004 | 49.9600% | 50.0400% | 0.0800% |
| 100,000 | 49,972 | 50,028 | 49.9720% | 50.0280% | 0.0560% |
| 1,000,000 | 499,986 | 500,014 | 49.9986% | 50.0014% | 0.0028% |
| 2,000,000 | 999,952 | 1,000,048 | 49.9976% | 50.0024% | 0.0048% |
| 3,000,000 | 1,500,021 | 1,499,979 | 50.0007% | 49.9993% | 0.0014% |
| 4,000,000 | 1,999,967 | 2,000,033 | 49.9992% | 50.0008% | 0.0017% |
You can see from the table that most of the time there are more 2s than 1s (except for the first 1,000 and 3 million terms, for some reason), and that the percentage difference between the number of 1s and 2s usually seems to narrow the more terms you consider. Except, curiously, for 2 million and 4 million (which I have underlined), where actually the gap increases with a million more terms. But you can see that in broad terms, without considering lots of decimal places, the percentages seem to tend to 50% each – as is suspected. I could try to do analyse more terms, but I don’t think my poor little laptop could cope with much more.
If you’re a coder and are interested in how I did this, I can share my Python code with you if you drop me an email at sam@metatutor.co.uk
I hope you found that blog interesting. I certainly found this sequence to be fascinating, it’s probably my favourite one so far (except for Fibonacci of course).
If you are in Bristol and need some additional in-person support, book a free taster session.
Click here to read testimonials from some of our clients (and here for Google reviews).
Click here to access our free worksheets.
