Tuesday 26 January 2021

How to make breaking a simple substitution cipher harder

 



As you may be aware, simple substitution ciphers can be cracked quite easily, especially if your text is long. Still, there are ways to make things a little more difficult for any would-be code breakers.

Strategy one: Hide the spaces

Word lengths can clue someone in on standalone letters like ‘a’ and ‘I’, or common two- and three-letter words such as ‘so’, ‘to’, ‘on’, ‘the’, etc. One way to obscure these is to hide your spaces. You could remove them completely, but that can lead to ambiguity, as the reader can't tell where one word ends and the next begins. This is especially true with complicated words and place names.

You can also hide them by replacing each space in the plaintext with an uncommon letter, say, z, likezthis.zIfzyouzarezwritingzinzEnglish,zthezletterszZzandzQzarezsuitablezforzthis.
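Here is a minimal Python sketch of the idea, in case you want to try it. The function names (make_key, hide_spaces, encipher), the seed, and the sample message are all made up for illustration; any way of shuffling the alphabet would do.

```python
import random
import string

def make_key(seed=0):
    """Return a dict mapping each lowercase letter to a shuffled letter."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the key is repeatable
    return dict(zip(letters, shuffled))

def hide_spaces(plaintext, filler="z"):
    """Lowercase the text and replace every space with the filler letter."""
    return plaintext.lower().replace(" ", filler)

def encipher(text, key):
    """Apply the substitution; characters outside the key pass through."""
    return "".join(key.get(ch, ch) for ch in text)

key = make_key(seed=42)
prepared = hide_spaces("meet me at the old bridge")
ciphertext = encipher(prepared, key)

print(prepared)    # meetzmezatzthezoldzbridge
print(ciphertext)  # one solid block with no visible word breaks
```

Note that this only works cleanly if the filler letter doesn't appear in your message, otherwise the reader can't tell a real z from a hidden space.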

Once you cipher it, you'll get a solid block of text. This will make short words less obvious visually. However, it can be cracked, because when you do the frequency count, guess what will be the most numerous? The character you replaced space with. For example, this paragraph has fifty eight spaces and thirty five examples of the character E.
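To see this for yourself, here is a small sketch that counts characters in a ciphertext. I'm using the reversed alphabet (Atbash) as a stand-in for whatever key you would really use, and the sample sentence is invented; the point is only that the substitute for the space filler floats to the top of the count.

```python
from collections import Counter
import string

# Atbash (a<->z, b<->y, ...) stands in for any simple substitution key.
KEY = str.maketrans(string.ascii_lowercase, string.ascii_lowercase[::-1])

plaintext = "the cat sat on the mat and the dog sat on the log"
hidden = plaintext.replace(" ", "z")   # hide the spaces behind the letter z
ciphertext = hidden.translate(KEY)     # every z becomes a under this key

counts = Counter(ciphertext)
top_symbol, top_count = counts.most_common(1)[0]

print(top_symbol, top_count)           # a 12
print(plaintext.count(" "))            # 12, the number of hidden spaces
```

The most frequent ciphertext character lines up exactly with the number of spaces, so a code breaker who suspects hidden spaces can put them back with one guess.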

In addition to that, statistical data is available on the most common letter pairs and triples (bigrams and trigrams) in the English language. This data can also be used to break the cipher, again, provided your ciphertext is long enough.
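Counting pairs and triples takes only a few lines. The sketch below is a rough illustration: ngram_counts is a name I made up, Atbash again stands in for a real key, and a real attack would compare these counts against published English frequency tables rather than a toy sentence.

```python
from collections import Counter
import string

def ngram_counts(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# Atbash stands in for any simple substitution key.
KEY = str.maketrans(string.ascii_lowercase, string.ascii_lowercase[::-1])

plaintext = "the cat sat on the mat and the dog sat on the log"
ciphertext = plaintext.replace(" ", "z").translate(KEY)

pairs = ngram_counts(ciphertext, 2)
triples = ngram_counts(ciphertext, 3)

# "the" is enciphered as "gsv" under this key, and it still appears 4 times.
print(triples["gsv"])
print(pairs.most_common(3))
```

Because a simple substitution always replaces the same letter with the same symbol, repeated pairs and triples in the plaintext stay repeated in the ciphertext, hidden spaces or not.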

Strategy two: Change your alphabet

What we’ve done so far uses just the 26 letters of the English alphabet. You can mix it up by treating upper and lowercase letters as separate characters, so that ‘A’ and ‘a’ each get their own substitute.





You can make it even more complicated with numbers and punctuation, or even the space character. Or you could include symbols and Greek letters. I won't judge you.
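If you want to experiment, here is a minimal sketch of a key built over a bigger symbol set. The particular set (both letter cases, digits, five punctuation marks, and space), the function names, and the sample message are my own assumptions for illustration; pick whatever symbols suit your text.

```python
import random
import string

# Assumed symbol set: both letter cases, digits, some punctuation, and space.
SYMBOLS = string.ascii_letters + string.digits + ".,!?'" + " "

def make_key(symbols=SYMBOLS, seed=0):
    """Map each symbol to another symbol from the same set."""
    shuffled = list(symbols)
    random.Random(seed).shuffle(shuffled)  # seeded so the key is repeatable
    return dict(zip(symbols, shuffled))

def encipher(text, key):
    """Apply the substitution; characters outside the key pass through."""
    return "".join(key.get(ch, ch) for ch in text)

def decipher(text, key):
    """Undo the substitution by inverting the key."""
    inverse = {v: k for k, v in key.items()}
    return "".join(inverse.get(ch, ch) for ch in text)

key = make_key(seed=7)
message = "Meet me at 9, Anna!"
secret = encipher(message, key)

print(len(SYMBOLS))            # 68 symbols instead of 26
print(secret)
print(decipher(secret, key))   # Meet me at 9, Anna!
```

Since space is part of the symbol set here, it gets its own substitute and disappears into the ciphertext along with everything else.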

There are two reasons this approach works. First, the alphabet is larger, which means the frequency distribution has to be calculated over a larger number of characters. Frequency tables for such mixed alphabets are harder to obtain, and once you throw a few numbers and punctuation marks into the mix, they are all but impossible to find.

Second, a larger number of possible characters also means a code breaker needs more text for the typical distribution of letters to show up.

At the end of the day, though, you’re still working with a simple substitution cipher. The most numerous character will be space, followed by lowercase e. Based on that, a code breaker can crack the rest. That's a weakness of simple substitution ciphers in general. You could complicate things by substituting more kinds of characters, but there is a limit to how much that can do to secure your cipher.

If you have any other strategies to make breaking the cipher harder, please tell me more in the comments. 

You can follow me on Facebook here or on YouTube here.

See you next time!

