Tuesday 26 January 2021

How to make breaking a simple substitution cipher harder

 



As you may be aware, simple substitution ciphers can be cracked quite easily, especially if your text is long. Still, there are ways to make things a little difficult for any potential code breakers trying to crack it.

Strategy one: Hide the spaces

Word lengths can clue someone in on standalone letters like ‘a’ and ‘I’, or common doubles and triples such as ‘so’, ‘to’, ‘on’, ‘the’, etc. One way to obscure these is to hide your spaces. One option is to remove them completely, but that can lead to ambiguity, as you don't know where one word ends and the next begins. This is especially true with complicated words and place names.

You can also hide them in the ciphertext by replacing spaces with an uncommon letter, say, z in plaintext, likezthis.zIfzyouzarezwritingzinzEnglish,zthezletterszZzandzQzarezsuitablezforzthis.

Once you cipher it, you'll get a solid block of text. This will make short words less obvious visually. However, it can still be cracked, because when you do the frequency count, guess what will be the most numerous? The character you replaced the space with. For example, this paragraph has fifty-eight spaces and thirty-five examples of the character E.
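Here is a minimal Python sketch of that weakness. The sample sentence and the function name hide_spaces are made up for illustration, and z is assumed as the filler letter. Enciphering afterwards would only rename the characters, so the counts stay the same.

```python
from collections import Counter

def hide_spaces(plaintext: str, filler: str = "z") -> str:
    """Replace every space with an uncommon filler letter."""
    return plaintext.replace(" ", filler)

# Made-up sample sentence that contains no 'z' of its own.
sample = "you can hide the spaces in a message like this"
hidden = hide_spaces(sample)

# Count every character; the filler ends up at the top of the table.
counts = Counter(hidden)
most_common_char, most_common_count = counts.most_common(1)[0]

print(hidden)
print(most_common_char, most_common_count)
```

Even in this short sentence, the filler is more frequent than the letter e, which is exactly what a code breaker would look for first.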

In addition to that, when it comes to doubles and triplets, there is statistical data available on the most common pairs and triples in the English language. This data can also be used to break the cipher, again, provided your ciphertext is long enough.
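Counting pairs and triples is not much harder than counting single letters. The sketch below is one way to do it, assuming we simply ignore everything that is not a letter (so runs can cross word boundaries, as they would in a block of text with hidden spaces). The name ngram_counts and the sample sentence are only for illustration.

```python
from collections import Counter

def ngram_counts(text: str, n: int) -> Counter:
    """Count overlapping runs of n letters, skipping anything that is not a letter."""
    letters = [c for c in text.upper() if c.isalpha()]
    return Counter(
        "".join(letters[i:i + n]) for i in range(len(letters) - n + 1)
    )

sample = "THE CAT SAT ON THE MAT NEAR THE HAT"

pairs = ngram_counts(sample, 2)
triples = ngram_counts(sample, 3)

print(pairs.most_common(3))
print(triples.most_common(3))
```

On a long ciphertext, the most frequent pairs and triples can be compared against published tables for English in the same way as single letters.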

Strategy two: Change your alphabet

What we’ve done so far uses just the 26 letters of the English alphabet. You can mix it up by separating upper and lowercase letters like so:





You can make it even more complicated with numbers and punctuation, or even the space character. Or you could include symbols and Greek letters. I won't judge you.
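As a rough sketch of what such a mixed alphabet could look like in Python: the character set below (letters in both cases, digits, the space, and a few punctuation marks) is my own assumption, and make_key, encipher, and decipher are illustrative names, not part of any library.

```python
import random
import string

# Assumed alphabet: upper and lowercase letters, digits, the space, and some punctuation.
ALPHABET = string.ascii_letters + string.digits + " .,!?'"

def make_key(alphabet: str = ALPHABET, seed=None) -> dict:
    """Build a random one-to-one mapping from each character to another character."""
    shuffled = list(alphabet)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(alphabet, shuffled))

def encipher(text: str, key: dict) -> str:
    """Substitute each character; anything outside the alphabet is left unchanged."""
    return "".join(key.get(c, c) for c in text)

def decipher(text: str, key: dict) -> str:
    """Undo the substitution by inverting the key."""
    inverse = {cipher: plain for plain, cipher in key.items()}
    return "".join(inverse.get(c, c) for c in text)

key = make_key(seed=2021)  # fixed seed only so the example is repeatable
message = "Meet me at 9, by the old Oak!"
secret = encipher(message, key)

print(secret)
print(decipher(secret, key))
```

Because the space is part of the alphabet here, it gets substituted like any other character, so the ciphertext has no visible word breaks.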

There are two reasons this approach works. One, the alphabet is larger, which means that the probability distribution has to be calculated over a larger number of characters. Frequency tables for such alphabets are often hard to obtain. If you throw a few numbers and punctuation marks into the mix, it’s all but impossible.

The second is, a larger number of possible characters also means you need more text for the typical distribution of letters to show up.

At the end of the day, though, you’re still working with a simple substitution cipher. The most numerous character will be space, followed by lowercase e. Based on that, you can crack the rest. That's a weakness of simple substitution ciphers in general. You can complicate things with tricks like these, but there is a limit to how much they can secure your cipher.

If you have any other strategies to make breaking the cipher harder, please tell me more in the comments. 

You can follow me on Facebook here or on YouTube here.

See you next time!


Wednesday 20 January 2021

Is 'said' a good dialogue tag?

 



To make my stance on the issue clear – I prefer a mixture of no dialogue tags and ‘said’. I know not everyone agrees with me, and in fact, I know most of my English teachers do not agree with me, but hear me out.

I’ll start with an example. 

Example 1: Anything but said

It’s the same dialogue, written in four different ways. The first: Here, dialogue tags are used every time, but if ‘said’ can be avoided, it is avoided.


This may not be the best representation of that kind of writing, but that’s mostly because I actively dislike this type of writing. My problem with it is that it leaves nothing to the reader’s imagination. It treats the reader like an idiot. On top of that, it adds emphasis everywhere, dulling the effect for when you really need to hammer the point home.

Example 2: Always said

Here, too, the speaker is indicated every time, and ‘said’ is used except when it’s necessary to use a different dialogue tag.



As far as I am concerned, this is actually better than the previous one. True, it doesn’t demonstrate your superior vocabulary, but the dialogue tag is mostly invisible here. More importantly, the reader can fill in some of the nuance themselves and the text isn't screaming at them all the time.

This is still not perfect, or as perfect as a writing style can be. It’s necessary to indicate the speaker every time. Besides, even an invisible word can start to stand out if it’s overused.

Example 3: Said + no dialogue tags

The third: this avoids dialogue tags when possible, but uses said when you need a dialogue tag to indicate the speaker.


Full disclosure, this is my preferred method. The reader has quite a lot they can fill in, but the gaps in information aren’t so large that they can’t be bridged. I, for one, like to assume that my readers are paying some attention and are intelligent. 

When you need a dialogue tag, you can use something fairly innocuous, like said. When there’s a back and forth between two people and it’s clear that it is the case, you can avoid dialogue tags for, say, a few lines.

But please, for the sake of whatever you believe in, do not have a back and forth conversation of three pages with no reminder of who’s saying what.

About that,

Example 4: There are no dialogue tags

This avoids dialogue tags altogether, instead using action and the rhythm of the conversation to indicate who’s speaking.



This one works too, but it requires quite a bit of work on the reader’s part in order to make sense of what is going on. The reader needs to be paying attention all the time - face it, no writer is that good at pacing. Besides, it’s fine for a ~100 word conversation, but with longer texts, it's easy to lose track of who's saying what. That ambiguity can change the interpretation of a scene completely.

That was my two cents on the topic. Do let me know what you think of it in the comments. 


See you next time!

Thursday 14 January 2021

Introduction to airships


 


This is a follow-up to the article on types of flight. Here, we’ll be taking a further look at airships.

I’m sure most of you know what an airship is already. You might be familiar with advertising blimps. You may have seen old photos. Perhaps, you’ve encountered the Hindenburg disaster in your physics lesson about electricity (that’s how I encountered it).

These are pretty interesting, actually, because they are almost ridiculously efficient in terms of energy requirement. That’s mainly why it looks like they will make a comeback. To be precise, it’s hybrid airships making a comeback, but they are a topic for another time. The main focus here is your traditional airship.

First things first, we’ll look at a diagram of an airship.



The big balloon on top (blue) is the envelope. That is a large balloon-like structure filled with a gas lighter than air. This is what keeps the airship airborne.

The gas inside the envelope is called the lifting gas. As it is lighter than air and because there is such a lot of it relative to the rest of the structure, it makes the overall structure less dense than air. This means it experiences an upthrust that is higher than its weight, making it rise. It keeps rising until the upthrust and weight balance. As air gets less dense with altitude, you will reach an altitude where the upthrust that the airship experiences is the same as its weight, and that will be where your airship floats.

The gas itself is usually helium on modern airships. It is very light and extremely safe, but our supply of helium is limited. As a result, it’s expensive.
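To put a rough number on the upthrust, here is a small sketch. The densities are rounded textbook values for sea level at about 15 °C, and the envelope volume and structure mass are made-up figures for illustration only. The function names are mine.

```python
# Approximate densities at sea level and about 15 degrees C, in kg per cubic metre.
# Rounded textbook values, used here only for illustration.
AIR_DENSITY = 1.225
HELIUM_DENSITY = 0.169

def net_lift_kg(envelope_volume_m3: float, gas_density: float,
                air_density: float = AIR_DENSITY) -> float:
    """Mass the lifting gas can hold up: displaced air minus the gas itself."""
    return envelope_volume_m3 * (air_density - gas_density)

def will_rise(structure_mass_kg: float, envelope_volume_m3: float,
              gas_density: float) -> bool:
    """True if the upthrust exceeds the total weight at sea level."""
    return net_lift_kg(envelope_volume_m3, gas_density) > structure_mass_kg

# Hypothetical airship: a 10,000 cubic metre envelope carrying 8,000 kg of structure.
lift = net_lift_kg(10_000, HELIUM_DENSITY)
print(round(lift), will_rise(8_000, 10_000, HELIUM_DENSITY))
```

Each cubic metre of helium lifts only about a kilogram, which is why the envelope has to be so large compared with the rest of the airship.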

Hydrogen has been used historically. The problem with hydrogen is that it is flammable when mixed with air, so if the envelope leaks and something sets the mixture on fire, the whole airship can burn up. However, pure hydrogen is not flammable, so if you have a spark inside a container of pure hydrogen, nothing will happen. Preventing leaks completely on a moving ship, especially one with a mostly flexible, thin exterior, can be very difficult if not impossible. That is why hydrogen is not used in great quantities anymore.

Another option is hot air. You can use a burner to heat air inside the envelope. Air expands when heated, which reduces its density, essentially making it lighter than the air outside the envelope. Air is not a limited resource, and here, the lifting gas itself is not flammable. The problem here is that you will need a fuel supply. You can burn a fuel like kerosene, or you can use electric heaters inside the balloon.
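The effect of heating can be estimated with the ideal gas law, where density is pressure divided by (specific gas constant times absolute temperature). The sketch below assumes dry air at sea-level pressure, an outside temperature of 15 °C, and an inside temperature of 100 °C. Those temperatures are my own example figures, not values for any particular airship.

```python
R_SPECIFIC_DRY_AIR = 287.05     # specific gas constant of dry air, J/(kg*K)
SEA_LEVEL_PRESSURE = 101_325.0  # standard sea-level pressure, Pa

def air_density(temperature_c: float,
                pressure_pa: float = SEA_LEVEL_PRESSURE) -> float:
    """Ideal-gas density of dry air in kg per cubic metre."""
    return pressure_pa / (R_SPECIFIC_DRY_AIR * (temperature_c + 273.15))

outside = air_density(15.0)   # assumed ambient temperature
inside = air_density(100.0)   # assumed temperature inside the envelope
lift_per_m3 = outside - inside

print(round(outside, 3), round(inside, 3), round(lift_per_m3, 3))
```

Under these assumptions, hot air lifts roughly 0.28 kg per cubic metre, so a hot air envelope needs to be several times larger than a helium one for the same load.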

With this method, the fuel weight can become significant. This can be true for batteries or solar panels as well. Besides, if you’re burning your fuel, you’ll be losing fuel weight throughout the journey, making the balloon lighter throughout the journey. You will have to compensate for that when flying. Besides, the fuel itself can be flammable, or in the case of batteries, explosive.

All said, however, it’s a happy medium, as long as all the risks are managed.

For a writing project, you can always make up a lifting gas (provided you’re writing fantasy), along with its properties, availability, price, and all that. Another possibility is using a vacuum. It will lift the airship alright, but the envelope will have to be very strong, and altitude control will be a bit of a problem. I will probably discuss this in detail later. (Look, I’m a flying obsessed mechatronics engineer, what else do you expect?)

Going back to the diagram, the small section below the envelope (green) is your gondola. That’s the section for passengers, crew, storage, and fuel. The gondola can be quite roomy for the number of passengers it can carry (due to the need to maintain a low weight). It also means the gondola needs some form of communication between its various areas. It’s probably a bit like being on a ship.

You may have noticed the engine (yellow) on the diagram. That is what provides power for forward motion. Usually, they would be fitted with a propeller, as few other types of propulsion make sense for the speed an airship flies at. The actual engine type can be pretty much anything you’d see on a propeller-driven plane. I’ll elaborate on this later. The engines are usually fitted on the sides of the gondola to allow differential thrust for steering.

The fuel for the engines can be pretty much anything. You could use an electric motor, powered by a battery or even solar power. You can use a conventional fuel like diesel or petrol or even biogas.  

The airship in the diagram has a vertical stabilizer and rudder, as well as horizontal control surfaces. These will be covered in more detail later, as I would go off on a 20,000 word tangent if I tried to cover them here.

The main disadvantages of an airship, compared to a conventional airplane, are:

  •      Lower speeds, as the envelope’s enormous cross-section produces a lot of drag (I’ll explain it in more detail, I promise. I am aware it’s a lot of things to explain later).
  •      Lower service ceiling in general. This is dependent on the difference in density between the lifting gas and the surrounding air, how much lifting gas there is, and the weight of the airship overall. As a result, it’s capped by the possible size of the envelope.
  •      Properties of the lifting gas can make it a fire hazard.
  •      Huge, slow moving target if anyone wants to take a pot shot at it. This can be especially dangerous if the lifting gas is flammable.
  •    Susceptible to weather, as it is large and slow moving.

Advantages are:

  •      Lower fuel requirement, as there is no energy needed to provide lift (under most circumstances).
  •     Can stay aloft forever (there will be some gas leakage, but well, for a very long time). Is great for sightseeing and observation as a result.
  •     Not so easily brought down by enemy fire, unless it catches fire. A bullet hole will produce a hole that is so small that it can take hours for the envelope to deflate completely, giving enough time for a safe landing.
  •      If used for passenger transport, the sheer amount of space on it compared to an airliner. You wouldn’t be cramped into one seat for eight hours, instead you can walk around, sleep, eat, and enjoy the view for two days. It’s up to you whether it’s actually an advantage, honestly.

That’s it for a brief introduction. I promise I will follow up on everything I promised to follow up, but that’s all for now. 


The painting in the video is available here



 See you next time!

Friday 8 January 2021

How to break a simple substitution cipher

 



G KWTB UNNH NUEWSO BNZB TE BOPB BON VNBBNMT PMN HGTBMGAWBNH GU P IPUUNM BJFGYPV ED

BON NUSVGTO VPUSWPSN TE BOPB GB YPU AN AMEQNU WTGUS TBPBGTBGYPV INBOEHT.

UNNHVNTT BE TPJ, FPMPSMPFOT BOPB PREGH P FPMBGYWVPM VNBBNM XGVV AN IEMN

YEIFVGYPBNH.

BON VEUSNM JEWM BNZB, BON IEMN EARGEWT BONTN FPBBNMUT XGVV ANYEIN. DEM

NZPIFVN, GD JEW XNMN BE PUPVJTN PU NUBGMN AEEQ, JEW XGVV TNN P RNMJ

BJFGYPV HGTBMGAWBGEU.


Look at the text above. You are given that it is in English, and that it is a simple substitution cipher. The text is in the description below.

Here, the focus is on breaking the cipher. Do take a look at the video above for an explanation. 

Breaking the cipher means we’re not starting with the key, i.e., we don’t know which letter maps to which.

There are several ways to start with this. One of the easiest is to start with statistics – just count the number of times a particular letter appears in a text.

You can find this information on Wikipedia – go to the article on letter frequency. It shows you the relative frequency of letters in English. There are two sets of statistics, for texts and for dictionaries. We need to look at texts, because we are trying to decipher a paragraph.

What we see right away is that the letter with the highest frequency is E, which accounts for about 13% of texts. The next highest are T and A, which account for 9.1% and 8.2% of texts respectively.

In order to apply this information, let’s put our text into a frequency counter. You can find several of these online, or you can make your own. Or you can count it yourself, I won’t judge.
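If you want to make your own counter, a minimal Python sketch could look like this. It counts only the letters A to Z in the ciphertext from the top of this post and ignores spaces and punctuation. The function name letter_frequencies is just for illustration.

```python
from collections import Counter
import string

CIPHERTEXT = """
G KWTB UNNH NUEWSO BNZB TE BOPB BON VNBBNMT PMN HGTBMGAWBNH GU P IPUUNM BJFGYPV ED
BON NUSVGTO VPUSWPSN TE BOPB GB YPU AN AMEQNU WTGUS TBPBGTBGYPV INBOEHT.
UNNHVNTT BE TPJ, FPMPSMPFOT BOPB PREGH P FPMBGYWVPM VNBBNM XGVV AN IEMN
YEIFVGYPBNH.
BON VEUSNM JEWM BNZB, BON IEMN EARGEWT BONTN FPBBNMUT XGVV ANYEIN. DEM
NZPIFVN, GD JEW XNMN BE PUPVJTN PU NUBGMN AEEQ, JEW XGVV TNN P RNMJ
BJFGYPV HGTBMGAWBGEU.
"""

def letter_frequencies(text: str) -> dict:
    """Count how often each letter A-Z appears, including letters that never appear."""
    counts = Counter(c for c in text.upper() if c in string.ascii_uppercase)
    return {letter: counts[letter] for letter in string.ascii_uppercase}

frequencies = letter_frequencies(CIPHERTEXT)

# Print the letters from most to least frequent.
for letter, count in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(letter, count)
```

Sorting by count puts the best candidates for E, T, and A at the top of the list.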

The frequencies I get are shown below:

Letter  Count    Letter  Count
A       8        N       44
B       38       O       12
C       0        P       28
D       3        Q       2
E       23       R       3
F       8        S       7
G       23       T       20
H       8        U       17
I       7        V       19
J       8        W       11
K       1        X       4
L       0        Y       8
M       20       Z       3

 

As you can see, N has the highest frequency. So a safe bet would be to assume that it is E.

G KWTB UeeH eUEWSO BeZB TE BOPB BOe VeBBeMT PMe HGTBMGAWBeH GU P IPUUeM BJFGYPV

ED BOe eUSVGTO VPUSWPSe TE BOPB GB YPU Ae AMEQeU WTGUS TBPBGTBGYPV IeBOEHT.

UeeHVeTT BE TPJ, FPMPSMPFOT BOPB PREGH P FPMBGYWVPM VeBBeM XGVV Ae IEMe

YEIFVGYPBeH.

BOe VEUSeM JEWM BeZB, BOe IEMe EARGEWT BOeTe FPBBeMUT XGVV AeYEIe. DEM

eZPIFVe, GD JEW XeMe BE PUPVJTe PU eUBGMe AEEQ, JEW XGVV Tee P ReMJ

BJFGYPV HGTBMGAWBGEU.

Yes, I did a case sensitive search and replace, which you should be able to do with any good word processor.
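If you would rather script the replacement than use a word processor, here is one possible sketch. It follows the same convention as the text: unsolved cipher letters stay uppercase and solved plaintext letters are written in lowercase. The name apply_partial_key is made up for this example.

```python
def apply_partial_key(ciphertext: str, key: dict) -> str:
    """Replace solved cipher letters (uppercase) with lowercase plaintext guesses.

    Letters that are not in the key yet are left exactly as they are.
    """
    return "".join(key[c].lower() if c in key else c for c in ciphertext)

fragment = "BON VNBBNMT PMN"

# First guess only: N = E.
step_one = apply_partial_key(fragment, {"N": "e"})

# Guesses can be added to the key as you go, e.g. B = T and O = H.
step_two = apply_partial_key(fragment, {"N": "e", "B": "t", "O": "h"})

print(step_one)
print(step_two)
```

Always applying the whole key to the original ciphertext in a single pass means a letter you have already solved can never be replaced a second time by mistake, and undoing a wrong guess is just a matter of removing it from the key.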

One thing you might notice immediately is the triplet BOe. It appears four times in the short text. It could be ‘THE’. In order to confirm that, we can look at our letter frequencies. As expected, B has a frequency of 38, and is the second most numerous letter. It would be safe to assume B = T, and therefore, that BOe = the. Let’s replace B with T and O with H.

G KWTt UeeH eUEWSh teZt TE thPt the VetteMT PMe HGTtMGAWteH GU P IPUUeM tJFGYPV

ED the eUSVGTh VPUSWPSe TE thPt Gt YPU Ae AMEQeU WTGUS TtPtGTtGYPV IethEHT.

UeeHVeTT tE TPJ, FPMPSMPFhT thPt PREGH P FPMtGYWVPM VetteM XGVV Ae IEMe

YEIFVGYPteH.

the VEUSeM JEWM teZt, the IEMe EARGEWT theTe FPtteMUT XGVV AeYEIe. DEM

eZPIFVe, GD JEW XeMe tE PUPVJTe PU eUtGMe AEEQ, JEW XGVV Tee P ReMJ

tJFGYPV HGTtMGAWtGEU.

 

Here, 'thPt' looks a lot like ‘that’. You might also notice that P appears by itself quite a lot. It has a frequency of 28, which is quite high. So, P = A looks like a good assumption.

G KWTt UeeH eUEWSh teZt TE that the VetteMT aMe HGTtMGAWteH GU a IaUUeM tJFGYaV

ED the eUSVGTh VaUSWaSe TE that Gt YaU Ae AMEQeU WTGUS TtatGTtGYaV IethEHT.

UeeHVeTT tE TaJ, FaMaSMaFhT that aREGH a FaMtGYWVaM VetteM XGVV Ae IEMe

YEIFVGYateH.

the VEUSeM JEWM teZt, the IEMe EARGEWT theTe FatteMUT XGVV AeYEIe. DEM

eZaIFVe, GD JEW XeMe tE aUaVJTe aU eUtGMe AEEQ, JEW XGVV Tee a ReMJ

tJFGYaV HGTtMGAWtGEU.

The only other standalone letter that appears frequently in English is I, and the only other standalone letter in the cipher is G. The frequency is 23, which is good enough to proceed. Another interesting observation is ‘tE’, and that can only be ‘to’. Frequency for ‘E’ is 23, which is good enough. Let’s replace G with I and E with O.

i KWTt UeeH eUoWSh teZt To that the VetteMT aMe HiTtMiAWteH iU a IaUUeM tJFiYaV

oD the eUSViTh VaUSWaSe To that it YaU Ae AMoQeU WTiUS TtatiTtiYaV IethoHT.

UeeHVeTT to TaJ, FaMaSMaFhT that aRoiH a FaMtiYWVaM VetteM XiVV Ae IoMe

YoIFViYateH.

the VoUSeM JoWM teZt, the IoMe oARioWT theTe FatteMUT XiVV AeYoIe. DoM

eZaIFVe, iD JoW XeMe to aUaVJTe aU eUtiMe AooQ, JoW XiVV Tee a ReMJ

tJFiYaV HiTtMiAWtioU.

‘T’ looks pretty consistent with ‘S’, as you have ‘To’, ‘Tee’, a frequency of 20, and quite a few words that begin with it. Let’s go ahead and replace it. At this point, this is mostly an art. The result is:

i KWst UeeH eUoWSh teZt so that the VetteMs aMe HistMiAWteH iU a IaUUeM tJFiYaV

oD the eUSVish VaUSWaSe so that it YaU Ae AMoQeU WsiUS statistiYaV IethoHs.

UeeHVess to saJ, FaMaSMaFhs that aRoiH a FaMtiYWVaM VetteM XiVV Ae IoMe

YoIFViYateH.

the VoUSeM JoWM teZt, the IoMe oARioWs these FatteMUs XiVV AeYoIe. DoM

eZaIFVe, iD JoW XeMe to aUaVJse aU eUtiMe AooQ, JoW XiVV see a ReMJ
tJFiYaV HistMiAWtioU.

 

Let’s take another look at the frequencies table. In English, N and R are two high frequency letters that aren’t accounted for yet in our plaintext. The cipher letter M has a frequency of 20, and ‘aMe’ looks like a tell. Let’s assume M is R. If you are wrong, you can always go back.

i KWst UeeH eUoWSh teZt so that the Vetters are HistriAWteH iU a IaUUer tJFiYaV

oD the eUSVish VaUSWaSe so that it YaU Ae AroQeU WsiUS statistiYaV IethoHs.

UeeHVess to saJ, FaraSraFhs that aRoiH a FartiYWVar Vetter XiVV Ae Iore

YoIFViYateH.

the VoUSer JoWr teZt, the Iore oARioWs these FatterUs XiVV AeYoIe. Dor

eZaIFVe, iD JoW Xere to aUaVJse aU eUtire AooQ, JoW XiVV see a RerJ

tJFiYaV HistriAWtioU. 

From here on, you need to see if you can spot any probable words, and start replacing letters. ‘Vetters’ could be ‘letters’ (so ‘V’ could be ‘L’), ‘D’ could be ‘F’ (‘oD’ and ‘Dor’), ‘F’ looks like it could be ‘P’, and ‘U’ could be ‘N’. Let’s replace these, and see if it makes sense.

i KWst neeH enoWSh teZt so that the letters are HistriAWteH in a Ianner tJpiYal

of the enSlish lanSWaSe so that it Yan Ae AroQen WsinS statistiYal IethoHs.

neeHless to saJ, paraSraphs that aRoiH a partiYWlar letter Xill Ae Iore

YoIpliYateH.

the lonSer JoWr teZt, the Iore oARioWs these patterns Xill AeYoIe. for

eZaIple, if JoW Xere to analJse an entire AooQ, JoW Xill see a RerJ

tJpiYal HistriAWtion.

Now, ‘S’ can be replaced with ‘G’, ‘W’ with U, and ‘Y’ with C, ‘H’ with D, and ‘X’ with W. These were based on common words you can identify from the text.

i Kust need enough teZt so that the letters are distriAuted in a Ianner tJpical

of the english language so that it can Ae AroQen using statistical Iethods.

needless to saJ, paragraphs that aRoid a particular letter will Ae Iore

coIplicated.

the longer Jour teZt, the Iore oARious these patterns will AecoIe. for

eZaIple, if Jou were to analJse an entire AooQ, Jou will see a RerJ

tJpical distriAution. 

From observation, we can now get ‘Z’ = X, ‘A’ = B, ‘J’ = Y, ‘K’ = J, ‘I’ = M, ‘R’ = V. Do the replacements.

i just need enough text so that the letters are distributed in a manner typical

of the english language so that it can be broQen using statistical methods.

needless to say, paragraphs that avoid a particular letter will be more

complicated.

the longer your text, the more obvious these patterns will become. for

example, if you were to analyse an entire booQ, you will see a very

typical distribution. 

By now, it’s obvious that ‘Q’ = K. Just replace it, and you have your deciphered message.

i just need enough text so that the letters are distributed in a manner typical

of the english language so that it can be broken using statistical methods.

needless to say, paragraphs that avoid a particular letter will be more

complicated.

the longer your text, the more obvious these patterns will become. for

example, if you were to analyse an entire book, you will see a very

typical distribution.

 

This method works provided your message is long enough. If your message is not in English, you need the frequency tables for the language you’re working with.

If you know you’re working with a Caesar cipher, the first letter you definitively break will give you the key right away. Use that to find your shift, and just decode the whole message normally. If you can’t find anything definitive, you can just break the whole message normally anyway.
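A short sketch of that shortcut, assuming a plain Caesar cipher over the 26 English letters. The sample ciphertext and the function names are made up for illustration.

```python
def caesar_shift(cipher_letter: str, plain_letter: str) -> int:
    """Work out the shift from one cipher letter and the plaintext letter it stands for."""
    return (ord(cipher_letter.upper()) - ord(plain_letter.upper())) % 26

def caesar_decode(ciphertext: str, shift: int) -> str:
    """Shift every English letter back by `shift`; leave everything else untouched."""
    decoded = []
    for c in ciphertext:
        if c.isascii() and c.isalpha():
            base = ord("A") if c.isupper() else ord("a")
            decoded.append(chr((ord(c) - base - shift) % 26 + base))
        else:
            decoded.append(c)
    return "".join(decoded)

# Suppose the frequency count suggests that cipher letter H stands for E.
shift = caesar_shift("H", "E")
print(shift)
print(caesar_decode("WKH TXLFN EURZQ IRA", shift))
```

If the decoded text is gibberish, the guessed pair was wrong, and the next most frequent letter is worth trying.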

Go ahead and try this, and let me know how it goes.


 See you next time!




How to write a character who is smarter than you

We all have that one character (or few) who is significantly smarter than the writer. So, as a writer, how do you write such a character con...