Breaking simple ciphers

Posted by Jonas Elfström Wed, 16 Sep 2009 18:19:00 GMT

The last few days I've happened to stumble over a couple of ciphers and I just couldn't help myself from trying to break them.

The Lost Symbol

Dan Brown has a new book coming out and part of the promotion is this cipher text "AOFACFSOA FSZWBEIC EIOA ZOHSFWQWOA OQQSDW". The WQW, QQ and three of the words ending with an A made me believe we could be dealing with a substitution cipher and maybe even a Caesar cipher, the most simple of them all.

My usual tool of choice is Ruby and in this case the splendid Interactive Ruby Shell.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
$ irb
>> s="AOFACFSOA FSZWBEIC EIOA ZOHSFWQWOA OQQSDW"
=> "AOFACFSOA FSZWBEIC EIOA ZOHSFWQWOA OQQSDW"
>> def caesar(text,n)
>>   alphas=('A'..'Z').to_a*2
>>   text.tr('A-Z', alphas[n..n+26].join)
>> end
>> 1.upto(25) do |n| puts "%2d. %s" % [n, caesar(s,n)] end
 1. BPGBDGTPB GTAXCFJD FJPB APITGXRXPB PRRTEX
 2. CQHCEHUQC HUBYDGKE GKQC BQJUHYSYQC QSSUFY
 3. DRIDFIVRD IVCZEHLF HLRD CRKVIZTZRD RTTVGZ
 4. ESJEGJWSE JWDAFIMG IMSE DSLWJAUASE SUUWHA
 5. FTKFHKXTF KXEBGJNH JNTF ETMXKBVBTF TVVXIB
 6. GULGILYUG LYFCHKOI KOUG FUNYLCWCUG UWWYJC
 7. HVMHJMZVH MZGDILPJ LPVH GVOZMDXDVH VXXZKD
 8. IWNIKNAWI NAHEJMQK MQWI HWPANEYEWI WYYALE
 9. JXOJLOBXJ OBIFKNRL NRXJ IXQBOFZFXJ XZZBMF
10. KYPKMPCYK PCJGLOSM OSYK JYRCPGAGYK YAACNG
11. LZQLNQDZL QDKHMPTN PTZL KZSDQHBHZL ZBBDOH
12. MARMOREAM RELINQUO QUAM LATERICIAM ACCEPI
13. NBSNPSFBN SFMJORVP RVBN MBUFSJDJBN BDDFQJ
14. OCTOQTGCO TGNKPSWQ SWCO NCVGTKEKCO CEEGRK
15. PDUPRUHDP UHOLQTXR TXDP ODWHULFLDP DFFHSL
16. QEVQSVIEQ VIPMRUYS UYEQ PEXIVMGMEQ EGGITM
17. RFWRTWJFR WJQNSVZT VZFR QFYJWNHNFR FHHJUN
18. SGXSUXKGS XKROTWAU WAGS RGZKXOIOGS GIIKVO
19. THYTVYLHT YLSPUXBV XBHT SHALYPJPHT HJJLWP
20. UIZUWZMIU ZMTQVYCW YCIU TIBMZQKQIU IKKMXQ
21. VJAVXANJV ANURWZDX ZDJV UJCNARLRJV JLLNYR
22. WKBWYBOKW BOVSXAEY AEKW VKDOBSMSKW KMMOZS
23. XLCXZCPLX CPWTYBFZ BFLX WLEPCTNTLX LNNPAT
24. YMDYADQMY DQXUZCGA CGMY XMFQDUOUMY MOOQBU
25. ZNEZBERNZ ERYVADHB DHNZ YNGREVPVNZ NPPRCV

Take a closer look at row 12.

MARMOREAM RELINQUO QUAM LATERICIAM ACCEPI

I found Rome a city of bricks and left it a city of marble. - Google tells me it's Augustus.

The code is not the most clear I've written but if you read Ruby in your sleep you can skip this part.

('A'..'Z') is a range in Ruby. Another, maybe more obvious, example of a range is (0..7).

.to_a could be read as to_array and unsurprisingly it converts a range to an array. (0..7).to_a will create [0, 1, 2, 3, 4, 5, 6, 7]

The operator * for arrays appends n copies of the array. Thus [0,1,2]*2 will create [0,1,2,0,1,2].

String#tr works the same way as the Unix command tr, it translates the characters in the string according to the from and to parameters.

At last .join converts the array to a string.

The recruiting agency

A government agency responsible for signals intelligence is hiring. Among the qualifications they are looking for is the ability to break a certain cipher. I will not publish their cipher here but instead one of my own, constructed in the same way as theirs.

"VGhpcyBpcyBleGNsdXNpdmUgZm9yIHlvdSwgb3I/IGMNR0d
LCkZPXgpTRV8K\nTENEQ1lCCkhfXgpoT1NFRElPCkNZCksKSE9eXk
9YCklYU1peRU1YS1pCT1gK\nXkJLRAp5SUJET0NPWAQ="
At first glance it looked like Base64 and the ending "=" made it even more likely.
1
2
3
4
5
$ irb
>> require 'base64'
>> cipher = "VGhpcyBpcyBleGNsdXNpdmUgZm9yIHlvdSwgb3I/IGMNR0dLCkZPXgpTRV8K\nTENEQ1lCCkhfXgpoT1NFRElPCkNZCksKSE9eXk9YCklYU1peRU1YS1pCT1gK\nXkJLRAp5SUJET0NPWAQ="
>> decoded=Base64.decode64(cipher)
=> "This is exclusive for you, or? c\rGGK\nFO^\nSE\nLCDCYB\nH^\nhOSEDIO\nCY\nK\nHO^^OX\nIXSZ^EMXKZBOX\n^BKD\nyIBDOCOX\004"

So it's Base64 but to no surprise it didn't end there. The "This is exclusive for you, or?" hinted at XOR so I tried XORing the text with 0-255.

1
2
3
4
5
6
7
8
>> code=decoded[31..decoded.length].split(//)
>> File.open('xor.txt','w') { |file|
?>   0.upto(255) {|n|
?>     file.write(n.to_s + " ")
>>     code.each {|c| file.write( (c[0]^n).chr ) }
>>     file.write("\n\n")
>>   }
>> }

A quick look in the file told me that XORing with 42 was the solution.

Now you know how to break two of the most simple cipher methods. Use the knowledge wisely. :)

Posted in Ruby, Cryptography | 7 comments

Comments

    1. Avatar
      Romy Wed, 23 Sep 2009 22:51:12 GMT

      This post was teh bomb.

      I <3 ruby. I <3 ciphers. I <3 kanye stepping on kittens.

      Keep it coming…

    2. Avatar
      Jonas Elfström Fri, 06 Nov 2009 23:02:21 GMT

      Thanks!

    3. Avatar
      Mihai Thu, 07 Apr 2011 01:05:38 GMT

      Hello,

      I am working on a statistical analysis of a simple Vigenère cipher for school, let’s say the Caesar cipher because it’s a subclass of Vigenère ciphers.

      My task is to prove that letter frequencies in a message are equal to letter frequencies in a criptogram (just on different letters). histogram(message) == histogram(caesar(message));

      Within a 27 character alphabet ([A-Z\s]) all of this is easy enough. But i wanted to take this assignment up a notch and not limit the input alphabet.

      So i used base64 encode to bring any input string to a common alphabet of 64 characters.

      I can then apply the cipher on the encoded string using a 64 character alphabet ([A-Za-z\+\/]).

      Base64 reads 6bits from the input and writes one character. When doing this to 8bit letters the results become 2 bit shifted on every letter.

      Now i’m racking my brain to figure out how to analyse the text statistically because. histogram(base64decode(caesar(base64encode(message)))) !== histogram(message);

      Do you have any ideeas?

    4. Avatar
      Jonas Elfström Thu, 07 Apr 2011 22:21:49 GMT

      I think you already answered the question. The problem is that you are encrypting the message in 6 bits chunks but the actual data is 8 bits per character (or is it?). I’m sure there’s a way to analyze this but at a glance it seems a lot harder than to compare histograms for a plaintext and a Caesar ciphertext.

      I don’t know if it’s of use to you but histogram(caesar(base64encode(message))) is the same as for histogram(base64encode(message))

      Code example here: https://gist.github.com/908911

    5. Avatar
      Mihai Fri, 08 Apr 2011 01:01:04 GMT

      If our priority is not to compare apples to oranges we have to do what you propose.

      But in order for an external entity (code breaker) to do it he needs a corpus of data similar to the one being transmitted with which to compare histograms: base64encode(criptogram) compared to base64encode(corpus).

      If we assume we need to compare the criptogram statistics to a given corpus and not the initial message (to which the code breaker could not have access to) some other method must be devised.

      The main issue is that the “simple mono-alphabetic cipher” we apply on the base64encoded level generates a “poli-alphabetic cipher”.

      Any 8 bit entity is partially replaced during the enciphering stage, i.e. only 4-6 bits of it’s data is replaced. These replacements may generate 8 bit entities that were not present in the original message.

      Hence the histogram gets stretched over some new symbols as well as redistributed over the old symbols.

      My findings were that the histogram maximum of 18% frequency of spaces in the message dropped to a maximum of 3% frequency of some other symbol in the criptogram. The histogram is heavily redistributed.

    6. Avatar
      Jonas Elfström Sun, 10 Apr 2011 21:25:40 GMT

      How about grouping the characters in three and do a frequency analysis on that instead? The reasoning is that 6+6+6+6=24 and 24/8=3. Most probably only useful if it’s Caesar and not a longer key. It also seems likely that you would need quite a long chiphertext before this kind of frequency analysis would give anything useful.

      I tested the hypothesis at https://gist.github.com/912723 and the histograms for the cipher and the plaintext are the same.

      For “A Scandal in Bohemia” by Sir Arthur Conan Doyle (19119 characters) the number of occurrences for the top five are:
      ” th” => 80
      “he ” => 63
      “the” => 54
      “and” => 46
      ” of” => 38

    7. Avatar
      Jonas Elfström Mon, 11 Apr 2011 22:06:18 GMT

      I just ran the same frequency analysis on the longer “The Red-headed League” (76713 characters) and guess what came out on top?

      ” th” => 363
      “he ” => 307
      “the” => 296
      “and” => 163

      That’s exactly the same order as for “A Scandal in Bohemia” but then something strange happened

      “nd ” => 159
      ” an” => 151
      “to ” => 140
      “ed ” => 139
      ” to” => 139
      “ing” => 134

      and not until then we find

      ” of” => 120

      Until more texts has been analyzed like this I can’t say if it’s a problem or an anomaly.

Comments are closed