Vigenère cipher/Cryptanalysis
You are encouraged to solve this task according to the task description, using any language you may know.
Given some text you suspect has been encrypted with a Vigenère cipher, extract the key and plaintext. There are several methods for doing this. See the Wikipedia entry for more information. Use the following encrypted text:
MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA FWAML ZZRXJ EKAHV FASMU LVVUT TGK
Letter frequencies for English can be found here.
Specifics for this task:
- Take only the ciphertext as input. You can assume it's all capitalized and has no punctuation, but it might have whitespace.
- Assume the plaintext is written in English.
- Find and output the key.
- Use that key to decrypt and output the original plaintext. Maintaining the whitespace from the ciphertext is optional.
- The algorithm doesn't have to be perfect (which may not be possible) but it should work when given enough ciphertext. The example above is fairly long, and should be plenty for any algorithm.
11l
-V ascii_uppercase = Array(‘A’..‘Z’)
F vigenere_decrypt(target_freqs, input)
V nchars = :ascii_uppercase.len
V ordA = ‘A’.code
V sorted_targets = sorted(target_freqs)
F frequency(input)
V result = :ascii_uppercase.map(c -> (c, 0.0))
L(c) input
result[c - @ordA][1]++
R result
F correlation(input)
V result = 0.0
V freq = sorted(@frequency(input), key' a -> a[1])
L(f) freq
result += f[1] * @sorted_targets[L.index]
R result
V cleaned = input.uppercase().filter(c -> c.is_uppercase()).map(c -> c.code)
V best_len = 0
V best_corr = -100.0
L(i) 2 .< cleaned.len I/ 20
V pieces = [[Int]()] * i
L(c) cleaned
pieces[L.index % i].append(c)
V corr = -0.5 * i + sum(pieces.map(p -> @correlation(p)))
I corr > best_corr
best_len = i
best_corr = corr
I best_len == 0
R (‘Text is too short to analyze’, ‘’)
V pieces = [[Int]()] * best_len
L(c) cleaned
pieces[L.index % best_len].append(c)
V freqs = pieces.map(p -> @frequency(p))
V key = ‘’
L(fr_) freqs
V fr = sorted(fr_, key' a -> a[1], reverse' 1B)
V m = 0
V max_corr = 0.0
L(j) 0 .< nchars
V corr = 0.0
V c = ordA + j
L(frc) fr
V d = (frc[0].code - c + nchars) % nchars
corr += frc[1] * target_freqs[d]
I corr > max_corr
m = j
max_corr = corr
key ‘’= Char(code' m + ordA)
V r = (enumerate(cleaned).map((i, c) -> Char(code' (c - @key[i % @best_len].code + @nchars) % @nchars + @ordA)))
R (key, r.join(‘’))
V encoded = ‘
MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK’
V english_frequences = [
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
0.00978, 0.02360, 0.00150, 0.01974, 0.00074]
V (key, decoded) = vigenere_decrypt(english_frequences, encoded)
print(‘Key: ’key)
print("\nText: "decoded)
- Output:
Key: THECHESHIRECAT Text: THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVE...
AArch64 Assembly
/* ARM assembly AARCH64 Raspberry PI 3B */
/* program vigneredecrypt64.s */
/* REMARK 1 : to avoid float use, The calculations of the evaluations
are made in integer numbers */
/* REMARK 2 : occurences characters counter are limited to a byte size */
/* REMARK 3 : program inspired to C rosetta program */
/*******************************************/
/* Constantes */
/*******************************************/
/* for this file see task include a file in language AArch64 assembly*/
.include "../includeConstantesARM64.inc"
.equ LENALPHA, 26
.equ BUFFERSIZE, 2000
.equ KEYSIZE, 50
.equ OCCURSMAXI, 255
/*******************************************/
/* Macros */
/*******************************************/
//.include "../../ficmacros64.inc" // for developer debugging
/*******************************************/
/* Initialized data */
/*******************************************/
.data
szMessDebutPgm: .asciz "Program 64 bits start. \n"
szCarriageReturn: .asciz "\n"
szMessFinOK: .asciz "Program normal end. \n"
szMessError: .asciz "\nError Buffer too small!!!\n"
szMessPossible: .asciz "Possible key :"
szMessDecrip: .asciz "\nDecrypted :\n"
szMessCharinv: .asciz "Error. Character invalid!."
szMessErrOcc: .asciz "Maxi occurennces characters!."
szMessBest: .asciz " <-------Best key"
szString1: .ascii "MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH"
.ascii "VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD"
.ascii "ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS"
.ascii "FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG"
.ascii "ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ"
.ascii "ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS"
.ascii "JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT"
.ascii "LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST"
.ascii "MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH"
.ascii "QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV"
.ascii "RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW"
.ascii "TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO"
.ascii "SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR"
.ascii "ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX"
.ascii "BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB"
.ascii "BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA"
.asciz "FWAML ZZRXJ EKAHV FASMU LVVUT TGK"
.equ LGSTRING1, . - szString1
.align 4
tabFreq: .quad 8167, 1492, 2782, 4253, 12702, 2228, 2015
.quad 6094, 6966, 153, 772, 4025, 2406, 6749
.quad 7507, 1929, 95, 5987, 6327, 9056, 2758
.quad 978, 2360, 150, 1974, 74
.equ NBFREQ, . - tabFreq
/*******************************************/
/* UnInitialized data */
/*******************************************/
.bss
sBuffex1: .skip LGSTRING1
sBuffex2: .skip BUFFERSIZE
sKey: .skip KEYSIZE
sBestKey: .skip KEYSIZE
/*******************************************/
/* code section */
/*******************************************/
.text
.global main
main:
ldr x0,qAdrszMessDebutPgm
bl affichageMess
ldr x0,qAdrszString1 // string address
ldr x1,qAdrsBuffex1 // buffer
bl convertText // string char conversion
mov x5,x0 // result length
ldr x0,qAdrsBuffex1 // buffer
mov x4,#1 //.quaderval
mov x6,#-1 // evaluation high value
1:
ldr x0,qAdrsBuffex1 // converted buffer
mov x1,x5 // length
mov x2,x4 //.quaderval
ldr x3,qAdrsBuffex2 // key
bl searchKey
mov x7,x0 // save return result
ldr x0,qAdrszMessPossible
bl affichageMess
ldr x0,qAdrsBuffex2 // display decrypted buffer
bl affichageMess
cmp x7,x6 // best evaluation ?
bhi 3f
mov x6,x7 // yes -> save new value
ldr x0,qAdrszMessBest // message display
bl affichageMess
mov x8,#0
ldr x9,qAdrsBuffex2
ldr x10,qAdrsBestKey
2: // copy best key loop
ldrb w12,[x9,x8]
strb w12,[x10,x8]
cmp x12,#0
beq 3f
add x8,x8,#1
b 2b
3:
ldr x0,qAdrszCarriageReturn
bl affichageMess
add x4,x4,#1
cmp x4,#30 //.quaderval maxi ?
blt 1b // and loop
// decrypt with best key
ldr x0,qAdrszString1
ldr x1,qAdrsBestKey
ldr x2,qAdrsBuffex2
bl decrypt
ldr x0,qAdrszMessDecrip
bl affichageMess
ldr x0,qAdrsBuffex2 // display decrypted buffer
bl affichageMess
ldr x0,qAdrszCarriageReturn
bl affichageMess
ldr x0,qAdrszMessFinOK
bl affichageMess
b 100f
99:
ldr x0,qAdrszMessError // error
bl affichageMess
mov x0, #1
100: // standard end of the program
mov x0, #0 // return code
mov x8,EXIT
svc 0 // perform system call
qAdrszMessDecrip: .quad szMessDecrip
qAdrszMessPossible: .quad szMessPossible
qAdrszMessBest: .quad szMessBest
qAdrszString1: .quad szString1
qAdrsBuffex1: .quad sBuffex1
qAdrsBuffex2: .quad sBuffex2
qAdrszMessDebutPgm: .quad szMessDebutPgm
qAdrszMessFinOK: .quad szMessFinOK
qAdrszCarriageReturn: .quad szCarriageReturn
qAdrszMessError: .quad szMessError
qAdrsBestKey: .quad sBestKey
/******************************************************************/
/* convert text in position and supp char non alpha */
/******************************************************************/
/* x0 contains the address of the string1 */
/* x1 contains key address of buffer
/* x0 return buffer lenght */
convertText:
stp x3,lr,[sp,-16]! // save registers
stp x4,x5,[sp,-16]! // save registers
stp x6,x7,[sp,-16]! // save registers
mov x3,#0 // counter byte string 1
mov x5,#0 // counter byte buffer
1:
ldrb w2,[x0,x3] // load char
cmp x2,#0 // final zero ?
beq 10f
cmp x2,#65 // < A ?
cinc x3,x3,lt
blt 1b
cmp x2,#90 // > Z
cinc x3,x3,gt // no minuscul
bgt 1b
sub x2,x2,#'A' // compute rank
cmp x2,#26
ble 2f
ldr x0,qAdrszMessCharinv
bl affichageMess
mov x0,#-1
b 100f
2:
strb w2,[x1,x5] //
add x5,x5,#1
add x3,x3,#1
b 1b
10:
strb w2,[x1,x5] // final zero
mov x0,x5
100:
ldp x6,x7,[sp],16 // restaur registers
ldp x4,x5,[sp],16 // restaur registers
ldp x3,lr,[sp],16 // restaur registers
ret
qAdrszMessCharinv: .quad szMessCharinv
/******************************************************************/
/* decrypt strings */
/******************************************************************/
/* x0 contains the address of the converted string1 */
/* x1 contains converted string1 length */
/* x2 contains.quaderval */
/* x3 contains address result buffer */
searchKey:
stp x2,lr,[sp,-16]! // save registers
stp x3,x4,[sp,-16]! // save registers
stp x5,x6,[sp,-16]! // save registers
stp x7,x8,[sp,-16]! // save registers
stp x9,x10,[sp,-16]! // save registers
stp x11,x12,[sp,-16]! // save registers
sub sp,sp,#80 // area reserve on stack ( 26 * 2)
mov x7,sp // save stack address occurences counter
add x9,x7,#32 // best occurences counter
mov x4,#0
mov x5,#0
1: // init area best occurences counter
strb w5,[x9,x4]
add x4,x4,#1
cmp x4,#LENALPHA
ble 1b
mov x6,#0 // j
2:
mov x4,#0
mov x5,#0
3: // init area occurences counter
strb w5,[x7,x4]
add x4,x4,#1
cmp x4,#LENALPHA
ble 3b
mov x4,x6 // indice
4:
ldrb w5,[x0,x4] // load byte
ldrb w8,[x7,x5] // load one occurence counter
add x8,x8,#1 // compute occurence char in.quadervall
cmp x8,#255 // byte maxi ?
ble 41f
ldr x0,qAdrszMessErrOcc
bl affichageMess
mov x0,#-1
b 100f
41:
strb w8,[x7,x5] // store new occurence
add x4,x4,x2 // add.quaderval
cmp x4,x1 // compare length string
blt 4b
mov x8,x0 // save register
mov x9,x1 // save register
mov x0,x7 // occurences area address on stack
ldr x1,qAdrtabFreq // frequence area
bl recherche
mov x5,x0 // best rotation for this.quaderval
mov x0,x8
mov x1,x9
add x8,x5,#'A' // key letter
strb w8,[x3,x6] // store in key result
add x9,x7,#32 //
mov x4,#0
5:
add x10,x4,x5 // add rotation to indice
sub x11,x10,#LENALPHA
cmp x10,#LENALPHA
csel x10,x11,x10,ge
ldrb w10,[x7,x10] // load result
ldrb w11,[x9,x4]
add x11,x11,x10 // add to general counter
strb w11,[x9,x4] // store
add x4,x4,#1
cmp x4,#LENALPHA
blt 5b // and loop
add x6,x6,#1 // increment indice
cmp x6,x2 //.quaderval ?
blt 2b // and loop
mov x11,#0 // sum
mov x4,#0 // indice
6: // loop compute sum
ldrb w5,[x9,x4]
add x11,x11,x5
add x4,x4,#1
cmp x4,#LENALPHA
blt 6b
mov x4,#0
ldr x8,qAdrtabFreq
mov x0,#0 // return evaluation value
7:
ldrb w5,[x9,x4] // load occurence
ldr x6,iMulti // factor to avoid float use
mul x5,x6,x5
udiv x5,x5,x11 // divide by sum
ldr x1,[x8,x4,lsl #3] // load frequence
sub x5,x5,x1
mov x10,x5
mul x10,x5,x10 // square
udiv x10,x10,x1 // divide by freq
add x0,x0,x10 // add to final result
add x4,x4,#1
cmp x4,#LENALPHA
blt 7b
mov x4,#0 // key final zero
strb w4,[x3,x2]
add sp,sp,#80 // free areas on stack
100:
ldp x11,x12,[sp],16 // restaur registers
ldp x9,x10,[sp],16 // restaur registers
ldp x7,x8,[sp],16 // restaur registers
ldp x5,x6,[sp],16 // restaur registers
ldp x3,x4,[sp],16 // restaur registers
ldp x2,lr,[sp],16 // restaur registers
ret
qAdrtabFreq: .quad tabFreq
qAdrszMessErrOcc: .quad szMessErrOcc
/******************************************************************/
/* search best offset */
/******************************************************************/
/* x0 contains address array counter occurences */
/* x1 contains address array frequence */
/* x0 return result */
recherche:
stp x2,lr,[sp,-16]! // save registers
stp x3,x4,[sp,-16]! // save registers
stp x5,x6,[sp,-16]! // save registers
stp x7,x8,[sp,-16]! // save registers
stp x9,x10,[sp,-16]! // save registers
stp x11,x12,[sp,-16]! // save registers
mov x12,#-1 // high value rotation
mov x3,#0
mov x4,#0
mov x8,#0 // sum
1: // loop compute sum
ldrb w2,[x0,x4]
add x8,x8,x2
add x4,x4,#1
cmp x4,#LENALPHA
blt 1b
mov x6,#0 // rotate
2:
mov x5,#0
mov x4,#0 // indice
3:
add x7,x4,x6
sub x9,x7,#LENALPHA
cmp x7,#LENALPHA
csel x7,x9,x7,ge
// subge x7,#LENALPHA
ldrb w9,[x0,x7]
ldr x10,iMulti // factor to avoid float use
mul x9,x10,x9
udiv x9,x9,x8 // divide by sum
ldr x10,[x1,x4,lsl #3] // load frequency
sub x9,x9,x10
mov x11,x9
mul x9,x11,x9 // square
udiv x9,x9,x10 // frequency divide
add x5,x5,x9 // add to final result
add x4,x4,#1
cmp x4,#LENALPHA
blt 3b
cmp x5,x12 // best evalation ?
bhs 4f
mov x12,x5
mov x3,x6 // save best rotate
4:
add x6,x6,#1
cmp x6,#LENALPHA
blt 2b
mov x0,x3 // return result
100:
ldp x11,x12,[sp],16 // restaur registers
ldp x9,x10,[sp],16 // restaur registers
ldp x7,x8,[sp],16 // restaur registers
ldp x5,x6,[sp],16 // restaur registers
ldp x3,x4,[sp],16 // restaur registers
ldp x2,lr,[sp],16 // restaur registers
ret
iMulti: .quad 100000
/******************************************************************/
/* decrypt strings */
/******************************************************************/
/* x0 contains the address of the encrypted string1 */
/* x1 contains the key */
/* x2 contains the address of the decrypted buffer */
decrypt:
stp x3,lr,[sp,-16]! // save registers
stp x4,x5,[sp,-16]! // save registers
stp x6,x7,[sp,-16]! // save registers
stp x8,x9,[sp,-16]! // save registers
mov x3,#0 // counter byte string 1
mov x5,#0 // counter byte buffer
1:
mov x4,#0 // counter byte key
2:
ldrb w6,[x1,x4] // load byte key
cmp w6,#0 // end key
beq 1b
sub x6,x6,#'A'
add x4,x4,#1
3:
ldrb w7,[x0,x3] // load byte string 1
cmp x7,#0 // zero final ?
bne 4f
strb w7,[x2,x5]
mov x0,x5
b 100f
4:
cmp x7,#65 // < A ?
cinc x3,x3,lt
blt 3b
cmp x7,#90 // > Z
cinc x3,x3,gt // no minuscul
bgt 3b
sub x7,x7,x6 // add key
add x8,x7,26
cmp x7,#65 // < A
csel x7,x8,x7,lt
strb w7,[x2,x5]
add x5,x5,#1
add x3,x3,#1 // other byte of string
b 2b // other byte of key
100:
ldp x8,x9,[sp],16 // restaur registers
ldp x6,x7,[sp],16 // restaur registers
ldp x4,x5,[sp],16 // restaur registers
ldp x3,lr,[sp],16 // restaur registers
ret
/***************************************************/
/* ROUTINES INCLUDE */
/***************************************************/
/* for this file see task include a file in language AArch64 assembly*/
.include "../includeARM64.inc"
- Output:
Program 64 bits start. Possible key :E <-------Best key Possible key :EC <-------Best key Possible key :TEE Possible key :THEC <-------Best key Possible key :EEEPU Possible key :TCECEC Possible key :THECSAS <-------Best key Possible key :TJQGAHET Possible key :VEIZSEGNT Possible key :ECEGAWQTDS Possible key :TNLUSRXPTAJ Possible key :XLECTHQGTHEC Possible key :LJJTDGFNOTENR Possible key :THECHESHIRECAT <-------Best key Possible key :JNTOOEEXFTGQTNH Possible key :TJTSAEETEXHPXHNE Possible key :AZRAXUHEJLREEXIEE Possible key :VNIZQPALEPTSXSEXUC Possible key :FUCAITCSLVTEZDUDEHS Possible key :EQXGAHWTTQECEWUGXHPI Possible key :HVRCSAFTHEBDLSTAERSES Possible key :TVIJTCIGKAQPELECRXPTNC Possible key :KKEQXGPWTCQEELIEHXUWASV Possible key :ELAIXHQTTIEDXJETTNTGAEPC Possible key :OTJUUEGERDNQTUQEAGWUTIEOA Possible key :IGITEGECAGAVUNLJAHASAVTETW Possible key :TEEFSXHXAPXSNMEXSEVNHDSHWBD Possible key :THECHESCIRECATTHECHESHIRECAT <-------Best key Possible key :IPSVPRZGTSHNJCAESXUEHEBEPEXAC Decrypted : THISWASYHEPOEMTHATALICEREADJABBERWOHKYTWASBRILLIGANDTHESLITHYTOAESDIDGYREANDGIMBLEINTHEWABEFLLMIMSYWERETHEBOROGOVESANDTMEMOMERATHSOUTGRABEBEWARETHEOABBERWOCKMYSONTHEJAWSTHATBIYETHECLAWSTHATCATCHBEWARETHEOUBJUBBIRDANDSHUNTHEFRUMIOUSGANDERSNATCHHETOOKHISVORPALSBORDINHANDLONGTIMETHEMANXOMEKOEHESOUGHTSORESTEDHEBYTHETURTUMTREEANDSTOODAWHILEINTHOULHTANDASINUFFISHTHOUGHTHESTOTDTHEJABBERWOCKWITHEYESOFFLARECAMEWHIFFLINGTHROUGHTHETULLEYWOODANDBURBLEDASITCAMEONEYWOONETWOANDTHROUGHANDTHROUGMTHEVORPALBLADEWENTSNICKERSNFCKHELEFTITDEADANDWITHITSHEAIHEWENTGALUMPHINGBACKANDHASTYHOUSLAINTHEJABBERWOCKCOMETORYARMSMYBEAMISHBOYOFRABJOUSDFYCALLOOHCALLAYHECHORTLEDINHNSJOYTWASBRILLIGANDTHESLITHYYOVESDIDGYREANDGIMBLEINTHEWAGEALLMIMSYWERETHEBOROGOVESANITHEMOMERATHSOUTGRABEITSEEMSAERYPRETTYSHESAIDWHENSHEHADFNNISHEDITBUTITSRATHERHARDTOUSDERSTAND Program normal end.
Ada
The program is not fully auto, but makes a small number of suggestions for the right key and plaintext.
with Ada.Text_IO;
procedure Vignere_Cryptanalysis is
subtype Letter is Character range 'A' .. 'Z';
function "+"(X, Y: Letter) return Letter is
begin
return Character'Val( ( (Character'Pos(X)-Character'Pos('A'))
+ (Character'Pos(Y)-Character'Pos('A')) ) mod 26
+ Character'Pos('A'));
end;
function "-"(X, Y: Letter) return Letter is
begin
return Character'Val( ( (Character'Pos(X)-Character'Pos('A'))
- (Character'Pos(Y)-Character'Pos('A')) ) mod 26
+ Character'Pos('A'));
end;
type Frequency_Array is array (Letter) of Float;
English: Frequency_Array :=
( 0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
0.00978, 0.02360, 0.00150, 0.01974, 0.00074 );
function Get_Frequency(S: String) return Frequency_Array is
Result: Frequency_Array := (others => 0.0);
Offset: Float := 1.0/Float(S'Length);
begin
for I in S'Range loop
if S(I) in Letter then
Result(S(I)) := Result(S(I)) + Offset;
end if;
end loop;
return Result;
end Get_Frequency;
function Remove_Whitespace(S: String) return String is
begin
if S="" then
return "";
elsif S(S'First) in Letter then
return S(S'First) & Remove_Whitespace(S(S'First+1 .. S'Last));
else
return Remove_Whitespace(S(S'First+1 .. S'Last));
end if;
end Remove_Whitespace;
function Distance(A, B: Frequency_Array;
Offset: Character := 'A') return Float is
Result: Float := 0.0;
Diff: Float;
begin
for C in A'Range loop
Diff := A(C+Offset) - B(C);
Result := Result + (Diff * Diff);
end loop;
return Result;
end Distance;
function Find_Key(Cryptogram: String; Key_Length: Positive) return String is
function Find_Caesar_Key(S: String) return Letter is
Frequency: Frequency_Array := Get_Frequency(S);
Candidate: Letter := 'A'; -- a fake candidate
Candidate_Dist : Float := Distance(Frequency, English, 'A');
New_Dist: Float;
begin
for L in Letter range 'B' .. 'Z' loop
New_Dist := Distance(Frequency, English, L);
if New_Dist <= Candidate_Dist then
Candidate_Dist := New_Dist;
Candidate := L;
end if;
end loop;
return Candidate;
end Find_Caesar_Key;
function Get_Slide(S: String; Step: Positive) return String is
begin
if S'Length= 0 then
return "";
else
return S(S'First) & Get_Slide(S(S'First+Step .. S'Last), Step);
end if;
end Get_Slide;
Key: String(1 .. Key_Length);
S: String renames Cryptogram;
begin
for I in Key'Range loop
Key(I) := Find_Caesar_Key(Get_Slide(S(S'First+I-1 .. S'Last),
Key_Length));
end loop;
return Key;
end Find_Key;
function Key_Char(Key: String; Index: Positive) return Letter is
begin
if Index > Key'Last then
return Key_Char(Key, Index-Key'Last);
else
return Key(Index);
end if;
end Key_Char;
Ciphertext: String := Remove_Whitespace(
"MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH" &
"VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD" &
"ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS" &
"FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG" &
"ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ" &
"ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS" &
"JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT" &
"LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST" &
"MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH" &
"QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV" &
"RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW" &
"TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO" &
"SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR" &
"ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX" &
"BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB" &
"BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA" &
"FWAML ZZRXJ EKAHV FASMU LVVUT TGK");
Best_Plain: String := Ciphertext;
Best_Dist: Float := Distance(English, Get_Frequency(Best_Plain));
Best_Key: String := Ciphertext;
Best_Key_L: Natural := 0;
begin -- Vignere_Cryptanalysis
for I in 1 .. Ciphertext'Length/10 loop
declare
Key: String(1 .. I) := Find_Key(Ciphertext, I);
Plaintext: String(Ciphertext'Range);
begin
for I in Ciphertext'Range loop
Plaintext(I) := Ciphertext(I) - Key_Char(Key, I);
end loop;
if Distance(English, Get_Frequency(Plaintext)) < Best_Dist then
Best_Plain := Plaintext;
Best_Dist := Distance(English, Get_Frequency(Plaintext));
Best_Key(1 .. I) := Key;
Best_Key_L := I;
if Best_dist < 0.01 then
declare
use Ada.Text_IO;
begin
Put_Line("Key =" & Best_Key(1 .. Best_Key_L));
Put_Line("Distance = " & Float'Image(Best_Dist));
New_Line;
Put_Line("Plaintext =");
Put_Line(Best_Plain);
New_Line; New_Line;
end;
end if;
end if;
end;
end loop;
end Vignere_Cryptanalysis;
ARM Assembly
/* ARM assembly Raspberry PI */
/* program vigneredecrypt.s */
/* REMARK 1 : this program use routines in a include file
see task Include a file language arm assembly
for the routine affichageMess conversion10
see at end of this program the instruction include */
/* REMARK 2 : to avoid float use, The calculations of the evaluations
are made in integer numbers */
/* REMARK 3 : occurences characters counter are limited to a byte size */
/* REMARK 4 : program inspired to C rosetta program */
/*******************************************/
/* Constantes */
/*******************************************/
.include "../constantes.inc"
.equ LENALPHA, 26
.equ BUFFERSIZE, 2000
.equ KEYSIZE, 50
.equ OCCURSMAXI, 255
/*******************************************/
/* Macros */
/*******************************************/
//.include "../../ficmacros32.inc" @ for developer debugging
/*******************************************/
/* Initialized data */
/*******************************************/
.data
szMessDebutPgm: .asciz "Program 32 bits start. \n"
szCarriageReturn: .asciz "\n"
szMessFinOK: .asciz "Program normal end. \n"
szMessError: .asciz "\nError Buffer too small!!!\n"
szMessPossible: .asciz "Possible key :"
szMessDecrip: .asciz "\nDecrypted :\n"
szMessCharinv: .asciz "Error. Character invalid!."
szMessErrOcc: .asciz "Maxi occurennces characters!."
szMessBest: .asciz " <-------Best key"
szString1: .ascii "MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH"
.ascii "VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD"
.ascii "ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS"
.ascii "FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG"
.ascii "ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ"
.ascii "ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS"
.ascii "JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT"
.ascii "LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST"
.ascii "MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH"
.ascii "QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV"
.ascii "RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW"
.ascii "TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO"
.ascii "SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR"
.ascii "ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX"
.ascii "BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB"
.ascii "BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA"
.asciz "FWAML ZZRXJ EKAHV FASMU LVVUT TGK"
.equ LGSTRING1, . - szString1
.align 4
tabFreq: .int 8167, 1492, 2782, 4253, 12702, 2228, 2015
.int 6094, 6966, 153, 772, 4025, 2406, 6749
.int 7507, 1929, 95, 5987, 6327, 9056, 2758
.int 978, 2360, 150, 1974, 74
.equ NBFREQ, . - tabFreq
/*******************************************/
/* UnInitialized data */
/*******************************************/
.bss
sBuffer1: .skip LGSTRING1
sBuffer2: .skip BUFFERSIZE
sKey: .skip KEYSIZE
sBestKey: .skip KEYSIZE
/*******************************************/
/* code section */
/*******************************************/
.text
.global main
main:
ldr r0,iAdrszMessDebutPgm
bl affichageMess
ldr r0,iAdrszString1 @ string address
ldr r1,iAdrsBuffer1 @ buffer
bl convertText @ string char conversion
mov r5,r0 @ result length
ldr r0,iAdrsBuffer1 @ buffer
mov r4,#1 @ interval
mov r6,#-1 @ evaluation high value
1:
ldr r0,iAdrsBuffer1 @ converted buffer
mov r1,r5 @ length
mov r2,r4 @ interval
ldr r3,iAdrsBuffer2 @ key
bl searchKey
mov r7,r0 @ save return result
ldr r0,iAdrszMessPossible
bl affichageMess
ldr r0,iAdrsBuffer2 @ display decrypted buffer
bl affichageMess
cmp r7,r6 @ best evaluation ?
bhi 3f
mov r6,r7 @ yes -> save new value
ldr r0,iAdrszMessBest @ message display
bl affichageMess
mov r8,#0
ldr r9,iAdrsBuffer2
ldr r10,iAdrsBestKey
2: @ copy best key loop
ldrb r12,[r9,r8]
strb r12,[r10,r8]
cmp r12,#0
beq 3f
add r8,r8,#1
b 2b
3:
ldr r0,iAdrszCarriageReturn
bl affichageMess
add r4,r4,#1
cmp r4,#30 @ interval maxi ?
blt 1b @ and loop
@ decrypt with best key
ldr r0,iAdrszString1
ldr r1,iAdrsBestKey
ldr r2,iAdrsBuffer2
bl decrypt
ldr r0,iAdrszMessDecrip
bl affichageMess
ldr r0,iAdrsBuffer2 @ display decrypted buffer
bl affichageMess
ldr r0,iAdrszCarriageReturn
bl affichageMess
ldr r0,iAdrszMessFinOK
bl affichageMess
b 100f
99:
ldr r0,iAdrszMessError @ error
bl affichageMess
mov r0, #1
100: @ standard end of the program
mov r0, #0 @ return code
mov r7, #EXIT @ request to exit program
svc 0 @ perform system call
iAdrszMessDecrip: .int szMessDecrip
iAdrszMessPossible: .int szMessPossible
iAdrszMessBest: .int szMessBest
iAdrszString1: .int szString1
iAdrsBuffer1: .int sBuffer1
iAdrsBuffer2: .int sBuffer2
iAdrszMessDebutPgm: .int szMessDebutPgm
iAdrszMessFinOK: .int szMessFinOK
iAdrszCarriageReturn: .int szCarriageReturn
iAdrszMessError: .int szMessError
iAdrsBestKey: .int sBestKey
/******************************************************************/
/* convert text in position and supp char non alpha */
/******************************************************************/
/* r0 contains the address of the string1 */
/* r1 contains key address of buffer
/* r0 return buffer lenght */
convertText:
push {r3-r7,lr} @ save registers
mov r3,#0 @ counter byte string 1
mov r5,#0 @ counter byte buffer
1:
ldrb r2,[r0,r3] @ load char
cmp r2,#0 @ final zero ?
beq 10f
cmp r2,#65 @ < A ?
addlt r3,#1
blt 1b
cmp r2,#90 @ > Z
addgt r3,#1 @ no minuscul
bgt 1b
sub r2,r2,#'A' @ compute rank
cmp r2,#26
ble 2f
ldr r0,iAdrszMessCharinv
bl affichageMess
mov r0,#-1
b 100f
2:
strb r2,[r1,r5] @
add r5,r5,#1
add r3,r3,#1
b 1b
10:
strb r2,[r1,r5] @ final zero
mov r0,r5
100:
pop {r3-r7,lr} @ restaur registers
bx lr @ return
iAdrszMessCharinv: .int szMessCharinv
/******************************************************************/
/* decrypt strings */
/******************************************************************/
/* r0 contains the address of the converted string1 */
/* r1 contains converted string1 length */
/* r2 contains interval */
/* r3 contains address result buffer */
searchKey:
push {r2-r11,lr} @ save registers
sub sp,sp,#64 @ area reserve on stack ( 26 * 2)
mov r7,sp @ save stack address occurences counter
add r9,r7,#28 @ best occurences counter
mov r4,#0
mov r5,#0
1: @ init area best occurences counter
strb r5,[r9,r4]
add r4,r4,#1
cmp r4,#LENALPHA
ble 1b
mov r6,#0 @ j
2:
mov r4,#0
mov r5,#0
3: @ init area occurences counter
strb r5,[r7,r4]
add r4,r4,#1
cmp r4,#LENALPHA
ble 3b
mov r4,r6 @ indice
4:
ldrb r5,[r0,r4] @ load byte
ldrb r8,[r7,r5] @ load one occurence counter
add r8,r8,#1 @ compute occurence char in intervall
cmp r8,#255 @ byte maxi ?
ble 41f
ldr r0,iAdrszMessErrOcc
bl affichageMess
mov r0,#-1
b 100f
41:
strb r8,[r7,r5] @ store new occurence
add r4,r4,r2 @ add interval
cmp r4,r1 @ compare length string
blt 4b
mov r8,r0 @ save register
mov r9,r1 @ save register
mov r0,r7 @ occurences area address on stack
ldr r1,iAdrtabFreq @ frequence area
bl recherche
mov r5,r0 @ best rotation for this interval
mov r0,r8
mov r1,r9
add r8,r5,#'A' @ key letter
strb r8,[r3,r6] @ store in key result
add r9,r7,#28 @
mov r4,#0
5:
add r10,r4,r5 @ add rotation to indice
cmp r10,#LENALPHA
subge r10,r10,#LENALPHA
ldrb r10,[r7,r10] @ load result
ldrb r11,[r9,r4]
add r11,r11,r10 @ add to general counter
strb r11,[r9,r4] @ store
add r4,r4,#1
cmp r4,#LENALPHA
blt 5b @ and loop
add r6,r6,#1 @ increment indice
cmp r6,r2 @ interval ?
blt 2b @ and loop
mov r11,#0 @ sum
mov r4,#0 @ indice
6: @ loop compute sum
ldrb r5,[r9,r4]
add r11,r5
add r4,r4,#1
cmp r4,#LENALPHA
blt 6b
mov r4,#0
ldr r8,iAdrtabFreq
mov r0,#0 @ return evaluation value
7:
ldrb r5,[r9,r4] @ load occurence
ldr r6,iMulti @ factor to avoid float use
mul r5,r6,r5
udiv r5,r5,r11 @ divide by sum
ldr r1,[r8,r4,lsl #2] @ load frequence
sub r5,r5,r1
mov r10,r5
mul r10,r5,r10 @ square
udiv r10,r10,r1 @ divide by freq
add r0,r0,r10 @ add to final result
add r4,r4,#1
cmp r4,#LENALPHA
blt 7b
mov r4,#0 @ key final zero
strb r4,[r3,r2]
add sp,sp,#64 @ free areas on stack
100:
pop {r2-r11,pc} @ restaur registers
iAdrtabFreq: .int tabFreq
iAdrszMessErrOcc: .int szMessErrOcc
/******************************************************************/
/* search best offset */
/******************************************************************/
/* r0 contains address array counter occurences */
/* r1 contains address array frequence */
/* r0 return result */
recherche:
push {r2-r12,lr} @ save registers
mov r12,#-1 @ high value rotation
mov r3,#0
mov r4,#0
mov r8,#0 @ sum
1: @ loop compute sum
ldrb r2,[r0,r4]
add r8,r8,r2
add r4,r4,#1
cmp r4,#LENALPHA
blt 1b
mov r6,#0 @ rotate
2:
mov r5,#0
mov r4,#0 @ indice
3:
add r7,r4,r6
cmp r7,#LENALPHA
subge r7,#LENALPHA
ldrb r9,[r0,r7]
ldr r10,iMulti @ factor to avoid float use
mul r9,r10,r9
udiv r9,r9,r8 @ divide by sum
ldr r10,[r1,r4,lsl #2] @ load frequency
sub r9,r9,r10
mov r11,r9
mul r9,r11,r9 @ square
udiv r9,r9,r10 @ frequency divide
add r5,r5,r9 @ add to final result
add r4,r4,#1
cmp r4,#LENALPHA
blt 3b
cmp r5,r12 @ best evalation ?
movlo r12,r5
movlo r3,r6 @ save best rotate
add r6,r6,#1
cmp r6,#LENALPHA
blt 2b
mov r0,r3 @ return result
100:
pop {r2-r12,pc} @ restaur registers and return
iMulti: .int 100000
/******************************************************************/
/* decrypt strings (see vignere program) */
/******************************************************************/
/* r0 contains the address of the encrypted string1 */
/* r1 contains the key */
/* r2 contains the address of the decrypted buffer */
decrypt:
push {r3-r7,lr} @ save registers
mov r3,#0 @ counter byte string 1
mov r5,#0 @ counter byte buffer
1:
mov r4,#0 @ counter byte key
2:
ldrb r6,[r1,r4] @ load byte key
cmp r6,#0 @ end key
beq 1b
sub r6,r6,#'A'
add r4,r4,#1
3:
ldrb r7,[r0,r3] @ load byte string 1
cmp r7,#0 @ zero final ?
streqb r7,[r2,r5]
moveq r0,r5
beq 100f
cmp r7,#65 @ < A ?
addlt r3,#1
blt 3b
cmp r7,#90 @ > Z
addgt r3,#1 @ no minuscul
bgt 3b
sub r7,r6 @ add key
cmp r7,#65 @ < A
addlt r7,#26 @
strb r7,[r2,r5]
add r5,r5,#1
add r3,r3,#1 @ other byte of string
b 2b @ other byte of key
100:
pop {r3-r7,lr} @ restaur registers
bx lr @ return
/***************************************************/
/* ROUTINES INCLUDE */
/***************************************************/
.include "../affichage.inc"
- Output:
Program 32 bits start. Possible key :E <-------Best key Possible key :EC <-------Best key Possible key :TEE Possible key :THEC <-------Best key Possible key :EEEPU Possible key :TCECEC Possible key :THECSAS <-------Best key Possible key :TJQGAHET Possible key :VEIZSEGNT Possible key :ECEGAWQTDS Possible key :TNLUSRXPTAJ Possible key :XLECTHQGTHEC Possible key :LJJTDGFNOTENR Possible key :THECHESHIRECAT <-------Best key Possible key :JNTOOEEXFTGQTNH Possible key :TJTSAEETEXHPXHNE Possible key :AZRAXUHEJLREEXIEE Possible key :VNIZQPALEPTSXSEXUC Possible key :FUCAITCSLVTEZDUDEHS Possible key :EQXGAHWTTQECEWUGXHPI Possible key :HVRCSAFTHEBDLSTAERSES Possible key :TVIJTCIGKAQPELECRXPTNC Possible key :KKEQXGPWTCQEELIEHXUWASV Possible key :ELAIXHQTTIEDXJETTNTGAEPC Possible key :OTJUUEGERDNQTUQEAGWUTIEOA Possible key :IGITEGECAGAVUNLJAHASAVTETW Possible key :TEEFSXHXAPXSNMEXSEVNHDSHWBD Possible key :THECHESCIRECATTHECHESHIRECAT <-------Best key Possible key :IPSVPRZGTSHNJCAESXUEHEBEPEXAC Decrypted : THISWASYHEPOEMTHATALICEREADJABBERWOHKYTWASBRILLIGANDTHESLITHYTOAESDIDGYREANDGIMBLEINTHEWABEFLLMIMSYWERETHEBOROGOVESANDTMEMOMERATHSOUTGRABEBEWARETHEOABBERWOCKMYSONTHEJAWSTHATBIYETHECLAWSTHATCATCHBEWARETHEOUBJUBBIRDANDSHUNTHEFRUMIOUSGANDERSNATCHHETOOKHISVORPALSBORDINHANDLONGTIMETHEMANXOMEKOEHESOUGHTSORESTEDHEBYTHETURTUMTREEANDSTOODAWHILEINTHOULHTANDASINUFFISHTHOUGHTHESTOTDTHEJABBERWOCKWITHEYESOFFLARECAMEWHIFFLINGTHROUGHTHETULLEYWOODANDBURBLEDASITCAMEONEYWOONETWOANDTHROUGHANDTHROUGMTHEVORPALBLADEWENTSNICKERSNFCKHELEFTITDEADANDWITHITSHEAIHEWENTGALUMPHINGBACKANDHASTYHOUSLAINTHEJABBERWOCKCOMETORYARMSMYBEAMISHBOYOFRABJOUSDFYCALLOOHCALLAYHECHORTLEDINHNSJOYTWASBRILLIGANDTHESLITHYYOVESDIDGYREANDGIMBLEINTHEWAGEALLMIMSYWERETHEBOROGOVESANITHEMOMERATHSOUTGRABEITSEEMSAERYPRETTYSHESAIDWHENSHEHADFNNISHEDITBUTITSRATHERHARDTOUSDERSTAND Program normal end.
C
This finds the right key (I think, I didn't try to decode it after getting the key). The program is not fully auto, but by its output, the result is pretty obvious.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
const char *encoded =
"MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH"
"VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD"
"ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS"
"FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG"
"ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ"
"ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS"
"JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT"
"LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST"
"MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH"
"QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV"
"RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW"
"TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO"
"SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR"
"ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX"
"BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB"
"BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA"
"FWAML ZZRXJ EKAHV FASMU LVVUT TGK";
const double freq[] = {
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
0.00978, 0.02360, 0.00150, 0.01974, 0.00074
};
int best_match(const double *a, const double *b) {
double sum = 0, fit, d, best_fit = 1e100;
int i, rotate, best_rotate = 0;
for (i = 0; i < 26; i++)
sum += a[i];
for (rotate = 0; rotate < 26; rotate++) {
fit = 0;
for (i = 0; i < 26; i++) {
d = a[(i + rotate) % 26] / sum - b[i];
fit += d * d / b[i];
}
if (fit < best_fit) {
best_fit = fit;
best_rotate = rotate;
}
}
return best_rotate;
}
double freq_every_nth(const int *msg, int len, int interval, char *key) {
double sum, d, ret;
double out[26], accu[26] = {0};
int i, j, rot;
for (j = 0; j < interval; j++) {
for (i = 0; i < 26; i++)
out[i] = 0;
for (i = j; i < len; i += interval)
out[msg[i]]++;
key[j] = rot = best_match(out, freq);
key[j] += 'A';
for (i = 0; i < 26; i++)
accu[i] += out[(i + rot) % 26];
}
for (i = 0, sum = 0; i < 26; i++)
sum += accu[i];
for (i = 0, ret = 0; i < 26; i++) {
d = accu[i] / sum - freq[i];
ret += d * d / freq[i];
}
key[interval] = '\0';
return ret;
}
int main() {
int txt[strlen(encoded)];
int len = 0, j;
char key[100];
double fit, best_fit = 1e100;
for (j = 0; encoded[j] != '\0'; j++)
if (isupper(encoded[j]))
txt[len++] = encoded[j] - 'A';
for (j = 1; j < 30; j++) {
fit = freq_every_nth(txt, len, j, key);
printf("%f, key length: %2d, %s", fit, j, key);
if (fit < best_fit) {
best_fit = fit;
printf(" <--- best so far");
}
printf("\n");
}
return 0;
}
C++
Not guaranteed to give a 100% correct answer, but it works here. Requires C++0x.
#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <algorithm>
#include <array>
using namespace std;
typedef array<pair<char, double>, 26> FreqArray;
class VigenereAnalyser
{
private:
array<double, 26> targets;
array<double, 26> sortedTargets;
FreqArray freq;
// Update the freqs array
FreqArray& frequency(const string& input)
{
for (char c = 'A'; c <= 'Z'; ++c)
freq[c - 'A'] = make_pair(c, 0);
for (size_t i = 0; i < input.size(); ++i)
freq[input[i] - 'A'].second++;
return freq;
}
double correlation(const string& input)
{
double result = 0.0;
frequency(input);
sort(freq.begin(), freq.end(), [](pair<char, double> u, pair<char, double> v)->bool
{ return u.second < v.second; });
for (size_t i = 0; i < 26; ++i)
result += freq[i].second * sortedTargets[i];
return result;
}
public:
VigenereAnalyser(const array<double, 26>& targetFreqs)
{
targets = targetFreqs;
sortedTargets = targets;
sort(sortedTargets.begin(), sortedTargets.end());
}
pair<string, string> analyze(string input)
{
string cleaned;
for (size_t i = 0; i < input.size(); ++i)
{
if (input[i] >= 'A' && input[i] <= 'Z')
cleaned += input[i];
else if (input[i] >= 'a' && input[i] <= 'z')
cleaned += input[i] + 'A' - 'a';
}
size_t bestLength = 0;
double bestCorr = -100.0;
// Assume that if there are less than 20 characters
// per column, the key's too long to guess
for (size_t i = 2; i < cleaned.size() / 20; ++i)
{
vector<string> pieces(i);
for (size_t j = 0; j < cleaned.size(); ++j)
pieces[j % i] += cleaned[j];
// The correlation increases artificially for smaller
// pieces/longer keys, so weigh against them a little
double corr = -0.5*i;
for (size_t j = 0; j < i; ++j)
corr += correlation(pieces[j]);
if (corr > bestCorr)
{
bestLength = i;
bestCorr = corr;
}
}
if (bestLength == 0)
return make_pair("Text is too short to analyze", "");
vector<string> pieces(bestLength);
for (size_t i = 0; i < cleaned.size(); ++i)
pieces[i % bestLength] += cleaned[i];
vector<FreqArray> freqs;
for (size_t i = 0; i < bestLength; ++i)
freqs.push_back(frequency(pieces[i]));
string key = "";
for (size_t i = 0; i < bestLength; ++i)
{
sort(freqs[i].begin(), freqs[i].end(), [](pair<char, double> u, pair<char, double> v)->bool
{ return u.second > v.second; });
size_t m = 0;
double mCorr = 0.0;
for (size_t j = 0; j < 26; ++j)
{
double corr = 0.0;
char c = 'A' + j;
for (size_t k = 0; k < 26; ++k)
{
int d = (freqs[i][k].first - c + 26) % 26;
corr += freqs[i][k].second * targets[d];
}
if (corr > mCorr)
{
m = j;
mCorr = corr;
}
}
key += m + 'A';
}
string result = "";
for (size_t i = 0; i < cleaned.size(); ++i)
result += (cleaned[i] - key[i % key.length()] + 26) % 26 + 'A';
return make_pair(result, key);
}
};
int main()
{
string input =
"MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH"
"VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD"
"ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS"
"FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG"
"ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ"
"ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS"
"JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT"
"LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST"
"MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH"
"QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV"
"RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW"
"TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO"
"SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR"
"ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX"
"BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB"
"BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA"
"FWAML ZZRXJ EKAHV FASMU LVVUT TGK";
array<double, 26> english = {
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228,
0.02015, 0.06094, 0.06966, 0.00153, 0.00772, 0.04025,
0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987,
0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150,
0.01974, 0.00074};
VigenereAnalyser va(english);
pair<string, string> output = va.analyze(input);
cout << "Key: " << output.second << endl << endl;
cout << "Text: " << output.first << endl;
}
D
import std.stdio, std.algorithm, std.typecons, std.string,
std.array, std.numeric, std.ascii;
string[2] vigenereDecrypt(in double[] targetFreqs, in string input) {
enum nAlpha = std.ascii.uppercase.length;
static double correlation(in string txt, in double[] sTargets)
pure nothrow /*@safe*/ @nogc {
uint[nAlpha] charCounts = 0;
foreach (immutable c; txt)
charCounts[c - 'A']++;
return charCounts[].sort().release.dotProduct(sTargets);
}
static frequency(in string txt) pure nothrow @safe {
auto freqs = new Tuple!(char,"c", uint,"d")[nAlpha];
foreach (immutable i, immutable c; std.ascii.uppercase)
freqs[i] = tuple(c, 0);
foreach (immutable c; txt)
freqs[c - 'A'].d++;
return freqs;
}
static string[2] decode(in string cleaned, in string key)
pure nothrow @safe {
assert(!key.empty);
string decoded;
foreach (immutable i, immutable c; cleaned)
decoded ~= (c - key[i % $] + nAlpha) % nAlpha + 'A';
return [key, decoded];
}
static size_t findBestLength(in string cleaned,
in double[] sTargets)
pure nothrow /*@safe*/ {
size_t bestLength;
double bestCorr = -100.0;
// Assume that if there are less than 20 characters
// per column, the key's too long to guess
foreach (immutable i; 2 .. cleaned.length / 20) {
auto pieces = new Appender!string[i];
foreach (immutable j, immutable c; cleaned)
pieces[j % i] ~= c;
// The correlation seems to increase for smaller
// pieces/longer keys, so weigh against them a little
double corr = -0.5 * i;
foreach (const p; pieces)
corr += correlation(p.data, sTargets);
if (corr > bestCorr) {
bestLength = i;
bestCorr = corr;
}
}
return bestLength;
}
static string findKey(in string cleaned, in size_t bestLength,
in double[] targetFreqs) pure nothrow @safe {
auto pieces = new string[bestLength];
foreach (immutable i, immutable c; cleaned)
pieces[i % bestLength] ~= c;
string key;
foreach (fr; pieces.map!frequency) {
fr.sort!q{ a.d > b.d };
size_t m;
double maxCorr = 0.0;
foreach (immutable j, immutable c; uppercase) {
double corr = 0.0;
foreach (immutable frc; fr) {
immutable di = (frc.c - c + nAlpha) % nAlpha;
corr += frc.d * targetFreqs[di];
}
if (corr > maxCorr) {
m = j;
maxCorr = corr;
}
}
key ~= m + 'A';
}
return key;
}
immutable cleaned = input.toUpper.removechars("^A-Z");
//immutable sortedTargets = targetFreqs.sorted;
immutable sortedTargets = targetFreqs.dup.sort().release.idup;
immutable bestLength = findBestLength(cleaned, sortedTargets);
if (bestLength == 0)
throw new Exception("Text is too short to analyze.");
immutable string key = findKey(cleaned, bestLength, targetFreqs);
return decode(cleaned, key);
}
void main() {
immutable encoded = "MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG
JSPXY ALUYM NSMYH VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF
WHTCQ KMLRD ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA
LWQIS FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ ILOVV
RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS JLAKI FHXUF
XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT LPRWM JAZPK LQUZA
ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST MTEOE PAPJH SMFNB YVQUZ
AALGA YDNMP AQOWT UHDBV TSMUE UIMVH QGVRW AEFSP EMPVE PKXZY WLKJA
GWALT VYYOB YIXOK IHPDS EVLEV RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY
IMAPX UOISK PVAGN MZHPW TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV
YOVDJ SOLXG TGRVO SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV
GJOKM SIFPR ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO
ZQDLX BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA FWAML
ZZRXJ EKAHV FASMU LVVUT TGK";
immutable englishFrequences = [0.08167, 0.01492, 0.02782, 0.04253,
0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153, 0.00772,
0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987,
0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150, 0.01974,
0.00074];
immutable key_dec = vigenereDecrypt(englishFrequences, encoded);
writefln("Key: %s\n\nText: %s", key_dec[0], key_dec[1]);
}
- Output (cut):
Key: THECHESHIRECAT Text: THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHY...
FreeBASIC
Type FreqPair
As String * 1 c
As Double freq
End Type
Function frequency(inputText() As Integer, inputLen As Integer) As FreqPair Ptr
Dim As FreqPair Ptr result = Callocate(26 * Sizeof(FreqPair))
Dim As Integer i
For i = 0 To 25
result[i].c = Chr(65 + i)
result[i].freq = 0.0
Next
For i = 0 To inputLen - 1
result[inputText(i) - 65].freq += 1
Next
Return result
End Function
Function correlation(inputText() As Integer, inputLen As Integer, sorted_targets() As Double) As Double
Dim As FreqPair Ptr freq = frequency(inputText(), inputLen)
Dim As Integer i, j
Dim As Double result = 0.0
'Sort freq by frequency
For i = 0 To 24
For j = i + 1 To 25
If freq[j].freq > freq[i].freq Then Swap freq[j], freq[i]
Next
Next
For i = 0 To 25
result += freq[i].freq * sorted_targets(i)
Next
Deallocate(freq)
Return result
End Function
Sub vigenereDecrypt(targetFreqs() As Double, encoded As String, Byref outKey As String, Byref outText As String)
Dim As Integer cleaned(Len(encoded))
Dim As Integer cleanedLen = 0
Dim As Integer i, j, k
'Clean inputText
For i = 1 To Len(encoded)
Dim As String c = Mid(encoded, i, 1)
If c >= "A" And c <= "Z" Then
cleaned(cleanedLen) = Asc(c)
cleanedLen += 1
End If
Next
'Sort target frequencies
Dim As Double sorted_targets(25)
For i = 0 To 25
sorted_targets(i) = targetFreqs(i)
Next
For i = 0 To 24
For j = i + 1 To 25
If sorted_targets(j) > sorted_targets(i) Then Swap sorted_targets(j), sorted_targets(i)
Next
Next
'Find best key length
Dim As Integer bestLen = 0
Dim As Double bestCorr = -100.0
For keyLen As Integer = 2 To cleanedLen \ 20
Dim As Integer pieces(cleanedLen)
Dim As Integer pieceLens(keyLen)
For j = 0 To cleanedLen - 1
pieces(j) = cleaned(j)
pieceLens(j Mod keyLen) += 1
Next
Dim As Double corr = -0.5 * keyLen
For i = 0 To keyLen - 1
Dim As Integer currentPiece(cleanedLen)
Dim As Integer currentLen = 0
For j = i To cleanedLen - 1 Step keyLen
currentPiece(currentLen) = pieces(j)
currentLen += 1
Next
corr += correlation(currentPiece(), currentLen, sorted_targets())
Next
If corr > bestCorr Then
bestLen = keyLen
bestCorr = corr
End If
Next
'Find key
outKey = ""
For i = 0 To bestLen - 1
Dim As Integer piece(cleanedLen)
Dim As Integer pieceLen = 0
For j = i To cleanedLen - 1 Step bestLen
piece(pieceLen) = cleaned(j)
pieceLen += 1
Next
Dim As Double maxCorr = 0.0
Dim As Integer bestShift = 0
For shift As Integer = 0 To 25
Dim As Double corr = 0.0
For j = 0 To pieceLen - 1
k = (piece(j) - 65 - shift + 26) Mod 26
corr += targetFreqs(k)
Next
If corr > maxCorr Then
maxCorr = corr
bestShift = shift
End If
Next
outKey += Chr(bestShift + 65)
Next
'Decrypt
outText = ""
For i = 0 To cleanedLen - 1
k = Asc(Mid(outKey, (i Mod bestLen) + 1, 1)) - 65
outText &= Chr(((cleaned(i) - 65 - k + 26) Mod 26) + 65)
Next
End Sub
'Main program
Dim As Double english_freqs(25) = { _
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, _
0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, _
0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758, _
0.00978, 0.02360, 0.00150, 0.01974, 0.00074 }
Dim As String encoded = _
"MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH" & _
"VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD" & _
"ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS" & _
"FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG" & _
"ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ" & _
"ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS" & _
"JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT" & _
"LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST" & _
"MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH" & _
"QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV" & _
"RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW" & _
"TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO" & _
"SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR" & _
"ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX" & _
"BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB" & _
"BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA" & _
"FWAML ZZRXJ EKAHV FASMU LVVUT TGK"
Dim As String key, decoded
vigenereDecrypt(english_freqs(), encoded, key, decoded)
Print "Key: "; key
Print !"\nDecoded text: "; decoded
Sleep
Go
package main
import (
"fmt"
"strings"
)
var encoded =
"MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH" +
"VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD" +
"ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS" +
"FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG" +
"ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ" +
"ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS" +
"JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT" +
"LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST" +
"MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH" +
"QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV" +
"RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW" +
"TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO" +
"SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR" +
"ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX" +
"BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB" +
"BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA" +
"FWAML ZZRXJ EKAHV FASMU LVVUT TGK"
var freq = [26]float64{
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
0.00978, 0.02360, 0.00150, 0.01974, 0.00074,
}
func sum(a []float64) (sum float64) {
for _, f := range a {
sum += f
}
return
}
func bestMatch(a []float64) int {
sum := sum(a)
bestFit, bestRotate := 1e100, 0
for rotate := 0; rotate < 26; rotate++ {
fit := 0.0
for i := 0; i < 26; i++ {
d := a[(i+rotate)%26]/sum - freq[i]
fit += d * d / freq[i]
}
if fit < bestFit {
bestFit, bestRotate = fit, rotate
}
}
return bestRotate
}
func freqEveryNth(msg []int, key []byte) float64 {
l := len(msg)
interval := len(key)
out := make([]float64, 26)
accu := make([]float64, 26)
for j := 0; j < interval; j++ {
for k := 0; k < 26; k++ {
out[k] = 0.0
}
for i := j; i < l; i += interval {
out[msg[i]]++
}
rot := bestMatch(out)
key[j] = byte(rot + 65)
for i := 0; i < 26; i++ {
accu[i] += out[(i+rot)%26]
}
}
sum := sum(accu)
ret := 0.0
for i := 0; i < 26; i++ {
d := accu[i]/sum - freq[i]
ret += d * d / freq[i]
}
return ret
}
func decrypt(text, key string) string {
var sb strings.Builder
ki := 0
for _, c := range text {
if c < 'A' || c > 'Z' {
continue
}
ci := (c - rune(key[ki]) + 26) % 26
sb.WriteRune(ci + 65)
ki = (ki + 1) % len(key)
}
return sb.String()
}
func main() {
enc := strings.Replace(encoded, " ", "", -1)
txt := make([]int, len(enc))
for i := 0; i < len(txt); i++ {
txt[i] = int(enc[i] - 'A')
}
bestFit, bestKey := 1e100, ""
fmt.Println(" Fit Length Key")
for j := 1; j <= 26; j++ {
key := make([]byte, j)
fit := freqEveryNth(txt, key)
sKey := string(key)
fmt.Printf("%f %2d %s", fit, j, sKey)
if fit < bestFit {
bestFit, bestKey = fit, sKey
fmt.Print(" <--- best so far")
}
fmt.Println()
}
fmt.Println("\nBest key :", bestKey)
fmt.Printf("\nDecrypted text:\n%s\n", decrypt(enc, bestKey))
}
- Output:
Note: carriage returns inserted into decrypted text after every 80 characters to make it more readable.
Fit Length Key 2.984348 1 E <--- best so far 2.483684 2 EC <--- best so far 2.642487 3 TEE 1.976651 4 THEC <--- best so far 2.356881 5 EEEPU 2.203129 6 TCECEC 1.051163 7 THECSAS <--- best so far 1.645763 8 TJQGAHET 2.001380 9 VEIZSEGNT 1.824476 10 ECEGAWQTDS 1.623083 11 TNLUSRXPTAJ 1.253527 12 XLECTHQGTHEC 1.399037 13 LJJTDGFNOTENR 0.152370 14 THECHESHIRECAT <--- best so far 1.533951 15 JNTOOEEXFTGQTNH 1.068182 16 TJTSAEETEXHPXHNE 1.034093 17 AZRAXUHEJLREEXIEE 1.443345 18 VNIZQPALEPTSXSEXUC 1.090977 19 FUCAITCSLVTEZDUDEHS 0.979868 20 EQXGAHWTTQECEWUGXHPI 0.789410 21 HVRCSAFTHEBDLSTAERSES 0.881380 22 TVIJTCIGKAQPELECRXPTNC 0.952456 23 KKEQXGPWTCQEELIEHXUWASV 0.715968 24 ELAIXHQTTIEDXJETTNTGAEPC 0.891258 25 OTJUUEGERDNQTUQEAGWUTIEOA 0.852784 26 IGITEGECAGAVUNLJAHASAVTETW Best key : THECHESHIRECAT Decrypted text: THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMB LEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEBEWARETHEJABBERWOCKMYS ONTHEJAWSTHATBITETHECLAWSTHATCATCHBEWARETHEJUBJUBBIRDANDSHUNTHEFRUMIOUSBANDERSNA TCHHETOOKHISVORPALSWORDINHANDLONGTIMETHEMANXOMEFOEHESOUGHTSORESTEDHEBYTHETUMTUMT REEANDSTOODAWHILEINTHOUGHTANDASINUFFISHTHOUGHTHESTOODTHEJABBERWOCKWITHEYESOFFLAM ECAMEWHIFFLINGTHROUGHTHETULGEYWOODANDBURBLEDASITCAMEONETWOONETWOANDTHROUGHANDTHR OUGHTHEVORPALBLADEWENTSNICKERSNACKHELEFTITDEADANDWITHITSHEADHEWENTGALUMPHINGBACK ANDHASTTHOUSLAINTHEJABBERWOCKCOMETOMYARMSMYBEAMISHBOYOFRABJOUSDAYCALLOOHCALLAYHE CHORTLEDINHISJOYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWER ETHEBOROGOVESANDTHEMOMERATHSOUTGRABEITSEEMSVERYPRETTYSHESAIDWHENSHEHADFINISHEDIT BUTITSRATHERHARDTOUNDERSTAND
Haskell
{-# LANGUAGE TupleSections #-}
import Data.List(transpose, nub, sort, maximumBy)
import Data.Ord (comparing)
import Data.Char (ord)
import Data.Map (Map, fromListWith, toList, findWithDefault)
average :: Fractional a => [a] -> a
average as = sum as / fromIntegral (length as)
-- Create a map from each entry in list to the number of occurrences of
-- that entry in the list.
countEntries :: Ord a => [a] -> Map a Int
countEntries = fromListWith (+) . fmap (,1)
-- Break a string up into substrings of n chars.
breakup :: Int -> [a] -> [[a]]
breakup _ [] = []
breakup n as =
let (h, r) = splitAt n as
in h:breakup n r
-- Dole out elements of a string over a n element distribution.
distribute :: [a] -> Int -> [[a]]
distribute as n = transpose $ breakup n as
-- The probability that members of a pair of characters taken randomly
-- from a given string are equal.
coincidence :: (Ord a, Fractional b) => [a] -> b
coincidence str =
let charCounts = snd <$> toList (countEntries str)
strln = length str
d = fromIntegral $ strln * (strln - 1)
n = fromIntegral $ sum $ fmap (\cc -> cc * (cc-1)) charCounts
in n / d
-- Use the average probablity of coincidence for all the members of
-- a distribution to rate the distribution - the higher the better.
-- The correlation increases artificially for smaller
-- pieces/longer keys, so weigh against them a little
rate :: (Ord a, Fractional b) => [[a]] -> b
rate d = average (fmap coincidence d) - fromIntegral (length d) / 3000.0
-- Multiply elements of lists together and add up the results.
dot :: Num a => [a] -> [a] -> a
dot v0 v1 = sum $ zipWith (*) v0 v1
-- Given two lists of floats, rotate one of them by the number of
-- characters indicated by letter and then 'dot' them together.
rotateAndDot :: Num a => [a] -> [a] -> Char -> a
rotateAndDot v0 v1 letter = dot v0 (drop (ord letter - ord 'A') (cycle v1))
-- Find decoding offset that results in best match
-- between actual char frequencies and expected frequencies.
getKeyChar :: RealFrac a => [a] -> String -> Char
getKeyChar expected sample =
let charCounts = countEntries sample
countInSample c = findWithDefault 0 c charCounts
actual = fmap (fromIntegral . countInSample) ['A'..'Z']
in maximumBy (comparing $ rotateAndDot expected actual) ['A'..'Z']
main = do
let cr = filter (/=' ') crypt
-- Assume that if there are less than 20 characters
-- per column, the key's too long to guess
distributions = fmap (distribute cr) [1..length cr `div` 20]
bestDistribution = maximumBy (comparing rate) distributions
key = fmap (getKeyChar englishFrequencies) bestDistribution
alphaSum a b = ['A'..'Z'] !! ((ord b - ord a) `mod` 26)
mapM_ putStrLn ["Key: " ++ key, "Decrypted Text: " ++ zipWith alphaSum (cycle key) cr]
englishFrequencies =
[ 0.08167, 0.01492, 0.02782, 0.04253,
0.12702, 0.02228, 0.02015, 0.06094,
0.06966, 0.00153, 0.00772, 0.04025,
0.02406, 0.06749, 0.07507, 0.01929,
0.00095, 0.05987, 0.06327, 0.09056,
0.02758, 0.00978, 0.02360, 0.00150,
0.01974, 0.00074 ]
crypt = "\
\MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH\
\VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD\
\ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS\
\FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG\
\ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ\
\ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS\
\JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT\
\LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST\
\MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH\
\QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV\
\RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW\
\TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO\
\SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR\
\ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX\
\BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB\
\BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA\
\FWAML ZZRXJ EKAHV FASMU LVVUT TGK\
\"
- Output:
Key: THECHESHIRECAT Decrypted Text: THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEBEWARETHEJABBERWOCKMYSONTHEJAWSTHATBITETHECLAWSTHATCATCHBEWARETHEJUBJUBBIRDANDSHUNTHEFRUMIOUSBANDERSNATCHHETOOKHISVORPALSWORDINHANDLONGTIMETHEMANXOMEFOEHESOUGHTSORESTEDHEBYTHETUMTUMTREEANDSTOODAWHILEINTHOUGHTANDASINUFFISHTHOUGHTHESTOODTHEJABBERWOCKWITHEYESOFFLAMECAMEWHIFFLINGTHROUGHTHETULGEYWOODANDBURBLEDASITCAMEONETWOONETWOANDTHROUGHANDTHROUGHTHEVORPALBLADEWENTSNICKERSNACKHELEFTITDEADANDWITHITSHEADHEWENTGALUMPHINGBACKANDHASTTHOUSLAINTHEJABBERWOCKCOMETOMYARMSMYBEAMISHBOYOFRABJOUSDAYCALLOOHCALLAYHECHORTLEDINHISJOYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEITSEEMSVERYPRETTYSHESAIDWHENSHEHADFINISHEDITBUTITSRATHERHARDTOUNDERSTAND
J
Implementation:
NB. https://en.wikipedia.org/wiki/Kasiski_examination
kasiski=: {{
grams=. ({: #"1~1 < ;@{.)|:(#/.~;"0~.) g=. 3 <\ y
deltas=. ;grams (2 -~/\ I.@E.)L:0 enc
{:,{.\:~(#/.~,.~.)1 -.~,+./~ deltas
}}
NB. https://en.wikipedia.org/wiki/Letter_frequency
AZ=: 8 u: 65+i.26
lfreq=: 0.01*do{{)n
8.2 1.5 2.8 4.3 13 2.2 2 6.1 7 0.15
0.77 4 2.4 6.7 7.5 1.9 0.095 6 6.3 9.1
2.8 0.98 2.4 0.15 2 0.074
}}-.LF
caesarkey=: {{
freqs=. (<:#/.~AZ,y)%#y=. y ([-.-.) AZ
AZ{~(i. <./)lfreq +/&.:*:@:-"1 (i.26)|."0 1 freqs
}}
vigenerekey=: {{ caesarkey"1|:(-kasiski y) ]\y }}
uncaesar=: {{ 26&|@-&(AZ i.x)&.(AZ&i.) y }}"0 1
unvigenere=: {{ ' '-.~,x uncaesar"0 1&.|:(-#x) ]\y }}
Here, kasiski finds all 3-grams (sequences of three adjacent letters) which appear more than once, finds all of the distances between nearest pairs of these sequences, and then further pairs each of these distances with all other distances, finding the greatest common divisor of those distance pairs. Finally, these LCDs are ordered by how many times they appear and the most frequent LCD is taken as the kasiski result.
uncaesar works by finding the frequency of occurrence of each letter of the alphabet (in alphabetical order), and then each of the 26 rotations of that sequence are compared with a text frequency alphabet (obtained from a wikipedia table). The rotation with the least root-mean-square sum of differences is chosen as the correct location, and its index is reported as a letter of the alphabet (0=A, 1=B, etc.)
(And, the length provided by kasiski is used to break out the sequences to be analyzed by uncaesar...)
Task example:
enc=: {{)n
MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK
}}-.LF,' '
vigenerekey enc
THECHESHIRECAT
_80]\'THECHESHIRECAT' unvigenere enc
THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMB
LEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEBEWARETHEJABBERWOCKMYS
ONTHEJAWSTHATBITETHECLAWSTHATCATCHBEWARETHEJUBJUBBIRDANDSHUNTHEFRUMIOUSBANDERSNA
TCHHETOOKHISVORPALSWORDINHANDLONGTIMETHEMANXOMEFOEHESOUGHTSORESTEDHEBYTHETUMTUMT
REEANDSTOODAWHILEINTHOUGHTANDASINUFFISHTHOUGHTHESTOODTHEJABBERWOCKWITHEYESOFFLAM
ECAMEWHIFFLINGTHROUGHTHETULGEYWOODANDBURBLEDASITCAMEONETWOONETWOANDTHROUGHANDTHR
OUGHTHEVORPALBLADEWENTSNICKERSNACKHELEFTITDEADANDWITHITSHEADHEWENTGALUMPHINGBACK
ANDHASTTHOUSLAINTHEJABBERWOCKCOMETOMYARMSMYBEAMISHBOYOFRABJOUSDAYCALLOOHCALLAYHE
CHORTLEDINHISJOYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWER
ETHEBOROGOVESANDTHEMOMERATHSOUTGRABEITSEEMSVERYPRETTYSHESAIDWHENSHEHADFINISHEDIT
BUTITSRATHERHARDTOUNDERSTANDWYTWITSJWYAH
As an aside, note that we could go directly from encrypted text to decrypted text, without showing the key. For example, using:
decaesar=: {{
freqs=. (<:#/.~AZ,y)%#y=. y ([-.-.) AZ
ndx=. (i. <./)lfreq +/&.:*:@:-"1 (i.26)|."0 1 freqs
26&|@-&ndx&.(AZ&i.) y
}}
devigenere=: {{ ' '-.~,decaesar"1&.|:(-kasiski y) ]\y }}
That said, it's also worth noting that noise issues mean that if this were to be used in practical contexts the approach should instead be to expose more intermediate results, rather than less, with a special focus on the representations of frequency distributions (here, we're always picking the first alternative, but it's vaguely plausible that a different alternative might actually be useful in some cases).
Java
public class Vig{
static String encodedMessage =
"MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA FWAML ZZRXJ EKAHV FASMU LVVUT TGK";
final static double freq[] = {
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
0.00978, 0.02360, 0.00150, 0.01974, 0.00074
};
public static void main(String[] args) {
int lenghtOfEncodedMessage = encodedMessage.length();
char[] encoded = new char [lenghtOfEncodedMessage] ;
char[] key = new char [lenghtOfEncodedMessage] ;
encodedMessage.getChars(0, lenghtOfEncodedMessage, encoded, 0);
int txt[] = new int[lenghtOfEncodedMessage];
int len = 0, j;
double fit, best_fit = 1e100;
for (j = 0; j < lenghtOfEncodedMessage; j++)
if (Character.isUpperCase(encoded[j]))
txt[len++] = encoded[j] - 'A';
for (j = 1; j < 30; j++) {
fit = freq_every_nth(txt, len, j, key);
System.out.printf("%f, key length: %2d ", fit, j);
System.out.print(key);
if (fit < best_fit) {
best_fit = fit;
System.out.print(" <--- best so far");
}
System.out.print("\n");
}
}
static String decrypt(String text, final String key) {
String res = "";
text = text.toUpperCase();
for (int i = 0, j = 0; i < text.length(); i++) {
char c = text.charAt(i);
if (c < 'A' || c > 'Z') continue;
res += (char)((c - key.charAt(j) + 26) % 26 + 'A');
j = ++j % key.length();
}
return res;
}
static int best_match(final double []a, final double []b) {
double sum = 0, fit, d, best_fit = 1e100;
int i, rotate, best_rotate = 0;
for (i = 0; i < 26; i++)
sum += a[i];
for (rotate = 0; rotate < 26; rotate++) {
fit = 0;
for (i = 0; i < 26; i++) {
d = a[(i + rotate) % 26] / sum - b[i];
fit += d * d / b[i];
}
if (fit < best_fit) {
best_fit = fit;
best_rotate = rotate;
}
}
return best_rotate;
}
static double freq_every_nth(final int []msg, int len, int interval, char[] key) {
double sum, d, ret;
double [] accu = new double [26];
double [] out = new double [26];
int i, j, rot;
for (j = 0; j < interval; j++) {
for (i = 0; i < 26; i++)
out[i] = 0;
for (i = j; i < len; i += interval)
out[msg[i]]++;
rot = best_match(out, freq);
try{
key[j] = (char)(rot + 'A');
} catch (Exception e) {
System.out.print(e.getMessage());
}
for (i = 0; i < 26; i++)
accu[i] += out[(i + rot) % 26];
}
for (i = 0, sum = 0; i < 26; i++)
sum += accu[i];
for (i = 0, ret = 0; i < 26; i++) {
d = accu[i] / sum - freq[i];
ret += d * d / freq[i];
}
key[interval] = '\0';
return ret;
}
}
Julia
# ciphertext block {{{1
const ciphertext = filter(isalpha, """
MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK
""")
# }}}
# character frequencies {{{1
const letters = Dict{Char, Float32}(
'E' => 12.702,
'T' => 9.056,
'A' => 8.167,
'O' => 7.507,
'I' => 6.966,
'N' => 6.749,
'S' => 6.327,
'H' => 6.094,
'R' => 5.987,
'D' => 4.253,
'L' => 4.025,
'C' => 2.782,
'U' => 2.758,
'M' => 2.406,
'W' => 2.361,
'F' => 2.228,
'G' => 2.015,
'Y' => 1.974,
'P' => 1.929,
'B' => 1.492,
'V' => 0.978,
'K' => 0.772,
'J' => 0.153,
'X' => 0.150,
'Q' => 0.095,
'Z' => 0.074)
const digraphs = Dict{AbstractString, Float32}(
"TH" => 15.2,
"HE" => 12.8,
"IN" => 9.4,
"ER" => 9.4,
"AN" => 8.2,
"RE" => 6.8,
"ND" => 6.3,
"AT" => 5.9,
"ON" => 5.7,
"NT" => 5.6,
"HA" => 5.6,
"ES" => 5.6,
"ST" => 5.5,
"EN" => 5.5,
"ED" => 5.3,
"TO" => 5.2,
"IT" => 5.0,
"OU" => 5.0,
"EA" => 4.7,
"HI" => 4.6,
"IS" => 4.6,
"OR" => 4.3,
"TI" => 3.4,
"AS" => 3.3,
"TE" => 2.7,
"ET" => 1.9,
"NG" => 1.8,
"OF" => 1.6,
"AL" => 0.9,
"DE" => 0.9,
"SE" => 0.8,
"LE" => 0.8,
"SA" => 0.6,
"SI" => 0.5,
"AR" => 0.4,
"VE" => 0.4,
"RA" => 0.4,
"LD" => 0.2,
"UR" => 0.2)
const trigraphs = Dict{AbstractString, Float32}(
"THE" => 18.1,
"AND" => 7.3,
"ING" => 7.2,
"ION" => 4.2,
"ENT" => 4.2,
"HER" => 3.6,
"FOR" => 3.4,
"THA" => 3.3,
"NTH" => 3.3,
"INT" => 3.2,
"TIO" => 3.1,
"ERE" => 3.1,
"TER" => 3.0,
"EST" => 2.8,
"ERS" => 2.8,
"HAT" => 2.6,
"ATI" => 2.6,
"ATE" => 2.5,
"ALL" => 2.5,
"VER" => 2.4,
"HIS" => 2.4,
"HES" => 2.4,
"ETH" => 2.4,
"OFT" => 2.2,
"STH" => 2.1,
"RES" => 2.1,
"OTH" => 2.1,
"ITH" => 2.1,
"FTH" => 2.1,
"ONT" => 2.0)
# 1}}}
function decrypt(enc::ASCIIString, key::ASCIIString)
const enclen = length(enc)
const keylen = length(key)
if keylen < enclen
key = (key^(div(enclen - keylen, keylen) + 2))[1:enclen]
end
msg = Array(Char, enclen)
for i=1:enclen
msg[i] = Char((Int(enc[i]) - Int(key[i]) + 26) % 26 + 65)
end
msg::Array{Char, 1}
end
function cryptanalyze(enc::ASCIIString; maxkeylen::Integer = 20)
const enclen = length(enc)
maxkey = ""
maxdec = ""
maxscore = 0.0
for keylen=1:maxkeylen
key = Array(Char, keylen)
idx = filter(x -> x % keylen == 0, 1:enclen) - keylen + 1
for i=1:keylen
maxsubscore = 0.0
for j='A':'Z'
subscore = 0.0
for k in decrypt(enc[idx], ascii(string(j)))
subscore += get(letters, k, 0.0)
end
if subscore > maxsubscore
maxsubscore = subscore
key[i] = j
end
end
idx += 1
end
key = join(key)
const dec = decrypt(enc, key)
score = 0.0
for i in dec
score += get(letters, i, 0.0)
end
for i=1:enclen - 2
const digraph = string(dec[i], dec[i + 1])
const trigraph = string(dec[i], dec[i + 1], dec[i + 2])
if haskey(digraphs, digraph)
score += 2 * get(digraphs, digraph, 0.0)
end
if haskey(trigraphs, trigraph)
score += 3 * get(trigraphs, trigraph, 0.0)
end
end
if score > maxscore
maxscore = score
maxkey = key
maxdec = dec
end
end
(maxkey, join(maxdec))::Tuple{ASCIIString, ASCIIString}
end
key, dec = cryptanalyze(ciphertext)
println("key: ", key, "\n\n", dec)
# post-compilation profiling run
gc()
t = @elapsed cryptanalyze(ciphertext)
println("\nelapsed time: ", t, " seconds")
- Output:
key: THECHESHIRECAT THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHY... elapsed time: 0.042894211 seconds
Kotlin
This is a reasonably faithful translation of the C entry though I've restricted the key lengths examined to 26 to automatically produce the correct key and hence decrypted text. This is because the C entry examines key lengths up to 29 and a value of 28 gives a slightly better fit even though the key produced (THECHESCIRECATTHECHESHIRECAT) and resulting text don't make as much sense and so would be rejected if one were examining the candidate keys manually.
// version 1.1.3
val encoded =
"MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH" +
"VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD" +
"ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS" +
"FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG" +
"ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ" +
"ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS" +
"JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT" +
"LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST" +
"MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH" +
"QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV" +
"RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW" +
"TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO" +
"SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR" +
"ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX" +
"BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB" +
"BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA" +
"FWAML ZZRXJ EKAHV FASMU LVVUT TGK"
val freq = doubleArrayOf(
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
0.00978, 0.02360, 0.00150, 0.01974, 0.00074
)
fun bestMatch(a: DoubleArray): Int {
val sum = a.sum()
var bestFit = 1e100
var bestRotate = 0
for (rotate in 0..25) {
var fit = 0.0
for (i in 0..25) {
val d = a[(i + rotate) % 26] / sum - freq[i]
fit += d * d / freq[i]
}
if (fit < bestFit) {
bestFit = fit
bestRotate = rotate
}
}
return bestRotate
}
fun freqEveryNth(msg: IntArray, key: CharArray): Double {
val len = msg.size
val interval = key.size
val out = DoubleArray(26)
val accu = DoubleArray(26)
for (j in 0 until interval) {
out.fill(0.0)
for (i in j until len step interval) out[msg[i]]++
val rot = bestMatch(out)
key[j] = (rot + 65).toChar()
for (i in 0..25) accu[i] += out[(i + rot) % 26]
}
val sum = accu.sum()
var ret = 0.0
for (i in 0..25) {
val d = accu[i] / sum - freq[i]
ret += d * d / freq[i]
}
return ret
}
fun decrypt(text: String, key: String): String {
val sb = StringBuilder()
var ki = 0
for (c in text) {
if (c !in 'A'..'Z') continue
val ci = (c.toInt() - key[ki].toInt() + 26) % 26
sb.append((ci + 65).toChar())
ki = (ki + 1) % key.length
}
return sb.toString()
}
fun main(args: Array<String>) {
val enc = encoded.replace(" ", "")
val txt = IntArray(enc.length) { enc[it] - 'A' }
var bestFit = 1e100
var bestKey = ""
val f = "%f %2d %s"
println(" Fit Length Key")
for (j in 1..26) {
val key = CharArray(j)
val fit = freqEveryNth(txt, key)
val sKey = key.joinToString("")
print(f.format(fit, j, sKey))
if (fit < bestFit) {
bestFit = fit
bestKey = sKey
print(" <--- best so far")
}
println()
}
println()
println("Best key : $bestKey")
println("\nDecrypted text:\n${decrypt(enc, bestKey)}")
}
- Output:
Fit Length Key 2.984348 1 E <--- best so far 2.483684 2 EC <--- best so far 2.642487 3 TEE 1.976651 4 THEC <--- best so far 2.356881 5 EEEPU 2.203129 6 TCECEC 1.051163 7 THECSAS <--- best so far 1.645763 8 TJQGAHET 2.001380 9 VEIZSEGNT 1.824476 10 ECEGAWQTDS 1.623083 11 TNLUSRXPTAJ 1.253527 12 XLECTHQGTHEC 1.399037 13 LJJTDGFNOTENR 0.152370 14 THECHESHIRECAT <--- best so far 1.533951 15 JNTOOEEXFTGQTNH 1.068182 16 TJTSAEETEXHPXHNE 1.034093 17 AZRAXUHEJLREEXIEE 1.443345 18 VNIZQPALEPTSXSEXUC 1.090977 19 FUCAITCSLVTEZDUDEHS 0.979868 20 EQXGAHWTTQECEWUGXHPI 0.789410 21 HVRCSAFTHEBDLSTAERSES 0.881380 22 TVIJTCIGKAQPELECRXPTNC 0.952456 23 KKEQXGPWTCQEELIEHXUWASV 0.715968 24 ELAIXHQTTIEDXJETTNTGAEPC 0.891258 25 OTJUUEGERDNQTUQEAGWUTIEOA 0.852784 26 IGITEGECAGAVUNLJAHASAVTETW Best key : THECHESHIRECAT Decrypted text: THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEBEWARETHEJABBERWOCKMYSONTHEJAWSTHATBITETHECLAWSTHATCATCHBEWARETHEJUBJUBBIRDANDSHUNTHEFRUMIOUSBANDERSNATCHHETOOKHISVORPALSWORDINHANDLONGTIMETHEMANXOMEFOEHESOUGHTSORESTEDHEBYTHETUMTUMTREEANDSTOODAWHILEINTHOUGHTANDASINUFFISHTHOUGHTHESTOODTHEJABBERWOCKWITHEYESOFFLAMECAMEWHIFFLINGTHROUGHTHETULGEYWOODANDBURBLEDASITCAMEONETWOONETWOANDTHROUGHANDTHROUGHTHEVORPALBLADEWENTSNICKERSNACKHELEFTITDEADANDWITHITSHEADHEWENTGALUMPHINGBACKANDHASTTHOUSLAINTHEJABBERWOCKCOMETOMYARMSMYBEAMISHBOYOFRABJOUSDAYCALLOOHCALLAYHECHORTLEDINHISJOYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEITSEEMSVERYPRETTYSHESAIDWHENSHEHADFINISHEDITBUTITSRATHERHARDTOUNDERSTAND
Nim
This is a translation of Julia algorithm with some ideas from Phix translation.
import sequtils, strutils, sugar, tables, times
const
CipherText = """MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK""".splitWhitespace.join()
FreqLetters = {'E': 12.702, 'T': 9.056, 'A': 8.167, 'O': 7.507,
'I': 6.966, 'N': 6.749, 'S': 6.327, 'H': 6.094,
'R': 5.987, 'D': 4.253, 'L': 4.025, 'C': 2.782,
'U': 2.758, 'M': 2.406, 'W': 2.361, 'F': 2.228,
'G': 2.015, 'Y': 1.974, 'P': 1.929, 'B': 1.492,
'V': 0.978, 'K': 0.772, 'J': 0.153, 'X': 0.150,
'Q': 0.095, 'Z': 0.074}.toTable
FreqDigraphs = {"TH": 15.2, "HE": 12.8, "IN": 9.4, "ER": 9.4,
"AN": 8.2, "RE": 6.8, "ND": 6.3, "AT": 5.9,
"ON": 5.7, "NT": 5.6, "HA": 5.6, "ES": 5.6,
"ST": 5.5, "EN": 5.5, "ED": 5.3, "TO": 5.2,
"IT": 5.0, "OU": 5.0, "EA": 4.7, "HI": 4.6,
"IS": 4.6, "OR": 4.3, "TI": 3.4, "AS": 3.3,
"TE": 2.7, "ET": 1.9, "NG": 1.8, "OF": 1.6,
"AL": 0.9, "DE": 0.9, "SE": 0.8, "LE": 0.8,
"SA": 0.6, "SI": 0.5, "AR": 0.4, "VE": 0.4,
"RA": 0.4, "LD": 0.2, "UR": 0.2}.toTable
FreqTrigraphs = {"THE": 18.1, "AND": 7.3, "ING": 7.2, "ION": 4.2,
"ENT": 4.2, "HER": 3.6, "FOR": 3.4, "THA": 3.3,
"NTH": 3.3, "INT": 3.2, "TIO": 3.1, "ERE": 3.1,
"TER": 3.0, "EST": 2.8, "ERS": 2.8, "HAT": 2.6,
"ATI": 2.6, "ATE": 2.5, "ALL": 2.5, "VER": 2.4,
"HIS": 2.4, "HES": 2.4, "ETH": 2.4, "OFT": 2.2,
"STH": 2.1, "RES": 2.1, "OTH": 2.1, "ITH": 2.1,
"FTH": 2.1, "ONT": 2.0}.toTable
func decrypt(enc, key: string): string =
let encLen = enc.len
let keyLen = key.len
result.setLen(encLen)
var k = 0
for i in 0..<encLen:
result[i] = chr((ord(enc[i]) - ord(key[k]) + 26) mod 26 + ord('A'))
k = (k + 1) mod keyLen
func cryptanalyze(enc: string; maxKeyLen = 20): tuple[maxKey, maxDec: string] =
let encLen = enc.len
var maxScore = 0.0
for keyLen in 1..maxKeyLen:
var key = newString(keyLen)
var idx = collect(newSeq):
for i in 1..encLen:
if i mod keyLen == 0:
i - keyLen
for i in 0..<keyLen:
var maxSubscore = 0.0
for j in 'A'..'Z':
var subscore = 0.0
let encidx = idx.mapIt(enc[it]).join()
for k in decrypt(encidx, $j):
subscore += FreqLetters[k]
if subscore > maxSubscore:
maxSubscore = subscore
key[i] = j
for item in idx.mitems: inc item
let dec = decrypt(enc, key)
var score = 0.0
for i in dec:
score += FreqLetters[i]
for i in 0..(encLen - 3):
let digraph = dec[i..(i+1)]
let trigraph = dec[i..(i+2)]
score += 2 * FreqDigraphs.getOrDefault(digraph)
score += 3 * FreqTrigraphs.getOrDefault(trigraph)
if score > maxScore:
maxScore = score
result.maxKey = key
result.maxDec = dec
let t0 = cpuTime()
let (key, dec) = CipherText.cryptanalyze()
echo "key: ", key, '\n'
echo dec, '\n'
echo "Elapsed time: ", (cpuTime() - t0).formatFloat(ffDecimal, precision = 3), " s"
- Output:
key: THECHESHIRECAT THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEBEWARETHEJABBERWOCKMYSONTHEJAWSTHATBITETHECLAWSTHATCATCHBEWARETHEJUBJUBBIRDANDSHUNTHEFRUMIOUSBANDERSNATCHHETOOKHISVORPALSWORDINHANDLONGTIMETHEMANXOMEFOEHESOUGHTSORESTEDHEBYTHETUMTUMTREEANDSTOODAWHILEINTHOUGHTANDASINUFFISHTHOUGHTHESTOODTHEJABBERWOCKWITHEYESOFFLAMECAMEWHIFFLINGTHROUGHTHETULGEYWOODANDBURBLEDASITCAMEONETWOONETWOANDTHROUGHANDTHROUGHTHEVORPALBLADEWENTSNICKERSNACKHELEFTITDEADANDWITHITSHEADHEWENTGALUMPHINGBACKANDHASTTHOUSLAINTHEJABBERWOCKCOMETOMYARMSMYBEAMISHBOYOFRABJOUSDAYCALLOOHCALLAYHECHORTLEDINHISJOYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEITSEEMSVERYPRETTYSHESAIDWHENSHEHADFINISHEDITBUTITSRATHERHARDTOUNDERSTAND Elapsed time: 0.041 s
OCaml
Original version by User:Vanyamil.
Uses the Vigenere decrypt function from the Vigenere task solution (not included in the code below).
(* Task : Vigenere cipher/Cryptanalysis *)
(*
Given some text you suspect has been encrypted
with a Vigenère cipher, extract the key and plaintext.
Uses correlation factors similar to other solutions.
(originally tried Friedman test, didn't produce good result)
Coded in a way that allows non-english (by passing frequencies).
*)
(*** Helpers ***)
(* Implementation of Float.round to avoid v4.08 *)
let round (x : float) : float =
let rem = mod_float x 1. in
if rem >= 0.5
then ceil x
else floor x
(* A function that updates array element at a position *)
let array_update (arr : 'a array) (idx : int) (update : 'a -> 'a) : unit =
let curr = Array.get arr idx in
Array.set arr idx (update curr)
(*** Actual task at hand ***)
(* the n'th element of array is how often the n'th letter was found *)
let observe_coincidences ?(step : int = 1) ?(offset : int = 0) (text : string) : int array =
let arr = Array.make 26 0 in
let a_code = Char.code 'A' in
String.iteri (fun idx c -> if idx mod step = offset then array_update arr (Char.code c - a_code) succ) text;
arr
(* Obtain correlation factor for the observed coincidences *)
let correlation_factor ?(sort : bool = true) (coincidences : int array) (freqs : float list) : float =
let clist = Array.to_list coincidences in
let clist = (if sort then List.sort compare clist else clist) in
List.fold_left2 (fun acc c f -> acc +. (float_of_int c *. f)) 0. clist freqs
(* Translation of the test used in other Rosetta Code solutions *)
let shifted_coincidences_test (freqs : float list) (text : string) : int =
let sorted_freqs = List.sort compare freqs in
let bestCorr = -100. in
let max_keylen = String.length text / 20 in
let rec helper idx (cur_len, cur_corr) (best_len, best_corr) =
if cur_len = max_keylen then (* Finished testing everything *)
best_len
else if idx = cur_len then (* Finished testing this key length *)
let (best_len, best_corr) = if cur_corr > best_corr then (cur_len, cur_corr) else (best_len, best_corr) in
helper 0 (cur_len + 1, ~-.0.5 *. float_of_int (cur_len + 1)) (best_len, best_corr)
else
let coincidences = observe_coincidences ~step:cur_len ~offset:idx text in
let factor = correlation_factor coincidences sorted_freqs in
helper (succ idx) (cur_len, cur_corr +. factor) (best_len, best_corr)
in
helper 0 (2, ~-.1.) (1, ~-.100.)
(* Returns the most likely shift value for this set *)
let break_caesar ?(step : int = 1) ?(offset : int = 0) (text : string) (freqs : float list) : int =
let c_arr = observe_coincidences ~step ~offset text in
let rec helper l curShift (maxShift, maxCorr) =
if curShift = 26
then maxShift
else
let corr = correlation_factor ~sort:false c_arr l in
let l' = List.tl l @ [List.hd l] in
if corr > maxCorr
then helper l' (curShift + 1) (curShift, corr)
else helper l' (curShift + 1) (maxShift, maxCorr)
in
helper freqs 0 (-1, -100.)
let break (keylen : int) (text : string) (freqs : float list) : key =
let rec getCaesars idx acc =
if idx >= keylen then acc else
let shift = break_caesar ~step:keylen ~offset:idx text freqs in
let new_code = if shift = 0 then Char.code 'A' else Char.code 'Z' + 1 - shift in
getCaesars (succ idx) (acc ^ Char.(new_code |> chr |> escaped))
in
getCaesars 0 ""
let cryptanalyze (freqs : float list) (text : string) : key * string =
let text = ascii_upper_letters_only text in
let keylen = shifted_coincidences_test freqs text in
let key = break keylen text freqs in
let pt = decrypt key text in
(key, pt)
(*** Output ***)
let _ =
let long_text = "\
MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH \
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD \
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS \
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG \
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ \
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS \
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT \
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST \
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH \
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV \
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW \
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO \
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR \
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX \
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB \
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA \
FWAML ZZRXJ EKAHV FASMU LVVUT TGK"
in
let english_freqs = [
0.08167; 0.01492; 0.02782; 0.04253; 0.12702; 0.02228; 0.02015;
0.06094; 0.06966; 0.00153; 0.00772; 0.04025; 0.02406; 0.06749;
0.07507; 0.01929; 0.00095; 0.05987; 0.06327; 0.09056; 0.02758;
0.00978; 0.02360; 0.00150; 0.01974; 0.00074
]
in
let (key, pt) = cryptanalyze english_freqs long_text in
Printf.printf "Key: %s\n\nText: %s" key pt
;;
- Output:
Key: THECHESHIRECAT Text: THISWASTHEPOEMTHATALICEREADJABBERWOC...
Perl
use strict;
use warnings;
use feature 'say';
# from Wikipedia
my %English_letter_freq = (
E => 12.70, L => 4.03, Y => 1.97, P => 1.93, T => 9.06, A => 8.17, O => 7.51, I => 6.97, N => 6.75,
S => 6.33, H => 6.09, R => 5.99, D => 4.25, C => 2.78, U => 2.76, M => 2.41, W => 2.36, F => 2.23,
G => 2.02, B => 1.29, V => 0.98, K => 0.77, J => 0.15, X => 0.15, Q => 0.10, Z => 0.07
);
my @alphabet = sort keys %English_letter_freq;
my $max_key_lengths = 5; # number of keylengths to try
sub myguess {
my ($text) = (@_);
my ($seqtext, @spacing, @factors, @sortedfactors, $pos, %freq, %Keys);
# Kasiski examination
$seqtext = $text;
while ($seqtext =~ /(...).*\1/) {
$seqtext = substr($seqtext, 1+index($seqtext, $1));
push @spacing, 1 + index($seqtext, $1);
}
for my $j (@spacing) {
push @factors, grep { $j % $_ == 0 } 2..$j;
}
$freq{$_}++ for @factors;
@sortedfactors = grep { $_ >= 4 } sort { $freq{$b} <=> $freq{$a} } keys %freq; # discard very short keys
for my $keylen ( @sortedfactors[0..$max_key_lengths-1] ) {
my $keyguess = '';
for (my $i = 0; $i < $keylen; $i++) {
my($mykey, %chi_values, $bestguess);
for (my $j = 0; $j < length($text); $j += $keylen) {
$mykey .= substr($text, ($j+$i) % length($text), 1);
}
for my $subkey (@alphabet) {
my $decrypted = mycrypt($mykey, $subkey);
my $length = length($decrypted);
for my $char (@alphabet) {
my $expected = $English_letter_freq{$char} * $length / 100;
my $observed;
++$observed while $decrypted =~ /$char/g;
$chi_values{$subkey} += ($observed - $expected)**2 / $expected if $observed;
}
}
$Keys{$keylen}{score} = $chi_values{'A'};
for my $sk (sort keys %chi_values) {
if ($chi_values{$sk} <= $Keys{$keylen}{score}) {
$bestguess = $sk;
$Keys{$keylen}{score} = $chi_values{$sk};
}
}
$keyguess .= $bestguess;
}
$Keys{$keylen}{key} = $keyguess;
}
map { $Keys{$_}{key} } sort { $Keys{$a}{score} <=> $Keys{$b}{score}} keys %Keys;
}
sub mycrypt {
my ($text, $key) = @_;
my ($new_text, %values_numbers);
my $keylen = length($key);
@values_numbers{@alphabet} = 0..25;
my %values_letters = reverse %values_numbers;
for (my $i = 0; $i < length($text); $i++) {
my $val = -1 * $values_numbers{substr( $key, $i%$keylen, 1)} # negative shift for decode
+ $values_numbers{substr($text, $i, 1)};
$new_text .= $values_letters{ $val % 26 };
}
return $new_text;
}
my $cipher_text = <<~'EOD';
MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK
EOD
my $text = uc($cipher_text) =~ s/[^@{[join '', @alphabet]}]//gr;
for my $key ( myguess($text) ) {
say "Key $key\n" .
"Key length " . length($key) . "\n" .
"Plaintext " . substr(mycrypt($text, $key), 0, 80) . "...\n";
}
- Output:
Key THECHESHIRECAT Key length 14 Plaintext THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMB... Key THECHESCIRECATTHECHESHIRECAT Key length 28 Plaintext THISWASYHEPOEMTHATALICEREADJABBERWOHKYTWASBRILLIGANDTHESLITHYTOAESDIDGYREANDGIMB... Key TJGGAHET Key length 8 Plaintext TFGODXGHWMNKEYIVLMBJACIPPTXWTBBNFRADSITFHCOSMGOTFYPOXCASLGRDFQCJTABEDSNFPTOBYIQZ... Key THECSAS Key length 7 Plaintext THISLESHIRRYENTHATPPIQFEGKDKABBEGAOQLLVGATBRILAMGOOQVRETLITHNXOJFFFSDHYREACHGWNO... Key THEC Key length 4 Plaintext THISKXGYWOPOLYIMLODNHCIGPVZAABBEFTCHZITWHEQWTGOKFARSECAJLITHMQCATCDIKSNWPVQFFIQQ...
Phix
-- -- demo\rosetta\Cryptanalysis.exw -- with javascript_semantics atom t0 = time() constant ciphertext = substitute_all(""" MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA FWAML ZZRXJ EKAHV FASMU LVVUT TGK""",{" ","\n"},{"",""}) constant letters = new_dict( {{'E',12.702}, {'T',9.056}, {'A',8.167}, {'O',7.507}, {'I',6.966}, {'N',6.749}, {'S',6.327}, {'H',6.094}, {'R',5.987}, {'D',4.253}, {'L',4.025}, {'C',2.782}, {'U',2.758}, {'M',2.406}, {'W',2.361}, {'F',2.228}, {'G',2.015}, {'Y',1.974}, {'P',1.929}, {'B',1.492}, {'V',0.978}, {'K',0.772}, {'J',0.153}, {'X',0.150}, {'Q',0.095}, {'Z',0.074}}) constant digraphs = new_dict( {{"TH",15.2}, {"HE",12.8}, {"IN",9.4}, {"ER",9.4}, {"AN",8.2}, {"RE",6.8}, {"ND",6.3}, {"AT",5.9}, {"ON",5.7}, {"NT",5.6}, {"HA",5.6}, {"ES",5.6}, {"ST",5.5}, {"EN",5.5}, {"ED",5.3}, {"TO",5.2}, {"IT",5.0}, {"OU",5.0}, {"EA",4.7}, {"HI",4.6}, {"IS",4.6}, {"OR",4.3}, {"TI",3.4}, {"AS",3.3}, {"TE",2.7}, {"ET",1.9}, {"NG",1.8}, {"OF",1.6}, {"AL",0.9}, {"DE",0.9}, {"SE",0.8}, {"LE",0.8}, {"SA",0.6}, {"SI",0.5}, {"AR",0.4}, {"VE",0.4}, {"RA",0.4}, {"LD",0.2}, {"UR",0.2}}) constant trigraphs = new_dict( {{"THE",18.1}, {"AND",7.3}, {"ING",7.2}, {"ION",4.2}, {"ENT",4.2}, {"HER",3.6}, {"FOR",3.4}, {"THA",3.3}, {"NTH",3.3}, {"INT",3.2}, {"TIO",3.1}, {"ERE",3.1}, {"TER",3.0}, {"EST",2.8}, {"ERS",2.8}, {"HAT",2.6}, {"ATI",2.6}, {"ATE",2.5}, {"ALL",2.5}, {"VER",2.4}, {"HIS",2.4}, {"HES",2.4}, {"ETH",2.4}, {"OFT",2.2}, {"STH",2.1}, {"RES",2.1}, {"OTH",2.1}, {"ITH",2.1}, {"FTH",2.1}, {"ONT",2.0}}) function decrypt(string enc, string key) integer keylen = length(key), k = 1 string msg = repeat(' ', length(enc)) for i=1 to length(enc) do msg[i] = mod(enc[i]-key[k]+26,26)+'A' k = mod(k,keylen)+1 end for return msg end function function cryptanalyze(string enc, integer maxkeylen=20) integer enclen = length(enc) string maxkey = "", maxdec = "", k1 = " " atom maxscore = 0.0 for keylen=1 to maxkeylen do string key = repeat(' ',keylen) sequence idx = {} for i=1 to enclen do if mod(i,keylen)=0 then idx &= i-keylen+1 end if end for for i=1 to keylen do atom maxsubscore = 0.0 for j='A' to 'Z' do atom subscore = 0.0 k1[1] = j string encidx = "" for ii=1 to length(idx) do encidx &= enc[idx[ii]] end for string dec = decrypt(encidx,k1) for di=1 to length(dec) do subscore += getd(dec[di],letters) end for if subscore > maxsubscore then maxsubscore = subscore key[i] = j end if end for idx = sq_add(idx,1) end for string dec = decrypt(enc, key) atom score = 0.0 for i=1 to length(dec) do score += getd(dec[i],letters) end for for i=1 to enclen - 2 do string digraph = dec[i..i+1] string trigraph = dec[i..i + 2] score += 2 * getd(digraph,digraphs) score += 3 * getd(trigraph,trigraphs) end for if score > maxscore then maxscore = score maxkey = key maxdec = dec end if end for return {maxkey,maxdec} end function function fold(string s, integer w) for i=w to length(s) by w do s[i..i-1] = "\n" end for return s end function string {key, dec} = cryptanalyze(ciphertext) printf(1,"key: %s\n\n%s\n\n", {key, fold(dec,80)}) printf(1,"elapsed time: %3.2f seconds",{time()-t0}) {} = wait_key()
- Output:
key: THECHESHIRECAT THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIM BLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEBEWARETHEJABBERWOCKM YSONTHEJAWSTHATBITETHECLAWSTHATCATCHBEWARETHEJUBJUBBIRDANDSHUNTHEFRUMIOUSBANDER SNATCHHETOOKHISVORPALSWORDINHANDLONGTIMETHEMANXOMEFOEHESOUGHTSORESTEDHEBYTHETUM TUMTREEANDSTOODAWHILEINTHOUGHTANDASINUFFISHTHOUGHTHESTOODTHEJABBERWOCKWITHEYESO FFLAMECAMEWHIFFLINGTHROUGHTHETULGEYWOODANDBURBLEDASITCAMEONETWOONETWOANDTHROUGH ANDTHROUGHTHEVORPALBLADEWENTSNICKERSNACKHELEFTITDEADANDWITHITSHEADHEWENTGALUMPH INGBACKANDHASTTHOUSLAINTHEJABBERWOCKCOMETOMYARMSMYBEAMISHBOYOFRABJOUSDAYCALLOOH CALLAYHECHORTLEDINHISJOYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEAL LMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEITSEEMSVERYPRETTYSHESAIDWHENSHEHAD FINISHEDITBUTITSRATHERHARDTOUNDERSTAND elapsed time: 0.42 seconds
Python
from string import uppercase
from operator import itemgetter
def vigenere_decrypt(target_freqs, input):
nchars = len(uppercase)
ordA = ord('A')
sorted_targets = sorted(target_freqs)
def frequency(input):
result = [[c, 0.0] for c in uppercase]
for c in input:
result[c - ordA][1] += 1
return result
def correlation(input):
result = 0.0
freq = frequency(input)
freq.sort(key=itemgetter(1))
for i, f in enumerate(freq):
result += f[1] * sorted_targets[i]
return result
cleaned = [ord(c) for c in input.upper() if c.isupper()]
best_len = 0
best_corr = -100.0
# Assume that if there are less than 20 characters
# per column, the key's too long to guess
for i in xrange(2, len(cleaned) // 20):
pieces = [[] for _ in xrange(i)]
for j, c in enumerate(cleaned):
pieces[j % i].append(c)
# The correlation seems to increase for smaller
# pieces/longer keys, so weigh against them a little
corr = -0.5 * i + sum(correlation(p) for p in pieces)
if corr > best_corr:
best_len = i
best_corr = corr
if best_len == 0:
return ("Text is too short to analyze", "")
pieces = [[] for _ in xrange(best_len)]
for i, c in enumerate(cleaned):
pieces[i % best_len].append(c)
freqs = [frequency(p) for p in pieces]
key = ""
for fr in freqs:
fr.sort(key=itemgetter(1), reverse=True)
m = 0
max_corr = 0.0
for j in xrange(nchars):
corr = 0.0
c = ordA + j
for frc in fr:
d = (ord(frc[0]) - c + nchars) % nchars
corr += frc[1] * target_freqs[d]
if corr > max_corr:
m = j
max_corr = corr
key += chr(m + ordA)
r = (chr((c - ord(key[i % best_len]) + nchars) % nchars + ordA)
for i, c in enumerate(cleaned))
return (key, "".join(r))
def main():
encoded = """
MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK"""
english_frequences = [
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
0.00978, 0.02360, 0.00150, 0.01974, 0.00074]
(key, decoded) = vigenere_decrypt(english_frequences, encoded)
print "Key:", key
print "\nText:", decoded
main()
Racket
Simple method
This is a simple method that just tries to find a key of any length that minimizes the difference from the expected English character distributions.
#lang at-exp racket
(define max-keylen 30)
(define text
@~a{MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK})
(define first-char (char->integer #\A))
(define chars# (- (char->integer #\Z) first-char -1))
(define freqs ; english letter frequencies from wikipedia
((compose1 list->vector (curry map (curryr / 100000.0)))
'(8167 1492 2782 4253 12702 2228 2015 6094 6966 153 772 4025 2406
6749 7507 1929 95 5987 6327 9056 2758 978 2360 150 1974 74)))
(define text* (for/vector ([c (regexp-replace* #px"\\s+" text "")])
(- (char->integer c) first-char)))
(define N (vector-length text*))
(define (col-guesses len)
(for/list ([ofs len])
(define text (for/list ([i (in-range ofs N len)]) (vector-ref text* i)))
(define cN (length text))
(define cfreqs (make-vector chars# 0))
(for ([c (in-list text)])
(vector-set! cfreqs c (add1 (vector-ref cfreqs c))))
(for ([i chars#]) (vector-set! cfreqs i (/ (vector-ref cfreqs i) cN)))
(argmin car
(for/list ([d chars#])
(cons (for/sum ([i chars#])
(expt (- (vector-ref freqs i)
(vector-ref cfreqs (modulo (+ i d) chars#)))
2))
d)))))
(define best-key
(cdr (argmin car
(for/list ([len (range 1 (add1 max-keylen))])
(define guesses (col-guesses len))
(cons (/ (apply + (map car guesses)) len) (map cdr guesses))))))
(printf "Best key found: ")
(for ([c best-key]) (display (integer->char (+ c first-char))))
(newline)
(printf "Decoded text:\n")
(define decode-num
(let ([cur '()])
(λ(n) (when (null? cur) (set! cur best-key))
(begin0 (modulo (- n (car cur)) chars#) (set! cur (cdr cur))))))
(for ([c text])
(define n (- (char->integer c) first-char))
(if (not (< -1 n chars#)) (display c)
(display (integer->char (+ first-char (decode-num n))))))
(newline)
Output:
Best key found: THECHESHIRECAT Decoded text: THISW ASTHE POEMT HATAL ICERE ADJAB BERWO CKYTW ASBRI LLIGA ...
An attempted more complete implementation
This is an attempt at following the Wikipedia description. However, it performs just as well as the simple version. Most likely because I know almost nothing about cryptography...
#lang at-exp racket
(define max-keylen 30)
(define text
@~a{MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK})
(define first-char (char->integer #\A))
(define chars# (- (char->integer #\Z) first-char -1))
(define freqs ; english letter frequencies from wikipedia
((compose1 list->vector (curry map (curryr / 100000.0)))
'(8167 1492 2782 4253 12702 2228 2015 6094 6966 153 772 4025 2406
6749 7507 1929 95 5987 6327 9056 2758 978 2360 150 1974 74)))
(define (n*n-1 n) (* n (sub1 n)))
(define text* (for/vector ([c (regexp-replace* #px"\\s+" text "")])
(- (char->integer c) first-char)))
(define N (vector-length text*))
(define (get-col-length+freqs width offset)
(define text (for/list ([i (in-range offset N width)]) (vector-ref text* i)))
(define cN (length text))
(define freqs (make-vector chars# 0))
(for ([c (in-list text)]) (vector-set! freqs c (add1 (vector-ref freqs c))))
(values cN freqs))
(define expected-IC (* chars# (for*/sum ([x freqs]) (* x x))))
;; maps key lengths to average index of coincidence
(define keylen->ICs
(for/vector ([len (in-range 1 (add1 (* max-keylen 2)))])
(for/sum ([ofs len])
(define-values [cN cfreqs] (get-col-length+freqs len ofs))
(/ (for/sum ([i chars#]) (n*n-1 (vector-ref cfreqs i)))
(/ (n*n-1 cN) chars#) len 1.0))))
;; given a key length find the key that minimizes errors from alphabet freqs,
;; return (cons average-error key)
(define (guess-key len)
(define guesses
(for/list ([ofs len])
(define-values [cN cfreqs] (get-col-length+freqs len ofs))
(for ([i chars#]) (vector-set! cfreqs i (/ (vector-ref cfreqs i) cN)))
(argmin car
(for/list ([d chars#])
(cons (for/sum ([i chars#])
(expt (- (vector-ref freqs i)
(vector-ref cfreqs (modulo (+ i d) chars#)))
2))
d)))))
(cons (/ (apply + (map car guesses)) len) (map cdr guesses)))
;; look for a key length that minimizes error from expected-IC, with some
;; stupid consideration of multiples of the length (which should also have low
;; errors), for each one guess a key, then find the one that minimizes both (in
;; a way that looks like it works, but undoubtedly is wrong in all kinds of
;; ways) and return the winner key
(define best-key
((compose1 cdr (curry argmin car))
(for/list ([i (* max-keylen 2)])
;; get the error from the expected-IC for the length and its multiples,
;; with decreasing weights for the multiples
(define with-multiples
(for/list ([j (in-range i (* max-keylen 2) (add1 i))] [div N])
(cons (/ (abs (- (vector-ref keylen->ICs j) expected-IC)) expected-IC)
(/ (add1 div)))))
(define total (/ (for/sum ([x with-multiples]) (* (car x) (cdr x)))
(for/sum ([x with-multiples]) (cdr x))))
(define guess (guess-key (add1 i)))
(define guess*total (* total (car guess) (car guess)))
;; (printf "~a~a: ~a ~s\n" (if (< i 9) " " "") (add1 i)
;; (list total (car guess) guess*total) (cdr guess))
(cons guess*total (cdr guess)))))
(printf "Best key found: ")
(for ([c best-key]) (display (integer->char (+ c first-char))))
(newline)
(printf "Decoded text:\n")
(define decode-num
(let ([cur '()])
(λ(n) (when (null? cur) (set! cur best-key))
(begin0 (modulo (- n (car cur)) chars#) (set! cur (cdr cur))))))
(for ([c text])
(define n (- (char->integer c) first-char))
(if (not (< -1 n chars#)) (display c)
(display (integer->char (+ first-char (decode-num n))))))
(newline)
Raku
(formerly Perl 6)
# from Wikipedia
constant %English-letter-freq = (
E => 12.70, L => 4.03, Y => 1.97, P => 1.93, T => 9.06, A => 8.17, O => 7.51, I => 6.97, N => 6.75,
S => 6.33, H => 6.09, R => 5.99, D => 4.25, C => 2.78, U => 2.76, M => 2.41, W => 2.36, F => 2.23,
G => 2.02, B => 1.29, V => 0.98, K => 0.77, J => 0.15, X => 0.15, Q => 0.10, Z => 0.07
);
constant @alphabet = %English-letter-freq.keys.sort;
constant max_key_lengths = 5; # number of keylengths to try
sub myguess ($text) {
my ($seqtext, @spacing, @factors, $pos, %freq, %Keys);
# Kasiski examination
$seqtext = $text;
while ($seqtext ~~ /$<sequence>=[...].*$<sequence>/) {
$seqtext = substr($seqtext, 1+index($seqtext, $<sequence>));
push @spacing, 1 + index($seqtext, $<sequence>);
}
for @spacing -> $j {
%freq{$_}++ for grep { $j %% $_ }, 2..$j;
}
# discard very short keys, and test only the most likely remaining key lengths
(%freq.keys.grep(* > 3).sort({%freq{$_}}).tail(max_key_lengths)).race(:1batch).map: -> $keylen {
my $key-guess = '';
loop (my $i = 0; $i < $keylen; $i++) {
my ($mykey, %chi-square, $best-guess);
loop (my $j = 0; $j < $text.chars; $j += $keylen) {
$mykey ~= substr($text, ($j+$i) % $text.chars, 1);
}
for @alphabet -> $subkey {
my $decrypted = mycrypt($mykey, $subkey);
my $length = $decrypted.chars;
for @alphabet -> $char {
my $expected = %English-letter-freq{$char} * $length / 100;
my $observed = $decrypted.comb.grep(* eq $char).elems;
%chi-square{$subkey} += ($observed - $expected)² / $expected if $observed;
}
}
%Keys{$keylen}{'score'} = %chi-square{@alphabet[0]};
for %chi-square.keys.sort -> $sk {
if (%chi-square{$sk} <= %Keys{$keylen}{'score'}) {
$best-guess = $sk;
%Keys{$keylen}{'score'} = %chi-square{$sk};
}
}
$key-guess ~= $best-guess;
}
%Keys{$keylen}{'key'} = $key-guess;
}
%Keys.keys.sort({ %Keys{$_}{'score'} }).map:{ %Keys{$_}{'key'} };
}
sub mycrypt ($text, $key) {
constant %values-numbers = @alphabet Z=> ^@alphabet;
constant %values-letters = %values-numbers.invert;
my ($new-text);
my $keylen = $key.chars;
loop (my $i = 0; $i < $text.chars; $i++) {
my $val = -1 * %values-numbers{substr( $key, $i%$keylen, 1)} # negative shift for decode
+ %values-numbers{substr($text, $i, 1)};
$new-text ~= %values-letters{ $val % @alphabet };
}
return $new-text;
}
my $cipher-text = .uc.trans(@alphabet => '', :c) given q:to/EOD/;
MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK
EOD
for myguess($cipher-text) -> $key {
say "Key $key\n" ~
"Key length {$key.chars}\n" ~
"Plaintext {substr(mycrypt($cipher-text, $key), 0, 80)}...\n";
}
- Output:
Key THECHESHIRECAT Key length 14 Plaintext THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMB... Key THECHESCIRECATTHECHESHIRECAT Key length 28 Plaintext THISWASYHEPOEMTHATALICEREADJABBERWOHKYTWASBRILLIGANDTHESLITHYTOAESDIDGYREANDGIMB... Key TJGGAHET Key length 8 Plaintext TFGODXGHWMNKEYIVLMBJACIPPTXWTBBNFRADSITFHCOSMGOTFYPOXCASLGRDFQCJTABEDSNFPTOBYIQZ... Key THECSAS Key length 7 Plaintext THISLESHIRRYENTHATPPIQFEGKDKABBEGAOQLLVGATBRILAMGOOQVRETLITHNXOJFFFSDHYREACHGWNO... Key THEC Key length 4 Plaintext THISKXGYWOPOLYIMLODNHCIGPVZAABBEFTCHZITWHEQWTGOKFARSECAJLITHMQCATCDIKSNWPVQFFIQQ...
Rust
Note that the character to/from byte (u8) conversions work here only because the key and cryptogram are composed of ASCII characters only. Indeed, Rust's char type is a Unicode scalar value, how they are represented is well summarized in the Rust book's subchapter on strings.
use std::iter::FromIterator;
const CRYPTOGRAM: &str = "MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK";
const FREQUENCIES: [f32; 26] = [
0.08167, 0.01492, 0.02202, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153,
0.01292, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09356,
0.02758, 0.00978, 0.02560, 0.00150, 0.01994, 0.00077,
];
fn best_match(a: &[f32]) -> u8 {
let sum: f32 = a.iter().sum();
let mut best_fit = std::f32::MAX;
let mut best_rotate = 0;
for rotate in 0..=25 {
let mut fit = 0.;
for i in 0..=25 {
let char_freq = FREQUENCIES[i];
let idx = (i + rotate as usize) % 26 as usize;
let d = a[idx] / sum - char_freq;
fit += d * d / char_freq;
}
if fit < best_fit {
best_fit = fit;
best_rotate = rotate;
}
}
best_rotate
}
fn freq_every_nth(msg: &[u8], key: &mut [char]) -> f32 {
let len = msg.len();
let interval = key.len();
let mut accu = [0.; 26];
for j in 0..interval {
let mut out = [0.; 26];
for i in (j..len).step_by(interval) {
let idx = msg[i] as usize;
out[idx] += 1.;
}
let rot = best_match(&out);
key[j] = char::from(rot + b'A');
for i in 0..=25 {
let idx: usize = (i + rot as usize) % 26;
accu[i] += out[idx];
}
}
let sum: f32 = accu.iter().sum();
let mut ret = 0.;
for i in 0..=25 {
let char_freq = FREQUENCIES[i];
let d = accu[i] / sum - char_freq;
ret += d * d / char_freq;
}
ret
}
fn decrypt(text: &str, key: &str) -> String {
let key_chars_cycle = key.as_bytes().iter().map(|b| *b as i32).cycle();
let is_ascii_uppercase = |c: &u8| (b'A'..=b'Z').contains(c);
text.as_bytes()
.iter()
.filter(|c| is_ascii_uppercase(c))
.map(|b| *b as i32)
.zip(key_chars_cycle)
.fold(String::new(), |mut acc, (c, key_char)| {
let ci: u8 = ((c - key_char + 26) % 26) as u8;
acc.push(char::from(b'A' + ci));
acc
})
}
fn main() {
let enc = CRYPTOGRAM
.split_ascii_whitespace()
.collect::<Vec<_>>()
.join("");
let cryptogram: Vec<u8> = enc.as_bytes().iter().map(|b| u8::from(b - b'A')).collect();
let mut best_fit = std::f32::MAX;
let mut best_key = String::new();
for j in 1..=26 {
let mut key = vec!['\0'; j];
let fit = freq_every_nth(&cryptogram, &mut key);
let s_key = String::from_iter(key); // 'from_iter' is imported from std::iter::FromIterator;
if fit < best_fit {
best_fit = fit;
best_key = s_key;
}
}
println!("best key: {}", &best_key);
println!("\nDecrypted text:\n{}", decrypt(&enc, &best_key));
}
- Output:
best key: THECHESHIRECAT Decrypted text: THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEBEWARETHEJABBERWOCKMYSONTHEJAWSTHATBITETHECLAWSTHATCATCHBEWARETHEJUBJUBBIRDANDSHUNTHEFRUMIOUSBANDERSNATCHHETOOKHISVORPALSWORDINHANDLONGTIMETHEMANXOMEFOEHESOUGHTSORESTEDHEBYTHETUMTUMTREEANDSTOODAWHILEINTHOUGHTANDASINUFFISHTHOUGHTHESTOODTHEJABBERWOCKWITHEYESOFFLAMECAMEWHIFFLINGTHROUGHTHETULGEYWOODANDBURBLEDASITCAMEONETWOONETWOANDTHROUGHANDTHROUGHTHEVORPALBLADEWENTSNICKERSNACKHELEFTITDEADANDWITHITSHEADHEWENTGALUMPHINGBACKANDHASTTHOUSLAINTHEJABBERWOCKCOMETOMYARMSMYBEAMISHBOYOFRABJOUSDAYCALLOOHCALLAYHECHORTLEDINHISJOYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEITSEEMSVERYPRETTYSHESAIDWHENSHEHADFINISHEDITBUTITSRATHERHARDTOUNDERSTAND
Tcl
package require Tcl 8.6
oo::class create VigenereAnalyzer {
variable letterFrequencies sortedTargets
constructor {{frequencies {
0.08167 0.01492 0.02782 0.04253 0.12702 0.02228 0.02015
0.06094 0.06966 0.00153 0.00772 0.04025 0.02406 0.06749
0.07507 0.01929 0.00095 0.05987 0.06327 0.09056 0.02758
0.00978 0.02360 0.00150 0.01974 0.00074
}}} {
set letterFrequencies $frequencies
set sortedTargets [lsort -real $frequencies]
if {[llength $frequencies] != 26} {
error "wrong length of frequency table"
}
}
### Utility methods
# Find the value of $idxvar in the range [$from..$to) that maximizes the value
# in $scorevar (which is computed by evaluating $body)
method Best {idxvar from to scorevar body} {
upvar 1 $idxvar i $scorevar s
set bestI $from
for {set i $from} {$i < $to} {incr i} {
uplevel 1 $body
if {![info exist bestS] || $bestS < $s} {
set bestI $i
set bestS $s
}
}
return $bestI
}
# Simple list map
method Map {var list body} {
upvar 1 $var v
set result {}
foreach v $list {lappend result [uplevel 1 $body]}
return $result
}
# Simple partition of $list into $groups groups; thus, the partition of
# {a b c d e f} into 3 produces {a d} {b e} {c f}
method Partition {list groups} {
set i 0
foreach val $list {
dict lappend result $i $val
if {[incr i] >= $groups} {
set i 0
}
}
return [dict values $result]
}
### Helper methods
# Get the actual counts of different types of characters in the given list
method Frequency cleaned {
for {set i 0} {$i < 26} {incr i} {
dict set tbl $i 0
}
foreach ch $cleaned {
dict incr tbl [expr {[scan $ch %c] - 65}]
}
return $tbl
}
# Get the correlation factor of the characters in a given list with the
# class-specified language frequency corpus
method Correlation cleaned {
set result 0.0
set freq [lsort -integer [dict values [my Frequency $cleaned]]]
foreach f $freq s $sortedTargets {
set result [expr {$result + $f * $s}]
}
return $result
}
# Compute an estimate for the key length
method GetKeyLength {cleaned {required 20}} {
# Assume that we need at least 20 characters per column to guess
set bestLength [my Best i 2 [expr {[llength $cleaned] / $required}] corr {
set corr [expr {-0.5 * $i}]
foreach chars [my Partition $cleaned $i] {
set corr [expr {$corr + [my Correlation $chars]}]
}
}]
if {$bestLength == 0} {
error "text is too short to analyze"
}
return $bestLength
}
# Compute the key from the given frequency tables and the class-specified
# language frequency corpus
method GetKeyFromFreqs freqs {
foreach f $freqs {
set m [my Best i 0 26 corr {
set corr 0.0
foreach {ch count} $f {
set d [expr {($ch - $i) % 26}]
set corr [expr {$corr + $count*[lindex $letterFrequencies $d]}]
}
}]
append key [format %c [expr {65 + $m}]]
}
return $key
}
##### The main analyzer method #####
method analyze input {
# Turn the input into a clean letter sequence
set cleaned [regexp -all -inline {[A-Z]} [string toupper $input]]
# Get the (estimated) key length
set bestLength [my GetKeyLength $cleaned]
# Get the frequency mapping for the partitioned input text
set freqs [my Map p [my Partition $cleaned $bestLength] {my Frequency $p}]
# Get the key itself
return [my GetKeyFromFreqs $freqs]
}
}
Demonstration (that assumes that the Tcl solution to Vigenère cipher task is present):
set encoded "
MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK
"
VigenereAnalyzer create englishVigenereAnalyzer
set key [englishVigenereAnalyzer analyze $encoded]
Vigenere create decoder $key
set decoded [decoder decrypt $encoded]
puts "Key: $key"
puts "Text: $decoded"
Vedit macro language
This implementation is fully autonomous as long as the text is long enough and there are not too many non-English words in the original text.
The text to be analysed must be in current edit buffer. A new buffer is opened to display the results.
To automatically find the best key, a dictionary is used to find English words within the decrypted text. I have used unixdict.txt, but if you do not have it available, you can use the Scribe English dictionary that comes with Vedit. However, that is unnecessarily big. A smaller dictionary is faster and may actually give better results. It might be good idea to use dictionary that only contains the most common English words.
This implementation finds the best and 2nd best Caesar key for each key position. It then checks key combinations where max one char is taken from 2nd best Caesar key. If this does not solve some encrypted text, you could increase the number of key combinations to be checked.
// (1) Copy text into tmp buffer and remove non-alpha chars.
Chdir(PATH_ONLY)
BOF
Reg_Copy(10, ALL) // copy text to new buffer
Buf_Switch(Buf_Free)
Reg_Ins(10)
BOF
Replace ("|!|A", "", BEGIN+ALL+NOERR) // remove non-alpha chars
Reg_Copy_Block(10,0,EOB_pos) // @10 = text to be analysed
#20 = Buf_Num // buffer for text being analyzed
#21 = Buf_Free // buffer for English frequency list (A-Z)
Buf_Switch(#21)
Ins_Text("8167 1492 2782 4253 12702 2228 2015 6094 6966 153 772 4025 2406 6749 7507 1929 95 5987 6327 9056 2758 978 2360 150 1974 74")
File_Open("unixdict.txt") // or use "|(MACRO_DIR)\scribe\english.vdf"
#23 = Buf_Num // buffer for dictionary
#24 = Buf_Free // buffer for key canditates
Buf_Switch(#24)
for (#1=0; #1<5; #1++) { // Fill table for 5 keys of 50 chars
Ins_Char('.', COUNT, 50)
Ins_Newline
}
#22 = Buf_Free // buffer for results
#25 = Reg_Size(10) // number of letters in the text
#26 = 26 // number of characters in the alphabet
#61 = min(#25/10, 50) // max key length to try
// (2) Check Index of coincidence (or Kp) for each key length
Buf_Switch(#22) // buffer for results
Ins_Text("KeyLen Kp dist ") Ins_Newline
Ins_Text("-----------------") Ins_Newline
#13 = Cur_Pos
#7 = 0 // no Caesar encryption
for (#5=1; #5<=#61; #5++) {
Buf_Switch(#20) // text being analyzed
BOF
#54 = 0; // sum of Kp's
for (#6=0; #6<#5; #6++) { // for each slide
Goto_Pos(#6)
Call("CHARACTER_FREQUENCIES")
Call("INDEX_OF_COINCIDENCE") // #51 = Kp * 10000
#54 += #51
}
#54 /= #5 // average of Kp's
Buf_Switch(#22)
Num_Ins(#5, COUNT, 3) // write key length
IT(": ")
Num_Ins(#54, NOCR) // average Kp
Num_Ins(670-#54) // distance to English Kp
}
Buf_Switch(#22)
Sort_Merge("5,12", #13, Cur_Pos, REVERSE) // sort the results by Kp value
Ins_Newline
// (3) Check the best 4 key lengths to find which one gives the best decrypt result
#38 = 0 // max number of correct characters found
#19 = 1 // best key length
for (#14 = 0; #14<4; #14++) { // try 4 best key lengths
Buf_Switch(#22) // results buffer
Goto_Pos(#13) Line(#14)
#5 = Num_Eval(SUPPRESS) // #5 = key length
Call("FIND_KEYS") // find Caesar key for each key character
#4 = -1 // try best match key chars only
Call("BUILD_KEY")
EOF
Ins_Text("Key length ")
Num_Ins(#5, LEFT)
Reg_Ins(10) // encrypted text
BOL
Call("DECRYPT_LINE")
BOL
Call("FIND_ENGLISH_WORDS") // #37 = number of English chars
EOL Ins_Newline
Ins_Text("Correct chars: ")
Num_Ins(#37)
if (#37 > #38) {
#38 = #37
#19 = #5
}
Update()
}
Ins_Text("Using key length: ") Num_Ins(#19) Ins_Newline
#5 = #19
Call("FIND_KEYS") // find Caesar key for each key character
// (4) Decrypt with different key combinations and try to find English words.
// Try key combinations where max one char is taken from 2nd best Caesar key.
#38 = 0 // max number of chars in English words found
#39 = -1 // best key number found
for (#4 = -1; #4 < #19; #4++)
{
Call("BUILD_KEY")
Buf_Switch(#22) // results
Reg_Ins(10) // encrypted text
BOL
Call("DECRYPT_LINE")
BOL
Update()
Call("FIND_ENGLISH_WORDS") // #37 := number of correct letters in text
if (#37 > #38) {
#38 = #37 // new highest number of correct chars
#39 = #4 // new best key
}
EOL IT(" -- ") // display results
Num_Ins(#4, COUNT, 3) // key number
Ins_Text(": ")
for (#6=0; #6<#19; #6++) { // display key
#9 = 130 + #6
Ins_Char(#@9)
}
Ins_Text(" correct chars =")
Num_Ins(#37)
}
Ins_Text("Best key = ")
Num_Ins(#39, LEFT)
#4 = #39
Ins_Newline
// Display results
//
Buf_Switch(#24) // table for key canditates
BOF
Reg_Copy_Block(14, Cur_Pos, Cur_Pos+#19) // best Caesar key chars
Line(1)
Reg_Copy_Block(15, Cur_Pos, Cur_Pos+#19) // 2nd best Caesar key chars
Call("BUILD_KEY")
Buf_Switch(#22)
Ins_Text("Key 1: ") Reg_Ins(14) Ins_Newline
Ins_Text("Key 2: ") Reg_Ins(15) Ins_Newline
Ins_Text("Key: ")
for (#6=0; #6 < #19; #6++) {
#9 = #6+130
Ins_Char(#@9)
}
Ins_Newline
Ins_Newline
// decrypt the text with selected key
Ins_Text("Decrypted text:") Ins_Newline
Reg_Ins(10)
BOL
Call("DECRYPT_LINE")
BOL Reg_Copy(13,1)
EOL Ins_Newline
// Find English words from the text
Reg_Ins(13)
Call("FIND_ENGLISH_WORDS")
EOL
Ins_Newline
Num_Ins(#37, NOCR) IT(" of ")
Num_Ins(#25, NOCR) IT(" characters are English words. ")
Ins_Newline
Buf_Switch(#20) Buf_Quit(OK)
Buf_Switch(#21) Buf_Quit(OK)
Buf_Switch(#23) Buf_Quit(OK)
Buf_Switch(#24) Buf_Quit(OK)
Statline_Message("Done!")
Return
/////////////////////////////////////////////////////////////////////////////
//
// Caesar decrypt current line and count character frequencies.
// in: #5 = step size, #7 = encryption key, #26 = num of chars in alphabet
// out: #65...#90 = frequencies, #60 = number of chars
:CHARACTER_FREQUENCIES:
Save_Pos
for (#8 = 'A'; #8<='Z'; #8++) {
#@8 = 0 // reset frequency counters
}
#60 = 0 // total number of chars
while (!At_EOL) {
if (Cur_Char >= 'A' && Cur_Char <= 'Z') {
#8 = (Cur_Char-'A'+#26-#7) % #26 + 'A' // decrypted char
#@8++
#60++
}
Char(#5)
}
Restore_Pos
Return
// Calculate Index of Coincidence (Kp).
// in: character frequencies in #65...#90, #60 = num of chars
// out: #51 = IC * 10000
//
:INDEX_OF_COINCIDENCE:
Num_Push(10,15)
#10 = 0
for (#11 = 'A'; #11<='Z'; #11++) {
#10 += (#@11 * (#@11-1)) // Calculate sigma{ni * (ni-1)}
}
#12 = #60 * (#60-1) // #12 = N * (N-1)
#51 = #10 * 10000 / #12 // #51 = Kp * 10000
Num_Pop(10,15)
Return
// Find best and 2nd best Caesar key for each character position of Vigenère key.
// in: #5=step size (key length)
// out: keys in buffer #24
//
:FIND_KEYS:
for (#6 = 0; #6 < #5; #6++) { // for each char position in the key
#30 = -1 // best key char found so far
#31 = -1 // 2nd best key char
#32 = MAXNUM // smallest error found so far
#33 = MAXNUM // 2nd smallest error found so far
for (#7 = 0; #7 < #26; #7++) { // for each possible key value
#35 = 0 // total frequency error compared to English
Buf_Switch(#20) // text being analyzed
Goto_Pos(#6)
Call("CHARACTER_FREQUENCIES")
Buf_Switch(#21) // English frequency table
BOF
for (#8 = 'A'; #8<='Z'; #8++) { // calculate total frequency error
#34 = Num_Eval(SUPPRESS+ADVANCE)
#35 += abs((#@8*100000+50000)/#60-#34)
}
if (#35 < #32) { // found better match?
#33 = #32
#32 = #35
#31 = #30
#30 = #7
} else {
if (#35 < #33) { // 2nd best match?
#33 = #35
#31 = #7
}
}
}
Buf_Switch(#24) // table for key canditates
BOF
Goto_Col(#6+1)
Ins_Char(#30+'A', OVERWRITE) // save the best match
Line(1)
Goto_Col(#6+1)
Ins_Char(#31+'A', OVERWRITE) // save 2nd best match
}
Buf_Switch(#22) // results buffer
Return
// Combine actual key from 1st and 2nd best Caesar key characters
// Use 1st key chars and (possibly) one character from 2nd key.
// #4 = index of the char to be picked from 2nd key, -1 = none.
// #5 = key length
//
:BUILD_KEY:
Buf_Switch(#24) // table for key canditates
BOF
for (#6=0; #6<#5; #6++) { // copy 1st key
#8 = 130 + #6
#@8 = Cur_Char
Char(1)
}
if (#4 >= 0) {
#8 = 130 + #4 // pick one char from 2st key
Line(1)
Goto_Col(#4+1)
#@8 = Cur_Char
}
Buf_Switch(#22) // results buffer
Return
// Decrypt text on current line
// in: #5 = key length, #130...#189 = key
//
:DECRYPT_LINE:
Num_Push(6,9)
#6 = 0
While (!At_EOL) {
#9 = #6+130
#7 = #@9
#8 = (Cur_Char - #7 + #26) % #26 + 'A' // decrypted char
Ins_Char(#8, OVERWRITE)
#6++
if (#6 >= #5) {
#6 = 0
}
}
Num_Pop(6,9)
Return
// Find English words from text on current line
// out: #37 = number of chars matched
//
:FIND_ENGLISH_WORDS:
Buf_Switch(#23) // dictionary
BOF
While (!At_EOF) {
Reg_Copy_Block(12, Cur_Pos, EOL_Pos)
if (Reg_Size(12) > 2) {
Buf_Switch(#22) // buffer for results
BOL
while (Search_Block(@12, Cur_Pos, EOL_Pos, NOERR)) {
Reg_Ins(12, OVERWRITE)
}
Buf_Switch(#23)
}
Line(1, ERRBREAK)
}
Buf_Switch(#22)
BOL
#37 = Search_Block("|V", Cur_Pos, EOL_Pos, ALL+NOERR)
Return
V (Vlang)
import strings
const encoded =
"MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH" +
"VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD" +
"ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS" +
"FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG" +
"ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ" +
"ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS" +
"JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT" +
"LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST" +
"MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH" +
"QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV" +
"RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW" +
"TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO" +
"SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR" +
"ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX" +
"BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB" +
"BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA" +
"FWAML ZZRXJ EKAHV FASMU LVVUT TGK"
const freq = [
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
0.00978, 0.02360, 0.00150, 0.01974, 0.00074,
]
fn sum(a []f64) f64 {
mut s := 0.0
for f in a {
s += f
}
return s
}
fn best_match(a []f64) int {
s := sum(a)
mut best_fit, mut best_rotate := 1e100, 0
for rotate in 0..26 {
mut fit := 0.0
for i in 0..26 {
d := a[(i+rotate)%26]/s - freq[i]
fit += d * d / freq[i]
}
if fit < best_fit {
best_fit, best_rotate = fit, rotate
}
}
return best_rotate
}
fn freq_every_nth(msg []int, mut key []u8) f64 {
l := msg.len
interval := key.len
mut out := []f64{len: 26}
mut accu := []f64{len: 26}
for j in 0..interval {
for z in 0..26 {
out[z] = 0.0
}
for i := j; i < l; i += interval {
out[msg[i]]++
}
rot := best_match(out)
key[j] = u8(rot + 65)
for i := 0; i < 26; i++ {
accu[i] += out[(i+rot)%26]
}
}
s := sum(accu)
mut ret := 0.0
for i := 0; i < 26; i++ {
d := accu[i]/s - freq[i]
ret += d * d / freq[i]
}
return ret
}
fn decrypt(text string, key string) string {
mut sb := strings.new_builder(128)
mut ki := 0
for c in text {
if c < 'A'[0] || c > 'Z'[0] {
continue
}
ci := (c - key[ki] + 26) % 26
sb.write_rune(ci + 65)
ki = (ki + 1) % key.len
}
return sb.str()
}
fn main() {
enc := encoded.replace(" ", "")
mut txt := []int{len: enc.len}
for i in 0..txt.len {
txt[i] = int(enc[i] - 'A'[0])
}
mut best_fit, mut best_key := 1e100, ""
println(" Fit Length Key")
for j := 1; j <= 26; j++ {
mut key := []u8{len: j}
fit := freq_every_nth(txt, mut key)
s_key := key.bytestr()
print("${fit:.6} ${j:2} $s_key")
if fit < best_fit {
best_fit, best_key = fit, s_key
print(" <--- best so far")
}
println('')
}
println("\nBest key : $best_key")
println("\nDecrypted text:\n${decrypt(enc, best_key)}")
}
- Output:
Note: carriage returns inserted into decrypted text after every 80 characters to make it more readable.
Fit Length Key 2.984348 1 E <--- best so far 2.483684 2 EC <--- best so far 2.642487 3 TEE 1.976651 4 THEC <--- best so far 2.356881 5 EEEPU 2.203129 6 TCECEC 1.051163 7 THECSAS <--- best so far 1.645763 8 TJQGAHET 2.001380 9 VEIZSEGNT 1.824476 10 ECEGAWQTDS 1.623083 11 TNLUSRXPTAJ 1.253527 12 XLECTHQGTHEC 1.399037 13 LJJTDGFNOTENR 0.152370 14 THECHESHIRECAT <--- best so far 1.533951 15 JNTOOEEXFTGQTNH 1.068182 16 TJTSAEETEXHPXHNE 1.034093 17 AZRAXUHEJLREEXIEE 1.443345 18 VNIZQPALEPTSXSEXUC 1.090977 19 FUCAITCSLVTEZDUDEHS 0.979868 20 EQXGAHWTTQECEWUGXHPI 0.789410 21 HVRCSAFTHEBDLSTAERSES 0.881380 22 TVIJTCIGKAQPELECRXPTNC 0.952456 23 KKEQXGPWTCQEELIEHXUWASV 0.715968 24 ELAIXHQTTIEDXJETTNTGAEPC 0.891258 25 OTJUUEGERDNQTUQEAGWUTIEOA 0.852784 26 IGITEGECAGAVUNLJAHASAVTETW Best key : THECHESHIRECAT Decrypted text: THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMB LEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEBEWARETHEJABBERWOCKMYS ONTHEJAWSTHATBITETHECLAWSTHATCATCHBEWARETHEJUBJUBBIRDANDSHUNTHEFRUMIOUSBANDERSNA TCHHETOOKHISVORPALSWORDINHANDLONGTIMETHEMANXOMEFOEHESOUGHTSORESTEDHEBYTHETUMTUMT REEANDSTOODAWHILEINTHOUGHTANDASINUFFISHTHOUGHTHESTOODTHEJABBERWOCKWITHEYESOFFLAM ECAMEWHIFFLINGTHROUGHTHETULGEYWOODANDBURBLEDASITCAMEONETWOONETWOANDTHROUGHANDTHR OUGHTHEVORPALBLADEWENTSNICKERSNACKHELEFTITDEADANDWITHITSHEADHEWENTGALUMPHINGBACK ANDHASTTHOUSLAINTHEJABBERWOCKCOMETOMYARMSMYBEAMISHBOYOFRABJOUSDAYCALLOOHCALLAYHE CHORTLEDINHISJOYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWER ETHEBOROGOVESANDTHEMOMERATHSOUTGRABEITSEEMSVERYPRETTYSHESAIDWHENSHEHADFINISHEDIT BUTITSRATHERHARDTOUNDERSTAND
Wren
import "./math" for Nums
import "./iterate" for Stepped
import "./str" for Char, Str
import "./fmt" for Fmt
var encoded =
"MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH" +
"VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD" +
"ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS" +
"FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG" +
"ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ" +
"ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS" +
"JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT" +
"LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST" +
"MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH" +
"QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV" +
"RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW" +
"TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO" +
"SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR" +
"ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX" +
"BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB" +
"BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA" +
"FWAML ZZRXJ EKAHV FASMU LVVUT TGK"
var freq = [
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
0.00978, 0.02360, 0.00150, 0.01974, 0.00074
]
var bestMatch = Fn.new { |a|
var sum = Nums.sum(a)
var bestFit = 1e100
var bestRotate = 0
for (rotate in 0..25) {
var fit = 0
for (i in 0..25) {
var d = a[(i + rotate) % 26] / sum - freq[i]
fit = fit + d * d / freq[i]
}
if (fit < bestFit) {
bestFit = fit
bestRotate = rotate
}
}
return bestRotate
}
var freqEveryNth = Fn.new { |msg, key|
var len = msg.count
var interval = key.count
var out = List.filled(26, 0)
var accu = List.filled(26, 0)
for (j in 0...interval) {
for (i in 0..25) out[i] = 0
for (i in Stepped.new(j...len, interval)) out[msg[i]] = out[msg[i]] + 1
var rot = bestMatch.call(out)
key[j] = Char.fromCode(rot + 65)
for (i in 0..25) accu[i] = accu[i] + out[(i + rot) % 26]
}
var sum = Nums.sum(accu)
var ret = 0
for (i in 0..25) {
var d = accu[i] / sum - freq[i]
ret = ret + d * d / freq[i]
}
return ret
}
var decrypt = Fn.new { |text, key|
var sb = ""
var ki = 0
for (c in text) {
if (Char.isAsciiUpper(c)) {
var ci = (c.bytes[0] - key[ki].bytes[0] + 26) % 26
sb = sb + Char.fromCode(ci + 65)
ki = (ki + 1) % key.count
}
}
return sb
}
var enc = encoded.replace(" ", "")
var txt = List.filled(enc.count, 0)
for (i in 0...txt.count) txt[i] = Char.code(enc[i]) - 65
var bestFit = 1e100
var bestKey = ""
var f = "$f $2d $s"
System.print(" Fit Length Key")
for (j in 1..26) {
var key = List.filled(j, "")
var fit = freqEveryNth.call(txt, key)
var sKey = key.join("")
Fmt.write(f, fit, j, sKey)
if (fit < bestFit) {
bestFit = fit
bestKey = sKey
System.write(" <--- best so far")
}
System.print()
}
System.print()
System.print("Best key : %(bestKey)")
System.print("\nDecrypted text:\n%(decrypt.call(enc, bestKey))")
- Output:
Fit Length Key 2.984348 1 E <--- best so far 2.483684 2 EC <--- best so far 2.642487 3 TEE 1.976651 4 THEC <--- best so far 2.356881 5 EEEPU 2.203129 6 TCECEC 1.051163 7 THECSAS <--- best so far 1.645763 8 TJQGAHET 2.001380 9 VEIZSEGNT 1.824476 10 ECEGAWQTDS 1.623083 11 TNLUSRXPTAJ 1.253527 12 XLECTHQGTHEC 1.399037 13 LJJTDGFNOTENR 0.152370 14 THECHESHIRECAT <--- best so far 1.533951 15 JNTOOEEXFTGQTNH 1.068182 16 TJTSAEETEXHPXHNE 1.034093 17 AZRAXUHEJLREEXIEE 1.443345 18 VNIZQPALEPTSXSEXUC 1.090977 19 FUCAITCSLVTEZDUDEHS 0.979868 20 EQXGAHWTTQECEWUGXHPI 0.789410 21 HVRCSAFTHEBDLSTAERSES 0.881380 22 TVIJTCIGKAQPELECRXPTNC 0.952456 23 KKEQXGPWTCQEELIEHXUWASV 0.715968 24 ELAIXHQTTIEDXJETTNTGAEPC 0.891258 25 OTJUUEGERDNQTUQEAGWUTIEOA 0.852784 26 IGITEGECAGAVUNLJAHASAVTETW Best key : THECHESHIRECAT Decrypted text: THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEBEWARETHEJABBERWOCKMYSONTHEJAWSTHATBITETHECLAWSTHATCATCHBEWARETHEJUBJUBBIRDANDSHUNTHEFRUMIOUSBANDERSNATCHHETOOKHISVORPALSWORDINHANDLONGTIMETHEMANXOMEFOEHESOUGHTSORESTEDHEBYTHETUMTUMTREEANDSTOODAWHILEINTHOUGHTANDASINUFFISHTHOUGHTHESTOODTHEJABBERWOCKWITHEYESOFFLAMECAMEWHIFFLINGTHROUGHTHETULGEYWOODANDBURBLEDASITCAMEONETWOONETWOANDTHROUGHANDTHROUGHTHEVORPALBLADEWENTSNICKERSNACKHELEFTITDEADANDWITHITSHEADHEWENTGALUMPHINGBACKANDHASTTHOUSLAINTHEJABBERWOCKCOMETOMYARMSMYBEAMISHBOYOFRABJOUSDAYCALLOOHCALLAYHECHORTLEDINHISJOYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEITSEEMSVERYPRETTYSHESAIDWHENSHEHADFINISHEDITBUTITSRATHERHARDTOUNDERSTAND
zkl
var[const] uppercase=["A".."Z"].pump(String),
english_frequences=T( // A..Z
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
0.00978, 0.02360, 0.00150, 0.01974, 0.00074);
fcn vigenere_decrypt(target_freqs, input){ // ( (float,...), string)
nchars,ordA :=uppercase.len(),"A".toAsc();
sorted_targets:=target_freqs.sort();
frequency:='wrap(input){ // (n,n,n,n,...), n is ASCII index ("A"==65)
result:=uppercase.pump(List(),List.fp1(0)); // ( ("A",0),("B",0) ...)
foreach c in (input){ result[c - ordA][1] += 1 }
result // --> mutable list of mutable lists ( ("A",Int)...("Z",Int) )
};
correlation:='wrap(input){ // (n,n,n,n,...), n is ASCII index ("A"==65)
result,freq:=0.0, frequency(input);
freq.sort(fcn([(_,a)],[(_,b)]){ a<b }); // sort letters by frequency
foreach i,f in (freq.enumerate()){ result+=sorted_targets[i]*f[1] }
result // -->Float
};
cleaned:=input.toUpper().pump(List,uppercase.holds,Void.Filter,"toAsc");
best_len,best_corr := 0,-100.0;
# Assume that if there are less than 20 characters
# per column, the key's too long to guess
foreach i in ([2..cleaned.len()/20]){
pieces:=(i).pump(List,List.copy); // ( (),() ... )
foreach c in (cleaned){ pieces[__cWalker.idx%i].append(c) }
# The correlation seems to increase for smaller
# pieces/longer keys, so weigh against them a little
corr:=-0.5*i + pieces.apply(correlation).sum(0.0);
if(corr>best_corr) best_len,best_corr=i,corr;
}
if(best_len==0) return("Text is too short to analyze", "");
pieces:=best_len.pump(List,List.copy);
foreach c in (cleaned){ pieces[__cWalker.idx%best_len].append(c) }
key,freqs := "",pieces.apply(frequency);
foreach fr in (freqs){
fr.sort(fcn([(_,a)],[(_,b)]){ a>b }); // reverse sort by freq
m,max_corr := 0,0.0;
foreach j in (nchars){
corr,c := 0.0,ordA + j;
foreach frc in (fr){
d:=(frc[0].toAsc() - c + nchars) % nchars;
corr+=target_freqs[d]*frc[1];
if(corr>max_corr) m,max_corr=j,corr;
}
}
key+=(m + ordA).toChar();
}
cleaned.enumerate().apply('wrap([(i,c])){
( (c - (key[i%best_len]).toAsc() + nchars)%nchars + ordA ).toChar()
}).concat() :
T(key,_);
}
encryptedText:=
#<<<
"MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH
VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD
ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS
FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG
ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ
ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS
JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT
LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST
MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH
QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV
RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW
TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO
SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR
ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX
BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB
BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA
FWAML ZZRXJ EKAHV FASMU LVVUT TGK";
#<<<
key,decoded:=vigenere_decrypt(english_frequences,encryptedText);
println("Key:", key);
println("Decoded text:", decoded);
- Output:
Key:THECHESHIRECAT Decoded text:THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEBEWARETHEJABBERWOCKMYSONTHEJAWSTHATBITETHECLAWSTHATCATCHBEWARETHEJUBJUBBIRDANDSHUNTHEFRUMIOUSBANDERSNATCHHETOOKHISVORPALSWORDINHANDLONGTIMETHEMANXOMEFOEHESOUGHTSORESTEDHEBYTHETUMTUMTREEANDSTOODAWHILEINTHOUGHTANDASINUFFISHTHOUGHTHESTOODTHEJABBERWOCKWITHEYESOFFLAMECAMEWHIFFLINGTHROUGHTHETULGEYWOODANDBURBLEDASITCAMEONETWOONETWOANDTHROUGHANDTHROUGHTHEVORPALBLADEWENTSNICKERSNACKHELEFTITDEADANDWITHITSHEADHEWENTGALUMPHINGBACKANDHASTTHOUSLAINTHEJABBERWOCKCOMETOMYARMSMYBEAMISHBOYOFRABJOUSDAYCALLOOHCALLAYHECHORTLEDINHISJOYTWASBRILLIGANDTHESLITHYTOVESDIDGYREANDGIMBLEINTHEWABEALLMIMSYWERETHEBOROGOVESANDTHEMOMERATHSOUTGRABEITSEEMSVERYPRETTYSHESAIDWHENSHEHADFINISHEDITBUTITSRATHERHARDTOUNDERSTAND