Vigenère cipher/Cryptanalysis
Given some text you suspect has been encrypted with a Vigenère cipher, extract the key and plaintext. There are several methods for doing this. See the Wikipedia entry for more information. Use the following encrypted text:
MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA FWAML ZZRXJ EKAHV FASMU LVVUT TGK
Letter frequencies for English can be found here.
Specifics for this task:
- Take only the ciphertext as input. You can assume it's all capitalized and has no punctuation, but it might have whitespace.
- Assume the plaintext is written in English.
- Find and output the key.
- Use that key to decrypt and output the original plaintext. Maintaining the whitespace from the ciphertext is optional.
- The algorithm doesn't have to be perfect (which may not be possible) but it should work when given enough ciphertext. The example above is fairly long, and should be plenty for any algorithm.
Ada
The program is not fully auto, but makes a small number of suggestions for the right key and plaintext. <lang Ada>with Ada.Text_IO;
procedure Vignere_Cryptanalysis is
subtype Letter is Character range 'A' .. 'Z';
function "+"(X, Y: Letter) return Letter is begin return Character'Val( ( (Character'Pos(X)-Character'Pos('A')) + (Character'Pos(Y)-Character'Pos('A')) ) mod 26 + Character'Pos('A')); end;
function "-"(X, Y: Letter) return Letter is begin return Character'Val( ( (Character'Pos(X)-Character'Pos('A')) - (Character'Pos(Y)-Character'Pos('A')) ) mod 26 + Character'Pos('A')); end;
type Frequency_Array is array (Letter) of Float;
English: Frequency_Array := ( 0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074 );
function Get_Frequency(S: String) return Frequency_Array is Result: Frequency_Array := (others => 0.0); Offset: Float := 1.0/Float(S'Length); begin for I in S'Range loop if S(I) in Letter then Result(S(I)) := Result(S(I)) + Offset; end if; end loop; return Result; end Get_Frequency;
function Remove_Whitespace(S: String) return String is begin if S="" then return ""; elsif S(S'First) in Letter then return S(S'First) & Remove_Whitespace(S(S'First+1 .. S'Last)); else return Remove_Whitespace(S(S'First+1 .. S'Last)); end if; end Remove_Whitespace;
function Distance(A, B: Frequency_Array; Offset: Character := 'A') return Float is Result: Float := 0.0; Diff: Float; begin for C in A'Range loop Diff := A(C+Offset) - B(C); Result := Result + (Diff * Diff); end loop; return Result; end Distance;
function Find_Key(Cryptogram: String; Key_Length: Positive) return String is
function Find_Caesar_Key(S: String) return Letter is Frequency: Frequency_Array := Get_Frequency(S); Candidate: Letter := 'A'; -- a fake candidate Candidate_Dist : Float := Distance(Frequency, English, 'A'); New_Dist: Float;
begin
for L in Letter range 'B' .. 'Z' loop New_Dist := Distance(Frequency, English, L); if New_Dist <= Candidate_Dist then Candidate_Dist := New_Dist; Candidate := L; end if; end loop; return Candidate; end Find_Caesar_Key;
function Get_Slide(S: String; Step: Positive) return String is begin if S'Length= 0 then return ""; else return S(S'First) & Get_Slide(S(S'First+Step .. S'Last), Step); end if; end Get_Slide;
Key: String(1 .. Key_Length);
S: String renames Cryptogram;
begin for I in Key'Range loop Key(I) := Find_Caesar_Key(Get_Slide(S(S'First+I-1 .. S'Last), Key_Length)); end loop; return Key; end Find_Key;
function Key_Char(Key: String; Index: Positive) return Letter is begin if Index > Key'Last then return Key_Char(Key, Index-Key'Last); else return Key(Index); end if; end Key_Char;
Ciphertext: String := Remove_Whitespace( "MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH" & "VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD" & "ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS" & "FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG" & "ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ" & "ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS" & "JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT" & "LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST" & "MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH" & "QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV" & "RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW" & "TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO" & "SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR" & "ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX" & "BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB" & "BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA" & "FWAML ZZRXJ EKAHV FASMU LVVUT TGK");
Best_Plain: String := Ciphertext; Best_Dist: Float := Distance(English, Get_Frequency(Best_Plain)); Best_Key: String := Ciphertext; Best_Key_L: Natural := 0;
begin -- Vignere_Cryptanalysis
for I in 1 .. Ciphertext'Length/10 loop declare Key: String(1 .. I) := Find_Key(Ciphertext, I); Plaintext: String(Ciphertext'Range); begin for I in Ciphertext'Range loop Plaintext(I) := Ciphertext(I) - Key_Char(Key, I); end loop; if Distance(English, Get_Frequency(Plaintext)) < Best_Dist then Best_Plain := Plaintext; Best_Dist := Distance(English, Get_Frequency(Plaintext)); Best_Key(1 .. I) := Key; Best_Key_L := I; if Best_dist < 0.01 then declare use Ada.Text_IO; begin Put_Line("Key =" & Best_Key(1 .. Best_Key_L)); Put_Line("Distance = " & Float'Image(Best_Dist)); New_Line; Put_Line("Plaintext ="); Put_Line(Best_Plain); New_Line; New_Line; end; end if; end if; end; end loop;
end Vignere_Cryptanalysis;</lang>
C
This finds the right key (I think, I didn't try to decode it after getting the key). The program is not fully auto, but by its output, the result is pretty obvious. <lang C>#include <stdio.h>
- include <stdlib.h>
- include <string.h>
- include <ctype.h>
- include <math.h>
char *encoded =
"MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH" "VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD" "ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS" "FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG" "ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ" "ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS" "JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT" "LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST" "MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH" "QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV" "RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW" "TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO" "SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR" "ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX" "BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB" "BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA" "FWAML ZZRXJ EKAHV FASMU LVVUT TGK";
double freq[] = {
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074
};
int best_match(double *a, double *b) {
double sum = 0, fit, d, best_fit = 1e100; int i, rotate, best_rotate = 0; for (i = 0; i < 26; i++) sum += a[i]; for (rotate = 0; rotate < 26; rotate++) { fit = 0; for (i = 0; i < 26; i++) { d = a[(i + rotate) % 26] / sum - b[i]; fit += d * d / b[i]; }
if (fit < best_fit) { best_fit = fit; best_rotate = rotate; } }
return best_rotate;
}
double freq_every_nth(int *msg, int len, int interval, char *key) {
double sum, d, ret; double out[26], accu[26] = {0}; int i, j, rot;
for (j = 0; j < interval; j++) { for (i = 0; i < 26; i++) out[i] = 0; for (i = j; i < len; i += interval) out[msg[i]]++; key[j] = rot = best_match(out, freq); key[j] += 'A'; for (i = 0; i < 26; i++) accu[i] += out[(i + rot) % 26]; }
for (i = 0, sum = 0; i < 26; i++) sum += accu[i];
for (i = 0, ret = 0; i < 26; i++) { d = accu[i] / sum - freq[i]; ret += d * d / freq[i]; }
key[interval] = '\0'; return ret;
}
int main() {
int txt[strlen(encoded)]; int len = 0, j; char key[100]; double fit, best_fit = 1e100;
for (j = 0; encoded[j] != '\0'; j++) if (isupper(encoded[j])) txt[len++] = encoded[j] - 'A';
for (j = 1; j < 30; j++) { fit = freq_every_nth(txt, len, j, key); printf("%f, key length: %2d, %s", fit, j, key); if (fit < best_fit) { best_fit = fit; printf(" <--- best so far"); } printf("\n"); }
return 0;
}</lang>
C++
Not guaranteed to give a 100% correct answer, but it works here. Requires C++0x.
<lang cpp>#include <iostream>
- include <string>
- include <vector>
- include <map>
- include <algorithm>
- include <array>
using namespace std;
typedef array<pair<char, double>, 26> FreqArray;
class VigenereAnalyser { private:
array<double, 26> targets; array<double, 26> sortedTargets; FreqArray freq;
// Update the freqs array FreqArray& frequency(const string& input) { for (char c = 'A'; c <= 'Z'; ++c) freq[c - 'A'] = make_pair(c, 0);
for (size_t i = 0; i < input.size(); ++i) freq[input[i] - 'A'].second++;
return freq; }
double correlation(const string& input) { double result = 0.0; frequency(input);
sort(freq.begin(), freq.end(), [](pair<char, double> u, pair<char, double> v)->bool { return u.second < v.second; });
for (size_t i = 0; i < 26; ++i) result += freq[i].second * sortedTargets[i];
return result; }
public:
VigenereAnalyser(const array<double, 26>& targetFreqs) { targets = targetFreqs; sortedTargets = targets; sort(sortedTargets.begin(), sortedTargets.end()); }
pair<string, string> analyze(string input) { string cleaned; for (size_t i = 0; i < input.size(); ++i) { if (input[i] >= 'A' && input[i] <= 'Z') cleaned += input[i]; else if (input[i] >= 'a' && input[i] <= 'z') cleaned += input[i] + 'A' - 'a'; }
size_t bestLength = 0; double bestCorr = -100.0;
// Assume that if there are less than 20 characters // per column, the key's too long to guess for (size_t i = 2; i < cleaned.size() / 20; ++i) { vector<string> pieces(i); for (size_t j = 0; j < cleaned.size(); ++j) pieces[j % i] += cleaned[j];
// The correlation increases artificially for smaller // pieces/longer keys, so weigh against them a little double corr = -0.5*i; for (size_t j = 0; j < i; ++j) corr += correlation(pieces[j]);
if (corr > bestCorr) { bestLength = i; bestCorr = corr; } }
if (bestLength == 0) return make_pair("Text is too short to analyze", "");
vector<string> pieces(bestLength); for (size_t i = 0; i < cleaned.size(); ++i) pieces[i % bestLength] += cleaned[i];
vector<FreqArray> freqs; for (size_t i = 0; i < bestLength; ++i) freqs.push_back(frequency(pieces[i]));
string key = ""; for (size_t i = 0; i < bestLength; ++i) { sort(freqs[i].begin(), freqs[i].end(), [](pair<char, double> u, pair<char, double> v)->bool { return u.second > v.second; });
size_t m = 0; double mCorr = 0.0; for (size_t j = 0; j < 26; ++j) { double corr = 0.0; char c = 'A' + j; for (size_t k = 0; k < 26; ++k) { int d = (freqs[i][k].first - c + 26) % 26; corr += freqs[i][k].second * targets[d]; }
if (corr > mCorr) { m = j; mCorr = corr; } }
key += m + 'A'; }
string result = ""; for (size_t i = 0; i < cleaned.size(); ++i) result += (cleaned[i] - key[i % key.length()] + 26) % 26 + 'A';
return make_pair(result, key); }
};
int main() {
string input = "MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH" "VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD" "ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS" "FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG" "ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ" "ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS" "JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT" "LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST" "MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH" "QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV" "RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW" "TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO" "SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR" "ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX" "BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB" "BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA" "FWAML ZZRXJ EKAHV FASMU LVVUT TGK";
array<double, 26> english = { 0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074};
VigenereAnalyser va(english); pair<string, string> output = va.analyze(input);
cout << "Key: " << output.second << endl << endl; cout << "Text: " << output.first << endl;
}</lang>
D
<lang d>import std.stdio, std.algorithm, std.typecons, std.string,
std.array, std.numeric, std.ascii;
string[2] vigenereDecrypt(in double[] targetFreqs, in string input) {
enum nAlpha = std.ascii.uppercase.length;
static double correlation(in string txt, in double[] sTargets) /*pure nothrow*/ { uint[nAlpha] charCounts = 0; foreach (immutable c; txt) charCounts[c - 'A']++; return charCounts[].sort().release().dotProduct(sTargets); }
static frequency(in string txt) pure nothrow { auto freqs = new Tuple!(char,"c", uint,"d")[nAlpha]; foreach (immutable i, immutable c; std.ascii.uppercase) freqs[i] = tuple(c, 0); foreach (immutable c; txt) freqs[c - 'A'].d++; return freqs; }
static string[2] decode(in string cleaned, in string key) pure nothrow { assert(key.length > 0); string decoded; foreach (immutable i, immutable c; cleaned) decoded ~= (c - key[i % $] + nAlpha) % nAlpha + 'A'; return [key, decoded]; }
static size_t findBestLength(in string cleaned, in double[] sTargets) /*pure nothrow*/ { size_t bestLength; double bestCorr = -100.0;
// Assume that if there are less than 20 characters // per column, the key's too long to guess foreach (immutable i; 2 .. cleaned.length / 20) { auto pieces = new Appender!string[i]; foreach (immutable j, immutable c; cleaned) pieces[j % i].put(c);
// The correlation seems to increase for smaller // pieces/longer keys, so weigh against them a little double corr = -0.5 * i; foreach (const p; pieces) corr += correlation(p.data, sTargets);
if (corr > bestCorr) { bestLength = i; bestCorr = corr; } }
return bestLength; }
static string findKey(in string cleaned, in size_t bestLength, in double[] targetFreqs) /*pure*/ { auto pieces = new string[bestLength]; foreach (immutable i, immutable c; cleaned) pieces[i % bestLength] ~= c; auto freqs = map!frequency(pieces).array();
string key; foreach (immutable i, fi; freqs) { fi.sort!q{ a.d > b.d }();
size_t m; double maxCorr = 0.0; foreach (immutable j, immutable c; std.ascii.uppercase) { double corr = 0.0; foreach (immutable k; 0 .. nAlpha) { immutable di = (fi[k].c - c + nAlpha) % nAlpha; corr += fi[k].d * targetFreqs[di]; }
if (corr > maxCorr) { m = j; maxCorr = corr; } }
key ~= m + 'A'; }
return key; }
immutable cleaned = input.toUpper().removechars("^A-Z");
//immutable sortedTargets = sorted(targetFreqs); immutable sortedTargets = targetFreqs.dup.sort().release().idup;
immutable bestLength = findBestLength(cleaned, sortedTargets); if (bestLength == 0) throw new Exception("Text is too short to analyze.");
immutable string key = findKey(cleaned, bestLength, targetFreqs); return decode(cleaned, key);
}
void main() {
immutable encoded = "MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG
JSPXY ALUYM NSMYH VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA FWAML ZZRXJ EKAHV FASMU LVVUT TGK";
immutable englishFrequences = [0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074];
immutable key_dec = vigenereDecrypt(englishFrequences, encoded); writefln("Key: %s\n\nText: %s", key_dec[0], key_dec[1]);
}</lang> Output (cut):
Key: THECHESHIRECAT Text: THISWASTHEPOEMTHATALICEREADJABBERWOCKYTWASBRILLIGANDTHESLITHY...
Python
<lang python>from string import uppercase from operator import itemgetter
def vigenere_decrypt(target_freqs, input):
nchars = len(uppercase) ordA = ord('A') sorted_targets = sorted(target_freqs)
def frequency(input): result = [[c, 0.0] for c in uppercase] for c in input: result[c - ordA][1] += 1 return result
def correlation(input): result = 0.0 freq = frequency(input) freq.sort(key=itemgetter(1))
for i, f in enumerate(freq): result += f[1] * sorted_targets[i] return result
cleaned = [ord(c) for c in input.upper() if c.isupper()] best_len = 0 best_corr = -100.0
# Assume that if there are less than 20 characters # per column, the key's too long to guess for i in xrange(2, len(cleaned) // 20): pieces = [[] for _ in xrange(i)] for j, c in enumerate(cleaned): pieces[j % i].append(c)
# The correlation seems to increase for smaller # pieces/longer keys, so weigh against them a little corr = -0.5 * i + sum(correlation(p) for p in pieces)
if corr > best_corr: best_len = i best_corr = corr
if best_len == 0: return ("Text is too short to analyze", "")
pieces = [[] for _ in xrange(best_len)] for i, c in enumerate(cleaned): pieces[i % best_len].append(c)
freqs = [frequency(p) for p in pieces]
key = "" for i in xrange(best_len): freqs[i].sort(key=itemgetter(1), reverse=True)
m = 0 max_corr = 0.0 for j in xrange(nchars): corr = 0.0 c = ordA + j for k in xrange(nchars): d = (ord(freqs[i][k][0]) - c + nchars) % nchars corr += freqs[i][k][1] * target_freqs[d]
if corr > max_corr: m = j max_corr = corr
key += chr(m + ordA)
r = (chr((c - ord(key[i % best_len]) + nchars) % nchars + ordA) for i, c in enumerate(cleaned)) return (key, "".join(r))
def main():
encoded = """ MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA FWAML ZZRXJ EKAHV FASMU LVVUT TGK"""
english_frequences = [ 0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074]
(key, decoded) = vigenere_decrypt(english_frequences, encoded) print "Key:", key print "\nText:", decoded
main()</lang>
Tcl
<lang tcl>package require Tcl 8.6
oo::class create VigenereAnalyzer {
variable letterFrequencies sortedTargets constructor {{frequencies { 0.08167 0.01492 0.02782 0.04253 0.12702 0.02228 0.02015
0.06094 0.06966 0.00153 0.00772 0.04025 0.02406 0.06749 0.07507 0.01929 0.00095 0.05987 0.06327 0.09056 0.02758 0.00978 0.02360 0.00150 0.01974 0.00074
}}} {
set letterFrequencies $frequencies set sortedTargets [lsort -real $frequencies] if {[llength $frequencies] != 26} { error "wrong length of frequency table" }
}
### Utility methods # Find the value of $idxvar in the range [$from..$to) that maximizes the value # in $scorevar (which is computed by evaluating $body) method Best {idxvar from to scorevar body} {
upvar 1 $idxvar i $scorevar s set bestI $from for {set i $from} {$i < $to} {incr i} { uplevel 1 $body if {![info exist bestS] || $bestS < $s} { set bestI $i set bestS $s } } return $bestI
} # Simple list map method Map {var list body} {
upvar 1 $var v set result {} foreach v $list {lappend result [uplevel 1 $body]} return $result
} # Simple partition of $list into $groups groups; thus, the partition of # {a b c d e f} into 3 produces {a d} {b e} {c f} method Partition {list groups} {
set i 0 foreach val $list { dict lappend result $i $val if {[incr i] >= $groups} { set i 0 } } return [dict values $result]
}
### Helper methods # Get the actual counts of different types of characters in the given list method Frequency cleaned {
for {set i 0} {$i < 26} {incr i} { dict set tbl $i 0 } foreach ch $cleaned { dict incr tbl [expr {[scan $ch %c] - 65}] } return $tbl
}
# Get the correlation factor of the characters in a given list with the # class-specified language frequency corpus method Correlation cleaned {
set result 0.0 set freq [lsort -integer [dict values [my Frequency $cleaned]]] foreach f $freq s $sortedTargets { set result [expr {$result + $f * $s}] } return $result
}
# Compute an estimate for the key length method GetKeyLength {cleaned {required 20}} {
# Assume that we need at least 20 characters per column to guess set bestLength [my Best i 2 [expr {[llength $cleaned] / $required}] corr { set corr [expr {-0.5 * $i}] foreach chars [my Partition $cleaned $i] { set corr [expr {$corr + [my Correlation $chars]}] } }] if {$bestLength == 0} { error "text is too short to analyze" } return $bestLength
}
# Compute the key from the given frequency tables and the class-specified # language frequency corpus method GetKeyFromFreqs freqs {
foreach f $freqs { set m [my Best i 0 26 corr { set corr 0.0 foreach {ch count} $f { set d [expr {($ch - $i) % 26}] set corr [expr {$corr + $count*[lindex $letterFrequencies $d]}] } }] append key [format %c [expr {65 + $m}]] } return $key
}
##### The main analyzer method ##### method analyze input {
# Turn the input into a clean letter sequence set cleaned [regexp -all -inline {[A-Z]} [string toupper $input]] # Get the (estimated) key length set bestLength [my GetKeyLength $cleaned] # Get the frequency mapping for the partitioned input text set freqs [my Map p [my Partition $cleaned $bestLength] {my Frequency $p}] # Get the key itself return [my GetKeyFromFreqs $freqs]
}
}</lang> Demonstration (that assumes that the Tcl solution to Vigenère cipher task is present): <lang tcl>set encoded "
MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA FWAML ZZRXJ EKAHV FASMU LVVUT TGK
" VigenereAnalyzer create englishVigenereAnalyzer set key [englishVigenereAnalyzer analyze $encoded] Vigenere create decoder $key set decoded [decoder decrypt $encoded] puts "Key: $key" puts "Text: $decoded"</lang>