How to resolve the algorithm Vigenère cipher/Cryptanalysis step by step in the Haskell programming language
Published on 7 June 2024 03:52 AM
How to resolve the algorithm Vigenère cipher/Cryptanalysis step by step in the Haskell programming language
Table of Contents
Problem Statement
Given some text you suspect has been encrypted with a Vigenère cipher, extract the key and plaintext. There are several methods for doing this. See the Wikipedia entry for more information. Use the following encrypted text: Letter frequencies for English can be found here. Specifics for this task:
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Vigenère cipher/Cryptanalysis step by step in the Haskell programming language
The provided code is written in Haskell and it's a program that attempts to break a simple substitution cipher. A substitution cipher is a type of encryption where each character in the plaintext is replaced by another character. The program uses a statistical approach to guess the key used in the cipher.
Here's how it works:
- Frequency Analysis:
- The program reads the encrypted text stored in the variable
crypt
and removes all non-alphabetic characters. - It assumes that the key is a string of characters and that the plaintext is in English.
- Distribution Generation:
- It loops through possible key lengths, from 1 to half the length of the ciphertext, and for each length it creates a list of potential key distributions. A key distribution is a list of lists, where each inner list contains the characters from the ciphertext that were encrypted using the same character in the key.
- Coincidence Rate:
- For each key distribution, it calculates the average probability of coincidence between pairs of characters in the plaintext. The probability of coincidence is the chance that two randomly selected characters from the plaintext are the same. A higher probability of coincidence indicates a potentially valid key.
- Rating:
- It rates each key distribution based on its coincidence rate and the length of the key. The rating formula penalizes shorter keys because they increase the probability of coincidence artificially.
- Key Guessing:
- It selects the key distribution with the highest rating. For each encrypted character in the ciphertext, it guesses the corresponding character in the plaintext by finding the character in the key distribution that when rotated and aligned with the encrypted character, maximizes the dot product between their frequency distributions.
- Decryption:
- Finally, it decrypts the ciphertext using the guessed key and prints the decrypted text.
Additional Details:
- The
countEntries
function counts the number of occurrences of each character in a list. - The
breakup
function splits a list into smaller sublists of a specified size. - The
distribute
function divides a list into a specified number of equal-sized sublists. - The
dot
function performs element-wise multiplication between two lists and sums the results. - The
rotateAndDot
function rotates one of the lists by a specified number of characters and then performs the dot product. - The
getKeyChar
function finds the key character that maximizes the dot product between the actual and expected character frequencies when the ciphertext is rotated and aligned with the key character. - The
englishFrequencies
list contains the expected character frequencies in English text. - The
alphaSum
function performs modular addition to wrap around the alphabet.
Source code in the haskell programming language
{-# LANGUAGE TupleSections #-}
import Data.List(transpose, nub, sort, maximumBy)
import Data.Ord (comparing)
import Data.Char (ord)
import Data.Map (Map, fromListWith, toList, findWithDefault)
average :: Fractional a => [a] -> a
average as = sum as / fromIntegral (length as)
-- Create a map from each entry in list to the number of occurrences of
-- that entry in the list.
countEntries :: Ord a => [a] -> Map a Int
countEntries = fromListWith (+) . fmap (,1)
-- Break a string up into substrings of n chars.
breakup :: Int -> [a] -> [[a]]
breakup _ [] = []
breakup n as =
let (h, r) = splitAt n as
in h:breakup n r
-- Dole out elements of a string over a n element distribution.
distribute :: [a] -> Int -> [[a]]
distribute as n = transpose $ breakup n as
-- The probability that members of a pair of characters taken randomly
-- from a given string are equal.
coincidence :: (Ord a, Fractional b) => [a] -> b
coincidence str =
let charCounts = snd <$> toList (countEntries str)
strln = length str
d = fromIntegral $ strln * (strln - 1)
n = fromIntegral $ sum $ fmap (\cc -> cc * (cc-1)) charCounts
in n / d
-- Use the average probablity of coincidence for all the members of
-- a distribution to rate the distribution - the higher the better.
-- The correlation increases artificially for smaller
-- pieces/longer keys, so weigh against them a little
rate :: (Ord a, Fractional b) => [[a]] -> b
rate d = average (fmap coincidence d) - fromIntegral (length d) / 3000.0
-- Multiply elements of lists together and add up the results.
dot :: Num a => [a] -> [a] -> a
dot v0 v1 = sum $ zipWith (*) v0 v1
-- Given two lists of floats, rotate one of them by the number of
-- characters indicated by letter and then 'dot' them together.
rotateAndDot :: Num a => [a] -> [a] -> Char -> a
rotateAndDot v0 v1 letter = dot v0 (drop (ord letter - ord 'A') (cycle v1))
-- Find decoding offset that results in best match
-- between actual char frequencies and expected frequencies.
getKeyChar :: RealFrac a => [a] -> String -> Char
getKeyChar expected sample =
let charCounts = countEntries sample
countInSample c = findWithDefault 0 c charCounts
actual = fmap (fromIntegral . countInSample) ['A'..'Z']
in maximumBy (comparing $ rotateAndDot expected actual) ['A'..'Z']
main = do
let cr = filter (/=' ') crypt
-- Assume that if there are less than 20 characters
-- per column, the key's too long to guess
distributions = fmap (distribute cr) [1..length cr `div` 20]
bestDistribution = maximumBy (comparing rate) distributions
key = fmap (getKeyChar englishFrequencies) bestDistribution
alphaSum a b = ['A'..'Z'] !! ((ord b - ord a) `mod` 26)
mapM_ putStrLn ["Key: " ++ key, "Decrypted Text: " ++ zipWith alphaSum (cycle key) cr]
englishFrequencies =
[ 0.08167, 0.01492, 0.02782, 0.04253,
0.12702, 0.02228, 0.02015, 0.06094,
0.06966, 0.00153, 0.00772, 0.04025,
0.02406, 0.06749, 0.07507, 0.01929,
0.00095, 0.05987, 0.06327, 0.09056,
0.02758, 0.00978, 0.02360, 0.00150,
0.01974, 0.00074 ]
crypt = "\
\MOMUD EKAPV TQEFM OEVHP AJMII CDCTI FGYAG JSPXY ALUYM NSMYH\
\VUXJE LEPXJ FXGCM JHKDZ RYICU HYPUS PGIGM OIYHF WHTCQ KMLRD\
\ITLXZ LJFVQ GHOLW CUHLO MDSOE KTALU VYLNZ RFGBX PHVGA LWQIS\
\FGRPH JOOFW GUBYI LAPLA LCAFA AMKLG CETDW VOELJ IKGJB XPHVG\
\ALWQC SNWBU BYHCU HKOCE XJEYK BQKVY KIIEH GRLGH XEOLW AWFOJ\
\ILOVV RHPKD WIHKN ATUHN VRYAQ DIVHX FHRZV QWMWV LGSHN NLVZS\
\JLAKI FHXUF XJLXM TBLQV RXXHR FZXGV LRAJI EXPRV OSMNP KEPDT\
\LPRWM JAZPK LQUZA ALGZX GVLKL GJTUI ITDSU REZXJ ERXZS HMPST\
\MTEOE PAPJH SMFNB YVQUZ AALGA YDNMP AQOWT UHDBV TSMUE UIMVH\
\QGVRW AEFSP EMPVE PKXZY WLKJA GWALT VYYOB YIXOK IHPDS EVLEV\
\RVSGB JOGYW FHKBL GLXYA MVKIS KIEHY IMAPX UOISK PVAGN MZHPW\
\TTZPV XFCCD TUHJH WLAPF YULTB UXJLN SIJVV YOVDJ SOLXG TGRVO\
\SFRII CTMKO JFCQF KTINQ BWVHG TENLH HOGCS PSFPV GJOKM SIFPR\
\ZPAAS ATPTZ FTPPD PORRF TAXZP KALQA WMIUD BWNCT LEFKO ZQDLX\
\BUXJL ASIMR PNMBF ZCYLV WAPVF QRHZV ZGZEF KBYIO OFXYE VOWGB\
\BXVCB XBAWG LQKCM ICRRX MACUO IKHQU AJEGL OIJHH XPVZW JEWBA\
\FWAML ZZRXJ EKAHV FASMU LVVUT TGK\
\"
You may also check:How to resolve the algorithm Gray code step by step in the Tcl programming language
You may also check:How to resolve the algorithm The Name Game step by step in the Go programming language
You may also check:How to resolve the algorithm Animate a pendulum step by step in the Portugol programming language
You may also check:How to resolve the algorithm Stirling numbers of the first kind step by step in the Fōrmulæ programming language
You may also check:How to resolve the algorithm Short-circuit evaluation step by step in the R programming language