How to resolve the algorithm Anagrams step by step in the Haskell programming language
How to resolve the algorithm Anagrams step by step in the Haskell programming language
Table of Contents
Problem Statement
When two or more words are composed of the same characters, but in a different order, they are called anagrams. Using the word list at http://wiki.puzzlers.org/pub/wordlists/unixdict.txt, find the sets of words that share the same characters that contain the most words in them.
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Anagrams step by step in the Haskell programming language
This program counts the number of words that are anagrams of each other in a file.
In the first version, the program reads the file, sorts the words, and groups them by the sorted version of the word. It then finds the longest group and prints the words in that group.
In the second version, the program uses a more concise syntax. It groups the words by their sorted version and then filters out the groups that don't have the same length as the longest group. It then prints the words in each of the remaining groups.
Here is a breakdown of the second version of the program:
readFile "./unixdict.txt"
: Reads the file and returns the contents of the file as a string.ws = groupBy (on (==) fst) (sort (((,) =<< pack . sort) <$> lines f))
: Groups the words by their sorted version. The following functions are used in this expression:lines f
: Splits the string into a list of lines.pack
: Converts a string to a Text value.sort
: Sorts a list of values.(,) =<< ...
: Creates a tuple from the result of the expression.groupBy (on (==) fst)
: Groups a list of values by the result of applying a function to the first element of each value.
filter ((length (maximumBy (comparing length) ws) ==) . length)
: Filters out the groups that don't have the same length as the longest group. The following functions are used in this expression:length
: Computes the length of a list.maximumBy (comparing length) ws
: Finds the group with the longest length.comparing
: Creates a comparison function.
mapM_ (print . fmap snd)
: Prints the words in each of the remaining groups. The following functions are used in this expression:fmap snd
: Gets the second element of each tuple in the list.print
: Prints a value to the console.mapM_
: Applies a function to each element in a list.
Explanation of the Haskell Source Code:
Task: Find and print the longest list of words that are anagrams of each other in a given English dictionary file.
Implementation 1:
- Data.List: Imports the
groupBy
andsort
functions for list manipulation. - groupon f x y: A helper function that checks if two elements
x
andy
are equal when transformed by the functionf
. - main:
- Reads the dictionary file and stores its lines in the variable
f
. - Creates a list of tuples
(sortedWord, originalWord)
for each word in the dictionary. - Groups the tuples based on the equality of their sorted words using
groupBy
. - Finds the maximum length of any group.
- Filters the groups to include only those with the maximum length.
- Prints the original words from each of the selected groups.
- Reads the dictionary file and stores its lines in the variable
Implementation 2:
- Data.List: Imports
groupBy
,maximumBy
,sort
, andcomparing
functions. - Data.Function: Imports the
on
function for function composition. - Data.Text: Imports the
pack
function for converting strings to Text. - main:
- Reads the dictionary file and stores its lines in the variable
ws
. - Groups the dictionary lines based on their sorted and packed versions using
groupBy
. - Finds the maximum length of any group using
maximumBy
. - Filters the groups to include only those with the maximum length.
- Prints the original words from each of the selected groups.
- Reads the dictionary file and stores its lines in the variable
Working:
- Both implementations read the dictionary file and parse it into a list of lowercase words.
- They then group the words based on whether they are anagrams of each other using
groupBy
. - They find the maximum length of any group, which represents the number of anagrams in the longest set.
- Finally, they print the longest sets of anagrams from the dictionary.
Output:
["abel","able","bale","bela","elba"]
["caret","carte","cater","crate","trace"]
["angel","angle","galen","glean","lange"]
["alger","glare","lager","large","regal"]
["elan","lane","lean","lena","neal"]
["evil","levi","live","veil","vile"]
Source code in the haskell programming language
import Data.List
groupon f x y = f x == f y
main = do
f <- readFile "./../Puzzels/Rosetta/unixdict.txt"
let words = lines f
wix = groupBy (groupon fst) . sort $ zip (map sort words) words
mxl = maximum $ map length wix
mapM_ (print . map snd) . filter ((==mxl).length) $ wix
*Main> main
["abel","able","bale","bela","elba"]
["caret","carte","cater","crate","trace"]
["angel","angle","galen","glean","lange"]
["alger","glare","lager","large","regal"]
["elan","lane","lean","lena","neal"]
["evil","levi","live","veil","vile"]
import Data.List (groupBy, maximumBy, sort)
import Data.Ord (comparing)
import Data.Function (on)
import Data.Text (pack)
main :: IO ()
main = do
f <- readFile "./unixdict.txt"
let ws = groupBy (on (==) fst) (sort (((,) =<< pack . sort) <$> lines f))
mapM_
(print . fmap snd)
(filter ((length (maximumBy (comparing length) ws) ==) . length) ws)
You may also check:How to resolve the algorithm Random Latin squares step by step in the Phix programming language
You may also check:How to resolve the algorithm CUSIP step by step in the ALGOL W programming language
You may also check:How to resolve the algorithm Box the compass step by step in the AutoHotkey programming language
You may also check:How to resolve the algorithm Colorful numbers step by step in the Perl programming language
You may also check:How to resolve the algorithm Random Latin squares step by step in the Go programming language