How to resolve the algorithm Anagrams step by step in the Haskell programming language

Published on 7 June 2024 03:52 AM

How to resolve the algorithm Anagrams step by step in the Haskell programming language

Table of Contents

Problem Statement

When two or more words are composed of the same characters, but in a different order, they are called anagrams. Using the word list at   http://wiki.puzzlers.org/pub/wordlists/unixdict.txt, find the sets of words that share the same characters that contain the most words in them.

Let's start with the solution:

Step by Step solution about How to resolve the algorithm Anagrams step by step in the Haskell programming language

This program counts the number of words that are anagrams of each other in a file.

In the first version, the program reads the file, sorts the words, and groups them by the sorted version of the word. It then finds the longest group and prints the words in that group.

In the second version, the program uses a more concise syntax. It groups the words by their sorted version and then filters out the groups that don't have the same length as the longest group. It then prints the words in each of the remaining groups.

Here is a breakdown of the second version of the program:

  • readFile "./unixdict.txt": Reads the file and returns the contents of the file as a string.
  • ws = groupBy (on (==) fst) (sort (((,) =<< pack . sort) <$> lines f)): Groups the words by their sorted version. The following functions are used in this expression:
    • lines f: Splits the string into a list of lines.
    • pack: Converts a string to a Text value.
    • sort: Sorts a list of values.
    • (,) =<< ...: Creates a tuple from the result of the expression.
    • groupBy (on (==) fst): Groups a list of values by the result of applying a function to the first element of each value.
  • filter ((length (maximumBy (comparing length) ws) ==) . length): Filters out the groups that don't have the same length as the longest group. The following functions are used in this expression:
    • length: Computes the length of a list.
    • maximumBy (comparing length) ws: Finds the group with the longest length.
    • comparing: Creates a comparison function.
  • mapM_ (print . fmap snd): Prints the words in each of the remaining groups. The following functions are used in this expression:
    • fmap snd: Gets the second element of each tuple in the list.
    • print: Prints a value to the console.
    • mapM_: Applies a function to each element in a list.

Explanation of the Haskell Source Code:

Task: Find and print the longest list of words that are anagrams of each other in a given English dictionary file.

Implementation 1:

  • Data.List: Imports the groupBy and sort functions for list manipulation.
  • groupon f x y: A helper function that checks if two elements x and y are equal when transformed by the function f.
  • main:
    • Reads the dictionary file and stores its lines in the variable f.
    • Creates a list of tuples (sortedWord, originalWord) for each word in the dictionary.
    • Groups the tuples based on the equality of their sorted words using groupBy.
    • Finds the maximum length of any group.
    • Filters the groups to include only those with the maximum length.
    • Prints the original words from each of the selected groups.

Implementation 2:

  • Data.List: Imports groupBy, maximumBy, sort, and comparing functions.
  • Data.Function: Imports the on function for function composition.
  • Data.Text: Imports the pack function for converting strings to Text.
  • main:
    • Reads the dictionary file and stores its lines in the variable ws.
    • Groups the dictionary lines based on their sorted and packed versions using groupBy.
    • Finds the maximum length of any group using maximumBy.
    • Filters the groups to include only those with the maximum length.
    • Prints the original words from each of the selected groups.

Working:

  1. Both implementations read the dictionary file and parse it into a list of lowercase words.
  2. They then group the words based on whether they are anagrams of each other using groupBy.
  3. They find the maximum length of any group, which represents the number of anagrams in the longest set.
  4. Finally, they print the longest sets of anagrams from the dictionary.

Output:

["abel","able","bale","bela","elba"]
["caret","carte","cater","crate","trace"]
["angel","angle","galen","glean","lange"]
["alger","glare","lager","large","regal"]
["elan","lane","lean","lena","neal"]
["evil","levi","live","veil","vile"]

Source code in the haskell programming language

import Data.List

groupon f x y = f x == f y

main = do
  f <- readFile "./../Puzzels/Rosetta/unixdict.txt"
  let  words = lines f
       wix = groupBy (groupon fst) . sort $ zip (map sort words) words
       mxl = maximum $ map length wix
  mapM_ (print . map snd) . filter ((==mxl).length) $ wix


*Main> main
["abel","able","bale","bela","elba"]
["caret","carte","cater","crate","trace"]
["angel","angle","galen","glean","lange"]
["alger","glare","lager","large","regal"]
["elan","lane","lean","lena","neal"]
["evil","levi","live","veil","vile"]


import Data.List (groupBy, maximumBy, sort)
import Data.Ord (comparing)
import Data.Function (on)
import Data.Text (pack)

main :: IO ()
main = do
  f <- readFile "./unixdict.txt"
  let ws = groupBy (on (==) fst) (sort (((,) =<< pack . sort) <$> lines f))
  mapM_
    (print . fmap snd)
    (filter ((length (maximumBy (comparing length) ws) ==) . length) ws)


  

You may also check:How to resolve the algorithm Random Latin squares step by step in the Phix programming language
You may also check:How to resolve the algorithm CUSIP step by step in the ALGOL W programming language
You may also check:How to resolve the algorithm Box the compass step by step in the AutoHotkey programming language
You may also check:How to resolve the algorithm Colorful numbers step by step in the Perl programming language
You may also check:How to resolve the algorithm Random Latin squares step by step in the Go programming language