How to resolve the algorithm Text processing/1 step by step in the Haskell programming language

Published on 7 June 2024 03:52 AM

Problem Statement

Often data is produced by one program in the wrong format for later use by another program or person. In these situations another program can be written to parse and transform the original data into a format useful to the other. The term "data munging" is often used in programming circles for this task.

A request on the comp.lang.awk newsgroup led to a typical data munging task. The data is free to download and use, although the original download link no longer works and the full example file is only available from a zipped offsite mirror. As the solution below reads it, each line of the file holds a date followed by value/flag pairs, and a reading counts as accepted when its flag is greater than 0.

Structure your program to show statistics for each line of the file (similar to the original Python, Perl, and AWK examples), followed by summary statistics for the file. When showing example output, show just a few line statistics and the full end summary.

Let's start with the solution:

Step-by-step solution to the Text processing/1 task in the Haskell programming language

This Haskell program reads data from a file, parses each line into a tuple of a date and a list of (value, flag) pairs, and then computes per-line and overall statistics together with the longest run of consecutive rejected readings (readings whose flag is False).

Here's a detailed explanation of the code:

  1. Importing Modules: The program imports several Haskell modules for various functionalities:

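    • Data.List: For the list functions group, groupBy, sortBy, and unfoldr.
    • Numeric: For readFloat, which parses the values.
    • Control.Arrow: For the combinators &&&, ***, first, and second, used here on plain functions and pairs.
    • Control.Monad: For monadic helpers; the only one the program uses, mapM_, is also exported by the Prelude.
    • Text.Printf: For formatted printing with printf.
    • System.Environment: For reading the command-line arguments with getArgs.
    • Data.Function: For on, which is used to compare runs by their length.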
  2. Type Declarations:

    • Date: The type representing a date, defined as a string.
    • Value: The type representing a value, defined as a double-precision floating-point number.
    • Flag: The type representing a flag, defined as a boolean.
  3. Auxiliary Functions:

    • readFlg: Converts a string to a Boolean flag: True if the integer in the string is greater than 0, and False otherwise.
    • readNum: Parses a string as an unsigned decimal number using readFloat.
    • take2: Splits a list into consecutive chunks of two elements using unfoldr with splitAt 2; takeWhile (not . null) stops the otherwise endless unfold once the input is exhausted.
  4. parseData Function: Turns the whitespace-separated fields of one line into a tuple containing a date and a list of (value, flag) pairs. The date is the first field. The remaining fields are chunked with take2, and in each chunk the first element is read with readNum and the last with readFlg.

  5. sumAccs Function: Processes a tuple of a date and a list of (value, flag) pairs. It keeps the date, computes the sum and the count of the values whose flag is True, and returns them together with the line's flags as (Date, ((sum, count), flags)). The average is not computed here; main derives it from the sum and count when printing.

  6. maxNAseq Function: Finds the longest run of consecutive False flags in a list of flags. It groups the flags into runs of equal values, records the length and starting index of each run, keeps only the False runs, sorts them in descending order, and returns every run that shares the maximum length as a (length, start index) pair.

  7. main Function:

    • Reads the command-line arguments and takes the first one as the file name.
    • Reads the file, splits it into lines, and splits each line into words.
    • Parses each line with the parseData function.
    • Applies the sumAccs function to each line to get its sum and count of accepted values.
    • Combines the per-line results into an overall total and count, and applies the maxNAseq function to the concatenated flags of the whole file.
    • Prints four sample lines (after skipping the first 2200), the overall summary, and the longest run of consecutive false readings with the dates of the lines on which it starts and ends.
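The parsing steps above can be sketched in a more explicit style. This is a minimal sketch, not the program's own code: the names pairs, parsePair, and parseLine are hypothetical, and unlike the original it returns Nothing for a malformed line instead of failing at run time. Like readNum, it relies on readFloat and so assumes unsigned values.

```haskell
import Numeric (readFloat)

type Date  = String
type Value = Double
type Flag  = Bool

-- Split a list into consecutive chunks of two elements (the role of take2).
pairs :: [a] -> [[a]]
pairs [] = []
pairs xs = let (chunk, rest) = splitAt 2 xs in chunk : pairs rest

-- Read one value/flag chunk; a flag is True when its integer is positive.
-- Assumption: anything that does not parse completely is rejected.
parsePair :: [String] -> Maybe (Value, Flag)
parsePair [v, f] = case (readFloat v, reads f :: [(Int, String)]) of
  ((x, "") : _, (n, "") : _) -> Just (x, n > 0)
  _                          -> Nothing
parsePair _ = Nothing

-- Parse one whitespace-separated record: a date, then value/flag pairs.
parseLine :: String -> Maybe (Date, [(Value, Flag)])
parseLine line = case words line of
  []              -> Nothing
  (date : fields) -> (,) date <$> mapM parsePair (pairs fields)
```

For a made-up record such as "2000-01-01 1.5 1 0.0 -2", parseLine returns Just ("2000-01-01",[(1.5,True),(0.0,False)]).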

The program takes a file as input, where each line is a record containing a date followed by value/flag pairs (24 per line, as the `div` 24 in main assumes). It parses the data, calculates per-line and overall statistics over the accepted readings, and identifies the longest run of consecutive false readings. Finally, it prints the results.
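The per-line statistics can be illustrated with a small sketch. The names lineStats and lineMean are hypothetical and do not appear in the program. The Maybe result is an added assumption about how to treat a line with no accepted readings; the original simply divides the sum by the count.

```haskell
-- Sum and count of the accepted values on one line (flag is True).
lineStats :: [(Double, Bool)] -> (Double, Int)
lineStats readings = (sum accepted, length accepted)
  where
    accepted = [v | (v, ok) <- readings, ok]

-- Mean of the accepted values, or Nothing when no reading was accepted.
lineMean :: [(Double, Bool)] -> Maybe Double
lineMean readings = case lineStats readings of
  (_, 0)     -> Nothing
  (total, n) -> Just (total / fromIntegral n)
```

With the readings [(1.5,True),(2.5,True),(9.0,False)], lineStats gives (4.0,2) and lineMean gives Just 2.0; the rejected 9.0 does not contribute.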

Source code in the Haskell programming language

import Data.List
import Numeric
import Control.Arrow
import Control.Monad
import Text.Printf
import System.Environment
import Data.Function

type Date = String
type Value = Double
type Flag = Bool

-- A flag marks an accepted reading when its integer value is positive.
readFlg :: String -> Flag
readFlg = (> 0).read

-- Parse an unsigned decimal such as "10.000".
readNum :: String -> Value
readNum = fst.head.readFloat

-- Split a list into consecutive chunks of two elements.
take2 :: [a] -> [[a]]
take2 = takeWhile(not.null).unfoldr (Just.splitAt 2)

-- The first field is the date; the rest are (value, flag) pairs.
parseData :: [String] -> (Date,[(Value,Flag)])
parseData = head &&& map(readNum.head &&& readFlg.last).take2.tail

-- Per line: (sum, count) of the accepted values, plus the line's flags.
sumAccs :: (Date,[(Value,Flag)]) -> (Date, ((Value,Int),[Flag]))
sumAccs = second (((sum &&& length).concat.uncurry(zipWith(\v f -> [v|f])) &&& snd).unzip)

-- Longest run(s) of False flags as (run length, start index).
maxNAseq :: [Flag] -> [(Int,Int)]
maxNAseq = head.groupBy((==) `on` fst).sortBy(flip compare)
           . concat.uncurry(zipWith(\i (r,b)->[(r,i)|not b]))
           . first(init.scanl(+)0). unzip
           . map ((fst &&& id).(length &&& head)). group

main = do
    file:_ <- getArgs
    f <- readFile file
    let dat :: [(Date,((Value,Int),[Flag]))]
        dat      = map (sumAccs. parseData. words).lines $ f
        -- ((grand total, accepted count), longest false run(s) over all flags)
        summ     = ((sum *** sum). unzip *** maxNAseq.concat). unzip $ map snd dat
        totalFmt = "\nSummary\t\t accept: %d\t total: %.3f \taverage: %6.3f\n\n"
        lineFmt  = "%8s\t accept: %2d\t total: %11.3f \taverage: %6.3f\n"
        maxFmt   =  "Maximum of %d consecutive false readings, starting on line /%s/ and ending on line /%s/\n"
-- output statistics
    putStrLn "\nSome lines:\n"
    mapM_ (\(d,((v,n),_)) -> printf lineFmt d n v (v/fromIntegral n)) $ take 4 $ drop 2200 dat
    (\(t,n) -> printf totalFmt  n t (t/fromIntegral n)) $ fst summ
    -- 24 readings per line, so a flag index `div` 24 is a line index;
    -- the run of length a starting at index b ends at index a+b-1
    mapM_ ((\(l, d1,d2) -> printf maxFmt l d1 d2)
              . (\(a,b)-> (a,(fst.(dat!!).(`div`24))b,(fst.(dat!!).(`div`24))(a+b-1)))) $ snd summ


*Main> :main ["./RC/readings.txt"]
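The run detection done by maxNAseq in the listing above can also be written without arrow combinators. The following is a minimal sketch under that assumption; longestFalseRuns is a hypothetical name, and unlike maxNAseq it returns an empty list when there are no False flags and reports tied runs in input order rather than sorted.

```haskell
import Data.List (group)

-- (length, start index) of every longest run of False flags.
longestFalseRuns :: [Bool] -> [(Int, Int)]
longestFalseRuns flags
  | null falseRuns = []
  | otherwise      = filter ((== best) . fst) falseRuns
  where
    runs      = group flags              -- runs of equal flags
    lens      = map length runs
    starts    = scanl (+) 0 lens         -- start index of each run
    -- keep only the runs whose elements are False
    falseRuns = [(n, s) | (False : _, n, s) <- zip3 runs lens starts]
    best      = maximum (map fst falseRuns)
```

For [True,False,False,True,False,False,True] the result is [(2,1),(2,4)]. With 24 readings per line, as main assumes, a run (len, start) begins on line start `div` 24 and ends on line (start + len - 1) `div` 24.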


  
