How to resolve the algorithm Text processing/1 step by step in the Haskell programming language
Problem Statement
Often data is produced by one program in the wrong format for later use by another program or person. In these situations another program can be written to parse and transform the original data into a format useful to the other. The term "Data Munging" is often used in programming circles for this task.
A request on the comp.lang.awk newsgroup led to a typical data munging task. The data is free to download and use, but it is no longer available at the original link; the task page points to a zipped offsite mirror and to the full example file, and shows only a sample of the data to illustrate its format.
Structure your program to show statistics for each line of the file (similar to the original Python, Perl, and AWK examples), followed by summary statistics for the file. When showing example output, just show a few line statistics and the full end summary.
Let's start with the solution:
Step-by-step solution to Text processing/1 in the Haskell programming language
This Haskell program reads data from a file, parses each line into a tuple of a date and a list of (value, flag) pairs, and then computes per-line statistics, an overall summary, and the longest run of consecutive false readings.
Here's a detailed explanation of the code:
- Importing Modules: The program imports several Haskell modules:
  - Data.List: list manipulation functions (unfoldr, group, groupBy, sortBy).
  - Numeric: numeric parsing (readFloat).
  - Control.Arrow: the combinators &&&, ***, first, and second, used here on plain functions to build and transform pairs.
  - Control.Monad: monad operations.
  - Text.Printf: formatted printing (printf).
  - System.Environment: access to command-line arguments (getArgs).
  - Data.Function: function utilities (on).
- Type Declarations:
  - Date: the type representing a date, defined as a String.
  - Value: the type representing a reading, defined as a Double.
  - Flag: the type representing a flag, defined as a Bool.
- Auxiliary Functions:
  - readFlg: converts a string to a Boolean flag, returning True if the numeric value of the string is greater than 0, and False otherwise.
  - readNum: converts a string to a numeric value using readFloat.
  - take2: splits a list into consecutive chunks of two elements, using unfoldr with splitAt 2 and stopping at the first empty chunk.
- parseData Function: Parses the list of words from one line into a tuple containing a date and a list of value-flag pairs. The date is the first string in the list, and the value-flag pairs are extracted from the remaining strings using the take2 function.
- sumAccs Function: Processes a tuple of a date and a list of value-flag pairs, leaving the date unchanged. It sums the values whose flag is True and counts how many such values there are, then returns the date together with ((sum, count), flags), where flags is the full list of flags for the line. The average is not stored; main computes it as sum divided by count when printing.
- maxNAseq Function: Finds the longest run(s) of consecutive False flags. It groups the flags into runs of equal values, records each run's length and starting index, keeps only the runs of False, sorts them by length in descending order, and returns every run tied for the maximum length as a (length, start index) pair.
- main Function:
  - Reads the command-line arguments and extracts the file name.
  - Reads the file and splits each line into a list of words.
  - Parses each line using the parseData function.
  - Processes each parsed line using the sumAccs function to compute per-line statistics.
  - Combines the per-line results into an overall total and count, and finds the longest run of consecutive false readings using the maxNAseq function on the concatenated flags.
  - Prints four sample line statistics (starting at line index 2200), the overall summary, and the longest run of consecutive false readings with the dates on which it starts and ends.
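The parsing and per-line steps described above can be sketched in a more explicit style. This is a minimal illustration rather than the original code: the primed names (pairs', parseLine', accepted') are hypothetical, the sample record in the usage note is invented and shortened, and unlike take2 this version silently drops a trailing field that has no partner.

```haskell
import Numeric (readFloat)

type Date  = String
type Value = Double
type Flag  = Bool

-- Split a list into consecutive pairs.
-- Assumption: a trailing unpaired element is dropped.
pairs' :: [a] -> [(a, a)]
pairs' (x:y:rest) = (x, y) : pairs' rest
pairs' _          = []

-- Parse one whitespace-separated record: a date, then value/flag pairs.
parseLine' :: String -> (Date, [(Value, Flag)])
parseLine' line = case words line of
  []            -> ("", [])
  (date:fields) -> (date, [ (readValue v, readFlag f) | (v, f) <- pairs' fields ])
  where
    -- Same idea as readNum: take the first parse offered by readFloat.
    readValue = fst . head . readFloat
    -- Same idea as readFlg: a flag is accepted when it is positive.
    readFlag s = (read s :: Int) > 0

-- Sum and count of the readings whose flag is True.
accepted' :: [(Value, Flag)] -> (Value, Int)
accepted' readings = (sum good, length good)
  where
    good = [ v | (v, True) <- readings ]
```

Loaded in GHCi, parseLine' "1991-03-30 10.000 1 20.500 -2 30.000 1" yields the date with three readings, the second of which is rejected, and accepted' applied to those readings gives the sum and count that sumAccs stores for the line.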
The program takes a file as input, where each line is a record containing a date followed by a series of value/flag pairs (main assumes 24 pairs per line when it maps a reading's position back to its date). It parses the data, calculates per-line and overall statistics, and identifies the longest run of consecutive false readings. Finally, it prints the results.
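The search for the longest run of false readings can also be written without point-free composition. The following is a sketch under stated assumptions, not the program's own definition: falseRuns' and longestFalseRuns' are hypothetical names, ties are returned in ascending order of start index (the original sorts in descending order), and an input with no False flags gives an empty list instead of failing.

```haskell
import Data.List (group)

-- Every run of False flags as (run length, start index),
-- where the index counts flags from 0.
falseRuns' :: [Bool] -> [(Int, Int)]
falseRuns' flags =
  [ (len, start) | (start, len, False) <- zip3 starts lengths values ]
  where
    runs    = group flags           -- runs of equal flags
    lengths = map length runs
    values  = map head runs         -- group never yields an empty run
    starts  = scanl (+) 0 lengths   -- running offsets; the extra last one is ignored by zip3

-- Keep only the runs tied for the greatest length.
longestFalseRuns' :: [Bool] -> [(Int, Int)]
longestFalseRuns' flags = [ r | r@(len, _) <- runs, len == best ]
  where
    runs = falseRuns' flags
    best = maximum (map fst runs)   -- only evaluated when runs is non-empty
```

With 24 readings per line, dividing a start index by 24 gives the index of the line on which the run begins, which is how main recovers the dates it prints.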
Source code in the Haskell programming language
import Data.List
import Numeric
import Control.Arrow
import Control.Monad
import Text.Printf
import System.Environment
import Data.Function

type Date = String
type Value = Double
type Flag = Bool

-- A flag is accepted (True) when its numeric value is positive.
readFlg :: String -> Flag
readFlg = (> 0).read

-- Parse an unsigned decimal such as "10.000".
readNum :: String -> Value
readNum = fst.head.readFloat

-- Split a list into consecutive chunks of two elements.
take2 = takeWhile(not.null).unfoldr (Just.splitAt 2)

-- The first word is the date; the remaining words are value/flag pairs.
parseData :: [String] -> (Date,[(Value,Flag)])
parseData = head &&& map(readNum.head &&& readFlg.last).take2.tail

-- Keep the date; pair (sum, count) of the accepted values with the line's flags.
sumAccs :: (Date,[(Value,Flag)]) -> (Date, ((Value,Int),[Flag]))
sumAccs = second (((sum &&& length).concat.uncurry(zipWith(\v f -> [v|f])) &&& snd).unzip)

-- Longest run(s) of False flags as (run length, start index) pairs.
maxNAseq :: [Flag] -> [(Int,Int)]
maxNAseq = head.groupBy((==) `on` fst).sortBy(flip compare)
         . concat.uncurry(zipWith(\i (r,b)->[(r,i)|not b]))
         . first(init.scanl(+)0). unzip
         . map ((fst &&& id).(length &&& head)). group

main = do
    file:_ <- getArgs
    f <- readFile file
    let dat :: [(Date,((Value,Int),[Flag]))]
        dat = map (sumAccs. parseData. words).lines $ f
        summ = ((sum *** sum). unzip *** maxNAseq.concat). unzip $ map snd dat
        totalFmt = "\nSummary\t\t accept: %d\t total: %.3f \taverage: %6.3f\n\n"
        lineFmt = "%8s\t accept: %2d\t total: %11.3f \taverage: %6.3f\n"
        maxFmt = "Maximum of %d consecutive false readings, starting on line /%s/ and ending on line /%s/\n"
    -- output statistics
    putStrLn "\nSome lines:\n"
    mapM_ (\(d,((v,n),_)) -> printf lineFmt d n v (v/fromIntegral n)) $ take 4 $ drop 2200 dat
    (\(t,n) -> printf totalFmt n t (t/fromIntegral n)) $ fst summ
    -- a flag index divided by 24 (readings per line) gives the index of its line
    mapM_ ((\(l, d1,d2) -> printf maxFmt l d1 d2)
              . (\(a,b)-> (a,(fst.(dat!!).(`div`24))b,(fst.(dat!!).(`div`24))(a+b)))) $ snd summ
*Main> :main ["./RC/readings.txt"]
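The definitions above lean on the Control.Arrow combinators applied to ordinary functions, which can be hard to read at first. The small helpers below are hypothetical and exist only to show what each combinator does to a pair; they are not part of the program.

```haskell
import Control.Arrow ((&&&), (***), first, second)

-- (f &&& g) x == (f x, g x): feed one input to two functions.
sumAndCount :: [Double] -> (Double, Int)
sumAndCount = sum &&& length

-- (f *** g) (x, y) == (f x, g y): one function per pair component.
totals :: ([Double], [Int]) -> (Double, Int)
totals = sum *** sum

-- first f (x, y) == (f x, y): transform only the first component.
-- Here run lengths become the start offsets of each run.
startOffsets :: ([Int], [a]) -> ([Int], [a])
startOffsets = first (init . scanl (+) 0)

-- second g (x, y) == (x, g y): transform only the second component.
keepDate :: (String, [Double]) -> (String, Double)
keepDate = second sum
```

Read this way, sumAccs uses second to leave the date alone, maxNAseq uses first to turn run lengths into start indices, and main uses *** to total the sums and counts separately.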