How to resolve the algorithm Text processing/1 step by step in the Clojure programming language

Published on 12 May 2024 09:40 PM

Problem Statement

Data produced by one program is often in the wrong format for later use by another program or person. In these situations, a further program can be written to parse and transform the original data into a format useful to the other. The term "data munging" is often used in programming circles for this task.

A request on the comp.lang.awk newsgroup led to a typical data munging task. The data is free to download and use, but it is no longer available at the original link; a zipped offsite mirror exists. The task showed only a sample of the data to illustrate its format, and that sample is missing here. Judging from the solution that follows, each line holds a date followed by pairs of a reading and a flag, and a reading counts as valid only when its flag is positive.

Structure your program to show statistics for each line of the file (similar to the original Python, Perl, and AWK examples), followed by summary statistics for the file. When showing example output, show just a few line statistics and the full end summary.
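
To make the line format concrete, here is a minimal sketch that splits one line into its date and its value/flag pairs. The sample line is hypothetical (shortened to three pairs, with made-up values), and the names `sample-line` and `value-flag-pairs` are illustrative only.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample line, shortened to three value/flag pairs.
;; The values are made up for illustration and do not come from the real file.
(def sample-line "2004-01-01\t10.5\t1\t0.0\t-2\t12.5\t1")

(defn value-flag-pairs
  "Splits a line on whitespace and returns [date pairs], where pairs is a
  vector of [value flag] vectors."
  [line]
  (let [[date & toks] (str/split line #"\s+")]
    [date
     (mapv (fn [[v f]] [(Double/parseDouble v) (Long/parseLong f)])
           (partition 2 toks))]))

(value-flag-pairs sample-line)
;; => ["2004-01-01" [[10.5 1] [0.0 -2] [12.5 1]]]
```

In this sketch the second pair carries a negative flag, so a solution that treats only positive flags as valid would skip its reading.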

Let's start with the solution:

Step-by-step solution to the Text processing/1 task in the Clojure programming language

Source code in the Clojure programming language

(ns rosettacode.textprocessing1
  (:require [clojure.string :as str]))

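(defn parse-line
  "Parses one line into its date string and a seq of {:val :flag} maps,
  one per value/flag pair."
  [s]
  (let [[date & data-toks] (str/split s #"\s+")]
    {:date date
     :hour-vals (for [[v flag] (partition 2 data-toks)]
                  {:val  (Double/parseDouble v)
                   :flag (Long/parseLong flag)})}))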

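(defn analyze-line
  "Computes per-line statistics over the readings whose flag is positive.
  :gaps marks every reading on the line as a gap or not, in order."
  [m]
  (let [valid?  (fn [rec] (pos? (:flag rec)))
        data    (->> (filter valid? (:hour-vals m))
                     (map :val))
        n-vals  (count data)
        sum     (reduce + data)]
    {:date   (:date m)
     :n-vals n-vals
     :sum    (double sum)
     ;; report a mean of 0.0 for a line with no valid readings
     :avg    (if (zero? n-vals) 0.0 (/ sum n-vals))
     :gaps   (for [hr (:hour-vals m)]
               {:gap? (not (valid? hr)) :date (:date m)})}))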

(defn print-line [m]
  (println (format "%s: %d valid, sum: %7.3f, mean: %6.3f"
                   (:date   m)
                   (:n-vals m)
                   (:sum m)
                   (:avg m))))

(defn process-line [s]
  (let [m         (parse-line s)
        line-info (analyze-line m)]
    (print-line line-info)
    line-info))

(defn update-file-stats
  "Folds one line's statistics into the running totals for the file."
  [file-m line-m]
  (-> file-m
      (update :sum      + (:sum line-m))
      (update :n-vals   + (:n-vals line-m))
      (update :gap-recs into (:gaps line-m))))

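(defn process-file
  "Prints statistics for each line of the file at path, then a file summary."
  [path]
  (let [file-lines (str/split-lines (slurp path))
        summary (reduce (fn [res line]
                          (update-file-stats res (process-line line)))
                        ;; :sum starts as a double so that %f accepts it
                        ;; even when the file has no lines
                        {:sum 0.0
                         :n-vals 0
                         :gap-recs []}
                        file-lines)
        n-vals  (:n-vals summary)
        ;; longest run of consecutive invalid readings across the file
        max-gap (->> (partition-by :gap? (:gap-recs summary))
                     (filter #(:gap? (first %)))
                     (sort-by count >)
                     first)]
    (println (format "Sum: %f\n# Values: %d\nAvg: %f"
                     (:sum summary)
                     n-vals
                     ;; avoid dividing by zero when no reading is valid
                     (if (zero? n-vals) 0.0 (/ (:sum summary) n-vals))))
    (println (format "Max gap of %d recs started on %s"
                     (count max-gap)
                     (:date (first max-gap))))))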
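
The longest-gap search in process-file relies on partition-by to group consecutive readings that share the same validity. The sketch below isolates that idea on a bare sequence of flags. The flag values and the name `longest-gap` are hypothetical and chosen only for illustration.

```clojure
;; Hypothetical flag sequence spanning several readings; not real data.
(def flags [1 1 0 -2 1 0 0 0 1])

(defn longest-gap
  "Returns the length of the longest run of non-positive flags."
  [flags]
  (->> flags
       (partition-by pos?)          ; group consecutive valid / invalid flags
       (remove #(pos? (first %)))   ; keep only the invalid runs
       (map count)
       (reduce max 0)))             ; 0 when there is no invalid run

(longest-gap flags)
;; => 3
```

Here partition-by yields the runs (1 1) (0 -2) (1) (0 0 0) (1), so the invalid runs have lengths 2 and 3. The solution above applies the same grouping to maps that also carry the date, which is how it can report where the longest gap started.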

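The solution reads the whole file into memory with slurp. For a larger input, one possible alternative is to reduce over the lines lazily with line-seq inside with-open. The following is a rough sketch under that assumption, not part of the original solution; it computes only the file totals, and the names `valid-values` and `stream-totals` are hypothetical.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn valid-values
  "Returns the readings on one line whose flag is positive."
  [line]
  (let [[_date & toks] (str/split line #"\s+")]
    (for [[v flag] (partition 2 toks)
          :when (pos? (Long/parseLong flag))]
      (Double/parseDouble v))))

(defn stream-totals
  "Reduces over the file line by line and returns {:sum s :n-vals n}."
  [path]
  (with-open [rdr (io/reader path)]
    ;; reduce is eager, so every line is consumed before the reader closes
    (reduce (fn [acc line]
              (let [xs (valid-values line)]
                (-> acc
                    (update :sum + (reduce + 0.0 xs))
                    (update :n-vals + (count xs)))))
            {:sum 0.0 :n-vals 0}
            (line-seq rdr))))
```

Because the reduction finishes inside with-open, no lazy sequence escapes after the reader is closed, and only one line needs to be held in memory at a time.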

  
