How to resolve the algorithm Text processing/2 step by step in the C# programming language

Published on 12 May 2024 09:40 PM

Problem Statement

The following task concerns data from a pollution monitoring station with twenty-four instruments monitoring twenty-four aspects of air pollution. Periodically a record is added to the file. Each record is a line of 49 fields separated by white-space, which can be one or more space or tab characters. From the left, the fields are a datestamp followed by twenty-four repetitions of a floating-point instrument value and that instrument's associated integer flag. Flag values are >= 1 if the instrument is working and < 1 if there is some problem with it, in which case that instrument's value should be ignored. The full data file, readings.txt, is also used in the Text processing/1 task.

Let's start with the solution:

Step-by-step solution to Text processing/2 in the C# programming language

Overview: This C# program reads a text file of records, checks each record's field count, datestamp format, instrument values and flags, and then reports how many records pass every check and which datestamps occur on more than one line.

Code Explanation

Regular Expressions:

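  • multiWhite: Matches any run of one or more whitespace characters; the program replaces each run with a single space.
  • dateEx: Matches a datestamp of the form YYYY-MM-DD (four digits, two digits, two digits; the values themselves are not range-checked).
  • valEx: Matches an unsigned number with one or more digits before the decimal point and exactly three digits after it, such as 10.000.
  • flagEx: Matches a single digit from 1 to 9, so a flag below 1 (0 or a negative number) fails the check.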
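To see what these patterns accept and reject, the following minimal sketch wraps them in small helper methods. The class name RegexDemo and its method names are hypothetical and exist only for illustration; the four patterns are the ones the program uses.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; only the four patterns come from the program.
static class RegexDemo
{
    static readonly Regex MultiWhite = new Regex(@"\s+");
    static readonly Regex DateEx = new Regex(@"^\d{4}-\d{2}-\d{2}$");
    static readonly Regex ValEx = new Regex(@"^\d+\.{1}\d{3}$");
    static readonly Regex FlagEx = new Regex(@"^[1-9]{1}$");

    // Collapses every run of spaces or tabs into a single space.
    public static string Normalize(string line)
    {
        return MultiWhite.Replace(line, " ");
    }

    // True for strings shaped like YYYY-MM-DD.
    public static bool IsDate(string s)
    {
        return DateEx.IsMatch(s);
    }

    // True for unsigned numbers with exactly three decimals.
    public static bool IsValue(string s)
    {
        return ValEx.IsMatch(s);
    }

    // True only for a single digit from 1 to 9.
    public static bool IsGoodFlag(string s)
    {
        return FlagEx.IsMatch(s);
    }
}
```

For example, RegexDemo.IsValue("10.000") is true, while RegexDemo.IsValue("10.0") and RegexDemo.IsValue("-1.000") are both false, because the pattern requires exactly three decimals and allows no sign. Likewise RegexDemo.IsGoodFlag("0") and RegexDemo.IsGoodFlag("-2") are false.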

Variables:

  • missformcount: Counts the records that fail a check, including records that are well formed but contain a flag below 1.
  • totalcount: Keeps track of the total number of records processed.
  • dates: A dictionary that maps each line number to the datestamp found on that line.

File Processing:

  • The program reads the "readings.txt" file line by line using a StreamReader.
  • Each line is processed by collapsing every run of whitespace into a single space and then splitting the result on spaces.
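The splitting step can be sketched in isolation as follows. The class FieldSplitter and both method names are hypothetical. SplitLikeProgram mirrors the collapse-then-split approach described above, while SplitOnWhitespace is an alternative the program does not use, shown only for comparison.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class illustrating the splitting step.
static class FieldSplitter
{
    static readonly Regex MultiWhite = new Regex(@"\s+");

    // The program's approach: collapse whitespace runs, then split on single spaces.
    public static string[] SplitLikeProgram(string line)
    {
        return MultiWhite.Replace(line, " ").Split(' ');
    }

    // Alternative (not what the program does): a null separator array makes
    // Split treat any white-space character as a delimiter, and
    // RemoveEmptyEntries drops the empty strings that leading or trailing
    // whitespace would otherwise produce.
    public static string[] SplitOnWhitespace(string line)
    {
        return line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

One difference worth knowing: a line that ends in a space or tab gives SplitLikeProgram an extra empty field at the end, so such a line would fail the 49-field check, whereas SplitOnWhitespace ignores the trailing whitespace.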

Record Validation:

  • The program first checks whether the record has the expected number (49) of fields. Records with any other number of fields are counted as malformed.
  • It then validates the date field using the dateEx regex. If the date format is incorrect, the record is counted as malformed; otherwise the date is stored in dates under its line number.
  • For each non-date field, the program alternates between checking for a valid value (using valEx, at odd positions) and a valid flag (using flagEx, at even positions). Any field that fails its check causes the record to be counted as malformed.
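The three checks above can also be packaged as a single predicate, which guarantees that each record is counted at most once. This is a minimal sketch rather than the program's own structure: the class RecordValidator, the method IsValid and the test-data helper MakeRecord are hypothetical names, and unlike the program this version stops at the first failed check.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the patterns and the field count come from the program.
static class RecordValidator
{
    const int ExpectedFields = 49; // 1 datestamp + 24 (value, flag) pairs

    static readonly Regex DateEx = new Regex(@"^\d{4}-\d{2}-\d{2}$");
    static readonly Regex ValEx = new Regex(@"^\d+\.{1}\d{3}$");
    static readonly Regex FlagEx = new Regex(@"^[1-9]{1}$");

    // Returns true only if the field count, the datestamp and every
    // value/flag pair pass their checks.
    public static bool IsValid(string[] fields)
    {
        if (fields.Length != ExpectedFields) return false;
        if (!DateEx.IsMatch(fields[0])) return false;
        for (int i = 1; i < fields.Length; i++)
        {
            // Odd positions hold values, even positions hold flags.
            Regex expected = (i % 2 != 0) ? ValEx : FlagEx;
            if (!expected.IsMatch(fields[i])) return false;
        }
        return true;
    }

    // Builds a synthetic record (made-up data, not taken from readings.txt)
    // that repeats the same value and flag in all 24 pairs.
    public static string[] MakeRecord(string date, string value, string flag)
    {
        var fields = new string[ExpectedFields];
        fields[0] = date;
        for (int i = 1; i < ExpectedFields; i += 2)
        {
            fields[i] = value;
            fields[i + 1] = flag;
        }
        return fields;
    }
}
```

With this helper, RecordValidator.IsValid(RecordValidator.MakeRecord("1990-01-01", "10.000", "1")) is true, and changing the flag argument to "0" makes it false.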

Date Duplication:

  • After processing all records, the program inverts the dates dictionary into dateReverse, whose keys are the dates and whose values are lists of the line numbers on which each date occurs.
  • It then iterates over dateReverse and reports any date that appears on more than one line.
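The inversion step can be sketched on its own as below. The class DuplicateDates and the method FindDuplicates are hypothetical names, and the sample datestamps in the usage note are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class illustrating the duplicate-date lookup.
static class DuplicateDates
{
    // Inverts a (line number -> date) map into (date -> line numbers)
    // and returns only the dates that occur on more than one line.
    public static Dictionary<string, List<int>> FindDuplicates(Dictionary<int, string> dates)
    {
        var byDate = new Dictionary<string, List<int>>();
        foreach (KeyValuePair<int, string> kvp in dates)
        {
            List<int> lines;
            if (!byDate.TryGetValue(kvp.Value, out lines))
            {
                lines = new List<int>();
                byDate[kvp.Value] = lines;
            }
            lines.Add(kvp.Key);
        }

        var duplicates = new Dictionary<string, List<int>>();
        foreach (KeyValuePair<string, List<int>> kvp in byDate)
        {
            if (kvp.Value.Count > 1)
                duplicates[kvp.Key] = kvp.Value;
        }
        return duplicates;
    }
}
```

Given a map in which lines 1 and 3 both carry "1990-03-25" and line 2 carries "1990-03-26", FindDuplicates returns a single entry for "1990-03-25" whose list holds the line numbers 1 and 3. Dictionary does not guarantee an enumeration order, so sort the line numbers before printing if the order matters.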

Output:

  • The program prints the number of valid records out of the total number of records processed.
  • It also lists each duplicated date along with the line numbers on which it occurs.

Source code in the C# programming language

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.IO;

namespace TextProc2
{
    class Program
    {
        static void Main(string[] args)
        {
            Regex multiWhite = new Regex(@"\s+");
            Regex dateEx = new Regex(@"^\d{4}-\d{2}-\d{2}$");
            Regex valEx = new Regex(@"^\d+\.{1}\d{3}$");
            Regex flagEx = new Regex(@"^[1-9]{1}$");
            
            int missformcount = 0, totalcount = 0;
            Dictionary<int, string> dates = new Dictionary<int, string>();

            using (StreamReader sr = new StreamReader("readings.txt"))
            {
                string line = sr.ReadLine();
                while (line != null)
                {
                    // Collapse each run of whitespace into one space, then split into fields.
                    line = multiWhite.Replace(line, @" ");
                    string[] splitLine = line.Split(' ');
                    // Track failures with a single flag so a record is counted at most once.
                    bool malformed = splitLine.Length != 49;
                    if (!dateEx.IsMatch(splitLine[0]))
                        malformed = true;
                    else
                        dates.Add(totalcount + 1, splitLine[0]);
                    for (int i = 1; i < splitLine.Length; i++)
                    {
                        // Odd positions hold values, even positions hold flags.
                        Regex expected = (i % 2 != 0) ? valEx : flagEx;
                        if (!expected.IsMatch(splitLine[i]))
                            malformed = true;
                    }
                    if (malformed) missformcount++;
                    line = sr.ReadLine();
                    totalcount++;
                }
            }

            int goodEntries = totalcount - missformcount;
            Dictionary<string,List<int>> dateReverse = new Dictionary<string,List<int>>();

            foreach (KeyValuePair<int, string> kvp in dates)
            {
                if (!dateReverse.ContainsKey(kvp.Value))
                    dateReverse[kvp.Value] = new List<int>();
                dateReverse[kvp.Value].Add(kvp.Key);
            }

            Console.WriteLine(goodEntries + " valid Records out of " + totalcount);

            foreach (KeyValuePair<string, List<int>> kvp in dateReverse)
            {
                if (kvp.Value.Count > 1)
                    Console.WriteLine("{0} is duplicated at Lines : {1}", kvp.Key, string.Join(",", kvp.Value));                    
            }
        }
    }
}


  
