How to resolve the algorithm Text processing/2 step by step in the C# programming language
Problem Statement
The following task concerns data that came from a pollution monitoring station with twenty-four instruments monitoring twenty-four aspects of pollution in the air. Periodically a record is added to the file, each record being a line of 49 fields separated by white-space, which can be one or more space or tab characters. The fields (from the left) are a datestamp followed by twenty-four repetitions of a floating-point instrument value and that instrument's associated integer flag. Flag values are >= 1 if the instrument is working and < 1 if there is some problem with it, in which case that instrument's value should be ignored. The full data file, readings.txt, is also used in the Text processing/1 task.
Step-by-step solution to Text processing/2 in the C# programming language
Overview: This C# program reads readings.txt line by line, checks that each record has 49 well-formed fields, counts the records that pass every check, and reports any date that appears on more than one line.
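Code Explanation:
Regex Expressions:
- multiWhite: Matches a run of one or more whitespace characters; the program replaces each run with a single space.
- dateEx: Matches a date in the format YYYY-MM-DD. Only the digit layout is checked, not whether the date exists in the calendar.
- valEx: Matches an unsigned number with one or more digits before the decimal point and exactly three after it, such as X.XXX.
- flagEx: Matches a single digit from 1 to 9, so a flag of 0 or a negative flag does not match.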
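To make the behaviour of these patterns concrete, here is a minimal sketch that applies them to a few individual fields. The class name FieldPatterns, its helper methods, and the sample field values are illustrative assumptions and are not taken from the data file.

```csharp
using System;
using System.Text.RegularExpressions;

static class FieldPatterns
{
    // Patterns equivalent to the ones described above.
    // The class and method names are hypothetical, chosen for this sketch.
    static readonly Regex DateEx = new Regex(@"^\d{4}-\d{2}-\d{2}$");
    static readonly Regex ValEx = new Regex(@"^\d+\.\d{3}$");
    static readonly Regex FlagEx = new Regex(@"^[1-9]$");

    public static bool IsDate(string s) => DateEx.IsMatch(s);
    public static bool IsValue(string s) => ValEx.IsMatch(s);
    public static bool IsGoodFlag(string s) => FlagEx.IsMatch(s);

    static void Main()
    {
        // Sample fields are made up for illustration.
        Console.WriteLine(IsDate("1990-03-25"));  // True
        Console.WriteLine(IsValue("10.000"));     // True
        Console.WriteLine(IsValue("-1.5"));       // False: leading sign, one decimal digit
        Console.WriteLine(IsGoodFlag("1"));       // True
        Console.WriteLine(IsGoodFlag("0"));       // False
        Console.WriteLine(IsGoodFlag("-2"));      // False
    }
}
```

Note that a flag below 1 fails the flag pattern, so these patterns do not distinguish a faulty instrument from a badly formatted field.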
Variables:
- missformcount: Counts the number of malformed records.
- totalcount: Keeps track of the total number of records processed.
- dates: A dictionary that maps each line number to the date found on that line.
File Processing:
- The program reads the "readings.txt" file line by line using a StreamReader.
- Each line is normalized by replacing every run of whitespace with a single space and is then split on spaces.
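The normalize-then-split step can be sketched in isolation as follows. The helper name SplitFields and the sample line are assumptions made for this example. The sketch also trims the line so that leading or trailing whitespace does not produce empty fields.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineSplitter
{
    static readonly Regex MultiWhite = new Regex(@"\s+");

    // Hypothetical helper: collapse whitespace runs, trim the ends,
    // then split on single spaces.
    public static string[] SplitFields(string line)
    {
        string normalized = MultiWhite.Replace(line, " ").Trim();
        return normalized.Length == 0 ? new string[0] : normalized.Split(' ');
    }

    static void Main()
    {
        // A shortened, made-up line mixing tabs and repeated spaces.
        string[] fields = SplitFields("1990-03-25\t10.000  1 \t 2.500 1");
        Console.WriteLine(fields.Length);             // 5
        Console.WriteLine(string.Join("|", fields));  // 1990-03-25|10.000|1|2.500|1
    }
}
```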
Record Validation:
- The program first checks whether the record has the expected number of fields (49). Records with any other field count are marked as malformed.
- It then validates the date field using the dateEx regex. If the date format is incorrect, the record is marked as malformed; otherwise the date is stored in dates under the current line number.
- For the remaining fields, the program alternates between checking for a valid floating-point value (using valEx) and a valid flag (using flagEx). Any invalid value results in the record being marked as malformed. Because flagEx rejects flags below 1, a record with a faulty instrument is also counted as malformed.
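The three checks above can be gathered into a single predicate, sketched below under the assumption that the line has already been split into fields. RecordValidator, IsValid, and MakeRecord are hypothetical names introduced for this example, and the test records are synthetic.

```csharp
using System;
using System.Text.RegularExpressions;

static class RecordValidator
{
    static readonly Regex DateEx = new Regex(@"^\d{4}-\d{2}-\d{2}$");
    static readonly Regex ValEx = new Regex(@"^\d+\.\d{3}$");
    static readonly Regex FlagEx = new Regex(@"^[1-9]$");

    // Returns true only when the record has 49 fields, a well-formed date,
    // and every value and flag matches its pattern.
    public static bool IsValid(string[] fields)
    {
        if (fields.Length != 49) return false;
        if (!DateEx.IsMatch(fields[0])) return false;
        for (int i = 1; i < fields.Length; i++)
        {
            // Odd positions hold values, even positions hold flags.
            Regex expected = (i % 2 != 0) ? ValEx : FlagEx;
            if (!expected.IsMatch(fields[i])) return false;
        }
        return true;
    }

    // Hypothetical helper: builds a record that repeats one value/flag pair 24 times.
    public static string[] MakeRecord(string date, string value, string flag)
    {
        var fields = new string[49];
        fields[0] = date;
        for (int i = 1; i < 49; i += 2)
        {
            fields[i] = value;
            fields[i + 1] = flag;
        }
        return fields;
    }

    static void Main()
    {
        Console.WriteLine(IsValid(MakeRecord("1990-03-25", "10.000", "1")));  // True
        Console.WriteLine(IsValid(MakeRecord("1990-03-25", "10.000", "0")));  // False
        Console.WriteLine(IsValid(new[] { "1990-03-25", "10.000", "1" }));    // False
    }
}
```

Returning a single boolean per record makes it straightforward to count each bad record exactly once, no matter how many checks it fails.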
Date Duplication:
- After processing all records, the program inverts the dates dictionary: in the new dictionary, the keys are the dates and the values are lists of the line numbers on which each date occurs.
- It then iterates over the inverted dictionary and reports any date that appears on more than one line.
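As an alternative to building the inverted dictionary by hand, the same grouping can be expressed with LINQ. This is a sketch of a swapped-in technique, not the approach used in the program; the name FindDuplicates and the sample dates are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DuplicateDates
{
    // Hypothetical helper: groups line numbers by date and keeps only
    // the dates that occur on more than one line.
    public static Dictionary<string, List<int>> FindDuplicates(IDictionary<int, string> dates)
    {
        return dates
            .GroupBy(kvp => kvp.Value, kvp => kvp.Key)
            .Where(g => g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.OrderBy(n => n).ToList());
    }

    static void Main()
    {
        // Made-up line-number-to-date entries for illustration.
        var dates = new Dictionary<int, string>
        {
            { 1, "1990-03-25" },
            { 2, "1990-03-26" },
            { 3, "1990-03-25" },
        };

        foreach (var kvp in FindDuplicates(dates))
            Console.WriteLine("{0} is duplicated at lines: {1}", kvp.Key, string.Join(",", kvp.Value));
        // prints: 1990-03-25 is duplicated at lines: 1,3
    }
}
```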
Output:
- The program displays the number of valid records out of the total number of records processed.
- It also lists any dates that appear in duplicate along with the corresponding line numbers.
Source code in the C# programming language
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

namespace TextProc2
{
    class Program
    {
        static void Main(string[] args)
        {
            Regex multiWhite = new Regex(@"\s+");
            Regex dateEx = new Regex(@"^\d{4}-\d{2}-\d{2}$");
            Regex valEx = new Regex(@"^\d+\.\d{3}$");
            Regex flagEx = new Regex(@"^[1-9]$");

            int missformcount = 0, totalcount = 0;
            // Maps each line number to the date found on that line.
            Dictionary<int, string> dates = new Dictionary<int, string>();

            using (StreamReader sr = new StreamReader("readings.txt"))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    totalcount++;

                    // Collapse whitespace runs and trim so no empty fields appear.
                    string[] splitLine = multiWhite.Replace(line, " ").Trim().Split(' ');

                    // Track problems per record so each bad record is counted once.
                    bool malformed = splitLine.Length != 49;

                    if (!dateEx.IsMatch(splitLine[0]))
                        malformed = true;
                    else
                        dates.Add(totalcount, splitLine[0]);

                    // Odd positions hold values, even positions hold flags.
                    for (int i = 1; i < splitLine.Length; i++)
                    {
                        Regex expected = (i % 2 != 0) ? valEx : flagEx;
                        if (!expected.IsMatch(splitLine[i]))
                            malformed = true;
                    }

                    if (malformed) missformcount++;
                }
            }

            int goodEntries = totalcount - missformcount;

            // Invert the dictionary: date -> list of line numbers with that date.
            Dictionary<string, List<int>> dateReverse = new Dictionary<string, List<int>>();
            foreach (KeyValuePair<int, string> kvp in dates)
            {
                if (!dateReverse.ContainsKey(kvp.Value))
                    dateReverse[kvp.Value] = new List<int>();
                dateReverse[kvp.Value].Add(kvp.Key);
            }

            Console.WriteLine(goodEntries + " valid Records out of " + totalcount);

            // Report every date that occurs on more than one line.
            foreach (KeyValuePair<string, List<int>> kvp in dateReverse)
            {
                if (kvp.Value.Count > 1)
                    Console.WriteLine("{0} is duplicated at Lines : {1}", kvp.Key, string.Join(",", kvp.Value));
            }
        }
    }
}