How to resolve the algorithm Statistics/Basic step by step in the C programming language

Published on 7 June 2024 03:52 AM
#C



Problem Statement

Statistics is all about large groups of numbers.
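When talking about a set of sampled data, the most frequently used measures are its mean value and standard deviation (stddev).
If you have a set of data $x_i$ where $i = 1, 2, \ldots, n$, the mean is $\bar{x} \equiv \frac{1}{n}\sum_i x_i$, while the stddev is $\sigma \equiv \sqrt{\frac{1}{n}\sum_i (x_i - \bar{x})^2}$.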

When examining a large quantity of data, one often uses a histogram, which shows the counts of data samples falling into a prechosen set of intervals (or bins).
When plotted, often as bar graphs, it visually indicates how often each data value occurs.

Task

Using your language's random number routine, generate real numbers in the range of [0, 1]. It doesn't matter whether you choose an open or closed range.
Create 100 such numbers (i.e. sample size 100) and calculate their mean and stddev.
Do so for sample sizes of 1,000 and 10,000, and even higher if you like.
Show a histogram of any of these sets.
Do you notice any pattern in the standard deviation?

Extra

Sometimes so much data needs to be processed that it's impossible to keep all of it at once. Can you calculate the mean, stddev and histogram of a trillion numbers? (You don't really need to do a trillion numbers, just show how it can be done.)

For a finite population with equal probabilities at all points, one can derive:

$\overline{(x - \bar{x})^2} = \overline{x^2} - \bar{x}^2$

Or, more verbosely:

$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2\right) - \bar{x}^2$

See also: Statistics/Normal distribution
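
The identity behind the Extra section, that the mean squared deviation equals the mean of the squares minus the squared mean, follows by expanding the square and using $\sum_i x_i = n\bar{x}$. Applied to a uniform distribution on [0, 1], the same idea also gives the values the sample mean and stddev should settle near as the sample size grows, which is the pattern the task hints at:

```latex
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2
  &= \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \frac{2\bar{x}}{n}\sum_{i=1}^{n}x_i + \bar{x}^2 \\
  &= \frac{1}{n}\sum_{i=1}^{n}x_i^2 - 2\bar{x}^2 + \bar{x}^2
   = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2
\end{aligned}

\mu = \int_0^1 x\,dx = \frac{1}{2}, \qquad
\sigma = \sqrt{\int_0^1 x^2\,dx - \mu^2} = \sqrt{\frac{1}{3} - \frac{1}{4}} = \frac{1}{\sqrt{12}} \approx 0.2887
```

Because the right-hand side only needs the running totals $\sum x_i$ and $\sum x_i^2$ (plus $n$), the statistics can be accumulated one value at a time without storing the data.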

Let's start with the solution:

Step-by-step solution to the Statistics/Basic task in the C programming language

This C program performs statistical calculations, including the mean, standard deviation, and histogram of random numbers. It also demonstrates a running mean and standard deviation (called a "moving average" in the code) that is updated batch by batch through a custom data structure, so the full data set never has to be held in memory. Here's a detailed explanation:

  1. Header Includes:

    • The program includes several standard C libraries:
      • <stdio.h>: Input and output functions
      • <stdlib.h>: General utilities, including rand() for generating random numbers
      • <math.h>: Mathematical functions like sqrt()
      • <stdint.h>: Integer types with well-defined sizes, including uint64_t
  2. Constants:

    • n_bins: Defines the number of bins for the histogram (set to 10).
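  3. Function rand01():

    • Generates a random number in the half-open interval [0, 1) by dividing rand() by RAND_MAX + 1.0: 0 can occur but 1 cannot, which keeps the histogram index (int)(x * n_bins) below n_bins.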
  4. Function avg():

    • Calculates the average (mean) and standard deviation of a set of count random numbers.
    • It takes three arguments:
      • count: The number of random numbers to generate and analyze.
      • stddev: A pointer to a double variable to store the standard deviation.
      • hist: An array of integers to store the histogram data.
    • The function:
      • Generates count random numbers and stores them in an array x.
      • Computes the mean m as the sum of the random numbers divided by count.
      • Computes the population standard deviation stddev as sqrt(s / count - m * m), where s is the sum of the squared values: the mean of the squares minus the square of the mean, divided by count (not count minus 1).
      • Clears hist, then increments the bin (int)(x[i] * n_bins) for each generated number, so the histogram reflects only the sample from this call.
    • The function returns the mean m.
  5. Function hist_plot():

    • Plots a histogram based on the data in the hist array.
    • It finds the maximum bin count and, if that maximum is 60 or more, sets step = ceil(max / 60), so each '#' stands for step samples and no bar is longer than 60 characters.
    • It prints the histogram data in a table format, with each row representing a bin range, the count of numbers in that bin, and a graphical representation of the count using the '#' character.
  6. Struct moving_rec:

    • Defines a record to store data for moving average and standard deviation calculations.
    • It contains the following members:
      • size: The total number of data points added to the moving average.
      • sum: The cumulative sum of the data points.
      • x2: The cumulative sum of the squared data points.
      • hist: An array of uint64_t counters holding the histogram of all data added so far.
  7. Function moving_avg():

    • Adds a batch of data points to a moving_rec record; the mean and standard deviation are derived from its sums when main() prints them.
    • It takes three arguments:
      • rec: A pointer to the moving_rec structure.
      • data: An array of double-precision floating-point numbers containing the new data points.
      • count: The number of data points in the array.
    • The function:
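      • Computes the sum and sum of squares of the new batch in local variables (per the code comment, this is slightly less likely to lose precision than adding each value directly to the large running totals).
      • Increments the histogram bin in moving_rec for each new data point.
      • Adds the batch sum and sum of squares to the running totals in moving_rec.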
      • Increments the size member of moving_rec by the number of new data points.
  8. Main Function (main()):

    • The main() function is the entry point of the program.
    • It performs the following steps:
      • Declares m (mean), stddev (standard deviation), and hist (histogram) for the fixed-size samples.
      • Starts with samples = 10 and multiplies it by 10 on each iteration while samples <= 10000, so the sample sizes are 10, 100, 1,000 and 10,000.
      • Calls avg() for each sample size and prints the resulting mean and standard deviation.
      • Calls hist_plot() to print the histogram; because avg() clears hist on every call, this shows only the last sample (10,000 numbers), not the data from all iterations.
      • Initializes a moving_rec variable rec to zero for the running mean and standard deviation.
      • Generates 10,000 batches of 100 random numbers (1,000,000 numbers in total) and passes each batch to moving_avg().
      • Every 1,000 batches (100,000 numbers) prints the count in thousands, the mean sum / size, and the standard deviation sqrt(x2 * size - sum * sum) / size.

Source code in the C programming language

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdint.h>

#define n_bins 10

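/* uniform in [0, 1): dividing by RAND_MAX + 1.0 means 1.0 is never returned,
 * so the bin index (int)(x * n_bins) always stays within 0 .. n_bins - 1 */
double rand01(void) { return rand() / (RAND_MAX + 1.0); }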

double avg(int count, double *stddev, int *hist)
{
	double x[count];
	double m = 0, s = 0;

	for (int i = 0; i < n_bins; i++) hist[i] = 0;
	for (int i = 0; i < count; i++) {
		m += (x[i] = rand01());
		hist[(int)(x[i] * n_bins)] ++;
	}

	m /= count;
	/* population stddev: mean of the squares minus the square of the mean */
	for (int i = 0; i < count; i++)
		s += x[i] * x[i];
	*stddev = sqrt(s / count - m * m);
	return m;
}

void hist_plot(int *hist)
{
	int max = 0, step = 1;
	double inc = 1.0 / n_bins;

	for (int i = 0; i < n_bins; i++)
		if (hist[i] > max) max = hist[i];

	/* scale if numbers are too big */
	if (max >= 60) step = (max + 59) / 60;

	for (int i = 0; i < n_bins; i++) {
		printf("[%5.2g,%5.2g]%5d ", i * inc, (i + 1) * inc, hist[i]);
		for (int j = 0; j < hist[i]; j += step)
			printf("#");
		printf("\n");
	}
}

/*  record for moving average and stddev.  Values kept are sums and sum data^2
 *  to avoid excessive precision loss due to divisions, but some loss is inevitable
 */
typedef struct {
	uint64_t size;
	double sum, x2;
	uint64_t hist[n_bins];
} moving_rec;

void moving_avg(moving_rec *rec, double *data, int count)
{
	double sum = 0, x2 = 0;
	/* not adding data directly to the sum in case both recorded sum and 
	 * count of this batch are large; slightly less likely to lose precision*/
	for (int i = 0; i < count; i++) {
		sum += data[i];
		x2 += data[i] * data[i];
		rec->hist[(int)(data[i] * n_bins)]++;
	}

	rec->sum += sum;
	rec->x2 += x2;
	rec->size += count;
}

int main(void)
{
	double m, stddev;
	int hist[n_bins], samples = 10;

	while (samples <= 10000) {
		m = avg(samples, &stddev, hist);
		printf("size %5d: %g %g\n", samples, m, stddev);
		samples *= 10;
	}

	/* hist holds only the last (10,000-sample) run: avg() clears it on every call */
	printf("\nHistogram:\n");
	hist_plot(hist);

	printf("\nMoving average:\n  N     Mean    Sigma\n");
	moving_rec rec = { 0, 0, 0, {0} };
	double data[100];
	for (int i = 0; i < 10000; i++) {
		for (int j = 0; j < 100; j++) data[j] = rand01();

		moving_avg(&rec, data, 100);

		if ((i % 1000) == 999) {
			printf("%4lluk %f %f\n",
				/* uint64_t is not necessarily unsigned long long; cast to match %llu */
				(unsigned long long)(rec.size / 1000),
				rec.sum / rec.size,
				sqrt(rec.x2 * rec.size - rec.sum * rec.sum)/rec.size
			);
		}
	}
}


  
