How to resolve the algorithm Statistics/Basic step by step in the C programming language
Problem Statement
Statistics is all about large groups of numbers.
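When talking about a set of sampled data, the most frequently used statistics are the mean value and the standard deviation (stddev).
If you have a set of data $x_i$ where $i = 1, 2, \ldots, n$, the mean is $\bar{x} = \frac{1}{n}\sum_{i} x_i$, while the stddev is $\sigma = \sqrt{\frac{1}{n}\sum_{i} (x_i - \bar{x})^2}$.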
When examining a large quantity of data, one often uses a histogram, which shows the counts of data samples falling into a prechosen set of intervals (or bins).
When plotted, often as bar graphs, it visually indicates how often each data value occurs.
Task
Using your language's random number routine, generate real numbers in the range [0, 1]. It doesn't matter whether you choose an open or closed range.
Create 100 such numbers (i.e. sample size 100) and calculate their mean and stddev.
Do the same for sample sizes of 1,000 and 10,000, or even higher if you like.
Show a histogram of any of these sets.
Do you notice any pattern in the standard deviation?
Extra
Sometimes so much data needs to be processed that it is impossible to keep all of it in memory at once. Can you calculate the mean, stddev and histogram of a trillion numbers? (You don't really need to process a trillion numbers, just show how it can be done.)
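For a finite population with equal probabilities at all points, one can derive:
$\overline{(x - \bar{x})^2} = \overline{x^2} - \bar{x}^2$
Or, more verbosely:
$\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2 = \frac{1}{N}\left(\sum_{i=1}^{N} x_i^2\right) - \bar{x}^2$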
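The identity above says that the mean of the squared deviations equals the mean of the squares minus the square of the mean. The following is a minimal sketch, not part of the original listing, that computes the population variance both ways so the two results can be compared. The function names `variance_two_pass` and `variance_from_sums` are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

/* Two-pass population variance: compute the mean first,
 * then average the squared deviations from it. */
double variance_two_pass(const double *x, size_t n)
{
    double mean = 0.0, acc = 0.0;
    for (size_t i = 0; i < n; i++) mean += x[i];
    mean /= (double)n;
    for (size_t i = 0; i < n; i++) acc += (x[i] - mean) * (x[i] - mean);
    return acc / (double)n;
}

/* One-pass population variance from running sums:
 * mean of the squares minus the square of the mean. */
double variance_from_sums(const double *x, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
        sum_sq += x[i] * x[i];
    }
    double mean = sum / (double)n;
    return sum_sq / (double)n - mean * mean;
}
```

For the eight values 2, 4, 4, 4, 5, 5, 7, 9 both functions return 4, which corresponds to a standard deviation of 2. The second form only needs the count, the sum and the sum of squares, so it can be updated without keeping the data.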
See also: Statistics/Normal distribution
Let's start with the solution:
Step-by-step solution for the algorithm Statistics/Basic in the C programming language
This C program generates random numbers and calculates their mean, standard deviation, and histogram. It also shows how to accumulate these statistics incrementally, batch by batch, in a small record structure, so that the full data set never has to be held in memory (the code calls this a moving average, although it is cumulative rather than windowed). Here's a detailed explanation:
- **Header Includes:**
  - The program includes several standard C libraries:
    - `<stdio.h>`: Input and output functions
    - `<stdlib.h>`: General utilities, including `rand()` for generating random numbers
    - `<math.h>`: Mathematical functions like `sqrt()`
    - `<stdint.h>`: Integer types with well-defined sizes, including `uint64_t`
- **Constants:**
  - `n_bins`: Defines the number of bins for the histogram (set to 10).
- **Function `rand01()`:**
  - Generates a random number in the half-open range [0, 1) by dividing the result of `rand()` by `RAND_MAX + 1.0`, so 0 can occur but 1 cannot.
- **Function `avg()`:**
  - Calculates the mean and standard deviation of a set of `count` random numbers.
  - It takes three arguments:
    - `count`: The number of random numbers to generate and analyze.
    - `stddev`: A pointer to a double variable to store the standard deviation.
    - `hist`: An array of integers to store the histogram data.
  - The function:
    - Resets every bin of `hist` to zero.
    - Generates `count` random numbers and stores them in an array `x`, incrementing the bin of `hist` that each value falls into.
    - Computes the mean `m` as the sum of the random numbers divided by `count`.
    - Computes the standard deviation `stddev` as the square root of the population variance, which is calculated as the mean of the squared values minus the square of the mean (`s / count - m * m`).
  - The function returns the mean `m`.
- **Function `hist_plot()`:**
  - Plots a histogram based on the data in the `hist` array.
  - It finds the maximum bin count and, if that count is 60 or more, sets `step = (max + 59) / 60` so that each '#' stands for `step` samples and no bar is longer than 60 characters.
  - It prints the histogram data in a table format, with each row showing a bin range, the count of numbers in that bin, and a graphical representation of the count using the '#' character.
- **Struct `moving_rec`:**
  - Defines a record to store data for moving average and standard deviation calculations.
  - It contains the following members:
    - `size`: The total number of data points added so far.
    - `sum`: The cumulative sum of the data points.
    - `x2`: The cumulative sum of the squared data points.
    - `hist`: An array of `uint64_t` counters that stores the histogram of all data points added so far.
- **Function `moving_avg()`:**
  - Folds a new batch of data into the `moving_rec` record. The mean and standard deviation themselves are derived from the record later, in `main()`.
  - It takes three arguments:
    - `rec`: A pointer to the `moving_rec` structure.
    - `data`: An array of double-precision floating-point numbers containing the new data points.
    - `count`: The number of data points in the array.
  - The function:
    - Computes the sum and the sum of squares for the new data points in local variables.
    - Updates the corresponding histogram bins in `moving_rec`.
    - Adds the batch sum and sum of squares to the cumulative values in `moving_rec`.
    - Increments the `size` member of `moving_rec` by the number of new data points.
- **Main Function (`main()`):**
  - The `main()` function is the entry point of the program.
  - It performs the following steps:
    - Declares the variables `m` (mean), `stddev` (standard deviation), and `hist` (histogram) for the initial calculations.
    - Sets `samples` to 10 and enters a loop that multiplies `samples` by 10 in each iteration, stopping after 10,000.
    - Calls `avg()` for each value of `samples` to calculate the mean and standard deviation of the random numbers, and prints the results.
    - Calls `hist_plot()` to print the histogram. Because `avg()` resets `hist` on every call, the plot shows only the last sample set (10,000 numbers).
    - Initializes a `moving_rec` variable `rec` to store the accumulated data.
    - Generates 10,000 batches of 100 random numbers and calls `moving_avg()` on each batch to update `rec` incrementally.
    - Prints the accumulated mean and standard deviation after every 1,000th batch, that is, after every 100,000 samples.
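As a rough guide to what the printed values should approach, the expected mean and standard deviation can be worked out directly. This assumes that `rand01()` behaves like an ideal uniform distribution on [0, 1), which is an idealization, since `rand()` is only a pseudo-random generator.

```latex
\begin{aligned}
E[x]     &= \int_0^1 x \, dx = \frac{1}{2}, \\
E[x^2]   &= \int_0^1 x^2 \, dx = \frac{1}{3}, \\
\sigma^2 &= E[x^2] - E[x]^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}, \\
\sigma   &= \frac{1}{\sqrt{12}} \approx 0.2887.
\end{aligned}
```

Under that assumption the sample mean should settle near 0.5 and the sample stddev near 0.2887 as the sample size grows, with each of the 10 bins holding roughly a tenth of the samples.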
Source code in the C programming language
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdint.h>

#define n_bins 10

/* uniform random number in [0, 1): dividing by RAND_MAX + 1.0 excludes 1 */
double rand01() { return rand() / (RAND_MAX + 1.0); }
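/* generate count samples, fill hist with their bin counts, store the
 * population stddev in *stddev and return the mean */
double avg(int count, double *stddev, int *hist)
{
    double x[count];    /* variable-length array (C99) */
    double m = 0, s = 0;

    for (int i = 0; i < n_bins; i++) hist[i] = 0;
    for (int i = 0; i < count; i++) {
        m += (x[i] = rand01());
        hist[(int)(x[i] * n_bins)] ++;
    }

    m /= count;
    for (int i = 0; i < count; i++)
        s += x[i] * x[i];
    /* variance = mean of squares - square of mean */
    *stddev = sqrt(s / count - m * m);
    return m;
}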
void hist_plot(int *hist)
{
    int max = 0, step = 1;
    double inc = 1.0 / n_bins;

    for (int i = 0; i < n_bins; i++)
        if (hist[i] > max) max = hist[i];

    /* scale if numbers are too big */
    if (max >= 60) step = (max + 59) / 60;

    for (int i = 0; i < n_bins; i++) {
        printf("[%5.2g,%5.2g]%5d ", i * inc, (i + 1) * inc, hist[i]);
        for (int j = 0; j < hist[i]; j += step)
            printf("#");
        printf("\n");
    }
}
/* record for moving average and stddev. Values kept are sums and sum data^2
 * to avoid excessive precision loss due to divisions, but some loss is inevitable
 */
typedef struct {
    uint64_t size;
    double sum, x2;
    uint64_t hist[n_bins];
} moving_rec;
void moving_avg(moving_rec *rec, double *data, int count)
{
    double sum = 0, x2 = 0;
    /* not adding data directly to the sum in case both recorded sum and
     * count of this batch are large; slightly less likely to lose precision */
    for (int i = 0; i < count; i++) {
        sum += data[i];
        x2 += data[i] * data[i];
        rec->hist[(int)(data[i] * n_bins)]++;
    }

    rec->sum += sum;
    rec->x2 += x2;
    rec->size += count;
}
int main()
{
    double m, stddev;
    int hist[n_bins], samples = 10;

    while (samples <= 10000) {
        m = avg(samples, &stddev, hist);
        printf("size %5d: %g %g\n", samples, m, stddev);
        samples *= 10;
    }

    /* hist holds only the last sample set, since avg() resets it */
    printf("\nHistogram:\n");
    hist_plot(hist);

    printf("\nMoving average:\n N Mean Sigma\n");
    moving_rec rec = { 0, 0, 0, {0} };
    double data[100];
    for (int i = 0; i < 10000; i++) {
        for (int j = 0; j < 100; j++) data[j] = rand01();

        moving_avg(&rec, data, 100);

        if ((i % 1000) == 999) {
            /* cast so that the argument matches %llu on every platform */
            printf("%4lluk %f %f\n",
                   (unsigned long long)(rec.size / 1000),
                   rec.sum / rec.size,
                   sqrt(rec.x2 * rec.size - rec.sum * rec.sum) / rec.size
            );
        }
    }
}