How to resolve the algorithm Statistics/Basic step by step in the jq programming language
Problem Statement
Statistics is all about large groups of numbers.
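When talking about a set of sampled data, the most frequently used statistics are the mean value and the standard deviation (stddev).
If you have a set of data $x_i$ where $i = 1, 2, \ldots, n$, the mean is $\bar{x} \equiv \frac{1}{n}\sum_{i=1}^{n} x_i$, while the stddev is $\sigma \equiv \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$.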
When examining a large quantity of data, one often uses a histogram, which shows the counts of data samples falling into a prechosen set of intervals (or bins).
When plotted, often as bar graphs, it visually indicates how often each data value occurs.
Task
Using your language's random number routine, generate real numbers in the range [0, 1]. It doesn't matter whether you choose to use an open or closed range.
Create 100 such numbers (i.e. sample size 100) and calculate their mean and stddev.
Do the same for sample sizes of 1,000 and 10,000, and maybe even higher if you feel like it.
Show a histogram of any of these sets.
Do you notice some patterns about the standard deviation?
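As a reference point for that last question, the theoretical values can be worked out under the assumption of an ideal uniform distribution on [0, 1] (an idealization, not a property of any particular random number routine):

```latex
\begin{align*}
E[X]   &= \int_0^1 x \, dx = \tfrac{1}{2} \\
E[X^2] &= \int_0^1 x^2 \, dx = \tfrac{1}{3} \\
\sigma &= \sqrt{E[X^2] - E[X]^2} = \sqrt{\tfrac{1}{3} - \tfrac{1}{4}} = \sqrt{\tfrac{1}{12}} \approx 0.2887
\end{align*}
```

Under that assumption, the sample mean should settle near 0.5 and the sample stddev near 0.2887 as the sample size grows.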
Extra
Sometimes so much data needs to be processed that it's impossible to keep all of it in memory at once. Can you calculate the mean, stddev and histogram of a trillion numbers? (You don't really need to do a trillion numbers, just show how it can be done.)
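For a finite population with equal probabilities at all points, one can derive (an overline denotes the mean): $\sigma = \sqrt{\overline{(x - \bar{x})^2}} = \sqrt{\overline{x^2} - \bar{x}^2}$
Or, more verbosely: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$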
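The step behind this identity is a plain expansion of the square, using $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. A short derivation, written with the same symbols as the definitions of the mean and stddev:

```latex
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \frac{1}{n}\sum_{i=1}^{n} x_i^2
     - 2\bar{x} \cdot \frac{1}{n}\sum_{i=1}^{n} x_i
     + \bar{x}^2 \\
  &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - 2\bar{x}^2 + \bar{x}^2 \\
  &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2
\end{align*}
```

This is why a single pass that keeps only the count, the sum, and the sum of squares is enough to recover the stddev.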
See also: Statistics/Normal distribution
Let's start with the solution:
Source code in the jq programming language
# Usage: prng N width
# Emits N lines, each a string of `width` random decimal digits.
function prng {
  tr -cd '0-9' < /dev/urandom | fold -w "$2" | head -n "$1"
}
# $histogram should be a JSON object, with buckets as keys and frequencies as values;
# $keys should be an array of all the potential bucket names (possibly integers)
# in the order to be used for display:
def pp($histogram; $keys):
  ([$histogram[]] | add) as $n   # total count, used for scaling
  | ($keys|length) as $length
  | $keys[]
  | "\(.) : \("*" * (($histogram[tostring] // 0) * 20 * $length / $n) // "" )" ;
# `basic_stats` computes the unadjusted standard deviation
# and assumes the sum of squares (ss) can be computed without concern for overflow.
# The histogram is based on allocation to a bucket, which is made
# using `bucketize`, e.g. `.*10|floor`
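def basic_stats(stream; bucketize):
  # Accumulate the count, sum, sum of squares and histogram in a single pass,
  # so that only one value of the stream needs to be held at a time:
  reduce stream as $x ({histogram: {}};
      .count += 1
      | .sum += $x
      | .ss += $x * $x
      | ($x | bucketize | tostring) as $bucket
      | .histogram[$bucket] += 1 )
  | .mean = (.sum / .count)
  | .stddev = (((.ss/.count) - .mean*.mean) | sqrt) ;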
basic_stats( "0." + inputs | tonumber; .*10|floor)
| "
Basic statistics for \(.count) PRNs in [0,1]:
mean: \(.mean)
stddev: \(.stddev)
Histogram dividing [0,1] into 10 equal intervals:",
pp(.histogram; [range(0;10)] )
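# Driver (bash): assumes the prng function above is defined in this script
# and that the jq program is saved as basicStats.jq
for n in 100 1000 1000000 100000000; do
  echo "Basic statistics for $n PRNs in [0,1]"
  prng "$n" 10 | jq -nrR -f basicStats.jq
  echo
done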
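The comments on `basic_stats` note that it assumes the sum of squares can be accumulated without trouble. As one possible alternative, and not part of the original solution, the sketch below uses Welford's online update, which tracks the running mean and the sum of squared deviations instead of the raw sum of squares. The function name `welford_stats` and the input format (one plain JSON number per line on stdin) are assumptions made for this illustration.

```shell
# welford_stats: hypothetical helper, not part of the original solution.
# Reads one JSON number per line from stdin and prints a JSON object with
# the count, the mean and the unadjusted (population) standard deviation.
welford_stats() {
  jq -n -c '
    reduce inputs as $x ({count: 0, mean: 0, m2: 0};
        .count += 1
        | ($x - .mean) as $delta            # deviation from the old mean
        | .mean += $delta / .count          # update the running mean
        | .m2 += $delta * ($x - .mean) )    # uses the updated mean
    | { count,
        mean,
        stddev: (if .count > 0 then (.m2 / .count | sqrt) else null end) }
  '
}
```

Like `basic_stats`, this keeps only a fixed amount of state however long the input is. Assuming the `prng` function from the solution is available, it could be fed with something like `prng 1000 10 | sed 's/^/0./' | welford_stats`.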