How to resolve the algorithm Verify distribution uniformity/Chi-squared test step by step in the jq programming language
Published on 12 May 2024 09:40 PM
Problem Statement
Write a function to determine whether a given set of frequency counts could plausibly have come from a uniform distribution by using the χ² (chi-squared) test with a significance level of 5%.
The function should return a boolean that is true if and only if the observed counts are consistent with what a uniform distribution (with the appropriate number of degrees of freedom) may be expected to produce.
Note: normally a two-tailed test would be used for this kind of problem.
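As a sketch of what the test computes, the statistic and a two-tailed decision rule can be written out as below. The symbols O_i (observed count in category i), n (number of categories), N (total count), E (expected count per category), alpha (significance level) and F (chi-squared cumulative distribution function) are notation introduced here for illustration; they do not appear in the task text.

```latex
\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E)^2}{E},
\qquad E = \frac{N}{n},
\qquad N = \sum_{i=1}^{n} O_i

\text{accept uniformity} \iff
\frac{\alpha}{2} < F\!\left(\chi^2;\, n-1\right) < 1 - \frac{\alpha}{2},
\qquad \alpha = 0.05
```

For example, the counts 9, 11, 9, 10, 15, 11, 5 give N = 70, n = 7 and E = 10. The squared deviations are 1, 1, 1, 0, 25, 1, 25, which sum to 54, so the statistic is 54 / 10 = 5.4 with 6 degrees of freedom.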
Step-by-step solution
Source code in the jq programming language
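# Truncate the input number to $dec decimal places (floor, not round-to-nearest);
# strings are passed through unchanged.
def round($dec):
  if type == "string" then .
  else pow(10;$dec) as $m
  | . * $m | floor / $m
  end;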
# sum of squares
def ss(s): reduce s as $x (0; . + ($x * $x));
# Cumulative distribution function of the chi-squared distribution with $k
# degrees of freedom
# The recursion formula for gamma is used for efficiency and robustness.
# Note: tgamma is the true gamma function; in jq, `gamma` is typically the
# log-gamma function inherited from the C math library.
def Chi2_cdf($x; $k):
  if $x == 0 then 0
  elif $x > (1e3 * $k) then 1
  else 1e-15 as $tol  # for example
  | { s: 0, m: 0, term: (1 / ((($k/2)+1)|tgamma)) }
  | until (.term|length < $tol; # length here is abs
      .s += .term
      | .m += 1
      | .term *= (($x/2) / (($k/2) + .m )) )
  | .s * ( ((-$x/2) + ($k/2)*(($x/2)|log)) | exp)
  end ;
# Input: array of frequencies
def chi2UniformDistance:
  (add / length) as $expected
  | ss(.[] - $expected) / $expected;
# Input: a number
# Output: an indication of the probability of observing this value or higher
# assuming the value is drawn from a chi-squared distribution with $dof degrees
# of freedom
def chi2Probability($dof):
  (1 - Chi2_cdf(.; $dof))
  | if . < 1e-10 then "< 1e-10"
    else .
    end;
# Input: array of frequencies
# Output: result of a two-tailed test based on the chi-squared statistic
# assuming the sample size is large enough
def chiIsUniform($significance):
  (length - 1) as $dof
  | chi2UniformDistance
  | Chi2_cdf(.; $dof) as $cdf
  | if $cdf
    then ($significance/2) as $s
    | $cdf > $s and $cdf < (1-$s)
    else false
    end;
def dsets: [
  [199809, 200665, 199607, 200270, 199649],
  [522573, 244456, 139979, 71531, 21461],
  [19,14,6,18,7,5,1],   # low entropy
  [9,11,9,10,15,11,5],  # high entropy
  [20,20,20]            # made-up
];
def task:
  dsets[]
  | "Dataset: \(.)",
    ( chi2UniformDistance as $dist
      | (length - 1) as $dof
      | "DOF: \($dof) D (Distance): \($dist)",
        " Estimated probability of observing a value >= D: \($dist|chi2Probability($dof)|round(2))",
        " Uniform? \( (select(chiIsUniform(0.05)) | "Yes") // "No" )\n" ) ;
task
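The loop in Chi2_cdf above appears to sum the standard series for the regularized lower incomplete gamma function. Written out, with a = k/2, z = x/2 and t_m for the m-th term (names introduced here for illustration, not taken from the code), a sketch of that reading is:

```latex
F(x; k) = z^{a} e^{-z} \sum_{m=0}^{\infty} t_m,
\qquad a = \tfrac{k}{2}, \quad z = \tfrac{x}{2}

t_0 = \frac{1}{\Gamma(a+1)},
\qquad t_m = t_{m-1} \cdot \frac{z}{a+m} = \frac{z^{m}}{\Gamma(a+m+1)}

% Sanity check for k = 2 (a = 1):
F(x; 2) = e^{-z} \sum_{m=0}^{\infty} \frac{z^{m+1}}{(m+1)!}
        = e^{-z}\left(e^{z} - 1\right) = 1 - e^{-x/2}
```

The prefactor z^a e^{-z} corresponds to the final exp(-x/2 + (k/2) log(x/2)) step, and the update of .term corresponds to the recurrence for t_m. As a hand check for the dataset with statistic 5.4 and 6 degrees of freedom (a = 3, z = 2.7), the upper tail has the closed form e^{-2.7} (1 + 2.7 + 2.7²/2), which is approximately 0.494. The CDF is therefore about 0.506, which lies between 0.025 and 0.975, so the two-tailed test at the 5% level would accept uniformity for that dataset.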