How to resolve the algorithm Verify distribution uniformity/Chi-squared test step by step in the jq programming language

Published on 12 May 2024 09:40 PM
#Jq

Problem Statement

Write a function to determine whether a given set of frequency counts could plausibly have come from a uniform distribution by using the χ² test with a significance level of 5%.
The function should return a boolean that is true if and only if the distribution is one that a uniform distribution (with appropriate number of degrees of freedom) may be expected to produce. Note: normally a two-tailed test would be used for this kind of problem.
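
The statistic and decision rule behind this task can be written out as follows. This is a sketch, not part of the original task text: the symbols O_i (observed count in category i), n (number of categories), E (expected count per category), F (chi-squared cumulative distribution function) and α (significance level) are notation introduced here for illustration.

```latex
\[
E = \frac{1}{n}\sum_{i=1}^{n} O_i, \qquad
\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E)^2}{E}, \qquad
\mathrm{dof} = n - 1
\]
\[
\text{two-tailed acceptance at level } \alpha:\quad
\frac{\alpha}{2} \;<\; F\!\left(\chi^2;\, n-1\right) \;<\; 1 - \frac{\alpha}{2}
\]
```

For example, the counts [9,11,9,10,15,11,5] give E = 70/7 = 10, squared deviations summing to 54, and so χ² = 5.4 with 6 degrees of freedom. Under the two-tailed rule, a perfectly even sample such as [20,20,20] gives χ² = 0 and F = 0, which falls below α/2, so it is rejected as "too uniform".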

Let's start with the solution:

Step-by-step solution: Verify distribution uniformity/Chi-squared test in the jq programming language

Source code in the jq programming language

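# Truncate a number to $dec decimal places (uses floor, so it rounds down);
# strings are passed through unchanged.
def round($dec):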
   if type == "string" then .
   else pow(10;$dec) as $m
   | . * $m | floor / $m
   end;

# sum of squares
def ss(s): reduce s as $x (0; . + ($x * $x));

# Cumulative distribution function of the chi-squared distribution with $k
# degrees of freedom.
# The recursion formula for gamma is used for efficiency and robustness.
def Chi2_cdf($x; $k):
  if $x == 0 then 0
  elif $x > (1e3 * $k) then 1
  else 1e-15 as $tol  # for example
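  # tgamma is the true gamma function; jq's `gamma` maps to the C library's
  # gamma(), which on some platforms (e.g. glibc) returns log-gamma instead.
  | { s: 0, m: 0, term: (1 / ((($k/2)+1)|tgamma)) }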
  | until (.term|length < $tol; # length here is abs
      .s += .term
      | .m += 1
      | .term *= (($x/2) / (($k/2) + .m )) )
  | .s * ( ((-$x/2) + ($k/2)*(($x/2)|log)) | exp)
  end ;

# Input: array of frequencies
def chi2UniformDistance:
  (add / length) as $expected
  |  ss(.[] - $expected) / $expected;

# Input: a number
# Output: an indication of the probability of observing this value or higher 
#   assuming the value is drawn from a chi-squared distribution with $dof degrees
#   of freedom
def chi2Probability($dof):
  (1 - Chi2_cdf(.; $dof))
  | if . < 1e-10 then "< 1e-10"
    else .
    end;

# Input: array of frequencies
# Output: result of a two-tailed test based on the chi-squared statistic
# assuming the sample size is large enough
def chiIsUniform($significance):
  (length - 1) as $dof
  | chi2UniformDistance
  | Chi2_cdf(.; $dof) as $cdf
  | if $cdf
    then ($significance/2) as $s
    | $cdf > $s and $cdf < (1-$s)
    else false
    end;
  
def dsets: [
    [199809, 200665, 199607, 200270, 199649],
    [522573, 244456, 139979,  71531,  21461],
    [19,14,6,18,7,5,1],  # low entropy
    [9,11,9,10,15,11,5], # high entropy
    [20,20,20]           # made-up
];

def task:
  dsets[]
  | "Dataset: \(.)",
    ( chi2UniformDistance as $dist
      | (length - 1) as $dof
      | "DOF: \($dof)  D (Distance): \($dist)",
        "  Estimated probability of observing a value >= D: \($dist|chi2Probability($dof)|round(2))",
        "  Uniform? \( (select(chiIsUniform(0.05)) | "Yes") // "No" )\n" ) ;

task
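
The loop in Chi2_cdf appears to sum the standard power series for the regularized lower incomplete gamma function, which equals the chi-squared CDF. Written out as a sketch, assuming the gamma call in the code denotes the true gamma function Γ, and using t_m (the m-th series term) and p (the upper-tail probability) as notation introduced here:

```latex
\[
F(x; k) \;=\; e^{-x/2}\left(\frac{x}{2}\right)^{k/2}
          \sum_{m=0}^{\infty} \frac{(x/2)^m}{\Gamma\!\left(\frac{k}{2} + m + 1\right)}
\]
\[
t_0 = \frac{1}{\Gamma\!\left(\frac{k}{2} + 1\right)}, \qquad
t_m = t_{m-1}\cdot\frac{x/2}{\frac{k}{2} + m}, \qquad
p = 1 - F(x; k)
\]
```

The recurrence follows from Γ(z + 1) = z Γ(z), so each term needs only one multiplication and one division instead of a fresh gamma evaluation. The code stops summing once a term drops below the tolerance of 1e-15, and the prefactor is evaluated as exp(-x/2 + (k/2) log(x/2)).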

  
