How to resolve the algorithm Verify distribution uniformity/Chi-squared test step by step in the C programming language

Published on 7 June 2024 03:52 AM
#C

Problem Statement

Write a function to determine whether a given set of frequency counts could plausibly have come from a uniform distribution by using the χ² test with a significance level of 5%.
The function should return a boolean that is true if and only if the distribution is one that a uniform distribution (with appropriate number of degrees of freedom) may be expected to produce. Note: normally a two-tailed test would be used for this kind of problem.
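
In symbols, the check can be written as below. The notation is an assumption introduced here for illustration: n is the number of categories, O_i are the observed counts, E is the expected count per category under uniformity, and Q is the regularized upper incomplete gamma function. The p-value formula matches what chi2Probability computes in the listing further down.

```latex
E = \frac{1}{n}\sum_{i=1}^{n} O_i, \qquad
\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E)^2}{E}, \qquad
p = Q\!\left(\frac{n-1}{2},\, \frac{\chi^2}{2}\right)
```

The counts are treated as consistent with a uniform distribution when p > 0.05, with n - 1 degrees of freedom.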

Let's start with the solution:

Step-by-step solution: Verify distribution uniformity/Chi-squared test in the C programming language

This C code defines functions for numerical integration using Simpson's 3/8 rule, calculating the incomplete gamma function, and performing a chi-square uniformity test on a set of data. A detailed explanation of each part of the code is provided below:

  1. Numerical Integration and the Gamma Function:

    • double Simpson3_8( Ifctn f, double a, double b, int N): This function integrates numerically with the composite Simpson's 3/8 rule. It takes a function pointer f, the interval endpoints a and b, and the number of panels N. Each panel is split into three equal steps, so f is evaluated at 3N+1 points. It returns the approximate integral of f over [a, b].

    • #define A 12: This constant sets the number of coefficients used in Spouge's approximation of the gamma function.

    • double Gamma_Spouge( double z ): This function approximates the (complete) gamma function Γ(z) using Spouge's approximation. It takes a real number z of type double and returns the approximation. The coefficients are computed once, on the first call, and cached in a static array.
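
As a quick sanity check of the integration scheme, here is a minimal sketch that restates the same composite 3/8 rule under a hypothetical name (simpson38) and pairs it with a hypothetical test integrand (cube). Neither name appears in the original listing. Simpson's 3/8 rule is exact for cubic polynomials, so integrating t^3 over [0, 2] should give 4 up to rounding error.

```c
/* Integrand type, as in the listing: a function of one double. */
typedef double (*Ifctn)(double t);

/* Composite Simpson 3/8 rule, same scheme as Simpson3_8 in the listing:
   N panels of width h, each split into three steps of width h/3.
   The name simpson38 is hypothetical. */
double simpson38(Ifctn f, double a, double b, int N)
{
    double h = (b - a) / N;      /* panel width */
    double h1 = h / 3.0;         /* step width inside a panel */
    double sum = f(a) + f(b);    /* endpoints carry weight 1 */
    int j;

    for (j = 3 * N - 1; j > 0; j--) {
        /* panel boundaries (j divisible by 3) get weight 2, other points 3 */
        sum += ((j % 3) ? 3.0 : 2.0) * f(a + h1 * j);
    }
    return h * sum / 8.0;
}

/* Hypothetical test integrand: t^3, whose integral over [0, 2] is 4. */
double cube(double t)
{
    return t * t * t;
}
```

Calling simpson38(cube, 0.0, 2.0, 4) is one way to exercise it. A smooth non-polynomial integrand would instead show a small error that shrinks as N grows.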

  2. Incomplete Gamma Function:

    • double aa1;: This global variable is used to store the value of a-1 for the incomplete gamma function calculation.

    • double f0( double t): This function defines the integrand for the incomplete gamma function calculation. It takes a value t and returns pow(t, aa1)*exp(-t).

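    • double GammaIncomplete_Q( double a, double x): This function approximates the regularized upper incomplete gamma function Q(a,x). It integrates f0 from 0 to x with Simpson3_8 (stopping early once the remaining tail of the integrand is negligible), divides the result by Gamma_Spouge(a), and subtracts that ratio from 1. It takes the parameters a and x and returns the value of Q(a,x).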

  3. Chi-Square Uniformity Test:

    • double chi2UniformDistance( double *ds, int dslen): This function computes the chi-square statistic for the counts in ds (of length dslen) against a uniform distribution. The expected count is the mean of the counts, and the result is the sum of squared deviations from that mean divided by the expected count.

    • double chi2Probability( int dof, double distance): This function returns the p-value: the probability, for dof degrees of freedom, of observing a chi-square statistic at least as large as distance. It is computed as GammaIncomplete_Q(0.5*dof, 0.5*distance).

    • int chiIsUniform( double *dset, int dslen, double significance): This function tests the counts in dset (of length dslen) with dslen-1 degrees of freedom. It returns 1 when the p-value exceeds significance, meaning uniformity is not rejected, and 0 otherwise.
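
The statistic can be followed by hand on the first dataset from the listing. The counts sum to 1,000,000, so the expected count is 200,000. The deviations are -191, 665, -393, 270 and -351, and their squares sum to 829,256. Dividing by 200,000 gives about 4.1463. The sketch below repeats that arithmetic under a hypothetical name (chi2_statistic). It mirrors what chi2UniformDistance does, not a different method.

```c
/* Chi-squared statistic of observed counts against a uniform expectation.
   Hypothetical helper name; same computation as chi2UniformDistance. */
double chi2_statistic(const double *obs, int n)
{
    double expected = 0.0;
    double sum = 0.0;
    int k;

    for (k = 0; k < n; k++)
        expected += obs[k];
    expected /= n;               /* mean count = expected count per category */

    for (k = 0; k < n; k++) {
        double d = obs[k] - expected;
        sum += d * d;            /* accumulate squared deviations */
    }
    return sum / expected;       /* equals the sum of (O - E)^2 / E, E constant */
}
```

Perfectly even counts give a statistic of 0, and the value grows as the counts drift further from their mean.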

  4. Main Function:

    • The main function creates two sets of data, dset1 and dset2, and calculates their chi-square uniform distribution test results.

    • It prints the data sets, their degrees of freedom, chi-square distances, chi-square probabilities, and whether they are uniformly distributed based on a significance level of 0.05.

    • The output shows whether each data set is uniformly distributed or not based on the chi-square test.

Source code in the C programming language

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

typedef double (* Ifctn)( double t);
/* Numerical integration method */
double Simpson3_8( Ifctn f, double a, double b, int N)
{
    int j;
    double l1;
    double h = (b-a)/N;
    double h1 = h/3.0;
    double sum = f(a) + f(b);

    for (j=3*N-1; j>0; j--) {
        l1 = (j%3)? 3.0 : 2.0;
        sum += l1*f(a+h1*j) ;
    }
    return h*sum/8.0;
}

#define A 12
double Gamma_Spouge( double z )
{
    int k;
    static double cspace[A];
    static double *coefs = NULL;
    double accum;
    double a = A;

    if (!coefs) {
        double k1_factrl = 1.0;
        coefs = cspace;
        coefs[0] = sqrt(2.0*M_PI);
        for(k=1; k<A; k++) {
            coefs[k] = exp(a-k) * pow(a-k,k-0.5) / k1_factrl;
            k1_factrl *= -k;
        }
    }

    accum = coefs[0];
    for (k=1; k<A; k++) {
        accum += coefs[k]/(z+k);
    }
    accum *= exp(-(z+a)) * pow(z+a, z+0.5);
    return accum/z;
}

double aa1;
double f0( double t)
{
    return  pow(t, aa1)*exp(-t); 
}

double GammaIncomplete_Q( double a, double x)
{
    double y, h = 1.5e-2;  /* approximate integration step size */

    /* this cuts off the tail of the integration to speed things up */
    y = aa1 = a-1;
    while((f0(y) * (x-y) > 2.0e-8) && (y < x))   y += .4;
    if (y>x) y=x;

    return 1.0 - Simpson3_8( &f0, 0, y, (int)(y/h))/Gamma_Spouge(a);
}


double chi2UniformDistance( double *ds, int dslen)
{
    double expected = 0.0;
    double sum = 0.0;
    int k;

    for (k=0; k<dslen; k++) 
        expected += ds[k];
    expected /= dslen;   /* mean count = expected count per category */

    for (k=0; k<dslen; k++) {
        double x = ds[k] - expected;
        sum += x*x;
    }
    return sum/expected;
}

double chi2Probability( int dof, double distance)
{
    return GammaIncomplete_Q( 0.5*dof, 0.5*distance);
}
    
int chiIsUniform( double *dset, int dslen, double significance)
{
    int dof = dslen -1;
    double dist = chi2UniformDistance( dset, dslen);
    return chi2Probability( dof, dist ) > significance;
}


int main(int argc, char **argv)
{
    double dset1[] = { 199809., 200665., 199607., 200270., 199649. };
    double dset2[] = { 522573., 244456., 139979.,  71531.,  21461. };
    double *dsets[] = { dset1, dset2 };
    int     dslens[] = { 5, 5 };
    int k, l;
    double  dist, prob;
    int dof;

    for (k=0; k<2; k++) {
        printf("Dataset: [ ");
        for(l=0;l<dslens[k]; l++) 
            printf("%.0f, ", dsets[k][l]);
            
        printf("]\n");
        dist = chi2UniformDistance(dsets[k], dslens[k]);
        dof = dslens[k]-1;
        printf("dof: %d  distance: %.4f", dof, dist);
        prob = chi2Probability( dof, dist );
        printf(" probability: %.6f", prob);
        printf(" uniform? %s\n", chiIsUniform(dsets[k], dslens[k], 0.05)? "Yes":"No");
    }
    return 0;
}


  
