How to resolve the algorithm Statistics/Basic step by step in the Go programming language

Published on 12 May 2024 09:40 PM
#Go

Problem Statement

Statistics is all about large groups of numbers.
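When talking about a set of sampled data, the most frequently used statistics are the mean value and the standard deviation (stddev).
If you have a set of data $x_i$ where $i = 1, 2, \ldots, n$, the mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, while the stddev is $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$.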

When examining a large quantity of data, one often uses a histogram, which shows the counts of data samples falling into a prechosen set of intervals (or bins).
When plotted, often as a bar graph, it visually indicates how often each data value occurs.

Task

Using your language's random number routine, generate real numbers in the range [0, 1]. It doesn't matter whether you choose an open or closed range.
Create 100 such numbers (i.e. sample size 100) and calculate their mean and stddev.
Do the same for sample sizes of 1,000 and 10,000, or even higher if you feel like it.
Show a histogram of any of these sets.
Do you notice any patterns in the standard deviation?

Extra

Sometimes so much data needs to be processed that it's impossible to keep all of it in memory at once. Can you calculate the mean, stddev and histogram of a trillion numbers? (You don't really need to process a trillion numbers, just show how it can be done.)

For a finite population with equal probabilities at all points, one can derive:

$\overline{(x - \bar{x})^2} = \overline{x^2} - \bar{x}^2$

Or, more verbosely:

$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2\right) - \bar{x}^2$

See also: Statistics/Normal distribution
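
The stddev can be computed either from its definition (the root of the mean squared deviation from the mean) or from the identity that the mean squared deviation equals the mean of the squares minus the square of the mean. The following is a minimal sketch that checks the two forms against each other; the function names stddevTwoPass and stddevOnePass and the sample values are illustrative assumptions, not part of the original solution.

```go
package main

import (
	"fmt"
	"math"
)

// stddevTwoPass computes the population standard deviation from the
// definition: the square root of the mean squared deviation from the mean.
func stddevTwoPass(d []float64) float64 {
	var sum float64
	for _, x := range d {
		sum += x
	}
	mean := sum / float64(len(d))
	var sq float64
	for _, x := range d {
		sq += (x - mean) * (x - mean)
	}
	return math.Sqrt(sq / float64(len(d)))
}

// stddevOnePass uses the identity instead: mean of the squares minus
// the square of the mean. It needs only two running sums.
func stddevOnePass(d []float64) float64 {
	var sum, ssq float64
	for _, x := range d {
		sum += x
		ssq += x * x
	}
	n := float64(len(d))
	mean := sum / n
	return math.Sqrt(ssq/n - mean*mean)
}

func main() {
	// Illustrative data: mean 5, population stddev 2.
	d := []float64{2, 4, 4, 4, 5, 5, 7, 9}
	fmt.Println(stddevTwoPass(d)) // prints 2
	fmt.Println(stddevOnePass(d)) // prints 2
}
```

The one-pass form is the one that matters for very large samples, because it never needs the data a second time. Its known weakness is loss of precision when the mean is large compared with the spread, which is not a concern for values in [0, 1].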

Let's start with the solution:

Step-by-step solution for Statistics/Basic in the Go programming language

First code: Sample data analysis

This Go program demonstrates how to perform statistical analysis on a sample of data. Its sample() function takes a sample size n, generates n random floating-point numbers in [0, 1), and then calculates and prints the mean (average), the standard deviation, and a histogram of the data. Here is a breakdown of the code:

  1. main() function:

    • Calls the sample() function three times with different values of n to analyze samples of size 100, 1000, and 10000.
  2. sample() function:

    • Generates a slice of n random floating-point numbers.
    • Computes the sum and sum of squares of the data.
    • Calculates the mean (m) and standard deviation (math.Sqrt(ssq/float64(n)-m*m)).
    • Creates a histogram (h) with 10 bins, where each bin represents a range of 0.1 (e.g., bin 0 is 0-0.1, bin 1 is 0.1-0.2, etc.).
    • The histogram is printed as one row of asterisks (*) per bin. Each row has c*205/n asterisks, where c is the bin count, so the row length is proportional to the count and an evenly filled bin prints about 20 asterisks.
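
The binning step can be sketched in isolation. This is a minimal illustration, not the article's code: the function name histogram, the fixed sample values, and the clamp for a value of exactly 1.0 are assumptions added here. The clamp only matters if the data comes from a source with a closed range; rand.Float64 returns values in [0, 1) and never produces 1.0. Values outside [0, 1] are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// histogram counts values into `bins` equal-width bins covering [0, 1].
// A value of exactly 1.0 would compute index `bins`, which is out of
// range, so it is clamped into the last bin (an illustrative guard).
func histogram(d []float64, bins int) []int {
	h := make([]int, bins)
	for _, x := range d {
		i := int(x * float64(bins))
		if i >= bins {
			i = bins - 1
		}
		h[i]++
	}
	return h
}

func main() {
	// Hypothetical sample values chosen to land in a few different bins.
	d := []float64{0.05, 0.15, 0.17, 0.5, 0.55, 0.59, 0.99, 1.0}
	for i, c := range histogram(d, 10) {
		lo := float64(i) / 10
		hi := float64(i+1) / 10
		fmt.Printf("%.1f-%.1f %s\n", lo, hi, strings.Repeat("*", c))
	}
}
```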

Second code: Big sample data analysis using MapReduce

This Go program demonstrates how to structure statistical analysis of a very large dataset (10 million elements) in the style of MapReduce. The main idea is to divide the dataset into smaller chunks, compute partial results for each chunk, and then combine them to obtain the final statistics. The code as written processes the chunks one after another; because the chunks are independent, they could also be processed in parallel. Here is a breakdown of the code:

  1. main() function:

    • Calls the bigSample() function with n set to 10 million.
  2. bigSample() function:

    • Computes the sum, sum of squares, and histogram by invoking the reduce() function with start and end values representing the range of data to process.
  3. reduce() function:

    • If the size of the range (n = end - start) is less than a predefined threshold (1 million), it calls getSegment() to fetch the data and computeSegment() to calculate the sum, sum of squares, and histogram for that segment.
    • Otherwise, it splits the range into two halves, calls itself recursively on each half, and combines the results by adding the sums and sums of squares and adding the histogram counts bin by bin.
  4. getSegment() function:

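    • Returns a segment of the data (a slice of floating-point numbers) for the given range. In this demonstration it stands in for fetching real data and simply generates end - start random numbers.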
  5. computeSegment() function:

    • Computes the sum, sum of squares, and histogram for a segment of the data.

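Because only one segment of fewer than one million values is held in memory at a time, and only the running sums and histogram counts are carried between segments, this approach scales to datasets far too large to store at once. The program shown runs the segments sequentially; since they are independent, they could be distributed across goroutines or machines to reduce the overall computation time.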
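
Since the segments are independent, their partial results can be computed concurrently and merged afterwards. The following is a minimal sketch of that idea using goroutines, not the article's program: the names partial, computeChunk, and parallelStats, the worker count, and the per-worker seeds are all assumptions made for illustration. Each worker uses its own random source and accumulates its totals without storing the values.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sync"
)

// partial holds the running totals for one chunk of the data.
type partial struct {
	sum, ssq float64
	hist     [10]int64
}

// computeChunk generates n values from its own seeded source and
// accumulates them one at a time, so no slice of data is kept.
func computeChunk(n, seed int64) partial {
	r := rand.New(rand.NewSource(seed))
	var p partial
	for i := int64(0); i < n; i++ {
		x := r.Float64()
		p.sum += x
		p.ssq += x * x
		p.hist[int(x*10)]++
	}
	return p
}

// parallelStats splits n values across `workers` goroutines and merges
// the partial sums and histograms once every goroutine has finished.
func parallelStats(n int64, workers int) (mean, stddev float64, hist [10]int64) {
	parts := make([]partial, workers)
	per := n / int64(workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		size := per
		if w == workers-1 {
			size = n - per*int64(workers-1) // last worker takes the remainder
		}
		wg.Add(1)
		go func(w int, size int64) {
			defer wg.Done()
			parts[w] = computeChunk(size, int64(w)+1) // each goroutine writes its own slot
		}(w, size)
	}
	wg.Wait()

	var sum, ssq float64
	for _, p := range parts {
		sum += p.sum
		ssq += p.ssq
		for i, c := range p.hist {
			hist[i] += c
		}
	}
	mean = sum / float64(n)
	stddev = math.Sqrt(ssq/float64(n) - mean*mean)
	return
}

func main() {
	mean, stddev, hist := parallelStats(1e6, 4)
	fmt.Println("Mean:  ", mean)
	fmt.Println("Stddev:", stddev)
	fmt.Println("Counts:", hist)
}
```

For uniformly distributed values in [0, 1), the mean should come out close to 0.5 and the stddev close to 1/sqrt(12), about 0.2887, with the bin counts roughly equal.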

Source code in the Go programming language

package main

import (
    "fmt"
    "math"
    "math/rand"
    "strings"
)

func main() {
    sample(100)
    sample(1000)
    sample(10000)
}

func sample(n int) {
    // generate data
    d := make([]float64, n)
    for i := range d {
        d[i] = rand.Float64()
    }
    // show mean, standard deviation
    var sum, ssq float64
    for _, s := range d {
        sum += s
        ssq += s * s
    }
    fmt.Println(n, "numbers")
    m := sum / float64(n)
    fmt.Println("Mean:  ", m)
    fmt.Println("Stddev:", math.Sqrt(ssq/float64(n)-m*m))
    // show histogram
    h := make([]int, 10)
    for _, s := range d {
        h[int(s*10)]++
    }
    for _, c := range h {
        // scale so that an evenly filled bin (n/10 values) prints about 20 asterisks
        fmt.Println(strings.Repeat("*", c*205/n))
    }
    fmt.Println()
}


package main

import (
    "fmt"
    "math"
    "math/rand"
    "strings"
)

func main() {
    bigSample(1e7)
}

func bigSample(n int64) {
    sum, ssq, h := reduce(0, n)
    // compute final statistics and output as above
    fmt.Println(n, "numbers")
    m := sum / float64(n)
    fmt.Println("Mean:  ", m)
    fmt.Println("Stddev:", math.Sqrt(ssq/float64(n)-m*m))
    for _, c := range h {
        fmt.Println(strings.Repeat("*", c*205/int(n)))
    }
    fmt.Println()
}

const threshold = 1e6

func reduce(start, end int64) (sum, ssq float64, h []int) {
    n := end - start
    if n < threshold {
        d := getSegment(start, end)
        return computeSegment(d)
    }
    // map to two sub problems
    half := (start + end) / 2
    sum1, ssq1, h1 := reduce(start, half)
    sum2, ssq2, h2 := reduce(half, end)
    // combine results
    for i, c := range h2 {
        h1[i] += c
    }
    return sum1 + sum2, ssq1 + ssq2, h1
}

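// getSegment stands in for fetching one segment of a large data source;
// here it simply generates end-start random values.
func getSegment(start, end int64) []float64 {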
    d := make([]float64, end-start)
    for i := range d {
        d[i] = rand.Float64()
    }
    return d
}

func computeSegment(d []float64) (sum, ssq float64, h []int) {
    for _, s := range d {
        sum += s
        ssq += s * s
    }
    h = make([]int, 10)
    for _, s := range d {
        h[int(s*10)]++
    }
    return
}


  
