How to resolve the algorithm Statistics/Basic step by step in the Go programming language
Problem Statement
Statistics is all about large groups of numbers.
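When talking about a set of sampled data, the most frequently used measures are the mean value and the standard deviation (stddev).
If you have a set of data $x_i$ where $i = 1, 2, \ldots, n$, the mean is $\bar{x} \equiv \frac{1}{n}\sum_{i} x_i$, while the stddev is $\sigma \equiv \sqrt{\frac{1}{n}\sum_{i} (x_i - \bar{x})^2}$.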
When examining a large quantity of data, one often uses a histogram, which shows the counts of data samples falling into a prechosen set of intervals (or bins).
When plotted, often as bar graphs, it visually indicates how often each data value occurs.
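The binning step described above can be sketched as follows. This is a minimal illustration, not the solution's own code: the helper names binIndex and histogram and the sample values are made up for the example, and the clamp for x == 1.0 is an assumption that only matters if a closed range [0, 1] is used.

```go
package main

import (
	"fmt"
	"strings"
)

// binIndex maps x, assumed to lie in [0, 1], to one of `bins` equal-width
// intervals. The clamp handles x == 1.0, which would otherwise yield the
// out-of-range index `bins`.
func binIndex(x float64, bins int) int {
	i := int(x * float64(bins))
	if i >= bins {
		i = bins - 1
	}
	return i
}

// histogram counts how many samples fall into each bin.
func histogram(data []float64, bins int) []int {
	h := make([]int, bins)
	for _, x := range data {
		h[binIndex(x, bins)]++
	}
	return h
}

func main() {
	// Hand-picked illustrative values, not real measurements.
	data := []float64{0.05, 0.15, 0.18, 0.5, 0.55, 0.58, 0.95, 1.0}
	for i, c := range histogram(data, 10) {
		// One row per bin: the interval, then one asterisk per sample.
		fmt.Printf("%.1f-%.1f %s\n", float64(i)/10, float64(i+1)/10, strings.Repeat("*", c))
	}
}
```

With only eight values, one asterisk per sample is readable; for larger samples the row length would need scaling, as the solution further down does.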
Task
Using your language's random number routine, generate real numbers in the range [0, 1]. It doesn't matter whether you choose an open or a closed range.
Create 100 such numbers (i.e. sample size 100) and calculate their mean and stddev.
Do the same for sample sizes of 1,000 and 10,000, or even higher if you feel like it.
Show a histogram of any of these sets.
Do you notice some patterns about the standard deviation?
Extra
Sometimes so much data needs to be processed that it's impossible to keep all of it in memory at once. Can you calculate the mean, stddev and histogram of a trillion numbers? (You don't really need to do a trillion numbers, just show how it can be done.)
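For a finite population with equal probabilities at all points, one can derive:
$$\overline{(x - \bar{x})^2} = \overline{x^2} - \bar{x}^2$$
Or, more verbosely:
$$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2\right) - \bar{x}^2$$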
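The hint rests on the identity that the mean squared deviation equals the mean of the squares minus the square of the mean. The sketch below checks it on a small hand-picked data set by computing the population stddev both ways. The function names stddevTwoPass and stddevOnePass are illustrative and not part of the task.

```go
package main

import (
	"fmt"
	"math"
)

// stddevTwoPass follows the definition: first the mean, then the mean
// squared deviation from it. It has to read the whole data set twice.
func stddevTwoPass(d []float64) float64 {
	var sum float64
	for _, x := range d {
		sum += x
	}
	mean := sum / float64(len(d))
	var dev float64
	for _, x := range d {
		dev += (x - mean) * (x - mean)
	}
	return math.Sqrt(dev / float64(len(d)))
}

// stddevOnePass uses mean(x^2) - mean(x)^2, so it needs only two running
// totals and can process the data as a stream.
func stddevOnePass(d []float64) float64 {
	var sum, ssq float64
	for _, x := range d {
		sum += x
		ssq += x * x
	}
	n := float64(len(d))
	mean := sum / n
	return math.Sqrt(ssq/n - mean*mean)
}

func main() {
	// Illustrative values: mean 5, population stddev 2.
	d := []float64{2, 4, 4, 4, 5, 5, 7, 9}
	fmt.Println(stddevTwoPass(d), stddevOnePass(d)) // prints "2 2"
}
```

The one-pass form is what makes the "trillion numbers" case feasible, since nothing but the totals has to be kept. One caveat: when the mean is large compared with the spread, subtracting two nearly equal numbers can lose precision. That is not a concern for values in [0, 1].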
See also: Statistics/Normal distribution
Let's start with the solution:
Step-by-step solution for the algorithm Statistics/Basic in the Go programming language
First code: Sample data analysis
This Go program demonstrates how to perform statistical analysis on a sample of data. Its sample() function takes an integer n, generates n random floating-point numbers, and then calculates the mean (average), the standard deviation, and a histogram of the data. Here is a breakdown of the code:
- main() function:
  - Calls the sample() function three times with different values of n to analyze samples of size 100, 1000, and 10000.
- sample() function:
  - Generates a slice of n random floating-point numbers.
  - Computes the sum and the sum of squares of the data.
  - Calculates the mean (m) and the standard deviation (math.Sqrt(ssq/float64(n)-m*m)).
  - Creates a histogram (h) with 10 bins, where each bin represents a range of 0.1 (e.g., bin 0 is 0-0.1, bin 1 is 0.1-0.2, etc.).
  - Prints the histogram as rows of asterisks (*), with the number of asterisks in a row proportional to the count of data points in that bin.
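As a reference point for the Stddev values that sample() prints, the expected mean and standard deviation can be worked out directly. This assumes an ideal uniform distribution on [0, 1], which the pseudo-random output of rand.Float64 only approximates:

```latex
\begin{aligned}
E[x]     &= \int_0^1 x \, dx = \tfrac{1}{2}, \\
E[x^2]   &= \int_0^1 x^2 \, dx = \tfrac{1}{3}, \\
\sigma^2 &= E[x^2] - E[x]^2 = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}, \\
\sigma   &= \tfrac{1}{\sqrt{12}} \approx 0.2887.
\end{aligned}
```

Under that assumption, the printed stddev should settle near 0.2887 as n grows rather than shrink toward zero. This is the pattern the task asks about.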
Second code: Big sample data analysis using MapReduce
This Go program demonstrates how to structure the same analysis in a MapReduce style for a very large dataset (10 million elements). The main idea is to divide the dataset into smaller chunks, process each chunk independently, and then combine the partial results to obtain the final statistics. As written, the recursion processes the chunks one after another; because the chunks are independent, they could also be handed to parallel workers. Here is a breakdown of the code:
- main() function:
  - Calls the bigSample() function with n set to 10 million.
- bigSample() function:
  - Obtains the sum, sum of squares, and histogram by invoking the reduce() function with start and end values representing the range of data to process, then prints the mean, standard deviation, and histogram as in the first program.
- reduce() function:
  - If the size of the range (n = end - start) is less than a predefined threshold (threshold is set to 1 million), it calls getSegment() to fetch the data and computeSegment() to calculate the sum, sum of squares, and histogram for that segment.
  - Otherwise, it splits the range in half, calls itself recursively on each half, and combines the two results by adding the sums, the sums of squares, and the histogram counts bin by bin.
- getSegment() function:
  - Returns a segment of the data (a slice of floating-point numbers) for the given range. In this program it simply generates end-start random numbers, standing in for a read from a real data source.
- computeSegment() function:
  - Computes the sum, sum of squares, and histogram for a segment of the data.
The MapReduce approach keeps only one segment of fewer than 1 million numbers in memory at a time. Because the partial results combine by simple addition, segments can be processed in any order or on separate workers, which makes it feasible to analyze massive amounts of data.
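The reduce() recursion in the source listing runs sequentially. Since each segment's totals are independent, one way the work might be spread across goroutines is sketched below. This is an illustration under stated assumptions, not the original solution: the partial type, the chunkStats and parallelStats functions, the per-worker seeds, and the worker count are all hypothetical choices made for the example.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sync"
)

// partial holds the running totals for one chunk of data.
type partial struct {
	n        int64
	sum, ssq float64
	hist     [10]int64
}

// merge adds the totals of q to p and returns the combined result.
func (p partial) merge(q partial) partial {
	p.n += q.n
	p.sum += q.sum
	p.ssq += q.ssq
	for i := range p.hist {
		p.hist[i] += q.hist[i]
	}
	return p
}

// chunkStats draws n values from rng and accumulates them one at a time,
// so no slice of samples is ever stored.
func chunkStats(n int64, rng *rand.Rand) partial {
	var p partial
	for i := int64(0); i < n; i++ {
		x := rng.Float64() // in [0, 1), so int(x*10) is in 0..9
		p.n++
		p.sum += x
		p.ssq += x * x
		p.hist[int(x*10)]++
	}
	return p
}

// parallelStats splits total samples across workers goroutines (workers
// must be at least 1) and merges the per-worker totals.
func parallelStats(total int64, workers int) partial {
	results := make([]partial, workers)
	per := total / int64(workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		n := per
		if w == workers-1 {
			n = total - per*int64(workers-1) // last worker takes the remainder
		}
		wg.Add(1)
		go func(w int, n int64) {
			defer wg.Done()
			// Each worker gets its own generator; the seed is arbitrary.
			rng := rand.New(rand.NewSource(int64(w) + 1))
			results[w] = chunkStats(n, rng) // each goroutine writes its own slot
		}(w, n)
	}
	wg.Wait()
	var all partial
	for _, r := range results {
		all = all.merge(r)
	}
	return all
}

func main() {
	p := parallelStats(1000000, 4)
	m := p.sum / float64(p.n)
	fmt.Println(p.n, "numbers")
	fmt.Println("Mean:  ", m)
	fmt.Println("Stddev:", math.Sqrt(p.ssq/float64(p.n)-m*m))
}
```

Giving every goroutine its own rand.Rand avoids contention on a shared generator, and writing to distinct slice elements means no mutex is needed for the results.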
Source code in the Go programming language
// First program: statistics for samples of 100, 1,000, and 10,000 numbers.
package main

import (
	"fmt"
	"math"
	"math/rand"
	"strings"
)

func main() {
	sample(100)
	sample(1000)
	sample(10000)
}

func sample(n int) {
	// generate data
	d := make([]float64, n)
	for i := range d {
		d[i] = rand.Float64()
	}
	// show mean, standard deviation
	var sum, ssq float64
	for _, s := range d {
		sum += s
		ssq += s * s
	}
	fmt.Println(n, "numbers")
	m := sum / float64(n)
	fmt.Println("Mean: ", m)
	fmt.Println("Stddev:", math.Sqrt(ssq/float64(n)-m*m))
	// show histogram: 10 bins of width 0.1; rand.Float64 returns values
	// in [0, 1), so int(s*10) is always in 0..9
	h := make([]int, 10)
	for _, s := range d {
		h[int(s*10)]++
	}
	// scale rows so that a bin holding 10% of the sample prints 20 asterisks
	for _, c := range h {
		fmt.Println(strings.Repeat("*", c*205/int(n)))
	}
	fmt.Println()
}
// Second program: the same statistics for 10 million numbers,
// computed segment by segment.
package main

import (
	"fmt"
	"math"
	"math/rand"
	"strings"
)

func main() {
	bigSample(1e7)
}

func bigSample(n int64) {
	sum, ssq, h := reduce(0, n)
	// compute final statistics and output as above
	fmt.Println(n, "numbers")
	m := sum / float64(n)
	fmt.Println("Mean: ", m)
	fmt.Println("Stddev:", math.Sqrt(ssq/float64(n)-m*m))
	for _, c := range h {
		fmt.Println(strings.Repeat("*", c*205/int(n)))
	}
	fmt.Println()
}

// ranges smaller than threshold are processed directly
const threshold = 1e6

func reduce(start, end int64) (sum, ssq float64, h []int) {
	n := end - start
	if n < threshold {
		d := getSegment(start, end)
		return computeSegment(d)
	}
	// map to two sub problems
	half := (start + end) / 2
	sum1, ssq1, h1 := reduce(start, half)
	sum2, ssq2, h2 := reduce(half, end)
	// combine results
	for i, c := range h2 {
		h1[i] += c
	}
	return sum1 + sum2, ssq1 + ssq2, h1
}

// getSegment stands in for reading a segment from a data source;
// here it generates end-start random numbers.
func getSegment(start, end int64) []float64 {
	d := make([]float64, end-start)
	for i := range d {
		d[i] = rand.Float64()
	}
	return d
}

func computeSegment(d []float64) (sum, ssq float64, h []int) {
	for _, s := range d {
		sum += s
		ssq += s * s
	}
	h = make([]int, 10)
	for _, s := range d {
		h[int(s*10)]++
	}
	return
}