How to resolve the algorithm Huffman coding step by step in the Julia programming language

Published on 22 June 2024 08:30 PM

How to resolve the algorithm Huffman coding step by step in the Julia programming language

Table of Contents

Problem Statement

Huffman encoding is a way to assign binary codes to symbols that reduces the overall number of bits used to encode a typical string of those symbols. For example, if you use letters as symbols and have details of the frequency of occurrence of those letters in typical strings, then you could just encode each letter with a fixed number of bits, such as in ASCII codes. You can do better than this by encoding more frequently occurring letters such as e and a, with smaller bit strings; and less frequently occurring letters such as q and x with longer bit strings. Any string of letters will be encoded as a string of bits that are no-longer of the same length per letter. To successfully decode such as string, the smaller codes assigned to letters such as 'e' cannot occur as a prefix in the larger codes such as that for 'x'. The Huffman coding scheme takes each symbol and its weight (or frequency of occurrence), and generates proper encodings for each symbol taking account of the weights of each symbol, so that higher weighted symbols have fewer bits in their encoding. (See the WP article for more information). A Huffman encoding can be computed by first creating a tree of nodes:

Traverse the constructed binary tree from root to leaves assigning and accumulating a '0' for one branch and a '1' for the other at each node. The accumulated zeros and ones at each leaf constitute a Huffman encoding for those symbols and weights:

Using the characters and their frequency from the string: create a program to generate a Huffman encoding for each character as a table.

Let's start with the solution:

Step by Step solution about How to resolve the algorithm Huffman coding step by step in the Julia programming language

Abstract Type Definition:

abstract type HuffmanTree end

This code defines an abstract type called HuffmanTree, which serves as the base type for all Huffman trees. Abstract types in Julia allow for defining common interfaces that can be shared by multiple concrete types.

Concrete Types for Huffman Trees:

struct HuffmanLeaf <: HuffmanTree
   ch::Char
   freq::Int
end

struct HuffmanNode <: HuffmanTree
   freq::Int
   left::HuffmanTree
   right::HuffmanTree
end

Two concrete subtypes of the HuffmanTree abstract type are defined:

  • HuffmanLeaf represents a leaf node in the Huffman tree. It contains a character (ch) and its frequency (freq).
  • HuffmanNode represents an internal node in the Huffman tree. It contains the combined frequency of its left and right subtrees (freq) and references to those subtrees (left and right).

Function to Generate a Frequency Dictionary:

function makefreqdict(s::String)
   d = Dict{Char, Int}()
   for c in s
       if !haskey(d, c)
           d[c] = 1
       else
           d[c] += 1
       end
   end
   d
end

This function takes a string s and returns a dictionary d where each key is a character from the input string, and the corresponding value is the frequency of that character in the string.

Function to Build a Huffman Tree:

function huffmantree(ftable::Dict)
   trees::Vector{HuffmanTree} = [HuffmanLeaf(ch, fq) for (ch, fq) in ftable]
   while length(trees) > 1
       sort!(trees, lt = (x, y) -> x.freq < y.freq, rev = true)
       least = pop!(trees)
       nextleast = pop!(trees)
       push!(trees, HuffmanNode(least.freq + nextleast.freq, least, nextleast))
   end
   trees[1]
end

This function takes a frequency dictionary ftable as input and returns the root node of the constructed Huffman tree. It follows a greedy algorithm:

  • It starts with a vector trees containing HuffmanLeaf nodes for each character-frequency pair in the dictionary.
  • While trees contains more than one tree, it sorts the trees in ascending order of frequency (reversed) using the sort! function.
  • It pops the two trees with the lowest frequencies from trees and combines them into a HuffmanNode with a frequency equal to the sum of the two frequencies.
  • This new node is pushed back into trees.
  • The process continues until only one tree remains, which is the root node of the Huffman tree.

Function to Print Huffman Codes:

printencoding(lf::HuffmanLeaf, code) = println(lf.ch == ' ' ? "space" : lf.ch, "\t", lf.freq, "\t", code)

function printencoding(nd::HuffmanNode, code)
   code *= '0'
   printencoding(nd.left, code)
   code = code[1:end-1]

   code *= '1'
   printencoding(nd.right, code)
   code = code[1:end-1]
end

This function prints the Huffman codes for each character in the input text. It is called recursively on the Huffman tree, with code representing the prefix code for the current subtree.

For HuffmanLeaf nodes, it prints the character, its frequency, and the prefix code. For HuffmanNode nodes, it calls printencoding on the left and right subtrees, appending '0' and '1' to the prefix code for those subtrees, respectively.

This function prints the following information:

  • Character
  • Frequency
  • Huffman code

Usage:

The code is used as follows:

const msg = "this is an example for huffman encoding"

println("Char\tFreq\tHuffman code")

printencoding(huffmantree(makefreqdict(msg)), "")

This creates a Huffman tree for the given input text, and then prints the character, its frequency, and the Huffman code for each character in the text. The output will be similar to:

Char    Freq    Huffman code
space   5       0
t       6       111
h       6       110
e       4       101
a       4       1001
n       3       1000
x       1       011
s       1       0101
o       1       0100
i       1       0011
r       1       0010
f       1       0001
g       1       0000

Source code in the julia programming language

abstract type HuffmanTree end

struct HuffmanLeaf <: HuffmanTree
    ch::Char
    freq::Int
end

struct HuffmanNode <: HuffmanTree
    freq::Int
    left::HuffmanTree
    right::HuffmanTree
end

function makefreqdict(s::String)
    d = Dict{Char, Int}()
    for c in s
        if !haskey(d, c)
            d[c] = 1
        else
            d[c] += 1
        end
    end
    d
end

function huffmantree(ftable::Dict)
    trees::Vector{HuffmanTree} = [HuffmanLeaf(ch, fq) for (ch, fq) in ftable]
    while length(trees) > 1
        sort!(trees, lt = (x, y) -> x.freq < y.freq, rev = true)
        least = pop!(trees)
        nextleast = pop!(trees)
        push!(trees, HuffmanNode(least.freq + nextleast.freq, least, nextleast))
    end
    trees[1]
end

printencoding(lf::HuffmanLeaf, code) = println(lf.ch == ' ' ? "space" : lf.ch, "\t", lf.freq, "\t", code)

function printencoding(nd::HuffmanNode, code)
    code *= '0'
    printencoding(nd.left, code)
    code = code[1:end-1]

    code *= '1'
    printencoding(nd.right, code)
    code = code[1:end-1]
end

const msg = "this is an example for huffman encoding"

println("Char\tFreq\tHuffman code")

printencoding(huffmantree(makefreqdict(msg)), "")


  

You may also check:How to resolve the algorithm CSV to HTML translation step by step in the Icon and Unicon programming language
You may also check:How to resolve the algorithm Integer comparison step by step in the friendly interactive shell programming language
You may also check:How to resolve the algorithm Stream merge step by step in the Java programming language
You may also check:How to resolve the algorithm Pythagoras tree step by step in the Dart programming language
You may also check:How to resolve the algorithm Array length step by step in the DataWeave programming language