How to resolve the algorithm Huffman coding step by step in the C# programming language
How to resolve the algorithm Huffman coding step by step in the C# programming language
Table of Contents
Problem Statement
Huffman encoding is a way to assign binary codes to symbols that reduces the overall number of bits used to encode a typical string of those symbols. For example, if you use letters as symbols and have details of the frequency of occurrence of those letters in typical strings, then you could just encode each letter with a fixed number of bits, such as in ASCII codes. You can do better than this by encoding more frequently occurring letters such as e and a, with smaller bit strings; and less frequently occurring letters such as q and x with longer bit strings. Any string of letters will be encoded as a string of bits that are no-longer of the same length per letter. To successfully decode such as string, the smaller codes assigned to letters such as 'e' cannot occur as a prefix in the larger codes such as that for 'x'. The Huffman coding scheme takes each symbol and its weight (or frequency of occurrence), and generates proper encodings for each symbol taking account of the weights of each symbol, so that higher weighted symbols have fewer bits in their encoding. (See the WP article for more information). A Huffman encoding can be computed by first creating a tree of nodes:
Traverse the constructed binary tree from root to leaves assigning and accumulating a '0' for one branch and a '1' for the other at each node. The accumulated zeros and ones at each leaf constitute a Huffman encoding for those symbols and weights:
Using the characters and their frequency from the string: create a program to generate a Huffman encoding for each character as a table.
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Huffman coding step by step in the C# programming language
1. The first part of the code implements a priority queue, which is a data structure that maintains a sorted list of elements.
In this case the priority queue is generic and can contain elements of any type T
that implements the IComparable
interface.
The priority queue has an internal list LstHeap
which stores the elements, and the heap property is maintained by always keeping the first element of the list as the smallest element.
The Count
property returns the number of elements in the priority queue, and the Add
method adds an element to the priority queue and maintains the heap property by calling UpHeap
method.
The Peek
method returns the smallest element in the priority queue, and the Pop
method removes and returns the smallest element from the priority queue.
2. The second part of the code implements a binary tree node used in the Huffman algorithm.
Each node has a probability, a value, a left and right son, and a parent node.
It keeps track of whether it's a leaf node and whether it is a left or a right son.
The CompareTo
method compares the probabilities of two nodes and returns a negative number if the first node has a smaller probability than the second node, a positive number if the first node has a larger probability than the second node, and 0 if the probabilities are equal.
3. The third part of the code implements the Huffman algorithm to encode and decode data.
The algorithm works by creating a frequency table of the characters in the data, and then using this frequency table to create a binary tree.
The binary tree is then used to encode the data into a string of bits, and the string of bits can be used to decode the data.
The Encode
method takes a value and returns a list of integers representing the encoded value.
The Decode
method takes a list of integers and returns the decoded value.
4. The fourth part of the code implements a program that uses the Huffman algorithm to encode and decode a string of characters.
The program first creates a Huffman
object with the string of characters, and then calls the Encode
and Decode
methods to encode and decode the string.
The program then checks if the output string is equal to the input string and prints a message to the console.
Source code in the csharp programming language
using System;
using System.Collections.Generic;
namespace Huffman_Encoding
{
public class PriorityQueue<T> where T : IComparable
{
protected List<T> LstHeap = new List<T>();
public virtual int Count
{
get { return LstHeap.Count; }
}
public virtual void Add(T val)
{
LstHeap.Add(val);
SetAt(LstHeap.Count - 1, val);
UpHeap(LstHeap.Count - 1);
}
public virtual T Peek()
{
if (LstHeap.Count == 0)
{
throw new IndexOutOfRangeException("Peeking at an empty priority queue");
}
return LstHeap[0];
}
public virtual T Pop()
{
if (LstHeap.Count == 0)
{
throw new IndexOutOfRangeException("Popping an empty priority queue");
}
T valRet = LstHeap[0];
SetAt(0, LstHeap[LstHeap.Count - 1]);
LstHeap.RemoveAt(LstHeap.Count - 1);
DownHeap(0);
return valRet;
}
protected virtual void SetAt(int i, T val)
{
LstHeap[i] = val;
}
protected bool RightSonExists(int i)
{
return RightChildIndex(i) < LstHeap.Count;
}
protected bool LeftSonExists(int i)
{
return LeftChildIndex(i) < LstHeap.Count;
}
protected int ParentIndex(int i)
{
return (i - 1) / 2;
}
protected int LeftChildIndex(int i)
{
return 2 * i + 1;
}
protected int RightChildIndex(int i)
{
return 2 * (i + 1);
}
protected T ArrayVal(int i)
{
return LstHeap[i];
}
protected T Parent(int i)
{
return LstHeap[ParentIndex(i)];
}
protected T Left(int i)
{
return LstHeap[LeftChildIndex(i)];
}
protected T Right(int i)
{
return LstHeap[RightChildIndex(i)];
}
protected void Swap(int i, int j)
{
T valHold = ArrayVal(i);
SetAt(i, LstHeap[j]);
SetAt(j, valHold);
}
protected void UpHeap(int i)
{
while (i > 0 && ArrayVal(i).CompareTo(Parent(i)) > 0)
{
Swap(i, ParentIndex(i));
i = ParentIndex(i);
}
}
protected void DownHeap(int i)
{
while (i >= 0)
{
int iContinue = -1;
if (RightSonExists(i) && Right(i).CompareTo(ArrayVal(i)) > 0)
{
iContinue = Left(i).CompareTo(Right(i)) < 0 ? RightChildIndex(i) : LeftChildIndex(i);
}
else if (LeftSonExists(i) && Left(i).CompareTo(ArrayVal(i)) > 0)
{
iContinue = LeftChildIndex(i);
}
if (iContinue >= 0 && iContinue < LstHeap.Count)
{
Swap(i, iContinue);
}
i = iContinue;
}
}
}
internal class HuffmanNode<T> : IComparable
{
internal HuffmanNode(double probability, T value)
{
Probability = probability;
LeftSon = RightSon = Parent = null;
Value = value;
IsLeaf = true;
}
internal HuffmanNode(HuffmanNode<T> leftSon, HuffmanNode<T> rightSon)
{
LeftSon = leftSon;
RightSon = rightSon;
Probability = leftSon.Probability + rightSon.Probability;
leftSon.IsZero = true;
rightSon.IsZero = false;
leftSon.Parent = rightSon.Parent = this;
IsLeaf = false;
}
internal HuffmanNode<T> LeftSon { get; set; }
internal HuffmanNode<T> RightSon { get; set; }
internal HuffmanNode<T> Parent { get; set; }
internal T Value { get; set; }
internal bool IsLeaf { get; set; }
internal bool IsZero { get; set; }
internal int Bit
{
get { return IsZero ? 0 : 1; }
}
internal bool IsRoot
{
get { return Parent == null; }
}
internal double Probability { get; set; }
public int CompareTo(object obj)
{
return -Probability.CompareTo(((HuffmanNode<T>) obj).Probability);
}
}
public class Huffman<T> where T : IComparable
{
private readonly Dictionary<T, HuffmanNode<T>> _leafDictionary = new Dictionary<T, HuffmanNode<T>>();
private readonly HuffmanNode<T> _root;
public Huffman(IEnumerable<T> values)
{
var counts = new Dictionary<T, int>();
var priorityQueue = new PriorityQueue<HuffmanNode<T>>();
int valueCount = 0;
foreach (T value in values)
{
if (!counts.ContainsKey(value))
{
counts[value] = 0;
}
counts[value]++;
valueCount++;
}
foreach (T value in counts.Keys)
{
var node = new HuffmanNode<T>((double) counts[value] / valueCount, value);
priorityQueue.Add(node);
_leafDictionary[value] = node;
}
while (priorityQueue.Count > 1)
{
HuffmanNode<T> leftSon = priorityQueue.Pop();
HuffmanNode<T> rightSon = priorityQueue.Pop();
var parent = new HuffmanNode<T>(leftSon, rightSon);
priorityQueue.Add(parent);
}
_root = priorityQueue.Pop();
_root.IsZero = false;
}
public List<int> Encode(T value)
{
var returnValue = new List<int>();
Encode(value, returnValue);
return returnValue;
}
public void Encode(T value, List<int> encoding)
{
if (!_leafDictionary.ContainsKey(value))
{
throw new ArgumentException("Invalid value in Encode");
}
HuffmanNode<T> nodeCur = _leafDictionary[value];
var reverseEncoding = new List<int>();
while (!nodeCur.IsRoot)
{
reverseEncoding.Add(nodeCur.Bit);
nodeCur = nodeCur.Parent;
}
reverseEncoding.Reverse();
encoding.AddRange(reverseEncoding);
}
public List<int> Encode(IEnumerable<T> values)
{
var returnValue = new List<int>();
foreach (T value in values)
{
Encode(value, returnValue);
}
return returnValue;
}
public T Decode(List<int> bitString, ref int position)
{
HuffmanNode<T> nodeCur = _root;
while (!nodeCur.IsLeaf)
{
if (position > bitString.Count)
{
throw new ArgumentException("Invalid bitstring in Decode");
}
nodeCur = bitString[position++] == 0 ? nodeCur.LeftSon : nodeCur.RightSon;
}
return nodeCur.Value;
}
public List<T> Decode(List<int> bitString)
{
int position = 0;
var returnValue = new List<T>();
while (position != bitString.Count)
{
returnValue.Add(Decode(bitString, ref position));
}
return returnValue;
}
}
internal class Program
{
private const string Example = "this is an example for huffman encoding";
private static void Main()
{
var huffman = new Huffman<char>(Example);
List<int> encoding = huffman.Encode(Example);
List<char> decoding = huffman.Decode(encoding);
var outString = new string(decoding.ToArray());
Console.WriteLine(outString == Example ? "Encoding/decoding worked" : "Encoding/Decoding failed");
var chars = new HashSet<char>(Example);
foreach (char c in chars)
{
encoding = huffman.Encode(c);
Console.Write("{0}: ", c);
foreach (int bit in encoding)
{
Console.Write("{0}", bit);
}
Console.WriteLine();
}
Console.ReadKey();
}
}
}
You may also check:How to resolve the algorithm Find the missing permutation step by step in the Ursala programming language
You may also check:How to resolve the algorithm Luhn test of credit card numbers step by step in the MiniScript programming language
You may also check:How to resolve the algorithm Loops/Break step by step in the Phix programming language
You may also check:How to resolve the algorithm Stem-and-leaf plot step by step in the Maxima programming language
You may also check:How to resolve the algorithm Reverse words in a string step by step in the BASIC256 programming language