How to resolve the algorithm Burrows–Wheeler transform step by step in the C++ programming language
How to resolve the algorithm Burrows–Wheeler transform step by step in the C++ programming language
Table of Contents
Problem Statement
The Burrows–Wheeler transform (BWT, also called block-sorting compression) rearranges a character string into runs of similar characters. This is useful for compression, since it tends to be easy to compress a string that has runs of repeated characters by techniques such as move-to-front transform and run-length encoding. More importantly, the transformation is reversible, without needing to store any additional data. The BWT is thus a "free" method of improving the efficiency of text compression algorithms, costing only some extra computation.
Source: Burrows–Wheeler transform
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Burrows–Wheeler transform step by step in the C++ programming language
The provided C++ code demonstrates the Burrows-Wheeler Transform (BWT) and its inverse, the Inverse Burrows-Wheeler Transform (IBWT), which are used for data compression. Here's a breakdown of the code:
-
Constants:
STX
: Represents the Start-of-Text (SOH) character, used as a special marker at the beginning of the input string.ETX
: Represents the End-of-Text (EOT) character, used as a special marker at the end of the input string.
-
rotate
Function:- Rotates the characters of a string
a
by one position to the right, moving the last character to the beginning.
- Rotates the characters of a string
-
bwt
Function:- Takes a string
s
as input and applies the BWT algorithm:- Inserts the
STX
andETX
characters at the beginning and end ofs
, respectively. - Creates a table of rotated strings by repeatedly rotating
ss
by one position. - Sorts the table lexicographically.
- Returns the last character of each sorted string, concatenated into a new string
out
.
- Inserts the
- Takes a string
-
ibwt
Function:- Takes a string
r
(the result of BWT) and applies the IBWT algorithm to recover the original string:- Initializes a table of strings of length
len(r)
. - For each character in
r
, rotates all the strings in the table to the left and appends the character to the end of the string. - Sorts the table.
- Identifies the row that ends with
ETX
, removes theSTX
andETX
characters, and returns the substring as the original string.
- Initializes a table of strings of length
- Takes a string
-
makePrintable
Function:- Converts special characters
STX
andETX
to printable characters '^' and '|' for display purposes.
- Converts special characters
-
Main Function:
- Defines test strings
tests
. - Iterates over the test strings and demonstrates the BWT and IBWT operations:
- Displays the original string in printable format.
- Applies the BWT and displays the transformed string.
- Applies the IBWT and verifies if the original string is recovered.
- Defines test strings
Overall, this code illustrates the BWT and IBWT algorithms, which are commonly used in lossless data compression techniques like the Burrows-Wheeler Block Sorting Transform (BWT-BST). By rearranging characters in a way that promotes repeated sequences, the BWT helps reduce redundancy and improves compression ratios.
Source code in the cpp programming language
#include <algorithm>
#include <iostream>
#include <vector>
const int STX = 0x02;
const int ETX = 0x03;
void rotate(std::string &a) {
char t = a[a.length() - 1];
for (int i = a.length() - 1; i > 0; i--) {
a[i] = a[i - 1];
}
a[0] = t;
}
std::string bwt(const std::string &s) {
for (char c : s) {
if (c == STX || c == ETX) {
throw std::runtime_error("Input can't contain STX or ETX");
}
}
std::string ss;
ss += STX;
ss += s;
ss += ETX;
std::vector<std::string> table;
for (size_t i = 0; i < ss.length(); i++) {
table.push_back(ss);
rotate(ss);
}
//table.sort();
std::sort(table.begin(), table.end());
std::string out;
for (auto &s : table) {
out += s[s.length() - 1];
}
return out;
}
std::string ibwt(const std::string &r) {
int len = r.length();
std::vector<std::string> table(len);
for (int i = 0; i < len; i++) {
for (int j = 0; j < len; j++) {
table[j] = r[j] + table[j];
}
std::sort(table.begin(), table.end());
}
for (auto &row : table) {
if (row[row.length() - 1] == ETX) {
return row.substr(1, row.length() - 2);
}
}
return {};
}
std::string makePrintable(const std::string &s) {
auto ls = s;
for (auto &c : ls) {
if (c == STX) {
c = '^';
} else if (c == ETX) {
c = '|';
}
}
return ls;
}
int main() {
auto tests = {
"banana",
"appellee",
"dogwood",
"TO BE OR NOT TO BE OR WANT TO BE OR NOT?",
"SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES",
"\u0002ABC\u0003"
};
for (auto &test : tests) {
std::cout << makePrintable(test) << "\n";
std::cout << " --> ";
std::string t;
try {
t = bwt(test);
std::cout << makePrintable(t) << "\n";
} catch (std::runtime_error &e) {
std::cout << "Error " << e.what() << "\n";
}
std::string r = ibwt(t);
std::cout << " --> " << r << "\n\n";
}
return 0;
}
You may also check:How to resolve the algorithm Find the missing permutation step by step in the Raku programming language
You may also check:How to resolve the algorithm Sorting algorithms/Shell sort step by step in the OCaml programming language
You may also check:How to resolve the algorithm Extend your language step by step in the Delphi programming language
You may also check:How to resolve the algorithm Stair-climbing puzzle step by step in the Groovy programming language
You may also check:How to resolve the algorithm Trigonometric functions step by step in the ooRexx programming language