How to resolve the algorithm Burrows–Wheeler transform step by step in the C++ programming language

Published on 7 June 2024 03:52 AM

How to resolve the algorithm Burrows–Wheeler transform step by step in the C++ programming language

Table of Contents

Problem Statement

The Burrows–Wheeler transform (BWT, also called block-sorting compression) rearranges a character string into runs of similar characters. This is useful for compression, since it tends to be easy to compress a string that has runs of repeated characters by techniques such as move-to-front transform and run-length encoding. More importantly, the transformation is reversible, without needing to store any additional data. The BWT is thus a "free" method of improving the efficiency of text compression algorithms, costing only some extra computation.

Source: Burrows–Wheeler transform

Let's start with the solution:

Step by Step solution about How to resolve the algorithm Burrows–Wheeler transform step by step in the C++ programming language

The provided C++ code demonstrates the Burrows-Wheeler Transform (BWT) and its inverse, the Inverse Burrows-Wheeler Transform (IBWT), which are used for data compression. Here's a breakdown of the code:

  1. Constants:

    • STX: Represents the Start-of-Text (SOH) character, used as a special marker at the beginning of the input string.
    • ETX: Represents the End-of-Text (EOT) character, used as a special marker at the end of the input string.
  2. rotate Function:

    • Rotates the characters of a string a by one position to the right, moving the last character to the beginning.
  3. bwt Function:

    • Takes a string s as input and applies the BWT algorithm:
      • Inserts the STX and ETX characters at the beginning and end of s, respectively.
      • Creates a table of rotated strings by repeatedly rotating ss by one position.
      • Sorts the table lexicographically.
      • Returns the last character of each sorted string, concatenated into a new string out.
  4. ibwt Function:

    • Takes a string r (the result of BWT) and applies the IBWT algorithm to recover the original string:
      • Initializes a table of strings of length len(r).
      • For each character in r, rotates all the strings in the table to the left and appends the character to the end of the string.
      • Sorts the table.
      • Identifies the row that ends with ETX, removes the STX and ETX characters, and returns the substring as the original string.
  5. makePrintable Function:

    • Converts special characters STX and ETX to printable characters '^' and '|' for display purposes.
  6. Main Function:

    • Defines test strings tests.
    • Iterates over the test strings and demonstrates the BWT and IBWT operations:
      • Displays the original string in printable format.
      • Applies the BWT and displays the transformed string.
      • Applies the IBWT and verifies if the original string is recovered.

Overall, this code illustrates the BWT and IBWT algorithms, which are commonly used in lossless data compression techniques like the Burrows-Wheeler Block Sorting Transform (BWT-BST). By rearranging characters in a way that promotes repeated sequences, the BWT helps reduce redundancy and improves compression ratios.

Source code in the cpp programming language

#include <algorithm>
#include <iostream>
#include <vector>

const int STX = 0x02;
const int ETX = 0x03;

void rotate(std::string &a) {
    char t = a[a.length() - 1];
    for (int i = a.length() - 1; i > 0; i--) {
        a[i] = a[i - 1];
    }
    a[0] = t;
}

std::string bwt(const std::string &s) {
    for (char c : s) {
        if (c == STX || c == ETX) {
            throw std::runtime_error("Input can't contain STX or ETX");
        }
    }

    std::string ss;
    ss += STX;
    ss += s;
    ss += ETX;

    std::vector<std::string> table;
    for (size_t i = 0; i < ss.length(); i++) {
        table.push_back(ss);
        rotate(ss);
    }
    //table.sort();
    std::sort(table.begin(), table.end());

    std::string out;
    for (auto &s : table) {
        out += s[s.length() - 1];
    }
    return out;
}

std::string ibwt(const std::string &r) {
    int len = r.length();
    std::vector<std::string> table(len);
    for (int i = 0; i < len; i++) {
        for (int j = 0; j < len; j++) {
            table[j] = r[j] + table[j];
        }
        std::sort(table.begin(), table.end());
    }
    for (auto &row : table) {
        if (row[row.length() - 1] == ETX) {
            return row.substr(1, row.length() - 2);
        }
    }
    return {};
}

std::string makePrintable(const std::string &s) {
    auto ls = s;
    for (auto &c : ls) {
        if (c == STX) {
            c = '^';
        } else if (c == ETX) {
            c = '|';
        }
    }
    return ls;
}

int main() {
    auto tests = {
        "banana",
        "appellee",
        "dogwood",
        "TO BE OR NOT TO BE OR WANT TO BE OR NOT?",
        "SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES",
        "\u0002ABC\u0003"
    };

    for (auto &test : tests) {
        std::cout << makePrintable(test) << "\n";
        std::cout << " --> ";

        std::string t;
        try {
            t = bwt(test);
            std::cout << makePrintable(t) << "\n";
        } catch (std::runtime_error &e) {
            std::cout << "Error " << e.what() << "\n";
        }

        std::string r = ibwt(t);
        std::cout << " --> " << r << "\n\n";
    }

    return 0;
}


  

You may also check:How to resolve the algorithm Find the missing permutation step by step in the Raku programming language
You may also check:How to resolve the algorithm Sorting algorithms/Shell sort step by step in the OCaml programming language
You may also check:How to resolve the algorithm Extend your language step by step in the Delphi programming language
You may also check:How to resolve the algorithm Stair-climbing puzzle step by step in the Groovy programming language
You may also check:How to resolve the algorithm Trigonometric functions step by step in the ooRexx programming language