How to resolve the algorithm Burrows–Wheeler transform step by step in the C programming language

Published on 7 June 2024 03:52 AM
#C

How to resolve the algorithm Burrows–Wheeler transform step by step in the C programming language

Table of Contents

Problem Statement

The Burrows–Wheeler transform (BWT, also called block-sorting compression) rearranges a character string into runs of similar characters. This is useful for compression, since it tends to be easy to compress a string that has runs of repeated characters by techniques such as move-to-front transform and run-length encoding. More importantly, the transformation is reversible, without needing to store any additional data. The BWT is thus a "free" method of improving the efficiency of text compression algorithms, costing only some extra computation.

Source: Burrows–Wheeler transform

Let's start with the solution:

Step by Step solution about How to resolve the algorithm Burrows–Wheeler transform step by step in the C programming language

This C program demonstrates the Burrows-Wheeler Transform (BWT) and its inverse (IBWT), which are important techniques in lossless data compression. Here's a detailed explanation of the code:

  • Constants and Data Structures:

    • const char STX = '\002', ETX = '\003': These are constants representing special characters start-of-text (STX) and end-of-text (ETX).
    • int compareStrings(const void *a, const void *b): A function that compares two strings (used for sorting).
    • char **table: A pointer to an array of strings used in the BWT and IBWT algorithms.
  • BWT (Burrows-Wheeler Transform):

    • int bwt(const char *s, char r[]): This function performs the BWT on the input string s and stores the result in r.
    • It first creates a list of all circular rotations of s (cyclic shifts).
    • It sorts the list of rotations lexicographically.
    • The output string r is constructed by concatenating the last characters of the sorted rotations.
  • IBWT (Inverse Burrows-Wheeler Transform):

    • void ibwt(const char *r, char s[]): This function performs the inverse BWT to recover the original string s from the transformed string r.
    • It creates a table of strings initialized with r.
    • Each column of the table is sorted.
    • The original string s is extracted by finding the row where the last character is ETX.
  • Auxiliary Functions:

    • void makePrintable(const char *s, char t[]): This function converts special characters (STX and ETX) to human-readable characters (^) and (|).
  • Main Function:

    • It declares an array of test strings, tests.
    • For each test string:
      • It prints the original string.
      • Calls bwt to transform it and prints the result.
      • Calls ibwt to recover the original string and prints it.
  • Example Usage:

    • The program prints the original string, its BWT-transformed version, and the recovered string for each test case.
  • Note:

    • The program checks for special characters (STX or ETX) in the input string before performing the BWT. If found, it reports an error because these characters can interfere with the algorithm.

Source code in the c programming language

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

const char STX = '\002', ETX = '\003';

int compareStrings(const void *a, const void *b) {
    char *aa = *(char **)a;
    char *bb = *(char **)b;
    return strcmp(aa, bb);
}

int bwt(const char *s, char r[]) {
    int i, len = strlen(s) + 2;
    char *ss, *str;
    char **table;
    if (strchr(s, STX) || strchr(s, ETX)) return 1;
    ss = calloc(len + 1, sizeof(char));
    sprintf(ss, "%c%s%c", STX, s, ETX);
    table = malloc(len * sizeof(const char *));
    for (i = 0; i < len; ++i) {
        str = calloc(len + 1, sizeof(char));
        strcpy(str, ss + i);
        if (i > 0) strncat(str, ss, i);
        table[i] = str;
    }
    qsort(table, len, sizeof(const char *), compareStrings);
    for(i = 0; i < len; ++i) {
        r[i] = table[i][len - 1];
        free(table[i]);
    }
    free(table);
    free(ss);
    return 0;
}

void ibwt(const char *r, char s[]) {
    int i, j, len = strlen(r);
    char **table = malloc(len * sizeof(const char *));
    for (i = 0; i < len; ++i) table[i] = calloc(len + 1, sizeof(char));
    for (i = 0; i < len; ++i) {
        for (j = 0; j < len; ++j) {                        
            memmove(table[j] + 1, table[j], len);
            table[j][0] = r[j];
        }
        qsort(table, len, sizeof(const char *), compareStrings);
    }
    for (i = 0; i < len; ++i) {
        if (table[i][len - 1] == ETX) {
            strncpy(s, table[i] + 1, len - 2);
            break;
        }
    }
    for (i = 0; i < len; ++i) free(table[i]);
    free(table);
}

void makePrintable(const char *s, char t[]) {
    strcpy(t, s);
    for ( ; *t != '\0'; ++t) {
        if (*t == STX) *t = '^';
        else if (*t == ETX) *t = '|';
    }
}

int main() {
    int i, res, len;
    char *tests[6], *t, *r, *s;
    tests[0] = "banana";
    tests[1] = "appellee";
    tests[2] = "dogwood";
    tests[3] = "TO BE OR NOT TO BE OR WANT TO BE OR NOT?";
    tests[4] = "SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES",
    tests[5] = "\002ABC\003";
    for (i = 0; i < 6; ++i) {
        len = strlen(tests[i]);
        t = calloc(len + 1, sizeof(char));
        makePrintable(tests[i], t);
        printf("%s\n", t);
        printf(" --> ");
        r = calloc(len + 3, sizeof(char));
        res = bwt(tests[i], r);
        if (res == 1) {
            printf("ERROR: String can't contain STX or ETX\n");
        }
        else {
            makePrintable(r, t);
            printf("%s\n", t);
        }
        s = calloc(len + 1, sizeof(char));
        ibwt(r, s);
        makePrintable(s, t);
        printf(" --> %s\n\n", t);
        free(t);
        free(r);
        free(s);
    }
    return 0;
}


  

You may also check:How to resolve the algorithm Zebra puzzle step by step in the C programming language
You may also check:How to resolve the algorithm Babbage problem step by step in the Nim programming language
You may also check:How to resolve the algorithm System time step by step in the Pike programming language
You may also check:How to resolve the algorithm Hash join step by step in the jq programming language
You may also check:How to resolve the algorithm Calculating the value of e step by step in the BQN programming language