How to resolve the algorithm Burrows–Wheeler transform step by step in the Julia programming language

Published on 22 June 2024 08:30 PM

How to resolve the algorithm Burrows–Wheeler transform step by step in the Julia programming language

Table of Contents

Problem Statement

The Burrows–Wheeler transform (BWT, also called block-sorting compression) rearranges a character string into runs of similar characters. This is useful for compression, since it tends to be easy to compress a string that has runs of repeated characters by techniques such as move-to-front transform and run-length encoding. More importantly, the transformation is reversible, without needing to store any additional data. The BWT is thus a "free" method of improving the efficiency of text compression algorithms, costing only some extra computation.

Source: Burrows–Wheeler transform

Let's start with the solution:

Step by Step solution about How to resolve the algorithm Burrows–Wheeler transform step by step in the Julia programming language

Burrows-Wheeler Transform (BWT) is a lossless data compression algorithm that is commonly used as a preprocessing step for other compression algorithms. It works by rearranging the characters of a string in a specific way, such that the resulting string has a higher probability of containing repeated substrings.

This Julia code implements both the encoding and decoding functions for the Burrows-Wheeler Transform.

Encoding Function

function burrowswheeler_encode(s)
   if match(r"\x02|\x03", s) != nothing
       throw("String for Burrows-Wheeler input cannot contain STX or ETX")
   end
   s = "\x02" * s * "\x03"
   String([t[end] for t in bwsort([circshift([c for c in s], n) for n in 0:length(s)-1])])
end

The burrowswheeler_encode function takes a string s as input and returns the encoded string. It first checks if the input string contains the special characters STX or ETX, which are used to mark the beginning and end of the transformed string. If these characters are present, an exception is thrown.

Next, the input string is surrounded by STX and ETX characters, and each character is converted to a byte array. The bwsort function is then used to sort all possible circular shifts of the input string. The last character of each sorted circular shift is extracted and concatenated to form the encoded string.

Decoding Function

function burrowswheeler_decode(s)
   len, v = length(s), [c for c in s]
   m = fill(' ', len, len)
   for col in len:-1:1
       m[:, col] .= v
       for (i, row) in enumerate(bwsort([collect(r) for r in eachrow(m)]))
           m[i, :] .= row
       end
   end
   String(m[findfirst(row -> m[row, end] == '\x03', 1:len), 2:end-1])
end

The burrowswheeler_decode function takes the encoded string s as input and returns the original string. It first initializes a square matrix m with empty spaces, where the number of rows and columns is equal to the length of the input string.

The encoded string is then converted to a byte array, and each character is copied into the last column of m. The matrix is then sorted by rows, and the sorted rows are copied back into m.

Finally, the original string is extracted by finding the row that ends with the ETX character and removing the STX and ETX characters from the resulting string.

Example The following example shows how to use the burrowswheeler_encode and burrowswheeler_decode functions to transform and inverse transform a string:

println("Original: ", s, "\nTransformation: ", burrowswheeler_encode(s),
       "\nInverse transformation: ", burrowswheeler_decode(burrowswheeler_encode(s)), "\n")

The output for the input string "BANANA" is:

Original: BANANA
Transformation: NAANAB
Inverse transformation: BANANA

As you can see, the transformed string "NAANAB" is different from the original string "BANANA", but when it is decoded, it returns the original string.

Source code in the julia programming language

bwsort(vec) = sort(vec, lt = (a, b) -> string(a) < string(b))

function burrowswheeler_encode(s)
    if match(r"\x02|\x03", s) != nothing
        throw("String for Burrows-Wheeler input cannot contain STX or ETX")
    end
    s = "\x02" * s * "\x03"
    String([t[end] for t in bwsort([circshift([c for c in s], n) for n in 0:length(s)-1])])
end
 
function burrowswheeler_decode(s)
    len, v = length(s), [c for c in s]
    m = fill(' ', len, len)
    for col in len:-1:1
        m[:, col] .= v
        for (i, row) in enumerate(bwsort([collect(r) for r in eachrow(m)]))
            m[i, :] .= row
        end
    end
    String(m[findfirst(row -> m[row, end] == '\x03', 1:len), 2:end-1])
end

for s in ["BANANA", "dogwood", "SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES",
    "TO BE OR NOT TO BE OR WANT TO BE OR NOT?", "Oops\x02"]
    println("Original: ", s, "\nTransformation: ", burrowswheeler_encode(s),
        "\nInverse transformation: ", burrowswheeler_decode(burrowswheeler_encode(s)), "\n")
end


  

You may also check:How to resolve the algorithm Sierpinski carpet step by step in the Erlang programming language
You may also check:How to resolve the algorithm N'th step by step in the REXX programming language
You may also check:How to resolve the algorithm Equilibrium index step by step in the Batch File programming language
You may also check:How to resolve the algorithm Terminal control/Cursor positioning step by step in the Racket programming language
You may also check:How to resolve the algorithm Factorial step by step in the Groovy programming language