How to resolve the algorithm Burrows–Wheeler transform step by step in the Julia programming language
How to resolve the algorithm Burrows–Wheeler transform step by step in the Julia programming language
Table of Contents
Problem Statement
The Burrows–Wheeler transform (BWT, also called block-sorting compression) rearranges a character string into runs of similar characters. This is useful for compression, since it tends to be easy to compress a string that has runs of repeated characters by techniques such as move-to-front transform and run-length encoding. More importantly, the transformation is reversible, without needing to store any additional data. The BWT is thus a "free" method of improving the efficiency of text compression algorithms, costing only some extra computation.
Source: Burrows–Wheeler transform
Let's start with the solution:
Step by Step solution about How to resolve the algorithm Burrows–Wheeler transform step by step in the Julia programming language
Burrows-Wheeler Transform (BWT) is a lossless data compression algorithm that is commonly used as a preprocessing step for other compression algorithms. It works by rearranging the characters of a string in a specific way, such that the resulting string has a higher probability of containing repeated substrings.
This Julia code implements both the encoding and decoding functions for the Burrows-Wheeler Transform.
Encoding Function
function burrowswheeler_encode(s)
if match(r"\x02|\x03", s) != nothing
throw("String for Burrows-Wheeler input cannot contain STX or ETX")
end
s = "\x02" * s * "\x03"
String([t[end] for t in bwsort([circshift([c for c in s], n) for n in 0:length(s)-1])])
end
The burrowswheeler_encode
function takes a string s
as input and returns the encoded string. It first checks if the input string contains the special characters STX
or ETX
, which are used to mark the beginning and end of the transformed string. If these characters are present, an exception is thrown.
Next, the input string is surrounded by STX
and ETX
characters, and each character is converted to a byte array. The bwsort
function is then used to sort all possible circular shifts of the input string. The last character of each sorted circular shift is extracted and concatenated to form the encoded string.
Decoding Function
function burrowswheeler_decode(s)
len, v = length(s), [c for c in s]
m = fill(' ', len, len)
for col in len:-1:1
m[:, col] .= v
for (i, row) in enumerate(bwsort([collect(r) for r in eachrow(m)]))
m[i, :] .= row
end
end
String(m[findfirst(row -> m[row, end] == '\x03', 1:len), 2:end-1])
end
The burrowswheeler_decode
function takes the encoded string s
as input and returns the original string. It first initializes a square matrix m
with empty spaces, where the number of rows and columns is equal to the length of the input string.
The encoded string is then converted to a byte array, and each character is copied into the last column of m
. The matrix is then sorted by rows, and the sorted rows are copied back into m
.
Finally, the original string is extracted by finding the row that ends with the ETX
character and removing the STX
and ETX
characters from the resulting string.
Example
The following example shows how to use the burrowswheeler_encode
and burrowswheeler_decode
functions to transform and inverse transform a string:
println("Original: ", s, "\nTransformation: ", burrowswheeler_encode(s),
"\nInverse transformation: ", burrowswheeler_decode(burrowswheeler_encode(s)), "\n")
The output for the input string "BANANA" is:
Original: BANANA
Transformation: NAANAB
Inverse transformation: BANANA
As you can see, the transformed string "NAANAB" is different from the original string "BANANA", but when it is decoded, it returns the original string.
Source code in the julia programming language
bwsort(vec) = sort(vec, lt = (a, b) -> string(a) < string(b))
function burrowswheeler_encode(s)
if match(r"\x02|\x03", s) != nothing
throw("String for Burrows-Wheeler input cannot contain STX or ETX")
end
s = "\x02" * s * "\x03"
String([t[end] for t in bwsort([circshift([c for c in s], n) for n in 0:length(s)-1])])
end
function burrowswheeler_decode(s)
len, v = length(s), [c for c in s]
m = fill(' ', len, len)
for col in len:-1:1
m[:, col] .= v
for (i, row) in enumerate(bwsort([collect(r) for r in eachrow(m)]))
m[i, :] .= row
end
end
String(m[findfirst(row -> m[row, end] == '\x03', 1:len), 2:end-1])
end
for s in ["BANANA", "dogwood", "SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES",
"TO BE OR NOT TO BE OR WANT TO BE OR NOT?", "Oops\x02"]
println("Original: ", s, "\nTransformation: ", burrowswheeler_encode(s),
"\nInverse transformation: ", burrowswheeler_decode(burrowswheeler_encode(s)), "\n")
end
You may also check:How to resolve the algorithm Sierpinski carpet step by step in the Erlang programming language
You may also check:How to resolve the algorithm N'th step by step in the REXX programming language
You may also check:How to resolve the algorithm Equilibrium index step by step in the Batch File programming language
You may also check:How to resolve the algorithm Terminal control/Cursor positioning step by step in the Racket programming language
You may also check:How to resolve the algorithm Factorial step by step in the Groovy programming language