How to resolve the algorithm Levenshtein distance/Alignment step by step in the Go programming language
Problem Statement
The Levenshtein distance algorithm returns the number of atomic operations (insertion, deletion or substitution) that must be performed on a string to obtain another one, but it says nothing about which operations are used or in what order. An alignment is a notation that describes the operations used to turn one string into another. At some points in the strings, the minus character ('-') is placed to signify that a character must be inserted at that position. For instance, an alignment between the words 'place' and 'palace' is:

p-lace
palace
Write a function that shows the alignment of two strings for the corresponding Levenshtein distance.
As an example, use the words "rosettacode" and "raisethysword".
You can either implement an algorithm, or use a dedicated library (thus showing us how it is named in your language).
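For comparison with the library approach used below, here is a minimal sketch of the algorithmic route: fill the usual Levenshtein dynamic-programming table, then walk back from the bottom-right cell to recover one optimal alignment. The function names levAlign, cost and min3 are hypothetical, the code compares bytes (so it assumes ASCII input), and when several alignments are optimal it simply prefers a diagonal step, then a deletion, then an insertion.

```go
package main

import "fmt"

// cost returns the substitution cost: 0 for a match, 1 for a mismatch.
func cost(x, y byte) int {
	if x == y {
		return 0
	}
	return 1
}

// min3 returns the smallest of three ints.
func min3(x, y, z int) int {
	m := x
	if y < m {
		m = y
	}
	if z < m {
		m = z
	}
	return m
}

// levAlign returns one optimal alignment of a and b plus their Levenshtein
// distance. A '-' in the first result marks a character inserted from b;
// a '-' in the second marks a character deleted from a. ASCII input assumed.
func levAlign(a, b string) (string, string, int) {
	n, m := len(a), len(b)
	// d[i][j] holds the edit distance between a[:i] and b[:j].
	d := make([][]int, n+1)
	for i := range d {
		d[i] = make([]int, m+1)
		d[i][0] = i
	}
	for j := range d[0] {
		d[0][j] = j
	}
	for i := 1; i <= n; i++ {
		for j := 1; j <= m; j++ {
			d[i][j] = min3(d[i-1][j-1]+cost(a[i-1], b[j-1]), d[i-1][j]+1, d[i][j-1]+1)
		}
	}
	// Walk back from d[n][m], emitting alignment columns in reverse order.
	var ra, rb []byte
	i, j := n, m
	for i > 0 || j > 0 {
		switch {
		case i > 0 && j > 0 && d[i][j] == d[i-1][j-1]+cost(a[i-1], b[j-1]):
			ra, rb = append(ra, a[i-1]), append(rb, b[j-1]) // match or substitution
			i, j = i-1, j-1
		case i > 0 && d[i][j] == d[i-1][j]+1:
			ra, rb = append(ra, a[i-1]), append(rb, '-') // deletion from a
			i--
		default:
			ra, rb = append(ra, '-'), append(rb, b[j-1]) // insertion from b
			j--
		}
	}
	// Reverse both slices in place to get left-to-right order.
	for l, r := 0, len(ra)-1; l < r; l, r = l+1, r-1 {
		ra[l], ra[r] = ra[r], ra[l]
		rb[l], rb[r] = rb[r], rb[l]
	}
	return string(ra), string(rb), d[n][m]
}

func main() {
	for _, p := range [][2]string{{"place", "palace"}, {"rosettacode", "raisethysword"}} {
		x, y, dist := levAlign(p[0], p[1])
		fmt.Printf("%s\n%s\ndistance: %d\n\n", x, y, dist)
	}
}
```

For 'place' and 'palace' this prints p-lace over palace with distance 1, matching the example above.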
Let's start with the solution:
Step-by-step solution: Levenshtein distance/Alignment in the Go programming language
The provided Go program showcases the use of the biogo library for sequence alignment. It performs a Needleman-Wunsch alignment between two sequences and displays the results. Here's a detailed explanation:
- Importing the necessary packages: besides fmt, the program imports four biogo packages: align for sequence alignment, alphabet (imported under the alias ab) for defining alphabets, feat for sequence-feature types, and seq/linear for linear sequence objects.
- Defining a custom alphabet: since the example strings use lowercase letters, a custom alphabet named lc is built with ab.NewAlphabet and wrapped in ab.Must, which panics if construction fails. The alphabet holds the hyphen plus the 26 lowercase letters. The remaining arguments set the molecule type to feat.Undefined (the strings are not DNA or protein), declare '-' as the gap letter, pass 0 for the ambiguous letter (meaning none), and make the alphabet case sensitive.
- Creating the scoring matrix: the Needleman-Wunsch algorithm requires a scoring matrix. Here a matrix named nw of type align.NW is created with one row and one column per alphabet letter; lc.Len() is 27 (the hyphen plus 26 letters), so the matrix is 27 x 27. Each element is the score for aligning two letters. The diagonal elements (exact matches) are 0, and every other element, including those involving the gap '-', is -1, so each substitution, insertion or deletion is penalized by one. Because the algorithm maximizes the total score, the best alignment is the one with the fewest edits, and its score is the negated Levenshtein distance.
- Defining the input sequences: two linear sequences, a and b, are created from the strings "rosettacode" and "raisethysword", and their Alpha fields are set to the custom alphabet lc.
- Performing the alignment: the Align method of the nw scoring matrix aligns a against b. It returns an error if the alignment cannot be performed (for example, if the scoring matrix does not match the alphabet); otherwise it returns a description of the aligned segments, stored in aln.
- Formatting and displaying the result: align.Format uses aln to lay out a and b as two equal-length aligned sequences, inserting the gap letter '-' wherever a character is missing. The two lines are printed to the console.
- Printing the match/mismatch indicator: to highlight the differences, the program prints a third line under the alignment, with a space where the aligned characters match and a vertical bar (|) where they differ (a substitution or a gap).
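The claim that this scoring reproduces the Levenshtein distance can be checked without biogo. The sketch below is a plain Needleman-Wunsch score computation using the same scores as the matrix described above (0 for a match, -1 for a mismatch or a gap). It is not biogo's implementation; the names nwScore and max3 are hypothetical, and it assumes ASCII strings.

```go
package main

import "fmt"

// Scores mirroring the matrix above: a match scores 0, while a mismatch
// or a gap scores -1 (one edit each).
const (
	matchScore    = 0
	mismatchScore = -1
	gapScore      = -1
)

// max3 returns the largest of three ints.
func max3(x, y, z int) int {
	m := x
	if y > m {
		m = y
	}
	if z > m {
		m = z
	}
	return m
}

// nwScore returns the optimal global (Needleman-Wunsch) alignment score of
// a and b with a linear gap penalty, keeping only two rows of the table.
func nwScore(a, b string) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j * gapScore // leading gaps in a
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i * gapScore // leading gaps in b
		for j := 1; j <= len(b); j++ {
			s := mismatchScore
			if a[i-1] == b[j-1] {
				s = matchScore
			}
			cur[j] = max3(prev[j-1]+s, prev[j]+gapScore, cur[j-1]+gapScore)
		}
		prev = cur
	}
	return prev[len(b)]
}

func main() {
	pairs := [][2]string{{"place", "palace"}, {"kitten", "sitting"}, {"rosettacode", "raisethysword"}}
	for _, p := range pairs {
		s := nwScore(p[0], p[1])
		// With these scores, the negated optimum is the Levenshtein distance.
		fmt.Printf("%s -> %s: score %d, distance %d\n", p[0], p[1], s, -s)
	}
}
```

Every alignment column that is not a match costs exactly 1, so maximizing the score minimizes the number of edits; for kitten/sitting the score is -3, the familiar distance of 3.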
Overall, this program demonstrates how to perform a Needleman-Wunsch sequence alignment with the biogo library. Although biogo provides predefined alphabets (for DNA, for example), this program defines its own alphabet and scoring matrix, choosing scores that make the optimal alignment correspond to the Levenshtein distance.
Source code in the Go programming language
package main

import (
	"fmt"

	"github.com/biogo/biogo/align"
	ab "github.com/biogo/biogo/alphabet"
	"github.com/biogo/biogo/feat"
	"github.com/biogo/biogo/seq/linear"
)

func main() {
	// Alphabets for things like DNA are predefined in biogo, but we
	// define our own here: the gap letter '-' plus 26 lowercase letters.
	lc := ab.Must(ab.NewAlphabet("-abcdefghijklmnopqrstuvwxyz",
		feat.Undefined, '-', 0, true))

	// Construct scoring matrix for the Needleman-Wunsch algorithm.
	// We leave zeros on the diagonal for the Levenshtein distance of an
	// exact match and put -1s everywhere else for the Levenshtein distance
	// of an edit.
	nw := make(align.NW, lc.Len())
	for i := range nw {
		r := make([]int, lc.Len())
		nw[i] = r
		for j := range r {
			if j != i {
				r[j] = -1
			}
		}
	}

	// Define input sequences.
	a := &linear.Seq{Seq: ab.BytesToLetters([]byte("rosettacode"))}
	a.Alpha = lc
	b := &linear.Seq{Seq: ab.BytesToLetters([]byte("raisethysword"))}
	b.Alpha = lc

	// Perform the alignment.
	aln, err := nw.Align(a, b)
	if err != nil {
		fmt.Println(err)
		return
	}

	// Format and display the result, using '-' for gaps.
	fa := align.Format(a, b, aln, '-')
	fmt.Printf("%s\n%s\n", fa[0], fa[1])

	// Indicator line: ' ' where the aligned characters match,
	// '|' where they differ (substitution or gap).
	aa := fmt.Sprint(fa[0])
	ba := fmt.Sprint(fa[1])
	ma := make([]byte, len(aa))
	for i := range ma {
		if aa[i] == ba[i] {
			ma[i] = ' '
		} else {
			ma[i] = '|'
		}
	}
	fmt.Println(string(ma))
}
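The indicator line can also be used to read off the distance: with the 0/-1 scoring, every '|' column is exactly one edit, so counting the bars on an optimal alignment gives the Levenshtein distance. The standalone sketch below works on two already-aligned strings rather than on biogo's output; the function name markLine is hypothetical, and the input pair is the 'place'/'palace' example from the problem statement.

```go
package main

import "fmt"

// markLine builds an indicator line like the one the program prints:
// ' ' where the aligned strings agree and '|' where they differ
// (a substitution or a gap). It also returns the number of '|' marks,
// i.e. the number of edits the alignment implies.
// Both inputs must be aligned strings of equal length.
func markLine(x, y string) (string, int) {
	marks := make([]byte, len(x))
	edits := 0
	for i := range marks {
		if x[i] == y[i] {
			marks[i] = ' '
		} else {
			marks[i] = '|'
			edits++
		}
	}
	return string(marks), edits
}

func main() {
	m, n := markLine("p-lace", "palace")
	fmt.Printf("p-lace\npalace\n%s\nedits: %d\n", m, n)
}
```

Because the program's matrix charges -1 for every non-matching column, the count returned here for an optimal alignment equals the negated alignment score.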