How to resolve the algorithm Bioinformatics/Sequence mutation step by step in the Haskell programming language
Published on 7 June 2024 03:52 AM
Problem Statement
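Given a string of characters A, C, G, and T representing a DNA sequence, write a routine to mutate the sequence (string) by choosing a random position and applying one of three operations there: swapping the base for a randomly chosen one, deleting the base, or inserting a randomly chosen base.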
Let's start with the solution:
Step-by-step solution to Bioinformatics/Sequence mutation in the Haskell programming language
This code simulates the mutation of a DNA sequence by randomly applying one of three operations (swap, delete, or insert) to the sequence. The code is written in Haskell, a functional programming language known for its conciseness and purity.
The following is a detailed explanation of the code:
Data Structures:
- Mutation: The kind of mutation that can be applied to the DNA sequence. It can be Swap, Delete, or Insert.
- DNABase: A single DNA base. It can be A, C, G, or T.
- DNASequence: A DNA sequence, defined as a list of DNABase.
- Result: The outcome of one mutation. It can be Swapped, which carries the mutation type, the index, and a pair holding the old base and the new base. It can also be InsertDeleted, which carries the mutation type, the index, and the base that was inserted or deleted.
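The types described above can be sketched on their own as follows. This is a minimal illustration that mirrors the declarations in the listing; the helpers allBases and describe are hypothetical names added here for demonstration and are not part of the original program.

```haskell
-- Sketch of the data model. `allBases` and `describe` are illustrative
-- helpers, not part of the original listing.
data Mutation = Swap | Delete | Insert
  deriving (Show, Eq, Ord, Enum, Bounded)

data DNABase = A | C | G | T
  deriving (Show, Read, Eq, Ord, Enum, Bounded)

type DNASequence = [DNABase]

data Result
  = Swapped Mutation Int (DNABase, DNABase)  -- kind, index, (old base, new base)
  | InsertDeleted Mutation Int DNABase       -- kind, index, affected base

-- Enum and Bounded together make it possible to list every constructor.
allBases :: DNASequence
allBases = [minBound .. maxBound]

-- Turn a mutation result into a short human-readable line.
describe :: Result -> String
describe (Swapped m i (old, new)) =
  show m ++ " at " ++ show i ++ ": " ++ show old ++ " -> " ++ show new
describe (InsertDeleted m i b) =
  show m ++ " at " ++ show i ++ ": " ++ show b
```

Deriving Enum and Bounded is what later allows random values to be drawn by picking an integer between fromEnum minBound and fromEnum maxBound.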
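Random Generation: The code uses the Random type class and the randomR, random, and randoms functions from the System.Random module to generate random values. Fresh generators come from newStdGen, and getStdRandom runs a function such as randomR against the global generator.
- randomR (a, b) g: Generates a random value between a and b (inclusive) using the random generator g, and returns it together with an updated generator.
- random g: Generates a random value over the type's default range using the random generator g. For the instances defined in this program, that range runs from minBound to maxBound.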
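As a rough sketch of how these functions are used, the example below draws values from an explicitly seeded generator. It assumes the random package is installed (the same dependency the listing has); drawIndex and twoDraws are hypothetical names used only for illustration. No concrete outputs are shown because the values depend on the version of the random package.

```haskell
import System.Random (StdGen, mkStdGen, random, randomR)

-- Draw a 1-based index into a sequence of length n.
-- The second component is the generator to use for the next draw.
drawIndex :: Int -> StdGen -> (Int, StdGen)
drawIndex n = randomR (1, n)

-- Two draws in a row, threading the generator from the first into the second.
twoDraws :: StdGen -> (Int, Bool)
twoDraws g0 =
  let (i, g1) = drawIndex 200 g0
      (b, _)  = random g1
  in (i, b)
```

Calling twoDraws (mkStdGen 7) twice gives the same pair both times, because a pure generator always yields the same value for the same state. The program in this article avoids manual threading by using newStdGen and getStdRandom inside IO.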
Instance Declarations:
- instance Random DNABase and instance Random Mutation: These declarations make DNABase and Mutation instances of the Random type class, so random values of these types can be generated with random, randomR, and randoms. Both instances convert the bounds to integers with fromEnum, draw a random integer in that range, and convert it back with toEnum.
- instance PrintfArg DNABase and instance PrintfArg Mutation: These declarations make DNABase and Mutation instances of the PrintfArg type class. This allows them to be passed as arguments to printf and rendered with the %s format.
- instance IsChar DNABase: This declaration makes DNABase an instance of the IsChar type class. This allows a list of DNABase values (a DNASequence) to be formatted by printf with %s as if it were a string.
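To illustrate the PrintfArg instance in isolation, here is a small sketch that uses only Text.Printf from base. The swapLine function is a hypothetical helper, and the DNABase type is redeclared so the block stands alone.

```haskell
import Text.Printf (FieldFormat (..), PrintfArg (..), formatString, printf)

data DNABase = A | C | G | T
  deriving (Show, Eq)

-- Render a base through its Show instance as a string argument.
-- Clearing the precision means the name is never truncated.
instance PrintfArg DNABase where
  formatArg x fmt =
    formatString (show x) (fmt { fmtChar = 's', fmtPrecision = Nothing })

-- printf can build a String instead of printing when the result type asks for it.
swapLine :: Int -> DNABase -> DNABase -> String
swapLine i old new = printf "%-3d : %s -> %s" i old new
```

With this instance in place, swapLine 5 A G evaluates to "5   : A -> G", and width specifiers such as %3s pad the base name like any other string.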
Auxiliary Functions:
- chunkedDNASequence ds: Splits the DNA sequence into chunks of 50 bases and returns a list of tuples pairing each chunk with a label (50, 100, 150, ...). The label is the position of the last base in a full chunk, not the starting index.
- baseCounts ds: Counts the occurrences of each base in the DNA sequence and returns a list of tuples containing the base and its count.
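The two helpers can be tried out on ordinary strings. The sketch below uses only base: chunks is a hand-written stand-in for chunksOf from the split package, and labelled and counts are hypothetical names that follow the same approach as chunkedDNASequence and baseCounts. In the listing, the expression (,) . head <*> length uses the function applicative, so it maps a group g to the pair (head g, length g); the sketch spells that out with a pattern match instead.

```haskell
import Data.List (group, sort)

-- Base-only stand-in for Data.List.Split.chunksOf (assumes n > 0).
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs = let (h, t) = splitAt n xs in h : chunks n t

-- Pair each chunk with a multiple of the chunk size: n, 2n, 3n, ...
labelled :: Int -> [a] -> [(Int, [a])]
labelled n = zip [n, 2 * n ..] . chunks n

-- Sort, group equal neighbours, then report each group's member and size.
counts :: Ord a => [a] -> [(a, Int)]
counts xs = [(x, length g) | g@(x : _) <- group (sort xs)]
```

For example, labelled 3 "GATTACA" gives [(3,"GAT"),(6,"TAC"),(9,"A")] and counts "GATTACA" gives [('A',3),('C',1),('G',1),('T',2)]. Note that the final label is still a multiple of the chunk size even when the last chunk is short.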
Main Functions:
- newSequence n: Generates a new random DNA sequence of length n.
- mutateSequence ds: Mutates the DNA sequence ds by randomly selecting a mutation type and a 1-based index and, for a swap or an insertion, a random base. For a deletion, the reported base is the one that was removed. The function returns the result of the mutation together with the mutated sequence, and fails on an empty sequence.
- mutate n s: Mutates the DNA sequence s n times by repeatedly calling mutateSequence and printing the result of each mutation.
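The three list edits behind mutateSequence are pure functions and can be examined without any randomness. The sketch below reproduces them with the same names and the same 1-based indexing as the listing, generalised to any element type so they can be tried on plain strings.

```haskell
-- All three helpers take a 1-based index i, as in the listing.

-- Remove the element at position i.
dropElement :: Int -> [a] -> [a]
dropElement i xs = take (i - 1) xs ++ drop i xs

-- Insert e so that it follows the element at position i.
insertElement :: Int -> a -> [a] -> [a]
insertElement i e xs = take i xs ++ [e] ++ drop i xs

-- Replace the element at position i with a.
swapElement :: Int -> a -> [a] -> [a]
swapElement i a xs = take (i - 1) xs ++ [a] ++ drop i xs
```

For instance, dropElement 2 "ACGT" is "AGT", insertElement 2 'T' "ACGT" is "ACTGT", and swapElement 2 'T' "ACGT" is "ATGT". Because the index is drawn from 1 to the sequence length, an inserted base always lands after an existing one, so this scheme never inserts at the very front.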
Main Program:
- Generates a new DNA sequence of length 200 and stores it in ds.
- Prints the initial DNA sequence, its base counts, and the total number of bases.
- Calls mutate to mutate the DNA sequence 10 times and stores the mutated sequence in ms.
- Prints the mutated DNA sequence, its base counts, and the total number of bases.
Source code in the Haskell programming language
import Data.List       (group, sort)
import Data.List.Split (chunksOf)
import System.Random   (Random, randomR, random, newStdGen, randoms, getStdRandom)
import Text.Printf     (PrintfArg(..), fmtChar, fmtPrecision, formatString, IsChar(..), printf)

data Mutation = Swap | Delete | Insert deriving (Show, Eq, Ord, Enum, Bounded)
data DNABase = A | C | G | T deriving (Show, Read, Eq, Ord, Enum, Bounded)
type DNASequence = [DNABase]

-- Outcome of one mutation: kind, 1-based index, and the bases involved.
data Result = Swapped Mutation Int (DNABase, DNABase)
            | InsertDeleted Mutation Int DNABase

-- Random values are drawn by going through the Enum representation.
instance Random DNABase where
  randomR (a, b) g = case randomR (fromEnum a, fromEnum b) g of (x, y) -> (toEnum x, y)
  random = randomR (minBound, maxBound)

instance Random Mutation where
  randomR (a, b) g = case randomR (fromEnum a, fromEnum b) g of (x, y) -> (toEnum x, y)
  random = randomR (minBound, maxBound)

-- Let printf render bases and mutation kinds with %s.
instance PrintfArg DNABase where
  formatArg x fmt = formatString (show x) (fmt { fmtChar = 's', fmtPrecision = Nothing })

instance PrintfArg Mutation where
  formatArg x fmt = formatString (show x) (fmt { fmtChar = 's', fmtPrecision = Nothing })

-- Let printf treat a whole DNASequence as a string.
instance IsChar DNABase where
  toChar = head . show
  fromChar = read . pure

-- Split into rows of 50 bases, labelling the rows 50, 100, 150, ...
chunkedDNASequence :: DNASequence -> [(Int, [DNABase])]
chunkedDNASequence = zip [50,100..] . chunksOf 50

-- Count how often each base occurs.
baseCounts :: DNASequence -> [(DNABase, Int)]
baseCounts = fmap ((,) . head <*> length) . group . sort

-- Generate a random sequence of n bases.
newSequence :: Int -> IO DNASequence
newSequence n = take n . randoms <$> newStdGen

-- Apply one random mutation at a random 1-based index.
mutateSequence :: DNASequence -> IO (Result, DNASequence)
mutateSequence [] = fail "empty dna sequence"
mutateSequence ds = randomMutation >>= mutate ds
  where
    randomMutation = head . randoms <$> newStdGen
    mutate xs m = do
        i <- randomIndex (length xs)
        case m of
          Swap   -> randomDNA >>= \d -> pure (Swapped Swap i (xs !! pred i, d), swapElement i d xs)
          Insert -> randomDNA >>= \d -> pure (InsertDeleted Insert i d, insertElement i d xs)
          Delete -> pure (InsertDeleted Delete i (xs !! pred i), dropElement i xs)
      where
        dropElement i xs = take (pred i) xs <> drop i xs
        insertElement i e xs = take i xs <> [e] <> drop i xs
        swapElement i a xs = take (pred i) xs <> [a] <> drop i xs
        randomIndex n = getStdRandom (randomR (1, n))
        randomDNA = head . randoms <$> newStdGen

-- Apply n mutations in a row, printing each one.
mutate :: Int -> DNASequence -> IO DNASequence
mutate 0 s = pure s
mutate n s = do
  (r, ms) <- mutateSequence s
  case r of
    Swapped m i (a, b)  -> printf "%6s @ %-3d : %s -> %s \n" m i a b
    InsertDeleted m i a -> printf "%6s @ %-3d : %s\n" m i a
  mutate (pred n) ms

main :: IO ()
main = do
  ds <- newSequence 200
  putStrLn "\nInitial Sequence:" >> showSequence ds
  putStrLn "\nBase Counts:" >> showBaseCounts ds
  showSumBaseCounts ds
  ms <- mutate 10 ds
  putStrLn "\nMutated Sequence:" >> showSequence ms
  putStrLn "\nBase Counts:" >> showBaseCounts ms
  showSumBaseCounts ms
  where
    showSequence = mapM_ (uncurry (printf "%3d: %s\n")) . chunkedDNASequence
    showBaseCounts = mapM_ (uncurry (printf "%s: %3d\n")) . baseCounts
    showSumBaseCounts xs = putStrLn (replicate 6 '-') >> printf "Σ: %d\n\n" (length xs)