How to resolve the algorithm Bioinformatics/Sequence mutation step by step in the Haskell programming language

Problem Statement
Step by Step Solution
Sourcecode

Problem Statement

Given a string of characters A, C, G, and T representing a DNA sequence, write a routine to mutate the sequence (string) by choosing a random position and then swapping, deleting, or inserting a base at that position.

Let's start with the solution:

Step by Step solution about How to resolve the algorithm Bioinformatics/Sequence mutation step by step in the Haskell programming language

This code simulates the mutation of a DNA sequence by repeatedly applying one of three randomly chosen operations at a random position: swap (replace the base with a randomly chosen one, which may be the same base), delete, or insert. The code is written in Haskell, a purely functional programming language.

The following is a detailed explanation of the code:

Data Structures:
- Mutation: Represents the type of mutation that can be applied to the DNA sequence. It can be Swap, Delete, or Insert.
- DNABase: Represents the type of DNA base. It can be A, C, G, or T.
- DNASequence: Represents the type of DNA sequence. It is a list of DNABase.
- Result: Represents the result of a mutation. It can be Swapped, which includes the mutation type, the index, and a pair holding the base that was replaced and the base that replaced it. It can also be InsertDeleted, which includes the mutation type, the index, and the base that was inserted or deleted.
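As a rough illustration of what the derived Enum and Bounded instances provide, the constructors can be enumerated and converted to and from Int. The helper name allBases is hypothetical and not part of the original program.

```haskell
-- Same declarations as in the program, shown in isolation.
data DNABase = A | C | G | T
  deriving (Show, Read, Eq, Ord, Enum, Bounded)

type DNASequence = [DNABase]

-- Hypothetical helper: every base, in constructor order.
allBases :: DNASequence
allBases = [minBound .. maxBound]

main :: IO ()
main = do
  print allBases               -- all four constructors
  print (fromEnum G)           -- zero-based position of G in the declaration
  print (toEnum 3 :: DNABase)  -- constructor at position 3
  print (read "C" :: DNABase)  -- Read parses a constructor name
```

These conversions are what the Random instances further down rely on when they map a random Int back to a constructor.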
Random Generation: The code uses the Random type class and the randomR and random functions from the System.Random module to generate random values.
- randomR (a, b) g generates a random value between a and b (inclusive) using the random generator g, and returns it together with an updated generator.
- random g generates a random value over the full range of its type using the generator g.
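The following is a minimal sketch of how randomR and random behave with an explicit generator, assuming the random package is installed. The seed 2024 and the helper rollDie are arbitrary choices for illustration only.

```haskell
import System.Random (StdGen, mkStdGen, random, randomR)

-- Roll a die with an explicit generator; the new generator is returned
-- alongside the value so it can be threaded into the next call.
rollDie :: StdGen -> (Int, StdGen)
rollDie = randomR (1, 6)

main :: IO ()
main = do
  let g0       = mkStdGen 2024   -- arbitrary seed, chosen for illustration
      (d1, g1) = rollDie g0
      (d2, g2) = rollDie g1
      (b, _)   = random g2 :: (Bool, StdGen)
  print (d1, d2, b)
  -- The same seed always yields the same first value.
  print (fst (rollDie (mkStdGen 2024)) == d1)
```

The program itself does not thread a generator by hand; it obtains one from newStdGen or lets getStdRandom update the global generator.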
Instance Declarations:
- Instance Random DNABase and Instance Random Mutation: These declarations make DNABase and Mutation instances of the Random type class by converting to and from Int with fromEnum and toEnum. This allows us to generate random values of these types using random, randomR, and randoms.
- Instance PrintfArg DNABase and Instance PrintfArg Mutation: These declarations make DNABase and Mutation instances of the PrintfArg type class. This allows us to pass them as arguments to the printf function; each value is formatted as its constructor name.
- Instance IsChar DNABase: This declaration makes DNABase an instance of the IsChar type class. printf accepts a list as a %s argument only when its elements are IsChar, so this lets a whole DNASequence be printed like a string.
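The effect of the two printf-related instances can be seen in isolation in the sketch below. It reuses the instance bodies from the program; showBase and showBases are hypothetical helpers that render to a String instead of printing, so the result is easy to inspect.

```haskell
import Text.Printf (IsChar(..), PrintfArg(..), fmtChar, fmtPrecision, formatString, printf)

data DNABase = A | C | G | T
  deriving (Show, Read, Eq, Ord, Enum, Bounded)

-- Format a single base as its constructor name, whatever the directive.
instance PrintfArg DNABase where
  formatArg x fmt = formatString (show x) (fmt { fmtChar = 's', fmtPrecision = Nothing })

-- Lets a list of bases be passed to printf as if it were a String.
instance IsChar DNABase where
  toChar   = head . show
  fromChar = read . pure

-- Hypothetical helper: one base inside brackets.
showBase :: DNABase -> String
showBase = printf "[%s]"

-- Hypothetical helper: a whole sequence via the list instance of PrintfArg.
showBases :: [DNABase] -> String
showBases = printf "%s"

main :: IO ()
main = do
  putStrLn (showBase G)
  putStrLn (showBases [A, C, G, T])
```

Recent GHC versions may warn about the partial function head here; the warning does not affect the result because show never returns an empty string for these constructors.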
Auxiliary Functions:
- chunkedDNASequence ds: Splits the DNA sequence into chunks of 50 bases and returns a list of tuples, each pairing a chunk with a label that is a running multiple of 50 (50, 100, 150, ...).
- baseCounts ds: Counts the occurrences of each base in the DNA sequence and returns a list of tuples containing the base and its count.
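Both auxiliary functions can be tried on a short string using only the base library. In this sketch, chunks is a hypothetical stand-in for chunksOf from Data.List.Split, and labelled and counts are generalised versions of the two functions described above; a chunk size of 4 is used to keep the output small.

```haskell
import Data.List (group, sort)

-- Hypothetical stand-in for Data.List.Split.chunksOf, built on splitAt.
-- Assumes n is positive.
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs = let (h, t) = splitAt n xs in h : chunks n t

-- Pair each chunk with a running multiple of the chunk size.
labelled :: Int -> [a] -> [(Int, [a])]
labelled n = zip [n, 2 * n ..] . chunks n

-- Count each distinct element; sort first so that group sees equal
-- elements side by side.
counts :: Ord a => [a] -> [(a, Int)]
counts = map (\g -> (head g, length g)) . group . sort

main :: IO ()
main = do
  mapM_ print (labelled 4 "ACGTTGCAAC")
  mapM_ print (counts "ACGTTGCAAC")
```

Note that the label of the last chunk is still a full multiple of the chunk size, even when that chunk is shorter.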
Main Functions:
- newSequence n: Generates a new DNA sequence of length n using random values.
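- mutateSequence ds: Mutates the DNA sequence ds by randomly selecting a mutation type, a 1-based index, and (in case of a swap or insertion) a new base. A swap replaces the base at the index, an insertion places the new base directly after it, and a deletion removes it. The function returns the result of the mutation and the mutated sequence, and fails on an empty sequence.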
- mutate n s: Mutates the DNA sequence s n times by repeatedly calling mutateSequence and printing the result of each mutation.
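The three list operations behind mutateSequence are pure and can be checked without any randomness. The sketch below restates them with hypothetical names (replaceAt, deleteAt, insertAfter) and uses Char instead of DNABase for brevity.

```haskell
-- Positions are 1-based, matching the indices the program prints.
-- These helpers assume 1 <= i <= length xs.

-- Replace the element at position i.
replaceAt :: Int -> a -> [a] -> [a]
replaceAt i x xs = take (i - 1) xs ++ [x] ++ drop i xs

-- Remove the element at position i.
deleteAt :: Int -> [a] -> [a]
deleteAt i xs = take (i - 1) xs ++ drop i xs

-- Insert a new element directly after position i.
insertAfter :: Int -> a -> [a] -> [a]
insertAfter i x xs = take i xs ++ [x] ++ drop i xs

main :: IO ()
main = do
  putStrLn (replaceAt 2 'T' "ACGT")
  putStrLn (deleteAt 2 "ACGT")
  putStrLn (insertAfter 2 'T' "ACGT")
```

Because the index is drawn from 1 to n and insertion happens after that position, this scheme never places a new base before the first one.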
Main Program:
- Generates a new DNA sequence of length 200 and stores it in ds.
- Prints the initial DNA sequence, its base counts, and the total number of bases.
- Calls mutate to mutate the DNA sequence 10 times, printing each mutation as it is applied, and stores the mutated sequence in ms.
- Prints the mutated DNA sequence, its base counts, and the total number of bases.

Source code in the Haskell programming language

import Data.List       (group, sort)
import Data.List.Split (chunksOf)
import System.Random   (Random, randomR, random, newStdGen, randoms, getStdRandom)
import Text.Printf     (PrintfArg(..), fmtChar, fmtPrecision, formatString, IsChar(..), printf)

data Mutation = Swap | Delete | Insert deriving (Show, Eq, Ord, Enum, Bounded)
data DNABase = A | C | G | T deriving (Show, Read, Eq, Ord, Enum, Bounded)
type DNASequence = [DNABase]

data Result = Swapped Mutation Int (DNABase, DNABase)
            | InsertDeleted Mutation Int DNABase

instance Random DNABase where
  randomR (a, b) g = case randomR (fromEnum a, fromEnum b) g of (x, y) -> (toEnum x, y)
  random = randomR (minBound, maxBound)

instance Random Mutation where
  randomR (a, b) g = case randomR (fromEnum a, fromEnum b) g of (x, y) -> (toEnum x, y)
  random = randomR (minBound, maxBound)

instance PrintfArg DNABase where
  formatArg x fmt = formatString (show x) (fmt { fmtChar = 's', fmtPrecision = Nothing })

instance PrintfArg Mutation where
  formatArg x fmt = formatString (show x) (fmt { fmtChar = 's', fmtPrecision = Nothing })

instance IsChar DNABase where
  toChar = head . show
  fromChar = read . pure

-- Split into 50-base chunks, each labelled with a running multiple of 50.
chunkedDNASequence :: DNASequence -> [(Int, [DNABase])]
chunkedDNASequence = zip [50,100..] . chunksOf 50

baseCounts :: DNASequence -> [(DNABase, Int)]
baseCounts = fmap ((,) . head <*> length) . group . sort

newSequence :: Int -> IO DNASequence
newSequence n = take n . randoms <$> newStdGen

-- Apply one random mutation at a random 1-based position.
-- Returns a description of the change together with the new sequence.
mutateSequence :: DNASequence -> IO (Result, DNASequence)
mutateSequence [] = fail "empty dna sequence"
mutateSequence ds = randomMutation >>= mutate ds
  where
    randomMutation = head . randoms <$> newStdGen
    mutate xs m = do
      i <- randomIndex (length xs)  -- 1-based, so pred i is the list index
      case m of
        Swap   -> randomDNA >>= \d -> pure (Swapped Swap i (xs !! pred i, d), swapElement i d xs)
        Insert -> randomDNA >>= \d -> pure (InsertDeleted Insert i d, insertElement i d xs)
        Delete -> pure (InsertDeleted Delete i (xs !! pred i), dropElement i xs)
      where
        -- remove the base at position i
        dropElement i xs = take (pred i) xs <> drop i xs
        -- insert e directly after position i
        insertElement i e xs = take i xs <> [e] <> drop i xs
        -- replace the base at position i with a
        swapElement i a xs = take (pred i) xs <> [a] <> drop i xs
        randomIndex n = getStdRandom (randomR (1, n))
        randomDNA = head . randoms <$> newStdGen

mutate :: Int -> DNASequence -> IO DNASequence
mutate 0 s = pure s
mutate n s = do
  (r, ms) <- mutateSequence s
  case r of
    Swapped m i (a, b)  -> printf "%6s @ %-3d : %s -> %s \n" m i a b
    InsertDeleted m i a -> printf "%6s @ %-3d : %s\n" m i a
  mutate (pred n) ms

main :: IO ()
main = do
  ds <- newSequence 200
  putStrLn "\nInitial Sequence:" >> showSequence ds
  putStrLn "\nBase Counts:" >> showBaseCounts ds
  showSumBaseCounts ds
  ms <- mutate 10 ds
  putStrLn "\nMutated Sequence:" >> showSequence ms
  putStrLn "\nBase Counts:" >> showBaseCounts ms
  showSumBaseCounts ms
  where
    showSequence   = mapM_ (uncurry (printf "%3d: %s\n")) . chunkedDNASequence
    showBaseCounts = mapM_ (uncurry (printf "%s: %3d\n")) . baseCounts
    showSumBaseCounts xs = putStrLn (replicate 6 '-') >> printf "Σ: %d\n\n" (length xs)

