How to solve the FASTA format task step by step in the Haskell programming language

Problem Statement
Step by Step Solution
Sourcecode

Problem Statement

In bioinformatics, long character strings are often encoded in a format called FASTA.
A FASTA file can contain several strings, each identified by a name marked by a > (greater than) character at the beginning of the line.

Write a program that reads a FASTA file and prints each sequence name together with its full sequence.

Note that a high-quality implementation will not hold the entire file in memory at once; real FASTA files can be multiple gigabytes in size.
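The memory note above can be addressed by handling one line at a time. The following is only a minimal sketch of that idea, not part of the original solutions: the names renderLine and streamFasta are hypothetical, and it assumes every line starting with > is a header and every other line is sequence data.

```haskell
import System.IO

-- Text to emit for one input line. The Bool says whether a header was
-- already seen, so that the previous record can be ended with a newline.
renderLine :: Bool -> String -> String
renderLine seenHeader ('>' : name)
  | seenHeader = '\n' : name ++ ": "
  | otherwise  = name ++ ": "
renderLine _ sequenceLine = sequenceLine

-- Copy records from the input handle to the output handle, holding only
-- the current line in memory.
streamFasta :: Handle -> Handle -> IO ()
streamFasta input output = go False
 where
  go :: Bool -> IO ()
  go seenHeader = do
    atEnd <- hIsEOF input
    if atEnd
      then if seenHeader then hPutStrLn output "" else return ()
      else do
        line <- hGetLine input
        hPutStr output (renderLine seenHeader line)
        go (seenHeader || take 1 line == ">")
```

A possible call is withFile "input.fasta" ReadMode (\h -> streamFasta h stdout), where the file name is a placeholder.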

Let's start with the solution:

Step by Step Solution

First code:
- The function parseFasta takes a file name as an argument, reads the file with the readFile function, splits the contents into lines, and prints every parsed record.
- The readFasta function turns the list of lines into a list of tuples, where each tuple contains a sequence name and its sequence.
- The groupBy function keeps consecutive sequence lines (lines that do not start with a > character) in one group. Every header line ends the current group and forms a group of its own. map concat then joins the lines of each group into a single string.
- The pair function walks the resulting list two elements at a time, drops the leading > from the header, and pairs the name with the sequence that follows it.
- The mapM_ function runs a printing action on each tuple, which writes the name and the sequence to standard output in the form "name: sequence".
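The grouping step is the least obvious part of the first version, so here is a small illustrative sketch of what groupBy produces. The sample lines and the names sampleLines, isSeqLine, grouped and joined are made up for this example.

```haskell
import Data.List (groupBy)

-- Hypothetical input: two records, the second spanning two lines.
sampleLines :: [String]
sampleLines = [">Seq_1", "ACGT", ">Seq_2", "GGCC", "TTAA"]

-- True for lines that are not headers (safe on empty lines).
isSeqLine :: String -> Bool
isSeqLine line = take 1 line /= ">"

-- Two neighbouring lines share a group only if both are sequence lines.
grouped :: [[String]]
grouped = groupBy (\x y -> isSeqLine x && isSeqLine y) sampleLines
-- [[">Seq_1"],["ACGT"],[">Seq_2"],["GGCC","TTAA"]]

-- Joining each group gives alternating header and sequence strings.
joined :: [String]
joined = map concat grouped
-- [">Seq_1","ACGT",">Seq_2","GGCCTTAA"]
```

The alternating list in joined is the shape that the pair function expects as input.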
Second code:
- The function parseFasta takes a file name as an argument and reads the file using the readFile function.
- The readP_to_S function runs the readFasta parser on the file contents and returns a list of all successful parses, each paired with the input that was left over. The program uses the result of the last parse in that list.
- The readFasta parser is defined using the ReadP library (Text.ParserCombinators.ReadP, part of base), which provides a convenient way to define parsers for structured data. It accepts zero or more records followed by the end of the input.
- The pair parser combines the name and code parsers into a single parser that returns a tuple.
- The name parser matches a > character followed by a sequence of alphanumeric characters and underscores, and then a newline.
- The code parser matches zero or more lines of alphabetic characters, each ended by a newline, and concatenates them.
- The newline parser matches a single newline character.
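To make the behaviour of readP_to_S more concrete, the following sketch runs two tiny parsers on their own. The names header, runHeader and allParses are hypothetical, and header uses munch1 rather than the many (satisfy ...) form of the solution, so it is a variation and not the parser from the source code.

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isAlphaNum)

-- A header line: '>' followed by one or more name characters and a newline.
header :: ReadP String
header = char '>' *> munch1 (\c -> isAlphaNum c || c == '_') <* char '\n'

-- Each result is a pair of (parsed value, unconsumed input).
runHeader :: String -> [(String, String)]
runHeader = readP_to_S header

-- 'many' explores every possible stopping point, so this list has three
-- results for the input "aa": zero, one, or two characters consumed.
allParses :: [(String, String)]
allParses = readP_to_S (many (char 'a')) "aa"
```

For example, runHeader ">Seq_1\nACGT\n" yields the single result ("Seq_1", "ACGT\n"), while runHeader "ACGT\n" yields an empty list because the input does not start with a header.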

Source code in the Haskell programming language

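-- First version: group the lines with Data.List.groupBy.
import Data.List ( groupBy, isPrefixOf )

-- Read the file lazily and print each record as "name: sequence".
parseFasta :: FilePath -> IO ()
parseFasta fileName = do
  file <- readFile fileName
  let pairedFasta = readFasta $ lines file
  mapM_ (\(name, code) -> putStrLn $ name ++ ": " ++ code) pairedFasta

-- Each header line forms its own group; consecutive sequence lines are
-- grouped together and concatenated.
readFasta :: [String] -> [(String, String)]
readFasta = pair . map concat . groupBy (\x y -> notName x && notName y)
 where
  -- isPrefixOf is safe on empty lines, where head would crash.
  notName :: String -> Bool
  notName = not . isPrefixOf ">"

  -- Assumes every header is followed by at least one sequence line.
  pair :: [String] -> [(String, String)]
  pair (x : y : xs) = (drop 1 x, y) : pair xs
  pair [x]          = [(drop 1 x, "")]  -- trailing header without a sequence
  pair []           = []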


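-- Second version: a parser built with Text.ParserCombinators.ReadP.
import Text.ParserCombinators.ReadP
import Control.Applicative ( (<|>) )
import Data.Char ( isAlpha, isAlphaNum )

parseFasta :: FilePath -> IO ()
parseFasta fileName = do
  file <- readFile fileName
  -- readP_to_S returns an empty list when nothing parses, so check for
  -- that case before calling last.
  case readP_to_S readFasta file of
    []      -> putStrLn "Could not parse the input as FASTA."
    results -> mapM_ (\(name, code) -> putStrLn $ name ++ ": " ++ code)
                     (fst (last results))


-- Zero or more records, then end of input. Every line, including the
-- last one, must end with a newline character.
readFasta :: ReadP [(String, String)]
readFasta = many pair <* eof
 where
  pair    = (,) <$> name <*> code
  name    = char '>' *> many (satisfy isAlphaNum <|> char '_') <* newline
  code    = concat <$> many (many (satisfy isAlpha) <* newline)
  newline = char '\n'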
