How to solve the Bioinformatics/base count task step by step in the jq programming language

Published on 12 May 2024 09:40 PM
#Jq

Problem Statement

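Given a string representing ordered DNA bases (the 500-base sequence reproduced in the output below), print the sequence in lines prefixed by their position offsets, then report the count of each base (A, C, G, T) and the total number of bases.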

Let's start with the solution:

Step-by-step solution to the Bioinformatics/base count task in the jq programming language

Source code in the jq programming language

# Left-pad the input (converted to a string) with $fill up to a width of $len
def lpad($len; $fill): tostring | ($len - length) as $l | ($fill * $l)[:$l] + .;

# Create a bag of words, i.e. a JSON object with counts of the items in the stream
def bow(stream): 
  reduce stream as $word ({}; .[($word|tostring)] += 1);

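# Concatenate all input lines into one string.
# Requires the -n option (so that `inputs` sees every line) and -R (raw text input).
def read_seq:
  reduce inputs as $line (""; . + $line);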

# Emit a bow of the letters in the input string
def counts:
  . as $in | bow(range(0;length) | $in[.:.+1]);

def pp_counts:
  "BASE COUNTS:",
   (counts | to_entries | sort[] | "    \(.key):  \(.value | lpad(6;" "))"),
   "Total: \(length|lpad(7;" "))" ;

# Emit the input string in lines of $cols characters, each prefixed by its 0-based offset
def pp_sequence($cols):
  range(0; length / $cols) as $i
    | "\($i*$cols | lpad(5; " ")): " +  .[ $i * $cols : ($i+1) * $cols] ;

# Example invocation (file names are placeholders): jq -nrR -f base-count.jq sequence.txt
read_seq | pp_sequence(50), "", pp_counts

Output:

    0: CGTAAAAAATTACAACGTCCTTTGGCTATCTCTTAAACTCCTGCTAAATG
   50: CTCGTGCTTTCCAATTATGTAAGCGTTCCGAGACGGGGTGGTCGATTCTG
  100: AGGACAAAGGTCAAGATGGAGCGCATCGAACGCAATAAGGATCATTTGAT
  150: GGGACGTTTCGTCGACAAAGTCTTGTTTCGAGAGTAACGGCTACCGTCTT
  200: CGATTCTGCTTATAACACTATGTTCTTATGAAATGGATGTTCTGAGTTGG
  250: TCAGTCCCAATGTGCGGGGTTTCTTTTAGTACGTCGGGAGTGGTATTATA
  300: TTTAATTTTTCTATATAGCGATCTGTATTTAAGCAATTCATTTAGGTTAT
  350: CGCCGCGATGCTCGGTTCGGACCGCCAAGCATCTGGCTCCACTGCTAGTG
  400: TCCTAAATTTGAATGGCAAACACAAATAAGATTTAGCAATTCGTGTAGAC
  450: GACCGGGGACTTGCATGATGGGAGCAGCTTTGTTAAACTACGAACGTAAT

BASE COUNTS:
    A:     129
    C:      97
    G:     119
    T:     155
Total:     500
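
To see the counting step in isolation, the following minimal sketch applies the same bow definition to a short, made-up sequence. The string "GATTACA" is purely illustrative and not part of the task data, and the snippet assumes jq is run with the -n option so that no input file is needed.

```jq
# Same bag-of-words reducer as in the program above:
# a key that is not yet present is null, and null + 1 evaluates to 1.
def bow(stream):
  reduce stream as $word ({}; .[($word|tostring)] += 1);

# Illustrative input only: slice the string into one-character substrings
# and feed them to bow as a stream.
"GATTACA" as $s
| $s
| bow(range(0; length) | $s[.:.+1])
```

The result is a single JSON object that maps A to 3, C to 1, G to 1 and T to 2. The full program then turns such an object into sorted, padded report lines with to_entries and lpad.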


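A streaming variant that counts the bases line by line and can re-wrap the sequence to a different line width:

# Left-pad the input (converted to a string) with $fill up to a width of $len
def lpad($len; $fill): tostring | ($len - length) as $l | ($fill * $l)[:$l] + .;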

# "bow" = bag of words, i.e. a JSON object with counts
# Input: a bow or null
# Output: augmented bow
def bow(stream): 
  reduce stream as $word (.; .[($word|tostring)] += 1);

# The main function ignores its input in favor of `stream`:
def report(stream; $cols):

  # input: a string, possibly longer than $cols
  def pp_sequence($start):
    range(0; length / $cols) as $i
    | "\($start + ($i*$cols) | lpad(5; " ")): " +  .[ $i * $cols : ($i+1) * $cols] ;

  # input: a bow
  def pp_counts:
    "BASE COUNTS:",
     (to_entries | sort[] | "    \(.key):  \(.value | lpad(6;" "))"),
     "Total: \( [.[]] | add | lpad(7;" "))" ;

  # state: {bow, emit, pending, start}
  foreach (stream,null) as $line ({start: 0, emit: "", pending: ""};
    # advance the offset past whatever was emitted on the previous step
    .start += (.emit|length)
    | if $line == null
      then .emit = .pending
      else .bow |= bow(range(0; $line|length) | $line[.:.+1])
      | (.pending + $line) as $new
      | if ($new|length) >= $cols
        then .emit = $new[:$cols]
        | .pending = $new[$cols:]
        # too short to fill a line: print nothing yet and keep accumulating
        else .emit = ""
        | .pending = $new
        end
      end;
    (select(.emit|length > 0) | .start as $start | .emit | pp_sequence($start)),
    (select($line == null) | "", (.bow|pp_counts) ) )
    ;

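# To illustrate reformatting, re-wrap the input at 33 columns
# (like the first program, this needs the -n, -r and -R options):
report(inputs; 33)

Output: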

    0: CGTAAAAAATTACAACGTCCTTTGGCTATCTCT
   33: TAAACTCCTGCTAAATGCTCGTGCTTTCCAATT
   66: ATGTAAGCGTTCCGAGACGGGGTGGTCGATTCT
   99: GAGGACAAAGGTCAAGATGGAGCGCATCGAACG
  132: CAATAAGGATCATTTGATGGGACGTTTCGTCGA
  165: CAAAGTCTTGTTTCGAGAGTAACGGCTACCGTC
  198: TTCGATTCTGCTTATAACACTATGTTCTTATGA
  231: AATGGATGTTCTGAGTTGGTCAGTCCCAATGTG
  264: CGGGGTTTCTTTTAGTACGTCGGGAGTGGTATT
  297: ATATTTAATTTTTCTATATAGCGATCTGTATTT
  330: AAGCAATTCATTTAGGTTATCGCCGCGATGCTC
  363: GGTTCGGACCGCCAAGCATCTGGCTCCACTGCT
  396: AGTGTCCTAAATTTGAATGGCAAACACAAATAA
  429: GATTTAGCAATTCGTGTAGACGACCGGGGACTT
  462: GCATGATGGGAGCAGCTTTGTTAAACTACGAAC
  495: GTAAT

BASE COUNTS:
    A:     129
    C:      97
    G:     119
    T:     155
Total:     500
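
The re-wrapping step can also be sketched on its own. The helper name wrap and the sample string below are assumptions made for illustration. Instead of the index arithmetic used in pp_sequence, the sketch uses the three-argument form of range to step directly through the starting offsets.

```jq
# Left-pad helper, as defined in the programs above
def lpad($len; $fill): tostring | ($len - length) as $l | ($fill * $l)[:$l] + .;

# Hypothetical helper: split the input string into lines of $cols characters,
# each prefixed by its 0-based starting offset.
def wrap($cols):
  range(0; length; $cols) as $i
  | "\($i | lpad(5; " ")): " + .[$i : $i + $cols];

# Illustrative input only (14 characters, wrapped at 5 columns)
"GATTACAGATTACA" | wrap(5)
```

Run with jq -nr, this prints three lines: "    0: GATTA", "    5: CAGAT" and "   10: TACA". The last line is simply shorter, because a string slice that runs past the end of the string is truncated.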


  
