How to resolve the algorithm Bioinformatics/base count step by step in the jq programming language
Published on 12 May 2024 09:40 PM
Problem Statement
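Given a string representing ordered DNA bases (the 500-base sequence reproduced in the output below), pretty-print the sequence, then report the count of each base (A, C, G and T) and the total number of bases.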
Let's start with the solution:
Step-by-step solution to Bioinformatics/base count in the jq programming language
Source code in the jq programming language
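# Assumes jq is invoked with -n (so that `inputs` sees every line),
# -R (read raw text lines) and -r (print raw strings).

# Left-pad the input, converted to a string, with $fill up to $len characters
def lpad($len; $fill): tostring | ($len - length) as $l | ($fill * $l)[:$l] + .;

# Create a bag of words, i.e. a JSON object with counts of the items in the stream
def bow(stream):
  reduce stream as $word ({}; .[($word|tostring)] += 1);

# Concatenate all input lines into a single string
def read_seq:
  reduce inputs as $line (""; . + $line);

# Emit a bow of the letters in the input string
def counts:
  . as $in | bow(range(0;length) | $in[.:.+1]);

# input: the whole sequence as a string
def pp_counts:
  "BASE COUNTS:",
  (counts | to_entries | sort[] | " \(.key): \(.value | lpad(6;" "))"),
  "Total: \(length|lpad(7;" "))" ;

# Print the input string in rows of $cols characters, each prefixed by its offset
def pp_sequence($cols):
  range(0; length / $cols) as $i
  | "\($i*$cols | lpad(5; " ")): " + .[ $i * $cols : ($i+1) * $cols] ;

read_seq | pp_sequence(50), "", pp_counts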
Output:
    0: CGTAAAAAATTACAACGTCCTTTGGCTATCTCTTAAACTCCTGCTAAATG
   50: CTCGTGCTTTCCAATTATGTAAGCGTTCCGAGACGGGGTGGTCGATTCTG
  100: AGGACAAAGGTCAAGATGGAGCGCATCGAACGCAATAAGGATCATTTGAT
  150: GGGACGTTTCGTCGACAAAGTCTTGTTTCGAGAGTAACGGCTACCGTCTT
  200: CGATTCTGCTTATAACACTATGTTCTTATGAAATGGATGTTCTGAGTTGG
  250: TCAGTCCCAATGTGCGGGGTTTCTTTTAGTACGTCGGGAGTGGTATTATA
  300: TTTAATTTTTCTATATAGCGATCTGTATTTAAGCAATTCATTTAGGTTAT
  350: CGCCGCGATGCTCGGTTCGGACCGCCAAGCATCTGGCTCCACTGCTAGTG
  400: TCCTAAATTTGAATGGCAAACACAAATAAGATTTAGCAATTCGTGTAGAC
  450: GACCGGGGACTTGCATGATGGGAGCAGCTTTGTTAAACTACGAACGTAAT

BASE COUNTS:
 A:    129
 C:     97
 G:    119
 T:    155
Total:     500
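To see the two helpers at work on something smaller, here is a minimal sketch that reuses `lpad`, `bow` and `counts` exactly as defined in the program above. The seven-letter sequence is a made-up example, not part of the task data, and the invocation in the comment (including the file name) is an assumption.

```jq
# Helper definitions copied from the program above.
def lpad($len; $fill): tostring | ($len - length) as $l | ($fill * $l)[:$l] + .;
def bow(stream):
  reduce stream as $word ({}; .[($word|tostring)] += 1);
def counts:
  . as $in | bow(range(0;length) | $in[.:.+1]);

# Hypothetical short sequence, used only for illustration.
# Assumed invocation: jq -nr -f sketch.jq
"GATTACA"
| counts              # an object mapping each letter to its count
| to_entries
| sort[]              # sorting the entries orders them by key
| "\(.key): \(.value | lpad(3; " "))"
```

With that input this should print one line per base, `A:   3`, `C:   1`, `G:   1` and `T:   2`, each count right-aligned in a field of three characters.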
A second version processes the input line by line instead of first joining it into one string, and can re-wrap the sequence to a different width:

def lpad($len; $fill): tostring | ($len - length) as $l | ($fill * $l)[:$l] + .;

# "bow" = bag of words, i.e. a JSON object with counts
# Input: a bow or null
# Output: augmented bow
def bow(stream):
  reduce stream as $word (.; .[($word|tostring)] += 1);

# The main function ignores its input in favor of `stream`:
def report(stream; $cols):
  # input: a string, possibly longer than $cols
  def pp_sequence($start):
    range(0; length / $cols) as $i
    | "\($start + ($i*$cols) | lpad(5; " ")): " + .[ $i * $cols : ($i+1) * $cols] ;
  # input: a bow
  def pp_counts:
    "BASE COUNTS:",
    (to_entries | sort[] | " \(.key): \(.value | lpad(6;" "))"),
    "Total: \( [.[]] | add | lpad(7;" "))" ;
  # state: {bow, emit, pending, start}
  foreach (stream,null) as $line ({start: 0};
    # advance past whatever was emitted on the previous step
    .start += (.emit|length)
    | if $line == null
      then .emit = .pending
      else .bow |= bow(range(0; $line|length) | $line[.:.+1])
      | (.pending + $line) as $new
      | if ($new|length) >= $cols
        then .emit = $new[:$cols]
        | .pending = $new[$cols:]
        # too short to fill a row: keep buffering and emit nothing
        else .emit = null
        | .pending = $new
        end
      end;
    (select(.emit|length > 0) | .start as $start | .emit | pp_sequence($start)),
    (select($line == null) | "", (.bow|pp_counts) ) )
  ;

# To illustrate reformatting:
report(inputs; 33)
Output:
    0: CGTAAAAAATTACAACGTCCTTTGGCTATCTCT
   33: TAAACTCCTGCTAAATGCTCGTGCTTTCCAATT
   66: ATGTAAGCGTTCCGAGACGGGGTGGTCGATTCT
   99: GAGGACAAAGGTCAAGATGGAGCGCATCGAACG
  132: CAATAAGGATCATTTGATGGGACGTTTCGTCGA
  165: CAAAGTCTTGTTTCGAGAGTAACGGCTACCGTC
  198: TTCGATTCTGCTTATAACACTATGTTCTTATGA
  231: AATGGATGTTCTGAGTTGGTCAGTCCCAATGTG
  264: CGGGGTTTCTTTTAGTACGTCGGGAGTGGTATT
  297: ATATTTAATTTTTCTATATAGCGATCTGTATTT
  330: AAGCAATTCATTTAGGTTATCGCCGCGATGCTC
  363: GGTTCGGACCGCCAAGCATCTGGCTCCACTGCT
  396: AGTGTCCTAAATTTGAATGGCAAACACAAATAA
  429: GATTTAGCAATTCGTGTAGACGACCGGGGACTT
  462: GCATGATGGGAGCAGCTTTGTTAAACTACGAAC
  495: GTAAT

BASE COUNTS:
 A:    129
 C:     97
 G:    119
 T:    155
Total:     500
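The incremental counting used by the second version can be tried in isolation. The sketch below reuses that version's `bow`, which augments the bow it receives as input instead of starting from an empty object, and feeds it two lines in turn. The two-line input is a hypothetical example, and the file name in the comment is an assumption.

```jq
# bow as defined in the second program above: it augments its input bow.
def bow(stream):
  reduce stream as $word (.; .[($word|tostring)] += 1);

# Hypothetical input: one short sequence split across two lines.
# Assumed invocation: jq -nr -f sketch2.jq
reduce ("GATT", "ACA") as $line (null;
  # count the letters of this line on top of the counts so far
  bow(range(0; $line|length) | $line[.:.+1]))
| to_entries
| sort
| map("\(.key)=\(.value)")
| join(" ")
```

This should print `A=3 C=1 G=1 T=2`, the same totals as counting the joined string "GATTACA" in one pass, which is why the counts do not depend on how the input is split into lines.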