How to resolve the algorithm Bioinformatics/base count step by step in the C programming language
Published on 7 June 2024 03:52 AM
Problem Statement
Given this string representing ordered DNA bases:
Let's start with the solution:
Step-by-step solution for Bioinformatics/base count in the C programming language
The provided C program reads a text file containing the sequence of a genome, calculates the length of the genome and the count of each nucleotide (Adenine, Thymine, Cytosine, and Guanine), and prints the sequence, one strand per line with its starting offset, followed by the nucleotide counts and their total in right-aligned columns.
A breakdown of the code:
- Data Structures:
  - A genome struct is defined with three members:
    - strand: A pointer to a character array representing the DNA strand.
    - length: The length of the DNA strand.
    - next: A pointer to the next genome in a linked list.
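As an illustration of the struct described above, here is a minimal sketch of creating and releasing one list node. The names strand_node, make_node, and free_list are hypothetical and do not appear in the program; the sketch assumes each strand is an ordinary NUL-terminated string, so the copy needs one byte more than its length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type mirroring the genome struct described above. */
typedef struct strand_node {
    char *strand;              /* heap copy of the strand text */
    size_t length;             /* strlen(strand) */
    struct strand_node *next;  /* next node, or NULL at the tail */
} strand_node;

/* Allocate a node holding a copy of s; returns NULL if allocation fails. */
strand_node *make_node(const char *s) {
    assert(s != NULL);
    strand_node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->length = strlen(s);
    node->strand = malloc(node->length + 1);   /* +1 for the terminating '\0' */
    if (node->strand == NULL) {
        free(node);
        return NULL;
    }
    memcpy(node->strand, s, node->length + 1); /* copies the '\0' as well */
    node->next = NULL;
    return node;
}

/* Free every node and its strand; returns how many nodes were freed. */
size_t free_list(strand_node *head) {
    size_t freed = 0;
    while (head != NULL) {
        strand_node *next = head->next;  /* save the link before freeing */
        free(head->strand);
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

Releasing a list of this kind means walking it node by node, because freeing only the head would leave the remaining nodes and every strand buffer allocated.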
- Global Variables:
  - genomeData: A pointer to the head of the linked list of genomes.
  - totalLength: The total length of all the DNA strands in the genome.
  - Adenine, Cytosine, Guanine, Thymine: Variables to keep track of the nucleotide counts.
- Function numDigits:
  - Calculates the number of digits in an integer. This function is used later for formatting the output.
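Digit counting by repeated division can be sketched as below. The name count_digits is hypothetical, and the sketch assumes a non-negative argument. The loop condition has to be n >= 10 rather than n > 10, otherwise exactly 10 would be reported as a one-digit number.

```c
#include <assert.h>

/* Number of decimal digits in a non-negative integer (hypothetical helper). */
int count_digits(int n) {
    assert(n >= 0);        /* assumption: negative values are not handled */
    int digits = 1;        /* 0 through 9 all have one digit */
    while (n >= 10) {      /* each division by 10 strips one digit */
        n /= 10;
        digits++;
    }
    return digits;
}
```

For example, count_digits(9) is 1, count_digits(10) is 2, and count_digits(12345) is 5.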
- Function buildGenome:
  - Takes a character array str representing a DNA strand as input.
  - Calculates the length of the strand and adds it to totalLength.
  - Increments the count of the corresponding nucleotide for each character in the strand.
  - Creates a new genome node, allocates memory for its strand member, copies the DNA strand into it, and sets its length.
  - If genomeData is NULL (i.e., this is the first genome), it assigns the new node to genomeData.
  - Otherwise, it traverses the linked list of genomes to find the last node and appends the new node to it.
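The counting step can be sketched on its own, without the global counters. This is an illustrative variant, not the program's code: the base_counts struct and count_bases function are hypothetical names, and the extra "other" field for characters outside A, C, G, T is an assumption added for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally of the four bases plus anything unrecognised. */
typedef struct {
    long a, c, g, t, other;
} base_counts;

/* Add the bases found in s to the running totals in *out. */
void count_bases(const char *s, base_counts *out) {
    assert(s != NULL && out != NULL);
    for (; *s != '\0'; s++) {
        switch (*s) {
            case 'A': out->a++; break;
            case 'C': out->c++; break;
            case 'G': out->g++; break;
            case 'T': out->t++; break;
            default:  out->other++; break;  /* e.g. 'N' or lowercase input */
        }
    }
}
```

Passing the totals in as a parameter lets the same function be called once per strand while the caller decides where the counts live.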
- Function printGenome:
  - Calculates the width of the output based on the totalLength.
  - Iterates through the linked list of genomes, printing each strand prefixed by its starting offset in the genome.
  - After printing all strands, it prints the base count of each nucleotide and the total base count.
  - Frees the allocated memory for the genomeData linked list.
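The column alignment relies on the * width specifier of printf, which takes the field width from an argument instead of a literal in the format string. A minimal sketch, using a hypothetical helper format_count_line that writes into a buffer so the result can be inspected:

```c
#include <assert.h>
#include <stdio.h>

/* Format one "base : count" line into buf; returns what snprintf returns.
   The count is right-aligned in a field of `width` characters. */
int format_count_line(char *buf, size_t size, char base, int width, int count) {
    assert(buf != NULL && width >= 0);
    /* %3c and %3s pad to three columns; %*d reads its width from `width`. */
    return snprintf(buf, size, "%3c%3s%*d", base, ":", width, count);
}
```

With base 'A', width 4, and count 17, the buffer holds the ten-character string "  A  :  17". Deriving the width from the number of digits in the largest value keeps all count lines aligned.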
- Main Function:
  - Opens the file specified in the command-line argument and reads the DNA strands.
  - For each DNA strand read from the file, it calls the buildGenome function to add it to the genomeData linked list.
  - Finally, it calls the printGenome function to display the genome sequence and nucleotide counts.
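The reading loop can be sketched in isolation as follows. The function read_strands and the MAX_TOKEN constant are hypothetical names; the sketch assumes strands are separated by whitespace and uses a field width in the fscanf format so that a long token cannot overflow the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKEN 100  /* buffer size, chosen to match a char str[100] buffer */

/* Read whitespace-separated tokens from fp. Returns the number of tokens
   and adds their lengths to *total_len. A token longer than 99 characters
   is delivered in several pieces rather than overflowing the buffer. */
long read_strands(FILE *fp, long *total_len) {
    assert(fp != NULL && total_len != NULL);
    char token[MAX_TOKEN];
    long strands = 0;
    /* %99s stores at most 99 characters plus the terminating '\0'. */
    while (fscanf(fp, "%99s", token) == 1) {
        *total_len += (long)strlen(token);
        strands++;
    }
    return strands;
}
```

Comparing the return value of fscanf with 1 stops the loop both at end of file and on a read error. A caller would also check that fopen did not return NULL before passing the stream in.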
Source code in the C programming language
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

/* One node per strand read from the input file. */
typedef struct genome {
    char *strand;
    int length;
    struct genome *next;
} genome;

genome *genomeData = NULL;
int totalLength = 0, Adenine = 0, Cytosine = 0, Guanine = 0, Thymine = 0;

/* Number of decimal digits in a non-negative integer. */
int numDigits(int num) {
    int len = 1;
    while (num >= 10) {          /* >= so that 10, 100, ... gain a digit */
        num = num / 10;
        len++;
    }
    return len;
}

/* Count the bases in str and append a copy of it to the linked list. */
void buildGenome(const char *str) {
    int len = strlen(str), i;
    genome *genomeIterator, *newGenome;

    totalLength += len;
    for (i = 0; i < len; i++) {
        switch (str[i]) {
            case 'A': Adenine++;  break;
            case 'T': Thymine++;  break;
            case 'C': Cytosine++; break;
            case 'G': Guanine++;  break;
        }
    }

    /* Build the new node; len + 1 leaves room for the terminating '\0'. */
    newGenome = malloc(sizeof(genome));
    if (newGenome != NULL)
        newGenome->strand = malloc(len + 1);
    if (newGenome == NULL || newGenome->strand == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    strcpy(newGenome->strand, str);
    newGenome->length = len;
    newGenome->next = NULL;

    if (genomeData == NULL) {
        genomeData = newGenome;  /* first strand becomes the head */
    } else {
        genomeIterator = genomeData;
        while (genomeIterator->next != NULL)
            genomeIterator = genomeIterator->next;
        genomeIterator->next = newGenome;
    }
}

/* Print every strand with its starting offset, then the base counts. */
void printGenome(void) {
    genome *genomeIterator = genomeData, *next;
    int width = numDigits(totalLength), len = 0;

    printf("Sequence:\n");
    while (genomeIterator != NULL) {
        printf("\n%*d%3s%3s", width + 1, len, ":", genomeIterator->strand);
        len += genomeIterator->length;
        genomeIterator = genomeIterator->next;
    }

    printf("\n\nBase Count\n----------\n\n");
    printf("%3c%3s%*d\n", 'A', ":", width + 1, Adenine);
    printf("%3c%3s%*d\n", 'T', ":", width + 1, Thymine);
    printf("%3c%3s%*d\n", 'C', ":", width + 1, Cytosine);
    printf("%3c%3s%*d\n", 'G', ":", width + 1, Guanine);
    printf("\n%3s%*d\n", "Total:", width + 1, Adenine + Thymine + Cytosine + Guanine);

    /* Free every node and its strand, not only the head. */
    for (genomeIterator = genomeData; genomeIterator != NULL; genomeIterator = next) {
        next = genomeIterator->next;
        free(genomeIterator->strand);
        free(genomeIterator);
    }
    genomeData = NULL;
}

int main(int argc, char **argv)
{
    char str[100];
    FILE *fp;

    if (argc != 2) {
        printf("Usage : %s <Gene file name>\n", argv[0]);
        return 0;
    }

    fp = fopen(argv[1], "r");
    if (fp == NULL) {            /* fopen fails on a missing or unreadable file */
        perror(argv[1]);
        return 1;
    }

    /* %99s keeps each token within the 100-byte buffer. */
    while (fscanf(fp, "%99s", str) == 1)
        buildGenome(str);
    fclose(fp);

    printGenome();
    return 0;
}