How to resolve the algorithm Bioinformatics/base count step by step in the C programming language

Problem Statement
Step by Step Solution
Sourcecode

Problem Statement

Given this string representing ordered DNA bases:

Let's start with the solution:

Step by Step Solution

The provided C program reads whitespace-separated DNA strands from a text file, stores them in a linked list, and tallies each nucleotide (Adenine, Thymine, Cytosine, and Guanine). It then prints every strand prefixed by its starting offset in the genome, followed by a right-aligned table of the four base counts and their total.

A breakdown of the code:

Data Structures:
- A genome struct is defined with three members:
  - strand: A pointer to a character array representing the DNA strand.
  - length: The length of the DNA strand.
  - next: A pointer to the next genome in a linked list.
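
The node layout described above can be sketched on its own. The names strand_node, node_new, and list_free are hypothetical and chosen only for this illustration; the sketch assumes each node owns a private copy of its string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type mirroring the three members described above. */
typedef struct strand_node {
    char *strand;               /* heap copy of the DNA text */
    int length;                 /* number of bases in strand */
    struct strand_node *next;   /* next node, or NULL at the tail */
} strand_node;

/* Allocate a node holding a private copy of text; returns NULL on failure. */
strand_node *node_new(const char *text)
{
    size_t len = strlen(text);
    strand_node *node = malloc(sizeof *node);

    if (node == NULL)
        return NULL;

    node->strand = malloc(len + 1);     /* +1 for the terminating '\0' */
    if (node->strand == NULL) {
        free(node);
        return NULL;
    }

    memcpy(node->strand, text, len + 1);
    node->length = (int)len;
    node->next = NULL;
    return node;
}

/* Free every node and the string it owns; returns the number of nodes freed. */
int list_free(strand_node *head)
{
    int freed = 0;

    while (head != NULL) {
        strand_node *next = head->next; /* save the link before freeing */
        free(head->strand);
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

Saving the next pointer before calling free matters, because reading a member of a node after it has been freed is undefined behavior.
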
Global Variables:
- genomeData: A pointer to the head of the linked list of genomes.
- totalLength: The total length of all the DNA strands in the genome.
- Adenine, Cytosine, Guanine, Thymine: Variables to keep track of the nucleotide counts.
Function numDigits:
- Returns the number of decimal digits in a non-negative integer. printGenome uses the result to set the column width of the output.
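
As a minimal sketch of digit counting by repeated division, assuming a non-negative input (count_digits is a hypothetical name used only here):

```c
/* Hypothetical helper: number of decimal digits in a non-negative int. */
int count_digits(int num)
{
    int len = 1;    /* every value, including 0, has at least one digit */

    /* The comparison must be >= so that 10, 100, 1000, ... are handled. */
    while (num >= 10) {
        num /= 10;
        len++;
    }
    return len;
}
```

The boundary is the part that is easy to get wrong: comparing with > instead of >= would report one digit too few for exact powers of ten.
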
Function buildGenome:
- Takes a character array str representing a DNA strand as input.
  - Calculates the length of the strand and adds it to totalLength.
  - Increments the matching counter for each 'A', 'T', 'C', or 'G' in the strand; any other character, including lowercase letters, is left uncounted.
  - Creates a new genome node, allocates memory for its strand member, copies the DNA strand into it, and sets its length.
  - If genomeData is NULL (i.e., this is the first genome), it assigns the new node to genomeData.
  - Otherwise, it traverses the linked list of genomes to find the last node and appends the new node to it.
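
The counting step can be sketched without global variables. This is an illustration under assumptions, not the program's own code: base_counts and count_bases are hypothetical names, and the tallies are returned in a struct instead of being stored in globals.

```c
#include <stddef.h>

/* Hypothetical container for the four tallies. */
typedef struct {
    size_t a, c, g, t;
} base_counts;

/* Tally A, C, G and T in a NUL-terminated strand; other characters are skipped. */
base_counts count_bases(const char *strand)
{
    base_counts counts = {0, 0, 0, 0};

    for (; *strand != '\0'; strand++) {
        switch (*strand) {
            case 'A': counts.a++; break;
            case 'C': counts.c++; break;
            case 'G': counts.g++; break;
            case 'T': counts.t++; break;
            default:  break;    /* not an uppercase base: ignore */
        }
    }
    return counts;
}
```

Returning the tallies by value keeps the function free of side effects, so a caller can add the results of several strands together as it sees fit.
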
Function printGenome:
- Calculates the width of the output based on the totalLength.
- Iterates through the linked list of genomes, printing each strand preceded by its starting offset in the genome.
- After printing all strands, it prints the base count of each nucleotide and the total base count.
- Frees the allocated memory for the genomeData linked list.
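
The right-aligned columns come from the `*` field width in printf, which takes the width from the argument list. A minimal sketch, assuming the same format string as the base count rows (format_row is a hypothetical helper that writes to a buffer instead of stdout):

```c
#include <stdio.h>

/* Hypothetical helper: format one base count row into out.
   The count is right-aligned in a field of width + 1 columns.
   Returns the value of snprintf (characters needed, excluding '\0'). */
int format_row(char *out, size_t size, char base, int width, int count)
{
    /* %3c and %3s pad to three columns; %*d reads its width from width + 1. */
    return snprintf(out, size, "%3c%3s%*d", base, ":", width + 1, count);
}
```

Because the width is computed at run time from the total length, the counts stay aligned whether the genome has ten bases or ten million.
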
Main Function:
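- Opens the file named by the single command-line argument and reads the whitespace-separated DNA strands it contains. If the argument is missing, it prints a usage message and exits.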
- For each DNA strand read from the file, it calls the buildGenome function to add it to the genomeData linked list.
- Finally, it calls the printGenome function to display the genome sequence and nucleotide counts.
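
Reading tokens with `%s` is only safe when the conversion carries a maximum field width. The following sketch assumes a 100-byte buffer like the one in main; count_token_chars and MAX_TOKEN are hypothetical names introduced for the example.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOKEN 100   /* assumed buffer size, as in char str[100] */

/* Read whitespace-separated tokens from fp and return the total
   number of non-whitespace characters seen. */
long count_token_chars(FILE *fp)
{
    char token[MAX_TOKEN];
    long total = 0;

    /* %99s stores at most 99 characters plus '\0', so token cannot
       overflow; a longer run is simply delivered in several pieces. */
    while (fscanf(fp, "%99s", token) == 1)
        total += (long)strlen(token);

    return total;
}
```

Testing the return value against 1 instead of EOF stops the loop on any failed conversion, not only at end of file.
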

Source code in the C programming language

#include<string.h>
#include<stdlib.h>
#include<stdio.h>

typedef struct genome{
    char* strand;
    int length;
    struct genome* next;
}genome;

genome* genomeData;
int totalLength = 0, Adenine = 0, Cytosine = 0, Guanine = 0, Thymine = 0;

int numDigits(int num){
    int len = 1;

    while(num>=10){
        num = num/10;
        len++;
    }

    return len;
}

void buildGenome(char str[100]){
    int len = strlen(str),i;
    genome *genomeIterator, *newGenome; 

    totalLength += len;

    for(i=0;i<len;i++){
        switch(str[i]){
            case 'A': Adenine++;
                break;
            case 'T': Thymine++;
                break;
            case 'C': Cytosine++;
                break;
            case 'G': Guanine++;
                break;
        };
    }

    /* Build the node once; len+1 leaves room for the terminating '\0'. */
    newGenome = (genome*)malloc(sizeof(genome));

    if(newGenome!=NULL)
        newGenome->strand = (char*)malloc((len+1)*sizeof(char));

    if(newGenome==NULL || newGenome->strand==NULL){
        fprintf(stderr,"Out of memory\n");
        exit(EXIT_FAILURE);
    }

    strcpy(newGenome->strand,str);
    newGenome->length = len;
    newGenome->next = NULL;

    if(genomeData==NULL){
        genomeData = newGenome;
    }

    else{
        /* Walk to the last node and append the new one. */
        genomeIterator = genomeData;

        while(genomeIterator->next!=NULL)
            genomeIterator = genomeIterator->next;

        genomeIterator->next = newGenome;
    }
}

void printGenome(){
    genome* genomeIterator = genomeData;

    int width = numDigits(totalLength), len = 0;

    printf("Sequence:\n");

    while(genomeIterator!=NULL){
        printf("\n%*d%3s%3s",width+1,len,":",genomeIterator->strand);
        len += genomeIterator->length;

        genomeIterator = genomeIterator->next;
    }

    printf("\n\nBase Count\n----------\n\n");

    printf("%3c%3s%*d\n",'A',":",width+1,Adenine);
    printf("%3c%3s%*d\n",'T',":",width+1,Thymine);
    printf("%3c%3s%*d\n",'C',":",width+1,Cytosine);
    printf("%3c%3s%*d\n",'G',":",width+1,Guanine);
    printf("\n%3s%*d\n","Total:",width+1,Adenine + Thymine + Cytosine + Guanine);

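    /* Release every node and its strand, not just the head node. */
    while(genomeData!=NULL){
        genome* nextGenome = genomeData->next;

        free(genomeData->strand);
        free(genomeData);
        genomeData = nextGenome;
    }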
}

int main(int argc,char** argv)
{
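    char str[100];

    if(argc!=2){
        printf("Usage : %s <Gene file name>\n",argv[0]);
        return 0;
    }

    FILE *fp = fopen(argv[1],"r");

    /* Stop early if the file cannot be opened. */
    if(fp==NULL){
        perror(argv[1]);
        return 1;
    }

    /* %99s caps each token at 99 characters so str cannot overflow. */
    while(fscanf(fp,"%99s",str)==1)
        buildGenome(str);
    fclose(fp);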

    printGenome();

    return 0;
}
