How to resolve the algorithm Bioinformatics/base count step by step in the C programming language

Published on 7 June 2024 03:52 AM
#C

Problem Statement

Given this string representing ordered DNA bases:

Let's start with the solution:

Step-by-step solution for Bioinformatics/base count in the C programming language

The provided C program reads whitespace-separated DNA strands from a text file named on the command line. It tracks the total length of the sequence and the count of each nucleotide (Adenine, Thymine, Cytosine, and Guanine), then prints every strand next to its starting offset, followed by a right-aligned table of the base counts and their total.
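
The counting step on its own can be sketched separately from the file and list handling. The version below is an illustration only, not the program's method: it tallies every byte into a table and then reads off the four bases, where the program itself uses a switch statement. The names base_counts and count_bases are hypothetical, as is the extra "other" counter for characters that are not an upper-case A, C, G or T.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical result type; the original program uses global ints instead. */
typedef struct {
    size_t a, c, g, t;
    size_t other;   /* everything that is not an upper-case A, C, G or T */
} base_counts;

/* Tally every byte of a NUL-terminated string, then pick out the bases. */
base_counts count_bases(const char *seq)
{
    size_t tally[UCHAR_MAX + 1] = {0};
    size_t total = 0;
    base_counts n;

    for (; *seq != '\0'; seq++) {
        tally[(unsigned char)*seq]++;   /* cast avoids a negative index */
        total++;
    }

    n.a = tally['A'];
    n.c = tally['C'];
    n.g = tally['G'];
    n.t = tally['T'];
    n.other = total - (n.a + n.c + n.g + n.t);
    return n;
}
```

Returning the counts in a struct, rather than updating globals, makes the function easy to call on one strand at a time and to test in isolation.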

A breakdown of the code:

  1. Data Structures:

    • A genome struct is defined with three members:
      • strand: A pointer to a character array representing the DNA strand.
      • length: The length of the DNA strand.
      • next: A pointer to the next genome in a linked list.
  2. Global Variables:

    • genomeData: A pointer to the head of the linked list of genomes.
    • totalLength: The total length of all the DNA strands in the genome.
    • Adenine, Cytosine, Guanine, Thymine: Variables to keep track of the nucleotide counts.
  3. Function numDigits:

    • Calculates the number of decimal digits in a non-negative integer. printGenome uses the result to size the numeric columns of the output.
  4. Function buildGenome:

    • Takes a character array str representing a DNA strand as input.
      • Calculates the length of the strand and adds it to totalLength.
      • Increments the count of the corresponding nucleotide for each character in the strand.
      • Creates a new genome node, allocates memory for its strand member, copies the DNA strand into it, and sets its length.
      • If genomeData is NULL (i.e., this is the first genome), it assigns the new node to genomeData.
      • Otherwise, it traverses the linked list of genomes to find the last node and appends the new node to it.
  5. Function printGenome:

    • Calculates the width of the output based on the totalLength.
    • Iterates through the linked list of genomes, printing each strand preceded by its starting offset (the number of bases that come before it in the genome).
    • After printing all strands, it prints the base count of each nucleotide and the total base count.
    • Frees the allocated memory for the genomeData linked list.
  6. Main Function:

    • Opens the file named by the single command-line argument and reads the whitespace-separated DNA strands from it.
    • For each DNA strand read from the file, it calls the buildGenome function to add it to the genomeData linked list.
    • Finally, it calls the printGenome function to display the genome sequence and nucleotide counts.
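
The column sizing described for numDigits and printGenome relies on the `%*d` conversion, which takes its field width from an extra int argument. The following is a minimal sketch of that idea, assuming non-negative values. The names num_digits and format_count are hypothetical and do not appear in the program, and the sketch writes into a buffer with snprintf instead of printing directly.

```c
#include <stdio.h>

/* Number of decimal digits in a non-negative integer (0 has one digit). */
int num_digits(int num)
{
    int len = 1;

    while (num >= 10) {   /* >= so that 10, 100, ... get their extra digit */
        num /= 10;
        len++;
    }
    return len;
}

/* Write "label: value" with the value right-aligned in a field that is
   just wide enough for `largest`. Returns what snprintf returns. */
int format_count(char *buf, size_t size, char label, int value, int largest)
{
    return snprintf(buf, size, "%c: %*d", label, num_digits(largest), value);
}
```

With a largest value of 500 the field is three characters wide, so a count of 7 for 'A' is formatted as "A:   7".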

Source code in the C programming language

#include<string.h>
#include<stdlib.h>
#include<stdio.h>

typedef struct genome{
    char* strand;
    int length;
    struct genome* next;
}genome;

genome* genomeData;
int totalLength = 0, Adenine = 0, Cytosine = 0, Guanine = 0, Thymine = 0;

int numDigits(int num){
    int len = 1;

    /* >= so that 10, 100, ... are counted with their extra digit */
    while(num>=10){
        num = num/10;
        len++;
    }

    return len;
}

void buildGenome(char str[100]){
    int len = strlen(str),i;
    genome *genomeIterator, *newGenome; 

    totalLength += len;

    for(i=0;i<len;i++){
        switch(str[i]){
            case 'A': Adenine++;
                break;
            case 'T': Thymine++;
                break;
            case 'C': Cytosine++;
                break;
            case 'G': Guanine++;
                break;
        };
    }

    /* Allocate the node once; len+1 leaves room for the terminating '\0'. */
    newGenome = (genome*)malloc(sizeof(genome));

    if(newGenome==NULL){
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    newGenome->strand = (char*)malloc((len+1)*sizeof(char));

    if(newGenome->strand==NULL){
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    strcpy(newGenome->strand,str);
    newGenome->length = len;
    newGenome->next = NULL;

    if(genomeData==NULL){
        /* First strand: the new node becomes the head of the list. */
        genomeData = newGenome;
    }

    else{
        /* Otherwise walk to the last node and append after it. */
        genomeIterator = genomeData;

        while(genomeIterator->next!=NULL)
            genomeIterator = genomeIterator->next;

        genomeIterator->next = newGenome;
    }
}

void printGenome(){
    genome* genomeIterator = genomeData;

    int width = numDigits(totalLength), len = 0;

    printf("Sequence:\n");

    while(genomeIterator!=NULL){
        printf("\n%*d%3s%3s",width+1,len,":",genomeIterator->strand);
        len += genomeIterator->length;

        genomeIterator = genomeIterator->next;
    }

    printf("\n\nBase Count\n----------\n\n");

    printf("%3c%3s%*d\n",'A',":",width+1,Adenine);
    printf("%3c%3s%*d\n",'T',":",width+1,Thymine);
    printf("%3c%3s%*d\n",'C',":",width+1,Cytosine);
    printf("%3c%3s%*d\n",'G',":",width+1,Guanine);
    printf("\n%3s%*d\n","Total:",width+1,Adenine + Thymine + Cytosine + Guanine);

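    /* Free every node and its strand, not only the head of the list. */
    while(genomeData!=NULL){
        genomeIterator = genomeData->next;
        free(genomeData->strand);
        free(genomeData);
        genomeData = genomeIterator;
    }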
}

int main(int argc,char** argv)
{
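    char str[100];
    FILE *fp;

    if(argc!=2){
        printf("Usage : %s <Gene file name>\n",argv[0]);
        return 0;
    }

    fp = fopen(argv[1],"r");

    /* Stop early if the file cannot be opened. */
    if(fp==NULL){
        perror(argv[1]);
        return 1;
    }

    /* %99s keeps fscanf from writing past the 100-byte buffer. */
    while(fscanf(fp,"%99s",str)==1)
        buildGenome(str);
    fclose(fp);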

    printGenome();

    return 0;
}
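
The list handling in buildGenome and printGenome can also be sketched in isolation. The variant below is an assumption-laden illustration, not the program's code: it keeps a tail pointer so that appending does not need to walk the list, reports allocation failure through a return value, and releases every node together with its strand. The names node, strand_list, list_append and list_free are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

typedef struct node {
    char *strand;        /* heap copy of the strand text */
    size_t length;       /* number of bases in the strand */
    struct node *next;
} node;

/* Hypothetical list handle holding both ends of the list. */
typedef struct {
    node *head;
    node *tail;
} strand_list;

/* Append a copy of str. Returns 0 on success, -1 if allocation fails. */
int list_append(strand_list *list, const char *str)
{
    size_t len = strlen(str);
    node *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;

    n->strand = malloc(len + 1);       /* +1 for the terminating '\0' */
    if (n->strand == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->strand, str, len + 1);
    n->length = len;
    n->next = NULL;

    if (list->tail == NULL)
        list->head = n;                /* first node in an empty list */
    else
        list->tail->next = n;          /* link after the current last node */
    list->tail = n;
    return 0;
}

/* Release every node and its strand, leaving the list empty. */
void list_free(strand_list *list)
{
    node *cur = list->head;

    while (cur != NULL) {
        node *next = cur->next;        /* save before freeing cur */
        free(cur->strand);
        free(cur);
        cur = next;
    }
    list->head = NULL;
    list->tail = NULL;
}
```

A list starts out as `strand_list list = {NULL, NULL};`. For the short inputs this task uses, the walk-to-the-end append in the program is fine; the tail pointer only matters once a file contains many strands.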
