How to resolve the algorithm Bioinformatics/Sequence mutation step by step in the C++ programming language

Published on 7 June 2024 03:52 AM

Problem Statement

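Given a string of characters A, C, G, and T representing a DNA sequence, write a routine to mutate the sequence (string) by choosing a random position and then changing the base there, erasing it, or inserting a new random base at that position.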

Let's start with the solution:

Step-by-step solution for Bioinformatics/Sequence mutation in the C++ programming language

The program below generates a random DNA sequence and then mutates it. It is written in C++ and uses the <array>, <iomanip>, <iostream>, <random>, and <string> headers of the C++ Standard Library. Let's break down the code section by section:

Class Definition (sequence_generator):

The sequence_generator class is the heart of this program. It handles the generation and mutation of DNA sequences.

  • Constructor: The constructor seeds the random number generator (engine_, a std::mt19937) from std::random_device and sets up a uniform integer distribution over the four base indices (base_dist_). It gives every mutation operation the same initial weight of 1, stored in the operation_weight_ array, and records the sum of the weights in total_weight_. The distribution used to pick an operation (op_dist) is not created here; get_random_operation builds it on each call from the current total_weight_.

  • get_random_base: This method returns a random DNA base ('A', 'C', 'G', or 'T') based on the base_dist_ distribution.

  • get_random_operation: This method returns a random mutation operation (change, erase, or insert) based on the weights specified in the operation_weight_ array. By default, all operations have equal weight.

  • set_weight: This method allows you to customize the weights of different mutation operations. For instance, you could increase the weight of the 'change' operation to make it more likely to occur during mutations.

  • generate_sequence: This method generates a DNA sequence of a given length. It uses the get_random_base method to generate each base in the sequence.

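  • mutate_sequence: This method applies one random mutation to a DNA sequence in place. It picks a uniformly random position in the sequence, asks get_random_operation for an operation (change, erase, or insert), prints a line describing the mutation to std::cout, and then applies it. A change can pick the base that is already there, in which case the sequence stays the same.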

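  • print_sequence: This static method prints a DNA sequence in rows of 50 bases, each row prefixed with the index of its first base, followed by the count of each base and the total. It does not mark mutated positions; those are reported by mutate_sequence as each mutation happens.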
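The effect of the three operations on the string can be sketched in isolation. This is a minimal illustration, not the article's method: apply_mutation is a hypothetical free function that takes the operation, position, and base as arguments instead of drawing them at random, so the result is repeatable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names for illustration; the article keeps this logic inside
// sequence_generator::mutate_sequence and chooses the arguments at random.
enum class operation { change, erase, insert };

// Applies one mutation at a caller-chosen position.
// Precondition: pos < sequence.size().
void apply_mutation(std::string& sequence, operation op, std::size_t pos, char base) {
    assert(pos < sequence.size());
    switch (op) {
    case operation::change:
        sequence[pos] = base;           // length stays the same
        break;
    case operation::erase:
        sequence.erase(pos, 1);         // length shrinks by one; base is unused
        break;
    case operation::insert:
        sequence.insert(pos, 1, base);  // length grows by one; old base shifts right
        break;
    }
}
```

Starting from "ACGT", changing position 1 to T gives "ATGT", erasing position 0 then gives "TGT", and inserting C at position 2 gives "TGCT".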

Main Function:

  • sequence_generator gen: This line creates an instance of the sequence_generator class.

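  • Customizing Mutation Weight: The gen.set_weight call sets the weight of the 'change' operation to 2, while erase and insert keep their default weight of 1. With a total weight of 4, change is selected with probability 2/4 and erase and insert with probability 1/4 each, so change is twice as likely as either of the other two.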

  • Initial Sequence Generation: The gen.generate_sequence call creates an initial DNA sequence of length 250 and stores it in the sequence variable.

  • Printing Initial Sequence: The sequence_generator::print_sequence call prints the initial sequence and its base counts.

  • Mutation Loop: The code enters a loop that performs 10 mutations on the sequence using the gen.mutate_sequence method.

  • Printing Mutated Sequence: After the mutations, the sequence_generator::print_sequence call prints the mutated sequence and its base counts.
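How the weights translate into a choice can be shown with a small sketch of the cumulative-weight walk that get_random_operation performs. Here pick_operation is a hypothetical free function, and the number n is passed in by the caller rather than drawn from the random engine, so each outcome can be checked by hand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

// Hypothetical names for illustration only.
enum class operation { change, erase, insert };

// Maps n in [0, total weight) to the first operation whose cumulative weight
// exceeds n. Precondition: n is smaller than the sum of the weights.
operation pick_operation(const std::array<unsigned int, 3>& weights, unsigned int n) {
    assert(n < std::accumulate(weights.begin(), weights.end(), 0u));
    unsigned int cumulative = 0;
    std::size_t op = 0;
    for (; op < weights.size(); ++op) {
        cumulative += weights[op];
        if (n < cumulative)
            break;
    }
    return static_cast<operation>(op);
}
```

With the weights {2, 1, 1} used in main, n = 0 and n = 1 both select change, n = 2 selects erase, and n = 3 selects insert, which is where the 2/4, 1/4, 1/4 split comes from.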

Program Summary:

This program generates a random DNA sequence and then performs a series of mutations on it. Each mutation changes, erases, or inserts a single base at a random position. You can adjust the weights of the mutation operations to bias the process towards certain types of changes. As it runs, the program prints one line per mutation giving the operation, the position, and the bases involved, and it prints the sequence with its base counts before and after the mutations.
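The base counts in the report can be computed separately from the printing. The following is a minimal sketch under that assumption; count_bases is a hypothetical helper, not part of the article's class, and like print_sequence it ignores any character that is not A, C, G, or T.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the counts in the order A, C, G, T.
// Characters outside that alphabet are not counted.
std::array<std::size_t, 4> count_bases(const std::string& sequence) {
    static const std::string bases = "ACGT";
    std::array<std::size_t, 4> counts{};  // value-initialized to zero
    for (char c : sequence) {
        const std::size_t index = bases.find(c);
        if (index != std::string::npos)
            ++counts[index];
    }
    return counts;
}
```

Separating the count from the output makes it straightforward to compare counts before and after a batch of mutations.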

Source code in the C++ programming language

#include <array>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <random>
#include <string>

class sequence_generator {
public:
    sequence_generator();
    std::string generate_sequence(size_t length);
    void mutate_sequence(std::string&);
    static void print_sequence(std::ostream&, const std::string&);
    enum class operation { change, erase, insert };
    void set_weight(operation, unsigned int);
private:
    char get_random_base() {
        return bases_[base_dist_(engine_)];
    }
    operation get_random_operation();
    static const std::array<char, 4> bases_;
    std::mt19937 engine_;
    std::uniform_int_distribution<size_t> base_dist_;
    std::array<unsigned int, 3> operation_weight_;
    unsigned int total_weight_;
};

const std::array<char, 4> sequence_generator::bases_{ 'A', 'C', 'G', 'T' };

sequence_generator::sequence_generator() : engine_(std::random_device()()),
    base_dist_(0, bases_.size() - 1),
    total_weight_(operation_weight_.size()) {
    operation_weight_.fill(1);
}

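// Draws n in [0, total_weight_) and returns the first operation whose
// cumulative weight exceeds n. Assumes at least one weight is nonzero.
sequence_generator::operation sequence_generator::get_random_operation() {
    std::uniform_int_distribution<unsigned int> op_dist(0, total_weight_ - 1);
    unsigned int n = op_dist(engine_), op = 0, weight = 0;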
    for (; op < operation_weight_.size(); ++op) {
        weight += operation_weight_[op];
        if (n < weight)
            break;
    }
    return static_cast<operation>(op);
}

void sequence_generator::set_weight(operation op, unsigned int weight) {
    total_weight_ -= operation_weight_[static_cast<size_t>(op)];
    operation_weight_[static_cast<size_t>(op)] = weight;
    total_weight_ += weight;
}

std::string sequence_generator::generate_sequence(size_t length) {
    std::string sequence;
    sequence.reserve(length);
    for (size_t i = 0; i < length; ++i)
        sequence += get_random_base();
    return sequence;
}

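void sequence_generator::mutate_sequence(std::string& sequence) {
    // An empty sequence has no valid position; length() - 1 would wrap around.
    if (sequence.empty())
        return;
    std::uniform_int_distribution<size_t> dist(0, sequence.length() - 1);
    size_t pos = dist(engine_);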
    char b;
    switch (get_random_operation()) {
    case operation::change:
        b = get_random_base();
        std::cout << "Change base at position " << pos << " from "
            << sequence[pos] << " to " << b << '\n';
        sequence[pos] = b;
        break;
    case operation::erase:
        std::cout << "Erase base " << sequence[pos] << " at position "
            << pos << '\n';
        sequence.erase(pos, 1);
        break;
    case operation::insert:
        b = get_random_base();
        std::cout << "Insert base " << b << " at position "
            << pos << '\n';
        sequence.insert(pos, 1, b);
        break;
    }
}

void sequence_generator::print_sequence(std::ostream& out, const std::string& sequence) {
    constexpr size_t base_count = bases_.size();
    std::array<size_t, base_count> count = { 0 };
    for (size_t i = 0, n = sequence.length(); i < n; ++i) {
        if (i % 50 == 0) {
            if (i != 0)
                out << '\n';
            out << std::setw(3) << i << ": ";
        }
        out << sequence[i];
        for (size_t j = 0; j < base_count; ++j) {
            if (bases_[j] == sequence[i]) {
                ++count[j];
                break;
            }
        }
    }
    out << '\n';
    out << "Base counts:\n";
    size_t total = 0;
    for (size_t j = 0; j < base_count; ++j) {
        total += count[j];
        out << bases_[j] << ": " << count[j] << ", ";
    }
    out << "Total: " << total << '\n';
}

int main() {
    sequence_generator gen;
    gen.set_weight(sequence_generator::operation::change, 2);
    std::string sequence = gen.generate_sequence(250);
    std::cout << "Initial sequence:\n";
    sequence_generator::print_sequence(std::cout, sequence);
    constexpr int count = 10;
    for (int i = 0; i < count; ++i)
        gen.mutate_sequence(sequence);
    std::cout << "After " << count << " mutations:\n";
    sequence_generator::print_sequence(std::cout, sequence);
    return 0;
}
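
Because the constructor seeds engine_ from std::random_device, every run of the program prints a different sequence. If repeatable output is wanted, for example in a test, one option is to seed std::mt19937 with a fixed value. The sketch below assumes a hypothetical free function generate_seeded_sequence. It is not part of the article's class, and it indexes the bases with the raw engine output modulo 4 instead of std::uniform_int_distribution so that the result does not depend on the library's distribution implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical variant of generate_sequence that takes an explicit seed.
// The same seed and length always produce the same sequence.
std::string generate_seeded_sequence(std::size_t length, std::uint32_t seed) {
    static const char bases[] = { 'A', 'C', 'G', 'T' };
    std::mt19937 engine(seed);
    std::string sequence;
    sequence.reserve(length);
    for (std::size_t i = 0; i < length; ++i)
        sequence += bases[engine() % 4];  // 2^32 is divisible by 4, so no modulo bias
    return sequence;
}
```

Calling generate_seeded_sequence(250, 42) twice returns the same 250-base string both times.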


  
