How to resolve the algorithm Bioinformatics/Sequence mutation step by step in the C++ programming language
Problem Statement
Given a string of the characters A, C, G, and T representing a DNA sequence, write a routine to mutate the sequence (string) by choosing a random position and then changing, erasing, or inserting a base at that position.
Let's start with the solution:
Step-by-step solution for Bioinformatics/Sequence mutation in the C++ programming language
The provided code defines a program that operates on DNA sequences. It's developed in C++ and utilizes several C++ Standard Library components. Let's break down the code section by section:
Class Definition (sequence_generator):
The sequence_generator class is the heart of this program. It handles the generation and mutation of DNA sequences.
- Constructor: The constructor seeds the random number generator (engine_) from std::random_device and sets up a uniform integer distribution (base_dist_) for picking random bases. Initially, every mutation operation is given an equal weight of 1, stored in the operation_weight_ array, and total_weight_ holds the sum of those weights.
- get_random_base: This method returns a random DNA base ('A', 'C', 'G', or 'T') by drawing an index from the base_dist_ distribution.
- get_random_operation: This method returns a random mutation operation (change, erase, or insert). It creates a local uniform distribution (op_dist) over 0 to total_weight_ - 1, draws a number, and walks the cumulative weights in the operation_weight_ array to find the matching operation. By default, all operations have equal weight.
- set_weight: This method lets you customize the weight of a mutation operation and keeps total_weight_ in sync. For instance, you could increase the weight of the 'change' operation to make it more likely to occur during mutations.
- generate_sequence: This method generates a DNA sequence of a given length. It calls get_random_base once for each base in the sequence.
- mutate_sequence: This method applies one random mutation to a DNA sequence. It first selects a random position in the sequence, then picks an operation (change, erase, or insert) with get_random_operation, prints a line describing the mutation, and applies it.
- print_sequence: This static method prints a DNA sequence 50 bases per line, each line prefixed with the index of its first base, followed by the count of each base and the total length. It does not mark mutated positions; those are reported by mutate_sequence as each mutation happens.
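The three mutation operations described above map directly onto std::string operations. The following is a minimal sketch, not the program's own code: the helper names change_base, erase_base, and insert_base are hypothetical, and each takes a caller-supplied position instead of a random one so the effect is easy to follow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers (not part of the original program): each applies one
// mutation at a caller-supplied position and returns the mutated copy.

// Replace the base at pos with b; the length is unchanged.
std::string change_base(std::string s, std::size_t pos, char b) {
    assert(pos < s.size());
    s[pos] = b;
    return s;
}

// Remove the base at pos; the sequence becomes one base shorter.
std::string erase_base(std::string s, std::size_t pos) {
    assert(pos < s.size());
    s.erase(pos, 1);
    return s;
}

// Insert b before the base currently at pos; the sequence grows by one.
// pos == s.size() appends at the end.
std::string insert_base(std::string s, std::size_t pos, char b) {
    assert(pos <= s.size());
    s.insert(pos, 1, b);
    return s;
}
```

For example, starting from "ACGT", change_base("ACGT", 1, 'T') gives "ATGT", erase_base("ACGT", 1) gives "AGT", and insert_base("ACGT", 1, 'T') gives "ATCGT".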
Main Function:
- sequence_generator gen: This line creates an instance of the sequence_generator class.
- Customizing Mutation Weight: The gen.set_weight call raises the weight of the 'change' operation to 2 while 'erase' and 'insert' keep weight 1. The total weight is then 4, so 'change' is selected with probability 2/4 and each of the other two with probability 1/4.
- Initial Sequence Generation: The gen.generate_sequence call creates an initial DNA sequence of length 250 and stores it in the sequence variable.
- Printing Initial Sequence: The sequence_generator::print_sequence call prints the initial sequence and its base counts.
- Mutation Loop: The code enters a loop that performs 10 mutations on the sequence using the gen.mutate_sequence method.
- Printing Mutated Sequence: After the mutations, the sequence_generator::print_sequence call prints the mutated sequence and its base counts.
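To see what a weight of 2 for 'change' means in practice, the selection step can be sampled many times. The sketch below is an illustration only: it swaps the program's manual cumulative-weight loop for the standard std::discrete_distribution, and the helper name sample_operations and the fixed seed are assumptions made here for repeatability.

```cpp
#include <array>
#include <cstddef>
#include <random>

// Hypothetical helper (not part of the original program): draws `draws`
// operations using the given weights (index 0 = change, 1 = erase,
// 2 = insert) and returns how often each index was picked.
std::array<std::size_t, 3> sample_operations(const std::array<unsigned int, 3>& weights,
                                             std::size_t draws,
                                             unsigned int seed = 42) {
    std::mt19937 engine(seed); // fixed seed: an assumption, for repeatable runs
    // discrete_distribution picks index i with probability weight[i] / sum.
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    std::array<std::size_t, 3> counts{}; // zero-initialized tallies
    for (std::size_t i = 0; i < draws; ++i)
        ++counts[static_cast<std::size_t>(dist(engine))];
    return counts;
}
```

With weights {2, 1, 1}, roughly half of the draws should land on index 0 (change) and about a quarter on each of the other two.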
Program Summary:
This program generates a random DNA sequence and then performs a series of mutations on it. The mutation operations are changing, erasing, or inserting a DNA base at a random position. You can customize the weights of the operations to bias the mutation process towards certain types of changes. The program prints one line per mutation (the operation, the position, and the bases involved) and shows the sequence and its base counts before and after the mutations.
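The base counts mentioned above can also be gathered on their own, separately from printing. This is a small sketch under assumptions: count_bases is a hypothetical helper, and it uses std::map where the program uses a fixed-size array.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not part of the original program): tallies how many
// times each of A, C, G, T occurs. Any other character is ignored.
std::map<char, std::size_t> count_bases(const std::string& sequence) {
    std::map<char, std::size_t> counts{{'A', 0}, {'C', 0}, {'G', 0}, {'T', 0}};
    for (char base : sequence) {
        auto it = counts.find(base);
        if (it != counts.end())
            ++it->second;
    }
    return counts;
}
```

Comparing the counts before and after a mutation shows its effect: a change moves one unit from one base to another (or leaves the counts alone if the same base is drawn), an erase lowers one count, and an insert raises one.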
Source code in the C++ programming language
#include <array>
#include <iomanip>
#include <iostream>
#include <random>
#include <string>

class sequence_generator {
public:
    sequence_generator();
    std::string generate_sequence(size_t length);
    void mutate_sequence(std::string&);
    static void print_sequence(std::ostream&, const std::string&);
    enum class operation { change, erase, insert };
    void set_weight(operation, unsigned int);
private:
    char get_random_base() {
        return bases_[base_dist_(engine_)];
    }
    operation get_random_operation();
    static const std::array<char, 4> bases_;
    std::mt19937 engine_;
    std::uniform_int_distribution<size_t> base_dist_;
    std::array<unsigned int, 3> operation_weight_; // indexed by operation
    unsigned int total_weight_;                    // sum of operation_weight_
};

const std::array<char, 4> sequence_generator::bases_{ 'A', 'C', 'G', 'T' };

// Seed the engine from std::random_device; every operation starts with weight 1.
sequence_generator::sequence_generator() : engine_(std::random_device()()),
    base_dist_(0, bases_.size() - 1),
    total_weight_(operation_weight_.size()) {
    operation_weight_.fill(1);
}
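
// Pick an operation with probability proportional to its weight.
// Assumes total_weight_ > 0 (at least one operation has a nonzero weight).
sequence_generator::operation sequence_generator::get_random_operation() {
    std::uniform_int_distribution<unsigned int> op_dist(0, total_weight_ - 1);
    unsigned int n = op_dist(engine_), op = 0, weight = 0;
    // Walk the cumulative weights until the drawn number falls inside one.
    for (; op < operation_weight_.size(); ++op) {
        weight += operation_weight_[op];
        if (n < weight)
            break;
    }
    return static_cast<operation>(op);
}

// Replace the weight of one operation and keep total_weight_ in sync.
void sequence_generator::set_weight(operation op, unsigned int weight) {
    total_weight_ -= operation_weight_[static_cast<size_t>(op)];
    operation_weight_[static_cast<size_t>(op)] = weight;
    total_weight_ += weight;
}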

std::string sequence_generator::generate_sequence(size_t length) {
    std::string sequence;
    sequence.reserve(length);
    for (size_t i = 0; i < length; ++i)
        sequence += get_random_base();
    return sequence;
}

void sequence_generator::mutate_sequence(std::string& sequence) {
    // Guard: with an empty sequence, length() - 1 would wrap around and
    // the position below would be out of range.
    if (sequence.empty())
        return;
    std::uniform_int_distribution<size_t> dist(0, sequence.length() - 1);
    size_t pos = dist(engine_);
    char b;
    switch (get_random_operation()) {
    case operation::change:
        b = get_random_base(); // may draw the same base that is already there
        std::cout << "Change base at position " << pos << " from "
            << sequence[pos] << " to " << b << '\n';
        sequence[pos] = b;
        break;
    case operation::erase:
        std::cout << "Erase base " << sequence[pos] << " at position "
            << pos << '\n';
        sequence.erase(pos, 1);
        break;
    case operation::insert:
        b = get_random_base();
        std::cout << "Insert base " << b << " at position "
            << pos << '\n';
        sequence.insert(pos, 1, b); // inserts before the base at pos
        break;
    }
}

void sequence_generator::print_sequence(std::ostream& out, const std::string& sequence) {
    constexpr size_t base_count = bases_.size();
    std::array<size_t, base_count> count = { 0 };
    for (size_t i = 0, n = sequence.length(); i < n; ++i) {
        // Start a new line every 50 bases, prefixed with the index of its first base.
        if (i % 50 == 0) {
            if (i != 0)
                out << '\n';
            out << std::setw(3) << i << ": ";
        }
        out << sequence[i];
        // Tally the base just printed.
        for (size_t j = 0; j < base_count; ++j) {
            if (bases_[j] == sequence[i]) {
                ++count[j];
                break;
            }
        }
    }
    out << '\n';
    out << "Base counts:\n";
    size_t total = 0;
    for (size_t j = 0; j < base_count; ++j) {
        total += count[j];
        out << bases_[j] << ": " << count[j] << ", ";
    }
    out << "Total: " << total << '\n';
}

int main() {
    sequence_generator gen;
    gen.set_weight(sequence_generator::operation::change, 2);
    std::string sequence = gen.generate_sequence(250);
    std::cout << "Initial sequence:\n";
    sequence_generator::print_sequence(std::cout, sequence);
    constexpr int count = 10;
    for (int i = 0; i < count; ++i)
        gen.mutate_sequence(sequence);
    std::cout << "After " << count << " mutations:\n";
    sequence_generator::print_sequence(std::cout, sequence);
    return 0;
}