analyzeAlignments
Estimating sequence diversity from sequence alignments
Classes | |
struct | AlignmentStatistics |
Collection of alignment statistics. | |
class | ParseFASTA |
FASTA alignment parser. | |
Functions | |
void | parseCL (int &argc, char **argv, std::unordered_map< std::string, std::string > &cli) |
Command line parser. | |
void | extractCLinfo (const std::unordered_map< std::string, std::string > &parsedCLI, std::unordered_map< std::string, int > &intVariables, std::unordered_map< std::string, std::string > &stringVariables) |
Extract parameters from parsed command line interface flags. | |
void | saveDiversityTable (const std::vector< std::pair< size_t, std::vector< uint32_t > > > &diversityTable, std::fstream &outFile) |
Save the diversity table. | |
void | saveUniqueSequences (const std::unordered_map< std::string, uint32_t > &uniqueSequences, const std::string &consensus, const std::string &fileType, std::fstream &outFile) |
Save unique sequences. | |
void | saveUniqueSequences (const std::vector< std::pair< std::string, uint32_t > > &uniqueSequences, const std::string &consensus, const std::string &fileType, std::fstream &outFile) |
Save sorted unique sequences. | |
void | saveUniqueSequences (const std::unordered_map< std::string, uint32_t > &uniqueSequences, const std::string &consensus, const AlignmentStatistics &alignStats, const std::string &query, const std::string &fileType, std::fstream &outFile) |
Save unique sequences with query. | |
void | saveUniqueSequences (const std::vector< std::pair< std::string, uint32_t > > &uniqueSequences, const std::string &consensus, const AlignmentStatistics &alignStats, const std::string &query, const std::string &fileType, std::fstream &outFile) |
Save sorted unique sequences with query. | |
void BayesicSpace::extractCLinfo | ( | const std::unordered_map< std::string, std::string > & | parsedCLI, |
std::unordered_map< std::string, int > & | intVariables, | ||
std::unordered_map< std::string, std::string > & | stringVariables | ||
) |
Extract parameters from parsed command line interface flags.
Extracts the required variable values, indexed by variable names encoded as std::string.
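As an illustration of the extraction step, here is a minimal sketch that uses the same three parameter types as the signature above. The function name `extractSketch`, the flag names, and the convention that the caller pre-populates both output maps with the names it needs (and their defaults) are assumptions made for this example; the actual extractCLinfo() may validate input and handle defaults differently.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sketch, not the documented extractCLinfo().
// Assumption: the caller pre-populates both output maps with the variable
// names it needs; values found among the parsed flags overwrite the defaults.
void extractSketch(const std::unordered_map<std::string, std::string> &parsedCLI,
                   std::unordered_map<std::string, int> &intVariables,
                   std::unordered_map<std::string, std::string> &stringVariables) {
    for (auto &intVar : intVariables) {
        const auto found = parsedCLI.find(intVar.first);
        if (found != parsedCLI.end()) {
            // std::stoi throws std::invalid_argument on non-numeric text
            intVar.second = std::stoi(found->second);
        }
    }
    for (auto &strVar : stringVariables) {
        const auto found = parsedCLI.find(strVar.first);
        if (found != parsedCLI.end()) {
            strVar.second = found->second;
        }
    }
}

// Usage check with made-up variable names.
void extractSketchExample() {
    std::unordered_map<std::string, std::string> parsedCLI;
    parsedCLI["some-count"] = "25";
    parsedCLI["some-name"]  = "abc";

    std::unordered_map<std::string, int> intVariables;
    intVariables["some-count"]   = 0;
    intVariables["absent-count"] = -1;   // not on the command line, keeps its default

    std::unordered_map<std::string, std::string> stringVariables;
    stringVariables["some-name"] = "";

    extractSketch(parsedCLI, intVariables, stringVariables);
    assert(intVariables.at("some-count") == 25);
    assert(intVariables.at("absent-count") == -1);
    assert(stringVariables.at("some-name") == "abc");
}
```

Keeping integer and string variables in separate maps lets downstream code read typed values without re-parsing strings.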
void BayesicSpace::parseCL | ( | int & | argc, |
char ** | argv, | ||
std::unordered_map< std::string, std::string > & | cli | ||
) |
Command line parser.
Maps flags to values. Flags are assumed to be of the form --flag-name value.
[in] | argc | size of the argv array |
[in] | argv | command line input array |
[out] | cli | map of flags to values |
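The flag-to-value mapping can be sketched as follows. This is not the parseCL() implementation: the helper name `parseFlagsSketch`, the choice to take the tokens as a vector, the stripping of the leading `--` from stored keys, and the handling of flags without values or values without flags are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper for illustration; it is not the documented parseCL().
// Assumptions: keys are stored without the leading "--"; a flag that is
// immediately followed by another flag is dropped; tokens that do not
// follow a flag are ignored.
std::unordered_map<std::string, std::string>
parseFlagsSketch(const std::vector<std::string> &tokens) {
    std::unordered_map<std::string, std::string> cli;
    std::string pendingFlag;                      // flag still waiting for its value
    for (const std::string &token : tokens) {
        if ((token.size() > 2) && (token.compare(0, 2, "--") == 0)) {
            pendingFlag = token.substr(2);        // strip the leading "--"
        } else if (!pendingFlag.empty()) {
            cli[pendingFlag] = token;             // pair the value with its flag
            pendingFlag.clear();
        }
    }
    return cli;
}

// Usage check with made-up flag names.
void parseFlagsSketchExample() {
    const std::vector<std::string> tokens{"--flag-name", "value", "--other-flag", "42", "stray"};
    const std::unordered_map<std::string, std::string> cli = parseFlagsSketch(tokens);
    assert(cli.size() == 2);
    assert(cli.at("flag-name") == "value");
    assert(cli.at("other-flag") == "42");
}
```

All values stay as strings at this stage; converting them to typed variables is a separate step.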
void BayesicSpace::saveDiversityTable | ( | const std::vector< std::pair< size_t, std::vector< uint32_t > > > & | diversityTable, |
std::fstream & | outFile | ||
) |
Save the diversity table.
The output file has two columns: (1) the window start position, repeated for every unique sequence in that window, and (2) the number of occurrences of each unique sequence.
[in] | diversityTable | the diversity table data |
[in,out] | outFile | output file stream |
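The two-column layout described above can be sketched as below. The function name `writeDiversityTableSketch`, the tab separator, the absence of a header line, and the use of a generic `std::ostream` (rather than the `std::fstream` in the documented signature) are assumptions for this example, not confirmed details of saveDiversityTable().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Same element type as the diversityTable parameter documented above.
using DiversityTable = std::vector<std::pair<std::size_t, std::vector<std::uint32_t>>>;

// Hypothetical sketch: one output line per unique sequence, with the window
// start position repeated on every line that belongs to that window.
// Assumptions: tab-separated columns and no header line.
void writeDiversityTableSketch(const DiversityTable &diversityTable, std::ostream &out) {
    for (const auto &window : diversityTable) {
        for (const std::uint32_t count : window.second) {
            out << window.first << '\t' << count << '\n';
        }
    }
}

// Usage check: two windows, starting at positions 0 and 10.
void writeDiversityTableSketchExample() {
    DiversityTable table;
    table.emplace_back(std::size_t{0}, std::vector<std::uint32_t>{5, 2});
    table.emplace_back(std::size_t{10}, std::vector<std::uint32_t>{7});
    std::ostringstream out;
    writeDiversityTableSketch(table, out);
    assert(out.str() == "0\t5\n0\t2\n10\t7\n");
}
```

Repeating the start position on every row keeps the table in long format, which is convenient for grouping by window in downstream analysis.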
void BayesicSpace::saveUniqueSequences | ( | const std::unordered_map< std::string, uint32_t > & | uniqueSequences, |
const std::string & | consensus, | ||
const AlignmentStatistics & | alignStats, | ||
const std::string & | query, | ||
const std::string & | fileType, | ||
std::fstream & | outFile | ||
) |
Save unique sequences with query.
Save unique sequences in an alignment window. In FASTA format, the number of times each sequence appears in the alignment is in the header. In TAB format, the sequence and its number of occurrences are on the same line, separated by a tab. The query sequence is displayed on the top line and may differ in length from the rest of the sequences if there are insertions/deletions. The consensus is displayed on the second line, marked by "C" in the TAB format. The start position and length of the window are also included: they are explicitly described in the consensus FASTA header, or included with a "|" delimiter in the TAB format. Nucleotides that match the consensus are displayed as '.'; residues that differ are shown.
[in] | uniqueSequences | table of unique sequences and their counts |
[in] | consensus | consensus sequence for the window |
[in] | alignStats | alignment statistics |
[in] | query | query sequence |
[in] | fileType | TAB or FASTA, otherwise throws |
[in,out] | outFile | output stream |
void BayesicSpace::saveUniqueSequences | ( | const std::unordered_map< std::string, uint32_t > & | uniqueSequences, |
const std::string & | consensus, | ||
const std::string & | fileType, | ||
std::fstream & | outFile | ||
) |
Save unique sequences.
Save unique sequences in an alignment window. In FASTA format, the number of times each sequence appears in the alignment is in the header. In TAB format, the sequence and its number of occurrences are on the same line, separated by a tab. The consensus is displayed on the top line. Nucleotides that match the consensus are displayed as '.'; residues that differ are shown.
[in] | uniqueSequences | table of unique sequences and their counts |
[in] | consensus | consensus sequence for the window |
[in] | fileType | TAB or FASTA, otherwise throws |
[in,out] | outFile | output stream |
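The consensus masking described above (matching positions shown as '.', differing residues shown as-is) can be sketched as follows. The helper name `maskAgainstConsensusSketch` is hypothetical, and so is the choice to leave positions beyond the end of the consensus unmasked; the documented functions may handle length mismatches differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the documented API.
// Replaces every residue that matches the consensus with '.' and keeps
// residues that differ. Assumption: if the lengths differ, only the
// overlapping prefix is compared and the remainder is left unchanged.
std::string maskAgainstConsensusSketch(const std::string &sequence, const std::string &consensus) {
    std::string masked(sequence);
    const std::size_t nCompared = std::min(sequence.size(), consensus.size());
    for (std::size_t i = 0; i < nCompared; ++i) {
        if (sequence[i] == consensus[i]) {
            masked[i] = '.';
        }
    }
    return masked;
}

// Usage check: positions 3 and 6 differ from the consensus.
void maskAgainstConsensusSketchExample() {
    assert(maskAgainstConsensusSketch("ACGTAC", "ACCTAA") == "..G..C");
    assert(maskAgainstConsensusSketch("ACGT", "ACGT") == "....");
}
```

Masking makes the variable positions stand out when many near-identical sequences are listed under the consensus.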
void BayesicSpace::saveUniqueSequences | ( | const std::vector< std::pair< std::string, uint32_t > > & | uniqueSequences, |
const std::string & | consensus, | ||
const AlignmentStatistics & | alignStats, | ||
const std::string & | query, | ||
const std::string & | fileType, | ||
std::fstream & | outFile | ||
) |
Save sorted unique sequences with query.
Save unique sequences in an alignment window. In FASTA format, the number of times each sequence appears in the alignment is in the header. In TAB format, the sequence and its number of occurrences are on the same line, separated by a tab. The query sequence is displayed on the top line and may differ in length from the rest of the sequences if there are insertions/deletions. The consensus is displayed on the second line, marked by "C" in the TAB format. The start position and length of the window are also included: they are explicitly described in the consensus FASTA header, or included with a "|" delimiter in the TAB format. Nucleotides that match the consensus are displayed as '.'; residues that differ are shown. Sequences are sorted by the number of occurrences in descending order.
[in] | uniqueSequences | table of unique sequences and their counts |
[in] | consensus | consensus sequence for the window |
[in] | alignStats | alignment statistics |
[in] | query | query sequence |
[in] | fileType | TAB or FASTA, otherwise throws |
[in,out] | outFile | output stream |
void BayesicSpace::saveUniqueSequences | ( | const std::vector< std::pair< std::string, uint32_t > > & | uniqueSequences, |
const std::string & | consensus, | ||
const std::string & | fileType, | ||
std::fstream & | outFile | ||
) |
Save sorted unique sequences.
Save unique sequences in an alignment window. In FASTA format, the number of times each sequence appears in the alignment is in the header. In TAB format, the sequence and its number of occurrences are on the same line, separated by a tab. The consensus is displayed on the top line. Nucleotides that match the consensus are displayed as '.'; residues that differ are shown. Sequences are sorted by the number of occurrences in descending order.
[in] | uniqueSequences | table of unique sequences and their counts |
[in] | consensus | consensus sequence for the window |
[in] | fileType | TAB or FASTA, otherwise throws |
[in,out] | outFile | output stream |
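The sorted overloads take a vector of (sequence, count) pairs rather than a hash map. One way a caller might produce such a vector from the unordered table is sketched below. The helper name `sortByCountSketch` is hypothetical, and breaking ties alphabetically by sequence is an assumption added here only to make the order deterministic; the page does not say how ties are ordered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using SequenceCount = std::pair<std::string, std::uint32_t>;

// Hypothetical helper, not part of the documented API.
// Copies the unordered table into a vector and sorts it by count,
// largest first. Assumption: ties are broken alphabetically by sequence.
std::vector<SequenceCount>
sortByCountSketch(const std::unordered_map<std::string, std::uint32_t> &uniqueSequences) {
    std::vector<SequenceCount> sorted(uniqueSequences.begin(), uniqueSequences.end());
    std::sort(sorted.begin(), sorted.end(),
              [](const SequenceCount &lhs, const SequenceCount &rhs) -> bool {
                  if (lhs.second != rhs.second) {
                      return lhs.second > rhs.second;   // descending by count
                  }
                  return lhs.first < rhs.first;         // ascending by sequence
              });
    return sorted;
}

// Usage check with three made-up sequences, two of them tied.
void sortByCountSketchExample() {
    std::unordered_map<std::string, std::uint32_t> uniqueSequences;
    uniqueSequences["ACGT"] = 3;
    uniqueSequences["ACGA"] = 7;
    uniqueSequences["TCGT"] = 3;
    const std::vector<SequenceCount> sorted = sortByCountSketch(uniqueSequences);
    assert(sorted.size() == 3);
    assert(sorted[0].first == "ACGA" && sorted[0].second == 7);
    assert(sorted[1].first == "ACGT" && sorted[1].second == 3);
    assert(sorted[2].first == "TCGT" && sorted[2].second == 3);
}
```

The resulting vector has the same element type as the `uniqueSequences` parameter of the sorted overloads.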