analyzeAlignments
Estimating sequence diversity from sequence alignments
BayesicSpace::ParseFASTA Class Reference

FASTA alignment parser. More...

#include <fastaParser.hpp>

Public Member Functions

 ParseFASTA ()=default
 Default constructor.
 
 ParseFASTA (const std::string &fastaFileName)
 Constructor from FASTA file.
 
 ParseFASTA (const ParseFASTA &toCopy)
 Copy constructor.
 
 ParseFASTA (ParseFASTA &&toMove) noexcept
 Move constructor.
 
ParseFASTA & operator= (const ParseFASTA &toCopy)
 Copy assignment operator.
 
ParseFASTA & operator= (ParseFASTA &&toMove) noexcept
 Move assignment operator.
 
 ~ParseFASTA ()=default
 Destructor.
 
size_t sequenceNumber () const noexcept
 Number of sequences in alignment.
 
size_t alignmentLength () const
 Alignment length.
 
std::string extractConsensusWindow (const size_t &startIdx, const size_t &windowLength) const
 Extract a consensus region.
 
std::vector< std::pair< size_t, std::vector< uint32_t > > > diversityInWindows (const size_t &windowSize, const size_t &stepSize) const
 Sequence diversity in windows.
 
std::unordered_map< std::string, uint32_t > extractWindow (const size_t &windowStartPosition, const size_t &windowSize) const
 Extract an alignment window.
 
std::vector< std::pair< std::string, uint32_t > > extractWindowSorted (const size_t &windowStartPosition, const size_t &windowSize) const
 Extract an alignment window and sort.
 
AlignmentStatistics extractSequence (const std::string &querySequence) const
 Extract a region matching a sequence.
 
void imputeMissing ()
 Impute missing values.
 

Detailed Description

FASTA alignment parser.

Reads a FASTA alignment file, separates the sequences and headers, and provides analysis methods. The data are stored in memory, so users should pay attention to file sizes.

Constructor & Destructor Documentation

◆ ParseFASTA() [1/4]

BayesicSpace::ParseFASTA::ParseFASTA ( )
default

Default constructor.

◆ ParseFASTA() [2/4]

ParseFASTA::ParseFASTA ( const std::string &  fastaFileName)

Constructor from FASTA file.

Read data from a FASTA file.

Parameters
[in] fastaFileName  input FASTA file name

◆ ParseFASTA() [3/4]

ParseFASTA::ParseFASTA ( const ParseFASTA &  toCopy)

Copy constructor.

Parameters
[in] toCopy  object to copy

◆ ParseFASTA() [4/4]

ParseFASTA::ParseFASTA ( ParseFASTA &&  toMove)
noexcept

Move constructor.

Parameters
[in] toMove  object to move

◆ ~ParseFASTA()

BayesicSpace::ParseFASTA::~ParseFASTA ( )
default

Destructor.

Member Function Documentation

◆ alignmentLength()

size_t BayesicSpace::ParseFASTA::alignmentLength ( ) const
inline

Alignment length.

Returns
alignment length

◆ diversityInWindows()

std::vector< std::pair< size_t, std::vector< uint32_t > > > ParseFASTA::diversityInWindows ( const size_t &  windowSize,
const size_t &  stepSize 
) const

Sequence diversity in windows.

Calculates the number of different sequences in a window sliding along a sequence alignment. Reports the number of times each unique sequence occurs at each window position.

Parameters
[in] windowSize  window size in base pairs
[in] stepSize  window movement step in base pairs
Returns
vector of pairs that contain window start positions and unique sequence counts
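The sliding-window tally described above can be sketched in a standalone form. This is an illustration only, not the class's implementation: `slidingWindowCounts` is a hypothetical free function, the alignment is assumed to be a vector of equal-length strings, only complete windows are reported, and the counts within each window are sorted in descending order (the documentation does not specify their order).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper, not part of ParseFASTA.
// Assumes every sequence in `alignment` has the same length.
std::vector<std::pair<size_t, std::vector<uint32_t>>> slidingWindowCounts(
        const std::vector<std::string> &alignment,
        const size_t &windowSize, const size_t &stepSize) {
    std::vector<std::pair<size_t, std::vector<uint32_t>>> result;
    if (alignment.empty() || windowSize == 0 || stepSize == 0) {
        return result;   // nothing to slide over
    }
    const size_t length = alignment.front().size();
    // Visit only windows that fit entirely inside the alignment.
    for (size_t start = 0; start + windowSize <= length; start += stepSize) {
        // Tally how often each distinct window sequence occurs.
        std::unordered_map<std::string, uint32_t> tally;
        for (const auto &sequence : alignment) {
            ++tally[sequence.substr(start, windowSize)];
        }
        // Keep only the counts, largest first (an assumption of this sketch).
        std::vector<uint32_t> counts;
        for (const auto &entry : tally) {
            counts.push_back(entry.second);
        }
        std::sort(counts.begin(), counts.end(), std::greater<uint32_t>());
        result.emplace_back(start, std::move(counts));
    }
    return result;
}
```

With four sequences and a window and step of 2, each returned pair holds a window start and the occurrence counts of the distinct sequences found there; the number of elements in the count vector is the number of unique sequences in that window.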

◆ extractConsensusWindow()

std::string ParseFASTA::extractConsensusWindow ( const size_t &  startIdx,
const size_t &  windowLength 
) const

Extract a consensus region.

Extract a window of the consensus sequence.

Parameters
[in] startIdx  index of the window start
[in] windowLength  number of nucleotides in the window

◆ extractSequence()

AlignmentStatistics ParseFASTA::extractSequence ( const std::string &  querySequence) const

Extract a region matching a sequence.

Reports all unique sequences (and their counts) matching the query sequence. Matching is performed using striped Smith-Waterman alignment.

Parameters
[in] querySequence  the query sequence
Returns
matching window start and length
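For readers unfamiliar with local alignment, the following is a minimal scalar Smith-Waterman sketch with a linear gap penalty. It is not the striped (SIMD) variant named above and does not reproduce the class's behavior; `LocalHit`, `smithWaterman`, and the scoring values are hypothetical choices made for illustration. It reports only the best local score and where that alignment ends in the reference.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type; unrelated to AlignmentStatistics.
struct LocalHit {
    int score;            // best local alignment score
    size_t referenceEnd;  // one past the last aligned reference base
};

// Plain (non-striped) Smith-Waterman with a linear gap penalty.
// The default scoring values are arbitrary assumptions of this sketch.
LocalHit smithWaterman(const std::string &reference, const std::string &query,
                       int match = 2, int mismatch = -1, int gap = -2) {
    // Two rows of the dynamic-programming matrix are enough for scoring.
    std::vector<int> previous(query.size() + 1, 0);
    std::vector<int> current(query.size() + 1, 0);
    LocalHit best{0, 0};
    for (size_t i = 1; i <= reference.size(); ++i) {
        current[0] = 0;
        for (size_t j = 1; j <= query.size(); ++j) {
            const int diagonal = previous[j - 1] +
                (reference[i - 1] == query[j - 1] ? match : mismatch);
            const int up   = previous[j] + gap;      // gap in the query
            const int left = current[j - 1] + gap;   // gap in the reference
            // Local alignment: scores never drop below zero.
            current[j] = std::max({0, diagonal, up, left});
            if (current[j] > best.score) {
                best = LocalHit{current[j], i};
            }
        }
        std::swap(previous, current);
    }
    return best;
}
```

For an ungapped, exact hit, the window start can be recovered as `referenceEnd` minus the query length; recovering start positions for gapped alignments requires a traceback, which this sketch omits.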

◆ extractWindow()

std::unordered_map< std::string, uint32_t > ParseFASTA::extractWindow ( const size_t &  windowStartPosition,
const size_t &  windowSize 
) const

Extract an alignment window.

Calculates the number of different sequences in a window. Reports the number of times each unique sequence occurs in the provided window.

Parameters
[in] windowStartPosition  window start
[in] windowSize  window size in base pairs
Returns
map of sequences to the number of times each occurs in the alignment
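The counting step can be illustrated with a short standalone sketch. It assumes the alignment is held as a vector of strings and uses a hypothetical free function, `countWindowSequences`; how the class itself stores sequences or handles windows that run past the alignment end is not documented here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of ParseFASTA: tally the distinct substrings
// found in the window [start, start + size) across all aligned sequences.
std::unordered_map<std::string, uint32_t> countWindowSequences(
        const std::vector<std::string> &alignment,
        const size_t &start, const size_t &size) {
    std::unordered_map<std::string, uint32_t> counts;
    for (const auto &sequence : alignment) {
        if (start >= sequence.size()) {
            continue;   // window starts past the end of this sequence; skip it
        }
        // substr() silently truncates a window that runs past the sequence end.
        ++counts[sequence.substr(start, size)];
    }
    return counts;
}
```

The size of the returned map is the number of unique sequences in the window, and each value is the number of alignment rows carrying that sequence.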

◆ extractWindowSorted()

std::vector< std::pair< std::string, uint32_t > > ParseFASTA::extractWindowSorted ( const size_t &  windowStartPosition,
const size_t &  windowSize 
) const

Extract an alignment window and sort.

Calculates the number of different sequences in a window. Reports the number of times each unique sequence occurs in the provided window. The output is sorted by the number of times a sequence is present, in descending order.

Parameters
[in] windowStartPosition  window start
[in] windowSize  window size in base pairs
Returns
vector of pairs, each holding a sequence and the number of times it occurs in the alignment, sorted by count in descending order
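Because an unordered map has no defined iteration order, producing sorted output means copying the entries into a vector and sorting that. A minimal sketch, assuming a hypothetical helper `sortByCount` and an alphabetical tie-break that the documentation does not specify:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper, not part of ParseFASTA: turn a sequence-to-count map
// into a vector ordered by count, largest first.
std::vector<std::pair<std::string, uint32_t>> sortByCount(
        const std::unordered_map<std::string, uint32_t> &counts) {
    std::vector<std::pair<std::string, uint32_t>> sorted(counts.begin(), counts.end());
    std::sort(sorted.begin(), sorted.end(),
        [](const std::pair<std::string, uint32_t> &a,
           const std::pair<std::string, uint32_t> &b) {
            if (a.second != b.second) {
                return a.second > b.second;   // higher counts first
            }
            return a.first < b.first;         // assumed tie-break: alphabetical
        });
    return sorted;
}
```

The explicit tie-break keeps the output deterministic when two sequences occur equally often.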

◆ imputeMissing()

void ParseFASTA::imputeMissing ( )

Impute missing values.

Replaces missing or ambiguous nucleotides (N or other ambiguity codes, e.g. Y or S) with the consensus value.
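One way to picture consensus imputation is a per-column majority vote. The sketch below is an assumption-laden illustration, not the class's implementation: `imputeWithConsensus` is a hypothetical free function, sequences are assumed to be uppercase and of equal length, the consensus is taken as the most frequent of A, C, G, T in each column (ties go to the first in that order), and gap characters are left untouched.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of ParseFASTA.
// Assumes uppercase sequences of equal length.
void imputeWithConsensus(std::vector<std::string> &alignment) {
    if (alignment.empty()) {
        return;
    }
    const std::string nucleotides = "ACGT";
    const std::string ambiguous   = "NRYSWKMBDHV";   // IUPAC ambiguity codes
    const size_t length = alignment.front().size();
    for (size_t pos = 0; pos < length; ++pos) {
        // Count the unambiguous calls in this column.
        std::array<size_t, 4> tally{};
        for (const auto &sequence : alignment) {
            const size_t idx = nucleotides.find(sequence[pos]);
            if (idx != std::string::npos) {
                ++tally[idx];
            }
        }
        // Pick the most frequent nucleotide; ties keep the earlier one.
        size_t best = 0;
        for (size_t i = 1; i < tally.size(); ++i) {
            if (tally[i] > tally[best]) {
                best = i;
            }
        }
        if (tally[best] == 0) {
            continue;   // no unambiguous call in this column; leave it alone
        }
        // Overwrite ambiguity codes only; gaps and valid calls are kept.
        for (auto &sequence : alignment) {
            if (ambiguous.find(sequence[pos]) != std::string::npos) {
                sequence[pos] = nucleotides[best];
            }
        }
    }
}
```

Whether gaps should be imputed, and how ties are broken, are design decisions this sketch makes arbitrarily.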

◆ operator=() [1/2]

ParseFASTA & ParseFASTA::operator= ( const ParseFASTA &  toCopy)

Copy assignment operator.

Parameters
[in] toCopy  object to copy

◆ operator=() [2/2]

ParseFASTA & ParseFASTA::operator= ( ParseFASTA &&  toMove)
noexcept

Move assignment operator.

Parameters
[in] toMove  object to move

◆ sequenceNumber()

size_t BayesicSpace::ParseFASTA::sequenceNumber ( ) const
inline noexcept

Number of sequences in alignment.

Returns
number of sequences in the alignment

The documentation for this class was generated from the following files: