Alphabets

The following alphabets are available

Common functionality

All functions are guaranteed not to throw.
Alphabets whose names start with d_ have an additional delimiter $ at rank 0.
static size_t size()

Returns the number of elements of the alphabet.
static uint8_t char_to_rank(char c)

Converts the ASCII symbol c to its rank, a uint8_t value r with 0 ≤ r < size(). Invalid ASCII symbols return the value 255.
static char rank_to_char(uint8_t r)

Converts the rank r to its corresponding ASCII symbol. The value r must satisfy 0 ≤ r < size().
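
The three functions above can be illustrated with a hand-written stand-in. The struct `toy_dna4`, the helper `encode` and the function `check_round_trip` below are hypothetical and are not the library's implementation. The sketch assumes the ranks A=0, C=1, G=2, T=3 and accepts only upper-case input; whether the real alphabets also accept lower-case symbols is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a 4-letter DNA alphabet; not the library's code.
struct toy_dna4 {
    static constexpr std::size_t size() noexcept { return 4; }

    // Maps an ASCII symbol to its rank, or 255 for symbols outside the alphabet.
    static constexpr std::uint8_t char_to_rank(char c) noexcept {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return 255;
        }
    }

    // Maps a rank back to its ASCII symbol; the caller must ensure r < size().
    static constexpr char rank_to_char(std::uint8_t r) noexcept {
        return "ACGT"[r];
    }
};

// Encodes a text into ranks; invalid symbols are stored as 255.
inline std::vector<std::uint8_t> encode(std::string_view text) {
    std::vector<std::uint8_t> ranks;
    ranks.reserve(text.size());
    for (char c : text) ranks.push_back(toy_dna4::char_to_rank(c));
    return ranks;
}

// Every valid rank should survive a round trip through its character.
inline void check_round_trip() {
    for (std::uint8_t r = 0; r < toy_dna4::size(); ++r)
        assert(toy_dna4::char_to_rank(toy_dna4::rank_to_char(r)) == r);
}
```

Because invalid symbols map to 255, a caller can validate an encoded sequence by checking that every rank is smaller than `size()`.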
static char normalize_char(char c)

Normalizes the ASCII value c. The normalization depends on the alphabet; typically it includes converting the value to a capital letter.
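
A minimal sketch of what such a normalization might look like for a DNA-like alphabet follows. The names `toy_normalizer`, `normalize` and `check_normalize` are hypothetical. The sketch assumes that symbols outside the alphabet are returned unchanged, which the text above does not specify.

```cpp
#include <cassert>
#include <string>

// Hypothetical normalization for a DNA-like alphabet; not the library's code.
struct toy_normalizer {
    // Upper-cases a, c, g, t.
    // Assumption: every other symbol is returned unchanged.
    static constexpr char normalize_char(char c) noexcept {
        switch (c) {
            case 'a': return 'A';
            case 'c': return 'C';
            case 'g': return 'G';
            case 't': return 'T';
            default:  return c;
        }
    }
};

// Normalizes a whole sequence in place.
inline void normalize(std::string& seq) {
    for (char& c : seq) c = toy_normalizer::normalize_char(c);
}

// Mixed-case input ends up in capital letters.
inline void check_normalize() {
    std::string s = "acGt";
    normalize(s);
    assert(s == "ACGT");
}
```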

Additional functionality for nucleotide-based values:

static char complement_char(char c)

Computes the complement of the ASCII value c. For example, in dna4 the complement of 'A' is 'T'.
static uint8_t complement_rank(uint8_t)

Computes the complement in rank space. For example, in dna4 the complement of rank 0 is 3 (A -> T).
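
The two complement functions can be sketched as follows. The struct `toy_dna4_complement` and the helpers `reverse_complement` and `check_complement` are hypothetical. The rank order A=0, C=1, G=2, T=3 is an assumption that agrees with the example of rank 0 mapping to rank 3; under it the complement in rank space is simply 3 - r. Passing other symbols through unchanged is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical complement functions for a 4-letter DNA alphabet.
struct toy_dna4_complement {
    // Assumes A=0, C=1, G=2, T=3, so A<->T and C<->G become r -> 3 - r.
    static constexpr std::uint8_t complement_rank(std::uint8_t r) noexcept {
        return static_cast<std::uint8_t>(3 - r);
    }

    static constexpr char complement_char(char c) noexcept {
        switch (c) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:  return c;  // assumption: other symbols pass through
        }
    }
};

// Typical use: reverse the sequence, then complement each symbol.
inline std::string reverse_complement(std::string seq) {
    std::reverse(seq.begin(), seq.end());
    for (char& c : seq) c = toy_dna4_complement::complement_char(c);
    return seq;
}

// Complementing twice yields the original rank.
inline void check_complement() {
    for (std::uint8_t r = 0; r < 4; ++r) {
        auto twice = toy_dna4_complement::complement_rank(
            toy_dna4_complement::complement_rank(r));
        assert(twice == r);
    }
}
```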

Some alphabets have ambiguous bases. For example, in dna5 the letter 'N' can stand for 'A', 'C', 'G' or 'T'. For these alphabets, special functionality is provided:

static auto ambigous_bases() -> std::array<uint8_t, /*alphabet dependent*/>

Returns an array of bases that are ambiguous.
static auto base_alternatives(uint8_t base) -> std::vector<uint8_t>

Returns a list of values that base could stand for.
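
As an illustration of how these two functions could fit together, here is a sketch for a dna5-like alphabet. The struct `toy_dna5` and the helpers `count_expansions` and `check_ambiguous` are hypothetical. The rank order A=0, C=1, G=2, T=3, N=4 is an assumption, as is the choice to return an unambiguous base as its own single alternative. The function name follows the spelling of the signature above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dna5-like alphabet; assumes A=0, C=1, G=2, T=3, N=4.
struct toy_dna5 {
    static constexpr std::size_t size() noexcept { return 5; }

    // Only N (rank 4) is ambiguous in this sketch.
    static constexpr std::array<std::uint8_t, 1> ambigous_bases() noexcept {
        return {4};
    }

    // N expands to A, C, G, T.
    // Assumption: an unambiguous base stands only for itself.
    static std::vector<std::uint8_t> base_alternatives(std::uint8_t base) {
        if (base == 4) return {0, 1, 2, 3};
        return {base};
    }
};

// Counts how many concrete sequences a ranked pattern can stand for.
inline std::size_t count_expansions(const std::vector<std::uint8_t>& ranks) {
    std::size_t n = 1;
    for (std::uint8_t r : ranks) n *= toy_dna5::base_alternatives(r).size();
    return n;
}

// Every ambiguous base should have more than one alternative.
inline void check_ambiguous() {
    for (std::uint8_t b : toy_dna5::ambigous_bases())
        assert(toy_dna5::base_alternatives(b).size() > 1);
}
```

For the pattern A N N T, encoded as the ranks 0, 4, 4, 3, `count_expansions` multiplies 1 · 4 · 4 · 1 and returns 16.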