Welcome to the odgi documentation!
In standard genomic approaches, sequences are related to a single linear reference genome, which introduces reference bias. Pangenome graphs encoded in the variation graph data model describe the all-versus-all alignment of many sequences. Representing large pangenome graphs with minimal memory overhead requires a careful encoding of the graph entities. It is possible to build succinct, static data structures to store queryable graphs, as in xg, but dynamic data structures are trickier to implement.
The optimized dynamic genome/graph implementation odgi follows the dynamic GBWT in developing a byte-packed version of the graph, edges, and paths through it. The node's id is stored as a uint64_t and its sequence is stored as a plain std::string. Bit-compressed dynamic byte arrays, with a local alphabet encoder, represent the local neighbourhood of the node:
the node's edges, and
the paths crossing the node.
To keep memory usage minimal, only the deltas of the neighbouring steps of a path are stored.
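To make the delta idea concrete, here is a minimal C++ sketch. It is not odgi's actual implementation. It assumes a path is given as a list of node ids and keeps, for each step after the first, only the signed difference to the previous node id. That difference is mapped to an unsigned value with zig-zag coding, so small differences in either direction stay small. The function names (zigzag_encode, zigzag_decode, encode_path_deltas, decode_path) and the choice of zig-zag coding are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Map a signed delta to an unsigned value: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4.
inline std::uint64_t zigzag_encode(std::int64_t v) {
    const std::uint64_t u = static_cast<std::uint64_t>(v);
    return v < 0 ? ((~u << 1) | 1u) : (u << 1);
}

// Inverse of zigzag_encode.
inline std::int64_t zigzag_decode(std::uint64_t u) {
    const std::uint64_t half = u >> 1;
    return static_cast<std::int64_t>((u & 1u) ? ~half : half);
}

// Hypothetical helper: keep only the differences between consecutive node ids.
std::vector<std::uint64_t> encode_path_deltas(const std::vector<std::uint64_t>& ids) {
    std::vector<std::uint64_t> deltas;
    for (std::size_t i = 1; i < ids.size(); ++i) {
        // Unsigned subtraction wraps; the cast recovers the signed difference.
        const std::int64_t diff = static_cast<std::int64_t>(ids[i] - ids[i - 1]);
        deltas.push_back(zigzag_encode(diff));
    }
    return deltas;
}

// Hypothetical helper: rebuild the node ids from the first id and the deltas.
std::vector<std::uint64_t> decode_path(std::uint64_t first_id,
                                       const std::vector<std::uint64_t>& deltas) {
    std::vector<std::uint64_t> ids{first_id};
    for (const std::uint64_t d : deltas) {
        ids.push_back(ids.back() + static_cast<std::uint64_t>(zigzag_decode(d)));
    }
    return ids;
}

// Usage: a short, made-up path survives the round trip unchanged.
void roundtrip_example() {
    const std::vector<std::uint64_t> path = {5, 6, 4, 4, 9};
    const std::vector<std::uint64_t> deltas = encode_path_deltas(path);
    assert(decode_path(path.front(), deltas) == path);
}
```

In a well-sorted graph, neighbouring steps tend to visit nodes with nearby ids, so the stored values are small and compress well, which is the motivation for keeping deltas rather than absolute ids.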
odgi provides a set of tools for graph manipulation, layout, locus extraction, graph statistics, graph visualization, validation, and gene annotation liftover. The following figure gives an overview.
Methods provided by odgi (in black) and their supported input (in blue) and output (in red) data formats.
odgi build transforms GFAv1 graphs into odgi's binary, node-centric encoding format.
Such a built graph represents everything that is in the input GFAv1 graph, without any loss of information!
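As a rough illustration of what "node-centric" means, the following C++ sketch regroups the S (segment), L (link), and P (path) records of a small GFAv1 text so that each node holds its own sequence, its edges, and the names of the paths crossing it. This is a toy under stated assumptions, not odgi's binary format or its parser: the types Edge and NodeRecord and the functions split and group_by_node are hypothetical, segment names are assumed to be numeric, and overlaps and optional tags are ignored.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical edge record: both endpoints with their orientations.
struct Edge {
    std::uint64_t from;
    char from_orient;
    std::uint64_t to;
    char to_orient;
};

// Hypothetical per-node record: everything local to one node.
struct NodeRecord {
    std::string sequence;
    std::vector<Edge> edges;         // edges touching this node
    std::vector<std::string> paths;  // one entry per path step on this node
};

// Split text on a separator character.
std::vector<std::string> split(const std::string& text, char sep) {
    std::vector<std::string> out;
    std::string field;
    std::istringstream in(text);
    while (std::getline(in, field, sep)) out.push_back(field);
    return out;
}

// Regroup GFAv1 S/L/P lines by node id (assumes numeric segment names).
std::map<std::uint64_t, NodeRecord> group_by_node(const std::string& gfa) {
    std::map<std::uint64_t, NodeRecord> nodes;
    for (const std::string& line : split(gfa, '\n')) {
        const std::vector<std::string> f = split(line, '\t');
        if (f.empty()) continue;
        if (f[0] == "S" && f.size() >= 3) {
            nodes[std::stoull(f[1])].sequence = f[2];
        } else if (f[0] == "L" && f.size() >= 5) {
            if (f[2].empty() || f[4].empty()) continue;
            const std::uint64_t from = std::stoull(f[1]);
            const std::uint64_t to = std::stoull(f[3]);
            const Edge e{from, f[2][0], to, f[4][0]};
            nodes[from].edges.push_back(e);
            if (to != from) nodes[to].edges.push_back(e);
        } else if (f[0] == "P" && f.size() >= 3) {
            // Steps look like "1+,2-": a segment name followed by an orientation.
            for (const std::string& step : split(f[2], ',')) {
                if (step.size() < 2) continue;
                const std::uint64_t id = std::stoull(step.substr(0, step.size() - 1));
                nodes[id].paths.push_back(f[1]);
            }
        }
    }
    return nodes;
}
```

Keeping edges and path steps next to the node they belong to is what lets a dynamic structure update one node's neighbourhood without rewriting a global index.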
For a light dive into odgi, just visit the Quick Start section.
odgi does not construct graphs from scratch, nor is it capable of extending them! One pangenome graph construction tool for long-read input sequences is PGGB.
A reference-biased alternative would be Minigraph, whose output can then be plugged into
If you want to extend an existing pangenome graph, please take a look at "How can I import reads from a FASTQ or FASTA file into an existing graph?"
The following topics are covered in more detail:
Detect Complex Regions
Extract Selected Loci
Sorting and Layouting
Navigating and Annotating Graphs
Remove Artifacts and Complex Regions
MultiQC Report of Graph Statistics