Welcome to the odgi documentation!
In standard genomic approaches, sequences are related to a single linear reference genome, which introduces reference bias. Pangenome graphs encoded in the variation graph data model describe the all-versus-all alignment of many sequences. Representing large pangenome graphs with minimal memory overhead requires a careful encoding of the graph entities. It is possible to build succinct, static data structures to store queryable graphs, as in xg, but dynamic data structures are trickier to implement.
The optimized dynamic genome/graph implementation odgi follows the dynamic GBWT in developing a byte-packed version of the graph, edges, and paths through it. The node's id is stored as a uint64_t and its sequence is stored as a plain std::string. Bit-compressed dynamic byte arrays, with a local alphabet encoder, represent the local neighbourhood of the node:
The node's edges, and
the paths crossing the node.
To ensure minimal memory occupation, only the deltas of the neighbouring steps of a path are held.
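The delta idea above can be illustrated with a small sketch. This is not odgi's actual byte layout: it assumes that each step stores the ids of the previous and next node relative to the current node's id, and it swaps in a generic zigzag plus variable-length integer encoding for the bit-compressed arrays. All function names here (zigzag, encode_varint, pack_step, and so on) are hypothetical.

```python
def zigzag(n):
    """Map a signed integer to an unsigned one so small magnitudes stay small."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z):
    """Inverse of zigzag()."""
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)


def encode_varint(value):
    """Encode a non-negative integer using 7 payload bits per byte."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def pack_step(node_id, prev_id, next_id):
    """Store the neighbouring steps as deltas relative to node_id (assumed layout)."""
    return (encode_varint(zigzag(prev_id - node_id))
            + encode_varint(zigzag(next_id - node_id)))


def unpack_step(node_id, buf):
    """Recover the absolute ids of the previous and next node."""
    d_prev, pos = decode_varint(buf, 0)
    d_next, pos = decode_varint(buf, pos)
    return node_id + unzigzag(d_prev), node_id + unzigzag(d_next)


packed = pack_step(1_000_000, 999_999, 1_000_001)
print(len(packed), unpack_step(1_000_000, packed))
```

When neighbouring nodes have nearby ids, each delta fits in a single byte in this sketch, compared with the 16 bytes that two full uint64_t ids would need.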
odgi provides a set of tools ranging from graph manipulation, layouting, and locus extraction, through graph statistics, to graph visualization, validation, and gene annotation liftovers. The following figure gives an overview.
Methods provided by odgi (in black) and their supported input (in blue) and output (in red) data formats.
odgi build transforms GFAv1 graphs into odgi's binary, node-centric encoding format.
Such a built graph represents everything that is in the input GFAv1 graph, without any loss of information!
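To make "node-centric" concrete, the following sketch reads a tiny GFAv1 text and attaches every edge and every path step to the nodes it touches. It only illustrates the idea and is not odgi's binary format or its parser: the Node class and build_node_centric function are hypothetical, segment names are assumed to be integers, and only S, L, and P lines are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    sequence: str
    edges: list = field(default_factory=list)  # (from_id, from_orient, to_id, to_orient)
    steps: list = field(default_factory=list)  # (path_name, rank_in_path, orientation)


def build_node_centric(gfa_text):
    """Group GFAv1 segments, links, and path steps by node id."""
    records = [line.split("\t") for line in gfa_text.splitlines() if line.strip()]
    nodes = {}
    # First pass: segments, so that links and paths may appear in any order.
    for rec in records:
        if rec[0] == "S":
            nodes[int(rec[1])] = Node(sequence=rec[2])
    # Second pass: store each link on both endpoints and each step on its node.
    for rec in records:
        if rec[0] == "L":
            edge = (int(rec[1]), rec[2], int(rec[3]), rec[4])
            nodes[edge[0]].edges.append(edge)
            if edge[2] != edge[0]:
                nodes[edge[2]].edges.append(edge)
        elif rec[0] == "P":
            for rank, step in enumerate(rec[2].split(",")):
                nodes[int(step[:-1])].steps.append((rec[1], rank, step[-1]))
    return nodes


example_gfa = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACG",
    "S\t2\tT",
    "S\t3\tGA",
    "L\t1\t+\t2\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "P\tref\t1+,2+,3+\t*",
    "P\talt\t1+,3+\t*",
])

graph = build_node_centric(example_gfa)
for node_id, node in sorted(graph.items()):
    print(node_id, node.sequence, len(node.edges), node.steps)
```

In this layout, everything needed to answer a question about one node (its sequence, its edges, and the paths crossing it) is reachable from that node alone.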
For a light dive into odgi, just visit the Quick Start section.
Warning
odgi does not construct graphs from scratch, nor is it capable of extending them! A pangenome graph construction tool for long read input sequences is, for example, PGGB. A reference-biased alternative would be Minigraph, whose output can then be plugged into Cactus.
If you want to extend an existing pangenome graph, please take a look at How can I import reads from a FASTQ or FASTA file into an existing graph?
Citation
Core Functionalities
Exploratory Analysis
Detect Complex Regions
Extract Selected Loci
Sorting and Layouting
Navigating and Annotating Graphs
Remove Artifacts and Complex Regions
MultiQC Report of Graph Statistics