Welcome to the odgi documentation!

In standard genomic approaches, sequences are related to a single linear reference genome, which introduces reference bias. Pangenome graphs encoded in the variation graph data model describe the all-versus-all alignment of many sequences. Representing large pangenome graphs with minimal memory overhead requires a careful encoding of the graph entities. It is possible to build succinct, static data structures to store queryable graphs, as in xg, but dynamic data structures are trickier to implement.

The optimized dynamic genome/graph implementation odgi follows the dynamic GBWT in developing a byte-packed version of the graph, edges, and paths through it. The node's id is stored as a uint64_t and its sequence is stored as a plain std::string. Bit-compressed dynamic byte arrays, with a local alphabet encoder, represent the local neighbourhood of the node:

The node's edges, and

the paths crossing the node.

To keep memory usage minimal, only the deltas of the neighbouring steps of a path are stored.
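The idea of storing only deltas in a packed byte array can be illustrated with a small sketch. It is not odgi's actual encoding: the struct `NodeSketch` and all function names are hypothetical, and zigzag plus variable-length integers (varints) stand in for odgi's bit-compressed arrays and local alphabet encoder. It only shows why differences between nearby node IDs take fewer bytes than the full 64-bit IDs.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified node record (NOT odgi's real layout).
struct NodeSketch {
    std::uint64_t id;                         // node identifier
    std::string sequence;                     // plain nucleotide sequence
    std::vector<std::uint8_t> packed_deltas;  // byte-packed neighbour deltas
};

// Zigzag maps signed values to unsigned ones so that small magnitudes
// (positive or negative) become small numbers: 0,-1,1,-2 -> 0,1,2,3.
inline std::uint64_t zigzag_encode(std::int64_t v) {
    const std::uint64_t mask = v < 0 ? ~std::uint64_t{0} : std::uint64_t{0};
    return (static_cast<std::uint64_t>(v) << 1) ^ mask;
}

inline std::int64_t zigzag_decode(std::uint64_t u) {
    const std::uint64_t mask = (u & 1) ? ~std::uint64_t{0} : std::uint64_t{0};
    return static_cast<std::int64_t>((u >> 1) ^ mask);
}

// Append value as a varint: 7 payload bits per byte, high bit = "more follows".
inline void append_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Read one varint starting at pos and advance pos past it.
// Assumes well-formed input; at() throws if the buffer ends early.
inline std::uint64_t read_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t value = 0;
    int shift = 0;
    while (true) {
        const std::uint8_t byte = in.at(pos++);
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;
        shift += 7;
    }
    return value;
}

// Store each neighbouring node ID as its difference from node_id.
inline std::vector<std::uint8_t> encode_neighbour_deltas(
        std::uint64_t node_id, const std::vector<std::uint64_t>& neighbour_ids) {
    std::vector<std::uint8_t> out;
    for (const std::uint64_t n : neighbour_ids) {
        // Unsigned subtraction wraps; the cast reinterprets it as a signed delta.
        const std::int64_t delta = static_cast<std::int64_t>(n - node_id);
        append_varint(out, zigzag_encode(delta));
    }
    return out;
}

// Recover the absolute neighbour IDs from the packed deltas.
inline std::vector<std::uint64_t> decode_neighbour_ids(
        std::uint64_t node_id, const std::vector<std::uint8_t>& packed) {
    std::vector<std::uint64_t> ids;
    std::size_t pos = 0;
    while (pos < packed.size()) {
        const std::int64_t delta = zigzag_decode(read_varint(packed, pos));
        ids.push_back(node_id + static_cast<std::uint64_t>(delta));
    }
    return ids;
}
```

In this sketch, a node with ID 1000 and neighbouring IDs 1001, 999, 1000 and 1500 needs 5 bytes for the packed deltas, compared with 32 bytes for four raw 64-bit IDs. The saving depends on neighbouring IDs being numerically close, which is one reason the order of nodes in a graph affects how compactly it can be stored.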

odgi provides a set of tools covering graph manipulation, layout, and locus extraction, as well as graph statistics, visualization, validation, and gene annotation liftovers. The following figure gives an overview.

Methods provided by odgi (in black) and their supported input (in blue) and output (in red) data formats. odgi build transforms GFAv1 graphs into odgi's binary, node-centric encoding format. Such a built graph represents everything that is in the input GFAv1 graph, without any loss of information!

For a light dive into odgi, just visit the Quick Start section.

Warning

odgi does not construct graphs from scratch, nor is it capable of extending them! One pangenome graph construction tool for long-read input sequences is PGGB. A reference-biased alternative is Minigraph, whose output can then be plugged into Cactus.

If you want to extend an existing pangenome graph, please take a look at How can I import reads from a FASTQ or FASTA file into an existing graph?

Citation

Andrea Guarracino*, Simon Heumos*, Sven Nahnsen, Pjotr Prins, Erik Garrison. ODGI: understanding pangenome graphs, Bioinformatics, 2022.

*Shared first authorship

Core Functionalities


Exploratory Analysis: Translate GFAv1 to ODGI format; Highlight different graph features in 1D; Create 1D visualization of a particular region
Detect Complex Regions: Download human chr8 pangenome; Calculate depth over pangenome; Plot the depth; Explore the centromere's organization
Extract Selected Loci: Extract a subgraph of LPA graph; Visualize subgraph; Extract MHC locus of human chr6; Visualize MHC locus
Sorting and Layouting: Sort DRB1-3123 graph; Metrics of sorted and unsorted graph; Compare 1D visualizations; 2D layout of DRB1-3123 graph; 2D drawing of DRB1-3123 graph; gfaestus for interactive visualization
Navigating and Annotating Graphs: Path to graph position mapping; Path to path position mapping; Graph to path position mapping; Graph offset to path position mapping; Graph to reference position mapping; Graph to graph position mapping; Node annotation for Bandage
Remove Artifacts and Complex Regions: Identify problematic regions; Remove identified regions; Display graph stats; Generate 1D visualization
MultiQC Report of Graph Statistics: Create graph statistics; Apply MultiQC to statistics YAML; Integrate 1D and 2D visualizations into the report
