odgi
dynamic succinct variation graph tool
SYNOPSIS
odgi bin -i graph.og -j -w 100 -s -g
odgi break -i graph.og -o graph.broken.og -s 100 -d
odgi build -g graph.gfa -o graph.og
odgi chop -i graph.og -o graph.chopped.og -c 1000
odgi cover -i graph.og -o graph.paths.og
odgi degree -i graph.og -S
odgi depth -i graph.og
odgi draw -i graph.og -c coords.lay -p .png -x 1920 -y 1080 -R -t 28
odgi explode -i graph.og -p prefix
odgi extract -i graph.og -p prefix -r path_name:0-17
odgi flatten -i graph.og -f FASTA.fa -b BED.tsv
odgi groom -i graph.og -o graph.groomed.og
odgi heaps -i graph.og
odgi kmers -i graph.og -c -k 23 -e 34 -D 50
odgi layout -i graph.og -o graph.og.lay
odgi matrix -i graph.og -e -d
odgi normalize -i graph.og -o graph.normalized.og -I 100 -d
odgi overlap -i graph.og -r path_name
odgi panpos -i graph.og -p Chr1 -n 4
odgi pathindex -i graph.og -o graph.xp
odgi paths -i graph.og -f
odgi pav -i graph.og -b bed.bed
odgi position -i target_graph.og -g
odgi prune -i graph.og -o graph.pruned.og -c 3 -C 345 -T
odgi server -i graph.og -p 4000 -ip 192.168.8.9
odgi sort -i graph.og -o graph.sorted.og -p bSnSnS
odgi squeeze -f input_graphs.txt -o graphs.og
odgi stats -i graph.og -y
odgi stepindex -i graph.og -o graph.og.stpidx
odgi tips -i graph.og -q "query_name"
odgi unchop -i graph.og -o graph.unchopped.og
odgi unitig -i graph.og -f -t 1324 -l 120
odgi untangle -i graph.og -q "query_name" -r "reference_name" -m 1000 -t 16 -P
odgi validate -i graph.og
odgi view -i graph.og -g
odgi viz -i graph.og -o graph.og.png -x 1920 -y 1080 -R -t 28
DESCRIPTION
odgi, the Optimized Dynamic (genome) Graph Interface, links a thrifty dynamic in-memory variation graph data model to a set of algorithms designed for scalable sorting, pruning, transformation, and visualization of very large genome graphs. odgi includes Python bindings that can be used to interface directly with its data model. This manual provides detailed information about odgi's features and subcommands, including examples.
COMMANDS
Each command has its own man page, which can be viewed using, e.g., man odgi_build. Below is a brief summary of the syntax and purpose of each subcommand.
odgi degree [-i, --idx=FILE] [OPTION]… The odgi degree command describes the graph in terms of node degree. In summarization mode, it shows the node.count, edge.count, avg.degree, min.degree, and max.degree. One can also specify degree ranges streaming these into a BED file.
odgi depth [-i, --input=FILE] [OPTION]… The odgi depth command finds the depth of the graph as defined by query criteria.
odgi draw [-i, --idx=FILE] [-c, --coords-in=FILE] [-p, --png=FILE] [OPTION]… The odgi draw command draws previously-determined 2D layouts of the graph with diverse annotations.
odgi extract [-f, --input-graphs=FILE] [-o, --out=FILE] [OPTION]… The odgi extract command extracts parts of the graph as defined by query criteria.
odgi overlap [-i, --input=FILE] [OPTION]… The odgi overlap command finds the paths touched by the input paths.
odgi position [-i, --target=FILE] [OPTION]… The odgi position command positions parts of the graph as defined by query criteria.
A topological sort: A graph can be sorted via breadth-first search (BFS) or depth-first search (DFS). Optionally, a chunk size specifies how much of the graph to grab at once in each topological sorting phase. The sorting algorithm then continues from the next node in the prior graph order that has not yet been sorted. The cycle breaking algorithm applies a DFS sort until a cycle is found. At that point it breaks the cycle and starts a new DFS sort phase from where it stopped.
A random sort: The graph is randomly sorted. The node order is shuffled using pseudo-random numbers generated by a Mersenne Twister.
A 1D linear SGD sort: ODGI implements a 1D linear, variation graph adjusted, multi-threaded version of the Graph Drawing by Stochastic Gradient Descent algorithm. The force-directed graph drawing algorithm minimizes the graph’s energy function or stress level. It applies stochastic gradient descent (SGD) to move a single pair of nodes at a time.
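The SGD idea behind the 1D sort can be sketched as follows. This is a minimal, single-threaded illustration and not odgi's implementation: the function names, the explicit list of (i, j, d) node pairs with target distances, and the exponential learning-rate schedule are all assumptions made for the example. Each update moves one pair of nodes so that their distance on the line gets closer to the target distance.

```python
import math
import random

def sgd_layout_1d(num_nodes, pairs, iterations=50, eps=0.1, seed=42):
    """Place nodes on a line so that |x[i] - x[j]| approaches d for each (i, j, d).

    Hypothetical helper: `pairs` must be non-empty and every d must be > 0.
    """
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 1.0) for _ in range(num_nodes)]  # random start positions
    weights = [d ** -2 for _, _, d in pairs]               # closer pairs weigh more
    eta_max = 1.0 / min(weights)
    eta_min = eps / max(weights)
    decay = math.log(eta_max / eta_min) / max(iterations - 1, 1)
    terms = list(pairs)
    for t in range(iterations):
        eta = eta_max * math.exp(-decay * t)               # annealed learning rate
        rng.shuffle(terms)
        for i, j, d in terms:
            mu = min((d ** -2) * eta, 1.0)                 # step size, capped at 1
            delta = x[i] - x[j]
            mag = abs(delta)
            if mag == 0.0:                                 # avoid division by zero
                delta, mag = 1e-9, 1e-9
            r = (mag - d) / 2.0 * (delta / mag)            # half the residual per node
            x[i] -= mu * r
            x[j] += mu * r
    return x

def sgd_sort_1d(num_nodes, pairs, **kwargs):
    """Return node indices ordered by their 1D position."""
    x = sgd_layout_1d(num_nodes, pairs, **kwargs)
    return sorted(range(num_nodes), key=lambda v: x[v])

# Toy input: a chain of four nodes with unit spacing (made-up distances).
chain = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0),
         (0, 2, 2.0), (1, 3, 2.0), (0, 3, 3.0)]
print(sgd_sort_1d(4, chain))
```

The resulting order may come out mirrored, since a 1D layout and its reflection have the same stress.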
Sorting the paths in a graph may refine the sorting process. For the users' convenience, it is possible to specify a whole pipeline of sorts within one parameter.
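As a small illustration of the random sort described above, the following sketch shuffles a node order with a Mersenne Twister generator. It relies on the fact that Python's standard random module is based on the Mersenne Twister; the function name and the seed parameter are hypothetical and do not correspond to odgi options.

```python
import random

def random_sort(node_ids, seed=None):
    """Return the node identifiers in a pseudo-randomly shuffled order."""
    rng = random.Random(seed)   # Mersenne Twister based generator
    order = list(node_ids)      # copy, so the input order stays untouched
    rng.shuffle(order)
    return order

print(random_sort(range(1, 11), seed=7))
```

Fixing the seed makes the shuffled order reproducible between runs.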
odgi squeeze [-f, --input-graphs=FILE] [-o, --out=FILE] [OPTION]… The odgi squeeze command squeezes multiple graphs into the same file.
To save memory, a sampled step index is implemented here. Only nodes whose identifier satisfies mod(node_id, step-index-sample-rate) == 0 are indexed. The position of any given step is found by walking backwards until a node matching the sampling criterion is reached. The indexed position of that node plus the distance walked gives the actual position of the step. The sample rate must be a power of 2, because the modulo can then be calculated in O(1) with bit operations (https://www.geeksforgeeks.org/compute-modulus-division-by-a-power-of-2-number/). The default sample rate is 8, which was evaluated as a good compromise between performance and memory usage. For very large graphs, hundreds of gigabytes in size, a sample rate of 16 might suit better.
As a bonus, the step index includes all the lengths of the paths, too. This allows us to efficiently get the length in nucleotides of a path by a given path handle.
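The sampling scheme can be sketched in a few lines. This is a simplified model and not odgi's data structure: the class name, the dictionary keyed by step rank, and the toy path are assumptions made for illustration. It shows the power-of-2 check, the bit mask that replaces the modulo, the backward walk to the nearest sampled node, and the stored path length.

```python
class SampledStepIndex:
    """Toy sampled step index for a single path (hypothetical layout)."""

    def __init__(self, path_steps, node_lengths, sample_rate=8):
        if sample_rate < 1 or sample_rate & (sample_rate - 1):
            raise ValueError("sample rate must be a power of 2")
        self.mask = sample_rate - 1          # node_id & mask == node_id % sample_rate
        self.steps = list(path_steps)        # node ids in path order
        self.node_lengths = node_lengths     # node id -> length in nucleotides
        self.positions = {}                  # step rank -> start position (sampled only)
        pos = 0
        for rank, node_id in enumerate(self.steps):
            if node_id & self.mask == 0:     # index only the sampled nodes
                self.positions[rank] = pos
            pos += node_lengths[node_id]
        self.path_length = pos               # total path length, stored as a bonus

    def position(self, rank):
        """Start position of the step at `rank`, found by walking backwards."""
        walked = 0
        r = rank
        while r not in self.positions:
            if r == 0:                       # reached the path start (position 0)
                return walked
            r -= 1
            walked += self.node_lengths[self.steps[r]]
        return self.positions[r] + walked

# Made-up path: node ids and node lengths are illustrative only.
lengths = {1: 3, 2: 5, 3: 2, 4: 4, 5: 1, 8: 6}
index = SampledStepIndex([1, 2, 3, 4, 5, 8], lengths, sample_rate=4)
print([index.position(r) for r in range(6)], index.path_length)
```

A larger sample rate stores fewer positions but lengthens the backward walk, which is the trade-off between memory usage and performance mentioned above.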
Current ODGI tools that work with a step index are odgi untangle and odgi tips.
the graph or of tips of given path(s). Prints BED records to stdout. Each record consists of:
chrom: The query path name.
start: The 0-based start position of the query we hit in the node.
end: The 1-based end position of the query we hit in the node.
median_range: The 0-based median of the whole query path range of the node we hit. It is possible that a node contains several steps, so we want to mirror that here.
path: The name of the path we walked.
path_pos: The 0-based position of the path we walked when we hit the node of the query path.
walk_from_front: 1 if we walked from the front of the target path, 0 otherwise.
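For downstream processing, the records listed above can be read into a typed structure. The sketch below assumes that the fields are tab-separated and appear in exactly the order listed, with seven columns per record; the type name, the numeric type chosen for median_range, and the example line are hypothetical.

```python
from typing import NamedTuple

class TipRecord(NamedTuple):
    chrom: str             # query path name
    start: int
    end: int
    median_range: float    # assumed numeric; parsed as float to be safe
    path: str              # name of the path that was walked
    path_pos: int
    walk_from_front: bool  # True if the field is "1"

def parse_tip_record(line):
    """Parse one BED record, assuming seven tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    chrom, start, end, median, path, path_pos, front = fields
    return TipRecord(chrom, int(start), int(end), float(median),
                     path, int(path_pos), front == "1")

# Made-up record, for illustration only.
record = parse_tip_record("query_name\t10\t25\t17\ttarget_name\t1042\t1\n")
print(record)
```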
BUGS
Refer to the odgi issue tracker at https://github.com/pangenome/odgi/issues.
RESOURCES
Project web site: https://github.com/pangenome/odgi
Git source repository on GitHub: https://github.com/pangenome/odgi
GitHub organization: https://github.com/pangenome
Discussion list / forum: https://github.com/pangenome/odgi/issues
COPYING
The MIT License (MIT)
Copyright (c) 2019-2021 Erik Garrison
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.