odgi stats

Metrics describing a variation graph.

SYNOPSIS

odgi stats [-i, --idx=FILE] [OPTION]…

DESCRIPTION

The odgi stats command reports statistics for a variation graph. Among other metrics, it can calculate the number of nodes, edges, and paths, and the total nucleotide length of the graph. It can also produce a YAML file formatted as input for MultiQC's ODGI module.

OPTIONS

Mandatory Options

-i, --idx=FILE
Load the succinct variation graph in ODGI format from this FILE. The file name usually ends with .og. GFAv1 input is also accepted, but the on-the-fly conversion to the ODGI format takes additional time.

Summary Options

-S, --summarize
Summarize the graph properties and dimensions. Print to stdout the #nucleotides, #nodes, #edges, #paths, and #steps in a tab-delimited format.
-W, --weak-connected-components
Show the properties of the weakly connected components.
-L, --self-loops
Show the number of nodes with a self-loop.
-N, --nondeterministic-edges
Show nondeterministic edges (those that extend to the same next base).
-b, --base-content
Describe the base content of the graph. Print to stdout the #A, #C, #G and #T in a tab-delimited format.
-D, --delim=STRING
The part of each path name before this delimiter is treated as a group identifier. When this option is specified, odgi stats collects the summary information per group instead of per path.
-f, --file-size
Show the file size in bytes.
-a, --pangenome-sequence-class-counts=DELIM,POS
Show the pangenome sequence class counts of all samples. The classes are Private (only one sample visits the node), Core (all samples visit the node), and Shell (neither Core nor Private). The argument DELIM,POS determines how the sample name is found in the path names: the whole path name is split by DELIM and the sample name is taken at position POS of the split result. If the full path name is the sample name, select a DELIM that does not occur in the path names and set POS to 0. If -m, --multiqc is set, this option must still be given explicitly.
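The DELIM,POS rule and the three classes described for -a, --pangenome-sequence-class-counts can be illustrated with a minimal Python sketch. It is not odgi's implementation: the function names, the toy node-to-paths mapping, and the choice to count nodes (rather than, say, nucleotides) are assumptions made for illustration.

```python
from collections import Counter

def sample_name(path_name, delim, pos):
    """Split the whole path name by delim and return the field at index pos."""
    return path_name.split(delim)[pos]

def classify_nodes(node_to_paths, delim, pos):
    """Count nodes per class (hypothetical helper, not part of odgi).

    node_to_paths maps a node id to the names of the paths visiting it.
    """
    # All samples seen anywhere in the toy graph.
    all_samples = {
        sample_name(p, delim, pos)
        for paths in node_to_paths.values()
        for p in paths
    }
    counts = Counter()
    for paths in node_to_paths.values():
        samples = {sample_name(p, delim, pos) for p in paths}
        if not samples:
            continue  # node not visited by any path
        # Assumption: with a single sample overall, Core and Private
        # coincide; this sketch reports such nodes as core.
        if samples == all_samples:
            counts["core"] += 1
        elif len(samples) == 1:
            counts["private"] += 1
        else:
            counts["shell"] += 1
    return dict(counts)

# Toy data: path names follow a sample#haplotype#contig pattern (assumed).
node_to_paths = {
    1: ["HG002#1#chr1", "HG003#1#chr1", "HG004#1#chr1"],
    2: ["HG002#1#chr1", "HG002#2#chr1"],
    3: ["HG002#1#chr1", "HG003#1#chr1"],
    4: ["HG004#1#chr1"],
}
class_counts = classify_nodes(node_to_paths, "#", 0)
```

Node 2 is visited by two paths but only one sample, so it counts as private. When the full path name is the sample name, a delimiter that does not occur in the name together with position 0 returns the name unchanged.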

Sorting Goodness Eval Options

-c, --coords-in=FILE
Load the 2D layout coordinates in binary layout format from this FILE. The file name usually ends with .lay. The sorting goodness evaluation will then be performed for this FILE. When the layout coordinates are provided, the mean links length and the sum path nodes distances statistics are evaluated in 2D, else in 1D. Such a file can be generated with odgi layout.
-l, --mean-links-length
Calculate the mean links length. This metric is path-guided and computable in 1D and 2D.
-g, --no-gap-links
Don’t penalize gap links in the mean links length. A gap link is a link that connects two nodes that are consecutive in the linear pangenomic order. This option applies only when the mean links length is computed in 1D.
-s, --sum-path-nodes-distances
Calculate the sum of path nodes distances. This metric is path-guided and computable in 1D and 2D. For each path, it iterates from node to node, summing their distances, and normalizing by the path length. In 1D, if a link goes back in the linearized viewpoint of the graph, this is penalized (adding 3 times its length in the sum).
-d, --penalize-different-orientation
If a link connects two nodes which have different orientations, this is penalized (adding 2 times its length in the sum).
-p, --path-statistics
Display the statistics (mean links length or sum path nodes distances) for each path.
-w, --weighted-feedback-arc
Compute the sum of weights of all feedback arcs, i.e. backward-pointing edges (the weight is the number of times the edge is traversed by paths).
-j, --weighted-reversing-join
Compute the sum of weights of all reversing joins, i.e. edges joining two in- or two out-sides (the weight is the number of times the edge is traversed by paths).
-q, --links_length_per_nuc
Compute the links length per nucleotide, i.e. sum up the links lengths of all paths and divide this value by the nucleotide lengths of all paths. This metric can be used to compare the linearity of different graphs. By default we don't count gap links.
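Two of the metrics above, the weighted feedback arc sum and the links length per nucleotide, can be sketched on a toy 1D node order. This is an illustrative reading of the descriptions, not odgi's code. The assumptions: paths are plain lists of node ids with orientation ignored, an edge is "backward pointing" when its target precedes its source in the order, a link's length is the absolute difference between the start offsets of its two nodes, and a gap link joins nodes that are adjacent in the order.

```python
from collections import Counter

def weighted_feedback_arcs(order, paths):
    """Sum the traversal counts of edges that point backward in `order`."""
    rank = {node: i for i, node in enumerate(order)}
    weights = Counter()
    for steps in paths.values():
        for a, b in zip(steps, steps[1:]):
            weights[(a, b)] += 1  # weight = times the edge is traversed
    return sum(w for (a, b), w in weights.items() if rank[b] < rank[a])

def links_length_per_nucleotide(order, node_len, paths, count_gap_links=False):
    """Sum link lengths over all paths, divided by the paths' total length."""
    rank = {node: i for i, node in enumerate(order)}
    # Start offset of every node in the linearized order (assumed layout).
    offset, pos = {}, 0
    for node in order:
        offset[node] = pos
        pos += node_len[node]
    total_links = 0
    total_nucleotides = 0
    for steps in paths.values():
        total_nucleotides += sum(node_len[n] for n in steps)
        for a, b in zip(steps, steps[1:]):
            is_gap_link = abs(rank[b] - rank[a]) == 1
            if is_gap_link and not count_gap_links:
                continue  # gap links are not counted by default
            total_links += abs(offset[b] - offset[a])
    return total_links / total_nucleotides if total_nucleotides else 0.0

# Toy graph: four nodes in linear order, three paths (all hypothetical).
order = [1, 2, 3, 4]
node_len = {1: 4, 2: 2, 3: 3, 4: 1}
paths = {"p1": [1, 2, 3, 4], "p2": [1, 3, 2, 4], "p3": [1, 3, 2, 4]}

feedback = weighted_feedback_arcs(order, paths)
per_nuc = links_length_per_nucleotide(order, node_len, paths)
```

In this toy data the only backward edge is 3 to 2, traversed by two paths. A path that follows the linear order exactly, such as p1, consists only of gap links and contributes nothing to the default links length, which is why a lower value suggests a more linear graph.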

IO Format Options

-y, --yaml
Setting this option prints all selected statistics in YAML format instead of pseudo TSV to stdout.
-m, --multiqc
Setting this option prints all statistics in YAML format instead of pseudo TSV to stdout. This includes -S,--summarize, -W,--weak-connected-components, -L,--self-loops, -b,--base-content, -l,--mean-links-length, -g,--no-gap-links, -s,--sum-path-nodes-distances, -f,--file-size, and -d,--penalize-different-orientation. -p,--path-statistics is still optional. Not applicable to -N,--nondeterministic-edges. Overrides all other given options. The output is formatted for the ODGI MultiQC module.

Threading

-t, --threads=N
Number of threads to use for parallel operations.
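When odgi stats is called from a script, the options documented above can be assembled into an argument list. The following Python sketch combines only flags described in this document (-i, -S, -y, -t). The helper names and the file name graph.og are hypothetical, and running the command assumes the odgi binary is on the PATH.

```python
import subprocess

def build_stats_command(graph_file, threads=1, yaml_output=True):
    """Build the argument list for a summary run of odgi stats."""
    cmd = ["odgi", "stats", "-i", graph_file, "-S"]
    if yaml_output:
        cmd.append("-y")  # print the selected statistics as YAML
    cmd += ["-t", str(threads)]
    return cmd

def run_stats(graph_file, threads=1):
    """Run odgi stats and return its stdout (requires odgi to be installed)."""
    result = subprocess.run(
        build_stats_command(graph_file, threads),
        check=True,           # raise if odgi exits with an error
        capture_output=True,
        text=True,
    )
    return result.stdout

# graph.og is a placeholder file name.
command = build_stats_command("graph.og", threads=4)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.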

Processing Information

-P, --progress
Print information about the operations and the progress to stderr.

Program Information

-h, --help
Print a help message for odgi stats.