odgi stepindex
Generate a step index from a given graph. If no output file is provided via -o, --out, the index will be directly written to INPUT_GRAPH.stpidx.
odgi stepindex [-i, --idx=FILE] [-o, --out=FILE] [OPTION]…
The odgi stepindex command generates a step index from a given graph. Such an index allows us to efficiently retrieve the nucleotide position of a given graph step. To save memory, a sampled step index is implemented here: we only index the nodes in the graph whose node identifier satisfies mod(node_id, step-index-sample-rate) == 0. To find the position of a given step, we walk backwards until we reach a node that fits the sampling criterion, look up its indexed position, and add the distance walked to obtain the actual position of the step. The sample rate must be a power of 2, because the modulo can then be calculated in O(1) with bitwise operations (https://www.geeksforgeeks.org/compute-modulus-division-by-a-power-of-2-number/). The default sample rate is 8, which was evaluated to be a good compromise between performance and memory usage. For very large graphs, hundreds of gigabytes in size, a sample rate of 16 might suit better.
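The sampling scheme described above can be illustrated with a minimal Python sketch. This is not odgi's actual implementation; the class name, the method names, and the representation of a path as a list of (node_id, node_length) pairs are assumptions made for illustration only. It shows the power-of-2 modulo test as a bitwise AND and the backwards walk to the nearest sampled node.

```python
from typing import Dict, List, Tuple


def is_sampled(node_id: int, sample_rate: int) -> bool:
    """Return True if node_id % sample_rate == 0.

    Assumes sample_rate is a power of 2, so the modulo reduces to a
    bitwise AND with (sample_rate - 1).
    """
    return (node_id & (sample_rate - 1)) == 0


class SampledStepIndex:
    """Hypothetical sampled index over a single path.

    A path is modelled as a list of steps, each step being a
    (node_id, node_length) pair. This layout is an assumption for the
    sketch, not odgi's internal data structure.
    """

    def __init__(self, path: List[Tuple[int, int]], sample_rate: int = 8) -> None:
        if sample_rate <= 0 or (sample_rate & (sample_rate - 1)) != 0:
            raise ValueError("sample_rate must be a power of 2")
        self.path = path
        self.sample_rate = sample_rate
        # Maps step rank -> nucleotide offset, for sampled steps only.
        self.sampled: Dict[int, int] = {}
        offset = 0
        for rank, (node_id, length) in enumerate(path):
            if is_sampled(node_id, sample_rate):
                self.sampled[rank] = offset
            offset += length

    def position(self, rank: int) -> int:
        """Return the nucleotide position at which step `rank` starts."""
        walked = 0
        r = rank
        # Walk backwards until a sampled step or the path start is reached.
        while r not in self.sampled:
            if r == 0:
                # No sampled step before this one: the walked distance
                # is already the distance from the path start.
                return walked
            r -= 1
            walked += self.path[r][1]
        return self.sampled[r] + walked


# Example path: node identifiers 8 and 16 satisfy the sampling criterion.
example_path = [(1, 3), (2, 5), (8, 2), (9, 4), (10, 1), (16, 7)]
index = SampledStepIndex(example_path, sample_rate=8)
```

In this sketch only two of the six steps are stored, and a lookup for any other step walks back over the steps between it and the nearest sampled one. A larger sample rate stores fewer entries, and the walks become longer on average.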
As a bonus, the step index also stores the lengths of all paths. This allows us to efficiently get the length of a path in nucleotides for a given path handle.
Current ODGI tools that work with a step index are odgi untangle and odgi tips.