# odgi sort

Apply different kinds of sorting algorithms to a graph. The most prominent one is the path-guided stochastic gradient descent (PG-SGD) sorting algorithm.

## SYNOPSIS

**odgi sort** [**-i, --idx**=*FILE*] [**-o, --out**=*FILE*]
[*OPTION*]…

## DESCRIPTION

The **odgi sort** command sorts a succinct variation graph. It offers a diverse palette of sorting algorithms to determine the node order:

A topological sort: A graph can be sorted via breadth-first search (BFS) or depth-first search (DFS). Optionally, a chunk size specifies how much of the graph to grab at once in each topological sorting phase. The sorting algorithm then continues from the next node in the prior graph order that has not yet been sorted. The cycle-breaking algorithm applies a DFS sort until a cycle is found. It then breaks and starts a new DFS phase from where it stopped.
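The following Python sketch shows a breadth-first (Kahn-style) topological sort. When only cycles are left, it resumes from the next node in the prior order that has not been sorted yet. This is a simplified, hypothetical illustration and not odgi's implementation. The function name `topological_sort` is made up, node orientations of the bidirected graph are ignored, and no chunking is done.

```python
from collections import deque


def topological_sort(prior_order, edges):
    """Kahn-style (breadth-first) topological sort with a simple cycle fallback.

    prior_order: node ids in the graph's current order (unique).
    edges: directed (from_id, to_id) pairs; orientation is ignored in this sketch.
    """
    succ = {n: [] for n in prior_order}
    indeg = {n: 0 for n in prior_order}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1

    queue = deque(n for n in prior_order if indeg[n] == 0)  # heads seed the sort
    placed = set()
    order = []
    next_idx = 0  # scan position in prior_order for the cycle fallback
    while len(order) < len(prior_order):
        if not queue:
            # Only cycles remain: restart from the next node in the prior
            # order that has not been sorted yet.
            while prior_order[next_idx] in placed:
                next_idx += 1
            queue.append(prior_order[next_idx])
        node = queue.popleft()
        if node in placed:
            continue
        placed.add(node)
        order.append(node)
        for nxt in succ[node]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0 and nxt not in placed:
                queue.append(nxt)
    return order


print(topological_sort([1, 2, 3, 4], [(1, 2), (2, 3), (3, 2), (3, 4)]))  # [1, 2, 3, 4]
```

In the example, nodes 2 and 3 form a cycle. After node 1 the queue runs empty, so the sketch restarts at node 2, the next unsorted node in the prior order.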

A random sort: The graph is randomly sorted. The node order is shuffled using numbers drawn from a Mersenne Twister pseudo-random number generator.
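In Python, `random.Random` is itself a Mersenne Twister (MT19937) generator, so a random node order can be sketched in a few lines. The helper `random_sort` and its `seed` argument are hypothetical. This does not reproduce odgi's own generator or seeding.

```python
import random


def random_sort(node_ids, seed=None):
    """Return the node ids in a shuffled order (hypothetical helper)."""
    # random.Random is CPython's Mersenne Twister (MT19937) generator.
    rng = random.Random(seed)
    order = list(node_ids)
    rng.shuffle(order)  # Fisher-Yates shuffle driven by the generator
    return order
```

Fixing the seed makes the shuffle reproducible, which is convenient when comparing layouts.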

A 1D linear SGD sort: ODGI implements a 1D linear, variation-graph-adjusted, multi-threaded version of the Graph Drawing by Stochastic Gradient Descent algorithm. This force-directed graph drawing algorithm minimizes the graph's energy function, or stress level, by applying stochastic gradient descent (SGD) to move a single pair of nodes at a time.
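Below is a minimal sketch of 1D stress minimization. It assumes the update rule of Zheng et al.'s Graph Drawing by Stochastic Gradient Descent. Each term (i, j) with target distance d has weight 1/d², and a step moves both nodes by half the distance error, scaled by a capped learning rate. The function `sgd_layout_1d` and its parameters are hypothetical. The minimum learning rate is assumed to equal `eps`, and the loop is single-threaded.

```python
import math
import random


def sgd_layout_1d(n, targets, iterations=30, eps=0.01, seed=1):
    """Place n nodes on a line so that |x_i - x_j| approaches targets[(i, j)].

    iterations must be >= 2 for the decay rate to be defined.
    """
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    terms = list(targets)
    eta_max = max(d * d for d in targets.values())  # 1 / w_min, with w = 1 / d**2
    eta_min = eps                                     # assumption: w_max taken as 1
    decay = math.log(eta_max / eta_min) / (iterations - 1)
    for t in range(iterations):
        eta = eta_max * math.exp(-decay * t)  # exponentially decaying learning rate
        rng.shuffle(terms)
        for i, j in terms:
            d = targets[(i, j)]
            mu = min(eta / (d * d), 1.0)  # capped so a pair never overshoots its target
            diff = x[i] - x[j]
            direction = 1.0 if diff >= 0 else -1.0
            r = (abs(diff) - d) / 2 * direction  # half the signed distance error
            x[i] -= mu * r  # move the two nodes of the pair symmetrically
            x[j] += mu * r
    return x
```

With `mu` equal to 1, a single update places the pair exactly at its target distance. The decaying rate then lets later updates make smaller compromises between conflicting terms.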

A path-guided, 1D linear SGD sort (PG-SGD): This works like the 1D linear SGD sort, but the path index is used to pick the terms to move stochastically. For more details about the algorithm, please take a look at https://www.biorxiv.org/content/10.1101/2023.09.22.558964v2.
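The sketch below illustrates how a path can supply terms. It draws a first step uniformly from a path, then a second step a Zipf-distributed number of steps away. The distance in base pairs between the two steps along the path serves as the target distance. Everything here is an assumption for illustration. `sample_term` and `zipf_cdf` are hypothetical names, and the uniform path choice and 50/50 direction choice may not match odgi. The `theta` and `zipf_space` parameters only mirror the roles of the *-a* (Zipf theta) and *-k* (Zipf space) options.

```python
import bisect
import random


def zipf_cdf(space, theta):
    """Cumulative distribution of a Zipf(theta) law over jump sizes 1..space."""
    weights = [1.0 / k ** theta for k in range(1, space + 1)]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    return cdf


def sample_term(paths, node_len, rng, theta=0.99, zipf_space=100):
    """Draw one (node_a, node_b, target_distance_bp) term from a path.

    paths: list of paths, each a list of node ids (one per step).
    node_len: node id -> sequence length in bp.
    """
    path = rng.choice([p for p in paths if len(p) > 1])
    offsets = [0]  # bp offset of each step from the start of the path
    for node in path[:-1]:
        offsets.append(offsets[-1] + node_len[node])

    i = rng.randrange(len(path))  # first step: uniform over the path
    forward, backward = len(path) - 1 - i, i  # steps available in each direction
    go_forward = forward > 0 and (backward == 0 or rng.random() < 0.5)
    room = forward if go_forward else backward
    cdf = zipf_cdf(min(zipf_space, room), theta)
    # Second step: a Zipf-distributed jump, so nearby steps are drawn most often.
    k = min(bisect.bisect_left(cdf, rng.random()) + 1, len(cdf))
    j = i + k if go_forward else i - k
    return path[i], path[j], abs(offsets[j] - offsets[i])
```

Because the Zipf law favors small jumps, most sampled terms connect steps that are close on a path, which emphasizes local order.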

Sorting the paths in a graph may refine the sorting process. For the users' convenience, it is possible to specify a whole pipeline of sorts within one parameter.

## OPTIONS

### MANDATORY OPTIONS

**-i, --idx**=*FILE*

Load the succinct variation graph in ODGI format from this *FILE*. The file name usually ends with *.og*. It also accepts GFAv1, but the on-the-fly conversion to the ODGI format requires additional time!

**-o, --out**=*FILE*

Write the sorted succinct variation graph to this *FILE*. A file name ending with *.og* is recommended.

### Files IO Options

**-X, --path-index**=*FILE*

Load the path index from this *FILE*. The file name usually ends with *.xp*.

**-s, --sort-order**=*FILE*

A *FILE* containing the sort order. Each line contains one node identifier.
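Here is a small sketch of reading such a file and turning it into a rank per node identifier. The helper `read_sort_order`, the 1-based ranks, and the rejection of duplicate identifiers are assumptions for illustration. odgi's own parser may behave differently, for example when node identifiers are missing.

```python
def read_sort_order(lines):
    """Map each node id listed in a sort-order file to its new 1-based rank."""
    order = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            order.append(int(line))
    if len(set(order)) != len(order):
        raise ValueError("a node identifier appears more than once")
    return {node_id: rank for rank, node_id in enumerate(order, start=1)}
```

A file object can be passed directly, e.g. `with open("order.txt") as f: ranks = read_sort_order(f)`, where `order.txt` is a placeholder name.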

**-C, --temp-dir**=*PATH*

Directory for temporary files.

### Topological Sort Options

**-b, --breadth-first**

Use a (chunked) breadth-first topological sort.

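**-B, --breadth-first-chunk**=*N*

Chunk size for the breadth-first topological sort: how many nucleotides to grab at once in each BFS phase.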

**-c, --cycle-breaking**

Use a cycle-breaking sort.

**-z, --depth-first**

Use a (chunked) depth-first topological sort.

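**-Z, --depth-first-chunk**=*N*

Chunk size for the depth-first topological sort: how many nucleotides to grab at once in each DFS phase.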

**-w, --two-way**

Use a two-way topological sort.

**-n, --no-seeds**

Do not use heads or tails to seed the topological sort.

### Random Sort Options

**-r, --random**

Randomly sort the graph.

### DAGify Sort Options

**-d, --dagify-sort**

Sort on the basis of a DAGified graph.

### Path Guided 1D Linear SGD Sort

**-Y, --path-sgd**

Apply the path-guided 1D linear SGD algorithm (PG-SGD) to organize the graph.

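**-f, --path-sgd-use-paths**=*FILE*

Specify a line-separated list of paths to sample from for the on-the-fly term generation process in PG-SGD (default: sample from all paths).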

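**-G, --path-sgd-min-term-updates-paths**=*N*

The minimum number of terms to be updated before a new PG-SGD iteration with an adjusted learning rate eta starts, expressed as a multiple of the total number of path steps (default: *1.0*). Can be overwritten by *-U, --path-sgd-min-term-updates-nodes=N*.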

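**-U, --path-sgd-min-term-updates-nodes**=*N*

The minimum number of terms to be updated before a new PG-SGD iteration with an adjusted learning rate eta starts, expressed as a multiple of the number of nodes (default: NONE, in which case *-G, --path-sgd-min-term-updates-paths=N* is used).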

**-j, --path-sgd-delta**=*N*

The threshold of maximum displacement, approximately in bp, at which to stop PG-SGD (default: *0.0*).

**-g, --path-sgd-eps**=*N*

The final learning rate for the PG-SGD model (default: *0.01*).

**-v, --path-sgd-eta-max**=*N*

The first and maximum learning rate for the PG-SGD model (default: *squared steps of longest path in graph*).

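**-a, --path-sgd-zipf-theta**=*N*

The theta value of the Zipfian distribution used to sample the second node of each term in the PG-SGD model (default: *0.99*).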

**-x, --path-sgd-iter-max**=*N*

The maximum number of iterations for the PG-SGD model.

**-F, --iteration-max-learning-rate**=*N*

The iteration at which the learning rate is at its maximum in the PG-SGD model (default: *0*).
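The options *-v*, *-g*, *-x*, and *-F* together describe a learning-rate schedule. The sketch below assumes an exponential decay from *eta_max* to *eps* over *iter_max* iterations, as in the Graph Drawing by Stochastic Gradient Descent paper, mirrored around the iteration given by *-F*. The function name is hypothetical, and odgi's exact formula may differ.

```python
import math


def learning_rate_schedule(eta_max, eps, iter_max, iter_with_max_rate=0):
    """Learning rate per iteration, peaking at iter_with_max_rate and decaying exponentially."""
    if iter_max < 2:
        return [eta_max] * iter_max
    # Chosen so the rate falls from eta_max to eps over iter_max - 1 steps.
    decay = math.log(eta_max / eps) / (iter_max - 1)
    return [eta_max * math.exp(-decay * abs(t - iter_with_max_rate)) for t in range(iter_max)]
```

With the default *-F* of 0, the schedule starts at *eta_max* and ends at *eps*. Large early rates let nodes move far, and small late rates refine the layout.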

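**-k, --path-sgd-zipf-space**=*N*

The maximum space size of the Zipfian distribution used to sample the second node of each term in the PG-SGD model (default: *longest path length*).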

**-I, --path-sgd-zipf-space-max**=*N*

The maximum space size of the Zipfian distribution beyond which quantization occurs (default: *100*).

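**-l, --path-sgd-zipf-space-quantization-step**=*N*

The quantization step size used when the maximum space size of the Zipfian distribution is exceeded (default: *100*).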

**-y, --path-sgd-zipf-max-num-distributions**=*N*

The approximate maximum number of Zipfian distributions to calculate (default: *100*).

**-q, --path-sgd-seed**=*N*

Set the seed for the deterministic, single-threaded PG-SGD model (default: *pangenomic!*).

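**-u, --path-sgd-snapshot**=*STRING*

Set the prefix to which each snapshot graph of a PG-SGD iteration is written. Snapshots are turned off by default. This argument only works when *-Y, --path-sgd* was specified. Not applicable in a pipeline of sorts.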

**-H, --target-paths**=*FILE*

Read the paths that should be considered target paths (references) from this *FILE*. PG-SGD will keep the nodes of the given paths fixed. A path's rank determines its weight for decision making and is given by its position in the given *FILE*.

### Pipeline Sorting Options

**-p, --pipeline**=*STRING*

Apply a series of sorts, based on single-character command line arguments given to this command (default: NONE). The available characters are:

*s*: Topological sort, heads only.

*n*: Topological sort, no heads, no tails.

*d*: DAGify sort.

*c*: Cycle-breaking sort.

*b*: Breadth-first topological sort.

*z*: Depth-first topological sort.

*w*: Two-way topological sort.

*r*: Random sort.

*Y*: PG-SGD 1D sort.

*f*: Reverse order.

*g*: Groom the graph.

An example pipeline is *Ygs*.
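The following sketch shows how a pipeline string maps to steps, assuming the characters are applied from left to right. The table mirrors the list of pipeline characters. `parse_pipeline` is a hypothetical helper that only expands and validates the string. It does not sort anything.

```python
# Single-character pipeline steps, as listed for -p, --pipeline.
PIPELINE_STEPS = {
    "s": "topological sort, heads only",
    "n": "topological sort, no heads, no tails",
    "d": "DAGify sort",
    "c": "cycle-breaking sort",
    "b": "breadth-first topological sort",
    "z": "depth-first topological sort",
    "w": "two-way topological sort",
    "r": "random sort",
    "Y": "PG-SGD 1D sort",
    "f": "reverse order",
    "g": "groom the graph",
}


def parse_pipeline(spec):
    """Expand a pipeline string such as "Ygs" into its ordered list of steps."""
    unknown = [ch for ch in spec if ch not in PIPELINE_STEPS]
    if unknown:
        raise ValueError(f"unknown pipeline step(s): {''.join(unknown)}")
    return [PIPELINE_STEPS[ch] for ch in spec]
```

Read this way, *Ygs* runs PG-SGD first, then grooms the graph, and finishes with a heads-only topological sort.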

### Path Sorting Options

**-L, --paths-min**

Sort paths by their lowest contained node identifier.

**-M, --paths-max**

Sort paths by their highest contained node identifier.

**-A, --paths-avg**

Sort paths by their average contained node identifier.

**-R, --paths-avg-rev**

Sort paths in reverse order by their average contained node identifier.

**-D, --path-delim**=*path-delim*

Sort paths in bins by their name prefix up to this delimiter.
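The path sorting options can be illustrated with a hypothetical helper. It orders path names by the minimum, maximum, or average node identifier they contain, optionally in reverse and grouped into bins by a name prefix. Two choices are assumptions of this sketch: the bins are ordered lexicographically by prefix, and *reverse* applies only to the statistic.

```python
from statistics import mean


def sort_paths(paths, stat="avg", reverse=False, delim=None):
    """Order path names by a statistic of the node ids they contain.

    paths: path name -> list of node ids visited by the path.
    stat: "min" (like -L), "max" (like -M) or "avg" (like -A; with reverse=True, like -R).
    delim: if set, group paths into bins by the name prefix before delim (like -D).
    """
    summarize = {"min": min, "max": max, "avg": mean}[stat]
    sign = -1 if reverse else 1

    def key(name):
        prefix = name.split(delim, 1)[0] if delim else ""
        return (prefix, sign * summarize(paths[name]))

    return sorted(paths, key=key)
```

For example, paths visiting nodes `[5, 6]`, `[1, 2]`, and `[3, 9]` sort differently by minimum than by maximum or average. This is why the choice of statistic can change the final path order.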

### Optimize Options

**-O, --optimize**

Use the MutableHandleGraph::optimize method to compact the node identifier space.

### Threading

**-t, --threads**=*N*

Number of threads to use for parallel operations.

### Processing Information

**-P, --progress**

Write the current progress to stderr.

### Program Information

**-h, --help**

Print a help message for **odgi sort**.