Quick Start

Author: Rohit Goswami

1 Quick Start

1.1 Installation

1.1.2 From GitHub releases

Pre-built binaries are available for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows from the Releases page.

1.1.3 Via pixi

cd rsx-rs
pixi run build

1.2 Example workflow

Given demultiplexed RAD-seq reads in reads/ and a population map:

# popmap.tsv
ind1    M
ind2    M
ind3    F
ind4    F
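Before running the pipeline it can help to sanity-check the population map. The sketch below assumes the two-column, whitespace-separated layout shown above (individual ID, then group label) and treats lines starting with # as comments. The helper name read_popmap is hypothetical and is not part of rsx.

```python
from collections import Counter


def read_popmap(text):
    """Parse a population map: one 'individual<whitespace>group' pair per line."""
    popmap = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (assumption)
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        individual, group = fields
        if individual in popmap:
            raise ValueError(f"line {lineno}: duplicate individual {individual!r}")
        popmap[individual] = group
    return popmap


# The example population map from this section.
example = "ind1\tM\nind2\tM\nind3\tF\nind4\tF\n"
popmap = read_popmap(example)
group_sizes = Counter(popmap.values())
print(dict(group_sizes))  # {'M': 2, 'F': 2}
```

The group labels counted here (M and F) are the ones passed to -G M,F in the steps below.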

1.2.1 Step 1: Build markers table

rsx process -i reads/ -o markers.tsv -T 4 -d 5

1.2.2 Step 2: Check marker frequencies

rsx freq -t markers.tsv -o freq.tsv -d 5

1.2.3 Step 3: Compute sex-bias distribution

rsx distrib -t markers.tsv -p popmap.tsv -o distrib.tsv -d 5 -G M,F

1.2.4 Step 4: Extract significant markers

rsx signif -t markers.tsv -p popmap.tsv -o signif.tsv -d 5 -G M,F

1.2.5 Step 5: Map to reference genome

rsx map -t markers.tsv -p popmap.tsv -g genome.fa -o aligned.tsv -d 5 -G M,F

1.2.6 Step 6: Merge multiple tables

rsx merge -o combined.tsv pop1_markers.tsv pop2_markers.tsv pop3_markers.tsv

The merge uses a bounded-memory external sort (~500 MB), so it can handle arbitrarily large datasets.
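For readers unfamiliar with external sorting, the general technique can be sketched as follows: sort fixed-size chunks in memory, write each as a sorted run on disk, then k-way merge the runs. This illustrates the idea only and is not rsx's implementation. The function name external_sort and the line-count buffer are assumptions made for the example; a real tool would more likely budget by bytes.

```python
import heapq
import os
import tempfile


def external_sort(lines, max_lines_in_memory=100_000):
    """Yield lines in sorted order while buffering at most
    max_lines_in_memory lines at a time during the chunking phase."""
    run_paths = []

    def flush(chunk):
        # Sort one chunk in memory and write it to disk as a sorted run.
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as handle:
            handle.writelines(line + "\n" for line in chunk)
        run_paths.append(path)

    chunk = []
    for line in lines:
        chunk.append(line.rstrip("\n"))
        if len(chunk) >= max_lines_in_memory:
            flush(chunk)
            chunk = []
    if chunk:
        flush(chunk)

    def read_run(handle):
        # Strip newlines so the merge compares exactly what the chunks sorted.
        for line in handle:
            yield line.rstrip("\n")

    handles = [open(path) for path in run_paths]
    try:
        # heapq.merge holds only one pending line per run in memory.
        yield from heapq.merge(*(read_run(handle) for handle in handles))
    finally:
        for handle in handles:
            handle.close()
        for path in run_paths:
            os.remove(path)


# Hypothetical records; a buffer of 2 lines forces two runs on disk.
records = ["m3\t1", "m1\t4", "m2\t0", "m1\t2"]
sorted_records = list(external_sort(records, max_lines_in_memory=2))
print(sorted_records)
```

Peak memory is set by the chunk size plus one line per run, not by the total input size, which is what makes the approach suitable for arbitrarily large inputs.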

1.2.7 Step 7: Streaming PCA

rsx pca -t combined.tsv -o pca_results/ -d 5 -r 10

This writes eigenvalues, loadings, and a summary to the output directory. For sex-linked markers, PC1 typically separates males from females.
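One way a PCA can be computed without holding the marker table in memory is to stream markers one at a time into an individuals-by-individuals matrix and decompose that small matrix afterwards. The sketch below shows this idea with a plain power iteration for the first component. It is an illustration under stated assumptions, not a description of what rsx pca does internally. The function names and the depth values are invented for the example.

```python
import math


def streaming_gram(marker_rows, n_individuals):
    """Accumulate G = sum over markers of x x^T, where x is one marker's
    depth vector centred on its own mean. Only the n x n matrix G is kept."""
    gram = [[0.0] * n_individuals for _ in range(n_individuals)]
    for row in marker_rows:
        mean = sum(row) / n_individuals
        centred = [value - mean for value in row]
        for i, ci in enumerate(centred):
            if ci == 0.0:
                continue  # this individual contributes nothing for this marker
            gram_i = gram[i]
            for j, cj in enumerate(centred):
                gram_i[j] += ci * cj
    return gram


def leading_component(gram, iterations=200):
    """Power iteration: (eigenvalue, unit eigenvector) for the largest
    eigenvalue of a symmetric positive semi-definite matrix."""
    n = len(gram)
    vector = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(g * v for g, v in zip(row, vector)) for row in gram]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            break  # all markers were constant across individuals
        vector = [p / norm for p in product]
        eigenvalue = norm
    return eigenvalue, vector


# Hypothetical depths for four individuals (two males, then two females).
depth_rows = [
    [10, 12, 0, 0],
    [8, 9, 0, 1],
    [5, 5, 5, 5],
    [6, 4, 5, 7],
]
gram = streaming_gram(iter(depth_rows), n_individuals=4)
eigenvalue, scores = leading_component(gram)
print(eigenvalue, scores)
```

In this toy input the first two entries of scores share one sign and the last two share the other, which is the male/female separation on PC1 described above. Memory depends only on the number of individuals, since each marker row is discarded after it is added to the matrix.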

1.3 Output format

All outputs are tab-separated with an optional #source: comment line. The format is identical to the original C++ RADSex tool, so existing R scripts work without modification.
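A minimal reader for this layout might look like the sketch below, which uses only the Python standard library. It assumes the #source: line, when present, is the first line of the file and is followed by a header row. The helper name read_rsx_table, the column names, and the sample values are hypothetical and only serve the example.

```python
import csv
import io


def read_rsx_table(handle):
    """Return (source, header, rows) from a tab-separated table whose
    first line may be a '#source:' comment."""
    source = None
    first = handle.readline()
    if first.startswith("#source:"):
        source = first[len("#source:"):].strip()
        first = handle.readline()
    header = first.rstrip("\r\n").split("\t")
    rows = [
        dict(zip(header, record))
        for record in csv.reader(handle, delimiter="\t")
        if record  # ignore empty lines
    ]
    return source, header, rows


# Hypothetical table contents, for illustration only.
sample = "#source:reads/\nid\tsequence\tind1\tind2\n0\tACGT\t7\t0\n1\tTTGA\t0\t9\n"
source, header, rows = read_rsx_table(io.StringIO(sample))
print(source, header, len(rows))
```

Values are returned as strings; convert depth columns with int() as needed.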

1.4 Memory guarantees

All commands operate in bounded memory regardless of input size:

Command          Memory
distrib, freq    O(n_individuals)
signif, subset   O(n_individuals)
map              O(genome_index)
depth (small)    O(n_markers * n_ind)
depth (> 2 GB)   O(buffer_size)
merge            O(buffer_size)
pca              O(n_individuals^2)

For 200 individuals and 75M markers, typical peak memory is below 500 MB. The exception is map, which loads the minimap2 genome index.
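To put these figures in perspective, the back-of-the-envelope calculation below compares a quadratic-in-individuals structure with a dense markers-by-individuals table for the same dataset size. The element sizes (8-byte floats, 2-byte depth counts) are assumptions chosen for illustration and are not taken from the rsx source.

```python
n_individuals = 200
n_markers = 75_000_000

# Assumed element sizes, for illustration only.
BYTES_PER_FLOAT = 8
BYTES_PER_DEPTH = 2

# An n_individuals x n_individuals matrix, as in the pca row of the table.
pca_bytes = n_individuals**2 * BYTES_PER_FLOAT

# A fully materialised n_markers x n_individuals depth table.
dense_bytes = n_markers * n_individuals * BYTES_PER_DEPTH

print(f"pca matrix:  {pca_bytes / 1e6:.2f} MB")   # 0.32 MB
print(f"dense table: {dense_bytes / 1e9:.0f} GB")  # 30 GB
```

Under these assumptions a dense table would need about 30 GB, so staying under 500 MB is only possible when markers are streamed or buffered instead of being held in memory all at once.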