Quick Start

Author: Rohit Goswami

1 Quick Start

1.1 Installation

1.1.1 From source (recommended)

git clone https://github.com/HaoZeke/rsx-rs.git
cd rsx-rs
cargo build --release
# Binary at target/release/rsx

1.1.2 From GitHub releases

Pre-built binaries are available for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows from the Releases page.

1.1.3 Via pixi

cd rsx-rs
pixi run build

1.2 Example workflow

Given demultiplexed RAD-seq reads in reads/ and a population map:

# popmap.tsv
ind1    M
ind2    M
ind3    F
ind4    F
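
A population map like the one above can be written and sanity-checked from the shell. The sketch below assumes the file is tab-separated with exactly two columns (individual, then group), as the .tsv extension suggests. The awk check and the popmap_counts variable are illustrative and not part of rsx.

```shell
# Write a tab-separated popmap (assumed layout: individual<TAB>group).
printf 'ind1\tM\nind2\tM\nind3\tF\nind4\tF\n' > popmap.tsv

# Report any line that does not have exactly two tab-separated fields.
awk -F'\t' 'NF != 2 { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1 }
            END { exit bad }' popmap.tsv

# Count individuals per group; the labels should match the values given to -G.
popmap_counts=$(cut -f2 popmap.tsv | sort | uniq -c | awk '{ print $2 "=" $1 }' | paste -sd, -)
echo "$popmap_counts"
```

For the four individuals above this prints F=2,M=2, which is a quick way to confirm that both groups named in -G M,F are present.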

1.2.1 Step 1: Build markers table

rsx process -i reads/ -o markers.tsv -T 4 -d 5

1.2.2 Step 2: Check marker frequencies

rsx freq -t markers.tsv -o freq.tsv -d 5

1.2.3 Step 3: Compute sex-bias distribution

rsx distrib -t markers.tsv -p popmap.tsv -o distrib.tsv -d 5 -G M,F

1.2.4 Step 4: Extract significant markers

rsx signif -t markers.tsv -p popmap.tsv -o signif.tsv -d 5 -G M,F

1.2.5 Step 5: Map to reference genome

rsx map -t markers.tsv -p popmap.tsv -g genome.fa -o aligned.tsv -d 5 -G M,F
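
Steps 1 to 5 reuse the same -d and -G values, so it can help to keep them in one place. The script below is a minimal sketch that uses only the commands shown above. The variable names (min_depth, groups, threads) and the run helper are illustrative, and reading -d as a depth threshold and -T as a thread count is an assumption carried over from the original RADSex. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Shared settings; names are illustrative, values mirror the steps above.
min_depth=5   # passed as -d (assumed to be a minimum depth threshold)
groups="M,F"  # passed as -G (group labels from popmap.tsv)
threads=4     # passed as -T (assumed to be a thread count)

# run: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

run rsx process -i reads/ -o markers.tsv -T "$threads" -d "$min_depth"
run rsx freq -t markers.tsv -o freq.tsv -d "$min_depth"
run rsx distrib -t markers.tsv -p popmap.tsv -o distrib.tsv -d "$min_depth" -G "$groups"
run rsx signif -t markers.tsv -p popmap.tsv -o signif.tsv -d "$min_depth" -G "$groups"
run rsx map -t markers.tsv -p popmap.tsv -g genome.fa -o aligned.tsv -d "$min_depth" -G "$groups"
```

Keeping the threshold in a single variable avoids accidentally comparing outputs that were produced with different -d values.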

1.2.6 Step 6: Merge multiple tables

rsx merge -o combined.tsv pop1_markers.tsv pop2_markers.tsv pop3_markers.tsv

The merge uses a bounded-memory external sort (~500 MB), so it can handle arbitrarily large datasets.
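
The general idea behind an external sort can be sketched with standard POSIX tools: split the input into runs small enough to fit in memory, sort each run, then merge the sorted runs. This illustrates the technique only and is not how rsx merge is implemented; the two-line run size stands in for the real buffer, and all file names are made up for the example.

```shell
# Toy input: six keys in arbitrary order.
printf 'f\nb\ne\na\nd\nc\n' > unsorted.txt

# Phase 1: cut the input into runs of 2 lines and sort each run on its own,
# so only one small run has to be held in memory at a time.
mkdir -p runs
split -l 2 unsorted.txt runs/run_
for run in runs/run_*; do
    LC_ALL=C sort -o "$run" "$run"
done

# Phase 2: merge the pre-sorted runs; -m merges without re-sorting.
LC_ALL=C sort -m runs/run_* > sorted.txt
merged=$(paste -sd' ' sorted.txt)
echo "$merged"
```

Peak memory in this scheme depends on the run size and the number of runs being merged, not on the total input size, which is what makes the approach suitable for tables that do not fit in RAM.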

1.2.7 Step 7: Streaming PCA

rsx pca -t combined.tsv -o pca_results/ -d 5 -r 10

This writes eigenvalues, loadings, and a summary to the output directory. For sex-linked markers, PC1 typically separates males from females.

1.3 Output format

All outputs are tab-separated with an optional #source: comment line. The format is identical to the original C++ RADSex tool, so existing R scripts work without modification.
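
A downstream script can handle the optional comment line by dropping lines that start with #. The sketch below runs on a mock file; the file names, column names, and values are placeholders rather than real rsx output.

```shell
# Mock output file; header names and values are placeholders, not real rsx output.
printf '#source:example\nFrequency\tCount\n1\t120\n2\t45\n' > example_output.tsv

# Drop comment lines, then treat the first remaining line as the header.
grep -v '^#' example_output.tsv > example_table.tsv
header=$(head -n 1 example_table.tsv)
rows=$(tail -n +2 example_table.tsv | wc -l | tr -d ' ')
echo "header: $header"
echo "data rows: $rows"
```

Because the filter is a no-op when no comment line is present, the same script works on files written with or without the #source: line.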

1.4 Memory guarantees

All commands operate in bounded memory regardless of input size:

Command	Memory
distrib, freq	O(n_individuals)
signif, subset	O(n_individuals)
map	O(genome_index)
depth (small)	O(n_markers * n_ind)
depth (> 2GB)	O(buffer_size)
merge	O(buffer_size)
pca	O(n_individuals²)

For 200 individuals and 75M markers, typical peak memory is under 500 MB. The exception is map, which loads the minimap2 genome index.
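
A rough back-of-the-envelope calculation shows why bounds that depend only on the number of individuals matter at this scale. The element sizes below (8-byte floats for an n_individuals × n_individuals matrix, 2-byte depth values for a fully materialised markers table) are assumptions for illustration and are not taken from the rsx source.

```shell
n_ind=200
n_markers=75000000

# Assumed element sizes (not from the rsx source): 8-byte floats, 2-byte depths.
pca_bytes=$((n_ind * n_ind * 8))
dense_bytes=$((n_markers * n_ind * 2))

echo "n_ind^2 matrix: $((pca_bytes / 1000)) kB"
echo "dense table:    $((dense_bytes / 1000000000)) GB"
```

Under these assumptions the n_individuals² matrix needs about 320 kB, while holding every marker for every individual at once would need about 30 GB, far beyond the peak figure quoted above.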