Architecture¶
- Author: Rohit Goswami
1 Architecture¶
rsx-rs follows the rgpot pattern: a Rust core library with auto-generated C headers, wrapped by a thin CLI binary.
1.1 Crate structure¶
rsx-rs/
  rsx-core/                Rust library (staticlib + cdylib + lib)
    src/
      lib.rs               Module declarations
      status.rs            C API error handling (status codes, thread-local errors)
      stats.rs             Chi-squared, Bonferroni, group bias, Cg formatter
      popmap.rs            Population map (individual -> group)
      marker.rs            Marker struct (sequence + depths + p-values)
      markers_table.rs     Streaming parser (crossbeam channels)
      io/
        seq_reader.rs      FASTQ/FASTA reader (needletail)
        table_io.rs        TSV read/write, fast integer parsing
      commands/
        process.rs         Parallel file ingestion (rayon + ahash)
        distrib.rs         Marker distribution between groups
        signif.rs          Significant marker extraction (two-pass Bonferroni)
        depth.rs           Per-individual read statistics
        freq.rs            Marker frequency distribution
        map.rs             Genome alignment (minimap2)
        subset.rs          Filtered marker extraction
      c_api/
        types.rs           Opaque handles, constructors, destructors
        commands.rs        extern "C" wrappers for each command
  rsx-cli/                 Binary crate (clap subcommands)
1.2 C API contract¶
Following the metatensor pattern:
- All public types are #[repr(C)] – binary-compatible with C
- All public functions are extern "C" with catch_unwind for panic safety
- Status code first: every function returns rsx_status_t
- Thread-local errors: details via rsx_last_error()
- Naming: rsx_ prefix, SCREAMING_SNAKE_CASE enums, _t suffix structs
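The contract above can be sketched in Rust as follows. This is an illustrative minimal sketch, not rsx-core's actual API: the status variants, the rsx_example_divide function, and the error messages are all invented for the example; only the conventions (repr(C), extern "C", catch_unwind, status-code-first, thread-local error) come from the text.

```rust
use std::cell::RefCell;
use std::ffi::CString;
use std::os::raw::c_char;
use std::panic::{catch_unwind, AssertUnwindSafe};

// Hypothetical status enum following the naming rules above
// (rsx_ prefix, SCREAMING_SNAKE_CASE variants, _t suffix).
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum rsx_status_t {
    RSX_SUCCESS = 0,
    RSX_INVALID_ARGUMENT = 1,
    RSX_INTERNAL_PANIC = 2,
}

thread_local! {
    // Thread-local error detail, exposed via rsx_last_error().
    static LAST_ERROR: RefCell<CString> = RefCell::new(CString::new("").unwrap());
}

fn set_last_error(msg: &str) {
    LAST_ERROR.with(|e| *e.borrow_mut() = CString::new(msg).unwrap_or_default());
}

// The real crate would add #[no_mangle] so the symbol keeps its C name.
pub extern "C" fn rsx_last_error() -> *const c_char {
    LAST_ERROR.with(|e| e.borrow().as_ptr())
}

// Example wrapper: status code first, panic converted into a status.
pub extern "C" fn rsx_example_divide(num: f64, den: f64, out: *mut f64) -> rsx_status_t {
    let result = catch_unwind(AssertUnwindSafe(|| {
        if out.is_null() || den == 0.0 {
            set_last_error("null output pointer or zero denominator");
            return rsx_status_t::RSX_INVALID_ARGUMENT;
        }
        // Safety: caller guarantees `out` points to a writable f64.
        unsafe { *out = num / den; }
        rsx_status_t::RSX_SUCCESS
    }));
    result.unwrap_or_else(|_| {
        set_last_error("panic inside rsx_example_divide");
        rsx_status_t::RSX_INTERNAL_PANIC
    })
}
```

A C caller would check the returned status and, on failure, read details via rsx_last_error() before making any further calls on the same thread.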
The C header is auto-generated by cbindgen:
cargo build --manifest-path rsx-core/Cargo.toml --features gen-header
# Produces rsx-core/include/rsx.h
1.3 Data flow¶
FASTQ files --[process]--> markers.tsv --[distrib/signif/freq/depth/subset]--> analysis.tsv
                                       --[map + genome.fa]--> aligned.tsv
The process command is the only one that reads raw sequence files. All
other commands consume the markers depth table (TSV).
1.4 Streaming parser¶
The markers table parser uses a producer-consumer pattern:
- Producer thread: reads the TSV file in 64KB chunks, parses fields character-by-character (matching the C++ approach), and sends Marker batches through a crossbeam::channel::bounded channel
- Consumer: receives batches via recv() (blocking, no polling)
This replaces the C++ std::queue + std::mutex + sleep(10us) pattern.
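A minimal sketch of the pattern, using the standard library's bounded sync_channel as a stand-in for crossbeam::channel::bounded so the example is self-contained. The Marker struct, parse_line, and the batch size here are placeholders, not rsx-core's real types; the real parser works on raw 64KB byte chunks rather than pre-split lines.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Placeholder marker type for the sketch.
#[derive(Debug, PartialEq)]
struct Marker {
    seq: String,
    depths: Vec<u32>,
}

// Field-by-field parse of one TSV row (illustrative only).
fn parse_line(line: &str) -> Marker {
    let mut fields = line.split('\t');
    let seq = fields.next().unwrap_or("").to_string();
    let depths = fields.filter_map(|f| f.parse().ok()).collect();
    Marker { seq, depths }
}

fn stream_markers(lines: Vec<String>, batch_size: usize) -> Vec<Vec<Marker>> {
    // Bounded channel: the producer blocks when the consumer falls behind,
    // giving backpressure without any mutex + sleep polling loop.
    let (tx, rx) = sync_channel::<Vec<Marker>>(4);
    let producer = thread::spawn(move || {
        let mut batch = Vec::with_capacity(batch_size);
        for line in lines {
            batch.push(parse_line(&line));
            if batch.len() == batch_size {
                tx.send(std::mem::take(&mut batch)).unwrap();
            }
        }
        if !batch.is_empty() {
            tx.send(batch).unwrap();
        }
        // tx drops here, closing the channel and ending the consumer loop.
    });
    // Consumer: blocking recv() until the channel closes -- no polling.
    let mut batches = Vec::new();
    while let Ok(b) = rx.recv() {
        batches.push(b);
    }
    producer.join().unwrap();
    batches
}
```

The key difference from the C++ queue-plus-sleep design is that recv() parks the consumer thread until data arrives, so no CPU is burned polling an empty queue.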
1.5 Parallelism¶
process uses rayon::par_iter() over input files. Each file is processed
independently (count sequences with needletail, accumulate in ahash::AHashMap).
For >= 8 individuals, a DashMap enables lock-free concurrent insertion
(no sequential merge phase). Sequences are stored as 2-bit packed DNA
(4x key compression: 100bp -> 26 bytes).
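The 2-bit packing can be sketched as below. The base-to-code mapping (A=00, C=01, G=10, T=11), the byte layout, and the handling of ambiguity codes are assumptions for illustration; the raw packed payload for 100bp is ceil(100/4) = 25 bytes, so the 26-byte figure above presumably includes one byte of metadata such as a length prefix.

```rust
/// Pack a DNA sequence into 2 bits per base, 4 bases per byte.
/// Assumed encoding: A=00, C=01, G=10, T=11, low bits first within a byte.
fn pack_dna(seq: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; (seq.len() + 3) / 4];
    for (i, &b) in seq.iter().enumerate() {
        let code: u8 = match b {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            // Ambiguity codes (N etc.) collapse to A in this sketch.
            _ => 0,
        };
        out[i / 4] |= code << ((i % 4) * 2);
    }
    out
}
```

Packing shrinks hash-map keys 4x versus ASCII, which both cuts memory and speeds up hashing and comparison of the keys.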
1.6 External sort-merge (merge command)¶
The merge command handles datasets that exceed available RAM (75M+
unique sequences across hundreds of samples). Algorithm:
1. Read all input files streaming, buffering N entries in memory
2. When the buffer is full: sort by 2-bit packed sequence, write an lz4 temp file
3. K-way merge sorted temp files using BinaryHeap (min-heap)
4. Coalesce entries with equal sequence keys, merging depth columns
5. Write TSV output with unified sample columns

Memory: O(buffer_size), not O(total_sequences). Default buffer: 2M entries (~400MB). Temp files: lz4-compressed sorted chunks on disk.
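Steps 3-4 (the k-way merge with coalescing) can be sketched as follows. For a self-contained example, each sorted run is an in-memory Vec and the packed sequence key is simplified to a u64; the real code streams lz4-compressed temp files and uses the full packed-sequence key.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// K-way merge of sorted runs of (key, depth_columns) rows,
/// coalescing rows that share a sequence key by summing depth columns.
fn merge_runs(runs: Vec<Vec<(u64, Vec<u32>)>>) -> Vec<(u64, Vec<u32>)> {
    let mut iters: Vec<_> = runs.into_iter().map(|r| r.into_iter()).collect();
    let mut heap = BinaryHeap::new();
    // Seed the heap with the head of each run; Reverse makes it a min-heap.
    for (ri, it) in iters.iter_mut().enumerate() {
        if let Some((key, depths)) = it.next() {
            heap.push(Reverse((key, ri, depths)));
        }
    }
    let mut out: Vec<(u64, Vec<u32>)> = Vec::new();
    while let Some(Reverse((key, ri, depths))) = heap.pop() {
        // Refill the heap from the run this entry came from.
        if let Some((k, d)) = iters[ri].next() {
            heap.push(Reverse((k, ri, d)));
        }
        match out.last_mut() {
            // Coalesce: equal sequence keys arrive adjacently from the
            // min-heap, so merging depth columns into the last row suffices.
            Some((last_key, acc)) if *last_key == key => {
                for (a, d) in acc.iter_mut().zip(&depths) {
                    *a += *d;
                }
            }
            _ => out.push((key, depths)),
        }
    }
    out
}
```

Because every run is sorted, the heap always yields the globally smallest remaining key, so only one output row needs to be held for coalescing.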
1.7 Bounded-memory streaming (all commands)¶
All commands operate in bounded memory regardless of input size.
1.7.1 Two-pass streaming (signif, subset, map)¶
Commands requiring Bonferroni correction use two passes over the mmap’d table:
1. Pass 1: count n_markers (fast path, no sequence parsing)
2. Compute corrected_threshold = threshold / n_markers
3. Pass 2: re-mmap the same file (kernel page cache, zero I/O), compute p-values, write qualifying markers directly to output

Memory: O(n_individuals) for bitset masks + output buffer. No Vec<Marker> accumulation.
The map command builds the minimap2 index between passes (~100-300MB),
then aligns and writes in pass 2.
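The two-pass Bonferroni scheme reduces to the sketch below. Here the "table" is an in-memory slice of pre-computed p-values so the example is self-contained; the real code mmaps the TSV, computes p-values during pass 2, and writes matches straight to the output file.

```rust
/// Two-pass Bonferroni filter: pass 1 counts markers, pass 2 yields
/// the markers whose p-value clears the corrected threshold.
fn significant<'a>(
    p_values: &'a [f64],
    threshold: f64,
) -> impl Iterator<Item = (usize, f64)> + 'a {
    // Pass 1: count n_markers (here just the slice length).
    let n_markers = p_values.len() as f64;
    // Bonferroni-corrected per-marker threshold.
    let corrected = threshold / n_markers;
    // Pass 2: stream again, keeping qualifying markers only.
    p_values
        .iter()
        .copied()
        .enumerate()
        .filter(move |&(_, p)| p <= corrected)
}
```

Counting first is what makes the bounded memory possible: the corrected threshold depends on n_markers, so without pass 1 every marker's p-value would have to be buffered until the total count was known.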
1.7.2 External sort for depth¶
The depth command computes exact per-individual median via external
sort when the input exceeds 2GB:
1. Stream markers, emitting (individual_idx, depth) pairs to a buffer
2. When the buffer is full: sort by (individual_idx, depth), write an lz4 temp file
3. K-way merge: depths per individual are contiguous and sorted
4. Pick the middle element for the exact median

Memory: O(buffer_size), default 200MB. Auto-detected by file size.
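The median-extraction step can be sketched over an in-memory pair list (the real code streams lz4 temp files through the k-way merge). Averaging the two middle elements for even-sized runs is an assumption; the text only specifies picking the middle element.

```rust
/// Exact per-individual median from (individual_idx, depth) pairs.
/// After sorting, each individual's depths form one contiguous sorted run.
fn medians(mut pairs: Vec<(u32, u32)>, n_individuals: usize) -> Vec<f64> {
    // Step 2 of the algorithm: sort by (individual_idx, depth).
    pairs.sort();
    let mut out = vec![0.0; n_individuals];
    let mut i = 0;
    while i < pairs.len() {
        let cur = pairs[i].0;
        // Find the end of this individual's contiguous run.
        let j = i + pairs[i..].iter().take_while(|p| p.0 == cur).count();
        let run = &pairs[i..j];
        let m = run.len();
        out[cur as usize] = if m % 2 == 1 {
            run[m / 2].1 as f64
        } else {
            // Even count: average the two middle elements (assumed convention).
            (run[m / 2 - 1].1 as f64 + run[m / 2].1 as f64) / 2.0
        };
        i = j;
    }
    out
}
```

The point of sorting by the compound key is exactly this contiguity: the merge phase can emit each individual's depths in order and take the middle without ever holding more than one run's worth of state.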
1.7.3 Streaming accumulators (distrib, freq)¶
distrib and freq use fixed-size accumulators (O(n_individuals^2) and O(n_individuals) respectively) that do not grow with n_markers. These are inherently streaming.
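A fixed-size accumulator in the spirit of distrib might look like the sketch below: for each marker, count how many individuals of each group carry it and bump one cell of a (n1+1) x (n2+1) matrix. The struct and method names are illustrative, not rsx-core's actual API; only the O(n_individuals^2)-sized, marker-count-independent state comes from the text.

```rust
/// Fixed-size streaming accumulator: a (n1+1) x (n2+1) count matrix
/// indexed by (carriers in group 1, carriers in group 2) per marker.
struct DistribAccum {
    counts: Vec<u64>, // flattened matrix, (n1 + 1) rows x (n2 + 1) cols
    n2: usize,
}

impl DistribAccum {
    fn new(n1: usize, n2: usize) -> Self {
        DistribAccum {
            counts: vec![0; (n1 + 1) * (n2 + 1)],
            n2,
        }
    }

    /// Record one marker carried by c1 individuals of group 1
    /// and c2 individuals of group 2.
    fn record(&mut self, c1: usize, c2: usize) {
        self.counts[c1 * (self.n2 + 1) + c2] += 1;
    }
}
```

However many markers stream through, the accumulator never reallocates, which is what makes the command bounded-memory by construction.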
1.8 Statistics¶
- Chi-squared with Yates continuity correction (exact port of C++ stats.cpp)
- P-value via erfc(sqrt(chi2/2)) (SymPy-derived identity, replaces the gamma CDF)
- Sollya degree-40 minimax polynomial available for GPU (fast_erfc_poly)
- Bonferroni correction: p_corrected = min(1.0, p * n_markers)
- Float formatting via the Cg struct: matches C++ %g (6 significant digits)
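The chi-squared and p-value steps can be sketched as below. The 2x2-table cell layout and the erfc approximation (Abramowitz & Stegun 7.1.26) are stand-ins for illustration: rsx-core's actual erfc implementation, and its exact table orientation, may differ. Only the Yates correction and the p = erfc(sqrt(chi2/2)) identity for 1 degree of freedom come from the text.

```rust
/// Chi-squared statistic with Yates continuity correction
/// for a 2x2 contingency table with cells a, b, c, d.
fn chi2_yates(a: f64, b: f64, c: f64, d: f64) -> f64 {
    let n = a + b + c + d;
    // Yates correction: subtract n/2 from |ad - bc|, floored at zero.
    let num = ((a * d - b * c).abs() - n / 2.0).max(0.0);
    n * num * num / ((a + b) * (c + d) * (a + c) * (b + d))
}

/// erfc approximation (Abramowitz & Stegun 7.1.26, |error| < 1.5e-7),
/// valid for x >= 0 -- a stand-in, since std has no erfc.
fn erfc_approx(x: f64) -> f64 {
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t
        * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    poly * (-x * x).exp()
}

/// P-value for a 1-dof chi-squared statistic: p = erfc(sqrt(chi2 / 2)).
fn p_value(chi2: f64) -> f64 {
    erfc_approx((chi2 / 2.0).sqrt())
}
```

The erfc identity holds only for one degree of freedom, which is exactly the 2x2-table case here; it avoids evaluating the regularized incomplete gamma function that a general chi-squared CDF would require.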