Internals
How salib guarantees bit-deterministic results regardless of thread count, and the engineering decisions behind the parallel implementation.
Bit-determinism
salib produces identical floating-point results for a given seed regardless of how many threads are available. This is not approximate reproducibility — it is exact, bitwise identity.
Many parallel numerical libraries give up determinism because floating-point addition is not associative: in IEEE 754, $(a + b) + c$ is not always equal to $a + (b + c)$. When a thread pool splits the work differently from one run to the next, the reduction order changes, and the sum can change with it.
salib avoids this by never allowing the thread pool to determine reduction order.
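Here is a minimal illustration of that non-associativity in plain Rust, using only the standard library and nothing salib-specific. The same three `f64` values are summed in two orders, and the results differ in the last bit.

```rust
fn main() {
    let (a, b, c) = (0.1_f64, 0.2_f64, 0.3_f64);
    let left = (a + b) + c; // 0.30000000000000004 + 0.3
    let right = a + (b + c); // 0.1 + 0.5
    println!("{left} vs {right}");
    // Same inputs, different grouping, different bits.
    assert_ne!(left.to_bits(), right.to_bits());
}
```

A thread pool that groups partial sums by whichever worker finishes first is effectively choosing between `left` and `right` at random.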
Tree-structured reductions
All parallel reductions in salib use a fixed binary tree structure. Given N elements:
- Partition into fixed-size leaves (determined by N, not by thread count)
- Reduce each leaf sequentially
- Merge leaf results pairwise up the tree, always left-to-right
The tree shape is determined entirely by N. Whether rayon uses 1 thread or 128, the same pairs get merged in the same order. The result is bitwise identical.
$$\text{reduce}([a, b, c, d, e, f, g, h]) = \text{merge}\!\Big(\text{merge}\big(\text{sum}(a,b), \text{sum}(c,d)\big),\, \text{merge}\big(\text{sum}(e,f), \text{sum}(g,h)\big)\Big)$$
This applies to every accumulation in the library: variance estimates, mean computations, Fourier coefficient sums, KDE evaluations.
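The sketch below illustrates the fixed-tree idea using only the standard library, with `std::thread::scope` standing in for rayon. It is not salib's implementation. `LEAF`, `leaf_sum`, `merge_tree`, and `tree_sum` are hypothetical names, and the leaf size of 4 is an arbitrary assumption. Worker threads only decide who computes each leaf sum. The partition and the merge order depend on the input length alone, so every thread count yields the same bits.

```rust
use std::thread;

// Hypothetical fixed leaf size: the partition depends only on the input length.
const LEAF: usize = 4;

// Reduce one leaf sequentially, left to right.
fn leaf_sum(leaf: &[f64]) -> f64 {
    leaf.iter().fold(0.0, |acc, &x| acc + x)
}

// Merge partial sums pairwise, one tree level at a time, left to right.
// An odd element at the end of a level is carried up unchanged.
fn merge_tree(mut level: Vec<f64>) -> f64 {
    if level.is_empty() {
        return 0.0;
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| if pair.len() == 2 { pair[0] + pair[1] } else { pair[0] })
            .collect();
    }
    level[0]
}

// Compute leaf sums on `threads` workers, then merge them in the fixed tree.
fn tree_sum(xs: &[f64], threads: usize) -> f64 {
    let threads = threads.max(1);
    let leaves: Vec<&[f64]> = xs.chunks(LEAF).collect();
    let mut sums = vec![0.0; leaves.len()];
    // Each worker fills a contiguous run of slots; slot i always holds leaf i.
    let per_worker = ((leaves.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for (slots, group) in sums.chunks_mut(per_worker).zip(leaves.chunks(per_worker)) {
            s.spawn(move || {
                for (slot, leaf) in slots.iter_mut().zip(group) {
                    *slot = leaf_sum(leaf);
                }
            });
        }
    });
    merge_tree(sums)
}

fn main() {
    let xs: Vec<f64> = (1..=1000).map(|i| 1.0 / i as f64).collect();
    let serial = tree_sum(&xs, 1);
    for threads in [2, 3, 8, 64] {
        // Bitwise comparison, not approximate equality.
        assert_eq!(tree_sum(&xs, threads).to_bits(), serial.to_bits());
    }
    println!("sum = {serial}");
}
```

This sketch carries an odd leftover up unchanged, which is only one way to handle a level of odd length. Whatever rule is used, it must depend only on N, never on how many workers happen to exist.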
The rayon contract
salib uses rayon behind the `parallel` feature flag (on by default). The contract:
- rayon decides which threads run which work. salib does not pin tasks to threads or assume any scheduling order.
- salib decides which values get combined with which. The reduction tree is fixed before rayon sees it.
- Disabling `parallel` produces identical results. The serial fallback uses the same tree structure with the same merge order. The only difference is wall-clock time.
```toml
# Serial-only build (identical results, single-threaded)
[dependencies]
salib = { version = "0.1", default-features = false, features = ["estimators", "samplers"] }
```
RNG architecture
All randomness flows through RngState, a wrapper around ChaCha20. The design:
- Seed → RNG → samples is a pure function. Same seed, same samples, always.
- Splitting: when parallel work needs independent streams, `RngState::split()` derives child RNGs deterministically from the parent state. The split is positional: child k always gets the same stream regardless of thread assignment.
- No thread-local RNG. Every random draw comes from a specific, deterministic position in the ChaCha stream.
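```rust
let mut rng = RngState::from_seed([0u8; 32]);
// `sampler` stands for a sampler configured elsewhere.
// With this seed, `a` and `b` come out the same on every run,
// regardless of thread count or scheduling. `b` is drawn from
// the stream position after `a`, so it is not a repeat of `a`.
let a = sampler.unit_sample(1024, &mut rng);
let b = sampler.unit_sample(1024, &mut rng);
```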
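The positional split can be sketched as follows. This sketch rests on stated assumptions and is not salib's `RngState`. To stay within the standard library, it uses SplitMix64 instead of ChaCha20, and `Rng`, `from_seed(u64)`, and `split(k)` are hypothetical. The point is only that child k's seed is a pure function of the parent state and k, so it does not matter which thread derives it or in what order.

```rust
use std::thread;

// Stand-in generator: SplitMix64, NOT ChaCha20 (which salib uses).
#[derive(Clone, Debug)]
struct Rng {
    state: u64,
}

impl Rng {
    fn from_seed(seed: u64) -> Self {
        Rng { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    // Child k's seed depends only on (parent state, k), never on the calling thread.
    // Takes &self, so deriving children does not advance the parent.
    fn split(&self, k: u64) -> Rng {
        let mut mixer = Rng::from_seed(self.state ^ k.wrapping_mul(0xD1B5_4A32_D192_ED03));
        Rng::from_seed(mixer.next_u64())
    }
}

// First draw of each child, derived in index order on one thread.
fn first_draws_serial(parent: &Rng, n: u64) -> Vec<u64> {
    (0..n).map(|k| parent.split(k).next_u64()).collect()
}

// Same draws, one thread per child, spawned in reverse order.
fn first_draws_threaded(parent: &Rng, n: u64) -> Vec<u64> {
    let mut out = vec![0u64; n as usize];
    thread::scope(|s| {
        let handles: Vec<_> = (0..n)
            .rev()
            .map(|k| s.spawn(move || (k, parent.split(k).next_u64())))
            .collect();
        for h in handles {
            let (k, draw) = h.join().unwrap();
            out[k as usize] = draw; // placed by index, not by completion order
        }
    });
    out
}

fn main() {
    let parent = Rng::from_seed(42);
    let serial = first_draws_serial(&parent, 8);
    let threaded = first_draws_threaded(&parent, 8);
    assert_eq!(serial, threaded);
    println!("{serial:?}");
}
```

The toy mixing in `split` makes no claim about statistical independence between children. A ChaCha20-based generator offers far stronger guarantees, and how salib maps child indices onto its stream is not shown here.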
Variance computation
For numerical stability, salib uses Welford’s online algorithm for single-pass variance and the parallel merge variant (Chan et al., 1979) to combine partial results:
$$\bar{x}_{AB} = \frac{n_A \bar{x}_A + n_B \bar{x}_B}{n_A + n_B}$$
$$M_{2,AB} = M_{2,A} + M_{2,B} + \frac{(\bar{x}_B - \bar{x}_A)^2 \cdot n_A \cdot n_B}{n_A + n_B}$$
This avoids the catastrophic cancellation of the naive $\mathbb{E}[X^2] - \mathbb{E}[X]^2$ formula. It also composes with the binary reduction tree, because each merge needs only $(n, \bar{x}, M_2)$ from its two children.
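Here is a minimal sketch of both pieces. It assumes a hypothetical `Welford` accumulator that tracks $(n, \bar{x}, M_2)$; salib's actual types are not shown. `push` is Welford's update, and `merge` transcribes the two merge formulas directly. The data are offset by $10^9$ so that the naive formula has to subtract two nearly equal numbers near $10^{18}$. For these particular values, the single-pass and merged results are exact. The naive value is printed only for comparison and is not asserted.

```rust
// Hypothetical accumulator holding (n, mean, M2) for one partition.
#[derive(Clone, Copy, Debug, Default)]
struct Welford {
    n: u64,
    mean: f64,
    m2: f64,
}

impl Welford {
    // Welford's single-pass update for one new observation.
    fn push(&mut self, x: f64) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (x - self.mean);
    }

    // Pairwise merge (Chan et al.), transcribing the mean and M2 formulas.
    fn merge(a: Welford, b: Welford) -> Welford {
        if a.n == 0 {
            return b;
        }
        if b.n == 0 {
            return a;
        }
        let (na, nb) = (a.n as f64, b.n as f64);
        let n = na + nb;
        let delta = b.mean - a.mean;
        Welford {
            n: a.n + b.n,
            mean: (na * a.mean + nb * b.mean) / n,
            m2: a.m2 + b.m2 + delta * delta * na * nb / n,
        }
    }

    // Population variance M2 / n (divide by n - 1 for the sample variance).
    fn variance(&self) -> f64 {
        self.m2 / self.n as f64
    }
}

fn main() {
    // Large offset plus a small spread: the naive formula cancels badly here.
    let xs = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0];

    let mut all = Welford::default();
    xs.iter().for_each(|&x| all.push(x));

    let (mut left, mut right) = (Welford::default(), Welford::default());
    xs[..2].iter().for_each(|&x| left.push(x));
    xs[2..].iter().for_each(|&x| right.push(x));
    let merged = Welford::merge(left, right);

    // Naive E[X^2] - E[X]^2, for comparison only.
    let n = xs.len() as f64;
    let mean_sq = xs.iter().map(|x| x * x).sum::<f64>() / n;
    let mean = xs.iter().sum::<f64>() / n;
    let naive = mean_sq - mean * mean;

    println!("single pass = {}, merged = {}, naive = {}", all.variance(), merged.variance(), naive);
    assert_eq!(all.variance(), 22.5);
    assert_eq!(merged.variance(), 22.5);
}
```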
Content hashing
Problem::content_hash() and Sampler::config_hash() produce SHA-256 digests of the problem/sampler configuration. These serve as cache keys and reproducibility checksums — if two runs have the same content hash and the same RNG seed, they produce bitwise identical results.
The hash is computed over a canonical JSON serialization of the configuration, so it is stable across Rust compiler versions and struct field reordering.
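Field-order independence comes from canonicalization. If keys are sorted before serializing, two structs that declare the same fields in a different order produce the same bytes. The sketch below shows only that step. It uses hypothetical names (`canonical_json`, `num_vars`, `bounds`) and simplifying assumptions: values are already valid JSON, and keys need no escaping. salib's real serializer and the SHA-256 step that digests the resulting bytes are not shown.

```rust
use std::collections::BTreeMap;

// Serialize key/value pairs as compact JSON with keys in sorted order.
// Assumes values are valid JSON fragments and keys need no escaping.
fn canonical_json(fields: &[(&str, &str)]) -> String {
    let sorted: BTreeMap<&str, &str> = fields.iter().copied().collect();
    let body: Vec<String> = sorted
        .iter()
        .map(|(k, v)| format!("\"{k}\":{v}"))
        .collect();
    format!("{{{}}}", body.join(","))
}

fn main() {
    // The same configuration with fields declared in two different orders.
    let a = canonical_json(&[("num_vars", "3"), ("bounds", "[[0,1],[0,1],[0,1]]")]);
    let b = canonical_json(&[("bounds", "[[0,1],[0,1],[0,1]]"), ("num_vars", "3")]);
    assert_eq!(a, b);
    println!("{a}"); // prints {"bounds":[[0,1],[0,1],[0,1]],"num_vars":3}
}
```

Because the digest is taken over these bytes rather than over the in-memory struct layout, neither field order nor compiler-chosen layout can leak into the hash.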
What is NOT guaranteed
- Cross-platform determinism. Different CPU architectures may produce different results due to differences in FMA (fused multiply-add) availability. salib is deterministic within a given binary on a given platform.
- Cross-version determinism. A future version of salib may change algorithms or reduction order. The guarantee is within a single version.
- Determinism across feature flags. Enabling `surrogate` or `shapley` does not change the behavior of core estimators, but the surrogate methods themselves (PCE, HDMR) may use different internal reduction strategies.
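The FMA caveat in the first item can be made concrete. A fused multiply-add rounds once, while a separate multiply and add round twice, so the same source expression can yield different bits depending on whether the compiled code uses FMA. The example below forces the fused form with `f64::mul_add` from the Rust standard library. Whether any particular salib build emits FMA instructions is not stated here.

```rust
fn main() {
    let (a, b, c) = (0.1_f64, 10.0_f64, -1.0_f64);
    // Two roundings: 0.1 * 10.0 rounds to exactly 1.0, so the residual vanishes.
    let separate = a * b + c;
    // One rounding: the exact product of the stored 0.1 and 10.0 is kept,
    // exposing the representation error of 0.1, which is 2^-54 here.
    let fused = a.mul_add(b, c);
    println!("separate = {separate:e}, fused = {fused:e}");
    assert_eq!(separate, 0.0);
    assert_eq!(fused, f64::EPSILON / 4.0);
}
```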