Variance-Based Methods
Sobol’ sensitivity indices — decompose output variance into input contributions.
When to use: You want to know how much each input contributes to the uncertainty in the output. You can afford (d + 2) \times N model evaluations, where d is the number of factors and N is typically 1024–16384. Your model is a black box — no gradients needed.
Theory
For a square-integrable function f(\mathbf{x}) with independent inputs, the Sobol’ decomposition writes the total variance as a sum of partial variances:
\operatorname{Var}(Y) = \sum_i V_i + \sum_{i < j} V_{ij} + \cdots + V_{1,2,\ldots,d}
where V_i = \operatorname{Var}\!\big[\mathbb{E}(Y \mid X_i)\big] is the variance of the conditional expectation. The first-order index measures the fraction of variance explained by X_i alone:
S_i = \frac{V_i}{\operatorname{Var}(Y)} = \frac{\operatorname{Var}\!\big[\mathbb{E}(Y \mid X_i)\big]}{\operatorname{Var}(Y)}
The total-effect index captures X_i’s contribution including all interactions:
S_{Ti} = 1 - \frac{\operatorname{Var}\!\big[\mathbb{E}(Y \mid \mathbf{X}_{\sim i})\big]}{\operatorname{Var}(Y)} = \frac{\mathbb{E}\!\big[\operatorname{Var}(Y \mid \mathbf{X}_{\sim i})\big]}{\operatorname{Var}(Y)}
where \mathbf{X}_{\sim i} denotes all factors except X_i. If S_{Ti} - S_i > 0, the factor participates in interactions.
Sampling design
All four estimators below use the Saltelli cross-matrix design. Two independent N \times d base matrices A and B are drawn (Sobol’ quasi-random sequence by default). For each factor i, a hybrid matrix A_B^{(i)} is constructed: all columns from A except column i, which comes from B.
This produces N(d + 2) model evaluations: N for A, N for B, and N for each of d cross-matrices.
use salib::samplers::{SobolSampler, build_saltelli_matrix};
use salib::*;
let mut rng = RngState::from_seed([0u8; 32]);
let sampler = SobolSampler::minimal(2 * problem.dim());
// N = 8192 base samples, skip = false
let saltelli = build_saltelli_matrix(
&sampler, 8192, false, &mut rng
).unwrap();
Sample size: N = 1024 is the practical minimum for stable first-order estimates with d \leq 10. For total-effect indices or models with strong interactions, use N \geq 4096. Doubling N roughly halves the standard error.
Saltelli 2010
Saltelli et al. (2010) Comp. Phys. Comm. 181(2), 259–270. [bib]
The default estimator. First-order:
\hat{S}_i = \frac{\frac{1}{N}\sum_{j=1}^{N} f(B)_j \Big(f(A_B^{(i)})_j - f(A)_j\Big)}{\operatorname{Var}(Y)}
Total-effect:
\hat{S}_{Ti} = \frac{\frac{1}{2N}\sum_{j=1}^{N} \Big(f(A)_j - f(A_B^{(i)})_j\Big)^2}{\operatorname{Var}(Y)}
use salib::estimators::estimate_saltelli2010;
let indices = estimate_saltelli2010(&saltelli, |x| {
x[0].sin()
+ 7.0 * x[1].sin().powi(2)
+ 0.1 * x[2].powi(4) * x[0].sin()
});
println!("{indices}");
Verify against Ishigami closed-form (N = 8192, seed
[0u8; 32]):
Factor \hat{S}_i Analytic S_i \hat{S}_{Ti} Analytic S_{Ti} x_1 0.3143 0.3139 0.5559 0.5576 x_2 0.4427 0.4424 0.4420 0.4424 x_3 −0.0042 0.0000 0.2455 0.2437
Jansen 1999
Jansen (1999) Comp. Phys. Comm. 117(1–2), 35–43. [bib]
An alternative total-effect estimator with empirically better convergence for small N.
First-order (same as Saltelli 2010):
\hat{S}_i = \frac{\frac{1}{N}\sum_{j=1}^{N} f(B)_j \Big(f(A_B^{(i)})_j - f(A)_j\Big)}{\operatorname{Var}(Y)}
Total-effect (Jansen formulation):
\hat{S}_{Ti} = \frac{\frac{1}{2N}\sum_{j=1}^{N} \Big(f(B)_j - f(A_B^{(i)})_j\Big)^2}{\operatorname{Var}(Y)}
The difference from Saltelli is subtle: Jansen compares f(B) to f(A_B^{(i)}) instead of f(A) to f(A_B^{(i)}) in the total-effect formula. Both are consistent estimators; Jansen converges faster when the model has high-order interactions.
use salib::estimators::estimate_jansen;
let indices = estimate_jansen(&saltelli, |x| {
x[0].sin()
+ 7.0 * x[1].sin().powi(2)
+ 0.1 * x[2].powi(4) * x[0].sin()
});
Janon 2014
Janon et al. (2014) ESAIM: Probability and Statistics 18, 342–364. [bib]
Asymptotically efficient first-order estimator. Uses the cross-matrix outputs differently to reduce bias:
\hat{S}_i = \frac{\frac{1}{N}\sum_{j=1}^{N} f(A)_j\, f(A_B^{(i)})_j - \left(\frac{1}{2N}\sum_{j=1}^{N}\big(f(A)_j + f(A_B^{(i)})_j\big)\right)^2}{\frac{1}{2N}\sum_{j=1}^{N}\big(f(A)_j^2 + f(A_B^{(i)})_j^2\big) - \left(\frac{1}{2N}\sum_{j=1}^{N}\big(f(A)_j + f(A_B^{(i)})_j\big)\right)^2}
Janon’s estimator achieves the semiparametric efficiency bound for S_i — no other estimator based on the same design can have lower asymptotic variance. In practice the difference matters most for small N and models with near-zero first-order indices.
use salib::estimators::estimate_janon;
let indices = estimate_janon(&saltelli, |x| {
x[0].sin()
+ 7.0 * x[1].sin().powi(2)
+ 0.1 * x[2].powi(4) * x[0].sin()
});
Owen 2013
Owen (2013) ACM Trans. Model. Comp. Sim. 23(2), Article 11. [bib]
Three-matrix design that produces improved estimates for second-order interaction indices S_{ij}. Owen introduces a third base matrix C and defines cross-matrices A_B^{(i)} and A_C^{(i)} to separate first-order and interaction effects more cleanly.
Owen requires its own sampling design — the sampler dimension must be 3d (for the A, B, C matrices):
use salib::samplers::{SobolSampler, build_owen_matrix};
use salib::estimators::estimate_owen;
use salib::*;
let mut rng = RngState::from_seed([0u8; 32]);
// 3 * dim for the three-matrix design
let sampler = SobolSampler::minimal(3 * problem.dim());
let owen = build_owen_matrix(&sampler, 8192, &mut rng).unwrap();
let indices = estimate_owen(&owen, |x| {
x[0].sin()
+ 7.0 * x[1].sin().powi(2)
+ 0.1 * x[2].powi(4) * x[0].sin()
});
Given-Data Sobol’
Plischke et al. (2013) Eur. J. Oper. Res. 226(3), 536–550. [bib]
Estimate first-order Sobol’ indices from an existing dataset without a designed experiment. The method partitions the data by binning each input variable and estimates V_i from the between-bin variance of conditional means.
When to use: You have observational data (not from a designed experiment) and want sensitivity indices. You cannot re-run the model. The estimate is first-order only — no total-effect indices.
use salib::estimators::estimate_given_data_sobol;
use ndarray::Array2;
// x: (N, d) matrix of inputs, y: (N,) vector of outputs
let indices = estimate_given_data_sobol(x.view(), &y);
Caveat: Given-data estimates are biased when inputs are correlated. The method assumes independence — if your factors are dependent, the indices measure a blend of sensitivity and dependence. Consider Shapley effects for correlated inputs.
Choosing among estimators
| Estimator | S_i | S_{Ti} | Best for |
|---|---|---|---|
| Saltelli 2010 | yes | yes | General use. Start here. |
| Jansen 1999 | yes | yes | Better S_T convergence at small N. |
| Janon 2014 | yes | yes | Most efficient S_i at the semiparametric bound. |
| Owen 2013 | yes | yes | Second-order interactions S_{ij}. |
| Given-data | yes | no | Observational data, no designed experiment. |
If you’re unsure, use Saltelli 2010. It’s the most widely cited, has stable behavior across problem types, and gives you both S_i and S_{Ti}. Switch to Jansen if you’re budget-constrained on model evaluations, or to Janon if you need the tightest possible confidence intervals on first-order indices.