Factor covariance backend¶
RMT-cleaned covariances (constant-residual-eigenvalue / eigenvalue clipping) and commercial factor risk models share the structure
$$\Sigma = D + U \Delta U^\top, \qquad D \succ 0 \text{ diagonal},\; U \in \mathbb{R}^{n \times k},\; k \ll n,$$
a diagonal matrix plus a low-rank term. The FactorCovariance backend exploits
this structure through the Woodbury identity: the critical line algorithm runs
in $O(n(k+m))$ memory and each turning point costs $O(n_F k^2 + k^3)$ instead
of $O((n_F+m)^3)$ — no $n \times n$ matrix is ever formed.
Usage¶
import numpy as np
from cvxcla import CLA, FactorCovariance
n, k = 10_000, 50
rng = np.random.default_rng(0)
covariance = FactorCovariance(
d=rng.uniform(0.1, 0.5, n), # idiosyncratic variances
u=rng.standard_normal((n, k)), # factor loadings
delta=rng.uniform(0.5, 2.0, k), # factor (co)variances, (k,) or (k, k)
)
frontier = CLA(
mean=rng.uniform(0.0, 0.1, n),
covariance=covariance, # any ndarray still works as before
lower_bounds=np.zeros(n),
upper_bounds=np.ones(n),
a=np.ones((1, n)),
b=np.ones(1),
).frontier
CLA accepts either a plain numpy covariance matrix (wrapped automatically
in DenseCovariance) or any object implementing the CovarianceOperator
protocol:
class CovarianceOperator(Protocol):
n: int
def matvec(self, x): ... # Sigma @ x
def solve_free(self, free, rhs): ... # Sigma[free][:, free]^{-1} @ rhs
def cross(self, free, x): ... # Sigma[free][:, ~free] @ x[~free]
RMT-cleaned covariances¶
Eigenvalue clipping at the Marchenko–Pastur edge produces exactly this structure: keeping the $k$ eigenpairs $(\lambda_i, v_i)$ above the edge and replacing the rest by their average $\bar{d}$ gives
$$\Sigma_{\text{clean}} = \bar{d}\, I + V_k (\Lambda_k - \bar{d} I) V_k^\top,$$
i.e. FactorCovariance(d=np.full(n, d_bar), u=v_k, delta=lambda_k - d_bar).
See the factor notebook for an end-to-end example: simulated
returns → Marchenko–Pastur threshold → factors → exact frontier.
Benchmark¶
Full frontier (all turning points) of random long-only factor-model problems,
$k = n/20$, budget constraint, run with experiments/factor_benchmark.py
(Apple Silicon, single seed; the dense column wraps the same problem in a
plain numpy matrix):
| n | k | turning points | dense (s) | factor (s) | speedup |
|---|---|---|---|---|---|
| 500 | 25 | 505 | 0.42 | 0.08 | 5x |
| 2000 | 100 | 2051 | 31.90 | 1.34 | 24x |
| 5000 | 250 | 5166 | 726.72 | 20.08 | 36x |
At $n = 20{,}000$, $k = 100$ the dense path would need 3.2 GB for the
covariance alone; the factor backend traces a frontier at that size in a
fraction of a second (see tests/test_operators.py::TestFactorLargeScale).
The dense path spends $O((n_F+m)^3)$ per turning point on the free-block solve (plus the $O(n^2)$ memory for the matrix itself), while the factor path solves the same system through the $k \times k$ Woodbury correction matrix.
References¶
- Perold, Large-Scale Portfolio Optimization, Management Science 30(10), 1984.
- Niedermayer & Niedermayer, Applying Markowitz's Critical Line Algorithm, 2010.
- Schmelzer, Stoll, Wolf, From Marchenko–Pastur to Woodbury: Direct Solvers for Long-Only Mean-Variance Portfolios, 2026.