Acceleration

SCS includes Anderson acceleration (AA), a quasi-Newton method for accelerating fixed-point iterations. It can dramatically speed up convergence in practice, especially when higher-accuracy solutions are desired. However, it can also cause severe instability in the solver, so it should be used with caution. How best to implement AA so that it performs well across all problems is an open research question, and we welcome any contributions in that direction!

Mathematical details

The discussion here is taken from section 2 of our paper. Consider the problem of finding a fixed point of the function \(f: \mathbf{R}^n \rightarrow \mathbf{R}^n\), i.e.,

\[\text{Find } x \in \mathbf{R}^n \text{ such that } x = f(x).\]

In our case \(f\) corresponds to one step of Douglas-Rachford splitting and the iterate is the \(w^k\) vector, which converges to a fixed point of the DR operator. At a high level, AA with initial point \(x^0\) and max memory \(m\) (corresponding to the acceleration_lookback setting) works as follows:

\[\begin{split}\begin{array}{l} \text{For } k=0, 1, \dots \\ \quad \text{Set } m_k=\min\{m, k\} \\ \quad \text{Select weights } \alpha_j^k \text{ based on the last } m_k \text{ iterations satisfying } \sum_{j=0}^{m_k}\alpha_j^k=1 \\ \quad \text{Set } x^{k+1}=\sum_{j=0}^{m_k}\alpha_j^kf(x^{k-m_k+j}) \end{array}\end{split}\]

In other words, AA produces an iterate that is the linear combination of the last \(m_k + 1\) outputs of the map. Thus, the main challenge is in choosing the weights \(\alpha \in \mathbf{R}^{m_k+1}\). There are two ways to choose them, corresponding to type-I and type-II AA (named for the type-I and type-II Broyden updates). We shall present type-II first.

Type-II AA

Define the residual \(g: \mathbf{R}^n \rightarrow \mathbf{R}^n\) of \(f\) to be \(g(x) = x - f(x)\). Note that any fixed point \(x^\star\) satisfies \(g(x^\star) = 0\). In type-II AA the weights are selected by solving a small least squares problem.

\[\begin{split}\begin{array}{ll} \mbox{minimize} & \|\sum_{j=0}^{m_k}\alpha_j g(x^{k-m_k+j})\|_2^2\\ \mbox{subject to} & \sum_{j=0}^{m_k}\alpha_j=1. \end{array}\end{split}\]

More explicitly, we can reformulate the above as follows:

\[\begin{array}{ll} \mbox{minimize} & \|g_k-Y_k\gamma\|_2, \end{array}\]

with variable \(\gamma=(\gamma_0,\dots,\gamma_{m_k-1}) \in \mathbf{R}^{m_k}\). Here \(g_i=g(x^i)\), \(Y_k=[y_{k-m_k}~\dots~y_{k-1}]\) with \(y_i=g_{i+1}-g_i\) for each \(i\), and \(\alpha\) and \(\gamma\) are related by \(\alpha_0=\gamma_0\), \(\alpha_i=\gamma_i-\gamma_{i-1}\) for \(1\leq i\leq m_k-1\) and \(\alpha_{m_k}=1-\gamma_{m_k-1}\).
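To verify the equivalence, substitute the relation between \(\alpha\) and \(\gamma\) into the objective; the sum telescopes:

\[\sum_{j=0}^{m_k}\alpha_j\, g(x^{k-m_k+j}) = g_k - \sum_{i=0}^{m_k-1}\gamma_i\left(g_{k-m_k+i+1} - g_{k-m_k+i}\right) = g_k - \sum_{i=0}^{m_k-1}\gamma_i\, y_{k-m_k+i} = g_k - Y_k\gamma,\]

so minimizing over \(\gamma\) with no constraint is the same as minimizing over \(\alpha\) subject to \(\sum_j \alpha_j = 1\).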

Assuming that \(Y_k\) is full column rank, the solution \(\gamma^k\) to the above is given by \(\gamma^k=(Y_k^\top Y_k)^{-1}Y_k^\top g_k\), and hence by the relation between \(\alpha^k\) and \(\gamma^k\), the next iterate of type-II AA can be written as

\[\begin{split}x^{k+1}&=f(x^k)-\sum_{i=0}^{m_k-1}\gamma_i^k\left(f(x^{k-m_k+i+1})- f(x^{k-m_k+i})\right)\\ &=x^k-g_k-(S_k-Y_k)\gamma^k\\ &=x^k-(I+(S_k-Y_k)(Y_k^\top Y_k)^{-1}Y_k^\top )g_k\\ &=x^k-B_kg_k,\end{split}\]

where \(S_k=[s_{k-m_k}~\dots~s_{k-1}]\), \(s_i=x^{i+1}-x^i\) for each \(i\), and

\[B_k=I+(S_k-Y_k)(Y_k^\top Y_k)^{-1}Y_k^\top\]

Observe that \(B_k\) minimizes \(\|B_k-I\|_F\) subject to the inverse multi-secant condition \(B_kY_k=S_k\), and hence can be regarded as an approximate inverse Jacobian of \(g\). The update of \(x^k\) can then be considered as a quasi-Newton-type update, with \(B_k\) being a generalized second (or type-II) Broyden’s update of \(I\) satisfying the inverse multi-secant condition.

Type-I AA

In the same spirit, we define type-I AA, in which we find an approximate Jacobian of \(g\) minimizing \(\|H_k-I\|_F\) subject to the multi-secant condition \(H_kS_k=Y_k\). Assuming that \(S_k\) is full column rank, we obtain (by symmetry) that

\[H_k=I+(Y_k-S_k)(S_k^\top S_k)^{-1}S_k^\top\]

and the update scheme is defined as

\[x^{k+1}=x^k-H_k^{-1}g_k\]

assuming \(H_k\) to be invertible. A direct application of the Woodbury matrix identity shows that

\[H_k^{-1}=I+(S_k-Y_k)(S_k^\top Y_k)^{-1}S_k^\top\]

where again we have assumed that \(S_k^\top Y_k\) is invertible. Notice that this explicit formula for \(H_k^{-1}\) is preferred because the most costly step, the inversion, is performed only on a small \(m_k\times m_k\) matrix.

In SCS

In SCS both types of acceleration are available, though by default type-I is used since it tends to have better performance. If you wish to use AA then set the acceleration_lookback setting to a positive value (10 works well for many problems and is the default). This setting corresponds to \(m\), the maximum number of SCS iterates that AA will use to extrapolate to the new point. Set acceleration_lookback to 0 to disable AA entirely.

To select type-II acceleration set acceleration_type_1 to 0 (the default 1 selects type-I). Type-II is more numerically stable than type-I but typically slower; users switching to type-II usually also lower acceleration_regularization (e.g. 1e-12) since the default is tuned for type-I.

The setting acceleration_interval controls how frequently AA is applied. If acceleration_interval \(=k\) for some integer \(k \geq 1\) then AA is applied every \(k\) iterations (AA simply ignores the intermediate iterations). This has the benefit of making AA \(k\) times faster and approximating a \(k\) times larger memory, as well as improving numerical stability by ‘decorrelating’ the data. On the other hand, older iterates might be stale. More work is needed to determine the optimal setting for this parameter, but 10 appears to work well in practice and is the default.

The details about how the linear systems are solved and updated are abstracted away into the AA package. The current implementation uses a rank-revealing pivoted QR factorization (LAPACK geqp3) of the augmented matrix (with Tikhonov regularization folded in as extra rows), followed by rank truncation and a few steps of iterative refinement on the \(\gamma\) solve. This keeps the update stable as the \(S\) and \(Y\) columns become nearly linearly dependent near convergence. An SVD-based solve would be similarly rank-revealing but is substantially more expensive per iteration and has not been benchmarked here.

SCS reports detailed AA solve diagnostics in the aa_stats field of the returned Return information object. These include solve acceptances, rejection causes, the rank of the most recent AA solve, the most recent AA weight norm, and the regularization used in that solve.

Regularization

By default we also add a small amount of regularization to the matrices being inverted in the above expressions, i.e., in the type-II update

\[(Y_k^\top Y_k)^{-1} \text{ becomes } (Y_k^\top Y_k + \epsilon I)^{-1}\]

for some small \(\epsilon > 0\), and similarly for the type-I update

\[(S_k^\top Y_k)^{-1} \text{ becomes } (S_k^\top Y_k + \epsilon I)^{-1}\]

which is equivalent to adding regularization to the \(S_k^\top S_k\) matrix before using the Woodbury matrix identity. The regularization ensures the matrices are invertible and helps stability. In practice type-I tends to require more regularization than type-II for good performance. The regularization shrinks the AA update towards the update without AA, since if \(\epsilon\rightarrow\infty\) then \(\gamma^\star = 0\) and the AA step reduces to \(x^{k+1} = f(x^k)\). Note that the regularization can be folded into the matrices by appending \(\sqrt{\epsilon} I\) to the bottom of \(S_k\) or \(Y_k\), which is useful when using a QR or SVD decomposition to solve the equations.
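Concretely, stacking \(\sqrt{\epsilon}\, I\) below \(Y_k\) reproduces the regularized normal equations:

\[\hat Y_k = \begin{bmatrix} Y_k \\ \sqrt{\epsilon}\, I \end{bmatrix}, \qquad \hat Y_k^\top \hat Y_k = Y_k^\top Y_k + \epsilon I,\]

so a QR factorization of \(\hat Y_k\) solves the regularized least-squares problem without forming \(Y_k^\top Y_k\) explicitly.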

The setting acceleration_regularization controls \(\epsilon\). The default (1e-8) is tuned for type-I; type-II is typically run with a smaller value such as 1e-12.

Max \(\gamma\) norm

As the algorithm converges to the fixed point the matrices to be inverted can become ill-conditioned and AA can become unstable. In this case the \(\gamma\) vector can become very large. As a simple heuristic we reject the AA update and reset the AA state whenever \(\|\gamma\|_2\) is greater than max_weight_norm (e.g., something very large like \(10^{10}\)).

Safeguarding

We also apply a safeguarding step to the output of the AA step. Explicitly, let \(x^k\) be the current iteration and let \(x_\mathrm{AA} = x^{k+1}\) be the output of AA. We reject the AA step if

\[\|x_\mathrm{AA} - f(x_\mathrm{AA}) \|_2 > \zeta \|x^k - f(x^k) \|_2\]

where \(\zeta\) is the safeguarding tolerance factor (safeguard_factor) and defaults to 1. In other words, we reject the step if the residual norm after the AA step exceeds \(\zeta\) times the previous residual norm (with \(\zeta = 1\), whenever the step increases the residual at all). After rejecting a step we revert the iterate to \(x^k\) and reset the AA state.

Relaxation

In some works relaxation has been shown to improve performance. Relaxation replaces the final step of AA by mixing the map inputs and outputs as follows:

\[x^{k+1} = \beta \sum_{j=0}^{m_k}\alpha_j^k f(x^{k-m_k+j}) + (1-\beta) \sum_{j=0}^{m_k}\alpha_j^k x^{k-m_k+j}\]

where \(\beta\) is the acceleration_relaxation parameter, and \(\beta=1\) recovers vanilla AA. This can be computed using the matrices defined above using

\[x^{k+1} = \beta (f(x^k) - (S_k - Y_k) \gamma^k) + (1-\beta) (x^k - S_k \gamma^k)\]
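The second term can be verified by substituting the relation between \(\alpha^k\) and \(\gamma^k\) and telescoping:

\[\sum_{j=0}^{m_k}\alpha_j^k x^{k-m_k+j} = x^k - \sum_{i=0}^{m_k-1}\gamma_i^k\left(x^{k-m_k+i+1} - x^{k-m_k+i}\right) = x^k - S_k\gamma^k.\]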

Anderson acceleration API

For completeness, we document the full Anderson acceleration API below.

Typedefs

typedef scs_float aa_float
typedef scs_int aa_int
typedef struct ACCEL_WORK AaWork

Functions

AaWork *aa_init(aa_int dim, aa_int mem, aa_int min_len, aa_int type1, aa_float regularization, aa_float relaxation, aa_float safeguard_factor, aa_float max_weight_norm, aa_int ir_max_steps, aa_int verbosity)

Initialize Anderson Acceleration, allocates memory.

Parameters:
  • dim – the dimension of the variable for AA

  • mem – the memory (number of past iterations used) for AA

  • min_len – minimum number of past iterates required before AA begins producing updates. Must be >= 1 when mem > 0; if min_len exceeds the effective memory (min(mem, dim)) it is clamped down, mirroring how mem is clamped to dim for rank stability. Set min_len == mem to delay AA until the memory is full (stable default for large mem); set min_len == 1 to start extrapolating from the very first residual pair (useful when mem is large but you still want early acceleration). Ignored when mem == 0.

  • type1 – if True use type 1 AA, otherwise use type 2

  • regularization – Tikhonov regularization for the AA least-squares system. Three modes, selected by sign:

    • > 0: problem-scaled, r = regularization * ||A||_F ||Y||_F. Type-I: 1e-8 works well; Type-II: more stable, 1e-12 often fine.

    • < 0: pinned absolute, r = -regularization (no Frobenius scaling; useful when the problem scale is known).

    • = 0: no regularization.

    Only non-finite values (NaN/Inf) are rejected.

  • relaxation – float \in [0,2], mixing parameter (1.0 is vanilla)

  • safeguard_factor – factor that controls the safeguarding check; larger is more aggressive but less stable

  • max_weight_norm – float, maximum norm of AA weights

  • ir_max_steps – max iterative refinement passes on the γ solve. 0 disables IR. Each step is O(mem²) and loops until the correction stops contracting, so on well-conditioned problems only one step runs regardless of this cap. Raise it (e.g. 5) for ill-conditioned systems where more digits can be recovered; lower it for tighter cost bounds.

  • verbosity – if greater than 0 prints out various info

Returns:

pointer to AA workspace

aa_float aa_apply(aa_float *f, const aa_float *x, AaWork *a)

Apply Anderson Acceleration. The usage pattern should be as follows:

  • for i = 0 .. N:

    • if (i > 0): aa_apply(x, x_prev, a)

    • x_prev = x.copy()

    • x = F(x)

    • aa_safeguard(x, x_prev, a) (optional but helps stability)

    Here F is the map we are trying to find the fixed point for. We put the AA step before the map so that any properties of the map are maintained at the end. E.g., if the map contains a projection onto a set, then the output is guaranteed to be in the set.

Parameters:
  • f – output of map at current iteration, overwritten with AA output

  • x – input to map at current iteration

  • a – workspace from aa_init

Returns:

(+ or -) norm of the AA weights vector. If positive, the update was accepted and f contains the new point; if negative, the update was rejected and f is unchanged

aa_int aa_safeguard(aa_float *f_new, aa_float *x_new, AaWork *a)

Apply safeguarding.

This step is optional but can improve stability.

Parameters:
  • f_new – output of map after AA step

  • x_new – AA output that is input to the map

  • a – workspace from aa_init

Returns:

0 if the AA step is accepted, otherwise -1. If the AA step is rejected this overwrites f_new and x_new with the previous values

void aa_finish(AaWork *a)

Finish Anderson Acceleration, clears memory.

Parameters:

a – AA workspace from aa_init

void aa_reset(AaWork *a)

Reset Anderson Acceleration.

Resets AA as if at the first iteration, reuses original memory allocations. Does not clear lifetime diagnostic counters; use aa_get_stats after reset to read them, or just re-init the workspace for a clean slate.

Parameters:

a – AA workspace from aa_init

AaStats aa_get_stats(const AaWork *a)

Return lifetime diagnostic counters.

Use for post-hoc diagnosis of why AA is or isn’t accelerating a given fixed-point iteration — e.g. n_reject_weight_cap rising suggests loosening max_weight_norm or raising regularization; n_safeguard_reject rising suggests tuning safeguard_factor or mem. Do not call with a == NULL.

Parameters:

a – AA workspace from aa_init (must be non-NULL).