# Non-identity DR Scaling

In this note we derive the update equations when using a non-identity scaling. Standard Douglas-Rachford splitting applied to the SCS problem is described in the Algorithm page. Now consider modifying DR splitting to use a diagonal matrix \(R\) instead of \(I\). This is useful because the matrix \(R\) can be selected to provide better convergence in practice. The algorithm then becomes:

which yields \(w^k \rightarrow u^\star + R^{-1} \mathcal{Q}(u^\star)\) where \(0 \in \mathcal{Q}(u^\star) + N_{\mathcal{C}_+}(u^\star)\).

This changes the first two steps of the procedure. The linear projection and the cone projection (explained next).

## Cone projection

Note that \(R\) has to be chosen so that the cone projection is preserved. To do this we ensure that the entries of \(R > 0\) corresponding to each cone are constant within that cone (with the box cone being a slight exception to this). This means that R does not affect the cone projection in any way, since for sub-cone \(\mathcal{K}\):

In other words, \(R\) is selected such that

## Selecting \(R\)

For SCS we take

where \(I_n\) is \(n \times n\) identity, \(\rho_x \in \mathbf{R}\),
\(\rho_y \in \mathbf{R}^m\) and \(d \in \mathbf{R}\). The \(\rho_y\)
term includes the effect of the parameter `scale`

, which is updated
heuristically to improve convergence. Basically

with the only difference being that each cone can modify this relationship slightly (but currently only the zero cone does, which sets \(\rho_y = 1 / (1000\ \mathrm{scale})\)). So as scale increases, \(\rho_y\) decreases and vice-versa.

The quantity \(\rho_x\) is determined by the setting value
`rho_x`

. The quantity \(\rho_y\) is determined by the setting value `scale`

but is updated adaptively using the techniques
described in Dynamic scale updating. Finally, \(d\)
is determined by `TAU_FACTOR`

in the code defined in `glbopts.h`

.

## Root-plus function

Finally, the `root_plus`

function is modified to be the solution
of the following quadratic equation:

where \(R_{-1}\) corresponds to the first \(n+m\) entries of \(R\).
Other than when computing \(\kappa\) (which does not affect the algorithm)
this is the *only* place where \(d\) appears, so we have a lot of
flexibility in how to choose it and it can even change from iteration to
iteration. It is an open question on how best to select this parameter.

## Moreau decomposition

The projection onto a cone under a diagonal scaling also satisfies a Moreau-style decomposition identity, as follows:

where \(\Pi_\mathcal{C}^R\) denotes the projection onto convex cone \(\mathcal{C}\) under the \(R\)-norm, which is defined as

This identity is useful when deriving cone projection routines, though most cones are either invariant to this or we enforce that \(R\) is constant across them. Note that the two components of the decomposition are \(R\)-orthogonal.

## Dual vector

In order to get the dual vector \(v^k\) that contains \(s^k\) and \(\kappa^k\) we use:

and we have

by Moreau, and finally note that \(v^k \perp u^k\) from the fact that the Moreau decomposition is \(R\)-orthogonal.

## Dynamic scale updating

The choice of the `scale`

parameter can have a large impact on the
performance of the algorithm and the optimal choice is highly problem
dependent. SCS can dynamically adjust the `scale`

parameter
on the fly via a heuristic procedure that can substantially improve convergence
in practice. This procedure is enabled by the `adaptive_scale`

setting. The procedure attempts to balance the convergence
rate of the primal residual with the dual residual. Loosely speaking, the
`scale`

parameter will be increased if the primal residual is much larger
than the dual and decreased if the opposite is true.

Specifically, at iteration \(k\) consider the case where \(l\)
iterations have elapsed since the last update of the `scale`

parameter,
and denote by \((x, y, \tau) = u^k\) and \((0, s, \kappa) = v^k\), and
the *relative* residuals as

where by default we use the \(\ell_\infty\) norm for these quantities,
but can be changed using the `SCALE_NORM`

constant in
`include/glbopts.h`

.
Now consider

ie, \(\beta\) corresponds to the geometric mean of the ratio of the relative
residuals across the last \(l\) iterations. If this number is larger than a
constant (eg, 3) or smaller than another constant (eg, 1/3) *and* if sufficient
iterations have passed since the last update (eg, 100, as determined by
`RESCALING_MIN_ITERS`

) then an update of the `scale`

parameter is
triggered:

The presence of the square root is to prevent over-shooting the ‘optimal’ scale parameter, which could lead to oscillation.

Note that if the linear system is being solved using a
direct method, then updating the `scale`

parameter will require a new
factorization of the perturbed matrix, so is somewhat expensive for larger
problems and should be done sparingly. Also, since the changing the
`scale`

changes the operator we are using in DR splitting we also need to
perform a reset of the Anderson acceleration.