Constructing a weight scheme

CSE 599, Autumn 2023

Recall the definitions of approximate weights and weight schemes from the previous lecture.

Theorem 1: Suppose that $\varphi_1,\ldots,\varphi_m : \mathbb R\to \mathbb R_+$ are lower $\theta$-homogeneous and upper $u$-homogeneous with $u > \theta> 0$ and uniform constants $c, C > 0$. Then there is some $\alpha = \alpha(\theta,c,u,C) > 1$ such that for every choice of vectors $a_1,\ldots,a_m \in \mathbb R^n$ and $s > 0$, there is an $\alpha$-approximate weight at scale $s$.

This can be proved by considering critical points of the functional $U \mapsto \det(U)$ subject to the constraint $G(U) \leqslant s$, where

$$G(U) \mathrel{\mathop:}=\varphi_1(\|U a_1\|_2) + \cdots + \varphi_m(\|U a_m\|_2)\,,$$

analogous to how we showed the existence of Lewis weights.

Theorem 2: Suppose that $\varphi_1,\ldots,\varphi_m : \mathbb R\to \mathbb R_+$ are lower $\theta$-homogeneous and upper $u$-homogeneous with $4 > u > \theta> 0$ and uniform constants $c, C > 0$. Then for every choice of vectors $a_1,\ldots,a_m \in \mathbb R^n$, there is an $\alpha$-approximate weight scheme $\{ w_i^{(j)} : j \in \mathbb Z\}$.

Contractive algorithm

For a full-rank matrix $V$ with rows $v_1,\ldots,v_m \in \mathbb R^n$, define the $i$th leverage score of $V$ as $\label{eq:ls-def} \sigma_i(V) \mathrel{\mathop:}=\langle v_i, (V^{\top} V)^{-1} v_i\rangle\,.$

For a weight $w \in \mathbb R_{+}^m$ and $i \in \{1,\ldots,m\}$, define $\tau_i(w) \mathrel{\mathop:}=\frac{\sigma_i(W^{1/2} A)}{w_i} = \langle a_i, (A^{\top} W A)^{-1} a_i\rangle\,, \quad W \mathrel{\mathop:}=\mathrm{diag}(w_1,\ldots,w_m)\,,$ and denote $\tau(w) \mathrel{\mathop:}=(\tau_1(w),\ldots,\tau_m(w))$.

Fix a scale parameter $s > 0$ and define the iteration $\Lambda_s : \mathbb R_+^m \to \mathbb R_+^m$ by

$$\begin{equation}\label{eq:phi-iter} (\Lambda_s(w))_i \mathrel{\mathop:}=\frac{1}{s} \frac{\varphi_i(\sqrt{\tau_i(w)})}{\tau_i(w)}\,.\end{equation}$$

Write $\theta^k \mathrel{\mathop:}=\theta \circ \cdots \circ \theta$ for the $k$-fold composition of $\theta$. In this case where $\varphi_i(z)=|z|^p$ and $1 \leqslant p \leqslant 2$, it is known, for $s = 1$, starting from any $w_0 \in \mathbb R_{+}^m$, the sequence $\{\Lambda_1^k(w_0) : k \geqslant 1\}$ converges to the unique fixed point of $\varphi$, which are the corresponding $\ell_p$ Lewis weights.

Define now a metric $d$ on $\mathbb R_+^m$ by

$$\begin{aligned} d(u,w) &\mathrel{\mathop:}=\max \left\{ \left|\log \frac{u_i}{w_i}\right| : i = 1,\ldots,m \right\}.\end{aligned}$$

We note the following characterization.

Fact: A vector $w \in \mathbb R_+^m$ is an $\alpha$-approximate weight at scale $s$ if and only if $d(w,\Lambda_s(w)) \leqslant\log \alpha\,.$

First, we observe that $\tau$ is $1$-Lipschitz on $(\mathbb R_+^m, d)$. In the next proof, $\preceq$ denotes the ordering of two real, symmetric matrices in the Loewner order, i.e., $A \preceq B$ if and only if $B-A$ is positive semi-definite.

Lemma: For any $w,w’ \in \mathbb R_{+}^m$, it holds that $d(\tau(w), \tau(w’)) \leqslant d(w,w’)$.
Proof: Denote $W = \mathrm{diag}(w), W’ = \mathrm{diag}(w’)$, and $\alpha \mathrel{\mathop:}=\exp(d(w,w’))$. Then $\alpha^{-1} W \preceq W’ \preceq \alpha W$, therefore $\alpha^{-1} A^\top W A \preceq A^\top W’ A \preceq \alpha A^\top W A$, and by monotonicity of the matrix inverse in the Loewner order, $\alpha^{-1} (A^\top W A)^{-1} \preceq (A^\top W’ A)^{-1} \preceq \alpha (A^\top W A)^{-1}$. This implies $d(\tau(w),\tau(w’)) \leqslant\log \alpha$, completing the proof.

Proof of Theorem 2

Consider the map $\psi: \mathbb R_+^m \to \mathbb R_+^m$ whose $i$-th coordinate is defined as

$$\psi_i(x) \mathrel{\mathop:}=\frac{\varphi_i(\sqrt{x_i})}{x_i}\,.$$

Our assumptions on lower and upper-homogeneity give, for all $y_i \geqslant x_i$,

$$c \left(\frac{y_i}{x_i}\right)^{\theta/2-1} \leqslant\frac{\varphi_i(\sqrt{y_i})/y_i}{\varphi_i(\sqrt{x_i})/x_i} \leqslant C \left(\frac{y_i}{x_i}\right)^{u/2-1}\,,$$

yielding, for $C_1 \mathrel{\mathop:}=\max\{C, 1/c\}$,

$$\begin{equation}\label{eq:psi-contract} d(\psi(x),\psi(y)) \leqslant\max \left(\left|\frac{\theta}{2}-1\right|, \left|\frac{u}{2}-1\right|\right) d(x,y) + \log(C_1)\,.\end{equation}$$

Fix $s > 0$ and consider the mapping $\varphi: \mathbb R_+^m \to \mathbb R_+^m$ defined in \eqref{eq:phi-iter} Then for $u< 4$ and $\delta \mathrel{\mathop:}=\max \left(\left|\frac{\theta}{2}-1\right|, \left|\frac{u}{2}-1\right|\right) < 1$, in conjunction with \eqref{eq:psi-contract}, shows that

$$\begin{aligned} d(\Lambda_s(w), \Lambda_s(w')) < \delta\,d(w,w') + \log(C_1)\,. \label{eq:onestep}\end{aligned}$$

Applying this bound inductively, for any weight $w \in \mathbb R_{+}^m$ and $k \geqslant 1$, we have

$$\begin{equation}\label{eq:iter-contract0} d\left(\Lambda_s^{k}(w), \Lambda_s^{k+1}(w)\right) \leqslant\frac{\delta^k d(\Lambda_s(w),w) + \log C_1}{1-\delta}\,,\end{equation}$$

Now define $w^{(0)} \mathrel{\mathop:}=\varphi^{k}_{1}(1,\ldots,1)\,,$ where $k \geqslant 1$ is chosen large enough so that $d(w^{(0)}, \Lambda_{1}(w^{(0)})) \leqslant\frac{2 \log C_1}{1-\delta}$. From , one sees that $w^{(0)}$ is an $\alpha$-approximate weight at scale $1$ for $\alpha = C_1^{2/(1-\delta)}$.

Define inductively, for $j =1,2,\ldots$,

$$\begin{aligned} w^{(j)} &\mathrel{\mathop:}=\Lambda_{2^j}(w^{(j-1)}) \\ w^{(-j)} &\mathrel{\mathop:}=\Lambda_{2^{-j}}(w^{(1-j)})\,.\end{aligned}$$

Note that

$$\begin{aligned} d(\Lambda_{2^j}(w^{(j)}), w^{(j)}) &= d(\Lambda_{2^j}^2(w^{(j-1)}), \Lambda_{2^j}(w^{(j-1)})) \\ &\leqslant\delta d(\Lambda_{2^j}(w^{(j-1)}), w^{(j-1)}) + \log(C_1) \\ &\leqslant\delta d(\Lambda_{2^{j-1}}(w^{(j-1)}), w^{(j-1)}) + \delta \log(2) + \log(C_1)\,,\end{aligned}$$

where the last inequality uses $\Lambda_{2s}(w) = 2 \Lambda_s(w)$ for all $w \in \mathbb R_+^m$.

Therefore, by induction, $d(\Lambda_{2^j}(w^{(j)}), w^{(j)}) \leqslant\frac{2 \log(C_1) + \log 2}{1-\delta}$ for all $j > 0$. To see that the family of weights $\{ w^{(j)} : j \in \mathbb Z\}$ forms a weight scheme, note that

$$d(w^{(j)}, w^{(j-1)}) = d(\Lambda_{2^j}(w^{(j-1)}), w^{(j-1)}) \leqslant d(\Lambda_{2^{j}}(w^{(j-1)}), w^{(j-1)}) + \log 2\,,$$

thus $\{w^{(j)} : j \in \mathbb Z\}$ is an $\alpha$-approximate weight scheme for $\alpha = \frac{2 \log(2C_1)}{1-\delta}$, completing the proof.