In the comments to my post Does Sketching Work?, I was asked a very interesting question, which I will paraphrase as follows:
Is the sketch-and-solve least-squares solution $\hat{x} = (SA)^\dagger (Sb)$ an unbiased estimate of the true least-squares solution $A^\dagger b$ of $\min_{x\in\real^d} \norm{Ax - b}$?
In this post, I will answer this question for the special case in which $S$ is a Gaussian sketch, and I will also compute the expected least-squares residual $\expect\norm{A\hat{x} - b}^2$. Throughout this post, I will assume knowledge of sketching; see my previous two posts for a refresher if needed. For this post, $A$ will have dimensions $n\times d$ and $S$ will have dimensions $\ell\times n$, and these parameters are related by $d \le \ell \le n$.
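For concreteness, here is a minimal NumPy sketch of the sketch-and-solve method under this setup; the dimensions, random seed, and use of np.linalg.lstsq are illustrative choices of mine, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, ell = 1000, 20, 200                  # illustrative dimensions with d <= ell <= n

A = rng.standard_normal((n, d))            # tall data matrix, full column-rank with probability one
b = rng.standard_normal(n)                 # right-hand side

# Gaussian sketch with iid Normal(0, 1/ell) entries; any fixed positive variance
# yields the same sketch-and-solve solution, since rescaling S does not change it.
S = rng.standard_normal((ell, n)) / np.sqrt(ell)

x_true = np.linalg.lstsq(A, b, rcond=None)[0]         # true least-squares solution A^† b
x_sk = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]   # sketch-and-solve solution (SA)^† (Sb)

print(np.linalg.norm(x_sk - x_true) / np.linalg.norm(x_true))  # relative error of a single draw
```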
I thought I knew the answer to this question—that sketch-and-solve suffers from inversion bias. Indeed, the fact I remembered is that
(1)   \[\expect\Big[\big((SA)^\top (SA)\big)^{-1}\Big] \ne \big(A^\top A\big)^{-1}\]

for the sketching matrices $S$ in common use. The issue of the inversion bias (1) is discussed in this paper.
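As a quick, hand-rolled illustration of (1) (dimensions and trial count are arbitrary choices of mine), here is a Monte Carlo experiment comparing the average of $\big((SA)^\top (SA)\big)^{-1}$ over many Gaussian sketches with variance-$1/\ell$ entries to $(A^\top A)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, ell, trials = 300, 5, 30, 20_000

A = rng.standard_normal((n, d))
target = np.linalg.inv(A.T @ A)            # (A^T A)^{-1}

avg = np.zeros((d, d))
for _ in range(trials):
    S = rng.standard_normal((ell, n)) / np.sqrt(ell)   # entries with variance 1/ell
    M = S @ A
    avg += np.linalg.inv(M.T @ M)
avg /= trials

# The Monte Carlo average does not converge to (A^T A)^{-1}; for this normalization
# it is inflated by roughly ell / (ell - d - 1), the inverse-Wishart mean factor.
print(np.trace(avg) / np.trace(target))    # close to ell / (ell - d - 1), not 1
print(ell / (ell - d - 1))
```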
However, the inversion bias (1) does not imply that sketch-and-solve is biased. In fact, for a Gaussian sketch

(2)   \[S \in \real^{\ell\times n} \quad \text{with } S_{ij} \sim \text{Normal}(0,1) \text{ iid},\]

sketch-and-solve is unbiased. (Note that, typically, we use a sketch $S$ whose entries have variance $1/\ell$. The sketch-and-solve solution $(SA)^\dagger (Sb)$ does not change under a scaling of the matrix $S$. Thus, we are free to take the entries to have variance one.)
Theorem (Gaussian sketch-and-solve is unbiased). Let $S \in \real^{\ell\times n}$ be a Gaussian sketch (2) and let $A \in \real^{n\times d}$ be a full column-rank matrix. Then, for any $b \in \real^n$,

(3)   \[\expect[(SA)^\dagger(Sb)] = A^\dagger b.\]

That is, the sketch-and-solve solution $(SA)^\dagger (Sb)$ is an unbiased estimate for the least-squares solution $A^\dagger b$.
Here, ${}^\dagger$ is the Moore–Penrose pseudoinverse, which effectuates the least-squares solution:

\[A^\dagger b = \operatorname*{argmin}_{x\in\real^d} \norm{Ax - b}.\]

We will also use the normal-equations formula for the pseudoinverse,

(4)   \[A^\dagger = (A^\top A)^{-1} A^\top,\]

valid for any matrix $A$ with full column-rank.
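Before turning to the proof, here is a small Monte Carlo check of the claim (3); the dimensions, seed, and number of trials below are my own arbitrary choices. Averaging many independent sketch-and-solve solutions should approach $A^\dagger b$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, ell, trials = 300, 5, 30, 20_000

A = rng.standard_normal((n, d))                 # full column-rank with probability one
b = rng.standard_normal(n)
x_true = np.linalg.inv(A.T @ A) @ A.T @ b       # A^† b via the normal-equations formula (4)

avg = np.zeros(d)
for _ in range(trials):
    S = rng.standard_normal((ell, n))           # Gaussian sketch (2): iid Normal(0, 1) entries
    avg += np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]   # sketch-and-solve solution
avg /= trials

# The relative error of the average decays like 1/sqrt(trials), consistent with (3).
print(np.linalg.norm(avg - x_true) / np.linalg.norm(x_true))
```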
Let us prove this theorem. The Gaussian sketch (2) is orthogonally invariant: that is, $S$ has the same distribution as $\Gamma S \Delta$ for any orthogonal matrices $\Gamma \in \real^{\ell\times\ell}$ and $\Delta \in \real^{n\times n}$. Thus, we are free to reparametrize. Let

\[A = U\twobyone{\Sigma}{0}V^\top\]

be a singular value decomposition of $A$, with orthogonal $U \in \real^{n\times n}$ and $V \in \real^{d\times d}$ and invertible diagonal $\Sigma \in \real^{d\times d}$. Make the changes of variables

\[A \mapsto U^\top A V, \quad S \mapsto SU, \quad b \mapsto U^\top b.\]

Under this change of variables, $S$ still has the same distribution (2) and

\[A = \twobyone{\Sigma}{0}.\]

Partition $b$ and $S$ conformally with $A$:

\[b = \twobyone{b_1}{b_2}, \quad S = \onebytwo{S_1}{S_2}.\]

In these new coordinates, the least-squares solution is

(5)   \[A^\dagger b = \Sigma^{-1}b_1.\]
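As a sanity check of (5) (again, an illustrative example of mine with arbitrary dimensions): with $A = \twobyone{\Sigma}{0}$ and $b$ partitioned conformally, the least-squares solution is indeed $\Sigma^{-1}b_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 12, 4
sigma = np.diag(rng.uniform(1.0, 2.0, size=d))   # Sigma: invertible diagonal matrix
A = np.vstack([sigma, np.zeros((n - d, d))])     # A = [Sigma; 0] in the new coordinates
b = rng.standard_normal(n)
b1 = b[:d]                                       # conformal partition of b

lhs = np.linalg.pinv(A) @ b                      # A^† b
rhs = np.linalg.solve(sigma, b1)                 # Sigma^{-1} b_1, formula (5)
print(np.allclose(lhs, rhs))                     # True
```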
Now, we are ready to prove the claim (3). Observe that $SA = S_1\Sigma$ and $Sb = S_1b_1 + S_2b_2$. Begin by using the normal equations (4) to write out the sketch-and-solve solution:

\[(SA)^\dagger (Sb) = [(SA)^\top (SA)]^{-1}(SA)^\top (Sb).\]

Substituting the expressions for $SA$ and $Sb$, we obtain

\[(SA)^\dagger (Sb) = [\Sigma S_1^\top S_1\Sigma]^{-1}\Sigma S_1^\top \onebytwo{S_1}{S_2}\twobyone{b_1}{b_2}.\]

Simplifying, using the invertibility of $\Sigma$,

\[(SA)^\dagger (Sb) = \Sigma^{-1}(S_1^\top S_1)^{-1} S_1^\top (S_1b_1 + S_2 b_2).\]

Finally, applying the normal-equations formula (4) to $S_1$ (that is, $S_1^\dagger = (S_1^\top S_1)^{-1}S_1^\top$) and invoking (5), we arrive at the solution formula

(6)   \[(SA)^\dagger (Sb) = \Sigma^{-1}b_1 + \Sigma^{-1} S_1^\dagger S_2b_2 = A^\dagger b + \Sigma^{-1} S_1^\dagger S_2b_2. \]

Since the entries of $S$ are independent and mean-zero, $S_1$ and $S_2$ are independent and $\expect[S_2] = 0$. Thus, taking expectations, we obtain

\[\expect[(SA)^\dagger (Sb)] = A^\dagger b + \Sigma^{-1} \expect[S_1^\dagger] \expect[S_2]b_2 = A^\dagger b.\]

This proves the claim (3).
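The solution formula (6) is easy to verify numerically for a single draw of $S$; here is a small check in the reparametrized coordinates (dimensions and seed are arbitrary illustrative choices of mine).

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, ell = 12, 4, 8
sigma = np.diag(rng.uniform(1.0, 2.0, size=d))   # Sigma: invertible diagonal matrix
A = np.vstack([sigma, np.zeros((n - d, d))])     # A = [Sigma; 0]
b = rng.standard_normal(n)
b1, b2 = b[:d], b[d:]                            # conformal partition of b

S = rng.standard_normal((ell, n))                # Gaussian sketch (2)
S1, S2 = S[:, :d], S[:, d:]                      # conformal partition of S

lhs = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]                 # (SA)^† (Sb)
rhs = np.linalg.solve(sigma, b1 + np.linalg.pinv(S1) @ (S2 @ b2))  # Sigma^{-1}(b_1 + S_1^† S_2 b_2)
print(np.allclose(lhs, rhs))                     # True: matches formula (6)
```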
Note that the solution formula (6) holds for any sketching matrix $S$. We only use Gaussianity at the last step, where we invoke three properties: (1) $S$ is mean-zero; (2) $S$ is orthogonally invariant; and (3) conditional on the first $d$ columns of $S$, the last $n-d$ columns of $S$ are mean-zero.
The Least-Squares Residual for Gaussian Sketch-and-Solve
The solution formula (6) can be used to understand other properties of the Gaussian sketch-and-solve solution. In this section, we use (6) to understand the expected squared residual norm

\[\expect \norm{A\hat{x} - b}^2,\]

where $\hat{x} = (SA)^\dagger (Sb)$ is the sketch-and-solve solution. We write the expected residual norm as

\[\expect\norm{A\hat{x} - b}^2 = \expect\norm{\twobyone{b_1 + S_1^\dagger S_2b_2}{0} - \twobyone{b_1}{b_2}}^2 = \norm{b_2}^2 + \expect\norm{S_1^\dagger S_2b_2}^2.\]

Since $S_2b_2$ is a mean-zero Gaussian vector with covariance $\norm{b_2}^2 I$, independent of $S_1$, we have $\expect\norm{S_1^\dagger S_2b_2}^2 = \norm{b_2}^2 \expect\norm{S_1^\dagger}_{\rm F}^2 = \norm{b_2}^2 \expect\operatorname{tr}\big[(S_1^\top S_1)^{-1}\big] = \frac{d}{\ell-d-1}\norm{b_2}^2$, where the last equality is the mean of the inverse-Wishart distribution (valid for $\ell \ge d+2$). Therefore,

\[\expect\norm{A\hat{x} - b}^2 = \left(1+\frac{d}{\ell-d-1}\right)\norm{b_2}^2.\]

The quantity $\norm{b_2}$ is the optimal least-squares residual $\norm{b_2} = \min_{x\in\real^d} \norm{Ax - b}$. Thus, we have shown

\[\expect\norm{A\hat{x} - b}^2 = \left(1+\frac{d}{\ell-d-1}\right)\min_{x\in\real^d} \norm{Ax-b}^2.\]

The expected squared residual of the sketch-and-solve solution is a factor $1 + d/(\ell-d-1)$ larger than the optimal value. In particular,

\[\expect\norm{A\hat{x} - b}^2 = \left(1+\varepsilon\right)\min_{x\in\real^d} \norm{Ax-b}^2 \quad \text{when } \ell = \frac{d}{\varepsilon} + d + 1.\]
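A Monte Carlo check of this expected-residual formula (with my own arbitrary dimensions and trial count): the average of $\norm{A\hat{x} - b}^2$ over many sketches should match $1 + d/(\ell-d-1)$ times the optimal squared residual.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, ell, trials = 300, 5, 30, 20_000

A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

x_true = np.linalg.lstsq(A, b, rcond=None)[0]
opt_sq = np.linalg.norm(A @ x_true - b) ** 2        # optimal squared residual

avg_sq = 0.0
for _ in range(trials):
    S = rng.standard_normal((ell, n))               # Gaussian sketch (2)
    x_hat = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
    avg_sq += np.linalg.norm(A @ x_hat - b) ** 2
avg_sq /= trials

print(avg_sq / opt_sq)                              # Monte Carlo estimate of the blow-up factor
print(1 + d / (ell - d - 1))                        # predicted factor 1 + d/(ell - d - 1)
```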
Remark (Existing literature): When I originally wrote this post, I had not seen these results anywhere in the literature. Thanks to Michał Dereziński, who provided me with the following references: the following lecture notes of Mert Pilanci, equation 1 in this recent paper of Michał’s, and this recent survey by Michał and Michael Mahoney. I find it striking that these basic and beautiful results appear to only have been explicitly recorded very recently. Very big thanks also to Rob Webber for providing help in preparing this post.
Update (11/21/24): After this post was initially released, Mert Pilanci provided me with an earlier reference (2020); see also this paper for extensions. He also mentions an even earlier paper (2017) that does similar calculations in a statistical setting, but does not obtain exactly these formulas. See also this paper for exact and asymptotically exact formulas for sketch-and-solve in the statistical setting.