{"id":648,"date":"2022-07-26T19:42:47","date_gmt":"2022-07-26T19:42:47","guid":{"rendered":"http:\/\/www.ethanepperly.com\/?p=648"},"modified":"2024-01-28T02:54:15","modified_gmt":"2024-01-28T02:54:15","slug":"dont-solve-the-normal-equations","status":"publish","type":"post","link":"https:\/\/www.ethanepperly.com\/index.php\/2022\/07\/26\/dont-solve-the-normal-equations\/","title":{"rendered":"Don&#8217;t Solve the Normal Equations"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<p>The (ordinary) <a href=\"https:\/\/en.wikipedia.org\/wiki\/Least_squares\">linear least squares problem<\/a> is as follows: given an <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-7c060ad3084ad1de4a3ce67946c1e82a_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#109;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#110;\" title=\"Rendered by QuickLaTeX.com\" height=\"9\" width=\"48\" style=\"vertical-align: 0px;\"\/> matrix <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-11f4f587954b361e7d78940f65b8d70d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"13\" style=\"vertical-align: 0px;\"\/> and a vector <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-3ab6f6c64ce17f786a81f8cc8fdcec90_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#98;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: 0px;\"\/> of length <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-6d890b0f5af56bc2bde465a9af2fd218_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#109;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"15\" style=\"vertical-align: 0px;\"\/>, find the 
vector <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b312d649591164b7149ed0756f694a76_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> such that <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-3da95081ac78cee0692e52bb881df16c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"23\" style=\"vertical-align: 0px;\"\/> is as close to <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-3ab6f6c64ce17f786a81f8cc8fdcec90_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#98;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: 0px;\"\/> as possible, when measured using the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Norm_(mathematics)#Euclidean_norm\">two-norm<\/a> <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-ad23e0537ad74792484596058eecbc39_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#124;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#124;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"27\" style=\"vertical-align: -5px;\"\/>. 
That is, we seek to <\/p>\n\n\n<p><p class=\"ql-center-displayed-equation\" style=\"line-height: 68px;\"><span class=\"ql-right-eqno\"> (1) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-52e584016399c0e0738dc1aea918936a_l3.png\" height=\"68\" width=\"559\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125; &#92;&#109;&#98;&#111;&#120;&#123;&#102;&#105;&#110;&#100;&#32;&#125;&#32;&#120;&#32;&#92;&#105;&#110;&#92;&#109;&#97;&#116;&#104;&#98;&#98;&#123;&#82;&#125;&#94;&#110;&#32;&#92;&#109;&#98;&#111;&#120;&#123;&#32;&#115;&#117;&#99;&#104;&#32;&#116;&#104;&#97;&#116;&#32;&#125;&#92;&#124;&#32;&#98;&#32;&#45;&#32;&#65;&#120;&#32;&#92;&#124;&#94;&#50;&#32;&#61;&#32;&#92;&#115;&#117;&#109;&#95;&#123;&#105;&#61;&#49;&#125;&#94;&#109;&#32;&#92;&#108;&#101;&#102;&#116;&#40;&#98;&#95;&#105;&#32;&#45;&#32;&#92;&#115;&#117;&#109;&#95;&#123;&#106;&#61;&#49;&#125;&#94;&#110;&#32;&#65;&#95;&#123;&#105;&#106;&#125;&#32;&#120;&#95;&#106;&#32;&#92;&#114;&#105;&#103;&#104;&#116;&#41;&#94;&#50;&#32;&#92;&#109;&#98;&#111;&#120;&#123;&#32;&#105;&#115;&#32;&#109;&#105;&#110;&#105;&#109;&#105;&#122;&#101;&#100;&#125;&#46; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p><\/p>\n\n\n<p>From this equation, the name &#8220;least squares&#8221; is self-explanatory: we seek <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b312d649591164b7149ed0756f694a76_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> which minimizes the sum of the squared discrepancies between the entries of <img 
loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-3ab6f6c64ce17f786a81f8cc8fdcec90_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#98;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: 0px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-3da95081ac78cee0692e52bb881df16c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"23\" style=\"vertical-align: 0px;\"\/>.<\/p>\n\n\n\n<p>The least squares problem is ubiquitous in science, engineering, mathematics, and statistics. If we think of each row <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-4c0a2a967894e2fd9b2da00447811f96_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#97;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"14\" style=\"vertical-align: -3px;\"\/> of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-11f4f587954b361e7d78940f65b8d70d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"13\" style=\"vertical-align: 0px;\"\/> as an input and its corresponding entry <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-1cb360b29aad3a73e5b822eaf397e286_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#98;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"13\" style=\"vertical-align: -3px;\"\/> of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-3ab6f6c64ce17f786a81f8cc8fdcec90_l3.png\" 
class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#98;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"8\" style=\"vertical-align: 0px;\"\/> as an output, then the solution <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b312d649591164b7149ed0756f694a76_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> to the least squares model gives the coefficients of a <em>linear model<\/em> for the input\u2013output relationship. Given a new previously unseen input <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-af25c881d86db96b49ba6f37adc88e00_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#97;&#95;&#123;&#92;&#114;&#109;&#32;&#110;&#101;&#119;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"33\" style=\"vertical-align: -3px;\"\/>, our model predicts the output <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-d7165843caae3ec84f431766fdae6842_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#98;&#95;&#123;&#92;&#114;&#109;&#32;&#110;&#101;&#119;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"32\" style=\"vertical-align: -3px;\"\/> is approximately <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-913cc83a17b67a7f2919a46bd188f110_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" 
alt=\"&#98;&#95;&#123;&#92;&#114;&#109;&#32;&#110;&#101;&#119;&#125;&#32;&#92;&#97;&#112;&#112;&#114;&#111;&#120;&#32;&#97;&#95;&#123;&#92;&#114;&#109;&#32;&#110;&#101;&#119;&#125;&#94;&#92;&#116;&#111;&#112;&#32;&#120;&#32;&#61;&#32;&#92;&#115;&#117;&#109;&#95;&#123;&#105;&#61;&#49;&#125;&#94;&#110;&#32;&#120;&#95;&#105;&#32;&#40;&#97;&#95;&#123;&#92;&#114;&#109;&#32;&#110;&#101;&#119;&#125;&#41;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"237\" style=\"vertical-align: -5px;\"\/>. The vector <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b312d649591164b7149ed0756f694a76_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> consists of coefficients for this linear model. The least squares solution satisfies the property that the average squared difference between the output <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-1cb360b29aad3a73e5b822eaf397e286_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#98;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"13\" style=\"vertical-align: -3px;\"\/> and the prediction <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-840ad08c7b046198c2ac9c7da6cb6a24_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#97;&#95;&#105;&#94;&#92;&#116;&#111;&#112;&#32;&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"31\" style=\"vertical-align: -5px;\"\/> is as small as it could possibly be for all choices of coefficient vectors <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b312d649591164b7149ed0756f694a76_l3.png\" class=\"ql-img-inline-formula 
quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/>.<\/p>\n\n\n\n<p>How do we solve the least squares problem? A classical solution approach, ubiquitous in textbooks, is to solve a system of linear equations known as the <em><a href=\"https:\/\/mathworld.wolfram.com\/NormalEquation.html\">normal equations<\/a><\/em>. The normal equations associated with the least squares problem (1) are given by<\/p>\n\n\n<p><p class=\"ql-center-displayed-equation\" style=\"line-height: 17px;\"><span class=\"ql-right-eqno\"> (2) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-0eac03bc149531897830f6db4e778b9a_l3.png\" height=\"17\" width=\"111\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125; &#65;&#94;&#92;&#116;&#111;&#112;&#32;&#65;&#32;&#92;&#44;&#120;&#32;&#61;&#32;&#65;&#94;&#92;&#116;&#111;&#112;&#32;&#98;&#46; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p><\/p>\n\n\n<p>This system of equations always has a solution. 
If <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-11f4f587954b361e7d78940f65b8d70d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"13\" style=\"vertical-align: 0px;\"\/> has <a href=\"https:\/\/en.wikipedia.org\/wiki\/Rank_(linear_algebra)#Main_definitions\">full column-rank<\/a>, then <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-841e32306afba64e7ef539a005755747_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;&#94;&#92;&#116;&#111;&#112;&#32;&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"38\" style=\"vertical-align: 0px;\"\/> is invertible and the unique least squares solution to (1) is given by <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-8e6f07c32a2f2242e54f19f0a3b5f604_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#40;&#65;&#94;&#92;&#116;&#111;&#112;&#32;&#65;&#41;&#94;&#123;&#45;&#49;&#125;&#32;&#65;&#94;&#92;&#116;&#111;&#112;&#32;&#98;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"102\" style=\"vertical-align: -5px;\"\/>. We assume that <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-11f4f587954b361e7d78940f65b8d70d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"13\" style=\"vertical-align: 0px;\"\/> has full column-rank for the rest of this discussion. 
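As a quick numerical check of this formula, here is a sketch in NumPy (the random test data is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 5
A = rng.standard_normal((m, n))   # random test matrix; full column-rank
b = rng.standard_normal(m)

# Form the normal equations A^T A x = A^T b and hand them to a linear solver.
gram, rhs = A.T @ A, A.T @ b
x = np.linalg.solve(gram, rhs)

# One can also exploit the positive definiteness of A^T A via Cholesky:
L = np.linalg.cholesky(gram)                            # gram = L @ L.T
x_chol = np.linalg.solve(L.T, np.linalg.solve(L, rhs))

# Sanity check against a backward-stable library least squares solver.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x, x_ref) and np.allclose(x_chol, x_ref)
```

For a well-conditioned matrix like this one, all three approaches agree to many digits; the trouble described below only appears as the conditioning degrades.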
To solve the normal equations in software, we compute <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-841e32306afba64e7ef539a005755747_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;&#94;&#92;&#116;&#111;&#112;&#32;&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"38\" style=\"vertical-align: 0px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-48456540bdac6f8640d2a107dc79d98a_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;&#94;&#92;&#116;&#111;&#112;&#32;&#98;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"33\" style=\"vertical-align: 0px;\"\/> and solve (2) using a linear solver like <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/mldivide.html\">MATLAB&#8217;s &#8220;\\&#8221;<\/a>.<sup class=\"modern-footnotes-footnote \" data-mfn=\"1\" data-mfn-post-scope=\"000000000000057f0000000000000000_648\"><a href=\"javascript:void(0)\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000000000057f0000000000000000_648-1\">1<\/a><\/sup><span id=\"mfn-content-000000000000057f0000000000000000_648-1\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"1\">Even better, we could use a <a href=\"https:\/\/wikipedia.org\/wiki\/Cholesky_decomposition\">Cholesky decomposition<\/a> since the matrix <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-841e32306afba64e7ef539a005755747_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;&#94;&#92;&#116;&#111;&#112;&#32;&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"38\" style=\"vertical-align: 0px;\"\/> is <a href=\"https:\/\/wikipedia.org\/wiki\/Positive-definite_matrices\">positive definite<\/a>.<\/span> (As is generally true in 
matrix computations, it is <a href=\"http:\/\/gregorygundersen.com\/blog\/2020\/12\/09\/matrix-inversion\/\">almost never a good idea<\/a> to explicitly form the inverse of the matrix <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-841e32306afba64e7ef539a005755747_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;&#94;&#92;&#116;&#111;&#112;&#32;&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"38\" style=\"vertical-align: 0px;\"\/>, or indeed any matrix.) We can also solve the normal equations using an iterative method like <a href=\"https:\/\/wikipedia.org\/wiki\/Conjugate_gradient_method\">(preconditioned) conjugate gradient<\/a>.<\/p>\n\n\n\n<p>The purpose of this article is to advocate against the use of the normal equations for solving least squares problems, at least in most cases. So what&#8217;s wrong with the normal equations? The problem is not that the normal equations aren&#8217;t mathematically correct. Instead, the problem is that the normal equations often lead to <em>poor accuracy<\/em> for the least squares solution using computer arithmetic.<\/p>\n\n\n\n<p>Most of the time when using computers, we store real numbers as <em><a href=\"https:\/\/wikipedia.org\/wiki\/Floating-point_number\">floating point numbers<\/a><\/em>.<sup class=\"modern-footnotes-footnote \" data-mfn=\"2\" data-mfn-post-scope=\"000000000000057f0000000000000000_648\"><a href=\"javascript:void(0)\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000000000057f0000000000000000_648-2\">2<\/a><\/sup><span id=\"mfn-content-000000000000057f0000000000000000_648-2\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"2\">One can represent rational numbers on a computer as fractions of integers and operations can be done exactly. 
However, this is prone to gross inefficiencies as the number of digits in the rational numbers can grow to be very large, making the storage and time to solve linear algebra problems with rationals dramatically more expensive. For these reasons, the vast majority of numerical computations use floating point numbers which store only a finite number of digits for any given real number. In this model, except for extremely rare circumstances, rounding errors during arithmetic operations are a fact of life.<\/span> At a coarse level, the right model to have in your head is that real numbers on a computer are stored in scientific notation with only 16 decimal digits after the decimal point.<sup class=\"modern-footnotes-footnote \" data-mfn=\"3\" data-mfn-post-scope=\"000000000000057f0000000000000000_648\"><a href=\"javascript:void(0)\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000000000057f0000000000000000_648-3\">3<\/a><\/sup><span id=\"mfn-content-000000000000057f0000000000000000_648-3\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"3\">This is a simplification in multiple ways. First, computers store numbers in binary and thus, rather than storing 16 decimal digits after the decimal point, they store <a href=\"https:\/\/en.wikipedia.org\/wiki\/Double-precision_floating-point_format#IEEE_754_double-precision_binary_floating-point_format:_binary64\">52 binary digits<\/a>. This amounts to roughly 16 decimal digits. Secondly, there are different formats for storing real numbers as floating point on a computer with different amounts of stored digits. The widely used IEEE double precision format has about 16 decimal digits of accuracy; the IEEE single precision format has roughly 8.<\/span> When two numbers are added, subtracted, multiplied, and divided, the answer is computed and then rounded to 16 decimal digits; any extra digits of information are thrown away. 
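This rounding is easy to observe directly in any language that uses IEEE double precision; in Python, for example:

```python
# Each arithmetic operation rounds its result to about 16 significant
# decimal digits, so sums that are exact in real arithmetic pick up
# tiny rounding errors.
print(0.1 + 0.2)         # 0.30000000000000004, not exactly 0.3
print(0.1 + 0.2 == 0.3)  # False

# Digits beyond the stored precision are simply thrown away:
print(1.0 + 1e-20 == 1.0)  # True: the 1e-20 is rounded off entirely
```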
Thus, the result of our arithmetic on a computer is the true answer to the arithmetic problem plus a small <strong>rounding error<\/strong>. These rounding errors are small individually, but solving an even modestly sized linear algebra problem requires thousands of such operations. Making sure many small errors don&#8217;t pile up into a big error is part of the subtle art of numerical computation.<\/p>\n\n\n\n<p>To make a gross simplification, if one solves a system of linear equations <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b13422a1f715d37ede9c837a47d0ca76_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#77;&#120;&#32;&#61;&#32;&#99;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"61\" style=\"vertical-align: 0px;\"\/> on a computer using a well-designed piece of software, one obtains an approximate solution <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-aaab1105f322d56886350f3bdc22c2b8_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#104;&#97;&#116;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"14\" width=\"11\" style=\"vertical-align: 0px;\"\/> which is, after accounting for the accumulation of rounding errors, close to <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b312d649591164b7149ed0756f694a76_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/>. 
But just how close the computed solution <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-aaab1105f322d56886350f3bdc22c2b8_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#104;&#97;&#116;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"14\" width=\"11\" style=\"vertical-align: 0px;\"\/> and the true solution <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b312d649591164b7149ed0756f694a76_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> are depends on how &#8220;nice&#8221; the matrix <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-a038471f70ce2efa1e2bb1ab05e0a7ce_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#77;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"19\" style=\"vertical-align: 0px;\"\/> is. 
The &#8220;niceness&#8221; of a matrix <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-a038471f70ce2efa1e2bb1ab05e0a7ce_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#77;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"19\" style=\"vertical-align: 0px;\"\/> is quantified by a number known as the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Condition_number#Matrices\">condition number<\/a> of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-a038471f70ce2efa1e2bb1ab05e0a7ce_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#77;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"19\" style=\"vertical-align: 0px;\"\/>, which we denote <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-7736411604502832ce97ffb9ce7aca7d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#107;&#97;&#112;&#112;&#97;&#40;&#77;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"42\" style=\"vertical-align: -5px;\"\/>.<sup class=\"modern-footnotes-footnote \" data-mfn=\"4\" data-mfn-post-scope=\"000000000000057f0000000000000000_648\"><a href=\"javascript:void(0)\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000000000057f0000000000000000_648-4\">4<\/a><\/sup><span id=\"mfn-content-000000000000057f0000000000000000_648-4\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"4\">In fact, there are multiple definitions of the condition number depending on the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Norm_(mathematics)\">norm<\/a> which one uses to measure the sizes of vectors. 
Since we use the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Norm_(mathematics)#Euclidean_norm\">2-norm<\/a>, the appropriate 2-norm condition number <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-7736411604502832ce97ffb9ce7aca7d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#107;&#97;&#112;&#112;&#97;&#40;&#77;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"42\" style=\"vertical-align: -5px;\"\/> is the ratio <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-5ea33d020c3a65245c33280c3a807e3c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#107;&#97;&#112;&#112;&#97;&#40;&#77;&#41;&#32;&#61;&#32;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#123;&#92;&#114;&#109;&#32;&#109;&#97;&#120;&#125;&#40;&#77;&#41;&#47;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#123;&#92;&#114;&#109;&#32;&#109;&#105;&#110;&#125;&#40;&#77;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"211\" style=\"vertical-align: -5px;\"\/> of the largest and smallest <a href=\"https:\/\/en.wikipedia.org\/wiki\/Singular_value\">singular values<\/a> of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-a038471f70ce2efa1e2bb1ab05e0a7ce_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#77;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"19\" style=\"vertical-align: 0px;\"\/>.<\/span> As a rough rule of thumb, the relative error between <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b312d649591164b7149ed0756f694a76_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> and <img loading=\"lazy\" 
decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-aaab1105f322d56886350f3bdc22c2b8_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#104;&#97;&#116;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"14\" width=\"11\" style=\"vertical-align: 0px;\"\/> is roughly bounded as <\/p>\n\n\n<p><p class=\"ql-center-displayed-equation\" style=\"line-height: 43px;\"><span class=\"ql-right-eqno\"> (3) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-3669bc90f7d9e49234d99eace8d33543_l3.png\" height=\"43\" width=\"197\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#124;&#32;&#92;&#104;&#97;&#116;&#123;&#120;&#125;&#32;&#45;&#32;&#120;&#32;&#92;&#124;&#125;&#123;&#92;&#124;&#120;&#92;&#124;&#125;&#32;&#92;&#108;&#101;&#115;&#115;&#97;&#112;&#112;&#114;&#111;&#120;&#32;&#92;&#107;&#97;&#112;&#112;&#97;&#40;&#77;&#41;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#49;&#48;&#94;&#123;&#45;&#49;&#54;&#125;&#46; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p><\/p>\n\n\n<p>The &#8220;<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-ee28c813dfeb4180c6047ab0d3395689_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#49;&#48;&#94;&#123;&#45;&#49;&#54;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"42\" style=\"vertical-align: 0px;\"\/>&#8221; corresponds to the fact that we have roughly 16 decimal digits of accuracy in double precision floating point arithmetic. 
Thus, if the condition number of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-a038471f70ce2efa1e2bb1ab05e0a7ce_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#77;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"19\" style=\"vertical-align: 0px;\"\/> is roughly <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-249974e9137aecdf69a8846448a5a474_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#49;&#48;&#94;&#123;&#49;&#48;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"31\" style=\"vertical-align: 0px;\"\/>, then we should expect around <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-51a8e7a102bf4f438a17a8506afb7ea9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#54;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: 0px;\"\/> digits of accuracy in our computed solution.<\/p>\n\n\n\n<p>The accuracy of the least squares problem is governed by its own condition number <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-328258185ad52c376d516393a63cd76b_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#107;&#97;&#112;&#112;&#97;&#40;&#65;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"36\" style=\"vertical-align: -5px;\"\/>. 
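This rule of thumb for linear systems is easy to check numerically. The sketch below (NumPy; the notoriously ill-conditioned Hilbert matrix is used purely as an example) solves a linear system with a known solution and compares the observed error to the condition number:

```python
import numpy as np

n = 10
# Hilbert matrix M[i, j] = 1 / (i + j + 1); condition number around 1e13.
i = np.arange(n)
M = 1.0 / (i[:, None] + i[None, :] + 1)
x_true = np.ones(n)
c = M @ x_true

x_hat = np.linalg.solve(M, c)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
kappa = np.linalg.cond(M)

# Expect rel_err to be roughly bounded by kappa * 1e-16,
# i.e. only a few digits of the answer survive.
print(f'kappa(M) = {kappa:.1e}, relative error = {rel_err:.1e}')
```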
We would hope that we can solve the least squares problem with an accuracy like the rule-of-thumb error bound (3) we had for linear systems of equations, namely a bound like <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-75b955afeaf30b326d0c6fe2cca8aeea_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#124;&#92;&#104;&#97;&#116;&#123;&#120;&#125;&#32;&#45;&#32;&#120;&#92;&#124;&#47;&#92;&#124;&#120;&#92;&#124;&#32;&#92;&#108;&#101;&#115;&#115;&#97;&#112;&#112;&#114;&#111;&#120;&#32;&#92;&#107;&#97;&#112;&#112;&#97;&#40;&#65;&#41;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#49;&#48;&#94;&#123;&#45;&#49;&#54;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"220\" style=\"vertical-align: -5px;\"\/>. But this is not the kind of accuracy we get for the least squares problem when we solve it using the normal equations. Instead, we get accuracy like<\/p>\n\n\n<p><p class=\"ql-center-displayed-equation\" style=\"line-height: 43px;\"><span class=\"ql-right-eqno\"> (4) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-4c71a8f563903ed2cdcce1adb40bf931_l3.png\" height=\"43\" width=\"213\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#124;&#32;&#92;&#104;&#97;&#116;&#123;&#120;&#125;&#32;&#45;&#32;&#120;&#32;&#92;&#124;&#125;&#123;&#92;&#124;&#120;&#92;&#124;&#125;&#32;&#92;&#108;&#101;&#115;&#115;&#97;&#112;&#112;&#114;&#111;&#120;&#32;&#92;&#108;&#101;&#102;&#116;&#40;&#92;&#107;&#97;&#112;&#112;&#97;&#40;&#65;&#41;&#92;&#114;&#105;&#103;&#104;&#116;&#41;&#94;&#50;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#49;&#48;&#94;&#123;&#45;&#49;&#54;&#125;&#46; 
&#92;&#101;&#110;&#100;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p><\/p>\n\n\n<p>By solving the normal equations we effectively square the condition number! Perhaps this is not surprising, as the normal equations also more-or-less square the matrix <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-11f4f587954b361e7d78940f65b8d70d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"13\" style=\"vertical-align: 0px;\"\/> by computing <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-841e32306afba64e7ef539a005755747_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;&#94;&#92;&#116;&#111;&#112;&#32;&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"38\" style=\"vertical-align: 0px;\"\/>. This squared condition number drastically affects the accuracy of the computed solution. 
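<\/p>\n\n\n\n<p>This squaring is easy to observe numerically. Here is a small NumPy sketch (the Vandermonde matrix below is just an arbitrary ill-conditioned example) comparing the condition number of a matrix with that of its Gram matrix:<\/p>\n\n\n

```python
import numpy as np

# An arbitrary ill-conditioned example: a tall Vandermonde matrix whose
# columns are the monomials 1, t, t^2, ..., t^7 sampled on [0, 1]
m, n = 100, 8
t = np.linspace(0, 1, m)
A = np.vander(t, n, increasing=True)

cond_A = np.linalg.cond(A)
cond_gram = np.linalg.cond(A.T @ A)

print(f"cond(A)     = {cond_A:.2e}")
print(f"cond(A^T A) = {cond_gram:.2e}")  # roughly cond(A) squared
```

\n\n\n<p>(This comparison is only meaningful while the squared condition number stays well below about 10<sup>16<\/sup>; beyond that, the computed Gram matrix is numerically singular and its condition number can no longer be evaluated reliably.)<\/p>\n\n\n\n<p>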
If the condition number of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-11f4f587954b361e7d78940f65b8d70d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"13\" style=\"vertical-align: 0px;\"\/> is <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-31d20e6685260292523ad3f16b8741b2_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#49;&#48;&#94;&#123;&#56;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"24\" style=\"vertical-align: 0px;\"\/>, then the normal equations give us absolute nonsense for <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-aaab1105f322d56886350f3bdc22c2b8_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#104;&#97;&#116;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"14\" width=\"11\" style=\"vertical-align: 0px;\"\/>; we expect to get no digits of the answer <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b312d649591164b7149ed0756f694a76_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> correct. 
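<\/p>\n\n\n\n<p>A small experiment makes this concrete. In the sketch below (the Vandermonde matrix is again just a convenient ill-conditioned example, and the right-hand side is constructed so the exact solution is known), the normal equations lose roughly twice as many digits as a backward-stable solver:<\/p>\n\n\n

```python
import numpy as np

# An arbitrary ill-conditioned example: Vandermonde matrix with
# columns 1, t, ..., t^9 sampled on [0, 1]
m, n = 100, 10
t = np.linspace(0, 1, m)
A = np.vander(t, n, increasing=True)

x_true = np.ones(n)      # known solution
b = A @ x_true           # consistent right-hand side (zero residual)

# Method 1: solve the normal equations A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Method 2: a backward-stable least squares solver
# (np.linalg.lstsq uses an SVD; a Householder QR solve behaves similarly)
x_stable = np.linalg.lstsq(A, b, rcond=None)[0]

err_normal = np.linalg.norm(x_normal - x_true) / np.linalg.norm(x_true)
err_stable = np.linalg.norm(x_stable - x_true) / np.linalg.norm(x_true)
print(f"normal equations error: {err_normal:.1e}")  # many digits lost
print(f"stable solver error:    {err_stable:.1e}")  # near full accuracy
```

\n\n\n<p>On a typical run the stable solver recovers the solution to roughly the accuracy predicted by the condition number of the matrix, while the normal equations lose about twice as many digits, as the squared condition number predicts.<\/p>\n\n\n\n<p>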
Contrast this to above, where we were able to get <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-51a8e7a102bf4f438a17a8506afb7ea9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#54;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: 0px;\"\/> correct digits in the solution to <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b13422a1f715d37ede9c837a47d0ca76_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#77;&#120;&#32;&#61;&#32;&#99;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"61\" style=\"vertical-align: 0px;\"\/> despite the condition number of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-a038471f70ce2efa1e2bb1ab05e0a7ce_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#77;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"19\" style=\"vertical-align: 0px;\"\/> being <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-4f068194c694e1bdd25ec7cf0a5b2a0b_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#49;&#48;&#48;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"26\" style=\"vertical-align: 0px;\"\/> times larger than <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-11f4f587954b361e7d78940f65b8d70d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"13\" style=\"vertical-align: 0px;\"\/>!<\/p>\n\n\n\n<p>All of this would be just a sad fact of life for the least squares problem if the normal equations and their poor accuracy properties were the best we could do for the least squares 
problem. But we can do better! One can solve linear least squares problems by computing a so-called <em><a href=\"https:\/\/en.wikipedia.org\/wiki\/QR_decomposition\">QR factorization<\/a><\/em> of the matrix <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-11f4f587954b361e7d78940f65b8d70d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"13\" style=\"vertical-align: 0px;\"\/>.<sup class=\"modern-footnotes-footnote \" data-mfn=\"5\" data-mfn-post-scope=\"000000000000057f0000000000000000_648\"><a href=\"javascript:void(0)\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000000000057f0000000000000000_648-5\">5<\/a><\/sup><span id=\"mfn-content-000000000000057f0000000000000000_648-5\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"5\">In MATLAB, the least squares problem can be solved with QR factorization by calling <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/mldivide.html\">&#8220;A\\b&#8221;<\/a>.<\/span> Without going into details, the upshot is that the least squares solution by a well-designed<sup class=\"modern-footnotes-footnote \" data-mfn=\"6\" data-mfn-post-scope=\"000000000000057f0000000000000000_648\"><a href=\"javascript:void(0)\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000000000057f0000000000000000_648-6\">6<\/a><\/sup><span id=\"mfn-content-000000000000057f0000000000000000_648-6\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"6\">One way of computing the QR factorization is by <a href=\"https:\/\/en.wikipedia.org\/wiki\/QR_decomposition#Using_the_Gram\u2013Schmidt_process\">Gram\u2013Schmidt orthogonalization<\/a>, but the accuracy properties of this are <a href=\"https:\/\/en.wikipedia.org\/wiki\/QR_decomposition#Advantages_and_disadvantages\">poor 
too<\/a>. A gold-standard way of computing the QR factorization is by means of <a href=\"https:\/\/en.wikipedia.org\/wiki\/QR_decomposition#Using_Householder_reflections\">Householder reflectors<\/a>, which has excellent accuracy properties.<\/span> QR factorization can be computed in a similar amount of time to solving the normal equations and has dramatically improved accuracy properties, achieving the desirable rule-of-thumb behavior<sup class=\"modern-footnotes-footnote \" data-mfn=\"7\" data-mfn-post-scope=\"000000000000057f0000000000000000_648\"><a href=\"javascript:void(0)\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000000000057f0000000000000000_648-7\">7<\/a><\/sup><span id=\"mfn-content-000000000000057f0000000000000000_648-7\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"7\">More precisely, the rule of thumb is like <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-ecc6aed2d9fb5b7ec3e7feed897025f8_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#124;&#92;&#104;&#97;&#116;&#123;&#120;&#125;&#32;&#45;&#32;&#120;&#92;&#124;&#47;&#92;&#124;&#120;&#92;&#124;&#32;&#92;&#108;&#101;&#115;&#115;&#97;&#112;&#112;&#114;&#111;&#120;&#32;&#92;&#107;&#97;&#112;&#112;&#97;&#40;&#65;&#41;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#49;&#48;&#94;&#123;&#45;&#49;&#54;&#125;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#40;&#49;&#43;&#32;&#92;&#107;&#97;&#112;&#112;&#97;&#40;&#65;&#41;&#92;&#124;&#32;&#98;&#32;&#45;&#32;&#65;&#120;&#32;&#92;&#124;&#47;&#40;&#92;&#124;&#65;&#92;&#124;&#92;&#124;&#98;&#92;&#124;&#41;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"473\" style=\"vertical-align: -5px;\"\/>. 
So even if we solve the least squares problem with QR factorization, we still get a squared condition number in our error bound, but <em>this condition number squared is multiplied by the residual <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-7b120ffa3c820d3c087da99d6b29f4d7_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#124;&#98;&#45;&#65;&#120;&#92;&#124;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"66\" style=\"vertical-align: -5px;\"\/>, which is small if the least squares fit is good.<\/em> The least squares solution is usually only interesting when the residual is small, thus justifying dropping it in the rule of thumb.<\/span>\n\n\n<p><p class=\"ql-center-displayed-equation\" style=\"line-height: 43px;\"><span class=\"ql-right-eqno\"> (5) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-35f0018fb24c33176796cfa8d673bba0_l3.png\" height=\"43\" width=\"191\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#124;&#32;&#92;&#104;&#97;&#116;&#123;&#120;&#125;&#32;&#45;&#32;&#120;&#32;&#92;&#124;&#125;&#123;&#92;&#124;&#120;&#92;&#124;&#125;&#32;&#92;&#108;&#101;&#115;&#115;&#97;&#112;&#112;&#114;&#111;&#120;&#32;&#92;&#107;&#97;&#112;&#112;&#97;&#40;&#65;&#41;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#49;&#48;&#94;&#123;&#45;&#49;&#54;&#125;&#46; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p><\/p>\n<p>I have not described how the QR factorization is accurately computed nor how to use the QR factorization to solve least squares problems nor even what the QR factorization is. 
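<\/p>\n\n\n\n<p>Still, the shape of the computation is easy to sketch. If A = QR, where Q has orthonormal columns and R is upper triangular, then the least squares solution satisfies Rx = Q<sup>T<\/sup>b. In NumPy (whose qr routine calls LAPACK&#8217;s Householder-based factorization), a minimal version looks like:<\/p>\n\n\n

```python
import numpy as np

def lstsq_via_qr(A, b):
    """Solve min ||b - A x|| using a QR factorization of A."""
    # np.linalg.qr computes a Householder QR via LAPACK
    Q, R = np.linalg.qr(A, mode="reduced")  # A = Q R, Q^T Q = I
    # ||b - QRx||^2 splits into ||Q^T b - R x||^2 plus terms independent
    # of x, so it suffices to solve the triangular system R x = Q^T b
    return np.linalg.solve(R, Q.T @ b)
```

\n\n\n<p>(Here np.linalg.solve ignores the triangular structure of R; a production code would use a dedicated triangular solve, but the accuracy behavior is the same.)<\/p>\n\n\n\n<p>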
All of these topics are explained excellently by <a href=\"https:\/\/my.siam.org\/Store\/Product\/viewproduct\/?ProductId=42813031\">the<\/a> <a href=\"https:\/\/epubs.siam.org\/doi\/book\/10.1137\/1.9781611971446\">standard<\/a> <a href=\"https:\/\/books.google.com\/books\/about\/Matrix_Computations.html?id=X5YfsuCWpxMC\">textbooks<\/a> in this area, as well as by publicly available resources like <a href=\"https:\/\/en.wikipedia.org\/wiki\/QR_decomposition\">Wikipedia<\/a>. There&#8217;s much more that can be said about the many benefits of solving the least squares problem with the QR factorization,<sup class=\"modern-footnotes-footnote \" data-mfn=\"8\" data-mfn-post-scope=\"000000000000057f0000000000000000_648\"><a href=\"javascript:void(0)\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000000000057f0000000000000000_648-8\">8<\/a><\/sup><span id=\"mfn-content-000000000000057f0000000000000000_648-8\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"8\">E.g., it can work for sparse matrices while the normal equations often do not, it has superior accuracy to Gaussian elimination with partial pivoting even for solving linear systems, the &#8220;<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-e7f252699e1616398a7ce5179d3e425e_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#81;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"14\" style=\"vertical-align: -4px;\"\/>&#8221; matrix in the QR factorization can be represented implicitly as a product of easy-to-compute-with Householder reflectors, which is much more efficient when <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-dac207a0e6de32402419896a91fbc885_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#109;&#92;&#103;&#103;&#32;&#110;\" title=\"Rendered by 
QuickLaTeX.com\" height=\"11\" width=\"54\" style=\"vertical-align: -1px;\"\/>, etc.<\/span> but in the interest of brevity let me just say this: <em>TL;DR when presented in the wild with a least squares problem, the solution method one should default to is one based on a well-implemented QR factorization, not solving the normal equations.<\/em><\/p>\n<p>Suppose for whatever reason we don&#8217;t have a high quality QR factorization algorithm at our disposal. Must we then resort to the normal equations? Even in this case, there is a way we can reduce the problem of solving a least squares problems to a linear system of equations without squaring the condition number! (For those interested, to do this, we recognize the normal equations as a <a href=\"https:\/\/www.ethanepperly.com\/index.php\/2020\/07\/09\/big-ideas-in-applied-math-the-schur-complement\/\">Schur complement<\/a> of a somewhat larger system of linear equations and then solve that. See Eq. (7) in <a href=\"https:\/\/www.ethanepperly.com\/index.php\/2020\/07\/09\/big-ideas-in-applied-math-the-schur-complement\/\">this post<\/a> for more discussion of this approach.)&nbsp;<\/p>\n\n\n<p>The title of this post <em>Don&#8217;t Solve the Normal Equations<\/em> is deliberately overstated. There are times when solving the normal equations is appropriate. If <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-11f4f587954b361e7d78940f65b8d70d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"13\" style=\"vertical-align: 0px;\"\/> is well-conditioned with a small condition number, squaring the condition number might not be that bad. 
If the matrix <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-11f4f587954b361e7d78940f65b8d70d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"13\" style=\"vertical-align: 0px;\"\/> is too large to store in memory, one might want to solve the least squares problem using the normal equations and the conjugate gradient method.<\/p>\n\n\n\n<p>However, the dramatically reduced accuracy of solving the normal equations should disqualify the approach from being the de-facto way of solving least squares problems. Unless you have good reason to think otherwise, when you see <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-841e32306afba64e7ef539a005755747_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;&#94;&#92;&#116;&#111;&#112;&#32;&#65;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"38\" style=\"vertical-align: 0px;\"\/>, solve a different way.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The (ordinary) linear least squares problem is as follows: given an matrix and a vector of length , find the vector such that is as close to as possible, when measured using the two-norm . 
That is, we seek to (1) &nbsp; From this equation, the name &#8220;least squares&#8221; is self-explanatory: we seek which minimizes<a class=\"more-link\" href=\"https:\/\/www.ethanepperly.com\/index.php\/2022\/07\/26\/dont-solve-the-normal-equations\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,13],"tags":[],"class_list":["post-648","post","type-post","status-publish","format-standard","hentry","category-expository","category-numerical-linear-algebra-dos-and-donts"],"_links":{"self":[{"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/posts\/648","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/comments?post=648"}],"version-history":[{"count":35,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/posts\/648\/revisions"}],"predecessor-version":[{"id":1216,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/posts\/648\/revisions\/1216"}],"wp:attachment":[{"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/media?parent=648"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/categories?post=648"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/tags?post=648"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}