{"id":2093,"date":"2025-06-07T01:56:10","date_gmt":"2025-06-07T01:56:10","guid":{"rendered":"https:\/\/www.ethanepperly.com\/?p=2093"},"modified":"2025-06-07T18:02:42","modified_gmt":"2025-06-07T18:02:42","slug":"a-neat-not-randomized-algorithm-the-polar-express","status":"publish","type":"post","link":"https:\/\/www.ethanepperly.com\/index.php\/2025\/06\/07\/a-neat-not-randomized-algorithm-the-polar-express\/","title":{"rendered":"A Neat Not-Randomized Algorithm: Polar Express"},"content":{"rendered":"\n<p>Every once in a while, there&#8217;s a paper that comes out that is so delightful that I can&#8217;t help share it on this blog, and I&#8217;ve started a little series <em><a href=\"https:\/\/www.ethanepperly.com\/index.php\/category\/neat-randomized-algorithms\/\">Neat Randomized Algorithms<\/a><\/em> for exactly this purpose. Today&#8217;s entry into this collection is <em><a href=\"https:\/\/arxiv.org\/abs\/2505.16932\">The Polar Express: Optimal Matrix Sign Methods and their Application to the Muon Algorithm<\/a><\/em> by <a href=\"https:\/\/noahamsel.github.io\">Noah Amsel<\/a>, <a href=\"https:\/\/scholar.google.com\/citations?user=jOtDnRAAAAAJ&amp;hl=sv\">David Persson<\/a>, <a href=\"https:\/\/www.chrismusco.com\">Christopher Musco<\/a>, and <a href=\"https:\/\/gowerrobert.github.io\">Robert M. Gower<\/a>. Despite its authors belonging to the randomized linear algebra ouvr\u00e9, this paper is actually about a plain-old deterministic algorithm. But it&#8217;s just so delightful that I couldn&#8217;t help but share it in this series any way.<\/p>\n\n\n\n<p>The authors of <em>The Polar Express<\/em> are motivated by the recent <a href=\"https:\/\/kellerjordan.github.io\/posts\/muon\/\">Muon algorithm<\/a> for neural network optimization. The basic idea of Muon is that it helps to <em>orthogonalize<\/em> the search directions in a stochastic gradient method. That is, rather than update a weight matrix <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-676e51a51d2f41a64088aeb105e847e7_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#87;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"19\" style=\"vertical-align: 0px;\"\/> with search direction <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-53c2a72f422a8e6f02f2e3ac92c66d78_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: 0px;\"\/> using the update rule <p class=\"ql-center-displayed-equation\" style=\"line-height: 16px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-f4e1c8e417c22f5378f2c084fe8937cd_l3.png\" height=\"16\" width=\"115\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#87;&#32;&#92;&#103;&#101;&#116;&#115;&#32;&#87;&#32;&#45;&#32;&#92;&#101;&#116;&#97;&#32;&#71;&#44;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>instead use the update <p class=\"ql-center-displayed-equation\" style=\"line-height: 19px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-c9c2f2804d2675c9a8a9e6112b43c6f1_l3.png\" height=\"19\" width=\"172\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#87;&#92;&#103;&#101;&#116;&#115;&#32;&#87;&#32;&#45;&#32;&#92;&#101;&#116;&#97;&#32;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#112;&#111;&#108;&#97;&#114;&#125;&#40;&#71;&#41;&#46;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>Here, <p class=\"ql-center-displayed-equation\" style=\"line-height: 32px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-cedee2dc450e00d845f4b3f41de98a5f_l3.png\" height=\"32\" width=\"349\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#112;&#111;&#108;&#97;&#114;&#125;&#40;&#71;&#41;&#32;&#92;&#99;&#111;&#108;&#111;&#110;&#101;&#113;&#113;&#32;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#42;&#123;&#97;&#114;&#103;&#109;&#105;&#110;&#125;&#95;&#123;&#81;&#32;&#92;&#116;&#101;&#120;&#116;&#114;&#109;&#123;&#32;&#119;&#105;&#116;&#104;&#32;&#111;&#114;&#116;&#104;&#111;&#110;&#111;&#114;&#109;&#97;&#108;&#32;&#99;&#111;&#108;&#117;&#109;&#110;&#115;&#125;&#125;&#32;&#92;&#110;&#111;&#114;&#109;&#123;&#71;&#32;&#45;&#32;&#81;&#125;&#95;&#123;&#92;&#114;&#109;&#32;&#70;&#125;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>is the closed matrix with orthonormal columns to <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-53c2a72f422a8e6f02f2e3ac92c66d78_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: 0px;\"\/> and is called the <em>(unitary) <a href=\"https:\/\/en.wikipedia.org\/wiki\/Polar_decomposition\">polar factor<\/a><\/em> of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-53c2a72f422a8e6f02f2e3ac92c66d78_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: 0px;\"\/>. (Throughout this post, <em><mark style=\"background-color:#fff4d7\" class=\"has-inline-color\">we shall assume for simplicity that <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-53c2a72f422a8e6f02f2e3ac92c66d78_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: 0px;\"\/> is tall and full-rank<\/mark><\/em>.) Muon relies on efficient algorithms for rapidly approximating <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-d2d5e499d077154690d212b033031e87_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#112;&#111;&#108;&#97;&#114;&#125;&#40;&#71;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"67\" style=\"vertical-align: -5px;\"\/>.<\/p>\n\n\n\n<p>Given a singular value decomposition <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-dedb2d0d560dd4cddd17f199016ba4ce_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;&#32;&#61;&#32;&#85;&#92;&#83;&#105;&#103;&#109;&#97;&#32;&#86;&#94;&#92;&#116;&#111;&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"89\" style=\"vertical-align: 0px;\"\/>, the polar factor may be computed in closed form as <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-a7a486699f420a01878deb874e01d957_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#112;&#111;&#108;&#97;&#114;&#125;&#40;&#71;&#41;&#32;&#61;&#32;&#85;&#86;&#94;&#92;&#116;&#111;&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"130\" style=\"vertical-align: -5px;\"\/>. But computing the SVD is computationally expensive, particularly in GPU computing environments. Are there more efficient algorithms that avoid the SVD? In particular, can we design algorithms that use only matrix multiplications, for maximum GPU efficiency?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Polar Factor as a Singular Value Transformation<\/h2>\n\n\n\n<p>Computing the polar factor <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-d2d5e499d077154690d212b033031e87_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#112;&#111;&#108;&#97;&#114;&#125;&#40;&#71;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"67\" style=\"vertical-align: -5px;\"\/> of a matrix <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-53c2a72f422a8e6f02f2e3ac92c66d78_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: 0px;\"\/> effectively applies an operation to <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-53c2a72f422a8e6f02f2e3ac92c66d78_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: 0px;\"\/> which replaces all of its singular values by one. Such operations <a href=\"https:\/\/arxiv.org\/abs\/1806.01838\">are studied in quantum computing<\/a>, where they are called singular value transformations.<\/p>\n\n\n\n<p><strong>Definition (singular value transformation):<\/strong> Given an <em><mark style=\"background-color:#fff4d7\" class=\"has-inline-color\">odd<\/mark><\/em> function <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-c23e757cb6bd08fe194e942085387dcd_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"10\" style=\"vertical-align: -4px;\"\/>, the singular value transformation of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-dedb2d0d560dd4cddd17f199016ba4ce_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;&#32;&#61;&#32;&#85;&#92;&#83;&#105;&#103;&#109;&#97;&#32;&#86;&#94;&#92;&#116;&#111;&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"89\" style=\"vertical-align: 0px;\"\/> by <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-c23e757cb6bd08fe194e942085387dcd_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"10\" style=\"vertical-align: -4px;\"\/> is <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-ced13a661f24d5413c3535383f4c910f_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;&#91;&#71;&#93;&#32;&#92;&#99;&#111;&#108;&#111;&#110;&#101;&#113;&#113;&#32;&#85;&#102;&#40;&#92;&#83;&#105;&#103;&#109;&#97;&#41;&#86;&#94;&#92;&#116;&#111;&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"137\" style=\"vertical-align: -5px;\"\/>.<\/p>\n\n\n\n<p>On its face, it might seem like that the polar factor of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-53c2a72f422a8e6f02f2e3ac92c66d78_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: 0px;\"\/> is cannot be obtained as a singular value transformation. After all, the constantly one function <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-5ca7ab1c3731d8b5eb9657df7534827d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;&#40;&#120;&#41;&#61;&#32;&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"66\" style=\"vertical-align: -5px;\"\/> is not odd. But, to obtain the polar factor, we only need a function <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-c23e757cb6bd08fe194e942085387dcd_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"10\" style=\"vertical-align: -4px;\"\/> which sends <em>positive inputs<\/em> to <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-27e3598fd2d4f0491067f1afaced92e9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"7\" style=\"vertical-align: 0px;\"\/>. Thus, the polar decomposition <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-d2d5e499d077154690d212b033031e87_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#112;&#111;&#108;&#97;&#114;&#125;&#40;&#71;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"67\" style=\"vertical-align: -5px;\"\/> is given by the singular value transformation associated with the <em>sign function<\/em>: <p class=\"ql-center-displayed-equation\" style=\"line-height: 75px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-103d0549df7eab731cd55caeaef8f0f2_l3.png\" height=\"75\" width=\"186\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#115;&#105;&#103;&#110;&#125;&#40;&#120;&#41;&#32;&#61;&#32;&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#99;&#97;&#115;&#101;&#115;&#125;&#32;&#49;&#44;&#32;&#38;&#32;&#120;&#32;&#62;&#32;&#48;&#44;&#32;&#92;&#92;&#32;&#48;&#44;&#32;&#38;&#32;&#120;&#32;&#61;&#32;&#48;&#44;&#32;&#92;&#92;&#32;&#45;&#49;&#44;&#32;&#38;&#32;&#120;&#32;&#60;&#32;&#48;&#46;&#32;&#92;&#101;&#110;&#100;&#123;&#99;&#97;&#115;&#101;&#115;&#125;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>The sign function is manifestly odd, and the polar factor satisfies <\/p>\n\n\n\n<p><p class=\"ql-center-displayed-equation\" style=\"line-height: 19px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-94977973d95c7d3483bee0620f2ef5f9_l3.png\" height=\"19\" width=\"149\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#112;&#111;&#108;&#97;&#114;&#125;&#40;&#71;&#41;&#32;&#61;&#32;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#115;&#105;&#103;&#110;&#125;&#91;&#71;&#93;&#46;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Singular Value Transformations and Polynomials<\/h2>\n\n\n\n<p>How might we go about computing the singular value transformation of a matrix? For an (odd) polynomial, this computation can be accomplished using a sequence of matrix multiplications alone. Indeed, for <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-136b6a8fe1276b412243035f4e26301f_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#40;&#120;&#41;&#32;&#61;&#32;&#97;&#95;&#49;&#32;&#120;&#32;&#43;&#32;&#97;&#95;&#51;&#32;&#120;&#94;&#51;&#32;&#43;&#32;&#92;&#99;&#100;&#111;&#116;&#115;&#32;&#43;&#32;&#97;&#95;&#123;&#50;&#107;&#43;&#49;&#125;&#32;&#120;&#94;&#123;&#50;&#107;&#43;&#49;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"289\" style=\"vertical-align: -5px;\"\/>, we have <p class=\"ql-center-displayed-equation\" style=\"line-height: 22px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-3377aab54a08cc6e010a9df4e06d0606_l3.png\" height=\"22\" width=\"380\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#112;&#91;&#71;&#93;&#32;&#61;&#32;&#97;&#95;&#49;&#32;&#71;&#32;&#43;&#32;&#97;&#95;&#51;&#32;&#71;&#40;&#71;&#94;&#92;&#116;&#111;&#112;&#32;&#71;&#41;&#32;&#43;&#32;&#92;&#99;&#100;&#111;&#116;&#115;&#32;&#43;&#32;&#97;&#95;&#123;&#50;&#107;&#43;&#49;&#125;&#32;&#71;&#40;&#71;&#94;&#92;&#116;&#111;&#112;&#32;&#71;&#41;&#94;&#107;&#46;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>For a general (odd) function <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-c23e757cb6bd08fe194e942085387dcd_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"10\" style=\"vertical-align: -4px;\"\/>, we can approximately compute the singular value transformation <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-a72a56b33f34eecf4bf37ad1ee1c78ea_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;&#91;&#71;&#93;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"33\" style=\"vertical-align: -5px;\"\/> by first approximating <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-c23e757cb6bd08fe194e942085387dcd_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"10\" style=\"vertical-align: -4px;\"\/> by a polynomial <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-726e7ac82690902493102f577788aa40_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"10\" style=\"vertical-align: -4px;\"\/>, and then using <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-2f3968e6285f4d4baef540b858d2c50a_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#91;&#71;&#93;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"32\" style=\"vertical-align: -5px;\"\/> as a proxy for <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-a72a56b33f34eecf4bf37ad1ee1c78ea_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;&#91;&#71;&#93;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"33\" style=\"vertical-align: -5px;\"\/>. Here is an example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&gt;&gt; G = randn(2)                            <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">% Random test matrix<\/mark>\nG =\n   0.979389080992349  -0.198317114406418\n  -0.252310961830649  -1.242378171072736\n&gt;&gt; &#91;U,S,V] = svd(G);\n&gt;&gt; fG = U*sin(S)*V'                        <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">% Singular value transformation<\/mark>\nfG =\n   0.824317193982434  -0.167053523352195\n  -0.189850719961322  -0.935356030417109\n&gt;&gt; pG = G - (G*G'*G)\/6 + (G*G'*G*G'*G)\/120 <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">% Polynomial approximation<\/mark>\npG =\n   0.824508188218982  -0.167091255945116\n  -0.190054681059327  -0.936356677704568<\/code><\/pre>\n\n\n\n<p>We see that we get reasonably high accuracy by approximating <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-7d43149d58358ce19f9f50e096f525e2_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#115;&#105;&#110;&#91;&#71;&#93;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"44\" style=\"vertical-align: -5px;\"\/> using its degree-three Taylor polynomial.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Power of Composition<\/h2>\n\n\n\n<p>The most basic approach to computing the sign function would be to use a fixed polynomial of degree <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-4e3e465ecf16fee2ea1d3a606c75894c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#50;&#107;&#43;&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"14\" width=\"48\" style=\"vertical-align: -2px;\"\/>. However, this approach converges fairly slowly as we increase the degree <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-f65286b751f121928913d4aa91d94ee9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#107;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: 0px;\"\/>.<\/p>\n\n\n\n<p>A better strategy is to use compositions. A nice feature of the sign function is the <em>fixed point property:<\/em> For every <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b312d649591164b7149ed0756f694a76_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/>, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-85906bac4fc2c7dc1098ec9deb55c5a3_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#115;&#105;&#103;&#110;&#125;&#40;&#120;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"54\" style=\"vertical-align: -5px;\"\/> is a fixed point of the <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-bb8186340570b2b80f548d3a8ea66f6d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#115;&#105;&#103;&#110;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"31\" style=\"vertical-align: -4px;\"\/> function: <p class=\"ql-center-displayed-equation\" style=\"line-height: 19px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-fa3bb06693b0d6b4742c882a7a2f4b7c_l3.png\" height=\"19\" width=\"295\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#115;&#105;&#103;&#110;&#125;&#40;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#115;&#105;&#103;&#110;&#125;&#40;&#120;&#41;&#41;&#32;&#61;&#32;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#115;&#105;&#103;&#110;&#125;&#40;&#120;&#41;&#32;&#92;&#113;&#117;&#97;&#100;&#32;&#92;&#116;&#101;&#120;&#116;&#123;&#102;&#111;&#114;&#32;&#97;&#108;&#108;&#32;&#125;&#32;&#120;&#32;&#92;&#105;&#110;&#32;&#92;&#114;&#101;&#97;&#108;&#46;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>The fixed point strategy suggests an alternate strategy for computing the sign function using polynomials. Rather than using one polynomial of large degree, we can instead compose many polynomials of low degree. The simplest such compositional algorithm is the <em>Newton\u2013Schulz iteration<\/em>, which consists of initializing <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-f9d242dac49359e4d3e61c12159686f4_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#80;&#92;&#103;&#101;&#116;&#115;&#32;&#71;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"55\" style=\"vertical-align: -1px;\"\/> applying the following fixed point equation until convergence: <p class=\"ql-center-displayed-equation\" style=\"line-height: 36px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-88631b351c7b5aafba34c5d29a5c4921_l3.png\" height=\"36\" width=\"157\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#80;&#32;&#92;&#103;&#101;&#116;&#115;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#32;&#80;&#32;&#45;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#50;&#125;&#32;&#80;&#80;&#94;&#92;&#116;&#111;&#112;&#32;&#80;&#46;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>Here is an example execution of the algorithm:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>>> P = randn(100) \/ 25;\n>> &#91;U,~,V] = svd(P); polar = U*V'; <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">% True polar decomposition<\/mark>\n>> for i = 1:20\n      P = 1.5*P-0.5*P*P'*P; <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">% Newton-Schulz iteration<\/mark>\n      fprintf(\"Iteration %d\\terror %e\\n\",i,norm(P - polar));\n   end\n<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Iteration 1\terror 9.961421e-01\nIteration 2\terror 9.942132e-01\nIteration 3\terror 9.913198e-01\nIteration 4\terror 9.869801e-01\nIteration 5\terror 9.804712e-01\nIteration 6\terror 9.707106e-01\nIteration 7\terror 9.560784e-01\nIteration 8\terror 9.341600e-01\nIteration 9\terror 9.013827e-01\nIteration 10\terror 8.525536e-01\nIteration 11\terror 7.804331e-01\nIteration 12\terror 6.759423e-01\nIteration 13\terror 5.309287e-01\nIteration 14\terror 3.479974e-01\nIteration 15\terror 1.605817e-01\nIteration 16\terror 3.660929e-02\nIteration 17\terror 1.985827e-03\nIteration 18\terror 5.911348e-06\nIteration 19\terror 5.241446e-11\nIteration 20\terror 6.686995e-15<\/mark><\/code><\/pre>\n\n\n\n<p>As we see, the initial rate of convergence is very slow, and obtain only a single digit of accuracy after 15 iterations. After this burn-in period, the rate of convergence is very rapid, and the method achieves machine accuracy after 20 iterations.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Polar Express<\/h2>\n\n\n\n<p>The Newton\u2013Schulz iteration approximates the sign function using a composition of the same polynomial <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-726e7ac82690902493102f577788aa40_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"10\" style=\"vertical-align: -4px;\"\/> repeatedly. But we can get better approximations by applying a sequence of <em>different<\/em> polynomials <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-a3b319b2450db7e23f042668f48260d2_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#95;&#49;&#44;&#92;&#108;&#100;&#111;&#116;&#115;&#44;&#112;&#95;&#116;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"71\" style=\"vertical-align: -4px;\"\/>, resulting in an approximation of the form <p class=\"ql-center-displayed-equation\" style=\"line-height: 18px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-fa6a1907a8c5265dd61a630d5f0eb268_l3.png\" height=\"18\" width=\"271\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#115;&#105;&#103;&#110;&#125;&#91;&#71;&#93;&#32;&#92;&#97;&#112;&#112;&#114;&#111;&#120;&#32;&#112;&#95;&#116;&#91;&#112;&#95;&#123;&#116;&#45;&#49;&#125;&#91;&#92;&#99;&#100;&#111;&#116;&#115;&#91;&#112;&#95;&#50;&#91;&#112;&#95;&#49;&#91;&#71;&#93;&#93;&#92;&#99;&#100;&#111;&#116;&#115;&#93;&#93;&#46;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p><em>The Polar Express<\/em> paper asks the question:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>What are the <em>optimal<\/em> choice of polynomials <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b4decb89744dc3e83e5896f8f2dc1e99_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"15\" style=\"vertical-align: -4px;\"\/>?<\/p>\n<\/blockquote>\n\n\n\n<p>For simplicity, the authors of <em>The Polar Express<\/em> focus on the case where all of the polynomials <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-b4decb89744dc3e83e5896f8f2dc1e99_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"15\" style=\"vertical-align: -4px;\"\/> have the same (odd) degree <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-4e3e465ecf16fee2ea1d3a606c75894c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#50;&#107;&#43;&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"14\" width=\"48\" style=\"vertical-align: -2px;\"\/>.<\/p>\n\n\n\n<p>On its face, it seems like this problem might be intractable as the best choice of polynomial <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-f3a516d7e3586e8a0e82916b57ce9bdc_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#95;&#123;&#105;&#43;&#49;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"32\" style=\"vertical-align: -5px;\"\/> seemly could depend in a complicated way on all of the previous polynomials <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-6ddeb8244a0101be6c2240668a589110_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#95;&#49;&#44;&#92;&#108;&#100;&#111;&#116;&#115;&#44;&#112;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"71\" style=\"vertical-align: -4px;\"\/>. Fortunately, the authors of <em>The Polar Express<\/em> show that there is a very simple way of computing the optimal polynomials. Begin by assuming that the singular values of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-53c2a72f422a8e6f02f2e3ac92c66d78_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: 0px;\"\/> lie in an interval <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-211f4ff3af7863232d7db347c4ef7e54_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#91;&#92;&#101;&#108;&#108;&#95;&#48;&#44;&#117;&#95;&#48;&#93;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"47\" style=\"vertical-align: -5px;\"\/>. We then choose <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-78ce8990eee91903e48b6ead3b4ebc29_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#95;&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"16\" style=\"vertical-align: -4px;\"\/> to be the degree-(<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-4e3e465ecf16fee2ea1d3a606c75894c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#50;&#107;&#43;&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"14\" width=\"48\" style=\"vertical-align: -2px;\"\/>) odd polynomial approximation to the sign function on <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-211f4ff3af7863232d7db347c4ef7e54_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#91;&#92;&#101;&#108;&#108;&#95;&#48;&#44;&#117;&#95;&#48;&#93;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"47\" style=\"vertical-align: -5px;\"\/> that minimizes the <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-79e456b36551f5d489c24944c63c3a87_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#76;&#95;&#92;&#105;&#110;&#102;&#116;&#121;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"25\" style=\"vertical-align: -3px;\"\/> error:<p class=\"ql-center-displayed-equation\" style=\"line-height: 34px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-e291c7f68e60512be5e8e92c7e3cf90b_l3.png\" height=\"34\" width=\"427\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#112;&#95;&#49;&#32;&#61;&#32;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#42;&#123;&#97;&#114;&#103;&#109;&#105;&#110;&#125;&#95;&#123;&#92;&#116;&#101;&#120;&#116;&#123;&#111;&#100;&#100;&#32;&#100;&#101;&#103;&#114;&#101;&#101;&#45;&#40;&#36;&#50;&#107;&#43;&#49;&#36;&#41;&#32;&#112;&#111;&#108;&#121;&#110;&#111;&#109;&#105;&#97;&#108;&#32;&#125;&#32;&#112;&#125;&#32;&#92;&#109;&#97;&#120;&#95;&#123;&#120;&#32;&#92;&#105;&#110;&#32;&#91;&#92;&#101;&#108;&#108;&#95;&#48;&#44;&#117;&#95;&#48;&#93;&#125;&#32;&#124;&#112;&#40;&#120;&#41;&#32;&#45;&#32;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#115;&#105;&#103;&#110;&#125;&#40;&#120;&#41;&#124;&#46;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>This optimal polynomial can be computed by a version of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Remez_algorithm\">Remez algorithm<\/a> provided in the Polar Express paper. After applying <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-78ce8990eee91903e48b6ead3b4ebc29_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#95;&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"16\" style=\"vertical-align: -4px;\"\/> to <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-53c2a72f422a8e6f02f2e3ac92c66d78_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: 0px;\"\/>, the singular values of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-630611cd5d04a5cd2fabb16f5855dcfa_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#95;&#49;&#91;&#71;&#93;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"40\" style=\"vertical-align: -5px;\"\/> lie in a new interval <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-15fd6cfaa78a33099efd2e7ebe256fbb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#91;&#92;&#101;&#108;&#108;&#95;&#49;&#44;&#117;&#95;&#49;&#93;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"47\" style=\"vertical-align: -5px;\"\/>. To build the next polynomial <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-6a787f52ede5593f5103c47621ad5972_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#95;&#50;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"17\" style=\"vertical-align: -4px;\"\/>, we simply find the optimal approximation to the sign function on this interval: <p class=\"ql-center-displayed-equation\" style=\"line-height: 34px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-d4b8a6d502f41db5153e14b3a0f17b88_l3.png\" height=\"34\" width=\"427\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91;&#112;&#95;&#50;&#32;&#61;&#32;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#42;&#123;&#97;&#114;&#103;&#109;&#105;&#110;&#125;&#95;&#123;&#92;&#116;&#101;&#120;&#116;&#123;&#111;&#100;&#100;&#32;&#100;&#101;&#103;&#114;&#101;&#101;&#45;&#40;&#36;&#50;&#107;&#43;&#49;&#36;&#41;&#32;&#112;&#111;&#108;&#121;&#110;&#111;&#109;&#105;&#97;&#108;&#32;&#125;&#32;&#112;&#125;&#32;&#92;&#109;&#97;&#120;&#95;&#123;&#120;&#32;&#92;&#105;&#110;&#32;&#91;&#92;&#101;&#108;&#108;&#95;&#49;&#44;&#117;&#95;&#49;&#93;&#125;&#32;&#124;&#112;&#40;&#120;&#41;&#32;&#45;&#32;&#92;&#111;&#112;&#101;&#114;&#97;&#116;&#111;&#114;&#110;&#97;&#109;&#101;&#123;&#115;&#105;&#103;&#110;&#125;&#40;&#120;&#41;&#124;&#46;&#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>Continuing in this way, we can generate as many polynomials <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-5e949a375def54adcda10832b29c8ce9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#95;&#49;&#44;&#112;&#95;&#50;&#44;&#92;&#108;&#100;&#111;&#116;&#115;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"70\" style=\"vertical-align: -4px;\"\/> as we want. <\/p>\n\n\n\n<p>For given values of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-6d3265144bc9e8592a8de5fe10ee1b08_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#101;&#108;&#108;&#95;&#48;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"14\" style=\"vertical-align: -3px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-a32578bbf8ddfba1b889165d4c486584_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#117;&#95;&#48;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"17\" style=\"vertical-align: -3px;\"\/>, the coefficients of the optimal polynomials <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-5e949a375def54adcda10832b29c8ce9_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;&#95;&#49;&#44;&#112;&#95;&#50;&#44;&#92;&#108;&#100;&#111;&#116;&#115;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"70\" style=\"vertical-align: -4px;\"\/> can be computed in advance and stored, allowing for rapid deployment at runtime. Moreover, we can always ensure the upper bound is <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-ebf4e9933109217157326dabc9365fd5_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#117;&#95;&#48;&#32;&#61;&#32;&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"49\" style=\"vertical-align: -3px;\"\/> by normalizing <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-e286d99d5588ca9dee49c99af1177658_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#71;&#92;&#103;&#101;&#116;&#115;&#32;&#71;&#32;&#47;&#32;&#92;&#110;&#111;&#114;&#109;&#123;&#71;&#125;&#95;&#123;&#92;&#114;&#109;&#32;&#70;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"108\" style=\"vertical-align: -5px;\"\/>. As such, there is only one parameter <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-6d3265144bc9e8592a8de5fe10ee1b08_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#101;&#108;&#108;&#95;&#48;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"14\" style=\"vertical-align: -3px;\"\/> that we need to know in order to compute the optimal coefficients. The authors of <em>The Polar Express<\/em> are motivated by applications in deep learning using 16-bit floating point numbers. In this value, the lower bound <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-dbc127ef0cf9804c84be83af4d20da24_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#101;&#108;&#108;&#95;&#48;&#32;&#61;&#32;&#48;&#46;&#48;&#48;&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"79\" style=\"vertical-align: -3px;\"\/> is appropriate. (As the authors stress, their method remains convergent even if too large a value of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.ethanepperly.com\/wp-content\/ql-cache\/quicklatex.com-6d3265144bc9e8592a8de5fe10ee1b08_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#101;&#108;&#108;&#95;&#48;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"14\" style=\"vertical-align: -3px;\"\/> is chosen, though convergence may be slowed somewhat.)<\/p>\n\n\n\n<p>Below, I repeat the experiment from above using (degree-5) <em>Polar Express<\/em> instead of Newton\u2013Schulz. The coefficients for the optimal polynomials are taken from the Polar Express paper. <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>>> P = randn(100) \/ 25;\n>> &#91;U,~,V] = svd(P); polar = U*V';\n>> P2 = P*P'; P = ((17.300387312530933*P2-23.595886519098837*eye(100))*P2+8.28721201814563*eye(100))*P; fprintf(\"Iteration 1\\terror %e\\n\",norm(P - polar));\n<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Iteration 1\terror 9.921347e-01<\/mark>\n>> P2 = P*P'; P = ((0.5448431082926601*P2-2.9478499167379106*eye(100))*P2+4.107059111542203*eye(100))*P; fprintf(\"Iteration 2\\terror %e\\n\",norm(P - polar));\n<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Iteration 2\terror 9.676980e-01<\/mark>\n>> P2 = P*P'; P = ((0.5518191394370137*P2-2.908902115962949*eye(100))*P2+3.9486908534822946*eye(100))*P; fprintf(\"Iteration 3\\terror %e\\n\",norm(P - polar));\n<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Iteration 3\terror 8.725474e-01<\/mark>\n>> P2 = P*P'; P = ((0.51004894012372*P2-2.488488024314874*eye(100))*P2+3.3184196573706015*eye(100))*P; fprintf(\"Iteration 4\\terror %e\\n\",norm(P - polar));\n<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Iteration 4\terror 5.821937e-01<\/mark>\n>> P2 = P*P'; P = ((0.4188073119525673*P2-1.6689039845747493*eye(100))*P2+2.300652019954817*eye(100))*P; fprintf(\"Iteration 5\\terror %e\\n\",norm(P - polar));\n<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Iteration 5\terror 1.551595e-01<\/mark>\n>> P2 = P*P'; P = ((0.37680408948524835*P2-1.2679958271945868*eye(100))*P2+1.891301407787398*eye(100))*P; fprintf(\"Iteration 6\\terror %e\\n\",norm(P - polar));\n<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Iteration 6\terror 4.588549e-03<\/mark>\n>> P2 = P*P'; P = ((0.3750001645474248*P2-1.2500016453999487*eye(100))*P2+1.8750014808534479*eye(100))*P; fprintf(\"Iteration 7\\terror %e\\n\",norm(P - polar));\n<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Iteration 7\terror 2.286853e-07<\/mark>\n>> P2 = P*P'; P = ((0.375*P2-1.25*eye(100))*P2+1.875*eye(100))*P; fprintf(\"Iteration 8\\terror %e\\n\",norm(P - polar));\n<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Iteration 8\terror 1.113148e-14<\/mark><\/code><\/pre>\n\n\n\n<p>We see that the Polar Express algorithm converges to machine accuracy in only 8 iterations (24 matrix products), a speedup over the 20 iterations (40 matrix products) required by Newton\u2013Schulz. The Polar Express paper contains further examples with even more significant speedups.<\/p>\n\n\n\n<p>Make sure to check out the Polar Express paper for many details not shared here, including extra tricks to improve stability in 16-bit floating point arithmetic, discussions about how to compute the optimal polynomials, and demonstrations of the Polar Express algorithm for training GPT-2.<\/p>\n\n\n\n<p><strong>References:<\/strong> Muon was first formally described in the blog post <em><a href=\"https:\/\/kellerjordan.github.io\/posts\/muon\/\">Muon: An optimizer for hidden layers in neural networks<\/a><\/em> (2024); for more, see <a href=\"https:\/\/jeremybernste.in\/writing\/deriving-muon\">this blog post<\/a> by <a href=\"https:\/\/jeremybernste.in\">Jeremy Bernstein<\/a> and <a href=\"https:\/\/arxiv.org\/abs\/2409.20325\">this paper<\/a> by Jeremy Bernstein and <a href=\"https:\/\/www.lakernewhouse.com\">Laker Newhouse<\/a>. The Polar Express is proposed in <em><a href=\"https:\/\/arxiv.org\/abs\/2505.16932\">The Polar Express: Optimal Matrix Sign Methods and their Application to the Muon Algorithm<\/a><\/em> (2025) by <a href=\"https:\/\/noahamsel.github.io\">Noah Amsel<\/a>, <a href=\"https:\/\/scholar.google.com\/citations?user=jOtDnRAAAAAJ&amp;hl=sv\">David Persson<\/a>, <a href=\"https:\/\/www.chrismusco.com\">Christopher Musco<\/a>, and <a href=\"https:\/\/gowerrobert.github.io\">Robert M. Gower<\/a>. For more on the matrix sign function and computing it, chapter 5 of <em><a href=\"https:\/\/nhigham.com\/functions-of-matrices-theory-and-computation\/\">Functions of Matrices: Theory and Computation<\/a><\/em> (2008) by <a href=\"https:\/\/nhigham.com\">Nicholas H. Higham<\/a> is an enduringly useful reference. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Every once in a while, there&#8217;s a paper that comes out that is so delightful that I can&#8217;t help share it on this blog, and I&#8217;ve started a little series Neat Randomized Algorithms for exactly this purpose. Today&#8217;s entry into this collection is The Polar Express: Optimal Matrix Sign Methods and their Application to the<a class=\"more-link\" href=\"https:\/\/www.ethanepperly.com\/index.php\/2025\/06\/07\/a-neat-not-randomized-algorithm-the-polar-express\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15],"tags":[],"class_list":["post-2093","post","type-post","status-publish","format-standard","hentry","category-neat-randomized-algorithms"],"_links":{"self":[{"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/posts\/2093","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/comments?post=2093"}],"version-history":[{"count":27,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/posts\/2093\/revisions"}],"predecessor-version":[{"id":2122,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/posts\/2093\/revisions\/2122"}],"wp:attachment":[{"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/media?parent=2093"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/categories?post=2093"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ethanepperly.com\/index.php\/wp-json\/wp\/v2\/tags?post=2093"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}