<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Robin Heggelund Hansen on Medium]]></title>
        <description><![CDATA[Stories by Robin Heggelund Hansen on Medium]]></description>
        <link>https://medium.com/@robinheggelundhansen?source=rss-5bd4064179a5------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*rxMRJovNH7y93KblWvAh-g.jpeg</url>
            <title>Stories by Robin Heggelund Hansen on Medium</title>
            <link>https://medium.com/@robinheggelundhansen?source=rss-5bd4064179a5------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 26 May 2026 07:39:16 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@robinheggelundhansen/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Increasing the performance of elm-css]]></title>
            <link>https://blogg.bekk.no/increasing-the-performance-of-elm-css-34075512d6a6?source=rss-5bd4064179a5------2</link>
            <guid isPermaLink="false">https://medium.com/p/34075512d6a6</guid>
            <category><![CDATA[css]]></category>
            <category><![CDATA[elm]]></category>
            <dc:creator><![CDATA[Robin Heggelund Hansen]]></dc:creator>
            <pubDate>Tue, 14 Dec 2021 11:55:06 GMT</pubDate>
            <atom:updated>2022-01-08T05:57:08.081Z</atom:updated>
            <content:encoded><![CDATA[<p>Most Elm projects I work on in a professional setting make use of a wonderful library, created by Richard Feldman, called elm-css. This library gives you statically typed CSS which can live alongside you view functions, making it easy to see the connections between HTML elements and CSS styling.</p><p>Recently, I came across a <a href="https://blogg.bekk.no/fixing-a-performance-problem-in-elm-using-html-lazy-c4298b72500d">performance problem in one of our applications</a>, triggered by the amount of work elm-css had to perform every time there was a change in state. While I did manage to fix the problem using aHtml.Lazy function, I walked away with a hope that elm-css could be made faster so that there wasn&#39;t a problem to begin with.</p><p>In this article, we’ll take a look at how elm-css works, and how I found a way to nearly double its performance.</p><h4>The workings of elm-css</h4><p>Let’s start off with a simple example. Below you’ll see the source code of a very simple elm-css based application:</p><pre>module Main exposing (main)<br><br>import Css<br>import Html exposing (Html)<br>import Html.Styled<br>import Html.Styled.Attributes exposing (css)<br><br><br>main : Html msg<br>main =<br>    Html.Styled.toUnstyled view<br><br><br>view : Html.Styled.Html msg<br>view =<br>    Html.Styled.div<br>        [ css<br>            [ Css.backgroundColor &lt;| Css.rgb 0 0 0 ]<br>        ]<br>        [ Html.Styled.div<br>            [ css<br>                [ Css.color &lt;| Css.rgb 125 125 125<br>                , Css.textDecoration Css.underline<br>                ]<br>            ]<br>            [ Html.Styled.text &quot;grey text&quot; ]<br>        , Html.Styled.div<br>            [ css<br>                [ Css.color &lt;| Css.rgb 125 125 125<br>                , Css.textDecoration Css.underline<br>                ]<br>            ]<br>            [ Html.Styled.text &quot;more grey text&quot; ]<br>        ]</pre><p>When compiled and run in a browser, it will produce the following html structure:</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*rff51GLYjbZgNd3w2RXnHg.png" /></figure><p>Things to note:</p><ol><li>Html.Styled.div seems to be identical to Html.div, except it accepts a css property. However, it does return a different data structure, which is why it must later be converted to &quot;unstyled&quot; HTML.</li><li>The generated CSS code is injected into a &lt;style&gt; element in the HTML, as a child element to the root styled HTML element. It contains CSS for all styled elements, not just the top element.</li><li>CSS class names are auto generated.</li><li>CSS styles are de-duplicated. The CSS for grey text only appears once.</li></ol><p>So how does all this work? How do we go from the code we see above, to the resulting HTML structure?</p><h4>How elm-css works</h4><p>Let’s explain this code example, starting from the inner most function calls.</p><p>Functions like Css.color and Css.textDecoration return a CSS key-value string wrapped in a Style type, that represents the actual CSS property that will later be injected into the HTML structure. Some Style content, like media queries or global selectors, will need further processing before it can be injected into the page.</p><p>The css function takes a list of styles and compiles it to CSS source code with one major exception: the class name. Once it has the CSS, this is passed to a hash function which converts it to a 32-bit number. This 32-bit number is hex encoded, and that hex encoded value becomes the class name for this particular list of styles.</p><p>For instance, the following Elm code:</p><pre>[ Css.color &lt;| Css.rgb 125 125 125<br>, Css.textDecoration Css.underline<br>]</pre><p>Will be converted to the following CSS:</p><pre>{<br>    color:rgb(125, 125, 125);<br>    text-decoration:underline;<br>}</pre><p>The resulting hash of this CSS code is 1444442742, and the hex encoded value becomes 56187276. This hex value, will be used as the class name for this particular CSS code.</p><p>Once we have figured out the class name, the css function returns a Html.Styled.Attribute type which contains:</p><ol><li>An Html.Attribute with our generated class name, this will be injected directly into the final HTML.</li><li>The list of styles, which we’ll need to recompile with the generated class name.</li><li>The class name, which we’ll use for de-duplicating styles and when recompiling the list of styles.</li></ol><p>Moving on.</p><p>Html.Styled.div, and similar functions, simply returns a data structure containing its input.</p><p>The final piece of the puzzle is Html.Styled.toUnstyled. This function converts the styled HTML elements to unstyled ones. For every element, it collects all matching class names and style lists in a dictionary. When done, it iterates over all the style lists that is in the dictionary and compiles the lists into CSS code using the associated class name. Because this has been stored in a dictionary, the styles have become de-duplicated.</p><p>The result of this compilation is then wrapped in a &lt;style&gt; element, and added as the first child of the root element.</p><h4>Finding potential performance improvements</h4><p>I’ve tried to summarize the important parts of the elm-css flow visually in the following diagram:</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*D2xafHsCQakzAJsb1jrJ0Q.png" /></figure><p>From a performance point of view, there are a couple of interesting things that stand out:</p><ol><li>All HTML elements will compile and hash their CSS properties. Got a list of 100 elements with identical CSS? That CSS will be compiled and hashed a 100 times.</li><li>CSS will potentially be compiled twice. With and without its class name. Thankfully, due to de-duplication, we only compile a list of styles twice if it’s unique.</li></ol><p>Before we decide on how to proceed, we first need to figure out which of the above operations are most expensive. Luckily, I’ve recently done some <a href="https://blogg.bekk.no/fixing-a-performance-problem-in-elm-using-html-lazy-c4298b72500d">profiling on a slow Elm application at work</a>, and here are the results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*kmoDNrfg5ohbXzM0CXiRfA.png" /></figure><p>Judging by the self time value, almost 25% of the duration of this particular profiling session was spent within elm-css&#39;s getClassname function. While it isn&#39;t clear from the image above, further investigation reveals that several of those (anonymous) function calls are related to the hash function used by elm-css.</p><p>It seems, then, that improving the performance of hashing is the most important thing to focus on.</p><h4>Reducing hashing</h4><p>A hash function works by converting all the characters of a string into integers, then reduce all those integers into a single integer using some mathematical formulae.</p><p>The goal is to come up with a unique id for a string. In practice, this isn’t really possible. You can have an endless combination of characters to form a string, but a hash in JavaScript is often restricted to 32-bits. So you’re bound to have collisions, meaning multiple strings getting the same hash value.</p><p>A hash function is considered to be of good quality if it is (a) fast and (b) has a low probability of collisions. elm-css uses the <a href="https://en.wikipedia.org/wiki/MurmurHash">Murmur3 hash function</a>, which in general is considered to be of high quality.</p><p>From a performance perspective, it might be <a href="https://github.com/robinheghan/fnv1a/blob/main/docs/why_fnv_is_great_for_elm.md">better to use the FNV hash function in Elm</a>. However, there’s a bigger chance of getting collisions with that hash function, and a collision means that two unrelated style lists generates the same class name, causing problems.</p><p>What we can do is to reduce the size of the compiled CSS. As we’ve seen, elm-css generates code like:</p><pre>{<br>    color:rgb(125, 125, 125);<br>    text-decoration:underline;<br>}</pre><p>This is easy to read, but contains a lot of unnecessary characters that needs to be hashed. There’s no reason we can’t generate more compact code, like this:</p><pre>{color:rgb(125,125,125);text-decoration:underline;}</pre><p>This removes a significant amount of characters. In my testing, I saw performance improvements of 10–15% depending on the browser.</p><h4>Avoiding hashing</h4><p>Every single time a we define a css property, we compile <em>and</em> hash the CSS in order to get a class name.</p><p>Based on our profiling, we know this to be the most expensive part of generating the final HTML structure. For HTML that contains a list of equally styled elements, this means we’re likely to spend a lot of time hashing identical CSS.</p><p>But do we really need the class name that early in the process? No CSS or HTML elements are created before toUnstyled is called. If we could delay class name generation to a later stage, we might be able to avoid hashing identical CSS entirely.</p><p>Such a change will touch a lot of code. It’s a relatively big restructuring of how elm-css works. So if we&#39;re already going to do that, we might also be able to avoid generating CSS more than once.</p><p>The first change I made was to the css function. I still let elm-css compile the list of styles into CSS, but with a twist. Instead of leaving the class name out, I add in a placeholder character that can be used to inject the class name using String.replace. I’ll get into more details about this later.</p><p>The hashing part of the function was removed.</p><p>Instead of returning a Html.Styled.Attribute which contain the list of styles and the class name, I changed the type to only include the compiled CSS. Not the list of styles, as we’re done compiling it, and not the class name, which we’ll generate later.</p><p>In the toUnstyled function, I changed the dictionary used for de-duplicating styles to instead contain the generated CSS as the key, and the hash as the value. When iterating over all the HTML elements, we need to check if the CSS for an element is already in the dictionary. If it is, then no action is necessary. If it isn&#39;t, we need to insert it along with its hash. This way, we only hash unique CSS, at the cost of more expensive dictionary lookups due to the increased size of the key.</p><p>Finally, when iterating the dictionary to extract the CSS styles, we need to call String.replace on the keys, replacing the placeholder character with the associated class name value. This way we avoid generating the CSS twice.</p><p>The flow now looks something like this:</p><figure><img alt="An updated visual summary of the inner workings of `elm-css`" src="https://cdn-images-1.medium.com/proxy/1*-lKYYAMownXOR7mGeKIz3A.png" /><figcaption>An updated visual summary of the inner workings of `elm-css`</figcaption></figure><p>On the benchmark that I used I went from 3900 ops/sec all the way up to 6800 ops/sec. That’s nearly double the performance!</p><h4>Conclusion</h4><p>I hope you’ve learned a little about how elm-css works. As of today, <a href="https://github.com/rtfeldman/elm-css/pull/555">the PR introducing these changes</a> hasn’t been merged. When (if?) it does, you can expect elm-css to become quite a bit faster.</p><p><em>Update: the changes described in this article have been released as part of version 17.0.2.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=34075512d6a6" width="1" height="1" alt=""><hr><p><a href="https://blogg.bekk.no/increasing-the-performance-of-elm-css-34075512d6a6">Increasing the performance of elm-css</a> was originally published in <a href="https://blogg.bekk.no">Bekk</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Fixing a performance problem in Elm using Html.Lazy]]></title>
            <link>https://blogg.bekk.no/fixing-a-performance-problem-in-elm-using-html-lazy-c4298b72500d?source=rss-5bd4064179a5------2</link>
            <guid isPermaLink="false">https://medium.com/p/c4298b72500d</guid>
            <category><![CDATA[elm]]></category>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[technology]]></category>
            <dc:creator><![CDATA[Robin Heggelund Hansen]]></dc:creator>
            <pubDate>Mon, 13 Dec 2021 12:25:04 GMT</pubDate>
            <atom:updated>2021-12-13T13:59:08.593Z</atom:updated>
            <content:encoded><![CDATA[<p>A few weeks ago, I got to sit down with a colleague of mine to fix a performance issue with one of our customer’s search page. The page looks something like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Bg8SqiZO5GU49sXWch144A.png" /></figure><p>This page is one Elm app. Its job is to present the user with a list of results that match a search the user has made, and let the user make a new search if they find no suitable result.</p><p>Before reading the next paragraph, keep in mind that the actual application is much more complex than the simple drawing I’ve just shown you. The illustration is only there to make it easier to understand the problem and the solution we arrived at.</p><p>The page worked as intended, but felt a bit slow. There were small, but noticeable, delays between characters when changing the content of the search field. When clicking a button you might notice a tiny delay before anything happened. The animation for clicking the search button, which due to a bug in elm-animator wasn&#39;t using css animations, was choppy.</p><p>Now, the reason we were looking into this was because we, the developers, were unhappy about it. We hadn’t received any user feedback from our users that the page was slow. It wouldn’t even surprise me if most of our users didn’t notice any performance problem at all, much like many don’t notice the difference between a native- and an electron-app unless comparing them side-by-side. Well, the choppy animation was probably noticable, but that had only been in the app for a few days at this point.</p><p>In other words, the reason we were looking into this was because the performance didn’t meet our own high standards, not because it had become an actual problem… yet.</p><h4>Finding the problem</h4><p>Whenever there is a performance problem, we need to find a reproducible way to measure it. Since we already had a choppy animation, we simply triggered the animation a couple of times while <a href="https://www.debugbear.com/blog/devtools-performance">running the browser’s builtin profiler</a>.</p><p>Here are the results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*kmoDNrfg5ohbXzM0CXiRfA.png" /></figure><p>The root of our performance problem stems from a function called _Char_toCode, which is being called by two elm-css functions: VirtualDom.Styled.getClassnameand Hash.fromString.</p><p>Since our animation isn&#39;t using css animations (again, due to a bug in elm-animator), every stage of the animation causes the view function of our Elm application to run. It seems that the most expensive part of our view function is when elm-cssgenerates class names for the HTML elements.</p><p>If the only problem we had was a choppy animation, we could re-implement it using css animations ourselves. However, since the app had a general feeling of sluggishness, that wouldn’t fix all our issues.</p><p>One could attempt to refactor the code to use less HTML elements, thereby giving elm-css less work to do. But when looking over the code, we didn&#39;t find any obvious way to do this. Even if we did manage to remove a bunch of HTML elements without changing the look of the page, we had no idea if it would be enough to fix our performance problems.</p><h4>Enter Html.Lazy</h4><p>Elm’s HTML package has a series of functions in a module called<a href="https://package.elm-lang.org/packages/elm/html/latest/Html-Lazy">Html.Lazy</a>, which lets you change a view function so that it is only called when its input arguments have changed. If the input arguments are the same as when the view was previously called, it will tell Elm&#39;s virtual dom implementation to make no changes to this particular part of the DOM <em>without</em> calling the actual view function.</p><p>As an example, if we have a view function call like:</p><pre>viewSearchResults currentDate results</pre><p>We can make it lazy by re-writing it to:</p><pre>Html.lazy2 viewSearchResults currentDate results</pre><p>And then viewSearchResults will only be called if there is reason to believe that it could return something different, which potentially avoids all the computation that function has to make.</p><p>elm-css has its own implementation of Html.Lazy, <a href="https://package.elm-lang.org/packages/rtfeldman/elm-css/latest/Html-Styled-Lazy">Html.Styled.Lazy</a>, and so in theory we should be able to use this trick to give elm-css much less work to do, thereby increasing our performance.</p><p>It did turn out to be the case. Today, the search page runs much faster thanks to two lines of code that wrap our view functions using Html.Styled.Lazy. However, in order for those two lines to work we did have to do quite a bit of refactoring, in part because of how this optimization works.</p><h4>The problem with Html.Lazy</h4><p>The first thing we had to fix was some bad design on our part.</p><p>The search page is structured like two separate Elm views. The top part of the page, the search box, is one view while the results is another. It is natural then, that both of these top-level view functions become wrapped in Html.Styled.Lazy functions. This way, if you edit the text in the search box, the computation for rendering the search results can be skipped.</p><p>However, both view functions, as well as their sub-views, take in the entire model as an input argument. This doesn’t play too well with Html.Lazy, as the view will be triggered whenever the input changes, and since the input is the model itself, it will trigger every time one of its fields changes, even if that field isn’t being used in the particular view.</p><p>It’s also problematic from a code design perspective, as the views become very tightly coupled to the entire state of the application.</p><p>It took some time to refactor this so that both view functions had their own, entirely separate, piece of the model. Strangely enough, the application felt just as sluggish as before.</p><p>In order to reduce the amount of refactoring, we had opted not to change the actual application model, as that would also require us to refactor the init and update functions. Instead, we divided the application model into separate pieces as part of the view function itself.</p><p>So, where we previously had code like this:</p><pre>view : Model -&gt; Html msg<br>view model =<br>  div []<br>    [ viewSearchBox model<br>    , viewSearchResults model<br>    ]</pre><p>We now had:</p><pre>view : Model -&gt; Html msg<br>view model =<br>  let<br>    searchModel =<br>      { field1 = model.searchField1<br>      , field2 = model.searchField2 <br>      -- in reality, there were more fields<br>      }<br><br>    resultsModel =<br>      { field1 = model.resultsField1<br>      , field2 = model.resultsField2<br>      -- in reality, there were more fields<br>      }<br>  in<br>  div []<br>    [ Html.Styled.Lazy.lazy viewSearchBox searchModel<br>    , Html.Styled.Lazy.lazy viewSearchResults resultsModel<br>    ]</pre><p>But this didn’t work.</p><p>The reason is that functions in Html.Styled.Lazy have a different notion of equality than Elm itself. In general, two things are considered equal in Elm if they represent the same value. However, Html.Styled.Lazy functions only considers two things to be equal if they are the same <em>reference</em>.</p><p>Since we were constructing a new searchModel and resultsModel object every time the view function was being called, they would <em>never</em> be considered equal to the previous input by Html.Styled.Lazyfunctions, even if comparing them with Elm&#39;s == would return true.</p><p>Realizing this, we went on and refactored the application model to have its own search and results sub-models. This also required big changes to our init and update functions.</p><p>After another hour or two we had a better structured application. Not only that, the animation was running silky-smooth and there were no noticable delays anywhere.</p><h4>In conclusion</h4><p>Html.Lazy, or Html.Styled.Lazy, can be a great way to improve the performance of your Elm applications. However, it&#39;s an optimization that is easy to get wrong.</p><p>Html.Lazy is the only construct in Elm that has the notion of reference equality. Because of that, it&#39;s an easy thing to forget when refactoring code, especially if you&#39;re not working on the same code base on a daily basis. You now need to be careful to retain reference equality in update and view functions around your code, something you normally don’t need to worry about.</p><p>If you don’t, nothing bad happens other than performance becoming worse, which you may or may not notice depending on your hardware, if you have Elm’s debugger running, or the code paths you’re triggering.</p><p>So while Html.Lazy can be a great tool, I found myself wishing that elm-csswas fast enough that we didn&#39;t have to use this optimization at all.</p><p>Tomorrow, I’ll take a deep dive into the inner workings of elm-css and explain how I found a way to nearly double the performance of the framework.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c4298b72500d" width="1" height="1" alt=""><hr><p><a href="https://blogg.bekk.no/fixing-a-performance-problem-in-elm-using-html-lazy-c4298b72500d">Fixing a performance problem in Elm using Html.Lazy</a> was originally published in <a href="https://blogg.bekk.no">Bekk</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Improving the performance of record updates]]></title>
            <link>https://blogg.bekk.no/improving-the-performance-of-record-updates-cb34cb6d4451?source=rss-5bd4064179a5------2</link>
            <guid isPermaLink="false">https://medium.com/p/cb34cb6d4451</guid>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[elm]]></category>
            <category><![CDATA[compilers]]></category>
            <category><![CDATA[technology]]></category>
            <dc:creator><![CDATA[Robin Heggelund Hansen]]></dc:creator>
            <pubDate>Wed, 01 Dec 2021 06:03:26 GMT</pubDate>
            <atom:updated>2021-12-01T06:03:26.531Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*tnBU8eG8IEvEiZZ9" /><figcaption>Photo by <a href="https://unsplash.com/@trommelkopf?utm_source=medium&amp;utm_medium=referral">Steve Harvey</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>I was trying to measure the performance impact of monomorphising custom types on a typical update function, when I came across a discovery that would lead to the biggest performance improvement I’ve yet to find.</p><p>The code I was benchmarking, was this:</p><pre>type alias Model =<br>    { someString : String<br>    , someNum : Int<br>    , sortKey : ( String, Int )<br>    }<br><br><br>emptyRecord : Model<br>emptyRecord =<br>    { someString = &quot;Str&quot;<br>    , someNum = 0<br>    , sortKey = ( &quot;Str&quot;, 0 )<br>    }<br><br><br>type Msg<br>    = SetString String<br>    | SetAll String Int<br><br><br>update : Msg -&gt; Model -&gt; Model<br>update msg model =<br>    case msg of<br>        SetString val -&gt;<br>            { model<br>                | someString = val<br>                , sortKey = ( val, model.someNum )<br>            }<br><br>        SetAll str int -&gt;<br>            { model<br>                | someString = str<br>                , someNum = int<br>                , sortKey = ( str, int )<br>            }<br><br><br>range : List Int<br>range =<br>    List.range 0 100<br><br><br>suite : Benchmark<br>suite =<br>    describe &quot;Updates&quot;<br>        [ benchmark &quot;Update&quot; &lt;|<br>            \_ -&gt;<br>                List.foldl updater emptyRecord range<br>        ]<br><br><br>updater : Int -&gt; Model -&gt; Model<br>updater idx rec =<br>    rec<br>        |&gt; update (SetString (&quot;String&quot; ++ String.fromInt idx))<br>        |&gt; update (SetAll (&quot;Str-&quot; ++ String.fromInt idx) idx)</pre><p>The update function is called 100 times with each message type and the previous model as parameters. Running this through deoptigate, reveals something interesting:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AjzBHaSDzyZEaPRVtux1Cg.png" /></figure><p>As expected, the $ property of the message is considered to be polymorphic. What is not expected, is that the inline cache for model.O, or model.someNum, is in a polymorphic state. The model never changes shape, so why does this happen?</p><h3>Uncovering a hidden shape change</h3><p>The answer lies in how Elm perform record updates. Let’s take a closer look at the builtin update function:</p><pre>function _Utils_update(oldRecord, updatedFields)<br>{<br>	var newRecord = {};<br><br>	for (var key in oldRecord)<br>	{<br>		newRecord[key] = oldRecord[key];<br>	}<br><br>	for (var key in updatedFields)<br>	{<br>		newRecord[key] = updatedFields[key];<br>	}<br><br>	return newRecord;<br>}</pre><p>Elm is immutable, so when updating a record we first have to make a copy of the original record. Once we have a copy, we can overwrite some fields with new values.</p><p>The problem here is that when making the copy, Elm starts by making a new empty object, which naturally doesn’t have the same shape as the object we wish to copy. It then proceeds to add properties to this new empty object, each time changing the object’s shape. In the end, we have a new object that contains the same values as our original object, but the shapes are different because they were constructed differently.</p><p>This means that if you have a function that works on a certain kind of record, and that function is passed a record which has been updated at some point, all property access on the record will be polymorphic.</p><p>Another problem with how Elm performs record updates, is that two loops with a bunch of property access (most of which are likely to be polymorphic, or even megamorphic) is bound to be slower than just creating the object directly using an object literal.</p><p>If we could replace all record updates with the equivalent of an object literal expression, we would not only improve the performance of record updates, but also improve property access everywhere on an updated record.</p><h3>Optimizing record updates</h3><p>elm-optimize-level-2 is a JavaScript to JavaScript transpiler, as such it doesn&#39;t have access to Elm&#39;s type information. Fortunately, the Elm compiler outputs object literals with short, but unique, keys. So two records should only have the same shape if they&#39;re also the same type of record.</p><p>We can make use of this by first traversing all the JavaScript output and registering the different record shapes. With that information, we can create top-level constructor functions. So, if we find an object literal like:</p><pre>var person = { o: &quot;Name&quot;, iJ: 32 };</pre><p>We can generate a function at the top of the file which looks like:</p><pre>function $$Record1(a, b) {<br>  this.o = a;<br>  this.iJ = b;<br>}</pre><p>Then, we need to replace all object literals with calls to this new constructor function, like:</p><pre>var person = new $$Record1(&quot;Name&quot;, 32);</pre><p>What does this give us? Now that we have converted all records to the equivalent of a JavaScript class, we can attach methods to it. A useful operation to have is a clone method:</p><pre>$$Record1.prototype.$clone = function() {<br>  return new $$Record1(this.o, this.iJ);<br>}</pre><p>With this, we can convert record update expression that originally look like:</p><pre>var newPerson = _Utils_update(person, { o: &quot;New name&quot; });</pre><p>into:</p><pre>var newPerson = (function()<br>  var $c = person.$clone()<br>  $c.o = &quot;New name&quot;;<br>  return $c;<br>)();</pre><p>And with that, all record updates should not only be faster, but all property access on updated records should now also be monomorphic.</p><h4>Results</h4><p>Running the benchmark gives us the following:</p><pre>Record Update<br>   <br>   safari, original         15,649 runs/sec <br>   safari, altered          90,779 runs/sec  (580%)<br>   <br>   firefox, original        33,058 runs/sec <br>   firefox, altered         99,995 runs/sec  (302%)<br>   <br>   chrome, original         31,714 runs/sec <br>   chrome, altered          267,819 runs/sec (844%)</pre><p>That’s… crazy.</p><p>This transformation is already in elm-optimize-level-2 and can be enabled by passing a -O3 flag. Why is the optimization hidden behind a flag, you ask?</p><p>As you might have guessed, the transformation does increase the size of the final JavaScript bundle. In my testing, the asset size is increased by roughly 5%. This number will vary depending on how many types of records you perform updates on, however.</p><p>This concludes my work on optimizing Elm’s runtime. I hope it has been both fun and educational. Should you need better performance in your Elm applications, give elm-optimize-level-2 a try.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=cb34cb6d4451" width="1" height="1" alt=""><hr><p><a href="https://blogg.bekk.no/improving-the-performance-of-record-updates-cb34cb6d4451">Improving the performance of record updates</a> was originally published in <a href="https://blogg.bekk.no">Bekk</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Improving the performance of Custom Types]]></title>
            <link>https://blogg.bekk.no/improving-the-performance-of-custom-types-39f7e2a1d8e1?source=rss-5bd4064179a5------2</link>
            <guid isPermaLink="false">https://medium.com/p/39f7e2a1d8e1</guid>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[compilers]]></category>
            <category><![CDATA[elm]]></category>
            <category><![CDATA[technology]]></category>
            <dc:creator><![CDATA[Robin Heggelund Hansen]]></dc:creator>
            <pubDate>Tue, 30 Nov 2021 06:01:08 GMT</pubDate>
            <atom:updated>2021-11-30T06:01:08.336Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*pO6RU-1pqViICdKr" /><figcaption>Photo by <a href="https://unsplash.com/@kellysikkema?utm_source=medium&amp;utm_medium=referral">Kelly Sikkema</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>While benchmarking my attempts at improving the performance of function calls, I noticed something interesting in deoptigate:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tBjOXlx-v4GZKPTqV9qeOw.png" /></figure><p>This particular snippet of JavaScript, is the output of the following Elm code in the core library:</p><pre>get : comparable -&gt; Dict comparable v -&gt; Maybe v<br>get targetKey dict =<br>  case dict of<br>    RBEmpty_elm_builtin -&gt;<br>      Nothing<br><br>    RBNode_elm_builtin _ key value left right -&gt;<br>      case compare targetKey key of<br>        LT -&gt;<br>          get targetKey left<br><br>        EQ -&gt;<br>          Just value<br><br>        GT -&gt;<br>          get targetKey right</pre><p>It seems that dict is not monomorphic (one shape), but can be one of several shapes when this pattern match is executed. If you remember my previous article on <a href="https://blogg.bekk.no/how-javascript-engines-achieve-great-performance-fb0b36601557">how JavaScript engines optimize code</a>, you&#39;ll know that this code is not optimized as well as it could be.</p><p>But <em>why</em> does the JavaScript engine consider dict to be polymorphic? Let&#39;s take a look at how Dict is defined in Elm:</p><pre>type Dict k v<br>    = RBNode_elm_builtin NColor k v (Dict k v) (Dict k v)<br>    | RBEmpty_elm_builtin</pre><p>Not surprising, considering the pattern matching we just saw. This type definition compiles into the following constructor functions:</p><pre>var $elm$core$Dict$RBEmpty_elm_builtin = {$: &#39;RBEmpty_elm_builtin&#39;};<br><br><br>var $elm$core$Dict$RBNode_elm_builtin = F5(<br>  function (a, b, c, d, e) {<br>    return {$: &#39;RBNode_elm_builtin&#39;, a: a, b: b, c: c, d: d, e: e};<br>  });</pre><p>The different constructor functions for Dict returns JavaScript objects of different shapes. The fact that the .$ property exists and is defined first in both, doesn&#39;t matter. The JavaScript engine view these objects as different.</p><h4>What can be done?</h4><p>What we need is for all constructors of a custom type to return the same kind of shape. This can easily be done by adding extra, unused, properties to some constructors so that every constructor returns a shape with the same number of properties.</p><p>The problem with this obvious technique, though, is that it will increase the size of the compiled output. Fortunately, it won’t be by much. A few extra bytes per constructor. Still, we should make sure it has a noticeable effect on performance before performing this transform on all custom types.</p><h4>Measuring the difference</h4><p>The interesting thing about this optimization, is that it can be applied entirely in Elm code. To test this, we’re going to define our own List types. First, we define a faithful implementation of Elm&#39;s builtin List like so:</p><pre>type SlowList a<br>    = SlowEmpty<br>    | SlowFilled a (SlowList a)<br><br><br>slowFromList : List a -&gt; SlowList a<br>slowFromList ls =<br>    List.foldr (\el acc -&gt; SlowFilled el acc) SlowEmpty ls<br><br><br>sampleSlowList : SlowList String<br>sampleSlowList =<br>    List.range 0 1000<br>        |&gt; List.map String.fromInt<br>        |&gt; slowFromList<br><br><br>slowCount : SlowList a -&gt; Int<br>slowCount ls =<br>    slowCountHelp 0 ls<br><br><br>slowCountHelp : Int -&gt; SlowList a -&gt; Int<br>slowCountHelp count list =<br>    case list of<br>        SlowEmpty -&gt;<br>            count<br><br>        SlowFilled _ rest -&gt;<br>            slowCountHelp (count + 1) rest</pre><p>We’re going to use sampleSlowList with slowCount for the basis of our performance test. Effectively, we&#39;re going to measure how quickly one can iterate through a list.</p><p>Then, we’re going to define a very similar type, but where every constructor has the same number of properties:</p><pre>type FastList a<br>    = FastEmpty () ()<br>    | FastFilled a (FastList a)</pre><p>We then define a sample list and a count function, and run a benchmark were we compare these two implementations.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mFQ0S_0WAW4mWzOoQKJjfQ.png" /></figure><p>FastList is 7% faster in Chrome (top-left) and 5% faster in Safari (bottom). Interestingly, it&#39;s a whopping 53% faster in Firefox (top-right)!</p><h4>Where to go from here</h4><p>elm-optimize-level-2 already supports this transformation, but only for a few types (List and Dict). This optimization increases the size of the compiled JavaScript file, and probably matters most in hot loops, which is common for recursive data structures like List and Dict.</p><p>It might be worth considering if monomorphising custom types can be enabled through a flag in the future.</p><p>For our next, and final, article we’ll explore how we can improve the performance of record updates. This particular performance improvement will prove to be the biggest improvement of them all.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=39f7e2a1d8e1" width="1" height="1" alt=""><hr><p><a href="https://blogg.bekk.no/improving-the-performance-of-custom-types-39f7e2a1d8e1">Improving the performance of Custom Types</a> was originally published in <a href="https://blogg.bekk.no">Bekk</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Successfully improving function call performance]]></title>
            <link>https://blogg.bekk.no/successfully-improving-function-call-performance-e61a41dbaf7f?source=rss-5bd4064179a5------2</link>
            <guid isPermaLink="false">https://medium.com/p/e61a41dbaf7f</guid>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[compilers]]></category>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[elm]]></category>
            <dc:creator><![CDATA[Robin Heggelund Hansen]]></dc:creator>
            <pubDate>Mon, 29 Nov 2021 06:02:55 GMT</pubDate>
            <atom:updated>2021-11-29T06:02:55.555Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*52RG75UmJie9BGwZ" /><figcaption>Photo by <a href="https://unsplash.com/@bradencollum?utm_source=medium&amp;utm_medium=referral">Braden Collum</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>We haven’t had much luck in our attempts to improve the runtime performance of Elm’s function calls, but there’s still one thing left to try. What happens when we combine the fast-path approach of our previous attempt, with the slow-path approach of Elm’s current implementation?</p><h3>Fail again. Fail better.</h3><p>In our previous attempts, we’ve tried making the fast path of executing a function faster, or at least remain as fast as it was. However, this has come at the expense of making the curried/slow path slower, and the solution in general more complex. What if we try to improve the fast path <em>without</em> doing anything that affects the slow path?</p><p>Let’s look at the implementation of F3 again:</p><pre>function F3(fun) {<br>  return F(3, fun, function(a) {<br>    return function(b) { <br>      return function(c) { return fun(a, b, c); }; <br>    };<br>  });<br>}</pre><pre>function F(arity, fun, wrapper) {<br>  wrapper.a = arity;<br>  wrapper.f = fun;<br>  return wrapper;<br>}</pre><p>Then the implementation of A3:</p><pre>function A3(fun, a, b, c) {<br>  return fun.a === 3 ? fun.f(a, b, c) : fun(a)(b)(c);<br>}</pre><p>There are a couple of things to note:</p><ul><li>Functions with arity 2–9 all have the same shape. They are curried, with a reference, the original arity, and the original function definition.</li><li>When calling a function through an Ax helper, the JavaScript engine has to do the following steps:</li><li>Does the function object have an .a property? (single-arity functions do not)</li><li>If so, does it equal x (3 in our example)</li><li>If either above statements are false, perform curried call.</li><li>If both statements are true, perform direct call.</li></ul><p><a href="https://blogg.bekk.no/making-elms-a-functions-more-predictable-a-failed-experiment-2ea4b86b5345">In our previous article</a>, we tried to encode the arity of the function into the shape of the function itself by storing the original function in an .ax property on the curried version, thus not needing an arity or an arity check at all. Let&#39;s try a minimal version of that.</p><p>We change the definition of F3:</p><pre>function F3(fn) {<br>  var wrapper = function(a) { <br>    return function(b) { <br>      return function(c) { return fn(a, b, c); };<br>    };<br>  };</pre><pre>  wrapper.a3 = fn;<br>  return wrapper;<br>}</pre><p>And we change the definition of A3 to become:</p><pre>function A3(fun, a, b, c) {<br>  return fun.a3 ? fun.a3(a, b, c) : fun(a)(b)(c);<br>}</pre><p>This is a tiny change, but it has several benefits:</p><ul><li>We no longer store the arity on the function object. This saves some bytes of memory for every function definition.</li><li>Functions of different arities now have different shapes, which might make it easier for the JavaScript engine to tell them apart, which could prove useful for optimization.</li><li>Calling a function involves fewer steps:</li><li>Does the function object have an .a3 property? (only 3-arity function objects do)</li><li>If so, call it</li><li>Otherwise, perform curried application</li></ul><h3>How does it perform?</h3><p>In our micro-benchmark, we get the following results:</p><pre>0 arg<br>   safari, original       388,205 runs/sec <br>   safari, altered        374,644 runs/sec  (97%)<br>   <br>   firefox, original      51,116 runs/sec <br>   firefox, altered       55,481 runs/sec  (109%)<br>   <br>   chrome, original       359,688 runs/sec <br>   chrome, altered        358,539 runs/sec  (100%)<br><br><br>1 arg<br>   safari, original       56,986 runs/sec <br>   safari, altered        57,290 runs/sec  (101%)<br>   <br>   firefox, original      24,553 runs/sec <br>   firefox, altered       26,536 runs/sec  (108%)<br>   <br>   chrome, original       107,497 runs/sec <br>   chrome, altered        158,280 runs/sec  (147%)<br><br><br>2 args<br>   safari, original       50,347 runs/sec <br>   safari, altered        58,018 runs/sec  (115%)<br>   <br>   firefox, original      26,906 runs/sec <br>   firefox, altered       29,726 runs/sec  (110%)<br>   <br>   chrome, original       29,475 runs/sec <br>   chrome, altered        31,875 runs/sec  (108%)<br><br><br>3 args<br>   safari, original       56,310 runs/sec <br>   safari, altered        55,328 runs/sec  (98%)<br>   <br>   firefox, original      26,267 runs/sec <br>   firefox, altered       28,941 runs/sec  (110%)<br>   <br>   chrome, original       29,496 runs/sec <br>   chrome, altered        32,083 runs/sec  (109%)</pre><p>Seems to be a good improvement. In the few cases where have a regression, the regression is small enough to be within the margin of error.</p><p>Now let’s see the html-benchmark:</p><pre>Create a 4 level nested html tree<br>   <br>   safari, original       46,377 runs/sec <br>   safari, altered        49,049 runs/sec  (106%)<br>   <br>   firefox, original      28,669 runs/sec <br>   firefox, altered       27,927 runs/sec  (97%)<br>   <br>   chrome, original       42,115 runs/sec <br>   chrome, altered        44,695 runs/sec  (106%)</pre><p>Firefox seems to show a slight regression here, but Safari and Chrome both sees a 6% improvement. I think we have a winner.</p><h3>Is that it?</h3><p>Keep in mind that elm-optimize-level-2 already contains an optimization that converts Ax calls to direct function calls when it can statically determine that it is safe to do so.</p><p>As such, this improvement will mostly run inside higher-order functions, where it is never safe to assume the arity of a function. Together, these two optimizations can deliver performance improvements of 20–30% for certain kinds of code.</p><h3>Conclusion</h3><p>It seems Elm already had a pretty good implementation of function calls, considering that it needs to support currying. Still, we have managed to find a way to make it faster.</p><p>This concludes my work on improving function calls. This work is available in the latest version of elm-optimize-level-2.</p><p>That doesn’t mean it’s the end of this series, though. There are other aspects of the Elm runtime that could improve.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e61a41dbaf7f" width="1" height="1" alt=""><hr><p><a href="https://blogg.bekk.no/successfully-improving-function-call-performance-e61a41dbaf7f">Successfully improving function call performance</a> was originally published in <a href="https://blogg.bekk.no">Bekk</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Making Elm’s A-functions more predictable, a failed experiment]]></title>
            <link>https://blogg.bekk.no/making-elms-a-functions-more-predictable-a-failed-experiment-2ea4b86b5345?source=rss-5bd4064179a5------2</link>
            <guid isPermaLink="false">https://medium.com/p/2ea4b86b5345</guid>
            <category><![CDATA[elm]]></category>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[compilers]]></category>
            <category><![CDATA[performance]]></category>
            <dc:creator><![CDATA[Robin Heggelund Hansen]]></dc:creator>
            <pubDate>Fri, 26 Nov 2021 06:01:07 GMT</pubDate>
            <atom:updated>2021-11-26T06:01:07.686Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*MCwVOjSfxNMbrbI5" /><figcaption>Photo by <a href="https://unsplash.com/@etiennegirardet?utm_source=medium&amp;utm_medium=referral">Etienne Girardet</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>In the previous article, we investigated the possibility of removing F-functions. Today, we’re going to focus on A-functions.</p><p>Removing A-functions isn’t particularly hard. One can simply always perform single-arity function calls. However, performing a direct function call is always faster, and so keeping some overhead around function calls is likely worth it.</p><p>But we can definitely try to make that overhead as small as possible.</p><h4>Removing the arity check</h4><p>Let’s begin by looking at the implementation of an A-function:</p><pre>function A3(fun, a, b, c) {<br>  return fun.a === 3 ? fun.f(a, b, c) : fun(a)(b)(c);<br>}</pre><p>Currently, the decision of whether a function call should be direct or curried, is decided by a number property on the curried version of the function. Do we really need to check this number, though? Is there some other way to dynamically decide how a function should be called, in a way that is also easier for the JavaScript engine to optimize?</p><p>We know that JavaScript engines makes use of hidden classes and inline caches to make optimization decisions based on the shape of an object. What if we somehow encode how a function should be called based on the number of arguments passed, right into the shape of the object itself?</p><p>Since everything in JavaScript is an object, it might be that an object-oriented technique could help us out. If we could define a Callable interface that all functions must implement, with methods that represents calling the function with arities 2-9, the JavaScript engine might be able to optimize that very well based on runtime information.</p><h4>A prototype implementation</h4><p>I’m not going to post the entire implementation, as it is quite repetitive. Instead, I’m going to focus on the implementation for 3-arity functions.</p><p>We’re first going to define a JavaScript prototype object that defines how a function call with 2, 4, 5, 6, 7, 8 and 9 arguments should behave:</p><pre>var F3Proto = {<br>    a2: function(a, b) {<br>        return F1(this.a3.bind(null, a, b));<br>    },<br>    a4: function(a, b, c, d) {<br>        return this.a3(a, b, c)(d);<br>    },<br>    a5: function(a, b, c, d, e) {<br>        return this.a3(a, b, c).a2(d, e);<br>    },<br>    a6: function(a, b, c, d, e, f) {<br>        return this.a3(a, b, c).a3(d, e, f);<br>    },<br>    a7: function(a, b, c, d, e, f, g) {<br>        return this.a3(a, b, c).a4(d, e, f, g);<br>    },<br>    a8: function(a, b, c, d, e, f, g, h) {<br>        return this.a3(a, b, c).a5(d, e, f, g, h);<br>    },<br>    a9: function(a, b, c, d, e, f, g, h, i) {<br>        return this.a3(a, b, c).a6(d, e, f, g, h, i);<br>    }<br>};<br><br>Object.setPrototypeOf(F3Proto, Function.prototype);</pre><p>If you’re not aware, a JavaScript prototype is just an object which contains properties that can be shared among many object instances. Whenever you access a property on an object, the object will look up the property in its associated prototype if the property doesn’t exist in the object itself.</p><p>In the code above, we define a regular object, F3Proto, that contains a function for each arity we might end up calling on a function. We then set the prototype of F3Proto to the Function.prototype object. This means that an object, which has F3Proto as its prototype, also has the same prototype as any other function. This is how inheritance in JavaScript works.</p><p>When you look at the F3Proto object, you&#39;ll see that there are a bunch of references being made to this.a3, which is the function that handles a call being made with exactly three arguments. This next snippet of code might shed some light as to why that is.</p><pre>function F3(fun) {<br>    var wrapper = function(a) { <br>        return F2(fun.bind(null, a));<br>    };<br>    Object.setPrototypeOf(wrapper, F3Proto);<br>    wrapper.a3 = fun;<br>    return wrapper;<br>}</pre><p>For this scheme to work, we need to redefine all the F-functions. We start by creating a curried wrapper function. This curried version is different from the original implementation in that it returns an F-wrapped partially applied function. I’ll explain why in a bit.</p><p>Once we have a curried version of our function, we set its prototype to be F3Proto, and then we set a3 to be the original function.</p><p>Finally, all function calls that goes through an A-function, needs to be changed from:</p><pre>A3(fun, a, b, c);</pre><p>to:</p><pre>fun.a3(a, b, c);</pre><p>So, the way this works is that calling a 3-arity function with fun.a3(...) will execute the original function directly. Calling any other of the .ax functions will either partially apply .a3 or call it and then call a .ax function on its result. This means that every function needs these methods, which is why we need to F-wrap partially applied functions as well.</p><p>In the case where only one type of object is involved, the JavaScript engine should in theory be able to optimize this very well. In the case where multiple type of objects are involved, the JavaScript engine might need to a work a little harder to figure out which function should be called.</p><p>As we saw in the last article, doing something more complex than an if-then-else might give us worse performance, even though the more complex thing should lead to a more direct course of action.</p><h4>How does this perform?</h4><p>The results are mixed in our micro-benchmark. One can’t really make a definitive statement here whether this optimization is good or bad.</p><pre>0 arg<br>   safari, original       149,340 runs/sec <br>   safari, altered        411,929 runs/sec  (276%)<br>   <br>   firefox, original      41,672 runs/sec <br>   firefox, altered       65,770 runs/sec  (158%)<br>   <br>   chrome, original       378,142 runs/sec <br>   chrome, altered        331,656 runs/sec  (88%)<br><br><br>1 arg<br>   safari, original       53,110 runs/sec <br>   safari, altered        23,689 runs/sec  (45%)<br>   <br>   firefox, original      24,666 runs/sec <br>   firefox, altered       43,085 runs/sec  (175%)<br>   <br>   chrome, original       109,962 runs/sec <br>   chrome, altered        71,177 runs/sec  (65%)<br><br><br>2 args<br>   safari, originalv      52,452 runs/sec <br>   safari, altered        23,390 runs/sec  (45%)<br>   <br>   firefox, original      27,921 runs/sec <br>   firefox, altered       51,456 runs/sec  (184%)<br>   <br>   chrome, original       29,650 runs/sec <br>   chrome, altered        66,603 runs/sec  (225%)<br><br><br>3 args<br>   safari, original       47,531 runs/sec <br>   safari, altered        22,663 runs/sec  (48%)<br>   <br>   firefox, original      27,036 runs/sec <br>   firefox, altered       15,779 runs/sec  (58%)<br>   <br>   chrome, original       30,482 runs/sec <br>   chrome, altered        62,658 runs/sec  (206%)</pre><p>It gets even worse in our more realistic html benchmark:</p><pre>   safari, original        49,239 runs/sec <br>   safari, altered         6,669 runs/sec  (14%)<br>   <br>   firefox, original       27,703 runs/sec <br>   firefox, altered        12,676 runs/sec  (46%)<br>   <br>   chrome, original        45,518 runs/sec <br>   chrome, altered         37,751 runs/sec  (83%)</pre><p>Why are the results so bad?</p><h4>Searching for a cause</h4><p>Running the two version through deoptigate gives us a hint. The unaltered version seems to be better at optimizing functions, and makes use of fewer inline caches:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FFxwk7ipW6yUBbsHGKLGJQ.png" /><figcaption>Summary window of deoptigate, numbers for unaltered Elm code</figcaption></figure><p>Whereas the altered version isn’t able to optimize as many functions, despite using many more inline caches:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*oIA48MYAK4-se1Ke-fjSHQ.png" /><figcaption>Numbers for altered Elm code. Note the number of inline caches.</figcaption></figure><p>The new code transformation we’ve experimented with here, causes many more property accesses than before. Each property access causes the JavaScript engine to insert an inline cache, hoping that it will prove beneficial when optimizing code later on.</p><p>Since each function now has a prototype that depends on the arity the function is defined with, it’s also more likely that the optimizing compiler has to insert type checks where it previously didn’t, because an arity-2 function is now considered to be a different type than an arity-3 function. In a tight loop, this might make a noticable difference in performance. At least, that’s the only explanation I can think of.</p><h4>Conclusion</h4><p>The goal was to speed up function calls by removing the need to perform an arity check first, trying instead to delegate to the JavaScript engine how to best perform the function call. It seems this proved harder to optimize than originally hoped.</p><p>In our next article we’ll be looking at a successful attempt at improving the runtime performance of Elm’s function calls.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=2ea4b86b5345" width="1" height="1" alt=""><hr><p><a href="https://blogg.bekk.no/making-elms-a-functions-more-predictable-a-failed-experiment-2ea4b86b5345">Making Elm’s A-functions more predictable, a failed experiment</a> was originally published in <a href="https://blogg.bekk.no">Bekk</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Removing Elm’s F-functions, a failed experiment]]></title>
            <link>https://blogg.bekk.no/removing-elms-f-functions-a-failed-experiment-357599e9c4d7?source=rss-5bd4064179a5------2</link>
            <guid isPermaLink="false">https://medium.com/p/357599e9c4d7</guid>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[elm]]></category>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[compilers]]></category>
            <dc:creator><![CDATA[Robin Heggelund Hansen]]></dc:creator>
            <pubDate>Thu, 25 Nov 2021 06:02:40 GMT</pubDate>
            <atom:updated>2021-11-25T06:02:40.473Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*YP9YMat7oullK19k" /><figcaption>Photo by <a href="https://unsplash.com/@sigmund?utm_source=medium&amp;utm_medium=referral">Sigmund</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>To make curried functions fast, Elm makes use of two helper functions. One set of functions, the F-functions, are used when defining functions. The other set of functions, the A-functions, are used when applying/calling functions.</p><p>Ideally, we wouldn’t need any of them. Both sets of functions cause a level of indirection that make it harder for the JavaScript engine to optimize. It also increases the size of the generated code.</p><p>In this article, we’ll see if it is possible to remove Elm’s usage of F-functions, and how that affects performance.</p><h4>What’s in a function?</h4><p>In JavaScript, everything is an object. This also applies to functions. When starting this exploration I questioned if the function object contained any properties or methods that might help me get rid of F-functions.</p><p>It might be a good idea to refresh our knowledge of what a F-function does. The implementation for F3 is as follows:</p><pre>function F3(fun) {<br>  return F(3, fun, function(a) {<br>    return function(b) { <br>      return function(c) { return fun(a, b, c); }; <br>    };<br>  });<br>}</pre><p>The implementation for F is:</p><pre>function F(arity, fun, wrapper) {<br>  wrapper.a = arity;<br>  wrapper.f = fun;<br>  return wrapper;<br>}</pre><p>F-functions create a curried version of a function, but stores both the arity and the original function on the curried version so that we, at runtime, can perform a direct function call if possible.</p><p>We’re going to need two things. One way of knowing the arity of the function at runtime, and one way of partially applying a function. Luckily, the function object provides the capabilities to do just that.</p><p>The Mozilla developer network (MDN) documents the function object nicely. Here, we find that Function.length gives us the arity of the function. We also learn that Function.prototype.bind can be used for partial application.</p><h4>Implementation</h4><p>F-functions can all be removed. To make this work, we need to change the implementation of A-functions. The current implementation of A3 is:</p><pre>function A3(fun, a, b, c) {<br>  return fun.a === 3 ? fun.f(a, b, c) : fun(a)(b)(c);<br>}</pre><p>If the function takes three arguments, and is called with three arguments, call the original function directly. Otherwise, perform partial application. In an F-less world, it could be implemented as:</p><pre>function A3(fun, a, b, c) {<br>  return fun.length === 3 ? fun(a, b, c) : A2(A1(fun, a), b, c);<br>}</pre><p>You might be surprised by the last part of this implementation. Why perform an A1 followed by an A2? Since we only know that the function doesn&#39;t take exactly three arguments, we need to consider that the function could take one argument, and return a function that takes two arguments.</p><p>The implementation of A1 is:</p><pre>function A1(fun, a) {<br>  return fun.length === 1 ? fun(a) : fun.bind(null, a);<br>}</pre><p>A1 will either call the function, or partially apply it. A2 will either call, or partially apply through another call to A1, the result of the A1 function call.</p><p>There are two additional things to keep in mind. In our implementation, we’re creating new functions through partial application whenever we cannot call a function directly. In the original implementation, all the partially applied steps are performed when the function is defined. This means that the JavaScript engine might end up performing more work to partially apply functions in our version.</p><p>Finally, the original implementation has no need for A1, as all functions can always be called with a single argument. This means that single-arity functions might be more expensive in our implementation.</p><h4>How does it perform?</h4><p>In our micro-benchmark, defined in the previous article, we get these numbers:</p><pre>Micro benchmark<br><br>0 arg<br>   safari, original         358,689 runs/sec <br>   safari, altered          399,651 runs/sec  (111%)<br>   <br>   firefox, original        43,995 runs/sec <br>   firefox, altered         55,188 runs/sec  (125%)<br>   <br>   chrome, original         367,836 runs/sec <br>   chrome, altered          49,127 runs/sec  (13%)<br><br><br>1 arg<br>   safari, original         54,808 runs/sec <br>   safari, altered          71,638 runs/sec  (131%)</pre><pre>   firefox, original        24,704 runs/sec <br>   firefox, altered         33,401 runs/sec  (135%)<br>   <br>   chrome, original         110,183 runs/sec <br>   chrome, altered          28,236 runs/sec  (26%)<br><br><br>2 args<br>   safari, original         55,041 runs/sec <br>   safari, altered          42,223 runs/sec  (77%)<br>   <br>   firefox, original        27,288 runs/sec <br>   firefox, altered         28,942 runs/sec  (106%)<br>   <br>   chrome, original         29,486 runs/sec <br>   chrome, altered          24,315 runs/sec  (82%)</pre><p>The results are mixed. Chrome doesn’t seem to handle this new implementation very well, but the other browsers are sometimes faster, sometimes slower.</p><p>A micro-benchmark can only give us an indication of how an optimization will perform in the real world, however. This can be helpful while developing the changes, as a micro-benchmark will run quicker, but to pass a final verdict on whether the optimization is better or worse, we need a more realistic benchmark.</p><p>When running a benchmark that creates a small HTML structure, we get the following:</p><pre>Create a 4 level nested html tree</pre><pre>   safari, original         49,455 runs/sec <br>   safari, altered          15,628 runs/sec  (32%)<br>   <br>   firefox, original        27,775 runs/sec <br>   firefox, altered         20,500 runs/sec  (74%)<br>   <br>   chrome, original         44,796 runs/sec <br>   chrome, altered          14,793 runs/sec  (33%)</pre><p>The results are consistently bad.</p><p>Why is it slower? A reasonable guess is that when a direct function call cannot be made, the partial application path is more complex, and therefore slower. Each Ax call performs an arity check and either triggers a direct call or a partial application, whereas in the original implementation we just perform a series of single-arity function calls without any ifs or buts.</p><p>In addition, we also have a A1 call where we didn&#39;t before. This is likely to have an impact on the generated code in general.</p><h4>Alternative implementation</h4><p>One thing that might make our implementation slower, is that we need to delegate to additional A-function calls to perform partial application. We can reduce this somewhat by using a switch instead of a ternary operation:</p><pre>function A3(fun, a, b, c) {<br>  switch (fun.length) {<br>    case 1: return A2(fun(a), b, c);<br>    case 2: return A1(fun(a, b), c);<br>    case 3: return fun(a, b, c);<br>    default: return fun.bind(null, a, b, c);<br>  }<br>}</pre><p>However, this turns out to be slower. First, the micro-benchmark results:</p><pre>0 arg<br>   safari, original       375,487 runs/sec <br>   safari, altered        118,251 runs/sec  (31%)<br>   <br>   firefox, original      38,178 runs/sec <br>   firefox, altered       50,135 runs/sec  (131%)<br>   <br>   chrome, original       360,408 runs/sec <br>   chrome, altered        48,727 runs/sec  (14%)<br><br><br>1 arg<br>   safari, original       59,749 runs/sec <br>   safari, altered        69,867 runs/sec  (117%)<br>   <br>   firefox, original      23,462 runs/sec <br>   firefox, altered       33,109 runs/sec  (141%)<br>   <br>   chrome, original       109,235 runs/sec <br>   chrome, altered        28,514 runs/sec  (26%)<br><br><br>2 args<br>   safari, original       56,593 runs/sec <br>   safari, altered        65,713 runs/sec  (116%)<br>   <br>   firefox, original      25,849 runs/sec <br>   firefox, altered       37,064 runs/sec  (143%)<br>   <br>   chrome, original       29,875 runs/sec <br>   chrome, altered        26,956 runs/sec  (90%)</pre><p>Then the html-benchmark results:</p><pre>   safari, original        47,307 runs/sec <br>   safari, altered         13,971 runs/sec  (30%)<br>   <br>   firefox, original       28,245 runs/sec <br>   firefox, altered        19,993 runs/sec  (71%)<br>   <br>   chrome, original        45,198 runs/sec <br>   chrome, altered         15,183 runs/sec  (34%)</pre><p>A switch is a more complicated way of branching compared to a simple ternary operation. Even if the correct branch in a switch can trigger a less complex path, the switch itself is more complicated for the JavaScript engine to optimize.</p><h4>Conclusion</h4><p>It is entirely possible to remove the F-functions from Elm. This would result in lower asset size and reduced memory consumption, as less information is allocated per function. However, it comes at a steep performance cost.</p><p>It seems that having these F-functions is better for the performance of Elm applications overall.</p><p>In our next article, we’ll try to see if we can remove A-functions.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=357599e9c4d7" width="1" height="1" alt=""><hr><p><a href="https://blogg.bekk.no/removing-elms-f-functions-a-failed-experiment-357599e9c4d7">Removing Elm’s F-functions, a failed experiment</a> was originally published in <a href="https://blogg.bekk.no">Bekk</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How Elm functions work]]></title>
            <link>https://blogg.bekk.no/how-elm-functions-work-71cab7426a2f?source=rss-5bd4064179a5------2</link>
            <guid isPermaLink="false">https://medium.com/p/71cab7426a2f</guid>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[elm]]></category>
            <category><![CDATA[compilers]]></category>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[technology]]></category>
            <dc:creator><![CDATA[Robin Heggelund Hansen]]></dc:creator>
            <pubDate>Wed, 24 Nov 2021 06:02:35 GMT</pubDate>
            <atom:updated>2021-11-24T06:02:35.623Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*0PeoAneujcz3vox-" /><figcaption>Photo by <a href="https://unsplash.com/@avirichards?utm_source=medium&amp;utm_medium=referral">Avi Richards</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>Before we can explore how Elm function calls can be improved, we first have to understand how they work. This last sentence might seem strange to you. Functions are functions, right? Well, not quite.</p><p>Elm has an interesting property known as automatic currying. To curry a function means to transform it so that it only takes a single argument and returns a function which takes the second argument, and so on until the last argument is provided.</p><p>This might be easier to grasp when looking at code. If we have a function like this:</p><pre>add3 a b c =<br>  a + b + c</pre><p>The curried version would look like:</p><pre>add3 =<br>  \a -&gt;<br>    \b -&gt;<br>      \c -&gt; <br>        a + b + c</pre><p>Since Elm has automatic currying, <em>all</em> functions work this way. Similarly, when you write a function call, like this:</p><pre>add3 1 2 3</pre><p>It’s the same as writing:</p><pre>(((add3 1) 2) 3)</pre><p>Since this is not an Elm tutorial, I’m not going to delve into why this is a good thing. What I will say is that this makes it easy to create a shorter version of any given function by pre-applying default arguments. If you, as an example, wanted a function that adds two numbers together and then increments it by one, it could be defined as simply as:</p><pre>addAndIncrement = add3 1</pre><h4>Optimization opportunities</h4><p>If you’re thinking that performing multiple function calls when calling a multiple-arity function leads to slow execution, you’d be right. There are two main reasons for this:</p><ol><li>Even though function calls are very fast on modern CPU’s, it still carries some overhead. Performing three function calls when you could have performed one is going to be slower.</li><li>Functions returning functions will be hard for the Javascript engine to inline, which might lead to missed optimization opportunities.</li></ol><p>The Elm compiler does have a trick up its sleeve. The add3 function we defined above is compiled to the following JavaScript:</p><pre>var $author$package$module$add3 = F3(function(a, b, c) {<br>  return a + b + c;<br>});</pre><p>The $author$package$module prefix to the function name is the namespace the function is in. The Elm compiler outputs all functions in an application into the same scope, and uses the fully qualified name when referencing functions. Normally, JavaScript applications store namespaces in objects. By outputting all functions in the same scope, we avoid dynamic lookups of functions and constants, which should be faster. It also leads to smaller JavaScript bundles as there is no boilerplate for namespaces, and the fully qualified function names will minify very well.</p><p>What is interesting, though, is the F3 call that surrounds the function definition. F3 is defined as follows:</p><pre>function F3(fun) {<br>  return F(3, fun, function(a) {<br>    return function(b) { <br>      return function(c) { return fun(a, b, c); }; <br>    };<br>  });<br>}</pre><p>F3 performs a call to F, providing the number of arguments, the original function and a curried version of that function. The definition of F is:</p><pre>function F(arity, fun, wrapper) {<br>  wrapper.a = arity;<br>  wrapper.f = fun;<br>  return wrapper;<br>}</pre><p>Every time you define an Elm function, that function will be wrapped in a call to Fx where x is the number of arguments that function accepts. Fx will then create a curried version of the provided function, and attach the arity and original function to that curried version.</p><p>When calling the add3 function, the following JavaScript code will be emitted:</p><pre>A3($author$package$module$add3, 1, 2, 3);</pre><p>A3 is defined as:</p><pre>function A3(fun, a, b, c) {<br>  return fun.a === 3 ? fun.f(a, b, c) : fun(a)(b)(c);<br>}</pre><p>Ax functions checks if the provided function object is defined to accept x number of arguments. If it is, it calls the original function, otherwise it performs multiple function calls on the curried version.</p><p>The Elm compiler will output eight Fx and eight Ax functions, where x can be any number from two to nine. Single-arity functions naturally doesn&#39;t require to be curried, and functions that takes in more than nine arguments, are compiled directly to a curried function.</p><p>An interesting thing to note here is that if the curried path is taken, further calls on the returned function will never take the fast path. The addAndIncrement function we defined earlier will be emitted like:</p><pre>var $author$package$module$addAndIncrement = $author$package$module$add3(1);</pre><p>The return value of the single-arity call to $author$package$module$add3 will return a curried function which has not been passed to a Fx function, so when we later call addAndIncrement with two arguments, it will always perform two function calls.</p><p>There are other pitfalls of this approach:</p><ul><li>We store an additional (curried) function and the arity along with every function definition, which increases memory usage and startup time.</li><li>Functions defined within functions are more expensive to create as they require the allocation of a function object every time the surrounding function is executed.</li><li>The arity test at every function call is extra overhead, and might prevent inlining.</li><li>Wrapping most function calls in Ax almost doubles the call stack. This makes stack traces harder to read, and makes it easier to trigger a stack overflow exception.</li><li>When the original function can be called with the supplied arguments, it retrieves the function from the fn property. This is a dynamic lookup, and is slower than referencing the function directly.</li></ul><p>Even with all of these pitfalls, the performance improvement it provides by reducing the number of function calls is worth it. As an extra benefit, it reduces the size of a compiled program when compared to doing all currying inline.</p><h4>Taking a more direct approach</h4><p>At this point, you might be wondering why, since Elm is statically typed, the compiler cannot generate code which always ends up calling a non-curried version of a function?</p><p>This can be done in some cases and is, in fact, one of the most important transformations performed by elm-optimize-level-2, but it cannot always be done safely.</p><p>Consider the following function signature:</p><pre>map : (a -&gt; b) -&gt; List a -&gt; List b</pre><p>At first glance, it might seem that map requires the first argument to be a single-arity function, but that isn&#39;t actually true. a and b are generic variables, and so can represent <em>any</em> type, even functions. A legal specialization of map is:</p><pre>map : (a -&gt; (x -&gt; y -&gt; z)) -&gt; List a -&gt; List (x -&gt; y -&gt; z)</pre><p>Which means that map can be called like this:</p><pre>map add3 [ 1, 2, 3 ]</pre><p>As you know, add3 is not a single-arity function and so map cannot be compiled in a way that assumes the first argument is a single-arity function.</p><p>That being said, in many cases you can be certain about the exact arity of a function. elm-optimize-level-2 performs a transformation that performs a direct function call if the function in question is a direct reference. For the following function call:</p><pre>A3($author$package$module$add3, 1, 2, 3);</pre><p>We can be certain that $author$package$module$add3 is defined to take 3 arguments, and so it&#39;s safe to convert the code to the equivalent of:</p><pre>$author$package$module$add3.f(1, 2, 3);</pre><h4>Can we do better?</h4><p>While elm-optimize-level-2 can improve function calls whenever a function is referenced directly, it does little in the case where functions are provided as arguments to a function (also known as higher-order functions).</p><p>In Elm, it is typical to abstract looping code into higher-order functions, and since loops is where you typically want good performance, being able to optimize function calls in these situations would be a worthwhile endeavour.</p><p>To determine the success or failure of an experiment, I’ve used the following benchmark:</p><pre>suite : Benchmark<br>suite =<br>    describe &quot;Partial application&quot; &lt;|<br>        let<br>            listOfNumbers =<br>                List.range 0 1000<br>        in<br>        [ benchmark &quot;0 arg&quot; &lt;|<br>            \_ -&gt;<br>                List.foldl fold0 0 listOfNumbers<br>        , benchmark &quot;1 arg&quot; &lt;|<br>            \_ -&gt;<br>                List.foldl (fold1 1) 0 listOfNumbers<br>        , benchmark &quot;2 args&quot; &lt;|<br>            \_ -&gt;<br>                List.foldl (fold2 3 5) 0 listOfNumbers<br>        , benchmark &quot;3 args&quot; &lt;|<br>            \_ -&gt;<br>                List.foldl (fold3 3 5 10) 0 listOfNumbers<br>        ]<br><br><br>fold0 a b =<br>    a + b<br><br><br>fold1 a b c =<br>    a + b + c<br><br><br>fold2 a b c d =<br>    (a + b) + (c + d)<br><br><br>fold3 a b c d e =<br>    (a + b) + (c + d) + e</pre><p>Here, we pass different functions to List.foldl to make sure the JavaScript engine generates code that takes every code path into account. Except for fold0, these functions will end up being called in a curried fashion.</p><p>The direct call optimization of elm-optimize-level-2 has been verified to make no difference for this particular benchmark.</p><h4>Are you ready?</h4><p>Over the next couple of articles, we’ll explore several of my own attempts to optimize function calls. Most of these attempts failed, but in the end I did find something that improves Elm’s runtime performance.</p><p>You might be wondering why I’ll spend time presenting you failed experiments at all. There are two main reasons:</p><ol><li>Hopefully, this might save people time in the future when they know what works and what doesn’t.</li><li>It should give people an impression of how much work performance optimization really is, and how to do it correctly.</li></ol><p>In the next article, we’ll attempt to improve Elm’s performance and reduce the asset size of compiled programs, by removing the need of wrapping Elm functions with Fx functions.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=71cab7426a2f" width="1" height="1" alt=""><hr><p><a href="https://blogg.bekk.no/how-elm-functions-work-71cab7426a2f">How Elm functions work</a> was originally published in <a href="https://blogg.bekk.no">Bekk</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Discovering JavaScript performance opportunities]]></title>
            <link>https://blogg.bekk.no/discovering-javascript-performance-opportunities-8c1718fd6c42?source=rss-5bd4064179a5------2</link>
            <guid isPermaLink="false">https://medium.com/p/8c1718fd6c42</guid>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[compilers]]></category>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[elm]]></category>
            <dc:creator><![CDATA[Robin Heggelund Hansen]]></dc:creator>
            <pubDate>Tue, 23 Nov 2021 06:02:42 GMT</pubDate>
            <atom:updated>2021-11-23T06:02:42.778Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*sW5xw-bDEQkPtXoh" /><figcaption>Photo by <a href="https://unsplash.com/@dkleu?utm_source=medium&amp;utm_medium=referral">Devin Kleu</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>In the previous article, we looked at certain techniques JavaScript engines use to generate fast code. In this article, we’ll take a look at how we can discover when code gets in the way of these techniques, making it harder for the JavaScript engine to optimize them.</p><h4>Benchmarks</h4><p>To make something faster, we need to know how fast it currently is, and how changes to the code affects that speed. This is more commonly known as benchmarking.</p><p>For our purposes, we’ll make use of elm-optimize-level-2 to write benchmarks. This tool takes the output of the Elm compiler and applies a series of JavaScript transformations on it to make it run faster. It also includes a framework for writing benchmarks, to measure the impact of a set of code transformations.</p><p>Alternatively, we could write a single page application using Brian Hick’s amazing elm-explorations/benchmark library. As the benchmarks are running, we can bring up the browser’s performance development tool to spot hot spots. In the case where we’re mostly concerned about the time it takes to run a benchmark, this isn’t as automated as the setup included in elm-optimize-level-2 .</p><p>Writing a benchmark is not as easy as it might seem, though. In our previous article, we mentioned that JavaScript engines perform speculative optimizations. If we’re not careful, those speculative optimizations can perform wonders in our benchmarks, even if they wouldn’t actually do anything in real-world code.</p><p>To demonstrate, let’s borrow an example from the previous article:</p><pre>function map(fn, list) {<br>    var newList = [];<br>    for (var i = 0; i &lt; list.length; i++) {<br>        newList.push(fn(list[i]));<br>    }<br>    <br>    return newList;<br>}<br><br>function incrementNumbers(list) {<br>    return map(function(n) { return n + 1; }, list);<br>}<br><br>incrementNumbers([1, 2, 3]); // returns [2, 3, 4]</pre><p>If map is always called with the same list and the same function, the optimizing compiler might decide to inline the fn parameter and make optimizations based on the list always having a length of three. In real-world code, however, both fn and list are likely to represent many different values, meaning the optimizing compiler cannot make as many optimizations to the code.</p><p>When writing benchmarks, we have to try to re-create a scenario that gives the JavaScript engine as close to a realistic view of the code as possible, or we might get the wrong results. Because of this, benchmarks testing real-world code is more trustworthy than synthetic ones.</p><h4>Deoptigate</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AjzBHaSDzyZEaPRVtux1Cg.png" /></figure><p>While running a benchmark can tell us how fast something is running, it fails to give us any insight into what could change to make it run faster. Fortunately for us, there is a tool that can help us with this, known as deoptigate.</p><p>Deoptigate is a node.js program that executes a JavaScript file while analyzing how the engine optimizes the code. The results are then displayed in a clean, visual interface.</p><p>Through deoptigate, we can see which functions have been optimized, which have been deoptimized, and the state of each inline cache in the code.</p><p>Only <em>hot</em> functions will be optimized by the JavaScript engine. If you find a function that isn’t optimized, this usually means that it hasn’t been run enough times as a part of the benchmark. This could mean that the benchmark is badly written, or that the function isn’t worth looking closer into. Do note that a function might not be optimized because it has been inlined most places where it is used. This means that it is optimized as part of the function that calls it, instead of being optimized itself.</p><p>A function being deoptimized usually means that the function was first optimized with a set of speculative optimizations, and that at least one of those speculations turned out to be wrong later in the program. This isn’t necessarily indicative of a problem in the code or the benchmark. If you’ve got a function that iterates every item in a list, you might see a deoptimization if it is mostly passed a list of integers but occasionally sees a list of floats. If you see that a function is deoptimized many times, however, it means that the JavaScript engine has a hard time making accurate assumptions. When this is the case, the engine might give up optimizing the function completely.</p><p>Inline caches keep track of what kind of objects are involved in a specific property access operation. Deoptigate will tell you if an inline cache is <em>monomorphic</em> (has only seen one type), <em>polymorphic</em> (has seen more than one type) or <em>megamorphic</em> (has seen too many types to be of use). In general, you want inline caches to be monomorphic in order for the optimizing compiler to generate the fastest code. If you can help it, you definitely want to avoid the megamorphic state.</p><h4>Inlining</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SsR6pEv8TNxJLUQOulyvUw.png" /></figure><p>What deoptigate doesn’t tell us, is which functions (if any) is inlined as part of optimizing a function. We can get this information by looking at the optimized function log of V8. To get this log, we can execute the benchmark using the following node command (works with node v14, not in later versions):</p><pre>node --print-opt-code benchmark.js</pre><p>This will print <em>a lot</em> of information, much of which is cryptic unless you’re familiar with V8 internals. But we do get inlining information, as well as the actual machine instructions that is being generated. We’ll take a closer look at this log in future articles as we attempt to explain why certain code is faster.</p><h4>V8 specific tools</h4><p>Both deoptigate and viewing inlining information is specific to Chrome’s JavaScript engine, V8. Throughout the following articles, we’ll use Chrome/Node to run most of our benchmarks, and then verify that the results apply to all browsers at the end. I’m personally not willing to create optimizations that make code run faster in one JavaScript engine, at the detriment of another.</p><h4>Ready to go?</h4><p>We now have the knowledge and tools we need to attempt speeding up Elm’s runtime performance.</p><p>In the next article, we’ll dive deeper into one of the most common operations in Elm programs: function calls.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8c1718fd6c42" width="1" height="1" alt=""><hr><p><a href="https://blogg.bekk.no/discovering-javascript-performance-opportunities-8c1718fd6c42">Discovering JavaScript performance opportunities</a> was originally published in <a href="https://blogg.bekk.no">Bekk</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How JavaScript engines achieve great performance]]></title>
            <link>https://blogg.bekk.no/how-javascript-engines-achieve-great-performance-fb0b36601557?source=rss-5bd4064179a5------2</link>
            <guid isPermaLink="false">https://medium.com/p/fb0b36601557</guid>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[compilers]]></category>
            <category><![CDATA[elm]]></category>
            <dc:creator><![CDATA[Robin Heggelund Hansen]]></dc:creator>
            <pubDate>Mon, 22 Nov 2021 11:02:09 GMT</pubDate>
            <atom:updated>2021-11-22T11:02:09.078Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*4gqV8jp6zZIlKEpt" /><figcaption>Photo by <a href="https://unsplash.com/@lozt?utm_source=medium&amp;utm_medium=referral">Hoover Tung</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>JavaScript is an impressive technology. Not because it’s particularly well-designed (it isn’t). Not because almost every single consumer device with internet access in the world has executed a JavaScript program. Instead, JavaScript is impressive because almost every single feature of the language makes it a nightmare to optimize and yet, it is fast.</p><p>Think about it. There is no type information. Every single object can gain and lose properties over the lifetime of the program. There are six(!) different kinds of falsy values, and every number is a 64-bit float. As if that wasn’t enough, JavaScript is expected to execute quickly, so you can’t spend a lot of time analyzing and optimizing it either.</p><p>And yet, JavaScript is fast.</p><p>How can this be?</p><p>In this article, we’re going to look closer at a few techniques that different JavaScript engines use to achieve good runtime performance. Keep in mind that I’m purposefully leaving a few details out, and simplifying things. It’s not a goal of this article that you learn how things work <em>exactly</em>, but that you understand enough to comprehend the theory behind the experiments we’ll perform later in this series.</p><h4>The execution model</h4><p>When your browser downloads JavaScript, its top priority is to get it running as quickly as possible. It does this by translating the code to bytecode, virtual machine instructions, which is then handed over to an interpreter, or virtual machine, that understands how to execute them.</p><p>You might question why the browser would convert JavaScript to <em>virtual</em> machine instructions instead of <em>actual</em> machine instructions. It’s a good question. In fact, converting straight to machine instructions is what V8 (Chrome’s JavaScript engine) used to do until recently.</p><p>A virtual machine for a specific programming language is usually an easier compilation target because it has a closer relation to the source language. An actual machine has a much more generic instruction set, and so it requires more work to translate the programming language to work well with those instructions. This difficulty means compilation takes longer, which again means it takes longer for the JavaScript to start executing.</p><p>As an example, a virtual machine that understands JavaScript is also likely to understand JavaScript objects. Because of this, the virtual instructions required to execute a statement like object.x might be one or two instructions. An actual machine, with no understanding of how JavaScript objects work, will need a lot more instructions to figure out where .x resides in memory and how to get it.</p><p>The problem with a virtual machine is that it is, well, virtual. It doesn’t exist. The instructions cannot be executed directly, but must be interpreted at runtime. Interpreting code will always be slower than executing code directly.</p><p>There’s a tradeoff here. Faster compilation time versus faster runtime. In many cases, faster compilation is a good tradeoff to make. The user is unlikely to care whether a single button click takes 20 or 40 milliseconds to execute, especially if the button is only pressed once. Compiling the JavaScript quickly, even if the resulting code is slower to execute, will let the user see and interact with the page faster.</p><p>There are situations that are computationally expensive. Stuff like games, syntax highlighting or calculating the fizzbuzz string of a thousand numbers. In these cases, the combined time of compiling and executing machine instructions is likely to reduce the total execution time. So how does JavaScript handle these kinds of situations?</p><h4>Hot code</h4><p>Whenever the JavaScript engine detects that a function is hot (that is, executed many times) it hands that function over to an optimizing compiler. This compiler translates the virtual machine instructions into actual machine instructions. What’s more, since the function has already been run several times, the optimizing compiler can make several assumptions based on previous runs. In other words, it can perform speculative optimizations to make even faster code.</p><p>What happens if, later on, these speculations turn out to be wrong? The JavaScript engine can simply delete the optimized, but wrong, function, and revert to using the unoptimized version. Once the function has been run several more times, it can attempt to pass it to the optimizing compiler again, this time with even more information that it can use for speculative optimizations.</p><p>Now that we know that frequently run functions use information from previous executions during optimization, the next thing to explore is what kind of information this is.</p><h4>A problem of translation</h4><p>Almost everything in JavaScript is an object. Unfortunately, JavaScript objects are tricky things to teach a machine to deal with. Let’s look at the following code:</p><pre>function addFive(obj) {<br>    return obj.method() + 5;<br>}</pre><p>A function is pretty straightforward to translate to machine instructions, as is returning from a function. But a machine doesn’t know what objects are, so how would you translate accessing the method property of obj?</p><p>It would help to know what obj looks like, but in JavaScript we can never really be certain. Any object can have a method property added to, or removed from, it. Even when it does exist, we cannot actually be certain if it is a function, much less what calling it returns.</p><p>Let’s attempt to translate the above code to a subset of JavaScript that doesn’t have objects, to get an idea of what translating to machine instructions might be like.</p><p>First, we need a way to represent objects. We also need a way to retrieve values from one. Arrays are trivial to support in machine code, so we might go with a representation like this:</p><pre>// An object like { method: function() {} }<br>// could be represented as:<br>// [ [ &quot;method&quot; ], // property names<br>//   [ function() {} ] ] // property values<br><br>function lookup(obj, name) {<br>  for (var i = 0; i &lt; obj[0].length; i++) {<br>    if (obj[0][i] === name) return i;<br>  }<br><br>  return -1;<br>}</pre><p>With this, we can attempt to make a naive implementation of addFive :</p><pre>function addFive(obj) {<br>  var propertyIndex = lookup(obj, &quot;method&quot;);<br>  var property = propertyIndex &lt; 0 <br>      ? undefined <br>      : obj[1][propertyIndex];<br><br>  if (typeof(property) !== &quot;function&quot;) {<br>      throw NotAFunction(obj, &quot;method&quot;);<br>  }</pre><pre>  var callResult = property(/* this */ obj);<br>  return callResult + 5;<br>}</pre><p>Of course, this doesn’t work in the case where obj.method() returns a something other than a number, so we need to tweak the implementation a little:</p><pre>function addFive(obj) {<br>  var propertyIndex = lookup(obj, &quot;method&quot;);<br>  var property = propertyIndex &lt; 0 <br>      ? undefined <br>      : obj[1][propertyIndex];<br><br>  if (typeof(property) !== &quot;function&quot;) {<br>      throw NotAFunction(obj, &quot;method&quot;);<br>  }</pre><pre>  var callResult = property(/* this */ obj);<br>  if (typeof(callResult) === &quot;string&quot;) {<br>      return stringConcat(callResult, &quot;5&quot;);<br>  } else if (typeof(callResult !== &quot;number&quot;) {<br>      throw NotANumber(callResult);<br>  }<br>  <br>  return callResult + 5;<br>}</pre><p>This would work, but I hope it’s apparent that this code could skip a few steps (and thus be faster) if we could somehow know ahead of time what the structure of obj is, and what the type of method is.</p><h4>Hidden classes</h4><p>All the major JavaScript engines keep track of an object’s shape in some way. In Chrome, this concept is known as hidden classes. It’s what we will call it in this article as well.</p><p>Let’s start by looking at the following snippet of code:</p><pre>var obj = {}; // empty object<br>obj.x = 1; // shape has now changed to include a `x` property<br>obj.toString = function() { return &quot;TODO&quot;; }; // shape changes<br>delete obj.x; // shape changes again</pre><p>If we were to translate this to machine instructions, how would we keep track of the object’s shape as new properties are added and removed? If we use the previous example’s idea of representing objects as arrays, it might look something like this:</p><pre>var emptyObj__Class = [ <br>  null, // No parent hidden class<br>  [],   // Property names<br>  []    // Property types<br>];<br><br>var obj = [ <br>  emptyObj__Class, // Hidden class of `obj`<br>  []               // Property values<br>];<br><br>var obj_X__Class = [ <br>  emptyObj__Class, // Contains same properties as empty object<br>  [&quot;x&quot;],           // As well as one property called `x`<br>  [&quot;number&quot;]       // Where `x` is a number<br>];<br><br>obj[0] = obj_X__Class; // Shape changes<br>obj[1].push(1);        // value of `x`<br><br>var obj_X_ToString__Class = [<br>  obj_X__Class, // Contains same properties as previous shape<br>  [&quot;toString&quot;], // And one property called `toString`<br>  [&quot;function&quot;]  // Where `toString` is a function<br>];<br><br>obj[0] = obj_X_ToString__Class;             // shape change<br>obj[1].push(function() { return &quot;TODO&quot;; }); // `toString` value<br><br>var obj_ToString__Class = [<br>  null, // Starting from scratch when deleting `x`<br>  [&quot;toString&quot;], <br>  [&quot;function&quot;] <br>];<br><br>obj[0] = obj_ToString__Class;<br>obj[1] = [obj[1][1]];</pre><p>If we were to generate virtual machine instructions like this, we now would have a way to track what an object looks like at any given time. However, this on its own doesn’t really help us. We need to store this information somewhere where it would be valuable.</p><h4>Inline caches</h4><p>Whenever JavaScript code performs property access on an object, the JavaScript engine stores that object’s hidden class, as well as the result of the lookup (the mapping of property name to index) in a cache. These caches are known as inline caches, and they serve two important purposes:</p><ul><li>When executing bytecode, they speed up property access <em>if</em> the object involved has a hidden class that is in the cache.</li><li>During optimization, they contain information about what type of objects that have been involved when accessing an object property, which helps the optimizing compiler generate code specially suited for those types.</li></ul><p>Inline caches have a limit on how many hidden classes they store information on. This preserves memory, but also makes sure that performing lookups in the cache is fast. If retrieving an index from the inline cache takes longer than retrieving the index from the hidden class, the cache serves no purpose.</p><p>From what I can tell, inline caches will keep track of 4 hidden classes at most, at least in Chrome. After this, the inline cache will be disabled and the information will instead be stored in a global cache. The global cache is also limited in size, and once it has reached its limit, newer entries will overwrite older ones.</p><p>To best utilize inline caches, and aid the optimizing compiler, one should try to write functions that only perform property access on objects of a single type. More than that and the performance of the generated code will be sub-optimal.</p><h4>Inlining</h4><p>A separate, but significant, kind of optimization is inlining. In short, this optimization replaces a function call with the implementation of the called function. An example:</p><pre>function map(fn, list) {<br>    var newList = [];<br>    for (var i = 0; i &lt; list.length; i++) {<br>        newList.push(fn(list[i]));<br>    }<br>    <br>    return newList;<br>}<br><br>function incrementNumbers(list) {<br>    return map(function(n) { return n + 1; }, list);<br>}<br><br>incrementNumbers([1, 2, 3]); // returns [2, 3, 4]</pre><p>After inlining, the code might end up looking something like this:</p><pre>function incrementNumbers(list) {<br>    var newList = [];<br>    var fn = function(n) { return n + 1; };<br>    for (var i = 0; i &lt; list.length; i++) {<br>        newList.push(fn(list[i]));<br>    }</pre><pre>    return newList;<br>}<br><br>incrementNumbers([1, 2, 3]); // returns [2, 3, 4]</pre><p>One benefit of this is that a function call has been removed. An even bigger benefit is that the JavaScript engine now has even more insight into what the function actually does. Based on this new version, the JavaScript engine might decide to perform inlining again:</p><pre>function incrementNumbers(list) {<br>    var newList = [];<br>    for (var i = 0; i &lt; list.length; i++) {<br>        newList.push(list[i] + 1);<br>    }<br>	<br>    return newList;<br>}<br><br>incrementNumbers([1, 2, 3]); // returns [2, 3, 4]</pre><p>Another function call has been removed. What’s more, the optimizer might now speculate that incrementNumbers is only ever called with a list of numbers as an argument. It might also decide to inline the incrementNumbers([1, 2, 3]) call itself, and discover that list.length is 3, which again might lead to:</p><pre>var list = [1, 2, 3];<br>var newList = [];<br>newList.push(list[0] + 1);<br>newList.push(list[1] + 1);<br>newList.push(list[2] + 1);<br>list = newList;</pre><p>In short, inlining enables optimizations that would not have been possible to perform across function boundaries.</p><p>There are limits to what can be inlined, however. Inlining can lead to larger functions due to code duplication, which requires additional memory. The JavaScript engine has a budget on how big a function can get before it skips inlining altogether.</p><p>Some function calls are also difficult to inline. Particularly when a function is passed in as an argument.</p><p>In addition, functions passed as arguments can be difficult to inline unless it’s always the same function. While that might strike you as a strange thing to do, it might end up being the case because of inlining.</p><h4>Conclusion</h4><p>JavaScript engines have many tricks to improve runtime performance, many more than what has been covered here. However, the optimizations described in this article apply to most browsers, and are easy to verify if they’re being applied. Because of this, we’re mainly going to be focusing on these optimizations when we attempt to improve Elm’s runtime performance.</p><p>But before we start trying to optimize anything, we need a way of identifying what code can be improved. The tools that give us this information, is the topic of the next article.</p><h4>Further references</h4><p>I’m not the first to try to explain how JavaScript engines work. Here are a few articles that go more in depth, and which have a different way of explaining similar concepts:</p><ul><li><a href="https://mrale.ph/blog/2015/01/11/whats-up-with-monomorphism.html">What’s up with monomorphism</a></li><li><a href="https://mathiasbynens.be/notes/shapes-ics">Shapes and inline caches</a></li><li><a href="https://mathiasbynens.be/notes/prototypes">Optimizing prototypes</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=fb0b36601557" width="1" height="1" alt=""><hr><p><a href="https://blogg.bekk.no/how-javascript-engines-achieve-great-performance-fb0b36601557">How JavaScript engines achieve great performance</a> was originally published in <a href="https://blogg.bekk.no">Bekk</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>