Challenges in a single-page application - bundling strategies

In this post, part 3 of this series, I'm going to explore how the bundling of script files can be optimized to improve the performance of a single-page application.

Other posts in this series

Part 1 - Challenges in a (really) large single-page application - how to optimize script downloads
Part 2 - Challenges in a single-page application - Analytics

We start off with a question.

Why do we need to bundle script files?

When building a website, there are a few constraints imposed on the developer, such as network bandwidth and the processing capabilities of the client browser and machine. Bundling JavaScript files helps reduce the usage of these constrained resources through multiple desirable effects.

Reduction in network calls

When you have a lot of files, the browser has to make a separate network call for each and every one of them. Each network call has overheads associated with it, such as DNS lookup, latency to the server, time to do an SSL handshake, etc. Moreover, most popular browsers impose limits on the maximum number of network calls that can be made in parallel to a single domain.

By bundling multiple JavaScript files into a single file that is served by the website, the network overhead can be minimized, resulting in a faster website.

Better compression of HTTP traffic

Compression allows web servers to reduce the size of the files they serve to a browser. GZIP is the compression technique used by most web servers, since it is known to produce the best results for text resources such as JavaScript, CSS and HTML. It works by finding duplicate data fragments and replacing them with compact markers.

More details about GZIP here: https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/optimize-encoding-and-transfer?hl=en#text-compression-with-gzip

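What this means is that larger files tend to compress much better, since a bundle gives the compressor more repeated fragments to replace than each small file would on its own. This reduces the total bytes sent across the network.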
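To make this concrete, here is a small sketch that compares compressing a few files separately against compressing them as one bundle. It assumes a Node.js environment and uses its built-in zlib module; the three file contents and the function names are made up for illustration, and real numbers will vary with your code.

```javascript
const zlib = require("zlib");

// Hypothetical file contents: small AMD modules that share a lot of boilerplate.
const files = [
    'define(["dep1"], function (dep1) { return { run: function () { return dep1.start("a"); } }; });',
    'define(["dep1"], function (dep1) { return { run: function () { return dep1.start("b"); } }; });',
    'define(["dep1"], function (dep1) { return { run: function () { return dep1.start("c"); } }; });',
];

// Size in bytes of the gzip-compressed text.
function gzipSize(text) {
    return zlib.gzipSync(Buffer.from(text, "utf8")).length;
}

// Compare the total compressed size of separate files with one concatenated bundle.
function compareCompression(sources) {
    const separate = sources.reduce(function (sum, source) {
        return sum + gzipSize(source);
    }, 0);
    const bundled = gzipSize(sources.join("\n"));
    return { separate: separate, bundled: bundled };
}

const sizes = compareCompression(files);
console.log("separate:", sizes.separate, "bytes, bundled:", sizes.bundled, "bytes");
```

The bundled size comes out smaller because the repeated boilerplate is only stored once and the fixed per-file GZIP overhead is paid a single time.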

So how do we do this?

A typical large single-page application contains a lot of JavaScript files. For example, the Azure Portal that I'm currently helping build contains millions of lines of JavaScript partitioned into thousands of files. In this website, and other similarly large single-page applications, we can assume a few things to be true:

  • Some chunk of all JavaScript code is required at startup
  • As users interact with various features in the website, the corresponding JavaScript code is needed

What this means is that for optimal performance, all the files that enable startup functionality should be bundled together into a single file or a small number of files. Moreover, as users interact with various features of the website, only the code needed for those individual features should be downloaded, with the number of files kept to a minimum at each point.

What is a JavaScript module?

A module consists of self-contained code providing distinct functionality. This allows the functionality to be reused across the codebase without conflicts. Among other things, modules provide maintainability, namespacing and reusability.

Great post explaining modules: https://medium.freecodecamp.com/javascript-modules-a-beginner-s-guide-783f7d7a5fcc#.pdbq7fdbo

There are quite a few ways to implement modules, such as using the module pattern, creating object literals, AMD modules, CommonJS modules or native ES6 modules.

CommonJS modules

This is a module system that was originally designed for running JavaScript outside the context of a browser. Popular implementations include Node.js and Browserify.

Trivia fact - CommonJS was originally called ServerJS but then renamed to demonstrate broader applicability.

While this works great in server-side JavaScript, the synchronous nature of requiring modules is not very conducive to running in a browser, especially since network requests are asynchronous. What this means essentially is that for every require made from within a module, the code for all of the dependent modules must already have been downloaded to the client browser. Browserify is an attempt at addressing this problem.

AMD modules

Asynchronous module definition is a module system designed for the browser which provides an asynchronous method for loading modules. This allows for loading of multiple modules in parallel and also works well with async network calls. This makes lazy-loading of modules simple.

One of the disadvantages of this system is that defining AMD modules is quite verbose.

// file1.js
define(["dep1"], function (dep1) {
    // code within this module.
});

This is simplified by using languages that cross-compile to JavaScript such as TypeScript.

// file1.ts
import dep1 = require("dep1");
// code within this module.

When the TypeScript compiler is configured to emit AMD modules, the above TS code cross-compiles into JavaScript equivalent to the AMD module previously depicted. In addition to that, it lets you keep the synchronous-looking require syntax used in CommonJS. The most common implementation of AMD is RequireJS.

ES6 modules

This is a new module system that introduces new constructs into JavaScript itself. This makes static analysis simpler. More importantly, it is part of the ES standard. While this is not yet supported widely by browsers, when it is, it will offer significant advantages.

For example, static analysis makes it possible to remove unused exports. Here's how rollup.js is able to combine ES6 modules into a single bundle.

// dep.js
export function doStuff(a, b) {
    return a + b;
}

// file1.js
import { doStuff } from "./dep.js";
doStuff(1, 2);

These two files can be converted into a single file:

// bundle.js
function doStuff(a, b) {
    return a + b;
}
doStuff(1, 2);

What's the optimal strategy for bundling files? A case study.

Let's analyze an example web application with modules and dependencies as described below.

Web application - module graph

What's interesting here is that combinations of various modules may be required by multiple features. There are a few approaches one could take when figuring out how to assign modules to bundles.

Strategy 1 - each module is served as a single request to the client

This results in a large number of network calls - which doesn't scale for a large application.

Strategy 2 - all modules are combined into a single file and served as a single request

This approach isn't ideal either since it will result in a huge download on startup.

Strategy 3 - optimize bundles for a preferred feature

Let us consider module E. It is needed when the user interacts with features 1, 2 and 3. Let's say feature 1 is the most commonly used feature. In that case, an argument could be made that we should optimize for the least number of network calls for it. Here are the bundles generated for the modules in this case study.

Web application - module graph - bundling strategy 1

Pro: Only 1 network call needed to exercise feature 1
Con: If feature 2 or 3 is exercised, Bundle1 gets downloaded, and with it all the code for feature 1, even though most of it is not needed
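Here is a rough sketch of how bundles could be assigned when one feature is preferred. The module graph, the feature names and the helper functions are all hypothetical (only module E being shared by features 1, 2 and 3 comes from the case study), so treat it as an illustration of the trade-off rather than a real bundler.

```javascript
// Hypothetical graph: each module lists the modules it depends on.
const dependencies = { A: ["E"], B: ["E"], C: ["E"], E: [] };

// Hypothetical entry modules for each feature.
const featureEntries = { feature1: ["A"], feature2: ["B"], feature3: ["C"] };

// Collect every module reachable from the given entry modules.
function collect(entries, graph, into) {
    const seen = into || new Set();
    for (const name of entries) {
        if (seen.has(name)) {
            continue;
        }
        seen.add(name);
        collect(graph[name] || [], graph, seen);
    }
    return seen;
}

// The preferred feature gets everything it needs in one bundle. Other features
// get only what is left over, plus a dependency on the preferred bundle.
function bundleForPreferredFeature(preferred, entriesByFeature, graph) {
    const preferredModules = collect(entriesByFeature[preferred], graph);
    const plan = {};
    plan[preferred] = { bundle: Array.from(preferredModules).sort(), alsoNeeds: [] };
    for (const feature of Object.keys(entriesByFeature)) {
        if (feature === preferred) {
            continue;
        }
        const needed = Array.from(collect(entriesByFeature[feature], graph));
        const own = needed.filter(function (name) {
            return !preferredModules.has(name);
        }).sort();
        const sharesCode = needed.some(function (name) {
            return preferredModules.has(name);
        });
        plan[feature] = { bundle: own, alsoNeeds: sharesCode ? [preferred] : [] };
    }
    return plan;
}

const plan = bundleForPreferredFeature("feature1", featureEntries, dependencies);
console.log(JSON.stringify(plan, null, 4));
```

In this made-up graph, feature 1 is served by a single bundle containing A and E, while features 2 and 3 each have to pull in that bundle, including module A, which they never use.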

Strategy 4 - split common sub-graphs into separate bundles

In employing this strategy, we analyze the graph of modules and create a separate bundle for every subset of modules that is needed by multiple features. This is what the bundles generated by such a strategy may look like.
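One way to sketch this grouping is to work out, for every module, the set of features that need it, and then put modules with the same feature set into the same bundle. The graph and feature names below are hypothetical, and this is a simplified illustration, not the algorithm used in the Azure Portal.

```javascript
// Hypothetical graph: each module lists the modules it depends on.
const dependencies = {
    A: ["E"],
    B: ["E"],
    C: ["E", "F"],
    D: ["F"],
    E: [],
    F: [],
};

// Hypothetical entry modules for each feature.
const featureEntries = {
    feature1: ["A"],
    feature2: ["B"],
    feature3: ["C"],
    feature4: ["D"],
};

// Walk the graph and return every module reachable from the entry modules.
function reachableModules(entries, graph) {
    const seen = new Set();
    const stack = entries.slice();
    while (stack.length > 0) {
        const name = stack.pop();
        if (seen.has(name)) {
            continue;
        }
        seen.add(name);
        for (const dep of graph[name] || []) {
            stack.push(dep);
        }
    }
    return seen;
}

// Group modules by the exact set of features that need them.
function splitByFeatureSet(entriesByFeature, graph) {
    const featuresByModule = new Map();
    for (const feature of Object.keys(entriesByFeature).sort()) {
        for (const name of reachableModules(entriesByFeature[feature], graph)) {
            if (!featuresByModule.has(name)) {
                featuresByModule.set(name, []);
            }
            featuresByModule.get(name).push(feature);
        }
    }
    const bundles = {};
    for (const [name, features] of featuresByModule) {
        const key = features.join("+");
        bundles[key] = bundles[key] || [];
        bundles[key].push(name);
    }
    for (const key of Object.keys(bundles)) {
        bundles[key].sort();
    }
    return bundles;
}

const bundles = splitByFeatureSet(featureEntries, dependencies);
console.log(bundles);
```

With this graph, module E ends up in its own bundle shared by features 1, 2 and 3, so exercising feature 2 downloads only B and E and none of the code specific to feature 1.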

Web application - module graph - bundling strategy 2

Strategy 5 - an amalgamation of strategies 3 and 4

While strategy 4 works well to reduce the amount of code downloaded on exercising each feature, it could also result in more downloads on startup than necessary. A way to overcome this would be to treat all features needed at startup as a single "preferred" feature, combining them into one bundle, while employing strategy 4 for the rest of the code.

Other thoughts

Which module system should I use?

There are several things to consider when making this decision. Some of them include:

  • Which browsers do you need to support?
  • Are there external dependencies built using a specific module that you want to reuse?
  • Are you comfortable using a transpiler to author ES6 modules that get converted to AMD or CommonJS modules?

There are a lot of posts online debating which one is most awesome, so I'll not get into that. What I'll say is this:

I've used AMD in the Azure Portal with a custom graph traversal algorithm for determining how to build bundles, and it has scaled well. I've also successfully employed CommonJS in a large-scale server-side application. I plan on trying out ES6 modules next to gauge their strengths and weaknesses.

My recommendation: try out various module systems, and pick one that suits your scenario the best.

Strict mode - what to watch out for

Strict mode is an approach used to change the semantics of JavaScript execution. Among other things, one of the reasons you may want to "use strict" is that it makes changes that allow JavaScript engines to perform optimizations they couldn't otherwise do. There are two levels at which strict mode can be invoked: scripts and functions. You can read more about it here: Strict mode - JavaScript | MDN

A typical method used to generate bundles for JS files is to concatenate the contents of the individual files and serve the resulting JavaScript from a single URL. Suppose some of these files require the semantics of non-strict execution, while other files invoke strict mode at a script level. If a file with a script-level invocation ends up first in the bundle, strict mode applies to all of the JavaScript in the bundle file. If it ends up anywhere else, the directive is no longer at the top of the script and is silently ignored.

This problem can be avoided by invoking strict mode for functions only. That way, when combined with files that need to execute outside of strict mode, there isn't any conflict. Below is an example of how you would do this in your code.

// JavaScript
(function () {
    "use strict";

    // code within closure.
})();
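To see the difference in behavior, the following sketch concatenates a few made-up files and runs the result. The file contents and the runBundle helper are hypothetical, and new Function is used only as a convenient stand-in for loading the bundle in a script tag; a "use strict" directive at the start of the generated function body behaves like one at the top of a script.

```javascript
// Hypothetical file contents. Assigning to an undeclared variable is allowed
// in non-strict code but throws a ReferenceError in strict mode.
const legacyFile = "legacyCounter = 1;";
const scriptLevelStrictFile = '"use strict";\nvar scriptLevel = true;';
const functionLevelStrictFile =
    '(function () {\n    "use strict";\n    var functionLevel = true;\n})();';

// Concatenate the files, run the bundle, and report what happened.
function runBundle(files) {
    const source = files.join("\n");
    try {
        new Function(source)();
        return "ok";
    } catch (error) {
        return error.name;
    }
}

// Script-level directive first: the whole bundle becomes strict and the legacy file breaks.
const scriptLevelResult = runBundle([scriptLevelStrictFile, legacyFile]);

// Function-level strict mode: only the closure is strict, so the legacy file still works.
const functionLevelResult = runBundle([functionLevelStrictFile, legacyFile]);

// Script-level directive that is not first: it is ignored, so nothing in the bundle is strict.
const lateDirectiveResult = runBundle([legacyFile, scriptLevelStrictFile]);

console.log(scriptLevelResult, functionLevelResult, lateDirectiveResult);
```

Only the function-level version gives each file the semantics it was written for, regardless of the order in which the files are concatenated.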