Challenges in a (really) large single-page application

Browsers were originally designed for stateless HTML pages, and SPAs have deviated from that model to quite an extent. So, when building an SPA, there are a few challenges that have emerged and need to be overcome. Some of them are:

Search engine optimization
Client/Server code partitioning
Browser history
Analytics
Speed of initial load

I've already posted a little bit on this topic earlier in the context of the Azure Portal: What it takes to make a single page application performant.

I plan on exploring this area further through this new series of posts on "Challenges in a (really) large single-page application".

Today, I'll dig further into the performance aspects of an SPA. The topic at hand: how to speed up the initial load by optimizing the script downloads necessary to run the application.

There are a few well-established techniques that are employed to improve the performance of script downloads on a web page.

Minification
Caching
CDNs
Bundling

In the case of a single-page application, there's yet another technique that's quite useful:

Lazy-loading

While I talk here primarily about JavaScript, I only do so because it generally makes up the bulk of a single-page application. However, these techniques can just as easily be applied to any other client-side artifact, such as CSS or JSON.

Minification

This is a process where you run an algorithm on your code to reduce its size without impacting functionality. Doing this reduces the network bandwidth required to transfer code to the client browser, thereby speeding up the web page.

As part of this process, the algorithm may remove unnecessary characters, including whitespace, newline characters, comments, and sometimes certain delimiters. Depending on the sophistication of the algorithm, it may perform other modifications to the code, such as renaming identifiers to have shorter names.
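To make this concrete, here's a deliberately naive sketch of the first kind of transformation (stripping comments and whitespace). The function name `naiveMinify` and the sample input are my own, made up for illustration. Real minifiers parse the code rather than using regular expressions, so treat this as a toy: it would break on string literals that happen to contain comment-like text.

```typescript
// Toy illustration only: real minifiers work on a parsed syntax tree, which
// lets them handle strings and regex literals safely and rename identifiers.
function naiveMinify(source: string): string {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, "")      // drop /* block comments */
    .replace(/\/\/[^\n]*/g, "")            // drop // line comments
    .replace(/\s+/g, " ")                  // collapse whitespace and newlines
    .replace(/\s*([{}();,=+])\s*/g, "$1")  // drop spaces around common delimiters
    .trim();
}

const original = `
// Adds two numbers.
function add(first, second) {
  /* no validation here */
  return first + second;
}
`;

const minified = naiveMinify(original);
console.log(minified);
console.log(`${original.length} -> ${minified.length} characters`);
```

Note that this sketch does not rename `first` and `second`; that step needs scope analysis, which is where the more sophisticated tools earn their keep.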

There are quite a few minifiers out there that do a decent job of minifying JavaScript, including Microsoft Ajax Minifier, YUI Compressor, Closure Compiler and UglifyJS.

You can read more about this at https://en.wikipedia.org/wiki/Minification_(programming)

Caching

If a user has already opened the SPA and so downloaded scripts required to run it, it would be desirable to not have to download them all over again. Further, not all scripts in an application change with every release. In such cases, the preferred approach would be to only download the scripts that have changed.

This can be achieved through HTTP caching.

I found a post that explores and explains this topic really well, so I'd recommend you read How To Optimize Your Site With HTTP Caching.
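One common way to get both properties described above (scripts cached for a long time, yet re-downloaded when they change) is to put a hash of the file's content into its URL. Here's a minimal sketch, assuming a build step that has the file contents at hand. The helper names `fnv1a` and `versionedUrl` are hypothetical, and FNV-1a is only a small stand-in for whatever hash a real build would use.

```typescript
// FNV-1a (32-bit): a tiny non-cryptographic hash, used here only as a stand-in.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Hypothetical helper: "app.js" becomes "app.<hash>.js". The URL changes
// only when the content changes, so unchanged scripts stay cached.
function versionedUrl(path: string, content: string): string {
  const dot = path.lastIndexOf(".");
  const hash = fnv1a(content);
  return dot < 0
    ? `${path}.${hash}`
    : `${path.slice(0, dot)}.${hash}${path.slice(dot)}`;
}

// A response header a server could send for such URLs: cache for a year and
// never revalidate, since a new release simply produces a new URL.
const LONG_LIVED_CACHE_HEADER = "Cache-Control: public, max-age=31536000, immutable";

console.log(versionedUrl("app.js", "console.log('release 1');"));
console.log(versionedUrl("app.js", "console.log('release 2');"));
console.log(LONG_LIVED_CACHE_HEADER);
```

The page that references the scripts still has to be fetched fresh (or revalidated), since it's the thing that points at the new URLs.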

CDNs

Content delivery networks are designed for the sole purpose of serving content to the end-user with high availability and high performance. Generally, they consist of a set of proxy servers hosted in data centers spread out across multiple locations. Often, these data centers belong to ISPs that are directly connected to the end-user, which shortens the network path to the user.

You can read more about it in the Wikipedia article "Content delivery network".

Bundling

Browsers generally limit the number of network connections opened by a website. Further, they have smaller limits on connections per domain name. Some modern browsers have this limit set to a value as low as 6.

So, what do you do when you have a single-page application that consists of (literally) thousands of script files? You combine these files into one large file - or perhaps 6. In reality, the number of files you combine all your JavaScript into would vary based on several factors, such as whether all the code is required on load. More on this later in the "Lazy-loading" section.

By bundling client code, we reap benefits in a number of areas:

Browser limits: I've already talked about this above. Since there are fewer files to download, the browser limit would be hit less often.

Latencies: If there's a significant latency from the client machine to the web-server, multiple downloads could cause severe degradation in performance. This is especially true for large-scale enterprise applications that need to be accessed from across the globe.

Minification: Most minifiers provide better results the more information they have. When you combine multiple dependent JavaScript files together, a minifier can reason over the code better and reduce the code size even further.

Compression: Web servers today serve scripts to browsers in a compressed format (for example, gzip). These compression algorithms work really well when there are repeated patterns, and the likelihood of repetition increases with the amount of code in the payload.
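To get a feel for how the browser limit and latency interact, here's a back-of-the-envelope sketch. It assumes a deliberately simplified model that I've made up for illustration: every file costs one full round trip, downloads proceed in "waves" of at most 6 parallel connections, and bandwidth, connection setup and HTTP/2 multiplexing are ignored. The function names are hypothetical.

```typescript
// Simplified model: files download in waves, one wave per round trip,
// with at most `connectionsPerHost` files per wave.
function estimateWaves(fileCount: number, connectionsPerHost: number = 6): number {
  return Math.ceil(fileCount / connectionsPerHost);
}

// Latency cost only; transfer time for the bytes themselves is ignored.
function estimateLatencyCostMs(
  fileCount: number,
  roundTripMs: number,
  connectionsPerHost: number = 6
): number {
  return estimateWaves(fileCount, connectionsPerHost) * roundTripMs;
}

const roundTripMs = 100; // assumed latency between client and web server

// 1000 separate files: ceil(1000 / 6) = 167 waves of round trips.
console.log(estimateLatencyCostMs(1000, roundTripMs));

// The same code combined into 6 bundles: a single wave.
console.log(estimateLatencyCostMs(6, roundTripMs));
```

Under these assumptions, the unbundled case spends about 16.7 seconds on round trips alone, against 0.1 seconds for six bundles. Real numbers will differ, but the shape of the problem is the same.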

Note, though, that there are a few things you should be careful about when combining scripts into bundles.

It may impact caching. Let's say 20 files are combined into one bundle. When a new release is deployed, even if only 1 of those files has changed, the cache will have to be busted so that the whole bundled file is re-downloaded.
With the advent of HTTP/2, some of the aforementioned performance benefits may be rendered obsolete. You can find more details here

Lazy-loading

A single-page application may have a large set of capabilities. Not all of these may be required when the web page is loaded. So, it would make sense to defer the loading of scripts associated with these capabilities to the point in time when they are required - if at all. Depending on the subset of capabilities initiated at startup, this could result in a huge boost in performance.
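Here's a minimal sketch of the idea, assuming a hypothetical "reports" capability. The `lazy` wrapper and the `ReportsModule` shape are names I've made up for illustration. In a real application the loader would typically be a dynamic `import()` or a script-tag injection; here it's faked so the example stands alone.

```typescript
// Wraps a loader so that the download starts only on first use, and every
// later caller shares the same in-flight or completed promise.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = loader();
    }
    return pending;
  };
}

// Hypothetical capability that is not needed at startup.
interface ReportsModule {
  render(target: string): string;
}

let downloads = 0; // counts how often the (fake) download actually happens

const loadReports = lazy<ReportsModule>(async () => {
  downloads++;
  // A real loader might be: return import("./reports");
  return { render: (target) => `reports rendered into ${target}` };
});

// Nothing is downloaded until the user actually opens the capability.
async function onReportsOpened(): Promise<string> {
  const reports = await loadReports();
  return reports.render("#main");
}
```

Caching the promise rather than the loaded module means two clicks in quick succession still trigger only one download.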

If script files were bundled according to the dependencies and relationships between JavaScript files, you could imagine a scheme where all files associated with an individual capability are bundled together and only downloaded when necessary.
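As a rough illustration of such a scheme (and not the algorithm we use in the Azure Portal), here's a minimal sketch. It assumes you already have a dependency graph and one entry file per capability; the names `planBundles`, `transitiveClosure` and the `shared` bundle are hypothetical. Files needed by exactly one capability go into that capability's bundle, and files needed by several go into a shared bundle.

```typescript
type DependencyGraph = Record<string, string[]>;

// Collects a file plus everything it depends on, directly or indirectly.
function transitiveClosure(entry: string, graph: DependencyGraph): string[] {
  const seen = new Set<string>();
  const stack = [entry];
  while (stack.length > 0) {
    const file = stack.pop() as string;
    if (seen.has(file)) continue;
    seen.add(file);
    (graph[file] || []).forEach((dep) => stack.push(dep));
  }
  return Array.from(seen);
}

// Assumes no capability is itself named "shared".
function planBundles(
  entries: Record<string, string>,
  graph: DependencyGraph
): Record<string, string[]> {
  // Map each file to the capabilities that need it.
  const users = new Map<string, string[]>();
  Object.keys(entries).forEach((capability) => {
    transitiveClosure(entries[capability], graph).forEach((file) => {
      users.set(file, (users.get(file) || []).concat(capability));
    });
  });

  const bundles: Record<string, string[]> = { shared: [] };
  users.forEach((capabilities, file) => {
    const name = capabilities.length === 1 ? capabilities[0] : "shared";
    bundles[name] = (bundles[name] || []).concat(file);
  });
  Object.keys(bundles).forEach((name) => bundles[name].sort());
  return bundles;
}

const graph: DependencyGraph = {
  "dashboard.js": ["charts.js", "core.js"],
  "billing.js": ["invoice.js", "core.js"],
  "charts.js": ["core.js"],
  "invoice.js": [],
  "core.js": [],
};

const plan = planBundles({ dashboard: "dashboard.js", billing: "billing.js" }, graph);
console.log(plan);
```

In this example `core.js` lands in the shared bundle, while `charts.js` travels with the dashboard and `invoice.js` with billing. A production scheme would also need to deal with bundle sizes and load order, which this sketch ignores.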

This actually ties in nicely with the benefits provided by HTTP caching. Developers tend to work on individual capabilities and are likely to modify multiple files that make up a capability in a single release. Therefore, having a separate bundle for each capability means a change invalidates only that bundle in the cache, which would likely improve performance.

Real-world example

With the Azure Portal we've employed a combination of these techniques to great effect. For example, we've experimented with a few CDNs and picked the one that suited us best. We've built a mechanism to cache scripts for a long duration and bust the cache only when there are changes to scripts, thereby reducing network overhead.

Further, to reduce the initial download and defer the loading of scripts not required at startup, we've built algorithms for detecting dependencies between JavaScript files and generating bundles with unique URLs and long cache durations. I'll talk about this more in my next post in this series.

In subsequent posts in this series I'll talk more about what kind of bundling algorithms can be used to split a large number of files into bundles. I'll also talk more about how to do analytics on such an application as well as code-partitioning to scale out the development of an SPA to a large team or sets of teams.
