Browsers were originally designed for stateless HTML pages, and SPAs have deviated from that model to quite an extent. So, when building an SPA, there are a few challenges that need to be overcome, including:
- Search engine optimization
- Client/Server code partitioning
- Browser history
- Speed of initial load
I've already posted a little bit on this topic earlier in the context of the Azure Portal: What it takes to make a single page application performant.
I plan on exploring this area further through this new series of posts on "Challenges in a (really) large single-page application".
Today, I'll dig further into the performance aspects of an SPA. The topic at hand: how to speed up the initial load by optimizing the script downloads necessary to run the application.
There are a few well-established techniques for improving the performance of script downloads on a web page. In the case of a single-page application, there's yet another technique that's quite useful.
Minification is a process where you run an algorithm on your code to reduce its size without impacting functionality. This reduces the network bandwidth required to transfer the code to the client browser, thereby speeding up the web page.
As part of this process, the algorithm removes unnecessary characters such as whitespace, new-line characters, comments, and sometimes certain delimiters. Depending on its sophistication, it may perform other modifications to the code as well, such as renaming identifiers to shorter names.
You can read more about this at https://en.wikipedia.org/wiki/Minification_(programming)
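To make this concrete, here's a tiny sketch of what a minifier does. The function and its "minified" counterpart below are made up for illustration; the point is that stripping whitespace and comments and renaming identifiers leaves the behavior unchanged:

```javascript
// Original, readable source
function computeTotalPrice(unitPrice, quantity, taxRate) {
  // Apply tax to the subtotal
  const subtotal = unitPrice * quantity;
  return subtotal * (1 + taxRate);
}

// What a minifier might emit: whitespace and comments stripped,
// identifiers renamed to shorter names -- the behavior is identical
function f(a,b,c){return a*b*(1+c)}
```

Real minifiers (UglifyJS, Terser, Closure Compiler and the like) do this automatically, and far more aggressively, as part of a build step.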
If a user has already opened the SPA, and thus downloaded the scripts required to run it, it would be desirable not to have to download them all over again. Further, not all scripts in an application change with every release. In such cases, the preferred approach is to download only the scripts that have changed.
This can be achieved through HTTP caching.
I found a post that explores and explains this topic really well, so I'd recommend you read How To Optimize Your Site With HTTP Caching.
Content delivery networks are designed for the sole purpose of serving content to the end-user with high availability and high performance. Generally, they consist of proxy servers hosted in data centers spread across multiple locations. Often, these data centers belong to ISPs that are directly connected to the end-user, which provides significant benefits.
You can read more about it on Wikipedia: Content delivery network.
Browsers generally limit the number of network connections opened by a website. Further, there are smaller limits on connections per domain name; some modern browsers set this limit as low as 6.
By bundling client code, we work around several of these limitations:
Browser limits: I've already talked about this above. Since there are fewer files to download, the browser's connection limit is hit less often.
Latencies: If there's significant latency between the client machine and the web server, multiple downloads can cause severe degradation in performance. This is especially true for large-scale enterprise applications that need to be accessed from across the globe.
Compression: Web servers today serve scripts to browsers in a compressed format. Compression algorithms work really well when there are repeated patterns, and the likelihood of repetition increases with the amount of code in the payload.
Note, though, that there are a few things you should be careful about when combining scripts into bundles.
- It may impact caching. Let's say 20 files are combined into one bundle. When a new release is deployed, even if only 1 of those files has changed, the cache has to be busted so that the entire bundled file is re-downloaded.
- With the advent of HTTP/2, some of the aforementioned reasons for the performance improvement may be rendered obsolete. You can find more details here.
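At its core, the bundling idea can be sketched as simple concatenation. Assuming plain script files with no module system (real bundlers like webpack or Rollup do far more than this), each source is wrapped in an IIFE so top-level names don't collide:

```javascript
// Naive bundler sketch: concatenate script sources into a single file.
// Each source is wrapped in an IIFE so its top-level variables stay private.
function bundle(files) {
  return files
    .map(({ name, source }) => `// --- ${name} ---\n(function () {\n${source}\n})();`)
    .join('\n');
}
```

The browser then makes one request for the bundle instead of one per file, which is where the connection-limit and latency benefits come from.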
A single-page application may have a large set of capabilities, not all of which are required when the web page first loads. So, it makes sense to defer loading the scripts associated with those capabilities until the point in time when they are required - if at all. Depending on the subset of capabilities initialized at startup, this can result in a huge boost in performance.
This actually ties in nicely with the benefits provided by HTTP caching. Developers tend to work on individual capabilities, and are likely to modify multiple files that make up a given capability. Therefore, having a separate bundle per capability would likely improve performance.
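The deferred-loading pattern can be sketched as a small memoizing loader. In a browser, the `importFn` argument would typically be a dynamic `import()` of the capability's bundle; the module path in the comment below is hypothetical:

```javascript
// Minimal lazy-loading sketch: defer fetching a capability's module until
// it's first needed, and memoize the promise so it's fetched at most once.
function lazy(importFn) {
  let modulePromise = null;
  return function load() {
    if (!modulePromise) modulePromise = importFn();
    return modulePromise;
  };
}

// In a browser, usage would look something like (hypothetical path):
//   const loadCharts = lazy(() => import('./charting.js'));
//   button.addEventListener('click', async () => {
//     const charts = await loadCharts(); // network request happens here, once
//     charts.render(data);
//   });
```

Caching the promise (rather than the module) means concurrent callers share a single in-flight request.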
With the Azure Portal, we've employed a combination of these techniques to great effect. For example, we experimented with a few CDNs and picked the one that suited us best. We've also built a mechanism to cache scripts for a long duration and bust the cache only when a script changes, thereby reducing network overhead.
In subsequent posts in this series I'll talk more about what kind of bundling algorithms can be used to split a large number of files into bundles. I'll also talk more about how to do analytics on such an application as well as code-partitioning to scale out the development of an SPA to a large team or sets of teams.