Why not webpack?

ClojureScript recently released version 1.10.312 and with that came the official webpack Guide for ClojureScript.

In this post I will go over the reasons why shadow-cljs does not “just use” webpack for its npm integration. webpack is certainly the most-used JS build tool out there, so there is some appeal in just using it instead of rolling a custom solution. Many early prototypes of shadow-cljs actually did use it, but there were several limitations for which I did not find acceptable solutions.

What is ClosureJS?

First of all we need a bit of background about how the webpack world sees JavaScript and how the Closure Compiler sees it. ClosureJS is used pretty much exclusively throughout the Closure Library, and the ClojureScript compiler generates this style of code for us. Writing this style of code by hand is pretty tedious and not many people do; I guess that is the main reason why it never caught on with the greater JS community.

The major difference from almost all other forms of JS is that everything is namespaced. goog is the base object with provide and require methods that set up a pseudo-ish namespace dependency system built on top of normal JS objects. If you say goog.provide("goog.object") it will set up a global var goog = { object: {} }; which properties are then assigned to. goog.require is used to sort dependencies into the proper order and to ensure they are loaded before the file that required them.
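A simplified sketch of what goog.provide does (illustrative only, not the actual Closure Library code):

// create the nested object path for a namespace, reusing existing parts
function provide(name) {
  var parts = name.split(".");
  var obj = window;
  for (var i = 0; i < parts.length; i++) {
    obj = obj[parts[i]] = obj[parts[i]] || {};
  }
}

provide("goog.object"); // ensures window.goog.object exists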

Let’s see what the CLJS compiler generates for this code:

(ns demo.app
  (:require [goog.object :as obj]))

(def obj #js {:foo "bar"})

(obj/get obj "foo")

CLJS output

goog.provide("demo.app");
goog.require("goog.object");

demo.app.obj = {"foo":"bar"};

goog.object.get(demo.app.obj, "foo");

Since there is no convenient namespace aliasing in JS you always have to type out the full namespace of everything, which is very inconvenient. Luckily CLJS makes aliasing very easy, so it’s no bother at all.

Since everything is namespaced we can freely execute everything in the global scope or concatenate all files together. This is how the Closure Compiler works as well: everything uses a shared global scope, and things can freely be moved around or renamed. goog.object.get just becomes something really short (e.g. x) and other names get removed completely when not used.
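For example, the call from above might come out of :advanced looking roughly like this (illustrative only; the actual renamed output varies per build):

// before :advanced
goog.object.get(demo.app.obj, "foo");

// after :advanced, with renaming and namespace flattening (illustrative)
x(y, "foo");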

What is CommonJS? UMD? AMD? ESM?

In contrast to that we have several other mechanisms for organizing JavaScript and pretty much all of them have one fundamental idea: Each file has its own scope. Each file can only see its own local variables unless it specifically imports the exports of other files. So your filesystem provides a sort of pseudo-ish namespacing system which is not explicitly repeated in the code.

// foo.js
var foo = 1;
exports.hello = function() { return foo; };

// bar.js
var foo = require("./foo");
foo.hello();

As you can probably see this would break pretty badly if we just concatenated the files together like that. So unlike ClosureJS we need to wrap each file in a function to isolate its scope and then wire up the exports properly. In node or webpack the simplified version looks like this:

// foo.js
function(module, exports, require) {
    var foo = 1;
    exports.hello = function() { return foo; };
}

// bar.js
function(module, exports, require) {
    var foo = require("./foo");
    foo.hello();
}

The module system then adds some helper functions to ensure that the wrapped functions each get their own module, exports and require arguments and that require properly maps to the exports of others. There are several module systems like UMD, AMD or just plain CommonJS, but they all basically work like this. I simplified a bit but it’s close enough.
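A minimal sketch of such a module system might look like this (simplified; real implementations also handle caching, path resolution and circular requires):

// registry of wrapped module functions, keyed by path
var modules = {
  "./foo": function(module, exports, require) {
    var foo = 1;
    exports.hello = function() { return foo; };
  },
  "./bar": function(module, exports, require) {
    var foo = require("./foo");
    foo.hello();
  }
};

// give each wrapped function its own module/exports and a shared require
function require(path) {
  var module = { exports: {} };
  modules[path](module, module.exports, require);
  return module.exports;
}

require("./bar");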

What about webpack then?

webpack was built for the system above. Everything is wrapped in functions. Nothing is in the global scope. There are some fairly recent attempts to get rid of (or combine) some of the wrapping functions but for the most part this is not the norm yet.

So CLJS and Closure want everything to be global and namespaced but all we have is IIFEs (immediately-invoked function expressions). We somehow need to bridge the two systems and that is exactly what the guide is all about. You manually set up a .js file that pulls the JS dependencies you want into the global scope by explicitly assigning them to window.

import React from 'react';
import ReactDOM from 'react-dom';
window.React = React;
window.ReactDOM = ReactDOM;

This actually uses the newer ECMAScript 6 import syntax, which is a whole other can of worms we may get to later.

The evolution of shadow-cljs + npm

When I began working on npm support in shadow-cljs I started with exactly what the new ClojureScript guide suggests and tried to automate everything. First I would compile all ClojureScript and assign a pseudo-ish namespace for every JS require I found. (Well, first I had to make the JS requires declarative to get rid of actual js/require calls and global js/React uses, but I’ll skip that part as CLJS supports this as well nowadays.)

(ns demo.app
  (:require ["react" :refer (createElement)]))

(createElement "h1" nil "Hello World")

CLJS output

goog.provide("demo.app");

shadow$js["react"].createElement("h1", null, "Hello World");

It then generated an index.js for webpack to process.

window.shadow$js = {
    "react":require("react"),
    ...
};

As long as the generated webpack output was loaded before the actual CLJS output everything was fine, and it actually worked pretty well. I had a fully automated system that made it easy to use everything on npm. I was happy with this for a while and actually close to releasing it.

Problem #1: Code-Splitting

You may have noticed the “As long as” in the previous paragraph. I use :modules, aka code-splitting, in pretty much all my :browser builds. Code splitting in webpack works completely differently than Closure Modules, so combining the two proved exceptionally hard. I tried creating solutions using externals but none of them worked satisfactorily. The “vendor” approach, i.e. packaging all deps into one big .js file, worked fine but offended my perfectionism: pages that didn’t use certain npm deps ended up loading them regardless, just because some other page did. I failed to get webpack to generate a .js file per Closure Module. This is of course not a problem if you are not using code-splitting at all. I think it can be made to work but I didn’t feel like going that deep into webpack.

Problem #2: Full Interop

One of the things I absolutely wanted to work was full, 100% two-way interop. JS code should be able to use CLJS code and CLJS should be able to use JS. No compromise allowed.

In the naive approach both systems run independently: CLJS is compiled on its own, just like the JS, with some JS globals gluing them together and neither side ever really knowing where those globals actually came from.

In one iteration I had a webpack “loader” that would just translate require("../path/to/demo/app.js") to basically return the demo.app global created elsewhere. This looks integrated but it really isn’t, since ALL CLJS code is still together and all JS code is loaded together. They still aren’t mixed.
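Such a loader can be tiny. A hypothetical sketch of what that iteration did (not actual shipped code):

// webpack loader: ignore the compiled CLJS source webpack hands us and
// instead re-export the global created by the separately loaded CLJS build
module.exports = function(source) {
  return "module.exports = window.demo.app;";
};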

Out of this :npm-module was born. It outputs the CLJS code in a CommonJS-ish format that webpack can understand. All files are generated into a flat directory which defaults to node_modules/shadow-cljs. Due to how webpack or node in general resolve dependencies we can conveniently import it and everything maps nicely together.

(ns demo.app
  (:require
    ["react" :as react]
    ["./someComponent" :as comp]))

(defn hello []
  (react/createElement comp #js {:teh "prop"}
    (react/createElement "h1" nil "hello world")))

// demo/someComponent.js
var React = require("react");
class SuperComponent extends React.Component {...};
module.exports = SuperComponent;

// index.js
var ReactDOM = require("react-dom");
var app = require("shadow-cljs/demo.app");
ReactDOM.render(app.hello(), ...);

The CLJS code can still be optimized by Closure :advanced compilation; webpack just takes the generated output and is ultimately in charge of the final bundle. The compiled CLJS will just emit a normal require("react") which webpack then fills in later when working on the rest of the .js files. Basically there would be small chunks of optimized JS code interspersed with normal CommonJS. Since the code is CommonJS compatible it also works nicely with pretty much all other JS tools out there, and with node directly.

However it also meant that webpack was ultimately in charge of packaging things for the browser. It would minify the already :advanced-compiled output again, since it had no understanding of what it was actually consuming. Just using :none sidestepped that issue, but webpack really isn’t comparable to Closure when it comes to optimizing CLJS code, so the output was huge. Also, our nicely optimized CLJS code was now getting wrapped in IIFEs, which adds a bit of unnecessary overhead.

During development things became especially hard: the usual live-reload for CLJS basically became impossible since the code had to be processed by webpack first to fill in the require calls. The REPL also became pretty unreliable and couldn’t load JS deps at all. You can’t just (require '["some-package" :as x]) to try packages at the REPL when using webpack.

Back to the Hammock

Running webpack as a separate build step proved very annoying in practice (YMMV) and ultimately insufficient when it came to more complex builds or REPL development. We don’t want JS to be a second-class citizen. ClojureScript has this other :npm-deps path which ultimately wants to pass all JS code through Closure :advanced compilation. I really tried making this work and some day it might still happen but as of today it is way too unreliable and will need some serious work in the Closure Compiler and on the ClojureScript side as well. The idea is simple: Resolve all the JS deps, order them, pass them to Closure and proceed as usual.

The Closure Compiler can translate CommonJS/ESM code into ClosureJS code, but instead of wrapping everything in functions, all local variables are renamed to unique names to avoid clashes. Closure will then just rename or remove them later, so it doesn’t matter how long the names are in the meantime.

Using the foo.js example from above we get:

// instead of the wrapped foo.js
function(module, exports, require) {
    var foo = 1;
    exports.hello = function() { return foo; };
}

// we get
var foo$module$demo$foo = 1;
var module$demo$foo = {exports:{}};
module$demo$foo.exports.hello = function() { return foo$module$demo$foo; };

// bar.js
var foo$module$demo$bar = module$demo$foo.exports;
foo$module$demo$bar.hello();

This is perfect in theory. We can consume code like that easily in CLJS since it behaves like any other code from the Closure Library. In practice, however, things are a bit more complicated, and much of the JS code on npm does not survive :advanced compilation.

How it works in shadow-cljs now

What shadow-cljs does instead is resolve the code exactly like webpack would and pass it through Closure :simple optimizations, while also wrapping each file in a function to isolate its scope. CLJS is still processed by :advanced as usual.

Since shadow-cljs processes all of the JS it can also extract all the exported property names and automatically add them to the externs for the :advanced CLJS build, meaning that you can get quite far without ever writing any externs by hand.
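Conceptually this replaces the kind of externs file you would otherwise maintain by hand, e.g. something like (an illustrative shape, not shadow-cljs output):

// externs.js - declares names so :advanced won't rename
// property accesses like react.createElement in the compiled CLJS
var React = {};
React.createElement = function() {};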

By directly resolving the actual files shadow-cljs can move them into the correct places to play nicely with :modules. They are still technically prepended to each Closure Module but they are executed at the correct time later on. They are not executed immediately since that doesn’t work with circular dependencies which JS unfortunately allows.

webpack will rewrite require calls to use numeric ids instead of the file names and basically ship one big JS object with id -> fn. This is a nightmare for caching since numeric ids can change if your dependencies resolve in a different order (after adding/removing deps). shadow-cljs instead keeps the pseudo-ish names that Closure would generate based on the file path.

// node_modules/lib/foo.js
shadow$provide["module$node_modules$lib$foo"] = function(module, global, process, exports, require) {
    var foo = 1;
    exports.hello = function() { return foo; };
}

// node_modules/lib/bar.js
shadow$provide["module$node_modules$lib$bar"] = function(module, global, process, exports, require) {
    var foo = require("module$node_modules$lib$foo");
    foo.hello();
}

The require function basically just looks up the function in the shadow$provide object, calls it, and stores the result before returning. When called again it will just return the previous result. Since the file names are stable the files can easily be cached, so the overhead of running through :simple is only paid once, leading to far better compile performance in watch mode.
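A simplified sketch of that lookup (not the actual shadow-cljs implementation; the cache name is made up):

var shadow$cache = {}; // hypothetical cache, keyed by the stable names

function require(name) {
  // return the memoized exports if this module already executed
  var cached = shadow$cache[name];
  if (cached) return cached.exports;
  var module = shadow$cache[name] = { exports: {} };
  // matching the wrapper signature shown above; {} stands in for a process shim
  shadow$provide[name](module, window, {}, module.exports, require);
  return module.exports;
}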

Note that the above is the code before :simple; after optimizations it’ll look more like:

shadow$provide.module$node_modules$lib$foo = function(a,b,c,d,e) {
    d.hello = function() { return 1; };
}

shadow$provide.module$node_modules$lib$bar = function(a,b,c,d,e) {
    e("module$node_modules$lib$foo").hello();
}

Closure does some pretty wild tweaks to the code, and the result is pretty much always better than what comparable JS tools achieve. It is not quite :advanced but it is still pretty good.

Note that the long pseudo-ish names are preserved in the strings, but gzip takes care of most of that. I might still revisit this at some point, but for now you’ll sometimes see kinda long strings in the code. The important part for caching is that the names are stable.

Conclusion

I consider the npm/JS integration in shadow-cljs a solved problem. For the most part you can just install any npm package and use it right away without any additional configuration or tool setup. It all just works. Everything is in place to fully switch to Closure :advanced once the support gets more stable and reliable (or the JS code gets more usable, notably strict ES6+). You won’t have to change a thing.

“Just” using webpack proved non-trivial and problematic in many cases, so that path was ultimately abandoned. It is however a viable solution for “simple” builds that just have a few isolated npm deps which can be kept out of the actual CLJS build.

Unfortunately a small percentage of the JS world actually stopped writing ECMAScript and started writing WebpackJS. This means that you’ll sometimes find JS libs on npm that will require("./some.css"). shadow-cljs will just ignore these for now, but that means you have to get your .css some other way, which is not always easy. I hope to add support for this rather weird convention some day, but the CSS processing support in shadow-cljs is still hanging out in the Hammock.

:npm-module is a solution for projects that are primarily webpack-driven and just starting to introduce CLJS.

Someone more familiar with the webpack internals may be able to create something more usable for CLJS interop where I failed and I’d be very curious to see that.
