Concurrency Creep in JavaScript
It’s no secret that I moved away from JavaScript and Node.js over the last year and switched to using Go for simple command line applications and web apps. In this post I’m going to explain one of the reasons for that: concurrency.
Please note that this has nothing to do with performance. I’m not saying that JavaScript is slow and I’m not saying that Go is fast(er).
JavaScript: The Async Parts
One of the promises (no pun intended) of JavaScript on the server was that, thanks to its non-blocking I/O and callbacks, you would never block (and you’d better never block the event loop) and could therefore handle much more load, because the stuff that slows down your application is database/disk I/O, not raw computation.
That is indeed true. I/O takes forever (relatively speaking), and being able to do other things in the meantime sounds appealing. Also, thanks to JavaScript’s inherent single-threaded model, you basically have to work this way, or else processing one incoming HTTP request would block your app server.
In the good ol’ days, we had callbacks. Many of them. With ES6 and TypeScript and other recent developments, things like Promises and the await keyword are being added to JavaScript to make working with asynchronous code easier.
The only issue: IMHO things don’t get easier. Let’s take a look at some classic JavaScript:
database.doHeavyLifting(function(err, result) {
  if (err) {
    oops();
  }
  else {
    much();
    magic();
    such();
    wow();
  }
});
We got one level of indentation. Now let’s look at the same logic using Promises:
database.doHeavyLifting().then(function(result) {
  much();
  magic();
  such();
  wow();
}).catch(function(err) {
  oops();
});
That … didn’t make things easier in my view. We still have to provide a callback, we still go one indentation level deeper. Sure, when it comes to chaining Promises, a real advantage over the callback hell becomes apparent, but if all you need to do is fetch something from the database and process it, you gained nothing.
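To see where chaining does pay off, here is a minimal sketch with made-up step functions that each return a Promise (none of these names come from a real API). With callbacks, every step would push the code one level deeper; the chain stays flat and gets a single error handler:

```javascript
// Hypothetical promise-returning steps, purely for illustration.
function step1(x) { return Promise.resolve(x + 1); }
function step2(x) { return Promise.resolve(x * 2); }
function step3(x) { return Promise.resolve(x - 3); }

step1(1)          // resolves to 2
  .then(step2)    // resolves to 4
  .then(step3)    // resolves to 1
  .then(function(result) {
    console.log(result); // 1
  })
  .catch(function(err) {
    // one catch covers every step in the chain
    console.error(err);
  });
```

With nested callbacks, the same error handling would have to be repeated at every level.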
And then there is the await keyword, which would reduce the code to something like this:
var result = await database.doHeavyLifting();
much();
magic();
such();
wow();
That’s much better. So why am I not jumping on the bandwagon and starting to use transpilers so I can write await in my code today? One of the reasons is that I don’t like pushing more and more tools and libraries and frameworks in between my original code and what actually ends up running. I’m just no fan of having to pipe my code through Babel to run it, learn how to handle source maps, and then still fall back to debugging the generated code.
But there is another reason and that’s the one that made me write this article in the first place.
Async Creep
Asynchronous code is something that, once you have it, you can never get rid of again. Take this simple example: in a project, there was a backend and a frontend. While discussions about how to implement the backend were still going on, the frontend simply mocked the backend away, with code like this:
function Persistence() {}

Persistence.prototype.fetchStuff = function(id) {
  return new MockModel(id);
};
Easy enough. Everywhere in the frontend code, the mocked persistence would be used in a synchronous way:
var model = persistence.fetchStuff(42);
The bad thing is: Once we were ready to actually implement the backend and talk to databases, we had to implement it using Promises/callbacks (thanks to the non-blocking I/O in Node.js). So all of a sudden, we had to replace tons of code and introduce callbacks/Promises everywhere.
function Persistence() {}

Persistence.prototype.fetchStuff = function(id) {
  return new Promise(...);
};

var p = new Persistence();
p.fetchStuff(42).then(function(model) {
  // ...work with model...
});
Now, every spot that talked to the persistence had to be rewritten. And all the spots that used the rewritten code had to be rewritten, too. Because we suddenly had some async code deep, deep in the backend code, the async-ness crept up through all the other layers and influenced all other parts of the application.
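The creep can be sketched with a hypothetical controller sitting on top of that persistence layer (all names here are illustrative, not from the actual project). Once fetchStuff returns a Promise, the controller has to return a Promise too — and so does everything that calls the controller:

```javascript
// Before: synchronous all the way up the stack.
function renderProfileSync(persistence, id) {
  var model = persistence.fetchStuffSync(id);
  return '<h1>' + model.name + '</h1>';
}

// After: the Promise from the persistence layer forces the controller
// to return a Promise as well -- and its callers, and their callers.
function renderProfile(persistence, id) {
  return persistence.fetchStuff(id).then(function(model) {
    return '<h1>' + model.name + '</h1>';
  });
}
```

Every caller of renderProfile now has to attach a .then() as well, which is exactly the creep described above.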
This is terrible. When writing a JavaScript library to do X, how do you design your API? Do you return values or do you plan ahead and return Promises / require callbacks?
Let’s imagine we’re writing a simple module that validates an email address. We might start like this:
module.exports = function(possibleEmailAddress) {
  return regexMagic(possibleEmailAddress); // true or false
};

function regexMagic(s) {
  return /something/.test(s);
}
Now you publish your code and start using it. Everyone is happy. A month later you decide that validating the MX record might be a good idea, but oh no … that requires a network round trip and bam, you just forced your code to become async.
module.exports = function(possibleEmailAddress, callback) {
  network.checkMX(extractDomain(possibleEmailAddress), callback);
};
There is simply no way (for good reasons) to force this I/O to be blocking, so that you can keep your module’s API stable.
So… this teaches us that we should have prepared for async stuff to happen in the future. And this is something that can happen to any library out there. If you want to have a stable API, plan ahead and make it async.
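Planning ahead could look like this: a sketch (with an illustrative name and a placeholder regex, not a serious email validator) that returns a Promise from day one, even while the implementation is still purely synchronous, so that adding the MX lookup later doesn’t break a single caller:

```javascript
function isValidEmail(possibleEmailAddress) {
  // Today: purely synchronous work, wrapped in a Promise anyway so the
  // API is async from the start. The regex is just a stand-in.
  return Promise.resolve(/^[^@\s]+@[^@\s]+$/.test(possibleEmailAddress));
  // Later, the MX lookup can be swapped in without changing the signature,
  // e.g. by returning the promise from a DNS query instead.
}

isValidEmail('jane@example.com').then(function(valid) {
  console.log(valid); // true
});
```

The price is that every caller pays the async tax immediately, even while the work is still synchronous — which is exactly the trade-off this section complains about.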
What’s the Deal?
The main problem I have with this is that, as a developer, I constantly have to think about asynchronous stuff. I can’t think “okay, if the email is valid, then do X”, but rather “okay, check the email’s validity and then, someday, when that has finished, do X”.
I gain nothing from this. My application doesn’t improve by me using callbacks or Promises to “work around” JavaScript’s single-threaded, non-blocking I/O. Even when using await, I have to remember to put another keyword in front of my method calls or else weird things will happen.
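The “weird things” are usually a forgotten await: the call quietly hands you a pending Promise instead of the value, and nothing warns you. A small sketch:

```javascript
async function fetchAnswer() {
  return 42; // an async function always wraps its return value in a Promise
}

async function main() {
  var forgot = fetchAnswer();       // no await: `forgot` is a Promise, not 42
  var result = await fetchAnswer(); // with await: `result` is 42

  console.log(forgot instanceof Promise); // true
  console.log(result + 1);                // 43
}

main();
```

Arithmetic or comparisons on the un-awaited value don’t throw — they just produce nonsense like "[object Promise]1", which makes the bug easy to miss.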
Also, I have no choice. I can’t tell V8 to please just block, because my particular application doesn’t need to do other stuff while waiting for the database. Most of the time, my code fetches something from the database, then processes it, and then updates the database. These steps depend on each other and on a specific order – the callbacks/Promises are just bloat to me (and that includes async.waterfall, which is my weapon of choice nowadays but still forces me to think async everywhere).
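For reference, the fetch → process → update sequence in that waterfall style looks roughly like this. To keep the sketch self-contained, waterfall here is a tiny stand-in for the library function and the database helpers are fakes:

```javascript
// Tiny stand-in for async.waterfall: runs tasks in order,
// passing each task's results to the next one.
function waterfall(tasks, done) {
  function next(err) {
    if (err) return done(err);
    var args = Array.prototype.slice.call(arguments, 1);
    var task = tasks.shift();
    if (!task) return done.apply(null, [null].concat(args));
    task.apply(null, args.concat([next]));
  }
  next(null);
}

// Fake database helpers (illustrative only).
function fetchUser(id, cb) { cb(null, { id: id, visits: 1 }); }
function bumpVisits(user, cb) { user.visits += 1; cb(null, user); }
function saveUser(user, cb) { cb(null, user.visits); }

waterfall([
  function(cb) { fetchUser(42, cb); },
  bumpVisits,
  saveUser
], function(err, visits) {
  console.log(visits); // 2
});
```

Strictly sequential logic, yet every step still has to be cut into a separate callback-taking function — which is the “bloat” referred to above.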
This is distracting to me. My brain likes to think sequentially and having to split my business logic into lots of tiny pieces makes it harder to reason about.
Alternatives
I can’t really imagine a “fix” for this in JavaScript. It’s just the nature of the language, and it’s basically a must-have given the single thread that JavaScript runs in. You could try to use WebWorkers to simulate threads, but again, you would have to chain Workers together to achieve your goal (I don’t even know if WebWorkers can do I/O).
I understand the necessity of having callbacks in JavaScript, but I don’t think it’s an evolution over what other languages have already had for the last 30 years.
Go taught me how to do concurrency much better. Instead of forcing the developer to deal with non-blocking I/O all the time, everywhere, just allow them to explicitly perform concurrent work. In Go (or any other language with blocking I/O and a threading model), I decide for myself when it is time to start a goroutine and I decide when it’s time to merge two or more of them again.
I can’t stress this enough: in JavaScript, the entire application has to deal with callbacks. So if you have a web server, you must use callbacks everywhere so that it stays responsive. All your controllers, services, and utilities have to deal with that. In Go, we start a goroutine per incoming connection and just work in a synchronous way in our application code. There’s no reason to spread the async-ness through the entire codebase; it’s handled once, at the HTTP handler.
Threads/goroutines are considered somewhat dangerous and coordinating them can lead to nasty bugs. But at least I am in control and reasoning about my code is much easier in most cases.
Conclusion
JavaScript’s single-threadedness has its advantages. You don’t risk race conditions, and callbacks can provide a nice way to decouple parts of your application. But for many, if not most, tasks I found that synchronous code is just as fast but much more understandable.
There is a reason why so many *Sync methods exist in Node’s fs module…