While I was wrestling with some ghastly batch script the other day, it struck me that we seem relatively loath to hold the often quite dark art of scripting to the same standards as other aspects of software engineering.
In random order:
- Scripting languages have limited expressivity.
This probably stems from the principle that scripts should “just” string together lots of very imperative invocations of “real” programs (such as sed/…), without interacting directly with the results of those invocations. Instead, they should implement the actual logic largely by virtue of piping untyped text to the likes of awk/… together with a magic incantation that leaves the average virgin or goat scurrying for the hills, lest they be sacrificed to ensure proper execution of said incantation. This means that the core of what you’re trying to achieve with any script is already largely external to that script, which raises the question of why it’s there at all.
- Scripting languages have limited tool support.
How many IDEs really understand your scripts’ prose? Among those, how many understand the stuff the script actually calls? Apparently, IDE developers cannot even be bothered too much with providing basic language support like syntax highlighting and syntax error reporting (and how hard can parsing be, anno 2018), let alone more advanced features like content assist and type checking.
- Scripting languages have limited portability.
command.exe, etc., and all of these have dialects/versions depending on OS, and none of them are compatible to any degree of usefulness. Getting them to work on both your own machine and the CI machine might already be…challenging (I guess “nerve-racking” would sooner be the operative word here), let alone across a wider range.
- Scripts are not tested/testable.
Have you ever unit tested (part of) a script? Yup, I didn’t think so. That’s a pity, since scripts typically do a lot of things which all have to behave perfectly well in order for the whole thing to succeed. It’s no wonder that scripts are notoriously brittle, again leaving virgins and livestock in fear of their lives. Even if the language allows some degree of modularisation or abstraction (through, say, parametrised subroutines), you cannot readily “call into” a script: the paradigm only allows wholesale execution.
So, why do we use scripts? Often, we have no other way of automating certain tasks (or at least we think we don’t…). Often enough, we can try to use build systems like Maven, or even good ol’ make, to achieve what we need, only to find that we need to orchestrate invocation of that in a way that takes more than a couple of keystrokes.
The mere existence of scripting languages also seems to prevent better solutions to “typical” use cases (like CI/CD) from arising. We all know that make/Maven/Gradle/Tycho/etc. are not the ultimate answer, but hasn’t anyone thought “bugger, let’s pull a Torvalds on build systems”, i.e. start from scratch and make every design choice go the complete opposite way relative to what you’re trying to replace? It’s hard to imagine that nobody has in this day and age, although it’s a lot easier to see that no such effort has gained any traction.
As a side note: Haskell uses a DSL implemented in Haskell itself to power its own build system: see this article. This looks like a promising approach, suitable for a much broader range than just Haskell.
Scripting also effectively prevents incrementality: the concept that small changes to the state/input are picked up and propagated to the end result automatically and efficiently. This requires that the semantics of tasks can be tracked, reasoned about, or even “differentiated” in a sense. For scripts, the same would have to hold for everything they invoke as well, which obviously is never going to happen.
A possible solution
One way to address these concerns is to not use pure scripting languages like sh and their ilk, but instead to use regular general-purpose languages (GPLs) that have good support for unit testing, type safety, and/or incrementality (and haven’t they all these days?).
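To make that a bit more concrete, here is a minimal sketch of what a typical “pipe text through a filter and count matches” step might look like as ordinary, testable code. Python is used purely for illustration; the “LEVEL: message” log format and the names LogEntry, parse_log and count_errors are made up for this example and don’t come from any particular tool.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LogEntry:
    """One parsed line of a hypothetical 'LEVEL: message' log format."""
    level: str
    message: str


def parse_log(lines: Iterable[str]) -> List[LogEntry]:
    """Turn raw text lines into typed records, skipping lines without a ':' separator."""
    entries = []
    for line in lines:
        level, sep, message = line.partition(":")
        if not sep:
            continue  # blank or malformed line: nothing to parse
        entries.append(LogEntry(level.strip().upper(), message.strip()))
    return entries


def count_errors(entries: Iterable[LogEntry]) -> int:
    """Count the entries whose level is ERROR."""
    return sum(1 for entry in entries if entry.level == "ERROR")


def test_count_errors() -> None:
    """A unit test for one step, without running the whole 'script'."""
    entries = parse_log(["ERROR: disk full", "info: started", "", "no separator here"])
    assert entries == [LogEntry("ERROR", "disk full"), LogEntry("INFO", "started")]
    assert count_errors(entries) == 1
```

The point is not the parsing itself but that each step is a function you can call into and test in isolation, which is exactly what wholesale script execution doesn’t allow.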
Even modern Java is not as horrible for this task as it used to be, thanks to Java 8 Streams and closures, Apache Commons IO, and the like. In fact, I’ve had quite some success with combining Java, ZeroTurnaround’s JRebel, and the FileMonitor facilities from Apache Commons IO to craft a build process that’s incremental, which totally compensates for having to spin up a JVM, at least in my book. Incrementality still doesn’t come for free, but at least there’s more room for it.
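As a rough illustration of what “room for incrementality” can mean, here is a small sketch that only re-runs a task for inputs whose content has changed. Note that this is not the JRebel/Commons IO setup described above: it swaps file monitoring for plain content hashing, works on in-memory bytes rather than real files, and is written in Python for illustration. The names IncrementalRunner and fingerprint are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest that changes whenever the content changes."""
    return hashlib.sha256(content).hexdigest()


class IncrementalRunner:
    """Remembers what each input looked like when its task last succeeded."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # input name -> fingerprint at last run

    def run(self, inputs: Dict[str, bytes], task: Callable[[str, bytes], None]) -> List[str]:
        """Run task only for new or changed inputs; return the names processed."""
        processed = []
        for name, content in sorted(inputs.items()):
            digest = fingerprint(content)
            if self._seen.get(name) == digest:
                continue  # unchanged since the last run: skip the work
            task(name, content)
            self._seen[name] = digest  # record only after the task succeeded
            processed.append(name)
        return processed
```

A real build would also have to track dependencies between tasks and persist the fingerprints between runs, which is where incrementality stops being free.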
On other occasions, I’ve used TypeScript under Node.js, combining things like promises with fibers to get a degree of parallelisation.
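The same idea carries over to other languages. As a sketch only, and using Python’s standard concurrent.futures rather than TypeScript promises or fibers, independent build steps can be run concurrently like this. The function name run_in_parallel and the notion of a “step” as a zero-argument callable are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def run_in_parallel(steps: Dict[str, Callable[[], T]], max_workers: int = 4) -> Dict[str, T]:
    """Run independent steps concurrently and collect each result under its step name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the steps can overlap, then wait for the results.
        futures = {name: pool.submit(step) for name, step in steps.items()}
        # result() re-raises any exception a step raised, so failures are not swallowed.
        return {name: future.result() for name, future in futures.items()}
```

Calling run_in_parallel({"compile": compile_step, "lint": lint_step}) with two hypothetical step functions returns a dictionary of their results. With threads, this mostly helps when the steps spend their time waiting on I/O or on external processes.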
And there’s of course Puppet. While this may not scratch all actual scripting itches, I’ve heard that for typical sysadmin things it works well.
What are your thoughts on this? Please share below.