Getting started with js-csp — part 2
Previously, in part 1, we looked at the basics of js-csp in the context of writing a small CLI app. After writing the thing and stepping back a bit, I realised that, although I had split the problem into nice, understandable parts, nothing was happening concurrently. This post addresses that issue. (By the way, if you’d like some practice, I’d suggest trying to solve this yourself after reading the recap that follows.)
Let’s recap some of the main points we covered in part 1:
- `yield csp.take(channel)` can be used to suspend execution of a generator function and wait for a message to be available on `channel` before resuming. It can only be used in a generator function executed by `csp.go`.
- `yield csp.put(channel, msg)` can be used to suspend execution of a generator function and wait for the message to be consumed from `channel` before resuming the generator function. It can only be used in a generator function executed by `csp.go`.
- `const channel = csp.go(function * () { … })` can be used to run a generator function (containing `put`/`take` ops). It returns a `channel` which will receive one message: whatever the generator function returns when it’s done.
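To make those three operations concrete without installing anything, here is a toy, hypothetical stand-in written in plain JavaScript. `Chan`, `take`, `put`, `go`, `takeOrPark` and `putOrPark` are my own names for illustration, not js-csp’s code: channels are unbuffered, everything runs synchronously, and the order in which parked processes wake up is simply what this toy happens to do.

```javascript
// Toy stand-in for js-csp (hypothetical, NOT the library's code):
// unbuffered channels plus a synchronous scheduler, just enough to
// mimic the behaviour of csp.chan, csp.take, csp.put and csp.go.
class Chan {
  constructor() {
    this.takers = [];  // callbacks of processes parked on a take
    this.putters = []; // { value, resume } for processes parked on a put
  }
}

// Instructions a process yields to the scheduler
const take = (ch) => ({ op: 'take', ch });
const put = (ch, value) => ({ op: 'put', ch, value });

// Give a waiting putter's value to the taker, or park the taker
function takeOrPark(ch, onValue) {
  const putter = ch.putters.shift();
  if (putter) {
    onValue(putter.value);
    putter.resume();
  } else {
    ch.takers.push(onValue);
  }
}

// Give the value to a waiting taker, or park the putter until one arrives
function putOrPark(ch, value, onDone = () => {}) {
  const taker = ch.takers.shift();
  if (taker) {
    taker(value);
    onDone();
  } else {
    ch.putters.push({ value, resume: onDone });
  }
}

// Run a generator as a process; the returned channel receives its return value
function go(genFn, args = []) {
  const result = new Chan();
  const gen = genFn(...args);
  const step = (input) => {
    const next = gen.next(input);
    if (next.done) {
      putOrPark(result, next.value); // parked until someone takes it
      return;
    }
    const instr = next.value;
    if (instr.op === 'take') {
      takeOrPark(instr.ch, step);
    } else if (instr.op === 'put') {
      putOrPark(instr.ch, instr.value, () => step(true)); // resume putter with true
    } else {
      throw new Error('toy scheduler only understands take and put');
    }
  };
  step(undefined);
  return result;
}

// Demo: one process takes from ch, another puts on it and then waits for the result
const log = [];
const ch = new Chan();
const consumerDone = go(function* () {
  const msg = yield take(ch); // parks until someone puts on ch
  log.push(`took ${msg}`);
  return msg.toUpperCase();   // becomes the single message on consumerDone
});
go(function* () {
  yield put(ch, 'hello');     // parks until the consumer has taken it
  log.push('put accepted');
  const result = yield take(consumerDone);
  log.push(`consumer returned ${result}`);
});
console.log(log.join(' | ')); // took hello | put accepted | consumer returned HELLO
```

The producer’s `put` only completes once the consumer has taken the value, and `consumerDone` carries the consumer’s return value, as in the recap. js-csp’s real scheduler works differently, so don’t rely on this exact interleaving of log lines there.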
Now, let me spell out why the code in part 1 has no concurrency in it. I didn’t go into this previously, leaving it as an exercise for the reader (OK, I was also tired and didn’t feel like it, haha, but really, if you figured it out by yourself then you certainly understood the code):
- We first make a request for all contributors (this is an un-paginated result). We cannot do anything productive while we wait for this HTTP request as we need the contributors to continue.
- Once we get the contributors, this is what I was doing:
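const { json } = msg
for (let i = 0; i < json.length; i++) {
  const contributor = json[i]
  // add the contributor to the output first...
  yield csp.put(
    channels.outputCh,
    AddContributor(contributor.login, contributor.html_url)
  )
  // ...then spawn a process that fetches its followers
  const doneCh = csp.go(
    followerProcess,
    [contributor.followers_url, contributor.login, csp.chan()]
  )
  // and block until that process is done before looping on
  yield csp.take(doneCh)
}
// only now do we know that every follower has been processed
return yield csp.put(channels.outputCh, FlushMsg())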
In other words, for every contributor, I was adding that contributor to the output and spawning another process to take care of fetching the contributor’s followers. I was then waiting for this process to finish before looping on to the next contributor. I was doing this because I needed to know when all followers had been processed so that I could send the `FlushMsg`.
This, however, is the main thing that’s slowing everything down. We clearly don’t need to wait for one contributor’s followers to be fetched and added before requesting the next contributor’s. (One thing to note, though: I do want each contributor to be added to the output before any of its followers are fetched. The reason is that adding a follower depends on its contributor having been added first; otherwise there’s no key to append to in the output.)
So that’s the problem we’ll be solving in this part… and it’s almost a one-liner.
One possible solution
We can just collect all the channels returned by spawning each `followerProcess`, and then wait for all of them to finish:
const doneChs = []
for (let i = 0; i < json.length; i++) {
  const contributor = json[i]
  // the contributor is still added before any of its followers
  yield csp.put(
    outputCh,
    AddContributor(contributor.login, contributor.html_url)
  )
  // spawn the follower process but don't wait for it; just keep its channel
  doneChs.push(
    csp.go(
      followerProcess,
      [contributor.followers_url, contributor.login, csp.chan()]
    )
  )
}
// now wait for every follower process to finish
for (let doneCh of doneChs) { yield doneCh }
// all followerProcesses have ended
return yield csp.put(outputCh, FlushMsg())
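If the difference between the two loops feels abstract, the same two shapes can be reproduced with plain Promises. This is an analogy, not js-csp: `fetchFollowers` is a hypothetical stand-in for `followerProcess`, a 20 ms timer plays the part of the HTTP request, and awaiting a Promise stands in for taking from a process’s done channel.

```javascript
// Analogy only: async functions and Promises standing in for js-csp processes.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for followerProcess; the timer plays the HTTP request
async function fetchFollowers(login, events) {
  events.push(`start ${login}`);
  await delay(20);
  events.push(`end ${login}`);
}

// Part 1 shape: add a contributor, then wait for its followers before moving on
async function sequential(logins, events) {
  for (const login of logins) {
    events.push(`add ${login}`);
    await fetchFollowers(login, events);
  }
  events.push('flush');
}

// Part 2 shape: add each contributor and start its follower fetch, keep the
// handles, and only wait on them once every fetch is in flight
async function concurrent(logins, events) {
  const pending = [];
  for (const login of logins) {
    events.push(`add ${login}`);
    pending.push(fetchFollowers(login, events));
  }
  for (const p of pending) await p; // like `for (let doneCh of doneChs) { yield doneCh }`
  events.push('flush');
}

async function main() {
  const seq = [];
  const con = [];
  await sequential(['a', 'b'], seq);
  await concurrent(['a', 'b'], con);
  console.log(seq.join(', ')); // add a, start a, end a, add b, start b, end b, flush
  console.log(con.join(', ')); // add a, start a, add b, start b, end a, end b, flush
  return { seq, con };
}

main();
```

In the first shape the fetches run strictly one after another, so the total time grows with the number of contributors; in the second they overlap. In both, each contributor is still added before its own fetch starts, which is the ordering constraint mentioned above.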
One thing I avoided mentioning in part 1 (but I now think you’ve got enough experience to start shortening things) is that yielding a channel is equivalent to taking from it, so the following are identical:
yield doneCh
yield csp.take(doneCh)
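One way a runner can make those two lines interchangeable is to treat a bare channel, when yielded, as if `take(channel)` had been yielded. The sketch below is another hypothetical toy (`ToyChan`, `take` and `run` are my own names, not js-csp’s); its channel is pre-loaded with values so takes never block, which keeps the runner synchronous.

```javascript
// Hypothetical toy, not js-csp: a channel pre-loaded with values,
// so a take can always be answered immediately
class ToyChan {
  constructor(values) {
    this.values = [...values];
  }
}
const take = (ch) => ({ op: 'take', ch });

// Drive a generator to completion, answering every take synchronously
function run(genFn) {
  const gen = genFn();
  let input;
  for (;;) {
    const next = gen.next(input);
    if (next.done) return next.value;
    // The shorthand: a bare channel is treated exactly like take(channel)
    const instr = next.value instanceof ToyChan ? take(next.value) : next.value;
    if (instr.op !== 'take') throw new Error('toy runner only understands take');
    input = instr.ch.values.shift();
  }
}

const ch = new ToyChan([1, 2]);
const sum = run(function* () {
  const a = yield take(ch); // long form
  const b = yield ch;       // shorthand, same effect
  return a + b;
});
console.log(sum); // 3
```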
Why are you switching to `for of` now?
You might have noticed that I’m finally using the `for of` construct for iteration in this two-part post. The reason is that, while writing part 1, I was getting some errors with `for of` which I wasn’t getting with a plain `for` loop. In any case, I decided to scrap it altogether for the time being.
The reason I’m pointing this out here is to highlight something I forgot to mention in part 1: avoid using `.forEach`, or any other construct that requires you to pass a callback function, to iterate over your collections. Any code you write in the callback lives in a separate function, outside the wrapping generator function, so you can no longer `yield csp.take`/`csp.put` from it.
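A small plain-JavaScript illustration of the point (no js-csp involved; `addAll` is a made-up generator standing in for a `csp.go` process): a `yield` inside a `for...of` loop belongs to the generator, whereas the same `yield` inside a `.forEach` callback belongs to the arrow function, which is not a generator, so the code is rejected before it even runs.

```javascript
// A plain generator standing in for a process run by csp.go
function* addAll(items) {
  for (const item of items) {
    yield `add ${item}`; // fine: this yield belongs to addAll itself
  }
}

console.log([...addAll(['a', 'b'])].join(', ')); // add a, add b

// The same loop written with forEach does not even parse: the arrow
// function is a separate, non-generator function, so `yield` is not
// available inside it.
let parseError = null;
try {
  new Function('items', 'function* g() { items.forEach((item) => { yield item; }); }');
} catch (e) {
  parseError = e;
}
console.log(parseError instanceof SyntaxError); // true
```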
Check out the refactored code
I’ve refactored the code (split into multiple files) and added the little tweak above in the `post2` branch here.
Time?
I ran it against `ubolonton/js-csp` again and the total time came in at 38s… a marked improvement on the previous 1m 35s.
I noticed that the result wasn’t exactly the same, though: the project now has one fewer contributor (I wasn’t aware that contributor counts can go down…). That said, the total number of lines in the output is close enough.
In any case, that’s all I got for this time. Happy coding :)