Node.js Getting Started with Alibaba Cloud Function Compute

neilspink

Dec 6, 2019

You have likely run into a situation where you needed to automate a job you would otherwise have to do manually. I had large CSV files in Alibaba Cloud Object Storage Service (OSS) from which I had to extract a specific column of data.

I have three Node.js Function Compute programs for you here: one for debugging, a second showing how to copy a file from one OSS bucket to another, and lastly one that parses a CSV file to extract, transform and load a column into a text file (classic ETL).

The Source Code

https://github.com/neilspink/ali-fc

Assumptions

You have an Alibaba Cloud account and have activated the following services: Log Service, OSS and Function Compute.

Additionally, I would recommend Alibaba KMS so you can have AES-256 server-side encryption on your OSS buckets; data stored even for a minute should be encrypted.

Documents You Will Probably Use A Lot

Alibaba Function Compute Node.js Runtime Documentation

Alibaba Node.js OSS SDK Documentation on GitHub

You Need Async and Await

When creating a Node.js Function Compute program, add the async keyword to the handler function, as shown below. Anything other than a “hello world” example will need it:

exports.handler = async function (event, context, callback) {

Calls to SDK functions typically return promises, the JavaScript construct for asynchronous code. You need to add the await keyword before your function call, like this:

await store.copy(targetName, sourceName).then((result) => {

Be careful when calling await multiple times in sequence; it can slow your code, because each awaited call only starts once the previous one has finished.
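To make that concrete, here is a minimal sketch of an async handler. The simulated SDK call is just a stand-in for any promise-returning function (such as an ali-oss copy); the names are illustrative only:

// Minimal sketch of an async Function Compute handler.
// simulatedSdkCall stands in for any promise-returning SDK function.
const simulatedSdkCall = () =>
  new Promise((resolve) => setTimeout(() => resolve('done'), 100));

exports.handler = async function (event, context, callback) {
  try {
    // await pauses the handler until the promise settles, so the
    // function does not return before the work has finished.
    const result = await simulatedSdkCall();
    callback(null, result);
  } catch (err) {
    callback(err);
  }
};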

Create your function

The first thing you are prompted for is a Service Name, which is used for grouping your programs. Generally, you should chunk work into small, specific tasks, which makes them easier to develop, maintain and reuse. The additional benefit is having one central configuration for multiple functions.

The first function I am creating is for setting up logging and debugging:

Create a RAM Role

You’ll want your functions to be able to access services like OSS and the Log Service. Go into the RAM console and select RAM roles and create a new one:

I prefix my roles with fc, so you quickly know it’s for function compute. Over time you’ll have many roles, so get your naming practice right early on:

Now the scary part: giving these functions broad permissions via the policies AliyunOSSFullAccess and AliyunLogFullAccess:

This grants more than just read/write access, but I think that is fine: from a risk perspective, we are explicitly assigning these policies to functions whose behaviour is codified in a way that can't easily be abused. I've had problems before on other cloud platforms when granting least-privilege access, so I generally wouldn't suggest it unless it is warranted, for example because the function interacts with the internet or with users.

You'll need to assign the role to the function, so go back to the Function Compute console:

Use the update button to edit the configuration of your function's service, and then assign the role you created:

Hooking up the Log Service

You need logging, or you won't know what happened when your functions ran. This service is complicated, so we will do the minimum setup.

In the console, go to the Log Service and create a project.

Generally, I would suggest giving it the same name as the function service, but maybe you prefer one central log for all of Function Compute?

Make sure you are creating it in the same region as your functions. I chose “detail logs” with pay as you go.

There are a couple of decisions to make when creating a logstore. I turned off permanent storage and set the data retention period to only 30 days:

Alibaba Create Logstore with Data Retention for only 30 days

You might also get asked to import data; just close (X) the dialog box.

There is one last important task in the logstore. Click on the “Search & Analysis” icon and then enable indexing:

If you try the “automatic index generation” button, it is probably going to complain that there is no data. You could run a hello world function a couple of times, but since we don't have any real data right now, skip it and click OK:

Back in the Function Compute console, you can either use the service configuration tab (as shown previously when assigning the function role) or the log tab:

Now add the new logstore:

Checkpoint: run the hello world function a couple of times now to assure yourself that logs are appearing:

Logging multi-line data can be cumbersome, and perhaps you are quickly trying to get at some JSON data; one solution is to output it to the log as base64:

https://github.com/neilspink/ali-fc/blob/master/logging-event-object/logger-debug.js

module.exports.handler = function (event, context, callback) {
  // Dump the raw event payload as base64 so multi-line data survives in the log.
  console.log('base64 of event');
  let buff = Buffer.from(event); // Buffer.from() replaces the deprecated new Buffer()
  let base64data = buff.toString('base64');
  console.log(base64data);
  callback(null, 'finished!');
}

It gives you some unintelligible text in the log, but you can at least copy it out of the log in one go:

Then use a service like https://www.base64decode.org/ to get the data back:
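If you would rather not paste data into a third-party site, a couple of lines of Node do the same thing locally (a small sketch; the script name and argument handling are just for illustration):

// decode.js: run with  node decode.js "<base64 string copied from the log>"
const base64data = process.argv[2];
console.log(Buffer.from(base64data, 'base64').toString('utf-8'));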

Event-Driven Functions

The trigger tab of the function allows you to set this up:

My use case is a CSV being uploaded to a bucket, so the trigger type is OSS. There are three events I catch: PutObject, PostObject and CompleteMultipartUpload.

The trigger below uses the suffix filter csv. While I was developing my functions, I found it is not permitted to have multiple triggers using the same suffix, so I suggest also using a prefix such as a folder name.

The trigger needs permission to be able to notify the Function Compute service. Use the “quick authorize” option — see below:

The trigger management screen will look something like this when you are done:

Copying A CSV To Another Bucket

I created an additional bucket called “csv-processing”, which is where we will later do some parsing, and I created a new event trigger for the copy-data function below.

The most important part here is knowing that the OSS client instance must be created for the bucket you copy to, because you cannot push objects with the copy command; it pulls them from the source bucket.

https://github.com/neilspink/ali-fc/blob/master/file-copy/copy-oss-object.js

I think the event parameter (on line 8) is worth mentioning here; it was a stumbling block for me when writing my first function on Alibaba Cloud. When an event triggers the function, you get a list. I don't know how multiple events could trigger a function at the same time, but it seems possible; you can see on line 12 that I simply take the first element of the array.
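Putting those pieces together, a copy function could look roughly like the sketch below. This is not the code from the repository linked above; the destination bucket name, the region handling, the use of the temporary credentials on the context object and the returned status are assumptions on my part:

const OSS = require('ali-oss');

module.exports.handler = async function (event, context, callback) {
  try {
    // The trigger delivers a JSON string containing a list of events;
    // as noted above, only the first entry is used here.
    const ossEvent = JSON.parse(event);
    const record = ossEvent.events[0];

    const objectKey = record.oss.object.key;
    const sourceBucket = record.oss.bucket.name;

    // Initialise the client against the *destination* bucket,
    // because copy() pulls the object rather than pushing it.
    const store = new OSS.Wrapper({
      region: 'oss-' + record.region, // e.g. 'eu-central-1' becomes 'oss-eu-central-1'
      accessKeyId: context.credentials.accessKeyId,
      accessKeySecret: context.credentials.accessKeySecret,
      stsToken: context.credentials.securityToken,
      bucket: 'csv-processing' // assumed destination bucket from this article
    });

    // copy(targetName, sourceName): a leading '/<bucket>/' on sourceName
    // copies the object across buckets.
    const result = await store.copy(objectKey, '/' + sourceBucket + '/' + objectKey);
    callback(null, result.res.status);
  } catch (err) {
    callback(err);
  }
};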

If you set up the Log Service as documented above, you will have seen the code for logging the event as base64 to the logstore. That can be very handy here, because you can decode the payload and put the JSON into the “Test Event” dialog, which saves you from continually having to upload a file while programming, debugging or testing your function.
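For illustration, the decoded event for an OSS upload has roughly the following shape (all values below are invented placeholders, so double-check against a real event from your own logstore); this is the kind of JSON you can paste into the Test Event dialog:

{
  "events": [
    {
      "eventName": "ObjectCreated:PutObject",
      "eventSource": "acs:oss",
      "eventVersion": "1.0",
      "oss": {
        "bucket": {
          "name": "my-source-bucket",
          "ownerIdentity": "123456789012"
        },
        "object": {
          "key": "incoming/data.csv",
          "size": 1024
        },
        "ossSchemaVersion": "1.0",
        "ruleId": "my-trigger-rule"
      },
      "region": "eu-central-1"
    }
  ]
}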

Parsing A Single Column In A CSV File

Parsing CSV by hand can be precarious work, and usually I would opt for a library because of all the formatting issues you run into with CSV files, but frankly I wanted to avoid having to upload the function and its npm packages as a zip file.

If you are looking for a library to assist in parsing your CSV, I found the npm package parse-csv quite good.

I'm quite new to Node.js and spent some time researching and reading. An article I found very helpful was How to Process Epic Amounts of Data in NodeJS, in which Tom May covers how to extend a stream with Transform; I blatantly copied his approach.

I'm not entirely convinced the current design is good, but I have tried it with 50 to 70 MB files and found it worked well enough:

When I developed my ETL program I did it locally on my computer and have included the code in my repository:

https://github.com/neilspink/ali-fc/blob/master/csv-extract/test.js

It is a matter of instantiating the parser class; on line 103 of the code below you can see how the pipeline method is used to pipe between streams (a stripped-down sketch of the same idea follows at the end of this section).

The test code is slightly different from how I run it using the ali-oss SDK:

https://github.com/neilspink/ali-fc/blob/master/csv-extract/parse-column2.js

Try the code and cannibalise it; that is better than me trying to explain it in too much detail here.
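If you just want the shape of the approach before opening the repository, here is a stripped-down sketch of a Transform stream plus pipeline. It is not the repository code; it assumes simple, unquoted comma-separated rows, a newline-delimited file and a column index that is known up front (a real CSV library handles the awkward cases this ignores):

const { Transform, pipeline } = require('stream');
const fs = require('fs');

// A Transform stream that keeps only one column of a CSV.
class ColumnExtractor extends Transform {
  constructor(columnIndex) {
    super();
    this.columnIndex = columnIndex;
    this.remainder = ''; // carries a partial line over between chunks
  }

  _transform(chunk, encoding, done) {
    const lines = (this.remainder + chunk.toString()).split('\n');
    this.remainder = lines.pop(); // the last element may be an incomplete line
    for (const line of lines) {
      const columns = line.split(',');
      if (columns.length > this.columnIndex) {
        this.push(columns[this.columnIndex] + '\n');
      }
    }
    done();
  }

  _flush(done) {
    if (this.remainder) {
      const columns = this.remainder.split(',');
      if (columns.length > this.columnIndex) {
        this.push(columns[this.columnIndex] + '\n');
      }
    }
    done();
  }
}

// Read the CSV, extract column 2, write the result to a text file.
pipeline(
  fs.createReadStream('input.csv'),
  new ColumnExtractor(2),
  fs.createWriteStream('column2.txt'),
  (err) => {
    if (err) console.error('pipeline failed', err);
  }
);

Note that stream.pipeline needs Node.js 10 or newer, which matters when you pick the Function Compute runtime version.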

In The End

It was a bit of a hassle learning both NodeJS and the Alibaba SDK at the same time. If you already know Python, Java or .NET Core then you might want to use one of those.

A couple of things in particular took me time to get to grips with: writing async code the Node.js way, and figuring out that under the Function Compute runtime the ali-oss SDK needs you to add .Wrapper (see line 6 in the source code).

I feel there are too many Node.js tutorials out there that don't go into any depth on solving real problems, and the Alibaba Cloud documentation currently isn't as complete as it could be.

This was so simple... not! I know this can be better and I care, so please let me know how I can improve this article.
