Python in ECMAScript Modules

Yorkie Neil
Published in imgcook
May 22, 2020

Author: @rickyes

If you don't know what Boa is yet, check out this article first: Boa: Use Python functions in Node.js.

Introduction

Before this feature was implemented, Boa loaded the Python library like this:

const boa = require('@pipcook/boa');
const { range, len } = boa.builtins();
const { getpid } = boa.import('os');
const numpy = boa.import('numpy');

Each Python library is imported by passing its name to boa.import(). When many packages need to be loaded, the repeated boa.import() calls become verbose. To address this, we implemented a custom import scheme that uses ES Module syntax for more concise import statements:

import { getpid } from 'py:os';
import { range, len } from 'py:builtins';
import {
  array as NumpyArray,
  int32 as NumpyInt32,
} from 'py:numpy';

This feature relies on Node.js's experimental loader support, enabled with the --experimental-loader flag. Because the feature is experimental, later Node.js versions added "experimental" to the flag's name.

--experimental-loader

When starting the program, you can install a custom loader by passing this flag the path of a file with the .mjs extension, for example node --experimental-loader ./loader.mjs app.mjs. The .mjs file can export several hooks that intercept the default loader:

- resolve
- getFormat
- getSource
- transformSource
- getGlobalPreloadCode
- dynamicInstantiate

The order of execution is:

  getGlobalPreloadCode                  (runs once, at startup)

  Each import runs this pipeline once:
    resolve -> getFormat -+-> dynamicInstantiate             (format 'dynamic')
                          +-> getSource -> transformSource   (all other formats)
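To make the flow concrete, here is a minimal sketch that simulates the order in which the hooks fire. Every hook is a hypothetical stand-in that only records its own name, and simulateImport is an illustrative driver, not a Node.js API. The real loader also passes context objects and default implementations to each hook.

```javascript
const log = [];

// Stand-in hooks: each one only records its name and returns a minimal result.
const hooks = {
  getGlobalPreloadCode() {
    log.push('getGlobalPreloadCode');
    return '';
  },
  resolve(specifier) {
    log.push('resolve');
    return { url: specifier };
  },
  getFormat(url) {
    log.push('getFormat');
    return { format: url.startsWith('py:') ? 'dynamic' : 'module' };
  },
  dynamicInstantiate(url) {
    log.push('dynamicInstantiate');
    return { exports: [], execute() {} };
  },
  getSource(url) {
    log.push('getSource');
    return { source: `// source of ${url}` };
  },
  transformSource(source) {
    log.push('transformSource');
    return { source };
  },
};

// Hypothetical driver: the path a single import takes through the hooks.
function simulateImport(specifier) {
  const { url } = hooks.resolve(specifier);
  const { format } = hooks.getFormat(url);
  if (format === 'dynamic') {
    hooks.dynamicInstantiate(url);
    return;
  }
  const { source } = hooks.getSource(url);
  hooks.transformSource(source);
}

hooks.getGlobalPreloadCode(); // once, before any import
simulateImport('py:os');
simulateImport('file:///app/util.mjs');
console.log(log.join(' -> '));
// getGlobalPreloadCode -> resolve -> getFormat -> dynamicInstantiate -> resolve -> getFormat -> getSource -> transformSource
```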

getGlobalPreloadCode Hook

This hook runs only once, when the application starts, and its code executes in the global scope. You can attach variables to the global scope through the globalThis object. The only helper available inside this code is getBuiltin, which works like require but can load only built-in modules:

/**
 * @returns {string} Code to run before application startup
 */
export function getGlobalPreloadCode() {
  return `\
globalThis.someInjectedProperty = 42;
console.log('I just set some globals!');
const { createRequire } = getBuiltin('module');
const require = createRequire(process.cwd() + '/<preload>');
`;
}

Then someInjectedProperty can be used in all modules.
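Node.js evaluates the returned string as code that has getBuiltin in scope. The sketch below roughly emulates that with new Function and a stub getBuiltin so you can see what the preload code does. runPreload, stubGetBuiltin and pathSeparator are hypothetical names for illustration, not Node.js internals.

```javascript
// Hypothetical preload code: sets a global and reads a value from a built-in.
function getGlobalPreloadCode() {
  return `\
globalThis.someInjectedProperty = 42;
const { sep } = getBuiltin('path');
globalThis.pathSeparator = sep;
`;
}

// Rough emulation, not Node.js internals: run the string as a function body
// that receives getBuiltin as its only parameter.
function runPreload(code, getBuiltin) {
  const preload = new Function('getBuiltin', code);
  preload(getBuiltin);
}

// Stub getBuiltin for the demo; in Node.js it returns the real built-in module.
const stubBuiltins = { path: { sep: '/' } };
function stubGetBuiltin(name) {
  if (!(name in stubBuiltins)) {
    throw new Error(`Unknown built-in module: ${name}`);
  }
  return stubBuiltins[name];
}

runPreload(getGlobalPreloadCode(), stubGetBuiltin);
console.log(globalThis.someInjectedProperty); // 42
console.log(globalThis.pathSeparator); // /
```

Because the preload code travels as a string, it cannot close over variables from the loader module. Anything it needs must come from getBuiltin or globalThis.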

resolve Hook

This hook is the loader's entry point. It intercepts import statements and import() expressions, inspects the imported specifier string, and returns the URL that the specifier should resolve to:

const protocol = 'py:';

/**
 * @param {string} specifier
 * @param {object} context
 * @param {string} context.parentURL
 * @param {function} defaultResolve
 * @returns {object} response
 * @returns {string} response.url
 */
export function resolve(specifier, context, defaultResolve) {
  if (specifier.startsWith(protocol)) {
    return {
      url: specifier
    };
  }
  return defaultResolve(specifier, context, defaultResolve);
}

Parameters:

- specifier: the string in the import statement or import() expression:

import os from 'py:os'; // 'py:os'
async function load() {
  const sys = await import('py:sys'); // 'py:sys'
}

- context.parentURL: the URL of the parent module, that is, the module containing the import:

import os from 'py:os'; // file:///.../app.mjs
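context.parentURL is what lets a resolve hook interpret relative specifiers. The sketch below is a hypothetical resolve hook that passes py: specifiers through unchanged and resolves ./ and ../ specifiers against parentURL with the standard URL class. It is a deliberate simplification: the real defaultResolve also handles bare package names, package.json "exports", file checks and more. stubDefault is a placeholder for it.

```javascript
const protocol = 'py:';

// Hypothetical resolve hook illustrating how parentURL is used.
function resolve(specifier, context, defaultResolve) {
  if (specifier.startsWith(protocol)) {
    return { url: specifier };
  }
  // Relative specifiers are interpreted relative to the importing module.
  if (specifier.startsWith('./') || specifier.startsWith('../')) {
    return { url: new URL(specifier, context.parentURL).href };
  }
  return defaultResolve(specifier, context, defaultResolve);
}

// Placeholder default resolver for the demo only.
const stubDefault = (specifier) => ({ url: `stub:${specifier}` });

const parent = { parentURL: 'file:///project/src/app.mjs' };
console.log(resolve('py:os', parent, stubDefault).url);        // py:os
console.log(resolve('./util.mjs', parent, stubDefault).url);   // file:///project/src/util.mjs
console.log(resolve('../lib/a.mjs', parent, stubDefault).url); // file:///project/lib/a.mjs
console.log(resolve('lodash', parent, stubDefault).url);       // stub:lodash
```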

One detail is worth noting here: the default resolver, defaultResolve, only handles three URL schemes:

- data: inline JavaScript or Wasm source
- nodejs: built-in modules
- file: third-party or user-defined modules

Boa's custom py: scheme would fail this check in the default resolver. So once a specifier matches, the hook returns it directly as the URL, and that URL is then passed to the getFormat hook.
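The short-circuit matters because of that scheme check, which a stand-in for the default resolver can show. stubDefaultResolve below is hypothetical and only models the check described above. It accepts absolute URLs only and rejects any scheme other than data:, nodejs: and file:.

```javascript
const SUPPORTED_SCHEMES = new Set(['data:', 'nodejs:', 'file:']);

// Hypothetical stand-in for defaultResolve that models only the scheme check.
// It expects an absolute URL; new URL() throws for relative input without a base.
function stubDefaultResolve(specifier) {
  const { protocol } = new URL(specifier);
  if (!SUPPORTED_SCHEMES.has(protocol)) {
    throw new Error(`Unsupported URL scheme: ${protocol}`);
  }
  return { url: specifier };
}

// Same shape as Boa's resolve hook: py: URLs never reach the default resolver.
function resolve(specifier, context, defaultResolve) {
  if (specifier.startsWith('py:')) {
    return { url: specifier };
  }
  return defaultResolve(specifier, context, defaultResolve);
}

try {
  stubDefaultResolve('py:os');
} catch (err) {
  console.log(err.message); // Unsupported URL scheme: py:
}
console.log(resolve('py:os', {}, stubDefaultResolve).url);                // py:os
console.log(resolve('file:///app/util.mjs', {}, stubDefaultResolve).url); // file:///app/util.mjs
```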

getFormat Hook

This hook decides how the URL returned by the resolve hook should be interpreted. It supports several formats:

- builtin: a Node.js built-in module
- commonjs: a CommonJS module
- dynamic: dynamic instantiation, which triggers the dynamicInstantiate hook
- json: a JSON file
- module: an ECMAScript module
- wasm: a WebAssembly module

Of these, only dynamic fits our needs:

export function getFormat(url, context, defaultGetFormat) {
  // Trigger the dynamicInstantiate hook when the URL uses Boa's py: scheme
  if (url.startsWith(protocol)) {
    return {
      format: 'dynamic'
    };
  }
  // Leave every other URL to Node.js's default format detection
  return defaultGetFormat(url, context, defaultGetFormat);
}

When the URL uses Boa's py: scheme, the dynamicInstantiate hook is triggered. Otherwise, Node.js's default getFormat decides how the module is loaded.
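The sketch below runs a getFormat hook of the same shape against a hypothetical, heavily simplified default, stubDefaultGetFormat, which guesses the format from the URL alone. Node.js's real default also consults package.json "type" and other signals, so treat this only as an illustration of the dispatch.

```javascript
const protocol = 'py:';

// Same logic as the getFormat hook above.
function getFormat(url, context, defaultGetFormat) {
  if (url.startsWith(protocol)) {
    return { format: 'dynamic' };
  }
  return defaultGetFormat(url, context, defaultGetFormat);
}

// Hypothetical, simplified default: infers the format from the URL alone.
function stubDefaultGetFormat(url) {
  if (url.startsWith('nodejs:')) return { format: 'builtin' };
  const { pathname } = new URL(url);
  if (pathname.endsWith('.mjs')) return { format: 'module' };
  if (pathname.endsWith('.cjs')) return { format: 'commonjs' };
  if (pathname.endsWith('.json')) return { format: 'json' };
  if (pathname.endsWith('.wasm')) return { format: 'wasm' };
  throw new Error(`Cannot determine format for ${url}`);
}

for (const url of ['py:numpy', 'nodejs:fs', 'file:///app/main.mjs', 'file:///app/data.json']) {
  console.log(url, '->', getFormat(url, {}, stubDefaultGetFormat).format);
}
// py:numpy -> dynamic
// nodejs:fs -> builtin
// file:///app/main.mjs -> module
// file:///app/data.json -> json
```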

dynamicInstantiate Hook

This hook loads a module dynamically, as an alternative to the source-based formats offered by getFormat:

import { createRequire } from 'module';

// Boa is a CommonJS package, so load it with require() from this ES module.
const require = createRequire(import.meta.url);
const boa = require('@pipcook/boa');
const { dir } = boa.builtins();

/**
 * @param {string} url
 * @returns {object} response
 * @returns {array} response.exports
 * @returns {function} response.execute
 */
export function dynamicInstantiate(url) {
  // 'py:numpy' -> 'numpy'; protocol is the 'py:' constant declared with resolve
  const moduleInstance = boa.import(url.replace(protocol, ''));
  // Use Python's dir() to list the module's attributes as named exports
  const moduleExports = dir(moduleInstance);
  return {
    exports: ['default', ...moduleExports],
    execute: exports => {
      for (const name of moduleExports) {
        exports[name].set(moduleInstance[name]);
      }
      exports.default.set(moduleInstance);
    }
  };
}

The hook loads the Python module with boa.import() and uses Python's built-in dir function to list all of the module's attributes. The export names must be declared in advance, so the hook returns them in the exports array to support named exports, plus default to support a default export. When the module is instantiated, the execute function sets each of those exports to the corresponding attribute.
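The exports/execute contract can be tried out without Python. In this sketch, a plain JavaScript object stands in for the module returned by boa.import(), Object.keys stands in for Python's dir(), and instantiate is a hypothetical helper that mimics how the loader hands execute() one { set(value) } binding per declared export. None of these are Boa or Node.js APIs.

```javascript
const protocol = 'py:';

// Stand-ins: a plain object plays the Python module returned by boa.import(),
// and Object.keys plays Python's dir(). Neither is Boa's real behavior.
const fakePythonModules = {
  os: { getpid: () => 1234, sep: '/' },
};
const boa = { import: (name) => fakePythonModules[name] };
const dir = (obj) => Object.keys(obj);

// Same structure as the dynamicInstantiate hook above.
function dynamicInstantiate(url) {
  const moduleInstance = boa.import(url.replace(protocol, ''));
  const moduleExports = dir(moduleInstance);
  return {
    exports: ['default', ...moduleExports],
    execute: (exports) => {
      for (const name of moduleExports) {
        exports[name].set(moduleInstance[name]);
      }
      exports.default.set(moduleInstance);
    },
  };
}

// Hypothetical helper mimicking the loader: create one { set(value) } binding
// per declared export name, then let execute() fill them in.
function instantiate({ exports: names, execute }) {
  const namespace = {};
  const bindings = {};
  for (const name of names) {
    bindings[name] = { set: (value) => { namespace[name] = value; } };
  }
  execute(bindings);
  return namespace;
}

const osNamespace = instantiate(dynamicInstantiate('py:os'));
console.log(osNamespace.getpid()); // 1234
console.log(osNamespace.default === fakePythonModules.os); // true
```

The export list has to be known before execute runs, which is why the hook calls dir() up front. Any attribute missing from exports cannot be imported by name.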

getSource Hook

This hook supplies a module's source code as a string or buffer. It lets you obtain the source in some other way than the default loader's approach of reading files from disk, for example from the network, from memory, or from hard-coded strings:

export async function getSource(url, context, defaultGetSource) {
  const { format } = context;
  // Hypothetical condition for illustration: choose which URLs to serve yourself.
  const someCondition = url.endsWith('/message.mjs');
  if (someCondition) {
    // For some or all URLs, do some custom logic for retrieving the source.
    // Always return an object of the form {source: <string|buffer>}.
    return {
      source: `export const message = 'Woohoo!'.toUpperCase();`
    };
  }
  // Defer to Node.js for all other URLs.
  return defaultGetSource(url, context, defaultGetSource);
}

What makes this interesting is that the source code can come from anywhere. For example, you could load modules over the network, the way Deno does.
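For a memory-backed variant, the sketch below serves modules from a hypothetical Map keyed by URL and defers everything else to the default. The URL and the stub default are illustrative. In a real loader, a URL that does not exist on disk would also need a resolve hook, so that defaultResolve does not reject it before getSource runs. The function is synchronous only to keep the demo short. In a loader file you can keep the async form used above.

```javascript
// Hypothetical in-memory module store, keyed by URL.
const memoryModules = new Map([
  ['file:///virtual/message.mjs', "export const message = 'Woohoo!'.toUpperCase();"],
]);

function getSource(url, context, defaultGetSource) {
  if (memoryModules.has(url)) {
    return { source: memoryModules.get(url) };
  }
  return defaultGetSource(url, context, defaultGetSource);
}

// Stub default for the demo: pretends nothing else exists.
const stubDefaultGetSource = (url) => {
  throw new Error(`Not found: ${url}`);
};

const { source } = getSource('file:///virtual/message.mjs', { format: 'module' }, stubDefaultGetSource);
console.log(source); // export const message = 'Woohoo!'.toUpperCase();
```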

transformSource Hook

After getSource has loaded the source code, this hook can modify it:

export async function transformSource(source, context, defaultTransformSource) {
  const { url, format } = context;
  if (source && source.replace) {
    // For some or all URLs, do some custom logic for modifying the source.
    // Always return an object of the form {source: <string|buffer>}.
    return {
      source: source.replace(`'A message';`, `'A message'.toUpperCase();`)
    };
  }
  // Defer to Node.js for all other sources.
  return defaultTransformSource(source, context, defaultTransformSource);
}

The example above replaces one specific string in the source with another. The hook can also compile source code on the fly, for example from CoffeeScript:

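import CoffeeScript from 'coffeescript';

// CoffeeScript files end in .coffee, .litcoffee or .coffee.md
// (getFormat must also report these URLs as 'module' for this to work).
const extensionsRegex = /\.coffee$|\.litcoffee$|\.coffee\.md$/;

export function transformSource(source, context, defaultTransformSource) {
  const { url, format } = context;
  if (extensionsRegex.test(url)) {
    return {
      // source may be a Buffer, so convert it to a string before compiling
      source: CoffeeScript.compile(String(source), { bare: true })
    };
  }
  // Let Node.js handle all other sources.
  return defaultTransformSource(source, context, defaultTransformSource);
}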
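One detail in the first transformSource example is the source && source.replace guard. getSource can hand over a Buffer rather than a string, and a Buffer has no replace method, so such sources are silently skipped. Below is a minimal sketch of an alternative, assuming you want Buffer sources transformed too. It decodes them with TextDecoder first. The replaced strings are the same illustrative ones as in that example, and identity is a hypothetical stub default.

```javascript
function transformSource(source, context, defaultTransformSource) {
  // Decode Buffers and other typed arrays to text; strings pass through as-is.
  const text = typeof source === 'string' ? source : new TextDecoder().decode(source);
  if (text.includes("'A message';")) {
    return { source: text.replace("'A message';", "'A message'.toUpperCase();") };
  }
  // Defer everything else, passing the original source along unchanged.
  return defaultTransformSource(source, context, defaultTransformSource);
}

// Stub default for the demo: returns the source untouched.
const identity = (source) => ({ source });

const fromBuffer = transformSource(
  Buffer.from("export const message = 'A message';"),
  { url: 'file:///app/message.mjs', format: 'module' },
  identity,
);
console.log(fromBuffer.source); // export const message = 'A message'.toUpperCase();
```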

End

Previously, Node.js could load modules in only a few built-in ways. The hooks it now exposes can be combined to build many interesting features, and there are plenty more scenarios to explore. For Boa's implementation, see https://github.com/alibaba/pipcook/pull/191.
