When lazy is good: lazy import
Simple tips to boost your data science Python package
You have a Python CLI application, but even a simple --help command takes a few tens of seconds to respond. Sound familiar? What if changing one line of code could make the startup 100X faster? Want to learn more? Let’s dive into lazy importing.
What is lazy import?
Lazy import of a module means that the actual lookup and loading of the module are only executed when one of its attributes is accessed. A more familiar analogue in computing is lazy evaluation, e.g. in Spark, where the DAG of processing commands is only executed once an action is called.
Why would we need it?
Lazy import in Python is often mentioned in the context of CLI packages, when the run-time feels laggy. People start to notice that loading some dependencies takes a huge amount of time. Often those are heavy dependencies, e.g. tensorflow or azureml. And the point is, some functionalities do not (yet) require those dependencies to run. Thus, if we can somehow delay the import of a dependency until the moment we really need it, that improves the start-up time and smooths the user’s experience of our Python package.
Another important use of lazy import is when you have a package with segmented functionalities: your end-user might only be interested in one segment of your package, and thus should not be required to install dependencies that are unrelated to their final need.
Imagine that we need to create a package that abstracts access to different data sources on different platforms: Blobstorage, ADL, MySQL server and S3. For this purpose, we might need a bunch of different dependencies to cover those types of storage. However, among our end-users, there might be one who only cares about easy access to the MySQL server. It is unreasonable to force them to install the dependencies required for interacting with azure-datalake just to use our package. In such a scenario, lazy_import might be what you need.
Life before lazy
Without lazy import
Let’s start with an example of a module without lazy importing (eager import). Here we have two functions. One requires an optional package, e.g. azureml; the other doesn’t. This example deliberately oversimplifies a situation with two use cases, one that requires a dependency and one that does not.
Now, if we only want to use the function func_without_optional_package, ideally we should not need to install the optional dependency, in this case the azureml package.
However, if we test func_without_optional_package, not surprisingly, we get an import error. The ModuleNotFoundError is raised because the azureml package is not installed.
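The failure described above can be reproduced with a minimal sketch. The package name optional_pkg_that_is_not_installed is a hypothetical stand-in for azureml, chosen so that the import fails on any machine, and the module source is run with exec, which roughly mimics what Python does with a module body at import time.

```python
import types

# Hypothetical stand-in for an optional dependency such as azureml.
# The name is an assumption, chosen so that it is not installed anywhere.
EAGER_SOURCE = '''
import optional_pkg_that_is_not_installed as optional_pkg  # eager, module-level import

def func_requires_optional_package():
    return optional_pkg.__name__

def func_without_optional_package():
    return "no optional package needed"
'''


def load_module_from_source(name, source):
    """Run the module body in a fresh module object, as an import would."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


try:
    load_module_from_source("eager_module", EAGER_SOURCE)
    eager_error = None
except ModuleNotFoundError as exc:
    # The module-level import fails before any function is even defined.
    eager_error = exc

print(type(eager_error).__name__, "-", eager_error)
```

The function we actually wanted never gets a chance to run: the whole module is unusable because of one import at the top.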
Hacky import
With import at local scope
A simple and commonly used fix for this error is to import at local scope, i.e. inside the function concerned. Here, we move the import needed by func_requires_optional_package inside its definition.
Now, if we test func_without_optional_package: yippee! The test passes. However, the problem is that with local imports we risk duplicating imports (if other functions need the same optional packages), and import management becomes a nightmare as the code base grows.
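The local-import workaround can be sketched as follows. Again, optional_pkg_that_is_not_installed is a hypothetical name standing in for azureml, so the snippet behaves the same whether or not azureml is installed.

```python
def func_requires_optional_package():
    # Local import: only executed when this function is called.
    # optional_pkg_that_is_not_installed is a hypothetical stand-in for azureml.
    import optional_pkg_that_is_not_installed as optional_pkg
    return optional_pkg.__name__


def func_without_optional_package():
    return "no optional package needed"


# Works even though the optional package is missing.
result = func_without_optional_package()

# The error only surfaces when the function that needs the package is called.
try:
    func_requires_optional_package()
    local_error = None
except ModuleNotFoundError as exc:
    local_error = exc

print(result)
print(type(local_error).__name__)
```

Every additional function that needs the optional package would have to repeat the same import line, which is the duplication problem mentioned above.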
Let’s be Lazy
With lazy import
Another solution here is to use lazy import. We can implement a custom function for this, using the LazyLoader class from importlib.util. An example can be found on Stack Overflow. Yet another, simpler way is to use a package called lazy_import. It can be installed from PyPI: pip install lazy_import.
Now, if we test func_without_optional_package, the test passes without azureml being installed.
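To make the idea concrete without installing anything, here is a small stdlib-only sketch. The LazyModule class and the lazy_module helper below are simplified, hypothetical stand-ins written for this post, not the real implementation of the lazy_import package; with the real package the equivalent line would be lazy_import.lazy_module("azureml"). The missing package name is again an assumption.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hollow module: imports the real module on first attribute access.

    Simplified, hypothetical stand-in for what lazy_import.lazy_module returns.
    """

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails, i.e. for anything
        # that the hollow module does not define itself.
        real_module = importlib.import_module(self.__name__)
        return getattr(real_module, attr)


def lazy_module(name):
    return LazyModule(name)


# Nothing is searched or loaded here, so a missing package raises no error yet.
optional_pkg = lazy_module("optional_pkg_that_is_not_installed")


def func_requires_optional_package():
    return optional_pkg.some_attribute  # first attribute access triggers the real import


def func_without_optional_package():
    return "no optional package needed"


result = func_without_optional_package()

try:
    func_requires_optional_package()
    lazy_error = None
except ModuleNotFoundError as exc:
    lazy_error = exc

print(result)
print(type(lazy_error).__name__)
```

Unlike the local-import version, the import is declared once at module level, and every function that needs the package shares the same lazy object.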
Even simpler solutions?
Regroup functions into separate modules
In the situation discussed, we cannot help but wonder: why don’t we just put all functions that might require an optional package into separate modules? This way, to use a function that does not require an optional package, we just need to call the right module and its function, without worrying about optional packages we do not need.
Let’s take the same problem as above, but now we dispatch func_with_optional_package and func_without_optional_package into two separate modules: module_with_optional_package and module_without_optional_package.
Our modules now come in two kinds: a module for the functions that require the optional packages (e.g. azureml), and a module for the functions that do not.
If we now retry our test, not surprisingly, everything goes smoothly without error.
Have we overcomplicated the problem? Not really. In many cases, it is not possible to segment functionalities into different isolated modules. If we need to create a common entry point that abstracts the implementation details, e.g. a common interface to connect to different machine learning algorithms or different storage types (and thus different packages), we might have problems.
Let’s take an example: we have an entry_module.py whose entry_function dispatches requests to the two modules, with or without the optional package, so entry_module has to import both of them.
If we now test entry_function, we get an error: importing entry_module eagerly imports the module that needs the optional package.
To fix this, as discussed, we need to use lazy_import. If we test this function again, everything is OK.
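A rough sketch of such an entry point is shown below. It is not the original entry_module: the LazyModule class is a hypothetical stdlib-only stand-in for lazy_import.lazy_module, the use_optional_package flag is an assumed signature, and module_with_optional_package deliberately does not exist here, which plays the role of a module whose own import of azureml would fail.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical stand-in for a lazily imported module."""

    def __getattr__(self, attr):
        # The real import happens on first attribute access only.
        return getattr(importlib.import_module(self.__name__), attr)


# Declared at the top of the entry module, but not loaded yet.
module_with_optional_package = LazyModule("module_with_optional_package")


def func_without_optional_package():
    return "no optional package needed"


def entry_function(use_optional_package=False):
    """Common entry point dispatching to the implementation that is needed."""
    if use_optional_package:
        return module_with_optional_package.func_with_optional_package()
    return func_without_optional_package()


light_result = entry_function()

try:
    entry_function(use_optional_package=True)
    dispatch_error = None
except ModuleNotFoundError as exc:
    dispatch_error = exc

print(light_result)
print(type(dispatch_error).__name__)
```

Users who never take the optional branch never pay for it, and never need to install what it depends on.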
What happens behind the scenes?
Normally, when we do an import a_module, two things happen:
- First, the required module is searched for, and if found, Python creates a module object (types.ModuleType) and initialises it. The module’s name is registered in the module cache, sys.modules. If the required module cannot be found, Python raises a ModuleNotFoundError.
- Next, the name-binding step defines a name in the local namespace where the import happens.
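The two steps can be observed directly with the standard library. In this small sketch, colorsys is just an arbitrary small stdlib module, and the missing package name is a made-up one.

```python
import sys
import types

import colorsys  # any installed module works; colorsys is a small stdlib one

# Step 1: a module object was created, initialised and cached in sys.modules.
is_module_object = isinstance(colorsys, types.ModuleType)
is_cached = sys.modules["colorsys"] is colorsys

# Step 2: the name "colorsys" was bound in the namespace doing the import.
is_bound = "colorsys" in locals()

# If the module cannot be found, step 1 fails with ModuleNotFoundError.
try:
    import optional_pkg_that_is_not_installed  # hypothetical missing package
    missing_error = None
except ModuleNotFoundError as exc:
    missing_error = exc

print(is_module_object, is_cached, is_bound, type(missing_error).__name__)
```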
With the lazy_import package, the search and loading of the first step are deferred: a lazy module object is registered in sys.modules, but it is just a hollow module. The real loading, and any error raising, are only triggered by the first access to one of the lazy module’s attributes. Thus, in our case, we do not need to have the optional package installed to run the unrelated function without an ImportError.
We can actually see this with a simple example:
import lazy_import
non_existed_azureml = lazy_import.lazy_module("azureml")
import numpy as np
In locals(), we will then see:
'non_existed_azureml': Lazily-loaded module azureml
'np': <module 'numpy' from '.../lib/python3.8/site-packages/numpy/__init__.py'>
Note how lazy_import has created a Lazily-loaded module object that bypasses the real search and validation mechanism that a normal import would have applied.
How to handle optional packages?
Looks cool! But how will our end-users install an optional package if needed?
Now we know how to lazily import an optional package, so that an end-user who does not need certain functionalities is not forced to install their dependencies. Yet, what if another user does need those dependency packages: how can we properly document each package (and its version) so that the end-user knows what to install?
For poetry users
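- In pyproject.toml, declare the dependency as optional and list it under tool.poetry.extras:
[tool.poetry.dependencies]
azureml-sdk = { version = ">=1.45.0", optional = true }
[tool.poetry.extras]
azureml = ["azureml-sdk"]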
and then to install: poetry install --extras "azureml".
For pip users
- In setup.py, we can add extras_require, e.g. an azureml extra that lists the azureml-sdk requirement, and to install: pip install "your_package[azureml]"
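As a rough illustration, the setup.py side could look like the sketch below. The package name your_package is taken from the install command, the version number is a placeholder assumption, and the azureml-sdk>=1.45.0 requirement mirrors the Poetry example. The call to setup() is left as a comment so that the snippet can be run on its own without triggering a build.

```python
# Sketch of the keyword arguments a setup.py could pass to setuptools.setup().
# "your_package" comes from the install command; the version is a placeholder.
SETUP_KWARGS = {
    "name": "your_package",
    "version": "0.1.0",
    "install_requires": [],  # mandatory dependencies go here
    "extras_require": {
        # pip install "your_package[azureml]" installs this group on top
        "azureml": ["azureml-sdk>=1.45.0"],
    },
}

# In a real setup.py this would be followed by:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
print(SETUP_KWARGS["extras_require"])
```

A plain pip install your_package then installs only the mandatory dependencies.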
What is the trick?
Let’s go back to our initial example. What was the change that made the difference in the startup time of my dummy app?
My app requires functionalities provided by two modules. One is a heavy_module with a long loading time, and the other is a light_module with a very fast loading time.
If I use eager loading, a simple access to my app, e.g. just to see the --help option, requires loading both modules. As you might guess, the trick is to use lazy loading for the heavy_module, so that we only load that module once we do need its functionality. So the change was to replace the eager import of heavy_module with a lazy one.
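The exact line from my app is not reproduced here, but the effect can be sketched with the standard library only. The lazy_import function below is the LazyLoader recipe from the importlib documentation, not the lazy_import package; with the package, the one-line change would be heavy_module = lazy_import.lazy_module("heavy_module"). The heavy_module written to a temporary directory is a hypothetical stand-in whose body drops a marker file, so we can see when it is really executed.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def lazy_import(name):
    """Lazy-import recipe from the importlib documentation."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers the real execution
    return module


# Hypothetical heavy_module, written to a temporary directory for the demo.
# Its body creates a marker file, so we can observe when it is really loaded.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "heavy_module.py").write_text(
    "from pathlib import Path\n"
    "Path(__file__).with_name('loaded.marker').touch()  # stands in for slow work\n"
    "def run():\n"
    "    return 'heavy result'\n"
)
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

# Before: `import heavy_module` at the top of the app (cost paid on every start).
# After: the one-line change.
heavy_module = lazy_import("heavy_module")

marker = tmp_dir / "loaded.marker"
loaded_at_startup = marker.exists()  # the module body has not run yet
result = heavy_module.run()          # first attribute access loads it
loaded_after_use = marker.exists()

print(loaded_at_startup, result, loaded_after_use)
```

Note that this stdlib recipe still needs to find heavy_module at import time; it only postpones executing it, which is what matters for startup time.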
***
I hope this information is useful for some of you. The examples in this post have been deliberately oversimplified for the purpose of explanation. In real-life scenarios, other concerns might need to be taken care of. As usual, please don’t hesitate to comment or suggest better practices.
Let’s be lazy in importing — but not in clapping :).
References
PEP 690: https://peps.python.org/pep-0690/
Python’s import doc: https://docs.python.org/3/reference/import.html
Detailed import statements: https://docs.python.org/3/reference/simple_stmts.html#import
Lazy import github: https://github.com/mnmelo/lazy_import