Breaking the ice between AWS Lambda & pyodbc

Connecting to MSSQL Server from AWS Lambda using Python

Anurag Narayan

--

Few weeks back at my workplace, I was required to write an API which would query data from an on-prem SQL Server Database. I didn’t like the overhead of building and maintaining a web application just for a single API. AWS API Gateway with Lambda proxy integration seemed to be perfect for this use case.

Within an hour in to the proof-of-concept, this error was staring in the face when running the Lambda function

{
"errorMessage": "Unable to import module 'lambda_function': libodbc.so.2: cannot open shared object file: No such file or directory",
"errorType": "Runtime.ImportModuleError"
}

This was after I followed the official Lambda instructions for packaging the python code along with pyodbc.

It took some time for me to sort this out and I thought the solution presents a good case for documentation for future reference. Note that some ready-made solutions are already available for this (See the references section). This is just a fresh take at explaining the problem and how to solve it from scratch!

Understanding the problem

Each Lambda instance under the hood is a container created from Amazon Linux AMI. pyodbc needs some native libraries to work which are not present by default inside the Lambda container. So to make things work, we need to ensure that the Lambda environment includes these native libraries in addition to pyodbc and our function code.

Overview of the solution

We will start with reproducing the problem locally by creating a container that is very similar to what Lambda uses. Then we will install all things that pyodbc needs by hand in that container. With the container ready, we will run the code (the one which talks to the database and will actually run in Lambda) locally in that container to confirm that everything works as expected. Lastly, we will package all those pyodbc dependencies in a neat zip file and upload it as a Lambda Layer

Reproducing the problem locally

We will use lambci/lambda docker image that mimics the Lambda environment nicely. If for some reason you cannot connect to the database server from your local environment, please follow the instructions to start an instance of MS SQL Server inside a docker container -

# STEP - 1 (Optional)
# Run containerized instance of MS SQL Server
docker run \
-e "ACCEPT_EULA=Y" \
-e "SA_PASSWORD=<DB_PASSWORD>" \
-p 1433:1433 --name sql \
-d mcr.microsoft.com/mssql/server:2019-CU5-ubuntu-18.04

Lambda assumes the user code to be inside /var/task and any layer code at /opt/python. Let’s ensure that -

# STEP - 2
# Create the function code
mkdir -p var/task
cat <<EOF > var/task/lambda_function.py
import pyodbc
print('Successfully import pyodbc')
server = 'sql'
username = 'SA'
password = '<DB_PASSWORD>'
def lambda_handler(event, context):
conn_str = 'DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server + ';UID=' + username + ';PWD=' + password
db = pyodbc.connect(conn_str)
print(db.execute('select GETUTCDATE()').fetchall())
EOF
# pyodbc library
mkdir -p opt/python/
pip install pyodbc -t opt/python/

The directory structure should look like this -

❯ tree .
.
├── opt
│ └── python
│ ├── pyodbc-4.0.30-py3.7.egg-info
│ │ ├── PKG-INFO
│ │ ├── SOURCES.txt
│ │ ├── dependency_links.txt
│ │ ├── installed-files.txt
│ │ └── top_level.txt
│ └── pyodbc.cpython-37m-x86_64-linux-gnu.so
└── var
└── task
├── __pycache__
│ └── lambda_function.cpython-37.pyc
└── lambda_function.py

Time to simulate the Lambda execution of our code locally -

# STEP - 3
# Run the function inside Lambda container
docker run \
--link sql \
--rm \
-v "$PWD"/var/task:/var/task \
-v "$PWD"/opt:/opt \
lambci/lambda:python3.7 \
lambda_function.lambda_handler

Notice how we are mounting the required directories and have named our function and the file accordingly — lambda_function.lambda_handler is in var/task and pyodbc is in opt/, both w.r.t to our current working directory. Here’s the output you would supposedly see -

{
"errorType":"Runtime.ImportModuleError",
"errorMessage":"Unable to import module 'lambda_function': libodbc.so.2: cannot open shared object file: No such file or directory"
}

which is the same error that I got when I ran the code inside AWS Lambda!

Let’s move on to the more interesting part now — actually solving the problem at hand.

Installing pyodbc with dependencies inside Lambda container

After cleaning up opt/ , let’s start a container that simulates Lambda environment and get a bash session inside it -

# Clean up opt/
rm -rf opt/
# Start Lambda container and get a shell inside it
docker run \
-it \
--rm \
--entrypoint bash \
-v ${PWD}:/host \
-e ODBCINI=/opt/odbc.ini \
-e ODBCSYSINI=/opt/ \
lambci/lambda:build-python3.7

Note that we are using a different docker image this time — lambci/lambda:build-python3.7 which has extra system packages installed intended for building and packaging the lambda function.ODBCINI and ODBCSYSINI environment variables are used by unixODBC driver manager to locate the configuration files.

Now that we are inside the container, let’s get to installing the dependencies.
pyodbc needs a Driver Manager and a compatible driver.

unixODBC — In unix environment, pyodbc requires a Driver Manager called unixODBC. We will need to download and compile it from source -

# Download unixODBC 
curl -s ftp://ftp.unixodbc.org/pub/unixODBC/unixODBC-2.3.7.tar.gz -O
# Install
tar xzvf unixODBC-2.3.7.tar.gz
cd unixODBC-2.3.7
./configure \
--sysconfdir=/opt \
--disable-gui \
--disable-drivers \
--enable-iconv \
--with-iconv-char-enc=UTF8 \
--with-iconv-ucode-enc=UTF16LE \
--prefix=/opt
make
make install
# Cleanup
cd ..
rm -rf unixODBC-2.3.7 unixODBC-2.3.7.tar.gz

Note that we have provided the installation directory as /opt

MS SQL Server ODBC Driver — Since we are attempting to connect to MSSQL Server, let’s use the ODBC Driver from microsoft. We need to download and install the driver and configure it to work at the same location as the one used in the unixODBC installation -

# Download
curl -s https://packages.microsoft.com/config/rhel/6/prod.repo > /etc/yum.repos.d/mssql-release.repo
# Install
yum -y install e2fsprogs.x86_64 0:1.43.5-2.43.amzn1 fuse-libs.x86_64 0:2.9.4-1.18.amzn1 libss.x86_64 0:1.43.5-2.43.amzn1
ACCEPT_EULA=Y yum -y install msodbcsql17 --disablerepo=amzn*export CFLAGS="-I/opt/include"
export LDFLAGS="-L/opt/lib"
# Move the driver to the correct location
cd /opt
cp -r /opt/microsoft/msodbcsql17/ .
# Cleanup
rm -rf /opt/microsoft/

With the two dependencies sorted, we are ready to install pyodbc.

pyodbc — Lambda expects the library/layer code to be at /opt/python Let’s install pydobc at the right place -

mkdir /opt/python/
cd /opt/python/
pip install pyodbc -t .

Let’s take care of configuring the unixODBC to work with the ODBC driver that we installed.

Configuration for unixODBCodbc.ini — This configuration file defines data sources and corresponding drivers whose definition is detailed in odbcinst.ini -

cd /opt
cat <<EOF > odbc.ini
[ODBC Driver 17 for SQL Server]
Driver = ODBC Driver 17 for SQL Server
Description = My ODBC Driver 17 for SQL Server
Trace = No
EOF

odbcinst.ini — This file is a registry and configuration file for ODBC driver and contains the definition of the driver that is specified in the Driver section of odbc.ini -

cat <<EOF > odbcinst.ini
[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/opt/msodbcsql17/lib64/libmsodbcsql-17.6.so.1.1
UsageCount=1
EOF

Note: Change the version specified for libmsodbcsql under Driver section in odbcinst.ini as per what you see under /opt/msodbcsql17/lib64/

ls -l /opt/msodbcsql17/lib64/libmsodbcsql*

And we are done installing the dependencies! On to testing!

Testing the solution

Before we exit the container, let’s create a zip file for the pydobc driver and all its dependencies -

# Package the content in a zip file to use as a lambda layer
cd /opt
zip -r9 ~/pyodbc-layer.zip .
# Copy pyodbc-layer to the host
cp ~/pyodbc-layer.zip /host
exit

Unzip the package in opt/ location -

mkdir opt
unzip ./pyodbc-layer.zip -d opt/

Ensure that
1. The PWD structure looks like below -

❯ tree opt opt/python -L 1
opt
├── ODBCDataSources
├── bin
├── include
├── lib
├── msodbcsql17
├── odbc.ini
├── odbcinst.ini
├── python
└── share
opt/python
├── pyodbc-4.0.30.dist-info
└── pyodbc.cpython-37m-x86_64-linux-gnu.so
❯ tree var
var
└── task
└── lambda_function.py

2. If the function code is not present at var/task and/or SQL Server container is not running, repeat first two steps from the Reproducing the problem locally section in the beginning.

Then repeat the third step -

# STEP - 3
# Run the function inside Lambda container
docker run \
--link sql \
--rm \
-v "$PWD"/var/task:/var/task \
-v "$PWD"/opt:/opt \
lambci/lambda:python3.7 \
lambda_function.lambda_handler

and we should see success!

Successfully import pyodbc
START RequestId: d62bf91b-2c40-1f61-104c-898f66210ddf Version: $LATEST
[(datetime.datetime(2020, 10, 11, 15, 44, 8, 690000), )]
END RequestId: d62bf91b-2c40-1f61-104c-898f66210ddf
REPORT RequestId: d62bf91b-2c40-1f61-104c-898f66210ddf Init Duration: 148.27 ms Duration: 131.83 ms Billed Duration: 200 ms Memory Size: 1536 MB Max Memory Used: 28 MB
null

Deploying the package as a Lambda layer

The last step is to make the life easy for our future-self and for others!
Let’s create a new Lambda layer containing the pyodbc package that we created -

aws lambda publish-layer-version --layer-name pyodbc-layer --description "pyodbc package" --zip-file fileb://pyodbc-layer.zip --compatible-runtimes "python3.7"

You can then associate this Lambda layer with the function of your choice -

aws lambda update-function-configuration --function-name my-function --layers <ARN of pyodbc-layer>

Once the layer gets associated, you can simply import pyodbc in your lambda function.

And the world is a happy place once again!

References

--

--