Troubleshooting While Turning a Pycharm Project into an AWS Lambda Function (Windows)

Kevin Lin
Jan 6 · 8 min read

The project I had originally was a bot that used Pytrends and Tweepy’s APIs to tweet out a joke whenever I ran the Python script. I wanted to make this into a Lambda function instead so that it could be run every day automatically. This guide showcases some errors that I ran into along with their solutions. Check out the table of contents if you have a specific error that you came looking for, or read through the whole thing if you’re trying to do a similar porting of a Python project into Lambda.

As a student, I am far from an expert, but the purpose of this article is to put out some answers that I wish I could have found while working on my project. My greatest recommendation to anyone struggling with errors is to visit the AWS Discord (https://discord.gg/ZNGyXk). Remember to be courteous and patient while waiting for answers to your questions. As a disclaimer, I don't know if everything I did was best practice, but I did fix my own problems successfully.

Table of contents:


Serverless vs Jetbrains plugin: read this section if you have no idea what you’re doing

Requirements.txt problems

Config file out, environment variables in

Reading and writing from s3 bucket txt files

AccessDenied when calling the GetObject operation

Cannot read property ‘artifact’ of undefined

Serverless vs Jetbrains plugin: read this section if you have no idea what you’re doing

The first problem was figuring out which technologies to use. I attempted to use the Pycharm AWS plugin, but I struggled to figure out how to avoid the ImportModuleError. The error, returned when you run the Lambda function, looks like this, with pytrends replaced by whichever non-included library you're using:

{
  "errorMessage": "Unable to import module 'app': No module named 'pytrends'",
  "errorType": "Runtime.ImportModuleError"
}

The problem is that the Python environment that runs in Lambda doesn't have the requirements you've installed locally. To fix this, I switched to the Serverless framework, which has the Serverless Python Requirements plugin to handle the requirements for you. This excellent Medium guide on using Serverless with Python demonstrates how to use the requirements plugin. That guide honestly carried me very far, as it sets the groundwork using the Python templates.

The crucial takeaway from that guide is the workflow: create a new .py file with a handler, add it to the serverless.yml file, and deploy. After deploying, the Lambda function appears on the AWS Lambda console (check other regions, e.g. Oregon or N. Virginia, if you don't see it).
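As a rough sketch of that wiring, a minimal serverless.yml might look like the following. The function and handler names match this project (playBarnes.py with a lambda_handler, in a folder called myService2, as seen later in the error traces); the runtime and region are assumptions you should adjust:

```yaml
service: myService2           # your project name; yours will differ

provider:
  name: aws
  runtime: python3.8          # pick the runtime matching your local Python
  region: us-west-2           # Oregon; match the region you check in the console

plugins:
  - serverless-python-requirements

functions:
  playBarnes:
    handler: playBarnes.lambda_handler   # <file name>.<handler function>
```

With this in place, sls deploy packages the function along with everything listed in requirements.txt.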

From there, I ran into the errors that are the main contribution of this article.

Requirements.txt problems

Firstly, the aforementioned guide used

echo "requests" >> requirements.txt

to add requests to the requirements text file, but on Windows that actually adds "requests" with the quotation marks included, not requests without them. For your function to work, your requirements should look something like this:

[screenshot: the requirements.txt of my project, bare package names, one per line]
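For reference, a minimal version of that file for this project would contain just the third-party libraries, one bare package name per line (assuming the bot only needs these two):

```
pytrends
tweepy
```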

You don't need to list boto3, os, or json: os and json are part of the Python standard library, and boto3 is included in Lambda's Python runtime by default. If you don't know which libraries are already available, you can do what I did: copy down all the libraries from your import statements into requirements.txt, and deal with the errors one by one. The error for json looks like this in the sls deploy output:

STDERR: ERROR: Could not find a version that satisfies the requirement json (from -r /var/task/requirements.txt (line 1)) (from versions: none)
ERROR: No matching distribution found for json (from -r /var/task/requirements.txt (line 1))

If you see this, it means you need to remove that line from your requirements.txt; in this case, it was json that needed to be removed.

Config file out, environment variables in

Next, I originally had a config.json file to store my API keys for Twitter. Since Lambda doesn't have access to that file, my choices were either to upload the file to a database and retrieve and parse it on every run, or to set environment variables in Lambda, which was much easier.

Here is the “before” code, using the config file:

# Open config file
with open('config.json') as config_file:
    config = json.load(config_file)

# Tweepy authentication
auth = tweepy.OAuthHandler(config['keys']['api_key'], config['keys']['api_key_secret'])
auth.set_access_token(config['keys']['consumer_key'], config['keys']['consumer_key_secret'])
api = tweepy.API(auth)

What I needed to do was add environment variables: go to the Lambda console → Functions → click your function → scroll down to Environment variables, and input your key-value pairs as follows (keys redacted):

Then, the “After” code looks like this:

# Tweepy authentication using AWS environmental values
auth = tweepy.OAuthHandler(os.environ['api_key'], os.environ['api_key_secret'])
auth.set_access_token(os.environ['consumer_key'], os.environ['consumer_key_secret'])
api = tweepy.API(auth)

Remember to import os at the top of your .py file for this to work.
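One caveat: os.environ['api_key'] raises a KeyError if the variable was never set in the console, which surfaces as a generic Lambda error. A slightly more defensive sketch (the variable name matches the snippet above; the dummy value is a hypothetical placeholder for local testing):

```python
import os

# Simulate the console-set variable locally; in Lambda this value comes from
# the function's Environment variables panel instead.
os.environ.setdefault('api_key', 'local-test-key')

# .get() returns None instead of raising, so we can fail with a clear message
api_key = os.environ.get('api_key')
if api_key is None:
    raise RuntimeError("api_key is not set -- add it under Environment variables")
```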

Reading and writing from s3 bucket txt files

The next big problem was that my bot originally read words from .txt files as potential candidates for tweeting. This has the same problem as config.json, but this time I was also writing to files to keep track of which words I had already used in tweets. This called for a database, and the people in the Discord recommended s3 because it's essentially free at this scale and I didn't need any heavy-duty data flow for simple text files. I created a bucket and uploaded the text files.

To use s3, you need to import boto3 at the top of your .py file. Again, here is the before and after:

Before, fetching from a file called samples.txt in the same directory as the .py file:

lines = open("samples.txt").read().splitlines()
chosenSamplesString = random.choice(lines)

After, using an s3 bucket called playbarnesbot:

s3 = boto3.resource('s3')
samples = s3.Object('playbarnesbot', 'samples.txt')
body = samples.get()['Body'].read().splitlines()
chosenSamplesString = random.choice(body)
tempString = chosenSamplesString.decode('ascii')  # ascii or utf-8

Notice that I have to decode the string because it comes back as binary data when we pull it from the s3 bucket. A sampleWord in the .txt file would be retrieved as b'sampleWord' without decoding.
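To illustrate that decoding step in isolation, here is a minimal standalone example, with a hard-coded bytes literal standing in for the s3 response body:

```python
# s3 returns the object body as bytes, so splitlines() yields bytes objects
body = b"sampleWord\notherWord".splitlines()
chosen = body[0]
print(chosen)                  # b'sampleWord' -- still a bytes object
print(chosen.decode('ascii'))  # sampleWord
```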

Here is another sample, a method that checks to see if a string is contained within a file called search.txt in the playbarnesbot bucket:

def checkHistory(string):
    s3 = boto3.resource('s3')
    obj = s3.Object('playbarnesbot', 'search.txt')

    if string in str(obj.get()['Body'].read()):
        print(string + ' has been used already')
        return True
    else:
        print(string + ' hasn\'t been used before')
        return False

This offered a challenge going the other way: if a string wasn't in the search.txt file, how could I add it? I learned that objects (such as .txt files) in s3 buckets are immutable, meaning they can't be changed in place. This meant that I couldn't just append to the search.txt file as I had before. Instead, I had to download the file to the /tmp/ storage allocated to every Lambda function, edit it as normal, and upload it back to the s3 bucket.

Before, simply appending to the .txt file locally:

def addHistory(string):
    with open('search.txt', "a", encoding="utf-8") as file:
        file.write(string)
        file.write("\n")

After, downloading the file to /tmp/ directory and reuploading it after editing:

def addHistory(string):
    s3_client = boto3.client('s3')
    s3_client.download_file('playbarnesbot', 'search.txt', '/tmp/search.txt')
    # Open in append mode ('a'); 'w' would overwrite the history just downloaded
    with open('/tmp/search.txt', "a", encoding="utf-8") as file:
        file.write(string)
        file.write("\n")
    s3_client.upload_file('/tmp/search.txt', 'playbarnesbot', 'search.txt')

I’ve been advised to use tempfile instead of /tmp/, but this worked for my purposes so exploring tempfile can be a take-home exercise for the reader.
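For the curious, a sketch of what the tempfile variant of the local half might look like. This is purely the file-handling part; the boto3 download_file/upload_file calls from the snippet above would wrap it unchanged:

```python
import os
import tempfile

# Let tempfile pick a unique path instead of hard-coding /tmp/search.txt
# (on Lambda, the system temp directory is /tmp anyway)
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt',
                                 delete=False, encoding='utf-8') as tmp:
    tmp.write('newWord\n')
    tmp_path = tmp.name

# ... s3_client.upload_file(tmp_path, 'playbarnesbot', 'search.txt') would go here ...

# Read it back to confirm the write landed
with open(tmp_path, encoding='utf-8') as f:
    contents = f.read()

os.remove(tmp_path)  # clean up: Lambda's /tmp space is limited
```

The advantage over a fixed path is that concurrent invocations can't clobber each other's scratch file.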

AccessDenied when calling the GetObject operation

The previous section’s code for retrieving objects should have worked, but it didn’t, instead giving me this error:

{
  "errorMessage": "An error occurred (AccessDenied) when calling the GetObject operation: Access Denied",
  "errorType": "ClientError",
  "stackTrace": [
    "  File \"/var/task/playBarnes.py\", line 89, in lambda_handler\n    body = samples.get()['Body'].read().splitlines()\n",
    "  File \"/var/runtime/boto3/resources/factory.py\", line 520, in do_action\n    response = action(self, *args, **kwargs)\n",
    "  File \"/var/runtime/boto3/resources/action.py\", line 83, in __call__\n    response = getattr(parent.meta.client, operation_name)(**params)\n",
    "  File \"/var/runtime/botocore/client.py\", line 272, in _api_call\n    return self._make_api_call(operation_name, kwargs)\n",
    "  File \"/var/runtime/botocore/client.py\", line 576, in _make_api_call\n    raise error_class(parsed_response, operation_name)\n"
  ]
}

This error was caused by the serverless user not having permission to access the bucket's data. I'm not 100% sure which step fixed it, but I did the following and the problem was resolved:

  1. I went to the s3 console and made my bucket not block public access (click bucket → Permissions → edit Block public access). I'm not sure if this step was necessary.
  2. I wrote a bucket policy to allow access (bucket → Permissions → bucket policy).

The policy looks like this, but you’ll probably want to use the policy generator to make your own.


{
  "Version": "2012-10-17",
  "Id": "Policy1578283143575",
  "Statement": [
    {
      "Sid": "Stmt1578283132694",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arnofyourserverlessuser"
      },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::playbarnesbot/*"
    }
  ]
}

You can find the ARN of your serverless user in the IAM console under Users → Serverless → User ARN. Replace playbarnesbot with your bucket name. This step was probably necessary.

3. In the IAM console, I wrote an inline policy for the serverless user, granting it access to the bucket.

The policy looks like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::playbarnesbot"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::playbarnesbot/*"
      ]
    }
  ]
}

This step also seems important, but the next step is what actually fixed the problem.

4. In the Lambda settings, under Execution role, I clicked the "view my-serverless-project… role" link and then attached the AdministratorAccess policy to the role. Note that the serverless user already had the AdministratorAccess policy, but this attaches it to the role as well.

This fixed my issue, but if you still have trouble, you can try the troubleshooting guide by Amazon.

Cannot read property ‘artifact’ of undefined

The last bit of trouble I encountered was running sls deploy and sls invoke -f playBarnes (standard procedure for testing the Lambda function), hitting no Python/Lambda bugs but then getting this error:

TypeError: Cannot read property 'artifact' of undefined
at ServerlessPythonRequirements.BbPromise.bind.then.then.then (C:\Users\Kevin\myService2\node_modules\serverless-python-requirements\index.js:176:48)
at ServerlessPythonRequirements.tryCatcher (C:\Users\Kevin\myService2\node_modules\bluebird\js\release\util.js:16:23)
at Promise._settlePromiseFromHandler (C:\Users\Kevin\myService2\node_modules\bluebird\js\release\promise.js:547:31)
at Promise._settlePromise (C:\Users\Kevin\myService2\node_modules\bluebird\js\release\promise.js:604:18)
at Promise._settlePromise0 (C:\Users\Kevin\myService2\node_modules\bluebird\js\release\promise.js:649:10)
at Promise._settlePromises (C:\Users\Kevin\myService2\node_modules\bluebird\js\release\promise.js:729:18)
at _drainQueueStep (C:\Users\Kevin\myService2\node_modules\bluebird\js\release\async.js:93:12)
at _drainQueue (C:\Users\Kevin\myService2\node_modules\bluebird\js\release\async.js:86:9)
at Async._drainQueues (C:\Users\Kevin\myService2\node_modules\bluebird\js\release\async.js:102:5)
at Immediate.Async.drainQueues [as _onImmediate] (C:\Users\Kevin\myService2\node_modules\bluebird\js\release\async.js:15:14)
at runCallback (timers.js:705:18)
at tryOnImmediate (timers.js:676:5)
at processImmediate (timers.js:658:5)
at process.topLevelDomainCallback (domain.js:120:23)

This is apparently caused by the Python Requirements plugin, and it is discussed here. There didn't seem to be a fix (besides reverting versions) at the time I read the GitHub issue, but I ended up getting it to work by running

sls deploy --stage dev
sls invoke -f playBarnes

instead of the standard deploy. I’m not sure why this worked, but it did, and subsequent regular deploys also allowed regular invoking.

I hope that at least one person finds an error they were experiencing and fixes it using this article.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Written by Kevin Lin
Engineering Physics student, environmentalist, Youtuber/Twitch stream editor