Snowpark For Python Open Source: How I Contributed And So Can You

Earlier this year, during Snowflake Summit 2022, two of the many announcements that excited me personally were Snowpark for Python entering Public Preview and the open-sourcing of the Snowflake Snowpark Python API.

Why …?

I am glad you asked 😊 As a developer advocate and a technical evangelist (focused on Snowpark for Python), the more I know about a piece of technology and the closer I am to it, the better building blocks I can create for the community of developers (you!) so you can build bigger and better things with those blocks.

Also, as a developer at heart, I am always looking for opportunities to write code in any capacity, which not only keeps my hands dirty but also keeps me current and technical with the technologies. Not to mention, this role at Snowflake gives me the best of both worlds. (BTW, we’re hiring… ping me!)

So… if I can, you can!

Before I share the steps and the process I followed, here’s my feature contribution: the ability to overwrite tables when using write_pandas(). With the 0.9.0 release, as of August 30, 2022, you can overwrite existing data, if any, in your Snowflake table with the data in your Pandas DataFrame using the write_pandas() API.
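To make that concrete, here is a minimal sketch of how the flag might be used. It assumes session is an already-connected Snowpark Session on version 0.9.0 or later, and that write_pandas() accepts an overwrite keyword as described above. The helper name replace_table_contents is hypothetical and only there for illustration.

```python
def replace_table_contents(session, pandas_df, table_name):
    """Write pandas_df to table_name, replacing any rows already there.

    Assumption: `session` is a connected snowflake.snowpark.Session (0.9.0+)
    whose write_pandas() accepts the `overwrite` keyword.
    """
    # overwrite=True asks Snowpark to replace the existing data instead of
    # adding the DataFrame's rows to whatever the table already holds.
    return session.write_pandas(pandas_df, table_name, overwrite=True)
```

With a real session, a call might look like replace_table_contents(session, df, "SALES"), where df is a Pandas DataFrame whose columns line up with the table and "SALES" is a made-up table name.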

Here’s how you can get started with contributing and implementing your own idea(s).

Ideation and Coding

Step 1 — The million dollar idea!

This should be relatively “easy” if you currently use Snowpark for Python 😜 OK, I am not sure about the million dollars, but joking aside, look for places where you can extend the Snowpark for Python API’s functionality. And if you are new to this world and interested in learning, please check out the Snowpark for Python examples on GitHub and the QuickStart Guides.

Step 2 — Check the docs

But wait, before you proceed, please check the documentation to see whether what you’re looking for already exists, and check whether an issue has already been created for the bug or feature you have in mind.

Step 3 — Create an issue

You made it this far so you’re in luck! You get to contribute 👏 To begin the process, create an issue in the form of either a Bug Report or a Feature Request. At this point you may get comments or feedback from the engineering team. If you don’t, that’s most likely because the team is preparing for a new release. Yay! So please be patient.

Step 4 — Fork and clone the repo

Fork https://github.com/snowflakedb/snowpark-python and create a clone of the forked repo on your local machine.

Step 5 — Setup your development environment

  • Change folder and go to your cloned repository
  • Create a new Python virtual environment with the currently supported version, Python 3.8. For example, if you are using Conda, run conda create --name snowpark-dev python=3.8
  • Activate the new Python virtual environment. For example, if you are using Conda, run conda activate snowpark-dev
  • From the root folder of the cloned repository, install the Snowpark API in development/edit mode by running python -m pip install -e ".[development, pandas]"
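Once the steps above are done, a quick sanity check can confirm that the environment looks right. The sketch below is my own illustration, not part of the Snowpark repo: the function names are hypothetical, and it only uses the Python standard library. It checks that the interpreter is Python 3.8 and reports where the snowflake.snowpark package is loaded from.

```python
import importlib.util
import sys


def python_version_ok(version_info=sys.version_info, required=(3, 8)):
    """Return True if the interpreter's major.minor matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


def package_origin(name):
    """Return the file a package is loaded from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised, for example, when a parent package such as "snowflake"
        # is not installed in this environment.
        return None
    return spec.origin if spec is not None else None


def environment_report():
    """Collect both checks into a dict that is easy to print or inspect."""
    return {
        "python_ok": python_version_ok(),
        "snowpark_origin": package_origin("snowflake.snowpark"),
    }
```

Running print(environment_report()) inside the activated environment shows both values. With a development-mode install, I would expect snowpark_origin to point inside the cloned repository rather than into site-packages, but treat that as a rule of thumb.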

Step 6 — Setup your favorite IDE

I like to use PyCharm, but you can also use other IDEs like VS Code.

Step 7 — Setup your project

If you are using PyCharm, open the project and browse to the cloned git directory. Then right-click the src directory and click “Mark Directory as” -> “Sources Root”. Note: VS Code doesn’t have an equivalent “Sources Root” setting, so you can skip this step if you use VS Code.

Step 8 — Setup Python Interpreter

Depending on which IDE you are using, configure the PyCharm or VS Code interpreter to use the Python virtual environment you created in Step 5.
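If you want to double-check that the IDE really picked up that environment, a small heuristic like the one below can help. It is a sketch under one assumption: that the environment lives in a folder named after it (for example .../envs/snowpark-dev, which is how Conda usually lays things out). The function name interpreter_in_env is hypothetical.

```python
import os
import sys


def interpreter_in_env(env_name, prefix=sys.prefix):
    """Return True if `prefix` (the interpreter's install root) is named env_name."""
    # Conda environments usually live in a folder named after the environment,
    # so comparing the last path component is a simple heuristic, not a guarantee.
    return os.path.basename(os.path.normpath(prefix)) == env_name
```

From the IDE’s Python console, print(sys.executable) shows which interpreter is running, and interpreter_in_env("snowpark-dev") should return True if the environment from Step 5 is selected.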

Voilà! Now you are ready to code and implement your idea. Good luck!

Testing

As with most coding activities, though, you’re not done until you’re done. In other words, writing tests with “good” coverage and running them to make sure they all pass is just as important as writing the code itself, because that is what maintains the quality of the codebase. To configure your environment for testing your code using pytest or tox, follow the instructions outlined here.
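To give a feel for what such tests can look like, here is a minimal pytest-style sketch for an overwrite flag. It is an illustration only: FakeTable is a hypothetical in-memory stand-in, not part of Snowpark, and the project’s actual tests may look quite different. It also assumes that rows are appended when the flag is not set.

```python
class FakeTable:
    """Hypothetical in-memory stand-in for a table; not part of Snowpark."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])

    def write(self, new_rows, overwrite=False):
        # Assumed behavior: replace existing rows when overwrite is True,
        # otherwise append the new rows after the existing ones.
        if overwrite:
            self.rows = list(new_rows)
        else:
            self.rows.extend(new_rows)


def test_write_appends_by_default():
    table = FakeTable([("a", 1)])
    table.write([("b", 2)])
    assert table.rows == [("a", 1), ("b", 2)]


def test_write_overwrites_when_requested():
    table = FakeTable([("a", 1)])
    table.write([("b", 2)], overwrite=True)
    assert table.rows == [("b", 2)]
```

pytest collects functions whose names start with test_, so saving this in a file such as test_overwrite_sketch.py (a made-up name) and running pytest on it would execute both checks. Covering both the default path and the new flag is the point: a new option should not change existing behavior.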

Pull Request

So close! Once you’re done coding and testing, the final step is to create a pull request. At this point you may get feedback from the engineering team that requires you to modify your code, the tests, or both. Eventually, when everything looks good on both ends, your PR will be merged into the master branch.

That’s it!

In Summary

I am so excited and proud to have come up with a new feature idea, discussed it with the engineering team, coded and tested my implementation, and gotten it merged into the master branch and released in version 0.9.0, as of August 30, 2022.

I look forward to hearing about your contributions and hopefully using them in my future projects. Remember… if I can contribute, so can you! 😎

Learn more about Open Source at Snowflake.

Thanks for your time, and please follow me on Twitter and LinkedIn where I share demo videos, code snippets, and other interesting artifacts specifically around Snowpark.

Dash Desai
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Lead Developer Advocate @ Snowflake | AWS Machine Learning Specialty | #DataScience | #ML | #CloudComputing | #Photog