How to start with Snowflake ❄️
You might be starting your career as a data engineer, data scientist, or analyst. Or you might already be proficient in a traditional relational database and want to take the next step to a modern, cloud-based data platform like Snowflake, but you don’t know how or where to start. You are in the right place. In this blog post we will go through the basic concepts of Snowflake, explain its architecture, and share the most important learning materials.
This post should contain everything you need to start using Snowflake successfully, understand its key features, and become proficient with your new platform. The following list of resources should cover everything you need at the beginning, but it is not intended to be complete.
What is Snowflake
Today, Snowflake is a cloud-based data platform supporting many data-related workloads, including data warehouses, data lakes, data science, data apps, and, more recently, cybersecurity. Snowflake supports all forms of data: traditional structured data, semi-structured data (e.g. JSON files), and unstructured data. Snowflake is delivered as Software as a Service (SaaS), meaning there is no license to buy and no hardware to maintain. You just sign up for an account and start using it. The platform manages itself, with almost zero maintenance required from you. It runs on all major clouds (AWS, Azure, GCP), which makes it a global, cross-cloud, and cross-region platform. Snowflake provides governed data access and strong security, but it is also programmable and offers a wide range of connectors. A recent addition is Snowpark for Python, which lets you run your Python code directly on Snowflake virtual warehouses! Last but not least, Snowflake has great data sharing options that make data collaboration smooth and easy. It also includes the Data Marketplace for monetizing your data.
In terms of architecture, you can find multiple diagrams that show it from different perspectives. Let’s start with the brand new diagram presented at Snowflake Summit 2022, which shows how the Snowflake Data Cloud is changing with newly introduced features like Unistore and the Cybersecurity workload.
This diagram shows the Snowflake infrastructure as a unified, cross-cloud, and cross-region platform. Thanks to Snowgrid, data silos are broken down and data can be shared instantly in a governed way. Cross-cloud and cross-region support also helps ensure business continuity.
At the top layer of the blue box you can see all supported data approaches, starting with the traditional data warehouse (OLAP), through file-based data lakes, up to Unistore, which represents traditional transactional data processing (OLTP).
Last but not least, the supported workloads are shown as coloured boxes sitting on top of the whole data cloud platform, including data engineering, data science, and others.
That was the Snowflake architecture viewed from a distance. Now let’s check how the Snowflake Data Cloud platform looks inside, under the hood. You might have noticed that separated storage and compute come up often in relation to Snowflake. I will try to explain what that means and how it works using the following diagram, which shows the "internal" architecture of the Snowflake platform.
Here you can see a three-layer architecture with centralized storage. Centralized here means relative to your account and the region where your Snowflake account is provisioned. Each region has multiple availability zones. Data is always stored encrypted and compressed. On top of the storage layer sits the multi-cluster compute layer.
Compute clusters are called virtual warehouses in Snowflake terminology. As you can see, you can have multiple virtual warehouses, and each can have a different size. Snowflake uses T-shirt sizes for virtual warehouses, starting with extra small (XS) and going up to 6X-Large (6XL). The T-shirt size determines how much compute power a virtual warehouse gets, as well as how much you will be billed for it.
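To make the relationship between size and cost concrete, here is a minimal sketch in Python. It assumes the standard credit rates, where an XS warehouse consumes 1 credit per hour and each larger size doubles that, together with per-second billing and a 60-second minimum each time a warehouse starts. The function name and the price per credit are hypothetical; the real price depends on your edition, cloud, and region.

```python
# Credits per hour for standard virtual warehouses: each size doubles the previous one.
SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL", "5XL", "6XL"]
CREDITS_PER_HOUR = {size: 2 ** i for i, size in enumerate(SIZES)}


def estimate_credits(size: str, seconds_running: int) -> float:
    """Estimate credits used by one started cluster of the given size.

    Assumes per-second billing with a 60-second minimum per start.
    """
    billable_seconds = max(seconds_running, 60)
    return CREDITS_PER_HOUR[size] * billable_seconds / 3600


# Hypothetical price per credit, for illustration only.
PRICE_PER_CREDIT_USD = 3.00

credits = estimate_credits("M", 15 * 60)  # a Medium warehouse running for 15 minutes
print(f"{credits:.2f} credits, about ${credits * PRICE_PER_CREDIT_USD:.2f}")
```

The doubling is the important part: moving up one size costs twice as much per hour, so a bigger warehouse only pays off if it finishes the work roughly twice as fast.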
Cost management is such a complex topic that I have already covered it in three different stories. Please go there for more details. Snowflake Cost Management Overview provides the basics about costs in Snowflake, and two more blog posts cover cost optimization from two angles:
- Compute cost optimization: Snowflake Cost Optimization: Part I
- Storage cost optimization & performance tuning tips: Snowflake Cost Optimization Part II
The top layer of the Snowflake architecture is called the cloud services layer. It processes all the requests that come to your account, no matter which connector is used or whether the request came through the UI or the CLI. Cloud services usage is free up to 10% of your daily compute credits, which means most customers will not see incremental charges for it.
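The 10% rule can be sketched as simple arithmetic. This is only an illustration, assuming the adjustment is calculated per day as the smaller of the cloud services credits used and 10% of that day's compute credits; the function name is hypothetical.

```python
def billed_cloud_services(compute_credits: float, cloud_services_credits: float) -> float:
    """Return the cloud services credits that are actually billed for one day."""
    free_allowance = 0.10 * compute_credits
    # The adjustment can never exceed what was actually used.
    adjustment = min(cloud_services_credits, free_allowance)
    return cloud_services_credits - adjustment


# 100 compute credits with 8 cloud services credits: fully covered by the allowance.
print(billed_cloud_services(100, 8))
# 100 compute credits with 15 cloud services credits: only the part above the allowance is billed.
print(billed_cloud_services(100, 15))
```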
How to start?
Starting to use Snowflake is as easy as creating a new email account. Just go to trial.snowflake.com, choose the edition (I would recommend trying Enterprise to have access to more features), and pick your favourite cloud provider and region.
And that’s it! You have your Snowflake account. Go to the provided URL to log in. Please note that the trial account is valid for 30 days and you will receive the equivalent of $400 in Snowflake credits. Don’t worry, that is enough to try all the features and test Snowflake’s performance.
Snowflake currently offers two different web UIs that you can use to administer and interact with your account. The first one is called the Classic Web Interface.
The new one is called Snowsight. Try both and choose whichever suits you better. I am currently in the process of moving from the Classic one to Snowsight. 🙂
Snowflake also provides sample data in case you do not have your own data to test with. You can find it in a database called SNOWFLAKE_SAMPLE_DATA, which will be available in your trial account. It contains various datasets of different sizes, and there are even tutorials available. The documentation describes how to work with the sample data.
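If you prefer to query the sample data from code rather than from the web UI, a minimal sketch with the Snowflake Connector for Python could look like the one below. It assumes the snowflake-connector-python package is installed, that the TPCH_SF1 schema and its CUSTOMER table are present in your sample database, and that you pass in a warehouse you are allowed to use. The function name is hypothetical.

```python
# Requires: pip install snowflake-connector-python
SAMPLE_QUERY = """
SELECT c_name, c_mktsegment
FROM snowflake_sample_data.tpch_sf1.customer
LIMIT 5
"""


def fetch_sample_rows(account: str, user: str, password: str, warehouse: str):
    """Run SAMPLE_QUERY against the sample database and return the rows."""
    # Imported here so this module can be loaded even without the connector installed.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=account, user=user, password=password, warehouse=warehouse
    )
    try:
        cur = conn.cursor()
        cur.execute(SAMPLE_QUERY)
        return cur.fetchall()
    finally:
        # Always release the connection, even if the query fails.
        conn.close()
```

Call fetch_sample_rows with the account identifier, user, and password from your trial account, plus the name of a warehouse in that account, and print the returned rows to check that everything is wired up.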
What materials to use for learning?
There are many great resources provided directly by Snowflake, and they are available for free. Let’s say you want to build up your theoretical knowledge about Snowflake first. Then head to Snowflake University here 👇 https://community.snowflake.com/s/snowflake-university
There you can find various courses covering different areas of the Snowflake Data Cloud. Do all of them or just select what you want to know. You will receive a nice-looking digital badge when you finish a course.
OK, you have done some courses at Snowflake University. Now you want to get some real hands-on experience. Maybe you are interested in how to build CI/CD for Snowflake or how to work with dbt and Snowflake? Then go to Snowflake Quickstarts 👉 https://quickstarts.snowflake.com/. This site is full of great hands-on tutorials covering many Snowflake features. Just find what you want to try and go!
Virtual hands-on labs
If you want some guidance during your hands-on activities, check the current offering of free, instructor-led, virtual hands-on labs provided by Snowflake 👉 https://www.snowflake.com/virtual-hands-on-lab/
From zero to Snowflake
If you would like comprehensive material that covers almost everything, from really simple topics like creating your first database and schema, through data imports, data sharing, and the RBAC model, to semi-structured data, then check out the From Zero to Snowflake series by my fellow Data Superhero Chris Hastie. It is available here 👇 https://interworks.com/blog/chastie/2019/10/18/zero-to-snowflake-creating-your-first-database/
Are you struggling with something?
You will probably get stuck at some point. In such situations it is always better to ask other folks for a hint or a recommendation. There are several places where you can discuss your Snowflake-related issues.
Of course there is Stack Overflow. You can find all Snowflake-related discussions under the tag [snowflake-cloud-data-platform] 👇 https://stackoverflow.com/questions/tagged/snowflake-cloud-data-platform
We also have a fairly new Snowflake community space here 👇 https://community.snowflake.com/s/forum
Apart from the forums, you can find all the Data Superheroes there, along with different user groups and the Knowledge Base.
The Snowflake forums are not only a place for your issues. You can discuss anything Snowflake-related, from new features to your wishes.
Last but not least, if you are a Reddit fan, you can also find Snowflake-related content there 👇
There is also the official Snowflake Medium publication, full of stories about Snowflake from every perspective. There you can find reference architectures, how others build things around Snowflake, first hands-on experiences with the latest features, and much more. Join us here 👇 https://medium.com/snowflake
That is certainly not all. There are tons of other sources; you just need to be cautious about their quality, because not everything online is top-notch. You can find several Snowflake courses on Udemy. Just check the reviews and select what covers the topic you want to learn.
Then there is YouTube. There you can find a channel by another Data Superhero, Rajiv Gupta, who covers different topics from the Snowflake world: https://www.youtube.com/c/RajivGuptaEverydayLearning
Snowflake has its own YouTube channel as well 👇 https://www.youtube.com/c/SnowflakeInc
It is full of various content including Snowflake 101 or features introduced at the latest Summit.
I want to be certified!
Have you managed to go through all the courses at Snowflake University? Have you gained some real hands-on experience from various Snowflake projects, and do you want to validate your skills by getting certified? Cool! Let’s elaborate a little bit on Snowflake certifications.
There are two levels:
- SnowPro Core — entry level certification
- SnowPro Advanced — role based advanced certifications (Architect, Administrator, Data Engineer, Data scientist)
To get an advanced certification, you need to hold the SnowPro Core first. Although the SnowPro Core is an "entry level" certification, that does not mean it is easy to get. It is going to be reworked in Autumn 2022 with a new set of questions and updated coverage areas. You need to prove that you really know the Snowflake platform, understand its architecture concepts, and master all the other aspects required to use this platform proficiently.
All details related to Snowflake certifications can be found here 👇 https://www.snowflake.com/certifications/
How do you prepare for the exam itself? I have already published two separate blog posts that cover the preparation in detail. If you are about to start your preparation journey, you can check those posts.
The first one is related to SnowPro Core and Advanced Architect preparation 👇
How to Prepare for Snowflake Certifications
Check out my preparation journey, study tips, and top resources for Snowflake certifications
The second one is dedicated to the Advanced Administrator certification 👇
I hope you will find the list useful. The aim was to collect all the important sources and create a good sitemap of the meaningful resources you might need when you start exploring a new platform. The list is not intended to be complete; that is almost impossible, as new resources are created on a daily basis. If you think something is missing from the list, something you have used yourself, please share it with us in a response to this story.