Hey, thanks for this post. Super helpful.

One minor correction, if you will: the Docker repo "snowsec/samui" no longer exists. You may refer to https://hub.docker.com/u/snowsec instead.

And so, the new docker pull command should be:

docker pull snowsec/snowalert-webui

Also, the docker run command should be:

docker run -it -d -p 8000:8000 --env-file snowalert-<ACCOUNTNAME>.envs snowsec/snowalert-webui

Photo by Kelly Sikkema on Unsplash

Recommendations for a Snowflake powered data lake

Modern enterprises deal with a wide variety of massive, fast-moving data sources. These put a huge strain on data teams to consistently extract, transform and load data and derive meaningful insights. Data, and thus information, does not provide tangible value unless it is managed strategically for analytics.

What is a Data Lake?

How to set up Dark Mode in the Snowflake Web UI?

Photo by Tim Mossholder on Unsplash

I’ve been tinkering with the Snowflake Data Warehouse for a while now. I think it is the best data warehouse on the market at the moment, considering its performance, scalability and other unique features (which I will try to capture in another article). …

The easiest way to MapReduce

Photo by Fabian Blank on Unsplash

In 2006, the good folks at Yahoo! Research developed a simple and intuitive way to create and execute MapReduce jobs on very large data sets. The following year, the project was accepted by the Apache Software Foundation and, shortly thereafter, released as Apache Pig.

Simplifying the MapReduce Framework

Apache Hadoop MapReduce Architecture

In 2004, Google proposed a framework for parallel processing of large datasets distributed across multiple nodes in their influential paper, “MapReduce: Simplified Data Processing on Large Clusters”.

MapReduce (MR) is now Hadoop’s primary processing framework, leveraged by multiple applications such as Sqoop, Pig and Hive.
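To make the map, shuffle and reduce phases concrete, here is a minimal in-process Python sketch of the classic word-count job. It only simulates the three phases in a single process. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and are not part of Hadoop's Java API, and nothing here is distributed across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every whitespace-separated token in the line
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all intermediate values by key, as the framework does
    # between the map and reduce phases
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: aggregate every value that shares the same key
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # Keys are processed in sorted order, mirroring the sorted input a reducer sees
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
    # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

In a real cluster, each call to `map_phase` would run as a separate map task close to the HDFS block holding its input. The shuffle would move data over the network to the reducers.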

Data is stored in HDFS

Trickle-feed unstructured data into HDFS using Apache Flume

Unstructured Log — Photo by Joel & Jasmin Førestbird on Unsplash

We’ve discussed how Apache Sqoop is used to extract structured data from our relational MySQL database (RDBMS) and how to push that data into HDFS and back.

The question now is how do we get unstructured data into HDFS? We use Apache Kafka, no no no…Flume. Apache Flume.

Apache Flume…

RDBMS to HDFS and back

Across the globe, the most popular databases are SQL-based. Considering this, it is paramount for any data lake to be able to pull data from a relational database (RDBMS). Thus, Apache Sqoop was born.

Apache Sqoop High-Level Data Flow

Apache Sqoop supports bi-directional movement of data between any RDBMS and HDFS, Hive, HBase, etc.

All about Resource Allocation and High Availability in Hadoop

YARN Architecture

Architecture and Working

YARN, or “Yet Another Resource Negotiator”, does exactly what its name says: it negotiates for the resources needed to run a job.

Like other core Hadoop components, YARN follows a “Master-Slave” architecture, wherein the Resource Manager is the master and the Node Manager is the slave. The master allocates jobs and…
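The negotiation can be pictured with a toy Python model. A master tracks each Node Manager's free memory and vcores and grants a container on the first node that can fit a request. This is a hypothetical, heavily simplified sketch: the placement is first-fit and the class and field names are invented. YARN's real schedulers, such as the Capacity and Fair schedulers, also account for queues, priorities and the per-application ApplicationMaster, none of which is modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeManager:
    # Hypothetical model of a worker (slave) node and its remaining capacity
    name: str
    free_mb: int
    free_vcores: int
    containers: List[str] = field(default_factory=list)


@dataclass
class ResourceManager:
    # Hypothetical model of the master that grants containers to applications
    nodes: List[NodeManager]
    issued: int = 0

    def allocate(self, app_id: str, mem_mb: int, vcores: int) -> Optional[str]:
        # First-fit placement: pick the first node with enough free memory AND vcores
        for node in self.nodes:
            if node.free_mb >= mem_mb and node.free_vcores >= vcores:
                node.free_mb -= mem_mb
                node.free_vcores -= vcores
                self.issued += 1
                container_id = f"{app_id}_container_{self.issued}@{node.name}"
                node.containers.append(container_id)
                return container_id
        # No node can satisfy the request right now; a real scheduler would queue it
        return None


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("nm1", 4096, 4), NodeManager("nm2", 8192, 8)])
    print(rm.allocate("app1", 3072, 2))   # fits on nm1
    print(rm.allocate("app1", 2048, 1))   # nm1 has only 1024 MB left, so nm2 is used
    print(rm.allocate("app1", 16384, 1))  # too large for any node
```

The point of the sketch is the division of labour: the master only decides *where* resources are granted, while each Node Manager holds the actual capacity on its machine.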

A comprehensive guide to understanding HDFS and its inner workings

From a computing perspective, there are essentially two types of scaling: vertical and horizontal. In vertical scaling, we simply add more RAM and storage to a single computer/machine, aka a “node”. In horizontal scaling, we add more nodes connected through a common network, thereby increasing the overall capacity of the…

Reduce storage overhead significantly in your HDFS cluster by leveraging Erasure Coding

Assumptions:

You’ve read and internalized the basic concepts of the Hadoop Distributed File System (HDFS): Blocks & Replication Factor, Storage & Replication, and Rack Awareness.

Background

The Hadoop Distributed File System (HDFS) block and replication methodology rests on two key concepts: “Block Size” and “Replication Factor”. Each file that enters HDFS is…
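A quick worked example shows why the replication factor dominates storage cost and what erasure coding saves. The Python sketch below assumes the common defaults of 128 MB blocks and a replication factor of 3. It compares them with an idealized Reed-Solomon (6, 3) layout, similar to Hadoop 3's RS-6-3-1024k policy. The helper names are hypothetical, and the arithmetic ignores metadata and partial stripes.

```python
import math

# Assumed defaults: 128 MB blocks (dfs.blocksize) and 3 replicas (dfs.replication),
# the usual out-of-the-box values in Hadoop 2.x/3.x; clusters often override them.
BLOCK_SIZE_MB = 128
REPLICATION_FACTOR = 3


def block_count(file_size_mb: float, block_size_mb: float = BLOCK_SIZE_MB) -> int:
    # A file is split into fixed-size blocks; only the last block may be smaller
    return math.ceil(file_size_mb / block_size_mb)


def replicated_storage_mb(file_size_mb: float, replication: int = REPLICATION_FACTOR) -> float:
    # Each block is stored `replication` times, so raw usage grows linearly
    return file_size_mb * replication


def erasure_coded_storage_mb(file_size_mb: float, data_units: int = 6, parity_units: int = 3) -> float:
    # Idealized Reed-Solomon (data_units, parity_units) striping: every group of
    # data_units data cells gets parity_units parity cells. Partial stripes are
    # ignored, so this underestimates raw usage for small files.
    return file_size_mb * (data_units + parity_units) / data_units


def overhead_pct(raw_mb: float, file_size_mb: float) -> float:
    # Extra storage beyond the logical file size, as a percentage
    return (raw_mb - file_size_mb) / file_size_mb * 100


if __name__ == "__main__":
    size = 1000  # a hypothetical 1,000 MB file
    print(block_count(size))                                   # 8 blocks (7 full + 1 partial)
    print(overhead_pct(replicated_storage_mb(size), size))     # 200.0
    print(overhead_pct(erasure_coded_storage_mb(size), size))  # 50.0
```

With 3x replication, a 1,000 MB file consumes 3,000 MB of raw disk (200% overhead). RS(6, 3) needs about 1,500 MB (50% overhead) and still tolerates the loss of any three cells in a stripe, versus two lost replicas under 3x replication.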

Prathamesh Nimkar

Tech Enthusiast — Data Engineering | Data Analytics | LinkedIN: https://www.linkedin.com/in/prathameshnimkar/
