Scalable Analytics in Python w/ Dask

Aditya Yadav
Sep 30, 2019

What is Dask?

Dask is a parallel computing library for Python with two layers. Its high-level Array, Bag, and DataFrame collections mimic NumPy arrays, Python lists, and Pandas DataFrames, but can operate in parallel on datasets that don’t fit into main memory. Underneath them, Dask provides low-level dynamic task schedulers that execute task graphs in parallel.
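
As a rough illustration of those two layers (a sketch of my own, not taken from the article’s notebooks), the snippet below first uses a chunked Dask array and then builds a tiny task graph by hand with dask.delayed. The array shape, the chunk size, and the helper names inc and add are illustrative assumptions.

```python
import dask
import dask.array as da

# High-level collection: a 1000x1000 array of ones stored as 250x250 chunks.
x = da.ones((1000, 1000), chunks=(250, 250))

# Building the expression is lazy; no chunk has been computed yet.
total = x.sum()

# compute() hands the task graph to a scheduler and returns a concrete value.
result = total.compute()


# Low-level interface: wrap plain functions so that calls build a task graph.
@dask.delayed
def inc(n):
    return n + 1


@dask.delayed
def add(a, b):
    return a + b


# The two inc() calls are independent, so a scheduler may run them in parallel.
lazy_sum = add(inc(1), inc(2))
delayed_result = lazy_sum.compute()

print(result, delayed_result)
```

In both cases nothing runs until compute() is called, which is what lets Dask split the work into tasks and schedule them in parallel.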

Dask vs Spark

Dask is an alternative to Spark. Which one fits better depends on the workload:

  • Spark dataframes will be much better when you have large SQL-style queries (think 100+ line queries) where their query optimizer can kick in.
  • Dask dataframes will be much better when queries go beyond typical database queries. This happens most often in time series, random access, and other complex computations.
  • Spark will integrate better with JVM and data engineering technology. Spark will also come with everything pre-packaged. Spark is its own ecosystem.
  • Dask will integrate better with Python code. Dask is designed to integrate with other libraries and pre-existing systems. If you’re coming from an existing Pandas-based workflow then it’s usually much easier to evolve to Dask.

Distributed Execution on a Dask Cluster

First Step: Set Up the Dask Conda Environment

Download the files for this article from http://bit.ly/2okHfgu

Double-click create-dask-environment.cmd

Next Step: Set Up a Distributed Dask Cluster

A distributed Dask cluster consists of one Dask scheduler and multiple Dask workers.

Start exactly one scheduler and several workers as follows:

Double-click start-dask-scheduler.cmd

Double-click start-dask-worker.cmd two or three times to start two or three workers

Double-click start-dask-web-interface.cmd
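
Once a scheduler and workers are running, Python code talks to them through a Client. The .cmd scripts are not reproduced in this article, so the sketch below uses an in-process LocalCluster as a stand-in for them; the worker count and the helper function square are illustrative assumptions. To attach to a separately started scheduler instead, you would pass its address to Client, for example Client("tcp://127.0.0.1:8786"). Port 8786 is Dask’s default scheduler port, but whether the scripts use it is an assumption.

```python
from dask.distributed import Client, LocalCluster


def square(n):
    return n * n


# Stand-in for the scheduler and workers started by the scripts:
# two single-threaded workers running inside this process, dashboard disabled.
cluster = LocalCluster(
    n_workers=2,
    threads_per_worker=1,
    processes=False,
    dashboard_address=None,
)
client = Client(cluster)

# map() submits one task per input and returns futures immediately.
futures = client.map(square, range(5))

# gather() blocks until the workers finish and collects the results in order.
squares = client.gather(futures)

# submit() runs a single function call on the cluster.
total = client.submit(sum, squares).result()

print(squares, total)

client.close()
cluster.close()
```

The same Client calls work unchanged whether the cluster is local or spread across machines; only the address passed to Client differs.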

Finally, Let’s Start Jupyter

Double-click start-jupyter.cmd to launch Jupyter in the browser. You will find five notebooks; open them one by one and execute them.

Conclusion

The figure below shows where we are heading. By starting with Dask, we have already come quite far.

[Figure: Scientific Computing in Python]

Time to light a cigar…

Where Can I Learn More?

About us

Our website: http://automatski.com

NirvanaThroughKarma

The transcendent state in which there is neither suffering, desire, nor sense of self, and the subject is released from the effects of karma and the cycle of death and rebirth.
