Sander Stepanov
Aug 7, 2016
Large scale matrix multiplication with pyspark (or — how to match two large datasets of company…
Ran Tavory

Now I understand why they wrote this:

If you are looking to manage a terabyte or less of tabular CSV or JSON data then you should forget both Spark and Dask and use Postgres or MongoDB.

http://dask.pydata.org/en/latest/spark.html
