Sander Stepanov, Aug 7, 2016
In response to: Large scale matrix multiplication with pyspark (or — how to match two large datasets of company… by Ran Tavory

Now I understand why they wrote this:

"If you are looking to manage a terabyte or less of tabular CSV or JSON data then you should forget both Spark and Dask and use Postgres or MongoDB."

http://dask.pydata.org/en/latest/spark.html