Tanner Beam

Databricks

7 stories

Spark

8 stories

Relative difference between Dask and PySpark at 10 TB scale, running on a cluster with about 5 TB of memory and 1280 CPUs. Orange marks queries where Dask is faster, blue where PySpark is faster, and grey where PySpark failed.

Python

8 stories

Snowflake

1 story

Data Quality

5 stories

Five aspects of data quality: semantic correctness, consistency, completeness/uniqueness, well-formedness, and timeliness

Great Expectations

18 stories

A terminal with code and a data contract being validated.

Tools

4 stories

Predictive Modeling

7 stories

Imbalanced distribution of labels in a dataset