Matt Collins in Towards Data Science · Methods for generating synthetic descriptive data · Use various data source types to quickly generate text data for artificial datasets. · 7 min read · Jan 4, 2024
Matt Collins in Towards Data Science · Create Many-To-One relationships Between Columns in a Synthetic Table with PySpark UDFs · Leverage some simple equations to generate related columns in test tables. · 7 min read · Dec 9, 2023
Matt Collins in Towards Data Science · Parallelising Python on Spark: Options for concurrency with Pandas · Leverage the benefits of Spark when working with Pandas · 8 min read · Nov 18, 2023
Matt Collins · Three ways to profile data with Azure Databricks · Get a feel for your data quality and shape quickly with data profiling · 6 min read · Nov 16, 2023
Matt Collins · Classification Model Serving bug on Databricks cluster runtime 12.2 LTS ML · Details on the errors and workarounds · 6 min read · May 15, 2023
Matt Collins · Mastering MLOps: A 6 month learning plan with MLflow · A structured learning path for MLOps · 5 min read · Apr 12, 2023
Matt Collins in Towards Data Science · Automate ML model retraining and deployment with MLflow in Databricks · Efficiently manage and deploy production models with MLflow · 8 min read · Mar 15, 2023
Matt Collins in Towards Data Science · 5 Quick Tips to Improve Your MLflow Model Experimentation · Use the MLflow Python API to drive better model development · 7 min read · Mar 13, 2023