Building data products without the Mesh
In 2019, distributed data mesh has been offered as a route for companies to leverage data at scale. The next generation enterprise data…
rbahaguejr, Feb 26, 2022

Serve Data Models with MLFlow in Production
Serve data models using MLFlow
rbahaguejr, Feb 14, 2019

Data Science for the 99% Courses
I’m developing an offline course for local activists for data science applications in the context of campaigns on social issues.
rbahaguejr, Apr 13, 2017

Threaded Tasks in PySpark Jobs
There are circumstances when tasks (Spark action, e.g. save, count, etc) in a PySpark job can be spawned on separate threads. Doing so…
rbahaguejr, Apr 6, 2017

Spark in a Box: Making Apache Spark Accessible to Small Enterprises and Academic Institutions
Apache Spark is a powerful tool for data processing. However, it is becoming a restrictive tool available only to Big Enterprises…
rbahaguejr, Mar 8, 2017

Word cloud on Python
Last year our Vice President resigned from the Cabinet. She was appointed “Housing Czar” but the President is not able to trust her.
rbahaguejr, Feb 15, 2017

Window Function on PySpark
Here’s how to get the least value of col5 for a group:
rbahaguejr, Feb 8, 2017

Adding Python Files to PySpark Job
There are varying suggestions on how to do this on SO. However, the pointers are creating more frustrations even for us familiar with…
rbahaguejr, Jan 26, 2017

Support the Relief and Rehabilitation Efforts in Catanduanes, Philippines Due to Typhoon Nina…
My first post for the year is a call for support on the on-going rehabilitation efforts in areas devastated by Typhoon Nock-Ten/Nina.
rbahaguejr, Jan 7, 2017

How-to Declutter Your Data Science Workspace
Working on a data science project is almost always equivalent to an amazing clutter in the working directory. Data scientists would most…
rbahaguejr, Nov 11, 2016