didier deshommes
Upserting rows in postgresql 9.5 with pySpark
Although the current Postgres JDBC data source allows SELECT and INSERT operations with Spark, it doesn’t allow for upserts. Since Postgres…
Dec 15, 2017

didier deshommes
Writing millions of small files to S3 using Spark
This is somewhat the inverse of reading millions of tiny files on S3. Both tasks share the same main characteristic: this is a…
Sep 15, 2017

didier deshommes
Setting and accessing enviroment variables in Spark jobs on EMR
Our use case is: we want to access a custom environment variable in our Spark job that we set previously. This use case seemed pretty…
Aug 25, 2017

didier deshommes
How does Spark’s `wholeTextFiles()` work?
Spark’s wholeTextFiles is a pretty painless way to read many small files. It returns an RDD of pair values, where the key is the path of…
Aug 9, 2017

didier deshommes
More Spark on EMR tips
Use EMR’s Spark maximizeResourceAllocation wisely
Aug 4, 2017

didier deshommes
Writing a map-reduce job to concatenate a millions of small documents
Running Hadoop jobs on small files is usually discouraged, but sometimes you have no choice. Sometimes you even have millions of files you…
Jul 28, 2017

didier deshommes
Ansible First Impressions
When it comes to server provisioning, I’ve only had experience with Chef. The new place I work in uses Ansible and I was tasked with…
May 5, 2017

didier deshommes
Better test coverage workflow for Cython modules
This is simply a nicer way to include test coverage data in Python modules built with Cython. You can look through this excellent link…
Mar 2, 2017

didier deshommes
Spark on AWS EMR configuration tips
These tips are mainly for ephemeral Spark clusters. Things may change for long-running Spark clusters.
Dec 19, 2016

didier deshommes
Kafka 0.10.0 support on Python
It seems like Kafka releases have picked up recently. Version 0.9 was released about 6 months ago, and 0.10 was released about a month ago…
Jul 29, 2016