Bogdan Cojocar – Medium

Bogdan Cojocar

How to read data from s3 using PySpark and IAM roles

In this tutorial we will go over the steps to read data from S3 using an IAM role in AWS.

Nov 7, 2022

Bogdan Cojocar

PySpark integration with the native python package of XGBoost

In this tutorial we will highlight how to use the latest XGBoost library version 1.7.0 that works natively with PySpark

Oct 21, 2022

Bogdan Cojocar

How to read data from AWS S3 and Athena in pandas with column validation

This is a step by step tutorial on reading data from AWS S3 and Athena into a pandas DataFrame and doing column validation to assess the…

Oct 5, 2022

Bogdan Cojocar

PySpark ML and XGBoost setup using a docker image

In this tutorial we will build and test a Docker image in which we can run a Jupyter notebook with XGBoost fully integrated.

Oct 3, 2022

Bogdan Cojocar

Predicting similar political donors for UK parties using graph data

In this tutorial we will train an ML graph algorithm that will find similar likely political donors based on their UK companies donations to…

Sep 16, 2022

Bogdan Cojocar in Towards Data Science

Building a Health Entity labelling service using Azure Kubernetes Service, Seldon Core and Azure…

In this tutorial we will build an inference service entirely in Kubernetes in the Azure ecosystem

Jun 16, 2022

Bogdan Cojocar in Towards Data Science

Building a Serverless Azure ML Service Using Cognitive and CDKTF

In this tutorial we will go over using cloud services such as Azure Functions and Cognitive to build a sentiment analysis service

May 26, 2022

Bogdan Cojocar in Towards Data Science

Building a Credit Card Fraud Detection Online Training Pipeline with River ML and Apache Flink

In this tutorial, we will go over writing real-time Python Apache Flink applications to train an online model

Apr 30, 2022

Bogdan Cojocar

How to read parquet data from S3 using the S3A protocol and temporary credentials in PySpark

When we access AWS, sometimes, for security reasons, we might need to use temporary credentials, using AWS STS instead of the same AWS…

Jul 21, 2020

Bogdan Cojocar in Towards Data Science

How to run a PySpark job in Kubernetes (AWS EKS)

A complete tutorial on deploying an EKS cluster with Terraform and running a PySpark job using the Spark Operator

Jul 16, 2020

Bogdan Cojocar

Big data consultant. I write about the wonderful world of data.
