MySQL, ETL and Jupyter Notebooks Made Simple

Victor Martin
Published in Nerd For Tech
4 min read · May 6, 2021
MySQL, Data Science and Data Integration Workshop

As a professional Data Scientist, your Jupyter Notebook is just one part of the whole picture. Your ML models are only as good as the data you use to train and improve them, plus some brainpower, experience and intuition.

In a real-world scenario, data lives in many places. Luckily, there are many tools to help you extract, transform and load that data.

Data Integration might not sound like an exciting concept, mainly because you probably associate it with boring legacy connectors to various repositories and databases. This article aims to show how exciting Data Integration can be: taking CSV files in object storage and MySQL databases (on-prem and in the cloud) and putting that data into a Jupyter Notebook. Let’s dive in.

The technologies described here are part of the Oracle Cloud Free Tier. They include Oracle Cloud Infrastructure (OCI) services like:

  • OCI Object Storage
  • OCI Data Integration
  • OCI MySQL Database Service
  • OCI Data Science
OCI Services working together to form your Data Science Pipeline.

Do you have an Oracle Cloud Account? If not, you can create one here:

Remember, Oracle Cloud does not charge you anything unless you explicitly request an upgrade to pay-as-you-go.

Oracle Cloud Sign-up for free

OCI Data Integration

Data Integration Designer

Features:

  • Cloud-Native
  • Serverless (nothing to administer or manage)
  • Graphical Interface
  • Native Integration with MySQL
  • Interactive Data Preparation
  • Powered by Spark-ETL (or E-LT SQL Push-Down)

Say you want to move data out of Object Storage (a CSV file with one of your datasets). The source could just as well be a database connection, on-prem or in the cloud, and you can aggregate information from different sources.

You represent the sources and targets of your data as Data Assets, where you can parameterize the connection details.

The next step is to create a Data Flow, which is the graphical definition of the ETL process.

Finally, you will wrap this process in an Integration Task that you can publish and run. You can also build complex Pipelines by aggregating Integration Tasks.

Something like this:
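The designer itself is graphical, but as a rough mental model the extract, transform and load steps of a data flow can be sketched in plain Python. This is not how OCI Data Integration is implemented or driven; it is only a minimal illustration under stated assumptions: the sample rows and column names are made up, the `fish` table name is hypothetical, and an in-memory SQLite database stands in for the MySQL target so the sketch runs anywhere.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for a CSV object in Object Storage.
SAMPLE_CSV = """species,length_cm,weight_g
Bream,25.4,242
Perch,8.4,5.9
Pike,,200
Perch,13.7,32
"""


def extract(csv_text):
    """Extract: parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: drop incomplete rows and cast the numeric columns."""
    clean = []
    for row in rows:
        if not all(row.values()):
            continue  # skip records with a missing value
        clean.append(
            (row["species"], float(row["length_cm"]), float(row["weight_g"]))
        )
    return clean


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fish "
        "(species TEXT, length_cm REAL, weight_g REAL)"
    )
    conn.executemany("INSERT INTO fish VALUES (?, ?, ?)", records)
    conn.commit()


# SQLite in memory is an assumption made for portability; the article's
# real target is a MySQL DB System.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SAMPLE_CSV)))
loaded = conn.execute("SELECT COUNT(*) FROM fish").fetchone()[0]
print(loaded)
```

Keeping each step in its own function mirrors the way a data flow separates source, transformation and target operators, so one step can change without touching the others.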

OCI Data Science

Data Science example.
  • Fully managed service.
  • Jupyter Notebooks with common Python ML libraries.
  • Train and Manage Machine Learning Models.
  • Preconfigured environment with an NVIDIA GPU and CUDA.
  • Integration with other OCI services.

MySQL Database Service

MySQL Service

The MySQL Database Service is the only database service 100% developed, managed, and supported by the MySQL team. MySQL Database Service makes it easy for organizations to deploy cloud-native applications using the world’s most popular open-source database. It delivers significant savings over on-premises database management and “overforked” versions from third-party cloud platforms.

Start the Workshop to Learn More

If you want to learn and have fun with a fish survey dataset, you can follow the detailed step-by-step guide in my GitHub project. It starts with the dataset in Object Storage, uses Data Integration to load it into a MySQL DB System, and ends with a Jupyter Notebook that works with that data:

MySQL, Data Integration and Data Science Workshop
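To give a feel for the last step, here is a minimal sketch of what a notebook cell might do once the data sits in a database table: query it and compute a quick summary. It is not taken from the workshop. The table name, column names and values are invented for illustration, and an in-memory SQLite database stands in for MySQL. The `fetch_rows` helper only relies on the standard Python DB-API cursor interface, so in principle a connection from a MySQL driver could be passed in instead; that part is an assumption and is not exercised here.

```python
import sqlite3
from statistics import mean


def fetch_rows(conn, query):
    """Run a query on a DB-API connection and return the rows as dicts."""
    cur = conn.cursor()
    cur.execute(query)
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def mean_by(rows, key, value):
    """Group rows by `key` and average the `value` column per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {name: mean(values) for name, values in groups.items()}


# Stand-in database with made-up rows; the real data would already be in
# the MySQL DB System after the integration task has run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fish (species TEXT, weight_g REAL)")
conn.executemany(
    "INSERT INTO fish VALUES (?, ?)",
    [("Bream", 240.0), ("Bream", 260.0), ("Perch", 30.0)],
)

rows = fetch_rows(conn, "SELECT species, weight_g FROM fish")
avg_weight = mean_by(rows, "species", "weight_g")
print(avg_weight)
```

In a real notebook you would more likely hand the query result to a dataframe library; the plain-dict version here just keeps the sketch free of extra dependencies.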

Want to learn more?

There are periodic free training sessions, around 1 hour long, delivered by experts and Oracle Cloud Advocates. They are instructor-led, so you can follow along with our team and get your questions answered on the spot.

Join me on the Oracle Cloud Infrastructure Discord channel if you have any questions.

And stay tuned for more articles about the amazing things you can build with Oracle Cloud.

I am Victor Martin, a Software Developer. I deploy on Oracle Cloud Infrastructure.

Feel free to get connected with me on LinkedIn.

I am also interested in scuba diving and space engineering. Happy to help; everything is easier than rocket science!

Written by Victor Martin

Principal Cloud Engineer. All opinions are my own. @OracleCloudInfrastructure