Big Data and Machine Learning Fundamentals using GCP for Data Professionals.

DataMount
HackerNoon.com
7 min read · May 18, 2019


You will get a brief overview of GCP and the tools that power machine learning and big data on GCP.

So What Is GCP?

GCP stands for Google Cloud Platform; in simple terms, it is Google's offering of cloud solutions. I hope you already have some understanding of what the cloud is. If you have ever worked in an organization that has its own data center, you have a bit of an idea: the cloud simply gives you access to do whatever you want with a machine while restricting physical-level access. So how do you, as an organization, benefit from the cloud? Well, since you have no physical access to the machine, all the physical barriers around it (hardware, networking) drop away instantly.

So what products does GCP offer for data processing?

Now let's start with a computer on the cloud. At a high level, a physical computer consists of compute, storage, and networking. A cloud computer is nothing but a replica of it, minus the physical hardware you have to manage.

The design of Google Cloud is based on a NoOps policy, meaning minimal to no machine operations are required. For example, suppose you want to create a VM for your work. Go to the cloud dashboard and click on Compute Engine; a screen like this will come up:

Click on Create and you will be prompted with a page like this:

Fill in the form (kidding) with your requirements, check the estimated bill on the side to adjust your budget, and click Create. Once clicked, you will get a page like this:

Click on SSH and you will get a window like this:

And that's it: your new VM is in front of you with a few clicks.
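
If you prefer the command line, the same VM can be created with the gcloud tool. This is only a sketch: the instance name, zone, and machine type below are assumptions, so adjust them to your own requirements.

# create a VM instance (name, zone and machine type are placeholders)
gcloud compute instances create my-demo-vm --zone=us-central1-a --machine-type=n1-standard-1
# open an SSH session to it, just like the SSH button in the console
gcloud compute ssh my-demo-vm --zone=us-central1-a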

To check your machine details, type cat /proc/cpuinfo and you will get a result like this:
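
A few more standard Linux commands you can run in the same SSH window to inspect the VM; nothing GCP-specific here, just a quick sketch.

cat /proc/cpuinfo   # CPU details
cat /proc/meminfo   # memory details
nproc               # number of CPU cores
df -h               # free space on the persistent disk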

Now that your compute problem is solved, the next problem is storage. Don't get tense: there are multiple options for different data types, but as a staging ground you can store everything in Cloud Storage.

So why not store data on the persistent disk of the Compute Engine instance? The answer is that the disk is attached to the instance, so if the instance goes away, so does the data. Say you need the instance for one project only, but the data for most of your other projects. Will you upload the data every time? No, you will use Cloud Storage as the primary solution and reuse it from other services.

Now, how do you transfer files from your local machine to Cloud Storage? To store data on the cloud you need to provide a bucket (a container that holds your data). For that you will use the gsutil command-line tool; for example, to copy files from local to the cloud you type gsutil cp local_files_address cloud_bucket_address.

You can think of a bucket name like a domain name: it needs to be unique. A general rule is to use some sort of name related to your organization.

You can use ls to list files in a hierarchical way, mv to move files between buckets, mb to make a bucket, and rm to remove files: basically all the Unix-style commands.
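
Here is a minimal sketch of those gsutil commands in action. The bucket and file names are placeholders; remember the bucket name must be globally unique.

gsutil mb gs://my-unique-bucket-name/                    # mb: make a bucket
gsutil cp earthquakes.csv gs://my-unique-bucket-name/    # cp: copy a local file to the bucket
gsutil ls gs://my-unique-bucket-name/                    # ls: list the bucket contents
gsutil mv gs://my-unique-bucket-name/earthquakes.csv gs://my-other-bucket/   # mv: move between buckets
gsutil rm gs://my-other-bucket/earthquakes.csv           # rm: remove an object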

To see Cloud Storage in action, let's first ingest data into our VM. What we are doing is removing any file named earthquakes.csv from our persistent disk and re-downloading earthquakes.csv again (a sketch of the two commands follows below). You can put these two steps in a bash file with a .sh extension; to run the bash file, type bash file_name.sh. To check that you have actually downloaded the data, use the hierarchical file view command ls and you will see earthquakes.csv listed.
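
As a sketch, the two ingest steps can look like the script below. The original screenshot does not show the download URL, so the USGS weekly earthquake feed used here is only an assumption; substitute whatever source your lab uses.

#!/bin/bash
# ingest.sh: remove any old copy of the data, then download it again
rm -f earthquakes.csv
# URL is an assumption, replace it with the actual feed
wget https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.csv -O earthquakes.csv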

Now, to work with transformation, we will do the transformation and manipulation the Python way. You can install packages on your computer (the VM instance) by downloading them manually, or you can put the commands in a bash file and run the script. The Python file that transforms the input file (earthquakes.csv) looks like this:

If you understand the code, you will notice the end product is a PNG file; if you don't understand it (like me), just run it. So run it in the terminal:
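
In shell form, the install-and-run step can look like the sketch below. The script name transform.py and the package list are assumptions, not the exact ones from the original screenshot.

# install pip and the Python packages the plotting script needs (package list is an assumption)
sudo apt-get update && sudo apt-get install -y python3-pip
pip3 install pandas matplotlib
# run the transform: it reads earthquakes.csv and writes earthquakes.png
python3 transform.py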

Now if you type ls you will see files like this:

Now, as we said earlier, we need these files in our Cloud Storage, and for that we need a bucket. Go to the Cloud Console web UI, search for Storage, and click on Create bucket. Remember the name needs to be unique globally (use the project ID plus some random suffix).

Now go to the terminal and type gsutil cp file_name cloud_storage_address to copy the files.

Your bucket should look like this

Now anyone in the world with the URL can access your work, earthquakes.png, once you make the object publicly readable.
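
Making the object public can be done with a checkbox in the console, or from the command line as a sketch (bucket name is a placeholder):

# grant read access to everyone on the single object
gsutil acl ch -u AllUsers:R gs://my-unique-bucket-name/earthquakes.png
# the object is then readable at https://storage.googleapis.com/my-unique-bucket-name/earthquakes.png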

Now let's see some of the storage solutions provided by Google Cloud.

Cloud SQL is a managed relational database service on GCP, giving you an instance of MySQL or PostgreSQL without the operational overhead.

These managed features are what separate Cloud SQL from a marketplace MySQL: if you just install MySQL on your VM, say through Cloud Launcher, you will miss all the benefits Google provides.

So let's work with some Cloud SQL features.

First, get your project ID from the console UI, or type gcloud config list project in the shell.

Now, since Cloud SQL is an instance of MySQL or PostgreSQL, we will write SQL commands to generate tables. Again, first you need a bucket to stage your data on the cloud.

Create a bucket again, transfer your files from your VM to Cloud Storage with the gsutil cp or mv command, and you can see the files in your bucket like this:

Now create a Cloud SQL instance and open it; you will see an overview like this:
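
The console flow is the easiest, but as a sketch the same instance can be created with gcloud; the instance name, database version, tier, and region below are placeholders.

gcloud sql instances create my-demo-sql --database-version=MYSQL_5_7 --tier=db-n1-standard-1 --region=us-central1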

Now import the SQL statements to create the tables and views.
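
In the console this is the Import button; the gcloud equivalent looks roughly like the sketch below, where the instance name, bucket, and file name are placeholders.

# run the DDL stored in a .sql file in your bucket against the instance
gcloud sql import sql my-demo-sql gs://my-unique-bucket-name/table_creation.sql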

Now import data into the table
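
Again the Import button works, or as a sketch from the command line (all names are placeholders):

# load a CSV from the bucket into a specific database and table
gcloud sql import csv my-demo-sql gs://my-unique-bucket-name/data.csv --database=my_db --table=my_table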

Now open your database connection
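
One easy way to open the connection from Cloud Shell or your VM is gcloud sql connect, which temporarily whitelists your IP address and drops you into the mysql client (instance and user are placeholders):

gcloud sql connect my-demo-sql --user=root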

Now you can run your SQL jobs easily on Cloud SQL.

In the next posts we will explore Hadoop frameworks and other ML topics in the cloud. Stay tuned, and you can follow me on Medium.

If you like the post and want your colleagues or friends to learn the same, hit the like button and share it on LinkedIn, Facebook, or Twitter. Let's grow together for a bias-free machine-human compiled future.
