An intro to Cloud Computing for Data Scientists and Data Engineers

Nishant Shah
Nov 21 · 7 min read

There has been a lot of debate about what the cloud is. Many people think of the cloud as a collection of technologies. It’s true that there is a set of common technologies that typically make up a cloud environment, but these technologies are not the essence of the cloud. The cloud is actually a service or group of services. This is partially the reason that the cloud has been so hard to define. But let me make it simple and straightforward for you.


Cloud Computing Definition

Cloud computing is the delivery of technology services (including compute, storage, databases, networking, software, and many more) over the internet with pay-as-you-go pricing.

Cloud computing lets companies deploy their applications faster and without excessive maintenance, since maintenance is handled by the service provider. It also leads to better use of computing resources, because capacity can follow the changing needs and requirements of the business.

The importance of the cloud and cloud data services


Demand for the cloud has grown steadily, and in the past five years a shift in cloud vendor offerings has fundamentally changed how companies buy, deploy and run big data systems. Cloud vendors have absorbed more back-end data storage and transformation technologies into their core offerings and are now highlighting their data pipeline, analysis, and modeling tools. This is great news for companies deploying, migrating, or upgrading big data systems. Companies can now focus on generating value from data and Machine Learning (ML), rather than building teams to support hardware, infrastructure, and application deployment/monitoring.

Why the Data Scientist and Data Engineer Need to Understand the Cloud and Its Data Services

More and more application workloads are moving to the different cloud platforms. This could be a move to a public, private or hybrid cloud (where the latter is a mixture of public and private). Big data and analytics application workloads are on the move too. It is important that the data science and data engineering community has a good understanding of these clouds at a deeper level, so as to make the best use of them and do their analytics work more effectively.

Data scientists and data engineers have been accustomed to running their data processing and analysis work on a bare metal or physical environment up to now. But with the recent rapid growth in cloud infrastructure, these folks need to understand the new virtualized infrastructure within their clouds, as it is now underlying and controlling their workloads.

Image source: kdnuggets

While the Internet is full of terms related to the cloud, here are some pretty basic, but important ones, that one should definitely have some knowledge about. Knowing these key terms will help you understand industry developments and future trends in cloud computing.

Let us have a look and understand the basics.

1. XaaS (Anything-as-a-Service)

This is a generic term that refers to any service made available as a cloud-enabled service over the internet. It is sometimes also called ‘everything-as-a-service’. It includes SaaS, DaaS, PaaS, IaaS and so on.

2. Software-as-a-Service (SaaS)

SaaS consists of software applications that run on remote computers owned and operated by someone else. A good example of such an application is Google Docs, an online word processor that runs in a cloud environment.

SaaS offers several key benefits, such as instant access and usage of applications, accessibility from any machine that is connected, and also that there is no likely loss of data, as it is stored in the cloud.

3. Platform-as-a-Service (PaaS)

PaaS is a cloud-based environment that offers everything required to support the building and deployment of cloud-based applications. The developer of the application does not have to purchase or manage the underlying hardware, software and hosting.

The primary benefits obtained from PaaS are that applications may be deployed really fast, without worrying about the platform. Also, these service models largely save costs and abstract the underlying intricacies.

4. Infrastructure-as-a-Service (IaaS)

Infrastructure as a Service, or IaaS, provides basic infrastructure services to customers. These services may include physical machines, virtual machines, networking, storage, or some combination of these. You are then able to build whatever you need on top of the managed infrastructure. IaaS implementations are used to replace internally managed datacenters. They give organizations more flexibility at a reduced cost.
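One way to picture how SaaS, PaaS and IaaS differ is by who manages each layer of the stack. The sketch below is a simplified illustration: the layer names and the exact split per model are assumptions based on the commonly drawn comparison, not a definition taken from any provider.

```python
# Illustrative layers of an application stack, from top to bottom.
# Both the layer names and the split per model are simplifying assumptions.
LAYERS = [
    "application", "data", "runtime", "operating system",
    "virtualization", "servers", "storage", "networking",
]

# How many layers (counted from the top of LAYERS) the customer manages.
CUSTOMER_MANAGED = {"on-premises": 8, "IaaS": 4, "PaaS": 2, "SaaS": 0}


def responsibilities(model: str) -> dict:
    """Return the layers managed by the customer and by the provider."""
    n = CUSTOMER_MANAGED[model]
    return {"customer": LAYERS[:n], "provider": LAYERS[n:]}


for model in CUSTOMER_MANAGED:
    split = responsibilities(model)
    managed = ", ".join(split["customer"]) or "nothing"
    print(f"{model:12s} customer manages: {managed}")
```

Moving from IaaS to PaaS to SaaS, the customer's list gets shorter and the provider's list gets longer, which is the main trade-off between control and convenience.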

Let’s take a Car Analogy to understand the cloud service models.

Figure: Understanding cloud service models using car analogy

Compare the above image with the below one for better understanding

Image for post

5. Public Cloud

When most people think about cloud computing, they are thinking of the public cloud service model. In the public service model, all the systems and resources that provide the service are housed at an external service provider. That service provider is responsible for the management and administration of the systems that are used to provide the service. The client is only responsible for any software or client application that is installed on the end-user system. Connections to public cloud providers are usually made through the Internet.

6. Private Cloud

In a private cloud, the systems and resources that provide the service are located internal to the company or organization that uses them. That organization is responsible for the management and administration of the systems that are used to provide the service. In addition, the organization is also responsible for any software or client application that is installed on the end-user system. Private clouds are usually accessed through the local area network (LAN) or wide area network (WAN). In the case of remote users, the access will generally be provided through the Internet or occasionally through the use of a virtual private network (VPN).

7. Hybrid Cloud

The term hybrid cloud implies the usage of a private cloud infrastructure, along with the use of cloud services that are public in nature. Truth be told, a private cloud cannot really exist solely by itself. Most businesses, which have a private cloud setup, end up accessing public cloud resources for various day-to-day tasks. This gives birth to the term hybrid cloud.


Cloud Services:

Compute: provides the brains to process your workload

Storage: saves and stores data

Databases: store more structured sets of data

Cloud Computing Characteristics:

1. Virtualization- Fundamental technology that powers cloud computing.

Virtualization is at the core of all modern cloud environments; it forms the cloud infrastructure layer. The unit that provides the flexibility, elasticity, ease of management and scaling in any cloud is the virtual machine, essentially through the hardware independence and portability that virtual machines offer.

2. Cost- Only pay for resources when you are using them

Pay-as-you-go

No capital expenses for:

Buying hardware and software

Managing on-site infrastructure

In some cases, an on-premise solution might be more cost-efficient. The best solution depends on the use case.
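To see why the answer depends on the use case, here is a rough back-of-the-envelope sketch comparing pay-as-you-go with buying your own hardware. Every number in it (hourly rate, hardware price, upkeep, time horizon) is a made-up assumption for illustration, not a real price.

```python
def pay_as_you_go_cost(hourly_rate: float, hours_per_month: float, months: int) -> float:
    """Total cloud bill when you only pay for the hours you use."""
    return hourly_rate * hours_per_month * months


def on_premise_cost(hardware_cost: float, monthly_upkeep: float, months: int) -> float:
    """Upfront purchase plus running costs for the same period."""
    return hardware_cost + monthly_upkeep * months


def break_even_hours_per_month(hourly_rate: float, hardware_cost: float,
                               monthly_upkeep: float, months: int) -> float:
    """Monthly usage above which owning the hardware becomes cheaper."""
    return on_premise_cost(hardware_cost, monthly_upkeep, months) / (hourly_rate * months)


# Hypothetical figures over a three-year horizon.
RATE, HARDWARE, UPKEEP, MONTHS = 0.50, 6000.0, 100.0, 36

for hours in (100, 730):  # occasional use vs. running around the clock
    cloud = pay_as_you_go_cost(RATE, hours, MONTHS)
    owned = on_premise_cost(HARDWARE, UPKEEP, MONTHS)
    cheaper = "cloud" if cloud < owned else "on-premise"
    print(f"{hours} h/month: cloud={cloud:.0f}, on-premise={owned:.0f} -> {cheaper}")

print(f"break-even: {break_even_hours_per_month(RATE, HARDWARE, UPKEEP, MONTHS):.0f} h/month")
```

With these assumed numbers, occasional workloads are much cheaper in the cloud, while a machine that runs around the clock for years can tip the balance toward on-premise.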

3. Reliability- Building reliability into your environment can be very costly. It usually involves having multiple systems or even multiple datacenter locations. You have to do disaster recovery (DR) and continuity planning and simulations. Many cloud providers already have multiple locations set up, so if you use their services, you can instantly add reliability to your environment. You may have to request to have your service use multiple locations, but at least it’s an option.
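A quick calculation shows why multiple locations help. The sketch below assumes that locations fail independently of each other, which is a simplification, and the 99% figure is only an illustrative assumption.

```python
def combined_availability(single_site: float, sites: int) -> float:
    """Availability when the service is up as long as at least one site is up.

    Assumes sites fail independently, which real deployments only approximate.
    """
    return 1 - (1 - single_site) ** sites


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1 - availability) * 365 * 24


for n in (1, 2, 3):
    a = combined_availability(0.99, n)  # 0.99 is a hypothetical per-site figure
    print(f"{n} site(s): availability={a:.6f}, "
          f"downtime={downtime_hours_per_year(a):.2f} h/year")
```

Under these assumptions, each extra location cuts expected downtime by roughly a factor of one hundred, which is very expensive to reproduce with your own datacenters.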

4. Speed- Immediate access to ready-to-go cloud resources

On-demand resourcing

Fast set-up time

Deploy services in a matter of minutes

5. Performance- Performance in cloud systems is constantly being measured and monitored. If performance falls below a certain level, the systems can automatically adjust to provide more capacity, if that is what’s needed. The presence of a service-level agreement (SLA) is also a benefit. An SLA guarantees a certain level of performance. If that level is not met, the service provider must generally provide some form of restitution. This restitution is often in the form of a chargeback or a fee reduction. So, although performance itself is not assured, there can be an assurance that the cost of a lack of performance can be mitigated.
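The fee-reduction idea can be sketched in a few lines. The credit tiers below are hypothetical and do not come from any real provider's SLA; actual agreements define their own thresholds and how downtime is measured.

```python
def measured_availability(total_minutes: float, downtime_minutes: float) -> float:
    """Availability for a billing period, as a percentage."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# Hypothetical schedule: (minimum availability %, credit % of the bill).
CREDIT_TIERS = [(99.9, 0), (99.0, 10), (95.0, 25)]


def service_credit_percent(availability_pct: float) -> int:
    """Return the fee reduction owed for the measured availability."""
    for floor, credit in CREDIT_TIERS:
        if availability_pct >= floor:
            return credit
    return 100  # below every tier: full refund in this made-up schedule


minutes_in_month = 30 * 24 * 60          # 43,200 minutes in a 30-day month
availability = measured_availability(minutes_in_month, downtime_minutes=90)
credit = service_credit_percent(availability)
monthly_bill = 2000.0                     # hypothetical bill
print(f"availability={availability:.3f}%, credit={credit}%, "
      f"refund={monthly_bill * credit / 100:.2f}")
```

Here 90 minutes of downtime in a month drops availability just below 99.9%, so the customer would get 10% of the bill back under this assumed schedule.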

6. Scalability- Easily add and remove resources as you need them

Example: e-commerce site

Needs more resources during peak times

Scale resources as necessary
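The e-commerce example can be sketched as a simple scaling rule. This is a minimal illustration, assuming each server handles a fixed number of requests per second; the load figures, the capacity and the function name are all hypothetical, and real autoscalers use richer signals.

```python
import math


def instances_needed(requests_per_second: float, capacity_per_instance: float,
                     min_instances: int = 1, max_instances: int = 20) -> int:
    """Number of servers required for the current load, within fixed bounds."""
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


# Hypothetical load over a day, in requests per second, with a peak-time spike.
load_samples = [40, 35, 60, 250, 900, 1200, 400, 80]

plan = [instances_needed(load, capacity_per_instance=100) for load in load_samples]
print(plan)  # [1, 1, 1, 3, 9, 12, 4, 1]
```

The site runs on a single server for most of the day and only pays for twelve during the peak, instead of owning twelve servers all year.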

7. Agility- Cloud environments can offer great agility. You can easily reallocate resources when needed. This allows you to add resources to systems that need them and take them away from systems that don’t. You can also easily add systems to expand your capacity. Internal cloud environments allow you to make better use of your internal infrastructure resources. A cloud infrastructure that uses virtualization can help you increase your density and the percentage of utilization from your infrastructure. As a result, you will be less likely to have systems sitting idle.

8. Security- Secure storage and management of your data

External party responsible for security

Particularly risky for businesses in highly regulated sectors

Cloud is becoming more and more secure

In some cases, an on-premise solution might be preferred. The best solution depends on the use case.

If you found this article useful give it a clap and share it with others.

Happy Learning

Thank You


Nishant Shah

Written by

A very keen ambivert Data Science and Machine Learning Enthusiast.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
