Tips for large, medium, or small businesses

Photo by Adeolu Eletu on Unsplash

There is a lot of talk about how data, data science, and machine learning can all be applied to help make critical business decisions.

As our team works with companies across the U.S. and across a range of industries, we have had many opportunities to do more than just talk. We have helped many companies take their data and turn it into valuable decisions.

Not every data project has required complex models or machine learning. …


Managing The Data Life Cycle


Photo by Campaign Creators on Unsplash

Information.

Data.

It’s always been the key to making good decisions, whether you are a commander on a battlefield in 1000 BC or a CEO in a boardroom in the year 2000.

Having critical information at the right time can help make million- and billion-dollar decisions.

But, just having the data is not enough.

I could give you random stock prices, the temperature in Vegas for the next three weeks, and the number of kids being born tomorrow. …


What’s the difference?

Photo by Yancy Min on Unsplash

In this tech-centric era, companies aim to extract maximum business value from their data in order to stay relevant and efficient through data-driven decisions. To do so, they rely on artificial intelligence and machine learning. In fact, some research suggests that artificial intelligence can increase business productivity by 40%! It is therefore essential to implement the best tech practices and make the jobs of tech workers easier.

In 2020 alone, many new ML and AI tools have been introduced to track models and manage data sets, but the work is still challenging. To optimize the complete ML production lifecycle, we need to bring automation to the process between modeling and production. …
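
The full article surveys this tooling; as one hedged illustration of what tracking a model run looks like, here is a minimal sketch using MLflow (our choice of example tool; the model, data, and parameters are placeholders):

```python
# A minimal experiment-tracking sketch with MLflow. The model and
# parameters here are illustrative, not the article's own example.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    n_estimators = 100
    model = RandomForestClassifier(n_estimators=n_estimators).fit(X_train, y_train)

    # Log parameters, metrics, and the model artifact so the run is
    # reproducible and can later be promoted toward production.
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```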


Please, don’t average averages

Photo by Jonathan Francisca on Unsplash

SQL has become a common skill requirement across industries and job profiles over the last decade.

Companies like Amazon and Google will often demand that their data analysts, data scientists, and product managers be at least familiar with SQL. This is because SQL remains the language of data. So, in order to be data-driven, people need to know how to access and analyze data.

With so many people looking at, slicing, manipulating, and analyzing data, we wanted to provide some tips to help improve your SQL.

These are tips and tricks we have picked up along the way while writing SQL. Some of them are do’s and don’ts; others are just best practices. Overall, we hope they will help bring your SQL to the next level. …
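
As a taste of the title’s tip, here is a small, self-contained demonstration of why averaging averages goes wrong, using Python’s built-in sqlite3 module (the table and numbers are made up for illustration):

```python
# Two stores: one with a single sale, one with three sales.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (store TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('A', 100.0),
        ('B', 10.0), ('B', 20.0), ('B', 30.0);
""")

# Wrong: averaging per-store averages weights tiny store A the same as B.
avg_of_avgs = conn.execute("""
    SELECT AVG(store_avg)
    FROM (SELECT AVG(amount) AS store_avg FROM sales GROUP BY store)
""").fetchone()[0]

# Right: average over the raw rows.
true_avg = conn.execute("SELECT AVG(amount) FROM sales").fetchone()[0]

print(avg_of_avgs)  # 60.0 -> (100 + 20) / 2
print(true_avg)     # 40.0 -> (100 + 10 + 20 + 30) / 4
```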


Distributed data processing, collaborative SQL, and open-source

Photo by Scott Graham on Unsplash.

SQL is one of the most in-demand technical skills in the workplace today. Developed back in the 1970s, it is still the way we interface with most of our data systems. Regardless of what drag-and-drop tools come around or what new query paradigms try to overtake it, it has remained.

Many of the modern database technologies we will talk about today constantly have to conform to SQL, rather than SQL having to be updated to fit them.

However, this isn’t to say the SQL landscape hasn’t changed a lot in the past few decades and doesn’t continue to evolve. …


Building more maintainable, readable and optimized data workflows

Photo by Alexandru Acea on Unsplash

SQL remains the language for data. Developed back in the 1970s, it’s one of the few technologies that has remained constant, regardless of what drag-and-drop tools come around or what new query paradigms try to overtake it.

SQL remains the most widely used technology for interacting with data. Even with the advent of NoSQL (Not Only SQL) databases, layers like Presto and Hive have been developed on top of them to provide a friendly SQL interface.

Not only that, but the use of SQL has far expanded beyond data engineers and analysts. Product managers, analytical partners, and software engineers at large tech companies all use SQL to access data and answer questions quickly. …
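
As a hedged sketch of what those SQL layers look like in practice, here is how you might query Presto from Python with the PyHive client (the host, user, and table names are placeholder assumptions):

```python
# Querying a data lake through Presto's SQL layer with PyHive.
from pyhive import presto

conn = presto.connect(host="presto.example.com", port=8080, username="analyst")
cursor = conn.cursor()

# Standard SQL, even though the underlying storage may be files on S3/HDFS.
cursor.execute("SELECT user_id, COUNT(*) FROM events GROUP BY user_id LIMIT 10")
for row in cursor.fetchall():
    print(row)
```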


Using the Google Sheets API to load data into MySQL

Photo by Daniele Levis Pelusi on Unsplash.

Much like in our recent article about automating emails with Python, today we want to discuss a common task that is easy to automate.

We’ll outline how you can write a Python script that can export data from Google Sheets and insert it into MySQL.

When would you use this?

One way this could be useful is when you create some sort of form with Google Forms and you want to load new data automatically every night. …
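
The full article walks through the details, but the rough shape of such a script, sketched here with the gspread and PyMySQL libraries, might look like this (credential files, sheet names, and the table schema are placeholder assumptions):

```python
# Sheets-to-MySQL sketch. Assumes a Google service account with access to
# the sheet, and a MySQL table whose columns match the sheet's columns.
import gspread
import pymysql

gc = gspread.service_account(filename="service_account.json")
worksheet = gc.open("Form Responses").sheet1

# Skip the header row; each remaining row becomes one MySQL record.
rows = worksheet.get_all_values()[1:]

conn = pymysql.connect(host="localhost", user="etl_user",
                       password="secret", database="forms")
with conn.cursor() as cursor:
    cursor.executemany(
        "INSERT INTO responses (submitted_at, name, answer) VALUES (%s, %s, %s)",
        rows,
    )
conn.commit()
conn.close()
```

Scheduled with cron or a workflow tool, a script like this is what lets new form submissions land in MySQL automatically every night.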


Hiring a data science guide

Photo by Free To Use Sounds on Unsplash

Hiring the right data scientist, or any data professional, remains a difficult task for HR.

There are so many different skill sets required to take data out of application systems and move it into analytical ones. From there, it takes a whole set of other skills to create machine learning and statistical models.

The problem is not just finding the right data scientist, but defining what your team needs.

To help simplify this process, our team has put together a few pointers as well as interview questions to help those out there looking to hire new data scientists.

Let’s first start by defining some of the skill sets you will often need when looking for a data scientist. …


What’s the difference?

Photo by Mr Cup / Fabien Barral on Unsplash

With technology changing rapidly, more and more data is being generated all the time.

It’s estimated that the amount of data generated in the entire world will grow to 175 zettabytes by 2025, according to the most recent Global DataSphere report.

Companies now require improved software to manage these massive amounts of data. They’re constantly looking for ways to process and store data, and distribute it across different servers so that they can make use of it.

In this article, we’ll discuss a specific family of data management tools that often get confused and used interchangeably when discussed. …


Use the Gmail API to send scraped data to your email

Photo by Webaroo on Unsplash

Automating daily tasks with Python is easy. With a combination of APIs and easy-to-understand libraries, you can easily set up systems that scrape websites, send emails, and manage data and analytics.

One very common task you’ll need to automate in the corporate world is scraping public government data. This usually comes from sites like data.gov and other endpoints that offer information on healthcare, trade, transportation, legal matters, and so much more.

There are actually a lot of somewhat hidden government agency sites that still produce a lot of valuable data that billion-dollar companies rely on to make million-dollar decisions. …
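
A condensed, hedged sketch of the scrape-then-email pattern looks like this; the scraped endpoint is hypothetical, and the OAuth token file is assumed to have been created beforehand via Google’s standard quickstart flow:

```python
# Scrape a public endpoint, then send a summary through the Gmail API.
import base64
from email.mime.text import MIMEText

import requests
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# 1. Pull some public data (hypothetical endpoint).
resp = requests.get("https://api.example.gov/v1/reports", timeout=30)
summary = f"Fetched {len(resp.json())} records today."

# 2. Build a MIME message and base64-encode it as the Gmail API expects.
message = MIMEText(summary)
message["to"] = "you@example.com"
message["subject"] = "Daily scrape summary"
raw = base64.urlsafe_b64encode(message.as_bytes()).decode()

# 3. Send via the Gmail API, using a previously authorized token file.
creds = Credentials.from_authorized_user_file(
    "token.json", ["https://www.googleapis.com/auth/gmail.send"])
service = build("gmail", "v1", credentials=creds)
service.users().messages().send(userId="me", body={"raw": raw}).execute()
```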


Whether you’re a programmer, data scientist or machine learning engineer

Photo by 2Photo Pots on Unsplash

Medium is flooded with thousands of developers writing articles on data science, automation, programming, and machine learning.

There are hundreds of intro to Python, Firebase, SQL, and AWS posts.

With all these writers and articles, it can be really easy to miss some interesting writers and technology experts who love their craft and frankly might not get enough exposure.

So I wanted to put together a list of a few writers I enjoy reading for various reasons. Some are much more technical and really go in-depth about specific subjects, others just have interesting experiences.

Perhaps you will enjoy them too!


5 Of My Favorite Medium Technology Authors


Randy Au

Randy Au is a Quant UX Researcher at Google who writes in-depth articles about query optimization and data science. …


Pulling live data from San Francisco’s 311 feed

Photo by Dennis Kummer on Unsplash.

Over the past few weeks, we have discussed several important topics in the world of data engineering and automation.

We have laid the groundwork for understanding the vocabulary and basic concepts a data engineer uses. Now it is time to start building your first set of batch jobs.

To do so, we will be using the Apache Airflow library to help automate our work.

Also, for our live data source, we will be using the sfgov 311 dataset. …
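
As a preview, a stripped-down version of such a batch job might look like the sketch below. The Socrata endpoint shown is sfgov’s public 311 cases feed; the output path is a placeholder, and operator import paths vary slightly across Airflow versions (this uses the Airflow 2.x layout):

```python
# A minimal daily Airflow DAG that pulls recent SF 311 cases.
import json
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

def pull_sf_311_cases():
    # sfgov's 311 cases dataset, exposed as JSON through the Socrata API.
    url = "https://data.sfgov.org/resource/vw6y-z8j6.json"
    cases = requests.get(url, params={"$limit": 1000}, timeout=60).json()
    with open("/tmp/sf_311_cases.json", "w") as f:
        json.dump(cases, f)

with DAG(
    dag_id="sf_311_pipeline",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="pull_sf_311_cases",
        python_callable=pull_sf_311_cases,
    )
```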


A Free Webinar

Photo by Simon Matzinger on Unsplash

Are you looking to learn more about Tableau and how you can use it to concisely tell your data story?

Our team is putting on a webinar where we will be walking through the development of a Tableau Dashboard from start to finish.

We will show you different styles of charts and best practices, and leave you with a completely finished Tableau dashboard that you can use on future projects.

We believe that data visualization tools like Tableau are not just about flashy charts and lots of drop-downs.

They are instead communication tools we can use to help guide your end users through your analysis, leaving them with a deeper understanding as well as a clear plan for where action needs to occur. …


A Free Webinar To Help Your Team Utilize Your Data

Photo by Tobias Fischer on Unsplash

About this Event

Whether you are a large, billion-dollar corporation or a small business, gaining insight from your data can provide a huge competitive advantage.

Access to data gives your team the ability to find pain points faster and track the impact that changes and improvements have on your business.

However, in order for any business to fully realize the opportunities data provides, you need to properly plan and build out a baseline data infrastructure and strategy.

This means having a solid data analytics process, a data storage system, and a general strategy for how your team will approach data.

Otherwise, your team might invest hundreds if not thousands of hours going in the wrong direction. …


Explaining The Importance Of Virtual Private Clouds, Elastic Compute and More

Photo by Laura Vinck on Unsplash

Cloud computing has gone from a technology limited to big billion-dollar corporations to one that is accessible to anyone. Most developers have probably forgotten about an EC2 instance running on AWS at some point because, at the end of the day, it might only cost $10 a month to keep a Linux box going.

This accessibility provides an opportunity for small and medium-sized businesses that want access to the same level of technology as larger corporations, for a fraction of the cost.

These technologies can help your business scale and give you access to capabilities that used to require multiple system admins and developers to set up. …


Using the command line

Photo by Chaitanya Maheshwari on Unsplash

As a developer, you will often need to create APIs to interact with various systems and integrations. Traditionally, this required a lot of work in terms of developing the infrastructure and deploying the code on either an on-premises server or a cloud server like EC2.

However, this method is slow and costly.

Thus, the concept of serverless computing has gained popularity in the past few years.

An example of a serverless service is AWS Lambda. Lambda works similarly to an API endpoint in the sense that you can call a single function as long as you have access to that function.

This service executes code only when it is needed and scales automatically with demand. It comes in handy when combined with an API gateway to develop one optimal solution. …
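
To make that concrete, here is a minimal sketch of a Lambda handler along with the sort of AWS CLI call that would deploy it; the function name, IAM role ARN, and file names are placeholders:

```python
# handler.py: a minimal AWS Lambda function. AWS invokes lambda_handler
# on each request; you pay only for execution time.
import json

def lambda_handler(event, context):
    # Behind an API gateway, `event` carries the HTTP request details.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Deploying from the command line (role ARN and names are placeholders):
#   zip function.zip handler.py
#   aws lambda create-function \
#       --function-name hello-api \
#       --runtime python3.8 \
#       --handler handler.lambda_handler \
#       --zip-file fileb://function.zip \
#       --role arn:aws:iam::123456789012:role/lambda-execution-role
```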


For Companies Looking To Improve Or Start Their Data Analytics Strategy

Photo by William Iven on Unsplash

There are plenty of cliches about data and its likeness to oil, or about companies being data-driven. Is there truth to all this hype about data strategy, predictive modeling, data visualization, and machine learning?

In our experience, these cliches are true. In the past few years, we have already helped several small and medium-sized businesses take their data and develop new products, gain invaluable insights and create new opportunities for their businesses that they didn’t have before.

Many small and medium-sized businesses are starting to take advantage of easy access to cloud computing technologies such as AWS, which let your teams perform data analysis more easily, from anywhere, using the same technology billion-dollar corporations use at a fraction of the cost. …


Are all data streaming services made equal?

Photo by Levi Jones on Unsplash

A vital part of the successful completion of any project is the selection of the right tools for performing essential functions. For developers, having several messaging services to pick from always poses a challenge.

One crucial question often goes unanswered: should you use Apache Kafka or RabbitMQ? Both platforms feature several functionalities and use cases that can help users make an informed decision.

Apache Kafka and RabbitMQ are two top platforms in the area of messaging services. Both handle messaging, but the difference lies in their architecture, design, and approach to delivery.


But What Are Apache Kafka and RabbitMQ?

Apache Kafka and RabbitMQ are open-source platforms used for streaming data. Both come equipped with pub/sub systems (which we will describe later) that are commercially supported and used by several enterprises. …
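
To make the pub/sub idea concrete, here are minimal producer sketches for both platforms, using the kafka-python and pika client libraries (broker addresses and topic/queue names are placeholders):

```python
# Kafka: messages are appended to a partitioned, replayable log.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", b"user_signup")
producer.flush()

# RabbitMQ: messages are routed through exchanges to queues and
# typically consumed once.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="events")
channel.basic_publish(exchange="", routing_key="events", body=b"user_signup")
connection.close()
```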


From big data to cloud computing

Photo by Hope House Press — Leather Diary Studio on Unsplash

According to a recent survey by Statista, the data market is expected to grow to 175 zettabytes in volume by the year 2025. To dive further into big data, we need to understand how one can actually work with it, and that’s where data engineering comes in.

Simply put, data engineering is all about managing big datasets and extracting information from them efficiently and accurately.


What’s a Data Engineer?

A data engineer is someone who is responsible for building and maintaining the data architecture of a data science project.

Some of the main roles and responsibilities of a data engineer include having to ensure an uninterrupted flow of data between server and application, integrating new data management software, improving data foundational processes, and building data pipelines. …


Going over IaaS, SaaS, PaaS, and FaaS

Photo by Uwe Hensel on Unsplash

In the 20th century, companies relied on servers and computers that were on the premises.

This meant when new servers had to be spun up, it could take weeks or even months to get everything set up. From getting the budget approved, to putting out orders, to having servers shipped and then installed — it was a long and arduous process.

But times have changed and the concept of companies having millions of dollars of unused servers on-site has been replaced by cloud computing services.

Simply put, cloud computing is a remote service that takes the form of infrastructure, software, storage, platforms, and a host of others. …


In Airflow and Luigi

Photo by Mike Benna on Unsplash

One of the main roles of a data engineer can be summed up as getting data from point A to point B.

We often need to pull data out of one system and insert it into another. This could be for various purposes, including analytics, integrations, and machine learning.

But in order to get that data moving, we need to use what are known as ETLs/Data pipelines.

These are processes that pipe data from one data system to another.

One question we need to answer as data engineers is how often this data needs to be updated. This is where the question of batch vs. …
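
As a preview of the batch side, here is a minimal two-step pipeline sketched in Luigi, one of the two libraries covered (the paths and transformation are our own illustrative placeholders):

```python
# A tiny Luigi batch ETL: extract a file, then transform it.
import datetime

import luigi

class ExtractOrders(luigi.Task):
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"data/raw/orders_{self.date}.csv")

    def run(self):
        # In a real pipeline this would pull from an API or database.
        with self.output().open("w") as f:
            f.write("order_id,amount\n1,9.99\n")

class TransformOrders(luigi.Task):
    date = luigi.DateParameter()

    def requires(self):
        # Luigi re-runs upstream tasks only if their output is missing,
        # which keeps scheduled batch runs idempotent.
        return ExtractOrders(date=self.date)

    def output(self):
        return luigi.LocalTarget(f"data/clean/orders_{self.date}.csv")

    def run(self):
        with self.input().open() as src, self.output().open("w") as dst:
            for line in src:
                dst.write(line.lower())

if __name__ == "__main__":
    luigi.build([TransformOrders(date=datetime.date(2020, 1, 1))],
                local_scheduler=True)
```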


How does data get to the data scientists and machine learning engineers anyway?

Photo by Science in HD on Unsplash

Unlike software engineering, there aren’t a lot of college courses in data engineering, nor a huge list of boot camps teaching the practice.

This leads to best practices often being learned on the job, as well as a cornucopia of technologies being used across teams and companies.

So, what defines a data engineer?

Especially as this role evolves away from its roots in database management and BI, it has really changed and demands much more than it used to. Data engineers used to be able to get away with just knowing basic data warehousing, ETLs, and data visualization.

However, in recent years, the demand for understanding distributed computing, DevOps, data ops, and implementing machine learning models has challenged that notion. …


Getting A Hold Of Your Data, Models, and Dashboards

Photo by Marius Masalar on Unsplash

Imagine what your organization could accomplish if it had accurate, detailed information regarding its processes, products, market, and customers.

Well, this is the age of big data, and it demands that organizations bring data engineers and scientists onto the same page to construct efficient and accurate insights and gain a competitive edge. A relevant solution, data operations, was coined a few decades ago, but only in the last five years has the field gained significant understanding and traction.

DataOps, or data operations, is an emerging discipline in the data science field which brings data scientists and engineers together to provide organizational structures, processes, and tools for a data-focused organization. …


Managing big data no longer means just buying bigger and faster servers

Photo by Nathan John on Unsplash

It now also means needing to understand the concept of parallel computing.

The list of tools and data systems that help manage this specific concept continues to grow on a yearly basis. Whether it be using AWS and querying on Redshift or working with custom libraries, learning how to wrangle data in parallel is very valuable.

Python — being the most popular language owing to its ease of use — offers a number of libraries that enable programmers to develop more powerful software for the purpose of running models and data transforms in parallel.

What if a magical solution appeared that offered parallel computing, sped-up algorithms, and even let you integrate NumPy and pandas with XGBoost? …
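
The library being hinted at here appears to be Dask; a minimal sketch of what it looks like in practice (the file path is a placeholder):

```python
# Parallelizing a pandas-style workflow with Dask.
import dask.dataframe as dd

# Reads the CSVs in partitions and processes them in parallel across cores.
df = dd.read_csv("data/transactions-*.csv")

# Operations build a lazy task graph; nothing runs until .compute().
result = df.groupby("customer_id")["amount"].mean().compute()
print(result.head())
```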


Photo by Sean Pollock on Unsplash

You may think of the ATM as a revolutionary experience, but since the advent of fintech, the entire financial services domain has entered a new era. Whether you purchase a cup of coffee or manage your finances, fintech is everywhere: from payments via apps such as Payoneer or PayPal, to reporting, to even using cryptocurrency. This decade, starting in 2020, is bringing loads of useful technological developments, and you need to implement these updates to stay ahead in the industry and offer better services.

So how are companies using fintech to benefit from intelligent technology? In this article, we will take a look at what exactly fintech is and how the top finance industry organizations are using it. …

About

SeattleDataGuy

#Data #Engineer, Strategy Development Consultant and All Around Data Guy #deeplearning #machinelearning #datascience #tech #management http://bit.ly/2uKsTVw
