Connect PySpark to Snowflake ❄️

Background

This is part three of a three-part series. In Part 1 we learned about PySpark, Snowflake, Azure, and Jupyter Notebook; in Part 2 we launched a PySpark cluster in Azure on HDInsight. …


Launch 🚀 a PySpark Cluster

Background

This is part two of a three-part series. In part one we learned about PySpark, Snowflake, Azure, and Jupyter Notebook. Now, in part two, we’ll learn how to launch a PySpark cluster and connect to an existing Snowflake instance.

Step 1: Prepare for cluster build-out

Precursor

We will launch a production-grade PySpark…


Prepare to start, Snowflake + PySpark = 🌞

Purpose

At the end of this three-part series, you’ll be able to launch a Spark cluster running in Azure on HDInsight and query live data from Snowflake using the Snowflake Connector with pushdown capability, all through a Jupyter Notebook, using a Python 3.5 …


Data is forever; your pipeline, storage, and shape aren’t.

Have you ever needed to compare the same data in two different databases 🤔?

This is required for data migration, warehouse consolidation, ingestion re-writes, or major system releases. …


“I can’t move on-prem to the cloud because management doesn’t support it”

I hear that from financial companies small and large. Early on, before the cloud was a thing, I helped companies understand why you’d want to do this.

I’m going to share with you the same experiences that worked…


It’s all about alpha, baby!

Driving alpha at the speed of streams.

Excess returns originate from experimentation, and rapid iterations are enabled by speed. In 2020, that means streaming.

This boat has already set sail at leading tech companies and many world-class investment managers. …


I bet your data platform is one of the following:

a) custom built using a variety of open-source libraries and tools

b) a one-stop-shop, software-enabled vendor that does 100% of it for you

c) a costly off-the-shelf monolithic solution you’ve bought

What’s wrong with that? Well, if you sit back, think about it…


A data platform is not a solution or warehouse. The data platform enables you to create a solution. The solution addresses a value stream. The value stream answers a specific set of needs. Those needs are well defined and scoped. The outcome is measurable and repeatable using deliverables.

It’s that simple. Just as you decouple components and abstractions in a well-designed architecture, you must do the same when working hand-in-hand with your customers.


“Investment 360” in finance and “Threat 360” in cybersecurity are gaining momentum. Familiar with “Customer 360” and “Know Your Customers”? That desire has expanded into financial instruments and cybersecurity threats.

We’ve talked to six CTOs across Fortune 500 companies and Top 25 Investment Managers. Five of the six are addressing these demands. Why? Emerging technologies, like knowledge graphs, artificial intelligence, time series databases, and streaming engines, are rapidly maturing, and so is the interest in use cases.

The commonality, and the reason this is really cool for what Open Aristos is doing, is that it all comes back to resolving entities and relationships deep inside your data within milliseconds, so you see the “whole picture” in a few moments.

Open Aristos is a marketplace of building blocks used to create your own data platform in less than 60 minutes.


You may be asking: what is master data? It is simply a real-world entity, for example a company, product, contract, or person.

Why do we need to master data? Because not all data sources have the same way of identifying an entity. This is especially true when you take into…

Doug Eisenstein

I see data and think oh, it’s a puzzle, please solve me! I’m an entrepreneur at the cross-section of data, finance, and technology.
