Web Scraping and Data Visualization

Basic web scraping with Selenium and BeautifulSoup4 to identify trends in how Bridgewater is investing.

I’m sure I’m not the first to tell you this, but there is a treasure trove of data online that can help answer almost any question you might have. For example, what if we were curious how some of the smartest minds in finance navigated the last ten years of U.S. equity markets? Bridgewater is a perfect example, and luckily for us it is required to publicly disclose the name and size of its holdings in a quarterly SEC report, Form 13F-HR.

The Securities and Exchange Commission’s (SEC) Form 13F is a quarterly…

Chainlink on Ethereum

In this article, I will cover how my team set up dog walking data on MongoDB Atlas, exposed an authenticated GraphQL endpoint with Realm, and created a Chainlink External Adapter to get that data onto Ethereum smart contracts.

Photo by Feifei Peng on Unsplash

Last week, my team finished developing and submitting our MarketMake 2021 product FidoByte, a tokenized shelter-dog walking protocol. One of the most difficult parts of this project was figuring out how to manage a flexible database both on-chain and off-chain.

In this article, I want to show how data flows through our project and how you can replicate the same architecture.

Building a risk assessment framework with an interactive Plotly Dash app, natural language processing, statistics, and a lot of climate science.

Photo by Jonathan Ford on Unsplash

With climate change and climate risks moving to the forefront of the conversation in almost every industry, it’s more important than ever to understand where we are today and what trajectory we are on. In this article, I want to focus specifically on flooding, in three parts:

  1. What are the risks and causes of flooding and what are the impacts?
  2. What could an assessment framework look like, one that is generalized enough to be helpful for anyone?
  3. How can we fill the framework using data of any type from states along the U.S. …

Being tech-forward as a credit analyst at a top investment bank has taught me a lot about bridging two fundamentally different learning curves.

All images except this first one are by the author.

Photo by Anders Jildén on Unsplash

If you work in technology, you likely see finance as just markets — where people try too hard to make “spreads wider” by 0.01%. And if you work in finance, you probably see technology as just a tool — where anything that doesn’t have a clear UI is too technical and the command line is never to be touched. While both are equal in terms of complexity, I’ve found it’s usually harder to convince someone in finance to pick up coding skills compared to the other way around. It’s easy…

Lucidity: A public project/procurement bidding and funds management platform

Ethereum Hackathon

A month-long Ethereum hackathon is extremely tiring, yet rewarding. Follow along as I recount the technical and team challenges, and what I would’ve changed about our approach.

Since late 2018, I’ve actively participated in various crypto conferences in NYC and played around with DeFi. I think I tried cryptozombies.io back when they had only two or three lessons, but I never got that deep into Solidity development. However, earlier this September I attended an Ethereum Dev Onboarding session after finding Linda Xie’s tweet offering to help new developers get acquainted with the ecosystem. I highly recommend watching the recording of the panel, as the web3 full-stack tools and resources the panelists suggested likely saved me hundreds of hours of frustration (three that are absolute musts…

Photo by Marius Masalar on Unsplash

We’ll use gensim, nltk, and spaCy to create and visualize tokenized vectors, cosine similarities, and entity-target relationships for Indeed Data Analyst job postings.

(The data and full Jupyter notebook walk-through can be found here.)

If you’re looking for a job as a data analyst or scientist while also trying to learn NLP, get ready to kill two birds with one stone! For this article, I’ll be using 1000 job postings for “Data Analyst” on Indeed.com as my set of documents (the corpus). I chose data analyst since there is a larger variety of postings and requirements compared to data scientist. The columns are pretty self-explanatory:

Photo by Mat Reding on Unsplash

Data Journalism

A deep dive into the partnerships of the seventeen SDGs using web scraping, exploratory data analysis, time series analysis, and natural language processing/soft cosine matrices.

The 2030 Agenda stands for renewed commitment to people, prosperity, planet and peace through strengthened global partnership. The Sustainable Development Goals (SDGs) were set in that agenda in 2015, following the Millennium Development Goals that preceded them, which had targeted developing countries. It’s been five years since that announcement, so I thought it would be a good time to do some preliminary analysis of the partnerships (which are the largest drivers of this change). For those unfamiliar with the goals, you can find a summary of their statements, targets, and indicators here.

While there are 17 goals, they can each…

Photo by Max Bender on Unsplash

End-to-End Project

In this article, I’ll scope out my project roadmap, which goes from Arduino to data science and analysis to augmented reality, and start on the first step.

I’ve always been an avid cyclist: from biking to and from high school every day over miles of hills, to now taking a Citibike 12 miles every day between Brooklyn and Times Square for work. Here’s a screenshot of my Citibike statistics over my two years as a member (before this I would skateboard around the city):

Photo by NeONBRAND on Unsplash

Data Science

Using web scraping, dimensionality reduction, and unsupervised machine learning on 15,000+ Instagram posts from the 500 largest companies in the US to try to establish what constitutes the style of a normal corporate post.

Back when I used to help plan and manage college events, the phrase “Style Guide” was thrown around every day. It’s the idea that everything you post should have consistent fonts, sizing, color palette, etc. I thought it would be an interesting exercise to see which industries have the most diverse image posts (or the least standardized style guide), as well as the most diverse text captions, on Instagram.

My theory…

Photo by Hennie Stander on Unsplash

Data Journalism

How can we use public data to analyze and understand the challenges Black communities face in the US?

In my last article, I created a dashboard using BLS and US Census data to quickly analyze US unemployment by race and industry in each county. This article builds on that by researching some of the counties suffering the most from COVID-19 and trying to paint a better picture of the issues surrounding these communities.

My goal is to identify predominantly Black counties that are suffering economically, and find commonalities (or differences) between them.

I’ve separated this process into four steps:

  1. Find Counties of Interest to Compare (using USA Data Dashboard)
  2. Contextualize the Counties Through Media and Policy (with Web Scraping and NLP)
  3. Quantify the Problems (using…

Andrew Hong

Humanizing Data Through Stories https://www.linkedin.com/in/andrew-hong-nyc/
