Web Scraping, Programming

A multi-part series showing how to scrape and preprocess any collection of tweets, then apply and visualize short-text topic modeling

Major News Sources with Health-Specific Twitter Accounts (Image by author)

Disclaimer: This article is only for educational purposes. We do not encourage anyone to scrape websites, especially those web properties that may have terms and conditions against such actions.

Introduction

Topic modeling is an unsupervised machine learning approach that aims to find the “hidden” topics (or clusters) inside a collection of textual documents (a corpus). Its real strength is that you don’t need labeled or annotated data: it takes only the raw text as input, which is why it is unsupervised. …
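To make the idea concrete, here is a minimal sketch of one possible short-text topic model: a Dirichlet multinomial mixture fitted with collapsed Gibbs sampling, which assigns a single topic to each document. This is an illustration only and not necessarily the method used in the series. The function name `dmm_topics`, its parameters, and the toy tokenized tweets are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def dmm_topics(docs, k=3, alpha=0.1, beta=0.1, iters=20, seed=0):
    """Assign one topic per tokenized document (hypothetical helper).

    Uses collapsed Gibbs sampling for a Dirichlet multinomial mixture.
    No labels are needed: the only input is the raw tokens.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_count = [0] * k                         # documents per topic
    word_total = [0] * k                        # tokens per topic
    word_count = [Counter() for _ in range(k)]  # word frequencies per topic
    labels = []

    def update(doc, z, sign):
        # Add (sign=+1) or remove (sign=-1) a document from topic z.
        doc_count[z] += sign
        word_total[z] += sign * len(doc)
        for w in doc:
            word_count[z][w] += sign

    # Start from a random assignment.
    for doc in docs:
        z = rng.randrange(k)
        labels.append(z)
        update(doc, z, +1)

    for _ in range(iters):
        for i, doc in enumerate(docs):
            update(doc, labels[i], -1)
            log_p = []
            for z in range(k):
                # Topics that hold many documents are more attractive ...
                lp = math.log(doc_count[z] + alpha)
                # ... as are topics that already contain this document's words.
                seen = Counter()
                for j, w in enumerate(doc):
                    lp += math.log(word_count[z][w] + beta + seen[w])
                    lp -= math.log(word_total[z] + vocab_size * beta + j)
                    seen[w] += 1
                log_p.append(lp)
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            labels[i] = rng.choices(range(k), weights=weights)[0]
            update(doc, labels[i], +1)
    return labels


# Toy, made-up tokenized "tweets" purely for illustration.
docs = [
    ["flu", "vaccine", "season"],
    ["vaccine", "flu", "shot"],
    ["diet", "sugar", "heart"],
    ["heart", "diet", "exercise"],
]
labels = dmm_topics(docs, k=3)
print(labels)  # one topic index per document
```

The topic indices themselves are arbitrary. What matters is which documents end up sharing an index, and the most frequent words in each topic are what you would inspect to interpret it.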


How to navigate the decision-making process

Photo by Victoriano Izquierdo on Unsplash

Two years in the professional world after finishing my undergraduate degree was all it took before I realized I wanted to shift towards a more data-centric career. I was working as an IT Risk Consultant, which was a great job in terms of prospects and growth, but it wasn’t where my long-term passions lay.

The role I was in was somewhat technical, but I wasn’t getting my hands dirty with as much data and programming as I would have liked. So I made the tough decision to seek other opportunities that would allow me to build out a strong…


Have you ever thought about how many credit card points you missed out on? Find out.

Photo by Avery Evans on Unsplash

When it comes to earning credit card reward points and cashback, we all have one common goal: to maximize the value we get back for our spending. For those of us with many credit cards, all with different reward structures, deciding which card to use so we don’t miss out can be tricky.

Have you ever thought about how many points you may have missed out on? It sounds like a first-world problem (and it kind of is), but that’s where Birch Finance comes in.

Birch Finance is a Fintech startup that helps you…


Web Scraping

With access to RSS feeds, getting podcast data using R is easy and can be done in 5 simple steps

Photo by Jason Rosewell on Unsplash

Disclaimer: This article is only for educational purposes. We do not encourage anyone to scrape websites, especially those web properties that may have terms and conditions against such actions.

When it comes to scraping or data harvesting from a website or a digital platform such as a podcast archive, you usually have two options:

  1. Use an application programming interface (API) provided by the platform
  2. Build a custom web scraper designed to work with that particular website

Using an API is usually the easiest option, since it gives you direct access to the data you want, but that access may be limited. On…
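As a rough illustration of why an RSS feed makes podcast data easy to collect, the sketch below pulls episode titles, publication dates, and audio links out of a feed. The article itself works in R; this example uses Python's standard library instead, and the feed contents, the `SAMPLE_FEED` constant, and the `parse_episodes` helper are all made up for the example.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed standing in for a real podcast feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <pubDate>Mon, 06 Jan 2020 08:00:00 GMT</pubDate>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 2</title>
      <pubDate>Mon, 13 Jan 2020 08:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_episodes(feed_xml):
    """Return one dict per <item> in an RSS feed (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        # The audio file, when present, is attached as an <enclosure> element.
        enclosure = item.find("enclosure")
        episodes.append({
            "title": item.findtext("title", default=""),
            "published": item.findtext("pubDate", default=""),
            "audio_url": enclosure.get("url") if enclosure is not None else None,
        })
    return episodes


episodes = parse_episodes(SAMPLE_FEED)
for episode in episodes:
    print(episode["title"], episode["published"], episode["audio_url"])
```

Because the feed is already structured XML, no page-specific scraping logic is needed. The same function would apply to any feed that follows the usual `item`, `title`, `pubDate`, and `enclosure` layout.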


Data Visualization, Natural Language Processing, Programming

A multi-part series showing how to scrape and preprocess any collection of tweets, then apply and visualize short-text topic modeling


Data Mining, Natural Language Processing, Programming

A multi-part series showing how to scrape and preprocess any collection of tweets, then apply and visualize short-text topic modeling


Web Scraping, Programming, Natural Language Processing

A multi-part series showing how to scrape and preprocess any collection of tweets, then apply and visualize short-text topic modeling


John Bica

Data Analytics Consultant. M.S. in Data Science. Enthusiast in all things data, investing, personal finance, and Fintech.
