A few months ago, Google launched its Cloud Dataprep service (still in beta), which is…
In this post, I summarize how you can use the text in Amazon review data to identify topics and predict user ratings. A more detailed discussion with examples can be found in the notebook at the end of this…
Some of the guides I came across were outdated or a little complicated. This post takes you through installing XGBoost with Anaconda on Windows using Visual Studio 2017.
BigQuery: https://bigquery.cloud.google.com/dataset/jbencina-144002:fb_news
GitHub (Data & Script): https://github.com/jbencina/facebook-news
The other day, I came across a dataset on Kaggle posted by Zach Thoutt which contained roughly 150,000 wine reviews scraped from Wine Enthusiast magazine. The dataset contains the price of the…
On any given day, there are 200 to 400 motor vehicle collisions reported to the NYPD in New York City. This collision data has been made available through the NYC Open Data project and can also be found through Google’s…
Over the last few months, I’ve been noticing more and more notifications from Google Maps asking me to rate locations I’ve been to in the past. It got me wondering: how much could you actually infer from my location history? I’ve been opted into Google Location History for…
In this series of posts, I collect and analyze NYPD motor vehicle collisions along with NOAA weather data. If you missed part one, head over there to find out how the data was collected. You can also find the relevant Python…
These were the top 10 stories published by Data Insights; you can also dive into yearly archives: 2016, 2017, and 2018.