Machine Learning for Unbalanced Datasets using Neural Networks

Can neural networks be used for binary classification in the case of unbalanced datasets?

There are a few ways to address unbalanced datasets, from the built-in class_weight parameter in logistic regression and other sklearn estimators to manual oversampling and SMOTE. We will look at whether neural networks can serve as a reliable out-of-the-box solution and which parameters can be tweaked to achieve better performance.
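For reference, sklearn's `class_weight="balanced"` option weights each class by n_samples / (n_classes * count of that class), so the rare class ends up with the same total weight as the common one. Below is a minimal standard-library sketch of that formula; the helper name `balanced_class_weights` and the toy labels are hypothetical, not taken from the article's code.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using the 'balanced' heuristic:
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels: 9 negatives, 1 positive.
y = [0] * 9 + [1]
weights = balanced_class_weights(y)
print(weights)  # {0: 0.5555555555555556, 1: 5.0}
```

In sklearn itself you would simply pass `class_weight="balanced"` to an estimator such as `LogisticRegression`, or compute the same mapping with `sklearn.utils.class_weight.compute_class_weight`.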

Code is available on GitHub.

We’ll use the Framingham Heart Study data set from Kaggle for this exercise. It presents a binary classification problem in which we need to predict the value of the variable “TenYearCHD” (zero or one), which shows whether a patient will develop heart disease. …
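For a binary target such as TenYearCHD, a neural network with a sigmoid output is typically trained with binary cross-entropy. One common way to make it pay attention to the rare class is to multiply each example's loss by the weight of its true class; this is roughly the idea behind passing class weights to a training loop (for example, the `class_weight` argument of Keras' `fit`). The pure-Python sketch below is an illustration under that assumption; the function name and toy numbers are hypothetical.

```python
import math


def weighted_binary_cross_entropy(y_true, y_prob, class_weight, eps=1e-7):
    """Mean binary cross-entropy where each example's loss is multiplied
    by the weight of its true class (0 or 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += class_weight[y] * loss
    return total / len(y_true)


# A lazy model that predicts a low probability for everyone.
y_true = [0, 0, 0, 1]
y_prob = [0.1, 0.1, 0.1, 0.1]

unweighted = weighted_binary_cross_entropy(y_true, y_prob, {0: 1.0, 1: 1.0})
weighted = weighted_binary_cross_entropy(y_true, y_prob, {0: 4 / 6, 1: 2.0})
print(unweighted, weighted)
```

With the weights, missing the single positive case costs noticeably more, so gradient updates push the network harder toward recognizing the minority class.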

Machine learning on categorical variables

How to properly run and evaluate models

At first blush, categorical variables aren’t that different from numerical ones. But once you start digging deeper and implementing your machine learning (and preprocessing) ideas in code, you will find yourself stopping every minute to ask questions such as “Do I do feature engineering on both train and test sets?” or “I heard something about cardinality. What is that, and should I Google more about it?”

Let’s see if we can clear some of this up with an action plan for dealing with data sets that have a lot of categorical variables and for training a couple of models.

Kaggle will…
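On the first question, the standard practice is to learn any encoding from the training set only and then apply it unchanged to the test set; cardinality is simply the number of distinct values a categorical column takes. A minimal standard-library sketch of both ideas follows; the helper names and the toy city column are hypothetical.

```python
def cardinality(values):
    """Number of distinct categories in a column."""
    return len(set(values))


def fit_one_hot(train_values):
    """Learn the category vocabulary from the training column only."""
    return sorted(set(train_values))


def transform_one_hot(values, categories):
    """Encode values with a vocabulary learned on train; categories unseen
    during fitting become an all-zero row instead of a new column."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        if v in index:
            row[index[v]] = 1
        rows.append(row)
    return rows


train_city = ["NY", "NJ", "NC", "NY"]
test_city = ["NJ", "CA"]  # "CA" never appears in train

categories = fit_one_hot(train_city)
print(cardinality(train_city))                   # 3
print(categories)                                # ['NC', 'NJ', 'NY']
print(transform_one_hot(test_city, categories))  # [[0, 1, 0], [0, 0, 0]]
```

sklearn's `OneHotEncoder(handle_unknown="ignore")` behaves similarly: it is fitted on the training data, and unknown test categories are encoded as all zeros.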

Classification of unbalanced datasets

How to properly run a classification analysis with sklearn when your dataset is unbalanced, and how to improve its results.

Let’s imagine you have a dataset with a dozen features and need to classify each observation. It can be either a two-class problem (your output is either 1 or 0; true or false) or a multi-class problem (more than two alternatives are possible). In this case, however, there is a twist: the data is unbalanced. Think about patients who may or may not have cancer (the majority probably won’t) or a decision to extend a credit line (the majority of bank clients get an extension). Your machine learning algorithm will be “overexposed” to one class and “underexposed” to another. There…
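One simple remedy for the “underexposed” class is random oversampling: duplicate randomly chosen minority examples until every class is as large as the biggest one. The standard-library sketch below illustrates the technique; the function name and toy data are hypothetical, and it should be applied to the training split only, after splitting, so duplicated rows never leak into the test set.

```python
import random


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of every smaller class until
    each class has as many examples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample with replacement from the existing rows of this class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(sorted(y_bal))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The imbalanced-learn package offers the same idea as `RandomOverSampler`, alongside SMOTE, which synthesizes new minority points instead of copying existing ones.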

MBA Students: types and personalities

It’s been more than two years since I graduated from the University of Michigan with an MBA degree. Since then, I have had a chance to live and work in North Carolina and New Jersey, and finally to return to New York, NY. Over the course of this journey, I met a lot of peers holding the same degree, from places ranging from the University of Iowa to the University of Pennsylvania and Harvard. Because business schools generally seek out people with similar backgrounds and goals, it is relatively easy to sort all of them into buckets. …

Why Nike Is Still A Compelling Investment From An Operational Standpoint

I have published an article on Seeking Alpha about Nike’s operations. In the past few weeks, Nike’s stock has been tumbling and hitting historical lows. As always, a myriad of analysts on the above-mentioned website started shorting the stock, delivering prophecies of varying levels of pessimism. And, as always, the majority of them got the wrong sow by the ear: Nike’s most recent earnings came in better than analysts expected.

In my article, I cautiously stayed away from P/E ratios and instead assessed the company’s core operations, inventory levels and the most recent steps taken by the senior management team (like exiting certain categories). Under my contract, I cannot publish the whole text here, but I urge you to follow the link. You can share your thoughts either here or on Seeking Alpha.

How I took my CFA exam

When preparing to take the first level of the CFA exam, I read a lot of resources, from the official website to numerous forum posts. The bottom line is that all this advice works… but not for everyone. I have decided to condense into one post everything that made sense in my situation.

My background: some investment banking and various finance classes in business school coupled with a non-finance undergraduate degree. I consider it an advantage because some of the topics from Fixed Income and Equity were extremely familiar. Problems faced: commute time (I spent around 40 minutes traveling…

17 Must-View Slides from KPCB’s “Internet Trends” Report

KPCB’s annual Internet Trends report for 2015 is an absolute must-see. It captures the major changes in the Internet world and extends to various other industries and verticals. I decided to highlight the most important slides and conclusions in case you don’t have time to go through all 195 pages.

Slide 4. From 1995 to 2014, the number of Internet users grew 80-fold, from 35M to 2.8B. Today this number represents 39% of the world’s population. Slide 116 supplies some data about the US: there, over 80% of people have access to the Internet.

Slide 10

Social Media Overseas

A recent article by Andrew Watts on how young people use social media, ‘Teenager’s View on Social Media’, became so popular that it was reposted or mentioned by the likes of Business Insider, TechCrunch and Reddit. The author soon published a second part, which really grabbed the attention of analysts and media professionals. One of them, Danah Boyd, pointed out some flaws in Andrew’s stories. Probably the most important problem Boyd diagnosed was the way Watts treated users across different geographic, income and racial segments as if they were all the same. …

The New Republic: blame the youngest

Chris Hughes is blamed (quite reasonably) for breaking apart “The New Republic”: more than a dozen editors and staff members followed the magazine’s departing editor Franklin Foer and its literary editor Leon Wieseltier in leaving the publication. They were all replaced by newcomers brought in by Guy Vidra, the new chief executive and a Hughes protégé.

As a result, everyone is unhappy now. Contributors and editors accuse Vidra of being snobbish, using faulty language and having a weak understanding of TNR’s roots, while the owner and senior management have already given up their hopes of putting out a…

Fox and Time Warner: what was it?

Fox’s generous offer was the first step for Rupert Murdoch, a media executive famous for deliberately pursuing the companies he wants to put under his belt, toward building the largest vertically integrated media company. But what would Time Warner bring to Mr. Murdoch should this deal finally come through?

First of all, let’s think about the challenges and problems Fox is currently facing. To do that, we need to map out the whole production cycle of bringing a new show to life. …

Michael Kareev

Basketball → Finance → Data Science
