Don’t forget (to read) the word “Logistic” in Logistic Regression!

Laurae: This post is about not getting fooled by the word “Regression” in Logistic Regression. Reading too quickly can trip you up for no reason! It can happen to anyone, even me. The original topic can be found at Kaggle.
Moral of the story: remember that Logistic Regression applies a sigmoid function (softmax in the multi-class case) after the linear regression step to squash predicted values into [0, 1] =)
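To make that concrete, here is a minimal sketch of the two-class case, assuming made-up weights and inputs (`weights`, `bias`, and `predict_proba` are hypothetical names chosen for illustration, not taken from the Kaggle thread). The linear part can output any real number; the sigmoid is what turns it into something that reads as a probability.

```python
import math

def sigmoid(z):
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    # Linear "regression" part: a weighted sum that can be any real number.
    score = sum(w * x for w, x in zip(weights, features)) + bias
    # "Logistic" part: squash that score into (0, 1).
    return sigmoid(score)

# Hypothetical model parameters, for illustration only.
weights = [0.8, -1.5]
bias = 0.2

for features in ([10.0, 0.0], [0.0, 0.0], [0.0, 10.0]):
    print(features, round(predict_proba(features, weights, bias), 4))
```

However large or small the raw score gets, the printed value stays strictly between 0 and 1.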

Antonio Augusto Santos wrote:
I fail to see how you can use regression on this problem… The hotel cluster values are not ints, they are IDs, so if a hotel cluster is 1 it is not any closer to 2 than it is to 99.
Also, you have to give five results (that depend on the probability of all clusters), which, again, regression can’t do for you.

Logistic Regression = Classification

If you use a two-class (binary) logistic regression, you need 100 models, one per hotel cluster: ID=1 against all others, ID=2 against all others, ID=3 against all others, etc. Once you have a probability for each ID, you rank the IDs from highest to lowest probability. If you use a multi-class logistic regression, one model is sufficient; you only need to sort its predicted probabilities afterwards.
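The ranking step described above can be sketched roughly as follows. This is an illustration under assumptions, not the method anyone in the thread actually used: the scores are invented, only 8 clusters are shown instead of 100 to keep it short, and the same raw scores are reused for both approaches, whereas in practice the one-vs-rest models and the multi-class model would each produce their own.

```python
import math

def sigmoid(z):
    """Two-class case: one independent probability per one-vs-rest model."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Multi-class case: one model, probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probabilities, k=5):
    """Return the k class IDs with the highest probability, best first."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return ranked[:k]

# Hypothetical raw scores for cluster IDs 0..7 (made up for illustration).
scores = [0.1, 2.3, -1.0, 0.7, 3.1, -0.4, 1.9, 0.0]

# One-vs-rest: each ID gets its own probability from its own binary model.
ovr_probs = [sigmoid(s) for s in scores]

# Multi-class: a single model yields one distribution over all IDs.
multi_probs = softmax(scores)

print(top_k(ovr_probs))    # five most likely cluster IDs, one-vs-rest
print(top_k(multi_probs))  # five most likely cluster IDs, multi-class
```

Either way, the output is a list of five cluster IDs ordered by probability, which is exactly the kind of answer the competition asks for.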

The expected reply came quickly, since the post had simply been read too fast!

Antonio Augusto Santos wrote:
Sorry guys! My mistake :) I failed to see the LOGISTIC before the REGRESSION lol!