[Week 3 — YelpGuesser]

YelpGuesser
bbm406f16
Published in
2 min readDec 23, 2016

Why We Need To Training and Test Set?

As we have already mentioned, our project we will use Yelp dataset. Firstly we have to say Yelp data is very huge.If we want to use our dataset useful for the project, we have to ask “When our predictions have an error, how can we solve problems?”

We interested the quality of our predictions. For example, our project we will look reviews and stars. Which reviews corresponding to stars our test data. After that, we can predict “ what should be review’s of the star?”.We turn to this idea now. We want to see perfect or approximately good predictions. Such problems we divided our data test and training set. Thanks to that partitions we can improve our predictions.

Test set performs to successful results on our data.Also, we can estimate the error in data.It’s important because if we see the errors on the data we can improve that then we can make good predictions.

We divided into two partitions in our dataset. In addition, we split dataset %80 is training set,%20 test set.We working with CSV files on the command line.On testing, we can evaluate our performance. All of the details of data derive from training set.Is prediction successful? We can see that.We are on the right way if the results are close. It’s very important for our models.

For our algorithms, we will use training set. We can implement algorithms. On the test set in order to improve the predictions. If predictions have an error and so bad we have to start from our training set again until the achieve good performance.Next week we will discuss the which algorithms will suit our project.
See you again!

--

--