Week 6 - Real Estate Price Estimation

Ufuk Baran Karakaya
Published in bbm406f18 · Jan 6, 2019

Overview of the Week

This week, the project continued with experiments on the CatBoost and XGBoost algorithms. This blog post covers the algorithms' results, the evaluation metric, and an analysis of the features.

XGBoost and CatBoost

To describe XGBoost and CatBoost briefly:

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework. XGBoost provides parallel tree boosting that solves many data science problems in a fast and accurate way, and the same code runs on major distributed environments and can scale to billions of examples.

XGBoost trains iteratively, governed by hyperparameters such as the number of boosting rounds, the early-stopping round count, etc. Training stops when the validation MAE stops improving, keeping the model with the best MAE score on the train and validation sets.

source: https://medium.com/@pushkarmandot/https-medium-com-pushkarmandot-what-is-lightgbm-how-to-implement-it-how-to-fine-tune-the-parameters-60347819b7fc
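A minimal sketch of this training loop is below, assuming pandas DataFrames X_train, y_train, X_val, and y_val already exist; the objective and hyperparameter values are illustrative placeholders, not the project's tuned settings:

```python
import xgboost as xgb

params = {
    "objective": "reg:squarederror",  # illustrative choice of objective
    "eval_metric": "mae",             # track Mean Absolute Error
    "eta": 0.03,                      # learning rate (assumed value)
    "max_depth": 6,                   # tree depth (assumed value)
}

dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)

# Boost for up to num_boost_round iterations, but stop early if the
# validation MAE has not improved for early_stopping_rounds rounds.
model = xgb.train(
    params,
    dtrain,
    num_boost_round=1000,
    evals=[(dtrain, "train"), (dval, "validation")],
    early_stopping_rounds=50,
)
print("best iteration:", model.best_iteration, "best MAE:", model.best_score)
```

With early_stopping_rounds set, XGBoost keeps evaluating the validation MAE each round and halts once it has gone that many consecutive rounds without improvement.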

CatBoost is based on gradient-boosted decision trees. During training, a set of decision trees is built consecutively, and each successive tree is built to reduce the loss left by the previous trees. CatBoost is particularly effective with categorical features, and this advantage produces a more robust model.

The categorical features are:

  • transaction_year
  • transaction_month
  • transaction_day
  • transaction_quarter
  • airconditioningtypeid
  • buildingqualitytypeid
  • fips
  • heatingorsystemtypeid
  • propertycountylandusecode
  • propertylandusetypeid
  • regionidcity
  • regionidcounty
  • regionidneighborhood
  • regionidzip
  • yearbuilt
  • assessmentyear

In the project, the features listed above are handed to CatBoost as categorical features for prediction. Since CatBoost is based on decision trees, the project tunes hyperparameters such as the iteration count, learning rate, leaf regularization (l2_leaf_reg), and depth.
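A minimal sketch of this setup follows, assuming the same X_train/y_train/X_val/y_val splits; the hyperparameter values are illustrative placeholders, not the project's tuned values:

```python
from catboost import CatBoostRegressor

# The categorical feature list from above; CatBoost expects these
# columns to hold integer or string values, not floats.
cat_features = [
    "transaction_year", "transaction_month", "transaction_day",
    "transaction_quarter", "airconditioningtypeid",
    "buildingqualitytypeid", "fips", "heatingorsystemtypeid",
    "propertycountylandusecode", "propertylandusetypeid",
    "regionidcity", "regionidcounty", "regionidneighborhood",
    "regionidzip", "yearbuilt", "assessmentyear",
]

model = CatBoostRegressor(
    iterations=1000,      # iteration count (assumed value)
    learning_rate=0.03,   # learning rate (assumed value)
    l2_leaf_reg=3,        # leaf regularization (assumed value)
    depth=6,              # tree depth (assumed value)
    loss_function="MAE",  # optimize Mean Absolute Error directly
)

# cat_features tells CatBoost which columns to treat as categorical.
model.fit(
    X_train, y_train,
    cat_features=cat_features,
    eval_set=(X_val, y_val),
    verbose=100,
)
```

Passing the columns through cat_features lets CatBoost apply its own categorical encoding instead of requiring manual one-hot or label encoding beforehand.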

The CatBoost model is refined step by step during training: each iteration adds a new tree that corrects the ensemble built in the previous iterations.

In this project, both XGBoost and CatBoost use the Mean Absolute Error (MAE) metric to measure error. MAE measures the average magnitude of the errors in a set of predictions, without considering their direction: it is the average, over the test sample, of the absolute differences between prediction and actual observation, where all individual differences have equal weight. If the absolute value is not used (i.e., the signs of the errors are kept), the mean error is the Mean Bias Error (MBE), which is typically used to measure average model bias. The MBE may carry useful information, but it must be interpreted carefully because positive and negative errors cancel each other out. For this reason, MAE is a better choice than MBE here.
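The cancellation effect is easy to see numerically; here is a small illustration (the arrays are made-up examples, not project data):

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean Absolute Error: average magnitude of errors, sign ignored.
    return np.mean(np.abs(y_true - y_pred))

def mbe(y_true, y_pred):
    # Mean Bias Error: signed errors can cancel each other out.
    return np.mean(y_pred - y_true)

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 300.0])

print(mae(y_true, y_pred))  # 6.67 -> both errors count
print(mbe(y_true, y_pred))  # 0.0  -> +10 and -10 cancel
```

Even though the model above is off by 10 on two of the three samples, the MBE reports zero error, while the MAE correctly reports the average error magnitude.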
