Boston Housing: Prediction of House Price

EDA: Exploratory Data Analysis

Summary

Correlation

Heatmap

Linear Regression

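import matplotlib.pyplot as plt
import seaborn as sns

# df is the Boston housing DataFrame loaded earlier; X is assumed to be the RM column (see the x-axis label).
X = df['RM'].values.reshape(-1, 1)
y = df['MEDV'].values
plt.figure(figsize=(12, 10))
sns.regplot(x=X.ravel(), y=y, robust=True)  # robust=True needs statsmodels installed
plt.xlabel('average number of rooms per dwelling')
plt.ylabel(r"Median value of owner-occupied homes in \$1000's")
plt.show()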

# `size` was renamed to `height` in seaborn 0.9
sns.jointplot(x='RM', y='MEDV', data=df, kind='reg', height=10)
plt.show()

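from sklearn.linear_model import LinearRegression

X = df['LSTAT'].values.reshape(-1, 1)
y = df['MEDV'].values
model = LinearRegression()  # assumed: the estimator behind `model` is not shown in this section
model.fit(X, y)
plt.figure(figsize=(12, 10))
sns.regplot(x=X.ravel(), y=y)
plt.xlabel('% Lower status of the population')
plt.ylabel(r"Median value of owner-occupied homes in \$1000's")
plt.show()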

# `size` was renamed to `height` in seaborn 0.9
sns.jointplot(x='LSTAT', y='MEDV', data=df, kind='reg', height=10)
plt.show()

Robust Regression

import numpy as np
from sklearn.linear_model import RANSACRegressor

# X must hold RM here: the x-axis label and the 3-10 range below refer to rooms per dwelling.
X = df['RM'].values.reshape(-1, 1)
y = df['MEDV'].values

ransac = RANSACRegressor()
ransac.fit(X, y)  # the notebook echoed the estimator defaults here (max_trials=100, stop_probability=0.99, ...)

inlier_mask = ransac.inlier_mask_
outlier_mask = np.logical_not(inlier_mask)

line_X = np.arange(3, 10, 1)
line_y_ransac = ransac.predict(line_X.reshape(-1, 1))

# plot
sns.set(style='darkgrid', context='notebook')
plt.figure(figsize=(12, 10))
plt.scatter(X[inlier_mask], y[inlier_mask],
            c='blue', marker='o', label='Inliers')
plt.scatter(X[outlier_mask], y[outlier_mask],
            c='brown', marker='s', label='Outliers')
plt.plot(line_X, line_y_ransac, color='red')
plt.xlabel('average number of rooms per dwelling')
plt.ylabel(r"Median value of owner-occupied homes in \$1000's")
plt.legend(loc='upper left')
plt.show()

# Same procedure with LSTAT as the predictor
X = df['LSTAT'].values.reshape(-1, 1)
y = df['MEDV'].values
ransac.fit(X, y)

inlier_mask = ransac.inlier_mask_
outlier_mask = np.logical_not(inlier_mask)

line_X = np.arange(0, 40, 1)
line_y_ransac = ransac.predict(line_X.reshape(-1, 1))

sns.set(style='darkgrid', context='notebook')
plt.figure(figsize=(12, 10))
plt.scatter(X[inlier_mask], y[inlier_mask],
            c='blue', marker='o', label='Inliers')
plt.scatter(X[outlier_mask], y[outlier_mask],
            c='brown', marker='s', label='Outliers')
plt.plot(line_X, line_y_ransac, color='red')
plt.xlabel('% lower status of the population')
plt.ylabel(r"Median value of owner-occupied homes in \$1000's")
plt.legend(loc='upper right')
plt.show()
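To see what RANSAC is doing without the housing data at hand, here is a minimal sketch on synthetic data. The data, the coefficients (9.0 and -35.0), and the 20 corrupted points are all made up for illustration and are not the Boston dataset; only `RANSACRegressor`, `inlier_mask_`, and `estimator_` come from scikit-learn.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

# Synthetic stand-in for (RM, MEDV): a noisy line. These are NOT the Boston data.
rng = np.random.default_rng(0)
X = rng.uniform(3, 9, size=200).reshape(-1, 1)
y = 9.0 * X.ravel() - 35.0 + rng.normal(0, 1.0, size=200)

# Shift the first 20 targets far off the line so they behave as gross outliers.
y[:20] += 40.0

# Default base estimator is LinearRegression; random_state fixes the random sampling.
ransac = RANSACRegressor(random_state=0)
ransac.fit(X, y)

inlier_mask = ransac.inlier_mask_
outlier_mask = np.logical_not(inlier_mask)

# estimator_ is the final model refit on the inliers only.
slope = ransac.estimator_.coef_[0]
intercept = ransac.estimator_.intercept_
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"flagged as outliers: {outlier_mask.sum()} of {len(y)}")
```

Because the final model is refit on the inliers alone, the recovered slope should stay close to the one used to generate the data even though a tenth of the points were corrupted.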

Method 1: Residual Analysis

plt.figure(figsize=(12, 8))
plt.scatter(y_train_pred, y_train_pred - y_train, c='blue', marker='o', label='Training data')
plt.scatter(y_test_pred, y_test_pred - y_test, c='red', marker='*', label='Test data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.legend(loc='upper left')
plt.hlines(y=0, xmin=-10, xmax=50, lw=2, color='k')
plt.xlim([-10, 50])
plt.show()
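The residual plot relies on `y_train_pred`, `y_test_pred`, `y_train`, and `y_test`, which are not defined in this section. One way they could be produced is sketched below on synthetic data. The data, the 70/30 split, and the use of `np.polyfit` in place of a scikit-learn estimator are assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in for (LSTAT, MEDV): a noisy decreasing line, NOT the Boston data.
rng = np.random.default_rng(42)
X = rng.uniform(0, 40, size=300)
y = 34.0 - 0.95 * X + rng.normal(0, 3.0, size=300)

# Shuffle the indices and hold out 30% of the samples as a test set.
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:210], idx[210:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# Ordinary least-squares line fitted on the training split only.
slope, intercept = np.polyfit(X_train, y_train, deg=1)
y_train_pred = slope * X_train + intercept
y_test_pred = slope * X_test + intercept

# Same sign convention as the plot: predicted minus observed.
train_residuals = y_train_pred - y_train
test_residuals = y_test_pred - y_test
print(f"mean train residual: {train_residuals.mean():.4f}")
print(f"mean test residual:  {test_residuals.mean():.4f}")
```

For a well-specified linear model the residuals should scatter around zero with no visible pattern. Curvature or a funnel shape in the plot suggests a missing nonlinear term or non-constant variance.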

Method 3: Coefficient of Determination

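SSE: Sum of squared errors, the squared differences between the observed and predicted values, summed over all samples.

SST: Total sum of squares, the squared differences between the observed values and their mean, summed over all samples.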
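The coefficient of determination combines the two quantities as R² = 1 - SSE/SST. A small sketch of that calculation follows; the helper name `r_squared` and the toy values are illustrative and do not come from the housing data.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination computed as 1 - SSE/SST."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)         # sum of squared errors
    sst = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - sse / sst

# Toy targets, chosen only to make the two boundary cases obvious.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
perfect = y_true.copy()                          # predictions equal to the targets
mean_only = np.full_like(y_true, y_true.mean())  # always predict the mean

print(r_squared(y_true, perfect))    # 1.0
print(r_squared(y_true, mean_only))  # 0.0
```

A model that always predicts the mean scores 0, a perfect model scores 1, and a model that does worse than the mean gets a negative value.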

