How to create a Hastie_10_2 dataset with sklearn

Crystal X
Published in Geek Culture
4 min read · Sep 29, 2021

While going through the datasets that can be generated with sklearn, I came across the Hastie_10_2 dataset. It has 10 input features and a binary label that depends nonlinearly on them.

The Hastie 10 2 dataset is a binary classification dataset: the 10 input features are drawn from a standard Gaussian, and the label is 1 when the sum of their squares exceeds 9.34 and -1 otherwise. It comes from Example 10.2 in “The Elements of Statistical Learning”, 2nd edition, by Hastie et al. (2009), which can be found here: https://web.stanford.edu/~hastie/ElemStatLearn/
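To make the description above concrete, here is a minimal sketch of generating the dataset with sklearn's make_hastie_10_2 function and checking the label rule by hand. The sample count and random seed are my own illustrative choices, not values taken from the original program.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2

# n_samples and random_state are illustrative assumptions, not from the article
X, y = make_hastie_10_2(n_samples=1000, random_state=0)

print(X.shape)       # (1000, 10): 1000 rows, 10 input features
print(np.unique(y))  # the two class labels, -1 and 1

# Recompute the labels from the documented rule:
# 1 if the sum of squared features exceeds 9.34, otherwise -1
y_rule = np.where((X ** 2).sum(axis=1) > 9.34, 1.0, -1.0)
print(np.array_equal(y, y_rule))
```

Because the label depends on the sum of squares, the two classes are separated by a sphere in feature space rather than by a straight line, which is what makes the problem nonlinear.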

This dataset has also been used on sklearn’s website to compare the performance of two AdaBoost variants. The example is here: https://scikit-learn.org/stable/auto_examples/ensemble/plot_adaboost_hastie_10_2.html

I had no prior experience with this dataset, so I decided to use it in a machine learning project to see what kind of predictions could be made from it.

I wrote the program that I used to experiment on the Hastie_10_2 dataset in Google Colab, which is Google’s free hosted Jupyter Notebook environment. It is a great platform to use because it is free and portable: it can be used on any computer that has internet access and a web browser. The only drawback that I can see of this platform is the fact that it does not have any undo function, so care…

I have over five decades of experience in the world of work, including fast food, the military, business, non-profits, and the healthcare sector.