When you start with machine learning examples, you will need some example data sets.
Scikit-learn provides a few good ones. Here is more about what it provides and how
to use them.
Scikit-learn example data sets
Scikit-learn comes with a number of example data sets you can use
when starting with machine learning. They are exposed as functions in the sklearn.datasets module.
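One way to find them is to list the public names in sklearn.datasets and group them by prefix. This is a minimal sketch. The helper dataset_functions is a hypothetical name chosen for this example, and the exact names printed depend on your scikit-learn version.

```python
import sklearn.datasets


def dataset_functions(prefix):
    """Return the sorted names in sklearn.datasets that start with `prefix`.

    dataset_functions is a hypothetical helper used only in this example.
    """
    return sorted(name for name in dir(sklearn.datasets) if name.startswith(prefix))


# Print the dataset helpers grouped by their three naming prefixes.
for prefix in ("make_", "load_", "fetch_"):
    print(prefix, dataset_functions(prefix))
```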
There are three types of functions:
make_<SOME_DATA_SET> - Sample generators.
Generate samples of synthetic data sets.
load_<SOME_DATA_SET> - Load small data sets that ship with
Scikit-learn. The data files are part of the Python sklearn package.
Bundled data files:
boston_house_prices.csv (read by load_boston, which was removed in scikit-learn 1.2)
breast_cancer.csv
diabetes_data.csv.gz
diabetes_target.csv.gz
digits.csv.gz
iris.csv
linnerud_exercise.csv
linnerud_physiological.csv
wine_data.csv
fetch_<SOME_DATA_SET> - Load external data sets by downloading them
from the Internet. These are too big to include in the
Python package, and some of them consist of multiple binary files.
Available fetchers:
fetch_20newsgroups
fetch_20newsgroups_vectorized
fetch_lfw_pairs
fetch_lfw_people
fetch_mldata (removed in scikit-learn 0.22 after mldata.org shut down; use fetch_openml instead)
fetch_olivetti_faces
fetch_species_distributions
fetch_california_housing
fetch_covtype
fetch_rcv1
fetch_kddcup99
fetch_openml
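As a sketch of the make_ sample generators, make_classification builds a random, labelled classification problem in memory. The parameter values below are arbitrary choices for illustration; random_state fixes the seed so the same data comes back on every run.

```python
from sklearn.datasets import make_classification

# Generate a synthetic two-class data set entirely in memory.
X, y = make_classification(
    n_samples=100,    # number of rows
    n_features=4,     # number of columns in X
    n_informative=2,  # features that actually carry class signal
    n_redundant=0,    # no linear combinations of the informative features
    n_classes=2,      # binary labels 0 and 1
    random_state=0,   # fixed seed for reproducible output
)

print(X.shape)  # (100, 4)
print(y.shape)  # (100,)
```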
The fetch_ functions download a data set the first time you call them and cache it locally, so later calls read it from disk.
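A minimal sketch using fetch_california_housing, assuming network access on the first call. The wrapper name load_california_housing is hypothetical. The download_if_missing parameter is part of the real fetcher: with download_if_missing=False it raises OSError instead of downloading when the data is not cached yet.

```python
from sklearn.datasets import fetch_california_housing


def load_california_housing(download=True):
    """Fetch the California housing data set (hypothetical wrapper).

    The first call downloads the files and caches them in the scikit-learn
    data home (by default ~/scikit_learn_data). Later calls read the cache.
    With download=False, an OSError is raised if nothing is cached yet.
    """
    return fetch_california_housing(download_if_missing=download)
```

Calling load_california_housing() returns a Bunch whose data attribute has 20640 rows and 8 feature columns.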
Scikit-learn example data format
The load_ functions for the bundled data sets, such as load_iris, return a sklearn.utils.Bunch object by default.
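A minimal sketch using load_iris, which reads the iris.csv file bundled with scikit-learn, so no download is needed:

```python
from sklearn.datasets import load_iris

# Read the bundled iris data set into a Bunch object.
iris = load_iris()

print(iris.data.shape)             # (150, 4): 150 flowers, 4 measurements each
print(iris.target.shape)           # (150,): one class label per flower
print(iris.target_names.tolist())  # ['setosa', 'versicolor', 'virginica']
print(iris.feature_names)          # names of the 4 measurement columns
```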
A sklearn.utils.Bunch is a dictionary-like object that exposes its
keys as attributes.
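To illustrate, key access and attribute access on a Bunch return the same object, and assigning an attribute adds a key. If you only need the arrays, the return_X_y=True option of load_iris skips the Bunch and returns a (data, target) tuple. The small Bunch named extra is just an example.

```python
from sklearn.datasets import load_iris
from sklearn.utils import Bunch

iris = load_iris()

# Dictionary-style and attribute-style access return the same array.
print(iris["data"] is iris.data)  # True

# Setting an attribute stores a key, and setting a key exposes an attribute.
extra = Bunch(a=1)
extra.b = 2
extra["c"] = 3
print(dict(extra))  # {'a': 1, 'b': 2, 'c': 3}
print(extra.c)      # 3

# return_X_y=True gives plain arrays instead of a Bunch.
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)  # (150, 4) (150,)
```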