Scikit-learn

Table of Contents

About

Chapter 1: Getting started with scikit-learn
    Remarks
    Examples
        Installation of scikit-learn
        Train a classifier with cross-validation
        Creating pipelines
        Interfaces and conventions:
        Sample datasets

Chapter 2: Classification
    Examples
        Using Support Vector Machines
        RandomForestClassifier
        Analyzing Classification Reports
        GradientBoostingClassifier
        A Decision Tree
        Classification using Logistic Regression

Chapter 3: Dimensionality reduction (Feature selection)
    Examples
        Reducing The Dimension With Principal Component Analysis

Chapter 4: Feature selection
    Examples
        Low-Variance Feature Removal

Chapter 5: Model selection
    Examples
        Cross-validation
        K-Fold Cross Validation
        K-Fold
        ShuffleSplit

Chapter 6: Receiver Operating Characteristic (ROC)
    Examples
        Introduction to ROC and AUC
        ROC-AUC score with overriding and cross validation

Chapter 7: Regression
    Examples
        Ordinary Least Squares

Credits

About

You can share this PDF with anyone you feel could benefit from it. Download the latest version from: scikit-learn

It is an unofficial and free scikit-learn ebook created for educational purposes. All the content is extracted from Stack Overflow Documentation, which is written by many hardworking individuals at Stack Overflow. It is affiliated with neither Stack Overflow nor the official scikit-learn project.

The content is released under Creative Commons BY-SA, and the list of contributors to each chapter is provided in the credits section at the end of this book. Images may be copyright of their respective owners unless otherwise specified. All trademarks and registered trademarks are the property of their respective company owners.

Use the content presented in this book at your own risk; it is not guaranteed to be correct or accurate. Please send your feedback and corrections to info@



Chapter 1: Getting started with scikit-learn

Remarks

scikit-learn is a general-purpose open-source library for data analysis written in Python. It is based on other Python libraries: NumPy, SciPy, and matplotlib. scikit-learn contains implementations of many popular machine learning algorithms.

Examples

Installation of scikit-learn

The current stable version of scikit-learn requires Python (>= 2.6 or >= 3.3), NumPy (>= 1.6.1), and SciPy (>= 0.9).

For most installations, the pip Python package manager can install scikit-learn and all of its dependencies:

pip install scikit-learn

However, on Linux systems it is recommended to use the conda package manager, which avoids possibly having to build the package from source:

conda install scikit-learn

To check that you have scikit-learn, execute in shell:

python -c 'import sklearn; print(sklearn.__version__)'
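The same check can be run from inside Python, which also reports the versions of the dependencies listed above. This is a minimal sketch; the helper name `installed_versions` is hypothetical and not part of scikit-learn.

```python
import sys

import numpy
import scipy
import sklearn


def installed_versions():
    """Return a dict mapping each package name to its installed version string.

    Hypothetical helper for illustration; it only reads the standard
    __version__ attributes that these packages expose.
    """
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "numpy": numpy.__version__,
        "scipy": scipy.__version__,
        "scikit-learn": sklearn.__version__,
    }


# Print one "name version" pair per line.
for name, version in installed_versions().items():
    print(name, version)
```

If any of the imports fails with an ImportError, the corresponding package is not installed in the active environment.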

Windows and Mac OSX Installation: Canopy and Anaconda both ship a recent version of scikit-learn, in addition to a large set of scientific Python libraries for Windows and Mac OSX (also relevant for Linux).

Train a classifier with cross-validation

Using the iris dataset:

import sklearn.datasets
iris_dataset = sklearn.datasets.load_iris()
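One way to carry on from the loaded dataset is sketched below. It is not the original example, which is cut off here. It assumes a k-nearest-neighbours classifier and 5-fold cross-validation, both chosen only for illustration, and it uses the `sklearn.model_selection` module, which exists in scikit-learn 0.18 and later.

```python
import sklearn.datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Load the iris data: X holds the feature matrix, y the class labels.
iris_dataset = sklearn.datasets.load_iris()
X, y = iris_dataset.data, iris_dataset.target

# Assumption: any scikit-learn classifier could be used here.
clf = KNeighborsClassifier(n_neighbors=5)

# Fit and score the classifier on 5 train/test splits of the data.
scores = cross_val_score(clf, X, y, cv=5)

print("accuracy per fold:", scores)
print("mean accuracy:", scores.mean())
```

Each entry in `scores` is the accuracy on one held-out fold, so the mean gives a less optimistic estimate than scoring on the training data itself.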


