Scikit-Learn - Tutorialspoint

Scikit-Learn

About the Tutorial

Scikit-learn (Sklearn) is one of the most widely used and robust libraries for machine
learning in Python. It provides a selection of efficient tools for machine learning and
statistical modeling, including classification, regression, clustering, and dimensionality
reduction, through a consistent interface. The library, which is largely written in Python,
is built upon NumPy, SciPy, and Matplotlib.

Audience

This tutorial will be useful for graduates, postgraduates, and research students who either
have an interest in machine learning or study it as part of their curriculum. The reader
can be a beginner or an advanced learner.

Prerequisites

The reader should have a basic knowledge of machine learning and should also be familiar
with Python, NumPy, SciPy, and Matplotlib. If you are new to any of these, we recommend
working through tutorials on those topics before you go further into this tutorial.

Copyright & Disclaimer

© Copyright 2019 by Tutorials Point (I) Pvt. Ltd.

All the content and graphics published in this e-book are the property of Tutorials Point (I)
Pvt. Ltd. The user of this e-book is prohibited from reusing, retaining, copying,
distributing, or republishing any contents or a part of the contents of this e-book in any
manner without the written consent of the publisher.

We strive to update the contents of our website and tutorials as promptly and as precisely
as possible; however, the contents may contain inaccuracies or errors. Tutorials Point (I)
Pvt. Ltd. provides no guarantee regarding the accuracy, timeliness, or completeness of our
website or its contents, including this tutorial. If you discover any errors on our website
or in this tutorial, please notify us at contact@

Table of Contents

About the Tutorial
Audience
Prerequisites
Copyright & Disclaimer
Table of Contents

1. Scikit-Learn - Introduction
   What is Scikit-Learn (Sklearn)?
   Origin of Scikit-Learn
   Community & contributors
   Prerequisites
   Installation
   Features

2. Scikit-Learn - Modelling Process
   Dataset Loading
   Splitting the dataset
   Train the Model
   Model Persistence
   Preprocessing the Data
   Binarisation
   Mean Removal
   Scaling
   Normalisation

3. Scikit-Learn - Data Representation
   Data as table
   Data as Feature Matrix
   Data as Target array

4. Scikit-Learn - Estimator API
   What is Estimator API?
   Use of Estimator API
   Guiding Principles
   Steps in using Estimator API
   Supervised Learning Example
   Unsupervised Learning Example

5. Scikit-Learn - Conventions
   Purpose of Conventions
   Various Conventions

6. Scikit-Learn - Linear Modeling
   Linear Regression
   Logistic Regression
   Ridge Regression
   Bayesian Ridge Regression
   LASSO (Least Absolute Shrinkage and Selection Operator)
   Multi-task LASSO
   Elastic-Net
   MultiTaskElasticNet

7. Scikit-Learn - Extended Linear Modeling
   Introduction to Polynomial Features
   Streamlining using Pipeline tools

8. Scikit-Learn - Stochastic Gradient Descent
   SGD Classifier
   SGD Regressor
   Pros and Cons of SGD

9. Scikit-Learn - Support Vector Machines (SVMs)
   Introduction
   Classification of SVM
   SVC
   NuSVC
   LinearSVC
   Regression with SVM
   SVR
   NuSVR
   LinearSVR

10. Scikit-Learn - Anomaly Detection
   Methods
   Sklearn algorithms for Outlier Detection
   Fitting an elliptic envelope
   Isolation Forest
   Local Outlier Factor
   One-Class SVM

11. Scikit-Learn - K-Nearest Neighbors (KNN)
   Types of algorithms
   Choosing Nearest Neighbors Algorithm

12. Scikit-Learn - KNN Learning
   Unsupervised KNN Learning
   Supervised KNN Learning
   KNeighborsClassifier
   RadiusNeighborsClassifier
   Nearest Neighbor Regressor
   KNeighborsRegressor
   RadiusNeighborsRegressor

13. Scikit-Learn - Classification with Naïve Bayes
   Gaussian Naïve Bayes

In order to avoid copyright disputes, this page is only a partial summary.
