Mr.Ghanshyam Dhomse (घनश्याम ढोमसे)



SNJB’s Late Sau.K. B. Jain College of Engineering, Chandwad

Department of Computer Engineering

Class- BE Computer    Subject- Machine Learning    Staff- Dhomse G.P.

Mock Insem Paper Solution 2018-19 SEM-II

Q1 A) Explain the concept of adaptive machines with reference to machine learning.

Ans-


• Adaptive Learning- Spam filtering, Natural Language Processing, visual tracking with a webcam or a smartphone, and predictive analysis are only a few applications that revolutionized human-machine interaction and increased our expectations. Such a system isn't based on static or permanent structures (model parameters and architectures) but rather on a continuous ability to adapt its behavior to external signals (datasets or real-time inputs) and, like a human being, to predict the future using uncertain and fragmentary pieces of information.

Q1 B) What does Machine learning exactly mean? Explain Application of Machine Learning for data scientists.

Ans- Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in data and make better decisions in the future based on the examples that we provide. The primary aim is to allow computers to learn automatically without human intervention or assistance and adjust their actions accordingly.

Machine Learning is a growing field that is used when searching the web, placing ads, credit scoring, stock trading and for many other applications, and its algorithms play an essential role in Big Data analysis.

Machine learning and data science work hand in hand. Consider the definition of machine learning – the ability of a machine to generalize knowledge from data. Without data, there is very little that machines can learn. If anything, the increasing use of machine learning in many industries will act as a catalyst to push data science to greater relevance. Machine learning is only as good as the data it is given and the ability of algorithms to consume it. Going forward, a basic level of machine learning will become a standard requirement for data scientists.

Q2 A) How machine Learning works for Big data applications?

Ans-


Big Data is the collection of massive amounts of information, whether unstructured or structured.

Big Compute is the large-scale (often parallel) processing power required to extract value from Big Data.

Machine Learning is a branch of Computer Science that, instead of applying high-level algorithms to solve problems in explicit, imperative logic, applies low-level algorithms to discover patterns implicit in the data. (Think about this like how the human brain learns from life experiences vs. from explicit instructions.) The more data, the more effective the learning, which is why machine learning and big data are intricately tied together.

Predictive Analytics is using machine learning to predict future outcomes (extrapolation), or to infer unknown data points from known ones (interpolation).

Big Data has more to do with high-performance computing, while Machine Learning is a part of Data Science. In Big Data, large volumes of data that cannot be processed in a reasonable amount of time by conventional means are processed quickly using specialized techniques and tools. In Machine Learning, a system learns from past experiences and builds a model that will most likely be able to comprehend future instances.

One of the main reasons why big data and machine learning are used together is that big data processing is often a preprocessing step for machine learning.

Q2 B) Explain how machine learning works for the following common un-supervised learning applications:

I. Object segmentation (for example, users, products, movies, songs, and so on)

II. Similarity detection

III. Automatic labeling

Ans- Nowadays, semantic segmentation is one of the key problems in the field of computer vision. Looking at the big picture, semantic segmentation is one of the high-level tasks that paves the way towards complete scene understanding. The importance of scene understanding as a core computer vision problem is highlighted by the fact that an increasing number of applications benefit from inferring knowledge from imagery. Some of those applications include self-driving vehicles, human-computer interaction, virtual reality, etc. With the popularity of deep learning in recent years, many semantic segmentation problems are being tackled using deep architectures, most often Convolutional Neural Networks, which surpass other approaches by a large margin in terms of accuracy and efficiency.

Semantic segmentation is a natural step in the progression from coarse to fine inference:

• The origin could be located at classification, which consists of making a prediction for a whole input.

• The next step is localization/detection, which provides not only the classes but also additional information regarding the spatial location of those classes.

• Finally, semantic segmentation achieves fine-grained inference by making dense predictions, inferring labels for every pixel, so that each pixel is labeled with the class of its enclosing object or region.

II. Similarity detection-

• Prepare our image database.

• Download the trained VGG model, and remove its last layers.

• Convert our image database into feature vectors using our dissected VGG model. If the output of the dissected model consists of convolutional filters, flatten the filters and concatenate them into a single vector.

• Compute similarities between our image feature vectors using a measure such as cosine similarity or Euclidean distance.

• For each image, select the images with the top-k similarity scores to build the recommendation, as in the sketch below.
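A minimal sketch of steps 3-5, assuming the feature vectors have already been extracted into a NumPy array (the array here is random data used purely as a placeholder):

import numpy as np

# Placeholder for the (n_images, n_features) matrix produced by the dissected VGG model.
features = np.random.rand(100, 512).astype(np.float32)

# L2-normalize the rows so that the dot product equals cosine similarity.
normalized = features / np.linalg.norm(features, axis=1, keepdims=True)

# Pairwise cosine similarity matrix.
similarity = normalized @ normalized.T

# For each image, select the top-k most similar images (excluding the image itself).
k = 5
np.fill_diagonal(similarity, -np.inf)
top_k = np.argsort(-similarity, axis=1)[:, :k]
print(top_k[0])  # indices of the 5 images most similar to image 0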


Q3 A) Justify the statement: Feature engineering is the first step in a machine learning pipeline and involves all the techniques adopted to clean existing datasets.

Ans- Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. If feature engineering is done correctly, it increases the predictive power of machine learning algorithms by creating features from raw data that help facilitate the machine learning process. Feature Engineering is an art.

Steps which are involved while solving any problem in machine learning are as follows:

• Gathering data.

• Cleaning data.

• Feature engineering.

• Defining model.

• Training, testing model and predicting the output.

Feature engineering is the most important art in machine learning; it makes the difference between a good model and a bad model. Let’s see what feature engineering covers, starting with the small sketch below.
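A minimal sketch (the column names are hypothetical), showing how new, more informative features can be derived from raw columns with pandas:

import pandas as pd

# Hypothetical raw user data.
df = pd.DataFrame({
    "signup_time": pd.to_datetime(["2018-01-05 09:30", "2018-01-06 22:10"]),
    "purchases": [3, 0],
})

# Engineered features derived from the raw columns:
df["signup_hour"] = df["signup_time"].dt.hour                             # numeric feature from a timestamp
df["signup_weekend"] = (df["signup_time"].dt.dayofweek >= 5).astype(int)  # weekend flag
df["is_buyer"] = (df["purchases"] > 0).astype(int)                        # binary flag from a purchase count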

Q3 B) How categorical data are Managed in various classification problems?

Ans- Categorical features can only take on a limited, and usually fixed, number of possible values. For example, if a dataset is about information related to users, then you will typically find features like country, gender, age group, etc. Alternatively, if the data you're working with is related to products, you will find features like product type, manufacturer, seller and so on.

These are all categorical features in your dataset. These features are typically stored as text values which represent various traits of the observations. For example, gender is described as Male (M) or Female (F), product type could be described as electronics, apparels, food etc.

For example, a flights dataset used to understand what causes delays typically also includes a number of other useful related datasets:

• weather: the hourly meteorological data for each airport

• planes: constructor information about each plane

• airports: airport names and locations

• airlines: translation between two letter carrier codes and names

The techniques that you'll use to manage categorical data are the following (a short sketch of label and one-hot encoding follows the list):

• Replacing values

• Encoding labels

• One-Hot encoding

• Binary encoding

• Backward difference encoding

• Miscellaneous features
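A minimal sketch of label encoding and one-hot encoding with pandas, on hypothetical user data (the column names are made up for illustration):

import pandas as pd

# Hypothetical categorical data.
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M"],
    "product_type": ["electronics", "apparel", "food", "apparel"],
})

# Encoding labels: map each category to an integer code.
df["gender_encoded"] = df["gender"].astype("category").cat.codes

# One-Hot encoding: one binary column per category value.
df = pd.concat([df, pd.get_dummies(df["product_type"], prefix="product")], axis=1)

print(df)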

Q4 A) Explain the process of Creating training and test sets for Iris Dataset.

Ans- As we work with datasets, a machine learning algorithm works in two stages. We usually split the data around 20%-80% between the testing and training stages. Under supervised learning, we split a dataset into training data and test data in Python ML.


a. Prerequisites for Train and Test Data

We will need the following Python libraries for this tutorial- pandas and sklearn.

We can install these with pip-

1. pip install pandas

2. pip install scikit-learn

We use pandas to import the dataset and sklearn to perform the splitting. You can import these packages as-

1. >>> import pandas as pd

2. >>> from sklearn.model_selection import train_test_split

3. >>> from sklearn.datasets import load_iris

Let’s load the forestfires dataset using pandas (the same procedure for the Iris dataset named in the question is sketched at the end of this answer).

1. >>> data=pd.read_csv('forestfires.csv')

2. >>> data.head()

Let’s split this data into labels and features: features are the data we use to make a prediction, and labels are the values we want to predict.

1. >>> y=data.temp

2. >>> x=data.drop('temp',axis=1)

Here temp is the label (the temperature we want to predict), stored in y; we use the drop() function to put all the other columns into x. Then, we split the data.

1. >>> x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)

2. >>> x_train.head()

3. >>> x_train.shape

(413, 12)

1. >>> x_test.shape

(104, 12)
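Since the question asks specifically about the Iris dataset, the same procedure applied to it looks like this (a minimal sketch using the load_iris import above; Iris has 150 samples and 4 features, so an 80%-20% split gives 120 and 30 rows):

1. >>> iris = load_iris()

2. >>> x = pd.DataFrame(iris.data, columns=iris.feature_names)

3. >>> y = pd.Series(iris.target)

4. >>> x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

5. >>> x_train.shape

(120, 4)

6. >>> x_test.shape

(30, 4)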

Q4 B) Write a short note on Sparse PCA and Kernel PCA

Ans- The standard PCA always finds linear principal components to represent the data in a lower dimension. Sometimes we need non-linear principal components. If we apply standard PCA to data that is not linearly distributed (for example, the concentric circles used in the Kernel PCA example below), it will fail to find a good representative direction. Kernel PCA (KPCA) rectifies this limitation.

• Kernel PCA just performs PCA in a new space.

• It uses Kernel trick to find principal components in different space (Possibly High Dimensional Space).

• PCA finds new directions based on the covariance matrix of the original variables and can extract at most P (the number of features) eigenvalues. KPCA finds new directions based on the kernel matrix and can extract up to n (the number of observations) eigenvalues.

• PCA allows us to reconstruct the pre-image using a few eigenvectors out of the total P eigenvectors. This may not be possible in KPCA.

• Extracting principal components with KPCA is computationally more expensive than with standard PCA.

• Sparse PCA -Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.

• If you think about the handwritten digits or other images that must be classified, their initial dimensionality can be quite high (a 10x10 image has 100 features).

• However, applying a standard PCA selects only the average most important features, assuming that every sample can be rebuilt using the same components.

• On the other hand, we can always use a limited number of components, but without the limitation given by a dense projection matrix.

• This can be achieved by using sparse matrices (or vectors), where the number of non-zero elements is quite low. In this way, each element can be rebuilt using its specific components

• In a sparse representation, the non-null components can be grouped together while all the zero terms are kept separate. In terms of linear algebra, the vectorial space still has the original dimensions (a short sketch follows).
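A minimal sketch using scikit-learn's SparsePCA on handwritten digits (the built-in digits are 8x8 images, so 64 features; alpha is the coefficient of the L1 penalty controlling sparseness, and the chosen values are only illustrative):

from sklearn.datasets import load_digits
from sklearn.decomposition import SparsePCA

digits = load_digits()                        # 1797 images of 8x8 pixels -> 64 features

spca = SparsePCA(n_components=10, alpha=0.1)  # alpha controls the amount of sparseness
X_spca = spca.fit_transform(digits.data)

print(X_spca.shape)            # (1797, 10): data projected onto 10 sparse components
print(spca.components_.shape)  # (10, 64); many entries are exactly zero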

Kernel PCA- scikit-learn provides the class KernelPCA, which performs PCA on non-linearly separable datasets.

>>> from sklearn.datasets import make_circles

>>> Xb, Yb = make_circles(n_samples=500, factor=0.1, noise=0.05)

However, looking at the samples and using polar coordinates (therefore, a space where it's possible to project all the points), it's easy to separate the two sets by considering only the radius.
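A minimal sketch of fitting KernelPCA on these samples, assuming an RBF kernel (other kernels, such as polynomial, are also possible; gamma here is only an illustrative value):

>>> from sklearn.decomposition import KernelPCA

>>> kpca = KernelPCA(n_components=2, kernel='rbf', gamma=1.0)

>>> X_kpca = kpca.fit_transform(Xb)

>>> X_kpca.shape

(500, 2)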

Q5 A) Explain linear classification algorithm with example.

Ans- In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class (or group) it belongs to. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the characteristics. An object's characteristics are also known as feature values and are typically presented to the machine in a vector called a feature vector. Such classifiers work well for practical problems such as document classification, and more generally for problems with many variables (features), reaching accuracy levels comparable to non-linear classifiers while taking less time to train and use.


At the most fundamental point, linear methods can only solve problems that are linearly separable (usually via a hyperplane).  If you can solve it with a linear method, you're usually better off.  However, if linear isn't working for your particular problem, the next step is to use a nonlinear method, which typically involves applying some type of transformation to your input dataset.  After the transformation, many techniques then try to use a linear method for separation.  

1. If accuracy is more important to you than training time, use a non-linear classifier; otherwise use a linear classifier. This is because linear classifiers use linear kernels and are faster than the non-linear kernels used in non-linear classifiers.

2. A linear classifier (e.g., a linear SVM) is used when the number of features is very high, e.g., document classification. This is because a linear SVM gives almost the same accuracy as a non-linear SVM but is much faster in such cases.

3. Use a non-linear classifier when the data is not linearly separable. Under such conditions, linear classifiers give very poor results (accuracy), while non-linear classifiers give better results. This is because non-linear kernels map (transform) the input data (the input space) to a higher-dimensional space (called the feature space) where a linear hyperplane can be easily found (see the sketch below).
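As a concrete example, a minimal sketch training one of scikit-learn's linear classifiers (LinearSVC) on a synthetic two-feature dataset; the dataset parameters are only illustrative:

>>> from sklearn.datasets import make_classification
>>> from sklearn.svm import LinearSVC
>>> from sklearn.model_selection import train_test_split
>>> X, Y = make_classification(n_samples=500, n_features=2, n_informative=2, n_redundant=0)
>>> X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25)
>>> clf = LinearSVC()
>>> clf.fit(X_train, Y_train)
>>> clf.score(X_test, Y_test)   # accuracy of the linear decision boundary on held-out data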

Q5 B) What is significance of ROC curve with reference to logistic regression?

Ans- The ROC curve (or receiver operating characteristics) is a valuable tool to compare different classifiers that can assign a score to their predictions.

In general, this score can be interpreted as a probability, so it's bounded between 0 and 1.

The ROC plane is structured as follows:

• The x axis represents the increasing false positive rate (equal to 1 - specificity), while the y axis represents the true positive rate (also known as sensitivity).

• The dashed oblique line represents a perfectly random classifier, so all the curves below this threshold perform worse than a random choice, while the ones above it show better performances. Of course, the best classifier has an ROC curve split into the segments [0, 0] - [0, 1] and [0, 1] - [1, 1], and our goal is to find algorithms whose performances should be as close as possible to this limit.

• To show how to create a ROC curve with scikit-learn, we're going to train a model to determine the scores for the predictions (this can be achieved using the decision_function() or predict_proba() methods):

>>> X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25) 

>>> lr = LogisticRegression()

>>> lr.fit(X_train, Y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False)

>>> Y_scores = lr.decision_function(X_test)

• we can compute the ROC curve:

>>> from sklearn.metrics import roc_curve

>>> fpr, tpr, thresholds = roc_curve(Y_test, Y_scores)

• it's also useful to compute the area under the curve (AUC), whose value is bounded between 0 (worst performances) and 1 (best performances), with a perfectly random value corresponding to 0.5.

Q6 A) Compare linear and logistic regression?

Ans- Linear and Logistic regression are the most basic forms of regression and are commonly used. The essential difference between the two is that Logistic regression is used when the dependent variable is binary in nature, whereas Linear regression is used when the dependent variable is continuous and the nature of the regression line is linear.

| |Linear Regression |Logistic Regression |
|Basic |The data is modelled using a straight line. |The probability of some obtained event is represented as a linear function of a combination of predictor variables. |
|Linear relationship between dependent and independent variables |Is required |Not required |
|The independent variables |Could be correlated with each other (especially in multiple linear regression). |Should not be correlated with each other (no multicollinearity exists). |

The linear regression technique involves a continuous dependent variable, while the independent variables can be continuous or discrete. Using a best-fit straight line, linear regression sets up a relationship between the dependent variable (Y) and one or more independent variables (X). In other words, there exists a linear relationship between the independent and dependent variables.

The difference between linear and multiple linear regression is that simple linear regression contains only one independent variable, while multiple regression contains more than one independent variable. The best-fit line in linear regression is obtained through the least squares method.

The following equation is used to represent a linear regression model: Y = b0 + b1*X + e, where b0 is the intercept, b1 is the slope of the line, and e is the error. Here Y is the dependent variable and X is an independent variable.

The logistic regression technique involves a dependent variable which can be represented by binary values (0 or 1, true or false, yes or no), meaning that the outcome can only take one of two forms. For example, it can be used when we need to find the probability of a successful or failed event. Here, the same formula is used with an additional sigmoid function, and the value of Y ranges from 0 to 1.

Logistic regression equation: Y = b0 + b1*X. By putting Y into the sigmoid function, we get the probability p = 1 / (1 + exp(-(b0 + b1*X))), whose value always lies between 0 and 1.
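A minimal sketch contrasting the two models on a hypothetical one-feature dataset (the variable names and values are made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: one feature, ten observations.
X = np.arange(10).reshape(-1, 1)
y_continuous = 2 * X.ravel() + 1          # continuous target -> linear regression
y_binary = (X.ravel() > 4).astype(int)    # binary target -> logistic regression

lin = LinearRegression().fit(X, y_continuous)
log = LogisticRegression().fit(X, y_binary)

print(lin.predict([[12]]))        # a real-valued prediction on the fitted straight line
print(log.predict_proba([[12]]))  # class probabilities, each between 0 and 1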

Q6 B) Explain the following types of regression with Examples.

1.Ridge 2.Lasso 3.ElasticNet.

Ans-

• Ridge and Lasso regression are powerful techniques generally used for creating parsimonious models in the presence of a ‘large’ number of features. Here ‘large’ can typically mean either of two things:

• Large enough to enhance the tendency of a model to overfit (as low as 10 variables might cause overfitting)

• Large enough to cause computational challenges. With modern systems, this situation might arise in case of millions or billions of features

• Though Ridge and Lasso might appear to work towards a common goal, their inherent properties and practical use cases differ substantially. If you’ve heard of them before, you must know that they work by penalizing the magnitude of the coefficients of features along with minimizing the error between predicted and actual observations. These are called ‘regularization’ techniques.


• Ridge regression imposes an additional shrinkage penalty on the ordinary least squares loss function to limit the squared L2 norm of the coefficient vector:

• Ridge Regression: Performs L2 regularization, i.e. adds penalty equivalent to square of the magnitude of coefficients

• Minimization objective = LS Obj + α * (sum of square of coefficients)

• Note that here ‘LS Obj’ refers to ‘least squares objective’, i.e. the linear regression objective without regularization.

• Lasso Regression: Performs L1 regularization, i.e. adds a penalty equivalent to the absolute value of the magnitude of coefficients: Minimization objective = LS Obj + α * (sum of absolute values of coefficients). The L1 penalty tends to drive many coefficients exactly to zero, producing sparse models.

• The last alternative is ElasticNet, which combines both Lasso and Ridge into a single model with two penalty factors: one proportional to L1 norm and the other to L2 norm. In this way, the resulting model will be sparse like a pure Lasso, but with the same regularization ability as provided by Ridge. The resulting loss function is:

• Minimization objective = LS Obj + α * ρ * (sum of absolute values of coefficients) + α * (1 - ρ)/2 * (sum of squares of coefficients), where ρ balances the L1 and L2 penalties (the l1_ratio parameter in scikit-learn). A short sketch of all three estimators follows.
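A minimal sketch of the three estimators in scikit-learn, fitted on synthetic data; the regularization strengths are hypothetical values chosen only for illustration:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Synthetic regression data.
X, y = make_regression(n_samples=200, n_features=20, noise=5.0)

ridge = Ridge(alpha=1.0).fit(X, y)                    # L2 penalty on the coefficients
lasso = Lasso(alpha=0.1).fit(X, y)                    # L1 penalty, drives some coefficients to zero
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # combined L1 + L2 penalties

print((lasso.coef_ == 0).sum())  # number of coefficients set exactly to zero by the L1 penalty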

***************************************** THE END*******************************************
