


SNJB’s Late Sau. K. B. Jain College of Engineering, Chandwad

Department Of Computer Engineering

Sub- Machine Learning

[Mock Insem Solution 2019-20 Sem-II]

Q1 A) Explain the concept of classic and adaptive machines with reference to machine learning, with an example.

Ans- Generic representation of a classical system that receives some input values, processes them, and produces output results:

[pic]

• Machine learning algorithms are described as learning a target function (f) that best maps input variables (X) to an output variable (Y).

Y = f(X)

• This is a general learning task where we would like to make predictions in the future (Y) given new examples of input variables (X).

• This is harder than it sounds: there is also an error term (e) that is independent of the input data (X).

Y = f(X) + e

• This error might arise, for example, from not having enough attributes to sufficiently characterize the best mapping from X to Y. It is called irreducible error because, no matter how good we get at estimating the target function (f), we cannot reduce this error (a short sketch of this idea follows the list below).

• Programmable computers are widespread, flexible, and more and more powerful instruments; moreover, the diffusion of the internet allowed us to share software applications and related information with minimal effort. The word-processing software that I'm using, my email client, a web browser, and many other common tools running on the same machine are all examples of such flexibility. It's undeniable that the IT revolution dramatically changed our lives and sometimes improved our daily jobs, but without machine learning (and all its applications), there are still many tasks that seem far outside the computer's domain. Spam filtering, Natural Language Processing, visual tracking with a webcam or a smartphone, and predictive analysis are only a few applications that revolutionized human-machine interaction and increased our expectations. In many cases, they transformed our electronic tools into actual cognitive extensions that are changing the way we interact with many daily situations.
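A minimal sketch of the Y = f(X) + e idea, assuming a synthetic dataset where the true mapping f and the noise e are chosen by us, so it is visible that even a well-fitted model cannot push the error below the variance of e:

# Sketch: Y = f(X) + e with a known f and irreducible noise e (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(500, 1))
noise = rng.normal(scale=0.5, size=500)      # e, independent of X
Y = 2.0 * X.ravel() + 1.0 + noise            # true f(X) = 2X + 1

model = LinearRegression().fit(X, Y)
pred = model.predict(X)
# The mean squared error stays close to Var(e) = 0.25, however well f is estimated.
print("Mean squared error:", np.mean((Y - pred) ** 2))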

Schematic Representation of an adaptive system

[pic]

Q1 B) "What does Machine learning exactly mean? Explain the application Sentiment analysis with reference to machine learning

Ans- A formal definition of machine learning can be: “A computer program learns from experience while doing some class of tasks if its performance measure improves at that task with experience.” This definition follows Alan Turing’s proposal in his paper “Computing Machinery and Intelligence”, in which the question he asked was “Can machines think?” In layman's terms, we can say that machine learning (ML) studies mathematical models of a task with the help of various algorithms and uses them to progressively improve performance on a specific task.

Machine learning problems are classified into several broad categories. Supervised learning is one of them, in which the algorithm builds a mathematical model from a data-set of inputs paired with their desired outputs. Semi-supervised learning is closely related: the learning algorithm develops a mathematical model from an incomplete data-set in which a portion of the sample inputs are missing the expected output. Classification and regression algorithms are both types of supervised learning. Classification algorithms are used when the outputs are restricted to a limited set of values, while regression algorithms are used for continuous outputs, meaning the output can take any value within a range.

Unsupervised learning takes a data-set that contains only inputs and builds a mathematical model of it. Unsupervised learning finds structure or patterns in the data-set and groups (clusters) the data points accordingly. The recognized patterns can then be used as features, as in feature learning.
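A brief sketch contrasting the two settings on hypothetical toy data, using scikit-learn's LogisticRegression for supervised classification and KMeans for unsupervised clustering:

# Supervised: labelled data -> classifier.  Unsupervised: unlabelled data -> clusters.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = [[1.0, 1.2], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9]]   # toy inputs
y = [0, 0, 1, 1]                                        # labels, used only in the supervised case

clf = LogisticRegression().fit(X, y)         # learns a mapping X -> y
print(clf.predict([[1.1, 1.0]]))             # expected: [0]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)   # no labels used
print(km.labels_)                            # groups the same points into 2 clusters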

[pic]

Sentiment analysis is the interpretation and classification of emotions (positive, negative and neutral) within text data using text analysis techniques. Sentiment analysis allows businesses to identify customer sentiment toward products, brands or services in online conversations and feedback.

Sentiment analysis models detect polarity within a text (e.g. a positive or negative opinion), whether it’s a whole document, paragraph, sentence, or clause.

Understanding people’s emotions is essential for businesses since customers are able to express their thoughts and feelings more openly than ever before. By automatically analyzing customer feedback, from survey responses to social media conversations, brands are able to listen attentively to their customers, and tailor products and services to meet their needs.
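A minimal sketch of sentiment analysis as text classification, assuming a handful of made-up reviews purely for illustration (real systems train on far larger labelled corpora); bag-of-words counts feed a Naive Bayes classifier:

# Tiny sentiment classifier: bag-of-words features + Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["great product, loved it", "terrible service, very bad",
         "excellent quality", "worst purchase ever"]
labels = ["positive", "negative", "positive", "negative"]

vec = CountVectorizer()
X = vec.fit_transform(texts)                  # word-count features
clf = MultinomialNB().fit(X, labels)

# Classify a new, unseen piece of feedback (likely labelled 'negative').
print(clf.predict(vec.transform(["bad quality, terrible"])))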

Q2 A) Explain Reinforcement Learning with suitable example.

Ans- Reinforcement learning is a machine learning approach in which the machine learns to predict the best outcome based on previous feedback, given as positive or negative reinforcement, in a dynamic environment. Machine learning algorithms can be utilized to find the probability of the expected outcome in specific problems. In a meta-learning algorithm, the inductive bias is learned by the algorithm itself based on previous experience.

• Reinforcement learning is also based on feedback provided by the environment. However, in this case, the information is more qualitative and doesn't help the agent in determining a precise measure of its error.

• This feedback is usually called a reward (sometimes a negative one is defined as a penalty), and it is useful to understand whether a certain action performed in a state is positive or not.

• An action can also be imperfect, but in terms of a global policy it has to offer the highest total reward.

• Reinforcement Learning is a framework for learning where an agent interacts with an environment and receives a reward for each interaction. The goal is to learn to accumulate as much reward as possible over time.

• The real advantage these systems have over conventional supervised learning is illustrated by this example I like a lot:

• Supervised Learning: Let us say that you know how to play chess. We record you playing games against a lot of people. Now we train a system in the supervised fashion to learn from your examples and call it KidPlayer. Let us say that we train another system on Viswanathan Anand’s games and call this ProPlayer. Obviously, the “policy” learned by KidPlayer will be inferior to the policy learned by ProPlayer because of the different capabilities of the teacher.

• Reinforcement Learning: In this setting, you make an agent play Chess against someone (usually against another copy of itself) and give it a reward for every time it wins a game.

• Example: learning the best policy for playing Atari video games, i.e. teaching an agent how to associate the right action with an input representing the state (usually a screenshot or a memory dump).

• In the following figure, there's a schematic representation of a deep neural network trained to play a famous Atari game.

• As input, there are one or more subsequent screenshots (this can often be enough to capture the temporal dynamics as well).

• They are processed using different layers (discussed briefly later) to produce an output that represents the policy for a specific state transition.

• After applying this policy, the game produces feedback (a reward or a penalty), and this result is used to refine the output until it becomes stable (so the states are correctly recognized and the suggested action is always the best one) and the total reward exceeds a predefined threshold.
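As a simplified sketch of this agent–environment loop (not the deep network described above), tabular Q-learning on a hypothetical five-state chain environment shows how rewards gradually refine the policy over repeated interactions:

# Tabular Q-learning on a toy 5-state chain: reward only at the rightmost state.
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.3
rng = np.random.RandomState(0)

for episode in range(2000):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action selection: mostly exploit, sometimes explore
        a = rng.randint(n_actions) if rng.rand() < epsilon else int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0        # reward signal from the environment
        # Q-learning update: move Q[s, a] toward r + gamma * max_a' Q[s_next, a']
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

# Learned policy: action 1 (right) should be preferred in every non-terminal state.
print(Q.argmax(axis=1))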

Q2 B) Explain how machine learning works for the following common applications:

I. Pattern Detection

II. Automatic image classification

III. Automatic sequence processing (eg. music or speech)

Ans- Pattern Detection- Patterns are everywhere in the digital world. A pattern can either be seen physically or it can be observed mathematically by applying algorithms.

Example: the colours of clothes, speech patterns, etc. In computer science, a pattern is represented using a vector of feature values.

What is Pattern Recognition ?

Pattern recognition is the process of recognizing patterns by using machine learning algorithms. Pattern recognition can be defined as the classification of data based on knowledge already gained or on statistical information extracted from patterns and/or their representation. One of the important aspects of pattern recognition is its application potential.

Examples: Speech recognition, speaker identification, multimedia document recognition (MDR), automatic medical diagnosis.

In a typical pattern recognition application, the raw data is processed and converted into a form that is amenable for a machine to use. Pattern recognition involves classification and clustering of patterns.

In classification, an appropriate class label is assigned to a pattern based on an abstraction that is generated using a set of training patterns or domain knowledge. Classification is used in supervised learning.

Clustering generates a partition of the data which helps decision making, i.e. the specific decision-making activity of interest to us. Clustering is used in unsupervised learning.

Features may be represented as continuous, discrete or discrete binary variables. A feature is a function of one or more measurements, computed so that it quantifies some significant characteristics of the object.

Example: consider our face then eyes, ears, nose etc are features of the face.

A set of features that are taken together forms a feature vector.

Example: In the above example of the face, if all the features (eyes, ears, nose, etc.) are taken together, the sequence is a feature vector ([eyes, ears, nose]). A feature vector is a sequence of features represented as a d-dimensional column vector. In the case of speech, MFCCs (Mel-Frequency Cepstral Coefficients) are the spectral features of the speech, and the sequence of the first 13 coefficients forms a feature vector.
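A small sketch of a feature vector as a d-dimensional column vector, using hypothetical measurements for the face example:

# A feature vector groups individual measurements into one d-dimensional column vector.
import numpy as np

eye_distance = 6.2      # hypothetical measurements (cm)
ear_length = 5.8
nose_width = 3.1

x = np.array([[eye_distance], [ear_length], [nose_width]])   # shape (3, 1): a column vector
print(x.shape)          # -> (3, 1)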

Pattern recognition possesses the following features:

• A pattern recognition system should recognise familiar patterns quickly and accurately

• Recognize and classify unfamiliar objects

• Accurately recognize shapes and objects from different angles

• Identify patterns and objects even when partly hidden

• Recognise patterns quickly with ease, and with automaticity.

2. Automatic image classification-

Image classification is the task of assigning an input image one label from a fixed set of categories. This is one of the core problems in Computer Vision that, despite its simplicity, has a large variety of practical applications.

[pic]

Challenges

1. Viewpoint variation. A single instance of an object can be oriented in many ways with respect to the camera.

2. Scale variation. Visual classes often exhibit variation in their size (size in the real world, not only in terms of their extent in the image).

3. Deformation. Many objects of interest are not rigid bodies and can be deformed in extreme ways.

4. Occlusion. The objects of interest can be occluded. Sometimes only a small portion of an object (as little as few pixels) could be visible.

5. Illumination conditions. The effects of illumination are drastic on the pixel level.
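A minimal image-classification sketch, assuming scikit-learn's built-in 8x8 handwritten-digit images purely for illustration (practical systems usually rely on convolutional neural networks, but the task shape is the same: image in, one label out):

# Classify 8x8 handwritten-digit images into one of 10 labels.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()                                   # images flattened to 64 pixel features
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

clf = SVC(gamma=0.001).fit(X_train, y_train)             # fit a support vector classifier
print("Test accuracy:", clf.score(X_test, y_test))       # typically around 0.98-0.99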

3. Automatic Sequence Processing- Automatic speech recognition (ASR) is the use of computer hardware and software-based techniques to identify and process human voice. It is used to identify the words a person has spoken or to authenticate the identity of the person speaking into the system.

Automatic speech recognition is also known as automatic voice recognition (AVR), voice-to-text or simply speech recognition.

Automatic speech recognition is primarily used to convert spoken words into computer text. Additionally, automatic speech recognition is used for authenticating users via their voice (biometric authentication) and performing an action based on the instructions defined by the human. Typically, automatic speech recognition requires preconfigured or saved voices of the primary user(s). The human needs to train the automatic speech recognition system by storing their speech patterns and vocabulary in the system.

Q3 A) "Justify the statement: Raw data has a significant impact on feature engineering process.

Ans- Better features means flexibility.

You can choose “the wrong models” (less than optimal) and still get good results. Most models can pick up on good structure in data. The flexibility of good features will allow you to use less complex models that are faster to run, easier to understand and easier to maintain. This is very desirable.

Better features means simpler models.

With well engineered features, you can choose “the wrong parameters” (less than optimal) and still get good results, for much the same reasons. You do not need to work as hard to pick the right models and the most optimized parameters.

With good features, you are closer to the underlying problem and a representation of all the data you have available and could use to best characterize that underlying problem.

Some observations are far too voluminous in their raw state to be modeled by predictive modeling algorithms directly.

Common examples include image, audio, and textual data, but could just as easily include tabular data with millions of attributes.

Feature extraction is a process of automatically reducing the dimensionality of these types of observations into a much smaller set that can be modelled.

For tabular data, this might include projection methods like Principal Component Analysis and unsupervised clustering methods. For image data, this might include line or edge detection. Depending on the domain, image, video and audio observations lend themselves to many of the same types of DSP methods.

Key to feature extraction is that the methods are automatic (although may need to be designed and constructed from simpler methods) and solve the problem of unmanageably high dimensional data, most typically used for analog observations stored in digital formats.
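A short sketch of automatic feature extraction with Principal Component Analysis, assuming the scikit-learn digits data purely for illustration; 64 raw pixel attributes are reduced to a 10-dimensional feature set:

# Dimensionality reduction: project 64 raw pixel features onto 10 principal components.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                      # shape (1797, 64): raw, high-dimensional observations
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)            # shape (1797, 10): compact feature set for modelling

print(X_reduced.shape)
print("Fraction of variance retained:", pca.explained_variance_ratio_.sum())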

Q3 B) Justify the Statement: Feature engineering can substantially boost machine learning model performance

Ans- Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

Although your model’s performance depends on several factors (the data and features prepared, the models used in training, the problem statement, the metrics used to measure success, etc.), great features still play a crucial part in determining the success of a model.

Great features give you more room when it comes to model selection. You could choose a simpler model and still obtain good results, as your data is now more representative and the less complex model can learn the underlying pattern easily.

At the end of the day, feature engineering boils down to problem representation. If your data has great features that represent the problem well, chances are your model will give better results as it has learned the pattern well.

Q4 A) Justify the Statement: Feature Engineering play an important role for data scaling and normalization tasks

Ans- It is true that preprocessing in machine learning is somewhat of a black art. It is not often written down in papers why several preprocessing steps are essential to make a method work, and I am also not sure it is understood in every case. To make things more complicated, it depends heavily on the method you use and also on the problem domain.

Some methods e.g. are affine transformation invariant. If you have a neural network and just apply an affine transformation to your data, the network does not lose or gain anything in theory. In practice, however, a neural network works best if the inputs are centered and white. That means that their covariance is diagonal and the mean is the zero vector. Why does it improve things? It is only because the optimisation of the neural net works more gracefully, since the hidden activation functions don't saturate that fast and thus do not give you near zero gradients early on in learning.

Other methods, e.g. K-Means, might give you totally different solutions depending on the preprocessing. This is because an affine transformation implies a change in the metric space: the Euclidean distance between two samples will be different after that transformation.

At the end of the day, you want to understand what you are doing to the data. E.g. whitening in computer vision and sample wise normalization is something that the human brain does as well in its vision pipeline.

When you collect data and extract features, many times the data is collected on different scales. For example, the age of employees in a company may be between 21 and 70 years, the size of the house they live in may be 500 to 5000 sq. feet, and their salaries may range from 30,000 to 80,000. In this situation, if you use a simple Euclidean metric, the age feature will not play any role because it is several orders of magnitude smaller than the other features. However, it may contain some important information that may be useful for the task. Here, you may want to normalize the features independently to the same scale, say [0, 1], so they contribute equally while computing the distance.

There are two separate issues:

a) Learning the right function, e.g. k-means: the input scale basically specifies the similarity, so the clusters found depend on the scaling. Similarly, regularisation (e.g. L2 weight regularisation) assumes each weight should be "equally small"; if your data are not scaled appropriately, this will not be the case.

b) Optimization, namely by gradient descent (e.g. most neural networks). For gradient descent, you need to choose the learning rate, but a good learning rate (at least on the first hidden layer) depends on the input scaling: small [relevant] inputs will typically require larger weights, so you would like a larger learning rate for those weights (to get there faster), and vice versa for large inputs. Since you only want to use a single learning rate, you rescale your inputs (and whitening, i.e. decorrelating, is also important for the same reason).
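A small sketch of the age/house-size/salary example above, assuming made-up values; min-max normalization maps every feature independently to [0, 1] so each contributes comparably to a Euclidean distance:

# Rescale features with very different ranges so each contributes comparably to distances.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# columns: age (years), house size (sq ft), salary -- hypothetical employees
X = np.array([[25, 900, 35000],
              [42, 2400, 60000],
              [67, 4800, 78000]], dtype=float)

X_scaled = MinMaxScaler().fit_transform(X)   # each column mapped independently to [0, 1]
print(X_scaled)

# Without scaling, the Euclidean distance is dominated by the salary column:
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(X_scaled[0] - X_scaled[1]))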

Q4 B) Justify the Statement: Feature selector removes all low-variance features

Ans- scikit-learn provides a feature selector, VarianceThreshold, that removes all low-variance features.

This feature selection algorithm looks only at the features (X), not the desired outputs (y), and can thus be used for unsupervised learning.

VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose variance doesn’t meet some threshold. By default, it removes all zero-variance features, i.e. features that have the same value in all samples.

As an example, suppose that we have a dataset with boolean features, and we want to remove all features that are either one or zero (on or off) in more than 80% of the samples. Boolean features are Bernoulli random variables, and the variance of such variables is given by Var[X] = p(1 - p), so we can select using the threshold .8 * (1 - .8):

>>> from sklearn.feature_selection import VarianceThreshold

>>> X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]

>>> sel = VarianceThreshold(threshold=(.8 * (1 - .8)))

>>> sel.fit_transform(X)

array([[0, 1],
       [1, 0],
       [0, 0],
       [1, 1],
       [1, 0],
       [1, 1]])

Q5 A) Explain Logistic Regression with suitable example.

Ans-

[pic]

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). 

• It is used to predict a binary outcome (1 / 0, Yes / No, True / False) given a set of independent variables. To represent binary / categorical outcome, we use dummy variables. 

• Like all regression analyses, the logistic regression is a predictive analysis.  Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.

• Logistic regression can be considered as a special case of linear regression where the outcome variable is categorical and the log of odds is used as the dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function.

• The fundamental equation of generalized linear model is: g(E(y)) = α + βx1 + γx2

• Here, g() is the link function, E(y) is the expectation of target variable and α + βx1 + γx2 is the linear predictor ( α,β,γ to be predicted).

• The role of link function is to ‘link’ the expectation of y to linear predictor.

• We are provided with a sample of 1000 customers. We need to predict the probability that a customer will buy (y) a particular magazine or not. Since we have a categorical outcome variable, we will use logistic regression.

• g(y) = βo + β(Age)         ---- (a)

• Here, ‘Age’ is considered as the independent variable.

• In logistic regression, we are only concerned about the probability of outcome dependent variable ( success or failure). As described above, g() is the link function. This function is established using two things: Probability of Success(p) and Probability of Failure(1-p).

• p should meet following criteria:

• It must always be positive (since p >= 0)

• It must always be less than or equal to 1 (since p <= 1).
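A minimal sketch of the magazine example, assuming synthetic (Age, buy) data generated purely for illustration; scikit-learn's LogisticRegression estimates the coefficients of equation (a), and predict_proba returns the modelled probability of success p:

# Logistic regression: predict the probability of buying the magazine from Age alone.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
age = rng.randint(18, 70, size=1000).reshape(-1, 1)      # synthetic ages
# synthetic ground truth: older customers are more likely to buy
buy = (rng.rand(1000) < 1 / (1 + np.exp(-(age.ravel() - 40) / 10))).astype(int)

model = LogisticRegression(max_iter=1000).fit(age, buy)
print(model.intercept_, model.coef_)                     # estimates of beta_0 and beta(Age)
print(model.predict_proba([[55]])[0, 1])                 # P(buy | Age = 55), between 0 and 1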

Q6 A) Explain Isotonic Regression with suitable example.

Ans- A toy dataset is built first with NumPy:

>>> import numpy as np

>>> X = np.arange(-5, 5, 0.1)

>>> Y = X + np.random.uniform(-0.5, 1, size=X.shape)

The following is a plot of the dataset. As you can see, it can be easily modeled by a linear regressor, but without a highly non-linear function it is very difficult to capture the slight (and local) modifications in the slope:

• The class IsotonicRegression needs to know ymin and ymax (which correspond to the variables y0 and yn in the loss function). In this case, we impose -6 and 10:

>>> from sklearn.isotonic import IsotonicRegression

>>> ir = IsotonicRegression(-6, 10)

>>> Yi = ir.fit_transform(X, Y)

• The result is provided through three instance variables:

>>> ir.X_min_

-5.0

>>> ir.X_max_

4.8999999999999648

>>> ir.f_

• The last one, (ir.f_), is an interpolating function which can be evaluated in the domain [xmin, xmax]. For example:

>>> ir.f_(2)
array(1.7294334618146134)

[pic]

Q6 B) Explain the following term with Examples.

1. Correlation matrix 2. Variance Inflation Factor (VIF)

Ans-

A correlation matrix is a table showing correlation coefficients between variables. Each cell in the table shows the correlation between two variables. A correlation matrix is used to summarize data, as an input into a more advanced analysis, and as a diagnostic for advanced analyses.

Applications of a correlation matrix

There are three broad reasons for computing a correlation matrix:

To summarize a large amount of data where the goal is to see patterns. For example, the observable pattern might be that all the variables highly correlate with each other.

To input into other analyses. For example, people commonly use correlation matrices as inputs for exploratory factor analysis, confirmatory factor analysis, structural equation models, and linear regression when excluding missing values pairwise.

As a diagnostic when checking other analyses. For example, with linear regression, a high correlation among the predictors suggests that the linear regression’s estimates will be unreliable.
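A brief sketch computing a correlation matrix with pandas on a few hypothetical columns:

# Correlation matrix: pairwise Pearson correlation coefficients between columns.
import pandas as pd

df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 60, 68, 77, 85],
    "shoe":   [36, 38, 40, 43, 45],
})

print(df.corr())   # 3x3 table; each cell is the correlation between two variables (diagonal = 1.0)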

2. Variance Inflation Factor (VIF)- A variance inflation factor (VIF) detects multicollinearity in regression analysis. Multicollinearity is when there is correlation between predictors (i.e. independent variables) in a model; its presence can adversely affect your regression results. The VIF estimates how much the variance of a regression coefficient is inflated due to multicollinearity in the model.

VIFs are usually calculated by software, as part of regression analysis. You’ll see a VIF column as part of the output. VIFs are calculated by taking a predictor, and regressing it against every other predictor in the model. This gives you the R-squared values, which can then be plugged into the VIF formula. “i” is the predictor you’re looking at (e.g. x1 or x2):

VIF_i = 1 / (1 - R_i²)
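A short sketch of the procedure described above, assuming synthetic predictors: each predictor is regressed on the others, and the resulting R-squared is plugged into VIF_i = 1 / (1 - R_i²). (statsmodels also offers a ready-made variance_inflation_factor helper.)

# VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing predictor i on the others.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.5, size=200)     # deliberately correlated with x1
x3 = rng.normal(size=200)                         # independent predictor
X = np.column_stack([x1, x2, x3])

for i in range(X.shape[1]):
    others = np.delete(X, i, axis=1)              # all predictors except predictor i
    r2 = LinearRegression().fit(others, X[:, i]).score(others, X[:, i])
    print(f"VIF for x{i + 1}: {1.0 / (1.0 - r2):.2f}")   # large values flag multicollinearity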

****************** THE END **************************
