10 Minutes to Build Your First Solution for the Azure Machine Learning Competition: Iris Multiclass Classification

Introduction

The primary goal of this tutorial competition is to help you get familiar with the Cortana Intelligence Competition platform and build your first solution for an Azure Machine Learning (AzureML) Competition using the well-known iris data. This is a tutorial competition: there is no award, and its purpose is simply to help you get familiar with the AzureML Competition Platform and AzureML Studio.

This is a multiclass classification task. The task in this competition is to predict the type of iris plant using four attributes: sepal length, sepal width, petal length, and petal width. We randomly split the entire data set into training and testing data, stratified on the target column, class. The training data contains 3 classes with 30 instances each; the three classes are Setosa, Versicolour, and Virginica. The competition takes place on Microsoft's Azure platform using its built-in and customizable machine learning tools.

Azure Machine Learning (AML) is a cloud-based tool that enables data scientists and big data professionals to build and operationalize predictive analytics solutions with ease. The following tutorial is a possible starting point for those who are interested in participating in this competition and learning about the diverse capabilities of AML. You can use it as a baseline and start building upon it in the rich AML Studio environment by dragging and dropping an extensive set of available algorithms or your own custom R and Python scripts.

Overview of the Sample Solution in this Tutorial

This tutorial provides instructions on how you can build a qualified solution for this competition in less than 10 minutes, based on a sample training experiment we provide. With only a few clicks and drag-and-drop actions, your first entry for this competition can be submitted, and you will see yourself on the leaderboard with a reasonable score. Below are the graphs of the sample training experiment and of the predictive experiment built from it.

Training experiment: Tutorial Competition: Iris Multiclass Classification
Predictive experiment: Tutorial Competition: Iris Multiclass Classification [Predictive Exp.]

Requirements on the Web Service API Input and Output Schema

The input schema of the web service API HAS TO BE the same as the schema of the training data. The input data schema can be found in the data description section.

The output schema of the web service API HAS TO BE as follows:

Column Index    Column Name      Data Type
1               ID               Numeric
2               Scored Labels    String

Make sure that the order of the columns, the column names, and the data types in your web service API output schema are the same as above. Otherwise, you will not be able to submit, or you will not get the score you might expect.
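For a concrete picture of a conforming output, here is a minimal sketch of the required two-column table built with pandas. It is illustrative only and is not part of the sample experiment: the exact label strings must match the values of the class column in the training data, and the three names used below are assumptions.

    import pandas as pd

    # A conforming web service output has exactly two columns, in this order:
    # a numeric ID column and a string Scored Labels column with the predicted class.
    scored_output = pd.DataFrame({
        "ID": [1, 2, 3],  # numeric identifier carried over from the input rows
        "Scored Labels": ["Setosa", "Versicolour", "Virginica"],  # assumed label strings
    })
    print(scored_output.dtypes)  # ID should be numeric; Scored Labels should be a string (object) column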
Five Steps to Build Your First Solution in 10 Minutes

Here are the five steps you can take to build your first machine learning solution for this competition in 10 minutes.

Step 1. Sign in to Azure Machine Learning
Step 2. Enter the Competition
Step 3. Run the Sample Training Experiment
Step 4. Build and Run a Predictive Experiment, Publish and Submit the Web Service API for Evaluation
Step 5. View Your Ranking on the Public Leaderboard

For more details on how the sample training experiment is built, and on how you can build a new solution on AzureML for this competition, refer to the deep dive and instructions in the Appendix:

How the Sample Training Experiment Is Built: A Deep Dive
How to Create a New Experiment for the Competition

Step 1. Sign in to Azure Machine Learning

Step 1.1. Open the Azure Machine Learning web page in any browser. Then click "Get started now".

Step 1.2. Sign in to AzureML Studio. You will be directed to the Microsoft sign-in page. If you already have a Microsoft account, an Office 365 account, or any valid Windows Azure Active Directory account, you can sign in directly. Otherwise, click the "Sign up now" link (the link in the green box) to sign up for a Microsoft account.

Note: After you make a successful submission to the competition, the public leaderboard will show the name associated with the Microsoft account you use to sign in. If you prefer to remain anonymous, change the name associated with this account before you submit.

Step 2. Enter the Competition

Look for this competition in the AzureML Gallery. To enter the competition, click Gallery at the top of your Studio page and you will be directed to the Cortana Analytics Gallery. Click Competitions and you will find Tutorial Competition: Iris Multiclass Classification on the Machine Learning Competitions page.

Visit the information page of the competition. Click Tutorial Competition: Iris Multiclass Classification to see information about this competition, such as the summary, description, data files, rules, prizes, and leaderboard.

Enter the competition. Click Enter competition to copy the sample training experiment to your AzureML workspace.

Step 3. Run the Sample Training Experiment

After the sample training experiment Tutorial Competition: Iris Multiclass Classification is copied to your AzureML workspace, make sure that you select the South Central US region. You can name your experiment at this stage. Click the Run button at the bottom of the page and the sample experiment will start running; it should take around 1 minute to complete. After the training experiment finishes, right-click the output of the Evaluate Model module and select Visualize to see the performance of your model on both the training and validation data sets. In this competition we use Overall Accuracy as the evaluation metric to rank participants on the leaderboard.

Step 4. Build and Run the Predictive Experiment, Publish, and Submit the Web Service API for Evaluation

4.1. Create the predictive experiment automatically. After the sample training experiment completes successfully, click SET UP WEB SERVICE at the bottom of the page, then click Predictive Web Service. The program automatically generates a predictive experiment that uses the model trained in the sample training experiment to make predictions.

4.2. Slightly modify the predictive experiment. The web service output schema should have only two columns, ID and Scored Labels, so you need to make an adjustment here. You can add an Apply SQL Transformation module to the automatically generated predictive experiment to build a web service API with the qualified output data schema. Please follow the detailed steps below.

You can find modules on the left side of the Studio. To locate a module faster, type its name in the Search experiment items field. Then, from the search results, drag and drop the module you need into your experiment.

Replace the SQL Query Script in the Apply SQL Transformation module with the following script:

    select ID, "Scored Labels" from t1;

Checklist before you proceed:
□ The output portal of the Score Model module is connected to the first input portal of the Apply SQL Transformation module.
□ The Web Service Output is connected to the output portal of the Apply SQL Transformation module.
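If you prefer Python to SQL, an Execute Python Script module can produce the same two-column output. The provided sample uses Apply SQL Transformation, so this is purely an optional alternative; the sketch below assumes the module's standard azureml_main entry point and that the output of Score Model is connected to the module's first input port.

    # Entry point expected by the Execute Python Script module in AzureML Studio (classic).
    # dataframe1 is the data set connected to the module's first input port
    # (here, the scored data from Score Model).
    def azureml_main(dataframe1=None, dataframe2=None):
        # Keep only the two columns required by the competition's output schema.
        scored = dataframe1[["ID", "Scored Labels"]]
        # The module expects a sequence of data frames as output.
        return scored,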
4.3. Deploy the web service API. RUN the predictive experiment, and DEPLOY WEB SERVICE after it completes. Click the RUN button at the bottom of the page; the predictive experiment should complete in less than 1 minute. Then click DEPLOY WEB SERVICE to generate your web service API.

4.4. Submit your web service API for evaluation. After Deploy Web Service is done, click SUBMIT COMPETITION ENTRY at the bottom of the page to submit your web service API and have it evaluated on the testing data. You have to agree to the terms during the submission process. You also have the opportunity to give your submission a customized name, which can be very helpful as a reminder of the features and/or models you used in that solution.

Step 5. View Your Ranking on the Public Leaderboard

After your web service API is successfully evaluated on the public testing data, you will see a green check mark in the bottom-left corner of the page indicating that your solution has been successfully evaluated. Click VIEW COMPETITION SUBMISSION IN GALLERY to be redirected to the competition page in the Gallery, where you can see your submission history. It may take about 1 or 2 minutes for your score on the public test data to be returned from the evaluation process. After that, you will be able to see your current ranking on the public leaderboard.

Appendix

How the Sample Training Experiment Is Built: A Deep Dive

This sample training experiment illustrates some typical steps in creating an end-to-end pipeline for a machine learning task. It consists of the following steps:

Ingesting and visualizing the raw data
Excluding ID from the feature set
Splitting the data into a training set and a validation set
Training a predictive model
Scoring and evaluating a trained model

Ingesting and visualizing the raw data

In the sample training experiment, the training data is read from a web URL in tabular CSV format using the Reader module. You can visualize the data by simply right-clicking the output portal of the Reader module and selecting Visualize. After the visualization window pops up, you can select any column; the statistics and the histogram of the selected variable will then be shown in the right panel.

Excluding ID from the feature set

We use a Metadata Editor module to clear the ID variable from the feature set. We do not delete this column, because we still need it in the web service API output.

Splitting the data into a training set and a validation set

To train and validate a model, we split the data into training and validation sets using the Split module. In this sample experiment, we use stratified sampling to split the data randomly into 60% and 40%: the 60% portion is output from the first output portal and used as the training data, and the remaining 40% is output from the second output portal and used as the validation data.
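The same kind of 60/40 stratified split can be reproduced locally, for example with scikit-learn. This is only a minimal local illustration of what the Split module does, not part of the studio experiment; it uses the classic iris data set as a stand-in for the competition training data, which is read inside AzureML Studio.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    # Stand-in for the competition data: four iris features and a class label.
    X, y = load_iris(return_X_y=True)

    # 60% training / 40% validation, stratified on the target so each split
    # keeps the same class proportions as the full data set.
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, train_size=0.6, stratify=y, random_state=0
    )
    print(X_train.shape, X_valid.shape)  # (90, 4) (60, 4)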
Training a predictive model

Now, train the model using the training data. We first add the Multiclass Logistic Regression module from the Machine Learning -> Initialize Model -> Classification menu to the experiment. In this sample training experiment, we train a multiclass logistic regression model in the interest of simplicity, but you can explore other available models or create your own R models through Create R Model by following an example in the Cortana Analytics Gallery.

The Train Model module, found in the Machine Learning -> Train menu, carries out the actual training of the logistic regression model. This module takes two inputs: the left input portal takes the model specification, and the right input portal takes the training data. The Train Model module must be told which column is the label. Here, indicate that the class column is the target column through the column selector dialog in the Properties box.

Scoring and evaluating a trained model

After the model is trained, you can use it to predict on the training and validation data sets. The Score Model module in the Machine Learning -> Score menu accomplishes this task. It takes a trained model in its left input port and the data set to be predicted in its right input port. In the sample experiment, the logistic regression model is scored against the training set in the left Score Model module and against the validation set in the right one. Visualizing the output data set from the right Score Model module shows several scoring columns appended to the end of the original data set.

The scored data sets can now be evaluated using the Evaluate Model module in the Machine Learning -> Evaluate menu. This module takes one or two scored data sets as input. Right-click the output portal of the Evaluate Model module and select Visualize to see the model's performance on both the training and validation data sets. Evaluation, in this case, displays several metrics related to the accuracy of the model's predictions, as well as a graphical representation of the confusion matrices associated with the prediction task, for the training and validation sets respectively. In this competition, we use Overall Accuracy as the performance metric to rank participants on the leaderboard.

Comparing the prediction performance of the model on the training and validation data helps determine whether the model is over-fitting the training data. If the validation performance is much worse than the training performance, it might indicate that your model is over-fitted to the training data. (A compact local sketch of this train, score, and evaluate flow is given at the end of this Appendix.)

How to Create a New Experiment for the Competition

If you want to create a new training experiment, click Save and then Save As to save the sample training experiment as a new one, and make further edits (feature engineering, trying different models, etc.) on it. DO NOT directly click NEW to create a new experiment, because experiments created this way will not be recognized as competition experiments, and web service APIs created from such training experiments will not have the button to submit for evaluation.

After the new training experiment is created, you can follow the steps above from Step 4. Build and Run the Predictive Experiment, Publish, and Submit the Web Service API for Evaluation.
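As referenced in the evaluation discussion above, here is a compact local sketch of the deep-dive pipeline: a stratified 60/40 split, a multiclass logistic regression model, and evaluation with overall accuracy and confusion matrices on both splits. It uses scikit-learn and the classic iris data set as a stand-in for the competition data; none of the names below come from the AzureML Studio experiment itself.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, confusion_matrix
    from sklearn.model_selection import train_test_split

    # Stand-in data and the same 60/40 stratified split as in the deep dive.
    X, y = load_iris(return_X_y=True)
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, train_size=0.6, stratify=y, random_state=0
    )

    # Multiclass logistic regression, the same model family as the sample experiment.
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Score both splits and report the leaderboard metric (overall accuracy)
    # plus confusion matrices, mirroring the Evaluate Model visualization.
    for name, X_part, y_part in [("training", X_train, y_train), ("validation", X_valid, y_valid)]:
        pred = model.predict(X_part)
        print(name, "overall accuracy:", accuracy_score(y_part, pred))
        print(confusion_matrix(y_part, pred))

A validation accuracy much lower than the training accuracy would be the over-fitting signal described above.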