Build Your First App - Advanced Model Building



Overview

This tutorial is a continuation of the Build Your First App guide (9.0 Advanced Model Building). It is suggested that you read the Build Your First App user guide before going through this tutorial.

This tutorial is good for two scenarios:

- You are experienced with machine learning and want to create your own Knowledge Pack with customized algorithms
- You already generated a Knowledge Pack using the Dashboard widget and want to find out how you can tweak the underlying features of the Knowledge Pack even further

Prerequisites: You should have already uploaded the Quick Start project named 'Activity Demo' through the Data Capture Lab (DCL).

The goal of this tutorial is to give insight into the more advanced features available for building a custom algorithm for a Knowledge Pack without the Dashboard. For an even deeper look at the concepts in this guide, see our full documentation under Help => Documentation.

There are three main steps to building a SensiML Knowledge Pack:

- Query your data
- Transform the data into a feature vector
- Build the model to fit on the sensor device

Jupyter Notebooks

The Analytics Studio is a tool based on Jupyter notebooks. If you have not used Jupyter notebooks before, the following keyboard shortcuts will be useful.

- Execute a cell - Shift + Enter
- Auto-complete - Press Tab at any time while typing a function/command and the Analytics Studio will show you all available options

Loading your project

First you need to load the project you created through the Data Capture Lab. In our example it is called 'Activity Demo'.

In [ ]:
import sys
%matplotlib inline
from sensiml import SensiML
from sensiml.widgets import *

dsk = SensiML()

In [ ]:
dsk.project = 'Activity Demo'

The next step is to initialize a pipeline space to work in. A pipeline includes each step that you perform on the data to build a SensiML Knowledge Pack. The work you do in the pipeline is stored in SensiML Cloud so that you can share pipelines with team members and come back to stored work in the future. Add a pipeline to the project using the following code snippet: dsk.pipeline = "Name of your pipeline"

In [ ]:
dsk.pipeline = "Activity Demo Pipeline"

Query your data

To select all of the data you labeled through the Data Capture Lab, you need to add a query step to your pipeline. We provide a query widget to make this step easier. To load the query widget, use the command below:

In [ ]:
QueryWidget(dsk).create_widget()

Use the query widget to enter the following parameters:

- Query Name: My First Query
- Segmenter: Manual
- Label: Activity
- Metadata: Subject
- Sources: (Hold shift and select all)

Once you are done, click Add Query.
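If you prefer to define the query in code rather than through the widget, the SDK also provides a create_query call. The cell below is only a sketch: the exact parameter and column names are assumptions and can vary between SDK versions, so check dsk.create_query? in your environment before relying on it. Either way, make sure the query is named My First Query, since the pipeline step below refers to it by name.

In [ ]:
# Sketch: a programmatic alternative to the QueryWidget above.
# Parameter and column names are assumptions - verify them with dsk.create_query?
dsk.create_query(
    name="My First Query",
    label_column="Activity",
    metadata_columns=["Subject"],
    columns=["GyroscopeX", "GyroscopeY", "GyroscopeZ",
             "AccelerometerX", "AccelerometerY", "AccelerometerZ"],
    segmenter="Manual",
)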
Building a pipeline

Throughout this notebook we will add multiple steps to transform the data in a pipeline. Note: No work is done on the data until you execute the pipeline, i.e., dsk.pipeline.execute().

The main steps of a pipeline are:

- Query
- Feature Engineering
- Model Generation

It is important that you add the steps in the right order. If you accidentally add them in the wrong order, or want to restart, simply enter the command dsk.pipeline.reset().

Adding your Query step

Let's add the query step that you created above. Use the command below:

In [ ]:
dsk.pipeline.reset()
dsk.pipeline.set_input_query('My First Query')

Pipeline progress

To see the current steps in your pipeline, enter the command:

In [ ]:
dsk.pipeline.describe()

Segmentation and Feature engineering

The segmentation and feature engineering part of the pipeline transforms data streams into the feature vectors that are used to train a model (SensiML Knowledge Pack). This is where we get into the more advanced machine learning part of the Analytics Studio. It is okay if you do not understand everything right away; we are going to walk through some examples of good features for the periodic event use case and give you the tools to explore more features.

The features in the feature vector must be integers between 0 and 255. The feature vector can be any length, but in practice you will be limited by the space on the device.

SensiML Core Functions

The Analytics Studio provides a way to define a pipeline for feature vector and model building. The feature vector generation part of the pipeline includes over 100 core functions that can be split into a few different types:

- Sensor transforms - applied to the data directly as it comes off the sensor; they can be smoothing functions, the magnitude of sensor columns, etc.
- Segmentation - the segmenter selects regions of interest from the streaming data. This can be an event if you are using an event detection segmenter, or simply a sliding window which buffers a segment of data and sends it to the next step.
- Segment transforms - operate on a segment of data, typically normalizing it in some way, such as demeaning, to prepare for feature vector generation.
- Feature generators - algorithms to extract relevant feature vectors from the data streams in preparation for model building.
- Feature transforms - normalize all of the features in the feature vector to between 0 and 255.
- Feature selectors - remove features which do not help discriminate between different classes.

The Analytics Studio allows you to string together a pipeline composed of these individual steps. The pipeline is sent to our servers, where we can take advantage of optimizations to speed up the pipeline processing.

Adding a basic core function

Next we're going to add one core function and explain how to work with other core functions. A core function that is often useful for normalizing data is the magnitude sensor transform. Add a Magnitude sensor transform using the command below:

In [ ]:
dsk.pipeline.add_transform("Magnitude", params={"input_columns": ['GyroscopeX','GyroscopeY', 'GyroscopeZ']})
dsk.pipeline.describe()

If you want to see specific documentation about any of the Analytics Studio commands, add a ? to the end of the command.

In [ ]:
dsk.pipeline.add_transform?
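The same transform can be applied to other sensor channels. As a sketch, the cell below adds a second Magnitude transform over the accelerometer columns; the column names AccelerometerX/Y/Z are assumptions here, so substitute whichever accelerometer column names your project actually uses. This step is optional - the rest of this tutorial only uses the gyroscope magnitude - and, as described in the Pipeline Execution section below, it would show up in the output as a second column, Magnitude_ST_0001.

In [ ]:
# Optional sketch: a second Magnitude transform over the accelerometer channels.
# Column names are assumed - match them to the columns in your project.
dsk.pipeline.add_transform("Magnitude", params={"input_columns": ['AccelerometerX', 'AccelerometerY', 'AccelerometerZ']})
dsk.pipeline.describe()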
Exploring core functions

The magnitude sensor transform is just one of over 100 core functions that the Analytics Studio provides. To see a list of the available core functions, use the following command:

In [ ]:
dsk.list_functions()

To get the documentation for any of the functions, use the command:

In [ ]:
dsk.function_description('Magnitude')

To get the function parameters, use the following command:

In [ ]:
dsk.function_help('Magnitude')

Function snippets

The Analytics Studio also includes function snippets that will auto-generate the function parameters for you. To use a snippet, execute the command dsk.snippets.Transform.Magnitude(). To see snippets in action, go ahead and execute the cell below:

In [ ]:
dsk.snippets.Transform.Magnitude()

Pipeline Execution

When executing the pipeline, there will always be two results returned. Take a look at the next cell. The first variable, magnitude_data, will be the actual data. The second variable, stats, will contain information about the pipeline execution on the server.

In [ ]:
magnitude_data, stats = dsk.pipeline.execute()

Explore the returned magnitude_data using the command below. Notice that an additional column is added to the dataframe - Magnitude_ST_0000. The subscripts refer to this being a sensor transform (ST) and being the first one added (0000). If you were to add another sensor transform, for example taking the magnitude of the accelerometer data as well, you would get another column, Magnitude_ST_0001.

In [ ]:
magnitude_data.head()

Performing Segmentation

The next step is to segment our data into windows on which we can perform recognition. For periodic events we want to use the Windowing transform. Go ahead and look at the function description. Delta is the sliding window overlap. Setting delta to the same value as the window size means that there is no overlap in our segmented windows.

In [ ]:
dsk.pipeline.add_transform("Windowing", params={"window_size": 300, "delta": 300,})
dsk.pipeline.describe(show_params=True)

Different window sizes can lead to better models. For this project let's reduce the window_size and delta to 200. For this data set that window size corresponds to 2 seconds, as our data was recorded at 100 Hz. Go ahead and change the values in the Windowing segmenter and re-execute. You will see the parameters of the windowing segmenter change, but a new step shouldn't be added.

In [ ]:
dsk.pipeline.add_transform("Windowing", params={"window_size": 200, "delta": 200,})
dsk.pipeline.describe(show_params=True)

Segmentation Filters

It is often good practice to pair a windowing segmentation algorithm with a filter. This adds a step to the pipeline that drops segments, saving battery life on the device by ignoring segments that don't contain useful information. For this pipeline we want to use the MSE Filter transform (Mean Squared Error). Go ahead and add this step to the pipeline.

In [ ]:
dsk.pipeline.add_transform("MSE Filter", params={"input_column": 'Magnitude_ST_0000', "MSE_target": -1.0, "MSE_threshold": 0.01})

After adding the MSE filter, execute the pipeline.

In [ ]:
mse_data, stats = dsk.pipeline.execute()

Feature Vector Generation

At this point we are ready to generate a feature vector from our segments. Feature generators are algorithms that extract relevant feature vectors from the data streams in preparation for model building. They range from simple features such as the mean up to more complex features such as the Fourier transform.

Feature generators are all added into a single step and run in parallel against the same input data. Let's add two feature generators now:

In [ ]:
dsk.pipeline.add_feature_generator(["Mean", 'Standard Deviation'], function_defaults = {"columns":[u'Magnitude_ST_0000']})

We have added two feature generators from the subtype "Statistical". The more features you have, the better chance you have of building a successful model. Let's try adding a few more feature generators of the same subtype. Call dsk.list_functions() to find more feature generators of the same type, for example as sketched below.
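One way to browse for candidates is to filter the output of dsk.list_functions(). The cell below is only a sketch: it assumes the call returns a pandas DataFrame with TYPE and SUBTYPE columns, which may not hold in every SDK version; if yours only prints the table, simply scan the printed output for feature generators with the Statistical subtype instead.

In [ ]:
# Sketch: browse feature generators by subtype.
# Assumes list_functions() returns a DataFrame with TYPE and SUBTYPE columns;
# if it does not in your SDK version, read the printed table instead.
functions = dsk.list_functions()
statistical_generators = functions[(functions['TYPE'] == 'Feature Generator') &
                                   (functions['SUBTYPE'] == 'Statistical')]
statistical_generators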
Now add the expanded set of feature generators:

In [ ]:
dsk.pipeline.add_feature_generator(["Mean", 'Standard Deviation', 'Sum', '25th Percentile'], function_defaults = {"columns":[u'Magnitude_ST_0000']})

Our classifiers are optimized for performance and memory usage to fit on resource-constrained devices. Because of this we scale the features in the feature vector to a single byte each, so we need to add the Min Max Scale transform to the pipeline. This function scales the features in the feature vector to values between 0 and 255.

In [ ]:
dsk.pipeline.add_transform('Min Max Scale')

In [ ]:
feature_vectors, stats = dsk.pipeline.execute()

Next let's take a look at the feature vectors that you have generated. We plot the average of all feature vectors grouped by Activity. Ideally, you are looking for feature vectors that are separable in space. How do the ones you've generated look?

In [ ]:
dsk.pipeline.visualize_features(feature_vectors)
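You can also look at the same per-class averages numerically with pandas. The sketch below assumes the label column in the returned DataFrame is named Activity, matching this project's query; check feature_vectors.columns and adjust the name if yours differs. Classes whose rows look similar here will be harder for the classifier to separate.

In [ ]:
# Sketch: numeric view of the per-class feature averages.
# Assumes the label column is named 'Activity'; inspect feature_vectors.columns if unsure.
feature_vectors.groupby('Activity').mean(numeric_only=True)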
Model Building - Creating a model to put onto a device

Model TVO description

train_validate_optimize (tvo): This step defines the model validation, the classifier, and the training algorithm used to build the model. With SensiML, the model is first trained using the selected training algorithm, then loaded into the hardware simulator (currently we only support pattern matching, but more algorithms will be added in the future) and tested using the specified validation method.

This pipeline uses the validation method "Stratified K-Fold Cross-Validation". This is a standard validation method used to test the performance of a model by splitting the data into k folds, training on k-1 folds and testing against the excluded fold. It then switches which fold is tested on, and repeats until all of the folds have been used as a test set. The average of the metrics for each model gives you a good estimate of how a model trained on the full data set will perform.

The training algorithm attempts to optimize the number of neurons and their locations in order to create the best model. We are using the training algorithm "Hierarchical Clustering with Neuron Optimization," which uses a clustering algorithm to optimize neuron placement in feature space.

The currently available classifier in the Analytics Studio is PME. PME has two classification modes, RBF and KNN, and two distance modes of calculation, L1 and LSUP. See the documentation for further descriptions of the classifier.

In [ ]:
dsk.pipeline.set_validation_method('Stratified K-Fold Cross-Validation', params={'number_of_folds':3,})
dsk.pipeline.set_classifier('PME', params={"classification_mode":'RBF','distance_mode':'L1'})
dsk.pipeline.set_training_algorithm('Hierarchical Clustering with Neuron Optimization', params = {'number_of_neurons':5})
dsk.pipeline.set_tvo({'validation_seed':2})

Go ahead and execute the full pipeline now.

In [ ]:
model_results, stats = dsk.pipeline.execute()

The model_results object returned after a TVO step contains a wealth of information about the models that were generated and their performance. A simple view is to use the summarize function to see the performance of our model.

In [ ]:
model_results.summarize()

Let's grab the fold with the best performing model to compare with our features.

In [ ]:
model = model_results.configurations[0].models[0]

The neurons are contained in model.neurons. Plot these over the feature vector plot that you created earlier. This step is often useful for debugging.

In [ ]:
import pandas as pd

dsk.pipeline.visualize_neuron_array(model, model_results.feature_vectors,
    pd.DataFrame(model.knowledgepack.feature_summary).Feature.values[-1],
    pd.DataFrame(model.knowledgepack.feature_summary).Feature.values[0:])

Go ahead and save the best model as a SensiML Knowledge Pack. Models that aren't saved will be lost when the cache is emptied.

In [ ]:
model.knowledgepack.save('MyFirstModel_KP')

The Power of SensiML Knowledge Packs

The most important objective of the Analytics Studio is to allow users to instantly turn their models into downloadable Knowledge Packs that can be flashed to devices to perform classification tasks.

Generate Knowledge Pack

Let's generate our Knowledge Pack. We have saved the Knowledge Pack with the name MyFirstModel_KP. Select it in the widget below, then select your target platform.

In [ ]:
DownloadWidget(dsk).create_widget()

Make sure to generate your Knowledge Pack with the same sample rate that you recorded your raw sensor data with, or else you may get unexpected results. For our Activity Demo this should be 100.

Set the following properties:

- HW Platform: Nordic Thingy 2.1
- Target OS: NordicSDK
- Format: Binary
- Sample Rate: 100
- Debug: False
- Test Data: None
- Output: BLE

To find out more about these properties, check out the Build Your First App guide (Generating a Knowledge Pack).

Flashing a Knowledge Pack (Nordic Thingy)

The FlashWidget makes flashing a Knowledge Pack to your device quick and easy, but it requires some one-time preparation steps detailed in the Flashing Firmware to Nordic Thingy guide. For step-by-step instructions for flashing a Knowledge Pack, see that user guide.

In [ ]:
FlashWidget(dsk).create_widget()

Flashing a Knowledge Pack (QuickAI)

Flashing a Knowledge Pack to a QuickAI device requires the QuickAI Flash Utility. For step-by-step instructions for flashing a Knowledge Pack, see the user guide QuickAI SensiML Quick Start Guide.

Final Steps: View your classification output on the Nordic Thingy

Now that you've flashed your Knowledge Pack to a device, let's check out the results! The easiest way to see the live event classification results of a Knowledge Pack running on your sensor is through the SensiML TestApp (PC or Android). Open the TestApp and connect to your device to see the output.

You can also connect your device to your PC and view the output over a serial connection. See the Flashing Firmware to Nordic Thingy guide for serial options; a rough sketch follows below.
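As an illustration of the serial option, the cell below uses the pyserial package (install it with pip install pyserial) to print whatever the device writes to its serial port. The port name and baud rate are assumptions - they depend on your OS and the firmware's serial configuration - so replace them with the values given in the Flashing Firmware to Nordic Thingy guide.

In [ ]:
# Sketch only: read classification output over a serial connection with pyserial.
# The port name and baud rate below are assumptions - use the values for your setup.
import serial

with serial.Serial(port="/dev/ttyACM0", baudrate=115200, timeout=1) as ser:
    for _ in range(100):  # read a limited number of lines, then stop
        line = ser.readline().decode("utf-8", errors="ignore").strip()
        if line:
            print(line)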