Literature review



IRootLab - A MATLAB toolbox for vibrational spectroscopy
Manual
Updated on 21/Jul/2017
This document is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Table of contents
1. Introduction
1.1. Overview
1.2. This manual
1.3. Conventions
1.3.1. Formatting styles
1.3.2. Terminology and abbreviations
1.3.3. Citing IRootLab
2. Setup
2.1. Installation
2.2. Starting IRootLab
2.2.1. Windows only
2.2.2. All platforms
2.2.3. System requirements
2.2.4. Platform-specific binaries
2.3. Learning resources
3. Quick start tutorial
3.1. Open and visualize data (Figure 2)
3.2. Simple analysis – differences between mean spectra (Figure 3)
3.3. Principal components analysis (PCA) (Figure 4)
4. Graphical user interfaces (GUIs)
4.1.1. Objtool
4.1.2. MATLAB Code generation with objtool
4.1.3. Supported dataset file types within objtool
4.2. Mergetool
5. Class library
6. Various topics
6.1. IRootLab setup file
6.2. MATLAB command sheet
6.3. Dataset from existing MATLAB variables
7. References

1. Introduction
1.1. Overview
IRootLab [1] is a framework for vibrational spectroscopy data analysis in MATLAB.
It provides pattern recognition, biomarker extraction, imaging, pre-processing, feature extraction and other methods directed to vibrational spectroscopy (Fourier-transform infrared (FTIR) and Raman).

The framework includes a class library and graphical user interfaces (GUIs) to import and analyse data (objtool, mergetool, sheload). The objtool GUI can also be used as a MATLAB code generator. A demonstration page can be opened by typing browse_demos.

IRootLab is Free/Libre and Open-Source software, released under the LGPL licence.
Official website:

IRootLab contains over 200 hierarchically organized object classes representing concepts, methods and algorithms. The two most important class branches are datasets and blocks. Data analyses are built by training and using blocks with datasets. "Block" is a very general concept encompassing pre-processing methods, classifiers, visualizations, reports, complex analysis sessions etc. Figure 1 illustrates different types of blocks with code examples using the IRootLab library.

Figure 1 – Illustration of different types of blocks with code examples using the IRootLab library. Diagram title: "Datasets and blocks". Panels: "Blocks that don't require training" (Dataset → Block → Modified dataset); "Visualization blocks" (Dataset → Visualization block → Figure); "Blocks that require training" (1 - Training stage: Training dataset → Block → Trained block; 2 - Use stage: Dataset (same or other) → Trained block → Modified dataset).

Code examples from Figure 1:

    % Rubberband baseline correction
    block = pre_bc_rubber();
    ds02 = block.use(ds01);

    % Class means visualization
    block = vis_means();
    figure;
    block.use(ds01);

    % PCA-LDA
    block = cascade_pcalda();
    block.blocks{1}.no_factors = 10; % PCA factors
    block = block.boot();
    block = block.train(ds01);
    ds02 = block.use(ds01);

1.2. This manual
This manual explains the structure of IRootLab and discusses some of the most important details about the toolbox. Basic MATLAB knowledge is expected from the reader (MATLAB current directory, workspace, how to specify a MATLAB vector).
Aspects of object-oriented programming (OOP) were concentrated in Section 5 and avoided elsewhere, to spare readers who are mostly interested in using the GUIs.

The "official" IRootLab documentation remains the source code itself and the Doxygen-generated documentation derived from it. However, the official documentation is meant for reference rather than for learning. This manual provides selected, concentrated and relevant information.

1.3. Conventions
1.3.1. Formatting styles
The following styles are used throughout this manual:
- name of a file
- MATLAB code
- MATLAB code
- Some option available in a graphical interface
- Note

1.3.2. Terminology and abbreviations
class: The word "class" may mean either a data class (pattern classification), such as "control", "cancer" etc., or an object class (object-oriented programming). The meaning should be clear from the context.
group = sample/patient/colony/specimen etc.: A "group" represents all spectra that must be kept together at all times if the dataset is split, because they all belong to the same sample/patient/colony/specimen etc.
observation = row: Refers to a row in a dataset matrix. May be a spectrum, but can also be an average spectrum, a set of PCA scores etc.
feature = data variable: Refers to a column in a dataset matrix. May be a wavenumber, a PCA factor etc.
no: "number of observations" = number of rows in a dataset matrix
nf: "number of features" = number of columns in a dataset matrix
AS: Analysis Session
SGS: Sub-dataset Generation Specification
FSG: Feature Subset Grader

1.3.3. Citing IRootLab
If IRootLab works well for you and you use it in your study and publication, please cite the following (and please mention the website in your manuscript):
[1] Trevisan, J., Angelov, P.P., Scott, A.D., Carmichael, P.L. & Martin, F.L. (2013) "IRootLab: a free and open-source MATLAB toolbox for vibrational biospectroscopy data analysis". Bioinformatics 29(8), 1095-1097. doi: 10.1093/bioinformatics/btt084.
[2] Martin, F.L., Kelly, J.G., Llabjani, V., Martin-Hirsch, P.L., Patel, I.I., Trevisan, J., Fullwood, N.J. & Walsh, M.J. (2010) "Distinguishing cell types or populations based on the computational analysis of their infrared spectra". Nature Protocols 5(11), 1748-1760. doi: 10.1038/nprot.2010.133.

2. Setup
This chapter explains how to get IRootLab running.

2.1. Installation
Download the most recent ZIP file from the official website and extract it to a directory of your choice.

2.2. Starting IRootLab
2.2.1. Windows only
1. Open Windows Explorer.
2. If you are starting a new project, create a directory for your project, e.g., […]/My Documents/brain_project.
3. Locate the directory created when the ZIP file was extracted.
4. Double-click on the file named startup_windows.
5. Change the MATLAB Current Folder to your own project directory.

2.2.2. All platforms
1. If you are starting a new project, use your file manager to create a directory for your project, e.g., […]/My Documents/brain_project.
2. Start MATLAB.
3. Change the MATLAB Current Folder to the directory containing the IRootLab library extracted from the ZIP file (this directory should contain a file called startup.m).
4. At the MATLAB command line, enter startup. You should see a list of directories being added to the path, then a welcome message.
5. Change the MATLAB Current Folder to your own project directory.

Note: Alternatively, you can use MATLAB "Set path..." (add with subdirectories). This saves you from having to run startup every time you start MATLAB.
However, make sure you remove the old directories from the MATLAB path when you download a new version of IRootLab.

2.2.3. System requirements
IRootLab runs under MATLAB, which is available for Windows, Linux and macOS. The oldest MATLAB version tested was R2007b.
MATLAB toolboxes:
- (optional) MATLAB Parallel Computing Toolbox (PCT)
- MATLAB Wavelet Toolbox, for wavelet de-noising

2.2.4. Platform-specific binaries
This information is only relevant if you intend to use SVM or the MySQL database.
- SVM classifier (LibSVM): LibSVM was successfully compiled for Windows 32-bit/64-bit and Linux 32-bit/64-bit.
- MySQL connector (mYm): mYm is currently compiled for Windows 32-bit and Linux 32-bit/64-bit. Linux 64-bit: libmysqlclient.so.16 and libmysqlclient.so.18.

2.3. Learning resources
- Tutorials and this manual: there are currently about 8 step-by-step analysis tutorials online, covering various topics.
- Demo files: type browse_demos
- Reference documentation: online (Doxygen-generated), or from the MATLAB command line with help2 <filename>
- Context-sensitive help from the IRootLab GUIs: press F1

3. Quick start tutorial
This simple tutorial aims to get you started with objtool, which is the main GUI. It uses Ketan's Brain ATR data [3], which is included in the toolbox.

3.1. Open and visualize data (Figure 2)
1. Run the steps in Section 2.2 to start IRootLab.
2. At the MATLAB command line, enter browse_demos.
3. Scroll down the page.
4. Click on the "LOAD_DATA_KETAN_ATR" hyperlink.
5. Click on the "objtool" hyperlink to launch objtool.
6. Click on Apply new blocks/more actions.
7. Click on vis.
8. Click on Class means.
9. Click on Create, train & use.
Figure 2 – Tutorial – Open and visualize data in objtool

3.2. Simple analysis – differences between mean spectra (Figure 3)
In this part, we will compare the mean spectra of the "Glioblastoma" and "Astrocytoma" classes against the "Normal" class.
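Numerically, this comparison comes down to two steps. First, subtract the mean spectrum of the reference class from the data. Then, look at the class means of the result. The minimal Python sketch below shows that arithmetic on plain lists. It illustrates the idea under stated assumptions and is not IRootLab's code.

Assumptions:
- The function names class_means and subtract_reference_mean are hypothetical.
- Spectra are rows of equal length.
- Classes are integer codes with "Normal" coded as 0. The parameters window in the steps below counts classes from 1.

```python
from typing import Dict, List


def class_means(X: List[List[float]], classes: List[int]) -> Dict[int, List[float]]:
    """Mean spectrum of each class: rows of X averaged per class code."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, c in zip(X, classes):
        if c not in sums:
            sums[c] = [0.0] * len(row)
            counts[c] = 0
        sums[c] = [s + v for s, v in zip(sums[c], row)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def subtract_reference_mean(X: List[List[float]], classes: List[int],
                            ref_class: int = 0) -> List[List[float]]:
    """Subtract the mean spectrum of the reference class from every row of X."""
    ref = class_means(X, classes)[ref_class]
    return [[v - r for v, r in zip(row, ref)] for row in X]


# Toy data: two "spectra" per class, two wavenumbers each; class 0 is the reference
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 8.0], [7.0, 10.0]]
classes = [0, 0, 1, 1]
diff = subtract_reference_mean(X, classes, ref_class=0)
print(class_means(diff, classes))  # {0: [0.0, 0.0], 1: [4.0, 6.0]}
```

After the subtraction, the reference class mean is zero by construction. A class-means plot of the result therefore shows each remaining class relative to the reference.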
The "Normal" class is used as the reference to investigate changes in infrared absorption in the two other classes.
10. Click on pre.
11. Click on Subtract mean of a reference class.
12. Click on Create, train & use.
13. A parameters window appears; click on OK ("1" is already the first class of the dataset, "Normal").
14. Click on ds01_refmean01.
15. Click on vis.
16. Click on Class means.
17. Click on Create, train & use.
Figure 3 – Tutorial – Differences between mean spectra

3.3. Principal components analysis (PCA) (Figure 4)
This part of the analysis investigates whether PCA can segregate the data classes.
18. Click on ds01.
19. Click on fcon.
20. Click on Principal component analysis.
21. Click on Create, train & use.
22. A parameters window appears; click on OK to accept the default options.
23. Click on ds01_pca01.
24. Click on vis.
25. Click on 2D Scatterplot.
26. Click on Create, train & use.
27. In the window that appears, enter [1, 2, 3] as the Indexes of variables to plot. Click on OK.
Figure 4 – Tutorial – PCA

4. Graphical user interfaces (GUIs)
IRootLab has four GUIs callable from the MATLAB command line (Table 1).

Table 1 – IRootLab GUIs.
GUI name | Purpose
objtool | This is the most important GUI. Used for data analysis, general manipulation of objects, and code generation
mergetool | Tool to assemble a dataset from multiple spectral files containing one spectrum each
irootlab | Launch other GUIs, get help, and view auto-generated MATLAB code
sheload | Loads a dataset from an online database [4]

The GUIs read and write variables directly in the MATLAB workspace.

4.1.1. Objtool
The main objtool "missions" are:
- Create objects or load datasets
- Boot, train and use blocks
- Execute objects' "more actions"
- Automatically generate MATLAB code from any of the above

The look of objtool and its visible options vary depending on the selected items and toggled buttons. The most noticeable change is the colour of the window background, which reflects which class is selected in the left panel.

The objects listed in objtool are variables present in the MATLAB workspace. They may have been created by objtool itself, by mergetool, by sheload or by other means (e.g., user scripts). Every processing step leaves existing objects untouched and creates new ones, adding to the new variable name a suffix that indicates the operation just carried out. For example, in Figure 5, ds01_std01 is the result of standardizing dataset ds01; then ds01_std01_lda01 is the result of applying LDA to ds01_std01. objtool is divided into three areas, as seen in Figure 5.

Figure 5 – Screenshot of objtool (areas: Left panel/Classes panel; Middle panel/Objects panel; Right panel)

Left panel
The left panel selects what to show in the middle panel. It also shows how many objects of each class exist in the MATLAB workspace. A number surrounded by asterisks ("**") means that new objects of that class have just appeared as a result of the last user operation. This helps identify newly created objects.

Middle panel
This panel shows the existing objects of the class selected in the left panel. The buttons at the top vary slightly depending on the selected class. For irdata, the Load… and Save as… buttons appear, whereas for the other classes the New… button appears instead.

Right panels
There are three different panels on the right. The panel shown is selected by one of three toggle buttons on the top right.
The first of the right panels is the Apply new blocks/more actions panel (Figure 6).
The Applicable blocks list shows only the classes that accept the object(s) selected in the middle panel as input. There are three options when creating a block in this way: Create only; Create & train; and Create, train & use. The More actions list depends on the object selected in the middle panel. "More actions" are simple actions available to some objects only (mainly to extract information from an object).

The Existing blocks panel (Figure 7) shows all blocks already in the workspace that are applicable to the object(s) selected in the middle panel. The Train and Use operations pass whatever is selected in the middle panel as input to the block.

Finally, the Object properties panel shows a MATLAB-generated description of the selected object. Figure 8 shows this panel when a dataset is selected in the middle panel. Some objects have additional information associated with them. For example, a cascade block (class block_cascade_base) shows the descriptions of its component blocks in addition to its own description.

Figure 6 – (A) Apply new blocks/more actions panel when the Dataset class is selected in the left panel; (B) Apply new blocks/more actions panel when a class other than a dataset is selected in the left panel. The main difference is the set of buttons that appear only for the Dataset class.

Figure 7 – Existing blocks panel in objtool

Figure 8 – Object properties panel showing the properties of a dataset. The annotated dataset properties are:
- number of observations (data rows)
- number of features (data columns)
- number of groups (e.g., patients)
- number of data classes
- data matrix, of dimension no×nf
- classes, a column vector of size no
- class labels
- group codes (e.g., patient names)

irdata, circled in the figure, is the object class name for a dataset.
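The dataset properties annotated in Figure 8 amount to a small data structure. It has a no×nf data matrix, a class code and a group code per row, and a list of class labels. The minimal Python sketch below mirrors that structure for illustration only.

The class name SpectralDataset, the field names and the helpers nc, ng and labelled_classes are hypothetical. They are not IRootLab's irdata API; only no and nf follow this manual's terminology. Class codes are assumed to be integers starting at zero, as in the first TXT (IRootLab) variant described in Section 4.1.3.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpectralDataset:
    """Hypothetical mirror of the dataset properties in Figure 8 (not IRootLab's irdata)."""

    X: List[List[float]]    # data matrix, one row per observation (no x nf)
    classes: List[int]      # class code of each row, integers starting at zero (length no)
    classlabels: List[str]  # label of each class code, e.g. ["Normal", "Glioblastoma"]
    groupcodes: List[str]   # group (sample/patient) of each row (length no)

    def __post_init__(self):
        # Every per-row vector must have one entry per observation
        if len(self.classes) != self.no or len(self.groupcodes) != self.no:
            raise ValueError("classes and groupcodes must have one entry per row of X")
        # All rows must have the same number of features
        if any(len(row) != self.nf for row in self.X):
            raise ValueError("all rows of X must have the same length")
        # Every class code must index an existing label
        if any(not 0 <= c < len(self.classlabels) for c in self.classes):
            raise ValueError("class code without a matching label")

    @property
    def no(self) -> int:
        """Number of observations (rows)."""
        return len(self.X)

    @property
    def nf(self) -> int:
        """Number of features (columns)."""
        return len(self.X[0]) if self.X else 0

    @property
    def nc(self) -> int:
        """Number of data classes."""
        return len(self.classlabels)

    @property
    def ng(self) -> int:
        """Number of distinct groups (e.g., patients)."""
        return len(set(self.groupcodes))

    def labelled_classes(self) -> List[str]:
        """Class of each row as a label instead of a code."""
        return [self.classlabels[c] for c in self.classes]
```

Keeping classes as integer codes plus a separate label list corresponds to the first TXT variant. labelled_classes() gives the one-label-per-row view that the second variant stores directly.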
All properties are explained in the reference, which is accessible by typing help2 irdata at the MATLAB command window (or directly in the source file, by typing edit irdata.m). The same applies to other object classes.

4.1.2. MATLAB Code generation with objtool
A major feature of objtool is its ability to generate MATLAB code. Every analysis step done in objtool is recorded as MATLAB code in a file named irr_macro_[nnnn].m, which can be opened by entering ircode_edit at the MATLAB command window.

This resource was created to help with writing scripts, functions and classes, and it has proven effective at that. For example, the demo files were almost completely created in this way, and a substantial part of IRootLab itself was automatically coded by objtool.

Code generation can also help users repeat the same analysis sequence on a different dataset, or batch-process several datasets.

4.1.3. Supported dataset file types within objtool
Table 2 shows the file types currently supported in objtool. Directory demo/sampledata has examples of data files of all types.

Table 2 – File types supported in objtool.
File type | Read? | Write? | Comment | Related OOP-class file
TXT (basic) | Yes | Yes | Figure 9 | dataio_txt_basic.m
TXT ("Pirouette-like") | Yes | Yes | Figure 9 | dataio_txt_pir.m
TXT (IRootLab) | Yes | Yes | Figure 10; only TXT format that saves all dataset properties | dataio_txt_iroot.m
TXT (IRootLab with labelled classes) | Yes | Yes | Figure 11; fills the "classes" column with the class labels, not numbers | dataio_txt_iroot2.m
MAT | Yes | Yes | MATLAB compressed binary: small file size; opens fast | dataio_txt_mat.m
OPUS image maps | Yes | No | FPA images only, not point maps | dataio_opus_nasse.m
TXT (LibSVM format) | No | Yes | LIB | dataio_opus_libsvm.m

The TXT (IRootLab) format is the only text format that saves all the properties of a dataset. There are two variants of the TXT (IRootLab) format.
The first variant (Figure 10) saves the classes as integer numeric values starting at zero. It has a "classlabels" row containing the label corresponding to each number. The second variant (Figure 11; named IRootLab with labelled classes when you "Save as…" in objtool) does not have the "classlabels" row and fills in the "classes" column directly with the labels instead. We recommend the second variant, as it is easier to edit and avoids confusion. You can choose between these variants when you "Save as…" in objtool.

The TXT (basic) format can only store spectra and their associated classes.

The TXT (Pirouette-like) format stores the wavenumbers, sample codes, spectra and classes.

Figure 9 – Basic and Pirouette-like text formats
Figure 10 – IRootLab text format (first variant)
Figure 11 – IRootLab with labelled classes text format (second variant)

4.2. Mergetool
Figure 12 – Screenshot of mergetool.

mergetool is a tool to merge several spectral files into a new dataset. The new dataset is created in the MATLAB workspace and can be immediately viewed in objtool. Table 3 shows the spectral file types currently supported in mergetool. A spectral file type auto-detection button is available.

Table 3 – Spectral file types supported in mergetool.
- Pirouette .DAT
- OPUS binary
- Wire TXT (Renishaw)
- Wire (2016) TXT (Renishaw)
- Diane's "Another FTIR" (2017) TXT
- BWSpec CSV

Usage is mostly straightforward. Only the meaning of the Group code trimming dot (right-to-left) parameter may be slightly obscure. This parameter tells mergetool which part of the file name is to be considered the code of a data group (i.e., sample/patient etc.). It is common for many spectra to be taken from the same sample/group. Spectral files are often named sequentially by the spectrum acquisition software (e.g., OPUS (Bruker Optik GmbH) creates files such as sample.0, sample.1, sample.2 etc.).
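File names like these are what the Group code trimming dot (right-to-left) parameter, explained in the next paragraph, operates on. The rule is to count dots from right to left and keep the part of the name before the chosen dot. The minimal Python sketch below illustrates that rule.

The function names group_code and group_files are hypothetical. Two details are assumptions, not documented mergetool behaviour:
- A value of 0 means "no trimming".
- A value larger than the number of dots cuts at the leftmost dot.

```python
from typing import Dict, List


def group_code(filename: str, trimming_dot: int) -> str:
    """Part of `filename` before the `trimming_dot`-th dot counted from the right.

    Assumptions (not documented mergetool behaviour): 0 keeps the whole name, and a
    value larger than the number of dots cuts at the leftmost dot (str.rsplit semantics).
    """
    if trimming_dot < 0:
        raise ValueError("trimming_dot must be >= 0")
    # rsplit splits at most `trimming_dot` times, starting from the right
    return filename.rsplit(".", trimming_dot)[0]


def group_files(filenames: List[str], trimming_dot: int) -> Dict[str, List[str]]:
    """Group file names (sorted by name) by their group code."""
    groups: Dict[str, List[str]] = {}
    for name in sorted(filenames):
        groups.setdefault(group_code(name, trimming_dot), []).append(name)
    return groups


print(group_code("sample.0", 1))            # sample
print(group_code("patient7.slideA.12", 1))  # patient7.slideA
print(group_code("patient7.slideA.12", 2))  # patient7
```

With a trimming dot of 1, OPUS-style names such as sample.0, sample.1 and sample.2 all map to the group code sample. Their spectra are therefore kept together as one group.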
Within mergetool, the dots (“.”) in the file names are considered to be “dividers”. So, the Group code trimming dot (right-to-left) specifies how many dots to skip (from right to left) to compose the group code; the group code will start at the beginning of the file name and stop just before the specified dot ( REF _Ref359339144 \h Figure 12). The Perform checks button gives a report that, among other things, gives a preview of what a group code will be before the files are imported.For image maps, the Image height needs to be specified. Naturally, this number must be able to divide the total number of spectral files (such quotient will be the image width). The image map is mounted as follows. The first spectrum in the directory (sorted by name) will be the bottom left pixel of the image ( REF _Ref341895445 \h Figure 13).Image mapFirst file in directory(sorted by name)Second file. . .VerticallyHorizontallyFirst file in directory(sorted by name)Second file. . .Figure 13 – mergetool file reading sequence to compose image map, depending on how the Pixel mapping option is set.Class libraryIRootLab contains more than 200 object classes hierarchically organized. The classes are data models and implementations of analysis methods, concepts and algorithms. Parts, concepts or techniques that are common to many methods have been implemented separately (e.g., sgs, fsg). This provides flexibility and avoids code repetition. The most important class branches, with their corresponding descriptions, are shown in REF _Ref341887667 \h Table 1. A full listing can be obtained by running the demo file demo_classes_html.Table SEQ Table \* ARABIC 4 – Some important branches and leaves from the IRootLab class tree.Class (.m file name)Description???????Dataset (irdata)Datasets for analysis can be point spectra or image maps. 
The only difference between these is that image maps have the height property set to a value > 0.Classifiers output a special kind called an “estimation dataset” (class estimato). These datasets have their classes unset and need to be processed by a decider.A final special dataset is generated by hierarchical clustering. In this case, the variables represent cluster numbers (class irdata_clus).???????Block (block)This is the base class for all processing operations.A block is an object containing the boot(), train(), and use() methods, and a few more properties to define how the block class will be handled by objtool.??????????Miscellanea (blmisc)Miscellanea blocks lack a common identity; they have been grouped for convenience to display grouped in objtool.??????????Visualization (vis)Figure and report (HTML) generation??????????Cascade block (block_cascade_base)Cascade blocks encapsulate a sequence of blocks.Cascade blocks can mimic the behaviour of linear transformation blocks (fcon_linear; such as PCA) if it contains one or more such blocks. It has all the properties that a fcon_linear block has, however its valid functioning will depend on the component blocks. The loadings matrix is calculated by multiplying the loadings matrix of successive component blocks.When a cascade block is about created in the GUI, it will call the parameters GUIs of its component blocks in sequence.Customized/frequent sequences can be created by inheriting the block_cascade_base class. ??????????Decider (decider)This is an abstraction of class decisions based on the posterior probabilities calculated by a classifier. Classifiers generate an estimato dataset which is later processed by a decider. 
If the highest per-class posterior probability is below the decider's decisionthreshold property, the decider will “refuse to decide”, assigning a class of -1 to the data row instead of a valid one.

Classifier (clssr)
A classifier is a block whose use() method outputs an estimato dataset.

Ensemble (aggr)
Various classifier ensemble architectures.

Incremental (clssr_incr)
Classifiers capable of incremental learning. The train0 method of such classifiers uses one data row at a time to update the classifier's internal structure.

Fuzzy Rule-Based Model (frbm)
Contains a set of parameters that allow various fuzzy classifiers to be set up, including eClass0 and eClass1 [5].

Feature Extraction (fext)
IRootLab follows Guyon et al.'s division of feature extraction into feature selection and feature construction methods [6].

Feature Construction (fcon)
Methods that combine variables (linearly or non-linearly) into new ones.

Hierarchical Clustering (clus_hca)
Clustering was placed under feature construction because it creates a new dataset whose features represent cluster numbers.

Linear Transformation (fcon_linear)
Blocks with loadings vectors that transform the input dataset.

Measure (fcon_mea)
A measure over the entire row (e.g., norm, maximum, minimum), creating an output that contains only one variable (i.e., the measure).

Feature Selection (fsel)
Feature selection. This block is not trainable. It contains a fixed vector of feature indexes to be selected from the input to use().
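Because an fsel block only holds a fixed index vector, applying it amounts to column indexing. The sketch below shows that idea in plain MATLAB rather than through IRootLab's fsel class; the data matrix X and the index vector v are made-up examples, with one row per observation and one column per feature.

```matlab
% Made-up data: 3 observations (rows) x 6 features (columns)
X = reshape(1:18, 3, 6);

% Fixed vector of feature indexes chosen in advance (nothing is trained)
v = [2, 5, 6];

% Applying the selection: keep only the listed columns, in the listed order
Y = X(:, v);   % Y is [4 13 16; 5 14 17; 6 15 18]
```

Choosing v is the job of a separate feature selection method; the fsel block itself just stores and applies it.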
Feature selection methods are found under “Analysis Session” (class as).

Pre-processing (pre)
This branch contains methods that are classically called “pre-processing” methods in the literature, such as de-noising, baseline correction and normalization [7–9].

Analysis Session (as)
These blocks usually perform complex and potentially time-consuming analyses inside their use() method. The use() output is an irlog (rather than a dataset, which is what most blocks output), from which many results can be extracted and/or visualized.

Sub-dataset Generation Specs (sgs)
This class is an abstraction of sub-dataset generation, which is needed nearly everywhere in data analysis: obtaining train and test datasets, cross-calculation of scores, and as part of algorithms (e.g., bagging and boosting). sgs centralizes the task of calculating which rows are extracted from a dataset to generate sub-datasets.

Feature Subset Grader (fsg)
fsg centralizes the task of calculating a “grade” (a measure computed for a subset of features). This class was created to be used in feature selection algorithms.

Peak Detector (peakdetector)
This class centralizes the work of detecting peaks in a spectrum, loadings vector, histogram, etc.

Vector Comparer (vectorcomp)
This class is used to compare vectors with the same number of elements (paired tests). The main application is comparing classifiers (see report_sovalues_comparison.m).

Log (irlog)
This is a general information/results container. It stores information whose format is unsuitable for an irdata object. It is typically generated by as::use().

Various topics

IRootLab setup file
IRootLab configuration for colour scheme, font, scatterplot markers, etc. is stored as MATLAB global variables.
These variables are automatically saved by objtool in a file called irootlab_setup.m, which you can edit freely. Each project directory has its own irootlab_setup.m file. You can force the creation of this file by entering setup_write in the MATLAB command window. To open the setup file, enter edit irootlab_setup in the MATLAB command window. To make changes to this file immediately available in the MATLAB environment, simply execute the file (e.g., by pressing F5 in the MATLAB editor, or by entering irootlab_setup in the MATLAB command window).

Table 5 describes some setup options. For a complete reference to IRootLab global variables, please check assert_all.m.

Table 5 - Some global/irootlab_setup.m setup variables (variable name, then purpose).
SCALE
    Controls the relative size of text, markers, and line widths.
    Tip: increase SCALE if the figure will appear small in a publication. For example, if SCALE = 1 looks fine for a full-width figure panel, use SCALE = 2 for a half-width panel.
COLORS
    Defines a colour sequence to represent different data classes. Each colour is coded as a 3-element vector representing its [Red, Green, Blue] (RGB) composition, with each value ranging from 0 to 255. For example, [255, 0, 0] is red, [0, 0, 0] is black, and [255, 255, 0] is yellow (red + green).
    Tip: There is a good website at with lots of nice colour palettes!
MARKERS
    Sequence of markers (triangles, circles, squares, etc.) to represent different data classes in scatterplots.
    Tip: for a list of available marker symbols, enter help plot in the MATLAB command window.
FONT
    Font name.
FONTSIZE
    Font size.

MATLAB command sheet

IRootLab-specific
v_x2ind([1650, 1450, 1080], ds01.fea_x)
    Converts wavenumbers to feature indexes for dataset ds01.
save_as_png([], 'filename.png', 300)
    Saves the current figure as a PNG file named filename.png at 300 dpi resolution.
edit_ircode
    Opens the current auto-generated macro in the MATLAB editor.
help2
    Opens the main page of the documentation in the browser.
help2 irdata
    Opens help for file irdata.m in the browser.
objtool, mergetool, sheload
    Open the respective GUIs.
ds01.classlabels'
    Shows the class labels of dataset ds01.

MATLAB basics
1:5
    Same as [1, 2, 3, 4, 5].
1:2:21
    Same as [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21].
xlim([900, 1800])
    Sets the current figure's x-axis limits to the 1800-900 cm⁻¹ region. Note that the lower value comes first in the command.
ylim([0, 1])
    Sets the current figure's y-axis limits between 0 and 1.
var(ds01_meanc01_pca01.X, 1)'
    Displays the variances of the principal components (assuming that the dataset name is ds01_meanc01_pca01).

Dataset from existing MATLAB variables
If you already use MATLAB for data analysis, you are likely to have the dataset variables, such as X and classes, stored separately at some stage. The example below shows how to pack such existing variables into an IRootLab dataset.

% Assumes that these variables exist:
% - X of dimension [no]x[nf] (no: number of observations; nf: number of features)
% - classes of dimension [no]x[1]
ds = irdata();
ds.X = X;
ds.classes = classes;
ds = ds.assert_fix(); % Automatically fills in the classlabels property with defaults

Also check the file called demo_import_fisheriris.m; this example uses Fisher's Iris dataset, which is shipped with MATLAB, and performs a somewhat more advanced analysis than the example above.

References
[1] J. Trevisan, P. P. Angelov, A. D.
Scott, P. L. Carmichael, and F. L. Martin, “IRootLab: a free and open-source MATLAB toolbox for vibrational biospectroscopy data analysis,” Bioinformatics (Oxford, England), pp. 1–2, Mar. 2013.
[2] F. L. Martin, J. G. Kelly, V. Llabjani, P. L. Martin-Hirsch, I. I. Patel, J. Trevisan, N. J. Fullwood, and M. J. Walsh, “Distinguishing cell types or populations based on the computational analysis of their infrared spectra,” Nat. Prot., vol. 5, no. 11, pp. 1748–1760, Jan. 2010.
[3] K. Gajjar, L. D. Heppenstall, W. Pang, K. M. Ashton, J. Trevisan, I. I. Patel, V. Llabjani, H. F. Stringfellow, P. L. Martin-Hirsch, T. Dawson, and F. L. Martin, “Diagnostic segregation of human brain tumours using Fourier-transform infrared and/or Raman spectroscopy coupled with discriminant analysis,” Analytical Methods, vol. 5, pp. 89–102, 2013.
[4] J. Trevisan, P. P. Angelov, I. I. Patel, G. M. Najand, K. T. Cheung, V. Llabjani, H. M. Pollock, S. W. Bruce, K. Pant, P. L. Carmichael, A. D. Scott, and F. L. Martin, “Syrian hamster embryo (SHE) assay (pH 6.7) coupled with infrared spectroscopy and chemometrics towards toxicological assessment,” Analyst, vol. 135, no. 12, pp. 3266–3272, Dec. 2010.
[5] P. P. Angelov and X. Zhou, “Evolving Fuzzy-Rule-Based Classifiers From Data Streams,” IEEE T. Fuzzy Syst., vol. 16, no. 6, pp. 1462–1475, Dec. 2008.
[6] I. Guyon, S. Gunn, M. Nikravesh, and L. A. Zadeh, Feature Extraction - Foundations and Applications. New York: Springer, 2006.
[7] R. L. Somorjai, M. Alexander, R. Baumgartner, S. Booth, C. Bowman, A. Demko, B. Dolenko, M. Mandelzweig, A. E. Nikulin, N. J. Pizzi, E. Pranckeviciene, R. Summers, and P. Zhilkin, “A data-driven, flexible machine learning strategy for the classification of biomedical data,” in Artificial Intelligence Methods And Tools For Systems Biology, vol. 5, W. Dubitzky and F. Azuaje, Eds. Dordrecht: Springer Netherlands, 2004, pp. 67–85.
[8] R. M. Jarvis and R. Goodacre, “Genetic algorithm optimization for pre-processing and variable selection of spectroscopic data,” Bioinformatics, vol. 21, no. 7, pp. 860–868, Apr. 2005.
[9] J. Trevisan, P. P. Angelov, P. L. Carmichael, A. D. Scott, and F. L. Martin, “Extracting biological information with computational analysis of Fourier-transform infrared (FTIR) biospectroscopy datasets: current practices to future perspectives,” The Analyst, vol. 137, no. 14, pp. 3202–15, Jul. 2012.