NITRC

The Bayesian connectivity analysis (BCA) software

Rong Chen

March 2015

1. Introduction

BCA is a pipeline for connectivity matrix analysis. The intended user is a researcher who is interested in brain network analysis and has basic knowledge of Linux/Unix, R, and Matlab. The software has been tested on Windows 8 and CentOS 6.6 using MATLAB 2012a and R version 3.1.3.

BCA is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

BCA is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with the GAMMA suite. If not, see .

2. Installation

Prerequisites: BCA requires R (r-) and Matlab (products/matlab/).

• Install the caret and mclust packages for R. Open R and run install.packages("caret") and install.packages("mclust").

• Install the Matlab BN Toolbox (BNT). Unzip bnt.zip to a folder, go to that folder, and change the BNT_HOME variable in add_BNT_to_path.m. Alternatively, the paths to the BNT folder and its subfolders can be added after opening Matlab (File->Set path->Add with Subfolders). The home page of BN Toolbox is . BN Toolbox is a free software package for probabilistic graphical models such as Bayesian networks.

• We call the folder containing the user’s data prj_home. Copy all files under pipeline to prj_home.
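Before running the pipeline, it can help to confirm that the R packages are in place. The following is a minimal sketch in base R; the helper name missing_packages is hypothetical and is not part of BCA.

```r
# Hypothetical helper (not part of BCA): return the names of packages
# that cannot be loaded in the current R installation.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- missing_packages(c("caret", "mclust"))
if (length(needed) > 0) {
  message("Not installed: ", paste(needed, collapse = ", "))
  # install.packages(needed)   # uncomment to install from CRAN
}
```

An empty result means both packages are already available.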

3. A simulated dataset

In this simulated connectivity matrix dataset, the brain network has 9 regions, so the connectivity matrix is 9*9. There are 100 subjects in two groups of 50 subjects each. Let aE be the measure of association (the connectivity score) at link E.

For the control group (subjects 51-100), aE ~ N(5, 1) for all E.

For the patient group (subjects 1-50), subjects 1-25 have aE ~ N(2, 1) for the link [region 1 – region 2] and aE ~ N(5, 1) for all other E. Subjects 26-50 have aE ~ N(2, 1) for the link [region 3 – region 4] and aE ~ N(5, 1) for all other E.

This simulated dataset represents a scenario of an OR gate. That is, the pattern (a1-2 OR a3-4) predicts the group-membership variable.

The connectivity matrix is symmetric. That is, the connectivity score from node i to node j is the same as that from node j to node i. Therefore, we only need the lower triangular part. The simulated data are saved in the file exp_data.RData. It includes two variables: dataAll is the connectivity data (the lower triangular part of the connectivity matrix), and fv is the group-membership variable (fv=0 is the control group and fv=1 is the patient group). The columns of dataAll are variables, and the rows are samples. For the simulated data, we have 9*(9-1)/2 = 36 connections and 100 samples. Therefore, dataAll is a 100*36 data matrix. Users can load this file using load("exp_data.RData") in R.

The connection index is saved in the file vid.RData. It has three variables: xid is the starting vertex of a connection, yid is the ending vertex of a connection, and vname is the variable name of a connection. For example, for the connection 2-1, xid=2, yid=1, and vname="2-1".
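A dataset with the same structure can be generated in base R as a rough sketch. This is not the script that produced the shipped exp_data.RData: the seed is arbitrary, and the column-major ordering of the lower-triangular connections is an assumption, since the ordering used in the real vid.RData is not documented here.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
nRegion <- 9
nSubj   <- 100

# Enumerate the lower-triangular connections (row index > column index).
# ASSUMPTION: column-major order; the real vid.RData may order them differently.
idx   <- which(lower.tri(matrix(0, nRegion, nRegion)), arr.ind = TRUE)
xid   <- idx[, "row"]
yid   <- idx[, "col"]
vname <- paste(xid, yid, sep = "-")

# Baseline: every link ~ N(5, 1) for every subject.
dataAll <- matrix(rnorm(nSubj * length(vname), mean = 5, sd = 1),
                  nrow = nSubj, dimnames = list(NULL, vname))

# Subjects 1-25: the link between regions 1 and 2 ~ N(2, 1).
dataAll[1:25, "2-1"] <- rnorm(25, mean = 2, sd = 1)
# Subjects 26-50: the link between regions 3 and 4 ~ N(2, 1).
dataAll[26:50, "4-3"] <- rnorm(25, mean = 2, sd = 1)

# Group membership: 1 = patient (subjects 1-50), 0 = control (subjects 51-100).
fv <- c(rep(1, 50), rep(0, 50))
```

The objects could then be written out with save(dataAll, fv, file = "exp_data.RData") and save(xid, yid, vname, file = "vid.RData"), which matches the variable names described above.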

4. How to run the analysis

The whole pipeline is depicted in Figure 1.

[pic]

Figure 1. The BCA pipeline

Step 1. The first step is to transform your connectivity matrix into the format required by BCA. BCA takes two input files: exp_data.RData, which holds the connectivity data, and vid.RData, which holds the connection index. Details of these two files are in section 3.
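One possible way to do this transformation is sketched below in base R. The helper flatten_connectivity is hypothetical, it assumes the matrices are stored as a regions x regions x subjects array, and it assumes column-major ordering of the lower-triangular entries.

```r
# Hypothetical helper (not part of BCA): turn a stack of symmetric
# connectivity matrices into one row per subject, one column per connection.
# conn: numeric array of dimension regions x regions x subjects.
flatten_connectivity <- function(conn) {
  nRegion <- dim(conn)[1]
  nSubj   <- dim(conn)[3]
  lt  <- lower.tri(matrix(0, nRegion, nRegion))
  idx <- which(lt, arr.ind = TRUE)            # same order as m[lt]
  vname <- paste(idx[, "row"], idx[, "col"], sep = "-")
  # Each subject contributes the lower-triangular entries of its matrix.
  vals <- apply(conn, 3, function(m) m[lt])
  dataAll <- matrix(vals, nrow = nSubj, byrow = TRUE,
                    dimnames = list(NULL, vname))
  list(dataAll = dataAll, xid = idx[, "row"], yid = idx[, "col"], vname = vname)
}

# Toy input: 2 subjects, 3 regions.
conn <- array(0, dim = c(3, 3, 2))
conn[2, 1, 1] <- conn[1, 2, 1] <- 0.7
out <- flatten_connectivity(conn)
```

The group-membership variable fv has to be supplied separately, since it is not contained in the connectivity matrices.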

Step 2. The second step is to perform resampling-based variable selection. Copy exp_data.RData and vid.RData to prj_home. Then open R, change the working directory to prj_home, and run the script pall.R. pall.R includes 4 scripts: p1_clean.R, p2_clustering.R, p3_csel_resamp_a.R, and p3_csel_resamp_b.R.

Step 3. The last step is Bayesian network modelling. Open Matlab, go to prj_home, add the prj_home folder to the path, and run mall.m. mall.m includes these scripts: m0_gen_data.m, m1_learn_bncit.m, m2_learn_bagging.m, and m3_ana_bagging.m.
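For the R stage, a small pre-flight check can catch a missing input before the pipeline starts. This is a sketch only: the helper missing_inputs is hypothetical, and the prj_home value below is a placeholder to be replaced with your own project folder.

```r
# Hypothetical helper (not part of BCA): list the files that the R stage
# expects but that are not present in prj_home.
missing_inputs <- function(prj_home,
                           files = c("exp_data.RData", "vid.RData", "pall.R")) {
  files[!file.exists(file.path(prj_home, files))]
}

prj_home <- tempdir()   # placeholder; point this at your own project folder
missing  <- missing_inputs(prj_home)
if (length(missing) == 0) {
  # setwd(prj_home)
  # source("pall.R")
  message("All inputs found; ready to run pall.R.")
} else {
  message("Missing from prj_home: ", paste(missing, collapse = ", "))
}
```

The setwd() and source() calls are left commented out so that the check can be run on its own.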

Details of R and Matlab scripts are as follows.

p1_clean.R – clean data and remove zero-variance variables.

• Input: exp_data.RData and vid.RData

• Output: exp_data_clean.RData and vid_clean.RData. These files contain a clean version of the connectivity data.
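The idea of removing zero-variance variables can be illustrated with a few lines of base R. This is a sketch under assumptions, not the contents of p1_clean.R: the helper drop_zero_variance is hypothetical, and the actual script may apply other cleaning rules as well.

```r
# Hypothetical helper (not part of BCA): drop columns whose values never vary,
# and keep the connection index in step with the remaining columns.
drop_zero_variance <- function(dataAll, xid, yid, vname) {
  keep <- apply(dataAll, 2, function(x) var(x, na.rm = TRUE) > 0)
  keep[is.na(keep)] <- FALSE     # treat all-missing columns as uninformative
  list(dataAll = dataAll[, keep, drop = FALSE],
       xid = xid[keep], yid = yid[keep], vname = vname[keep])
}

# Toy input: the second connection is constant across subjects.
X <- cbind("2-1" = c(4.8, 5.3, 2.1), "3-1" = c(5, 5, 5))
cleaned <- drop_zero_variance(X, xid = c(2, 3), yid = c(1, 1),
                              vname = c("2-1", "3-1"))
```

Filtering xid, yid, and vname together with the data columns keeps each remaining column traceable to its pair of regions.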

p2_clustering.R – clustering with mixture modelling

p3_csel_resamp_a.R – variable selection with resampling

• Input: exp_data_clean.RData, vid_clean.RData, cAll.RData

• Parameters: rspSize – the bootstrap resampling size

• Output: me – model ensemble, saved in me.RData

p3_csel_resamp_b.R – analyze the model ensemble

• Input: exp_data_clean.RData, cAll.RData, vid_clean.RData, and me.RData.

• Parameters: rspCut – the cutoff

• Output: biomarker.csv – the list of detected biomarkers; data_c.csv – the dataset including the selected biomarkers and the group-membership variable.
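The general pattern of resampling followed by a cutoff can be sketched as follows. This is a generic illustration and not the BCA selection algorithm: the per-variable Welch t-test is a stand-in for the real selection step, and B and cutoff are hypothetical names that only loosely correspond to rspSize and rspCut.

```r
set.seed(2)   # arbitrary seed, for reproducibility only

# Toy data: 100 subjects, 5 variables; only V1 differs between the groups.
n  <- 100
fv <- rep(c(1, 0), each = 50)
X  <- matrix(rnorm(n * 5, mean = 5), nrow = n,
             dimnames = list(NULL, paste0("V", 1:5)))
X[fv == 1, "V1"] <- rnorm(50, mean = 2)

B      <- 50               # number of bootstrap resamples (assumed meaning)
cutoff <- 0.8              # minimum selection frequency (assumed meaning)
alpha  <- 0.05 / ncol(X)   # Bonferroni-corrected threshold for the stand-in test

# For each resample, record which variables pass the stand-in test.
selected <- replicate(B, {
  i <- sample(n, replace = TRUE)
  g <- fv[i]
  p <- apply(X[i, ], 2, function(x) t.test(x[g == 1], x[g == 0])$p.value)
  p < alpha
})

freq   <- rowMeans(selected)          # selection frequency per variable
stable <- names(freq)[freq >= cutoff] # variables kept after the cutoff
```

Variables that are selected in most resamples are less likely to be artifacts of one particular sample, which is the motivation for analysing an ensemble rather than a single model.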

p4_discre.R – discretization for Bayesian network modelling

• Input: data_c.csv

• Parameters: quant – the quantile for discretization.

• Output: data_discre.csv – the discrete data
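Quantile-based discretization can be sketched in one line of base R. This is an assumption about how a single quantile parameter might be used (a two-level split at that quantile), and the helper discretize_by_quantile is hypothetical; the actual script may code the levels differently.

```r
# Hypothetical helper (not part of BCA): code a continuous variable as
# 0 (at or below the chosen quantile) or 1 (above it).
discretize_by_quantile <- function(x, quant = 0.5) {
  as.integer(x > quantile(x, probs = quant, names = FALSE))
}

scores <- c(1.8, 2.4, 4.9, 5.3, 5.1, 2.1)
levels01 <- discretize_by_quantile(scores, quant = 0.5)
```

Applied column by column to the selected biomarkers, this yields a discrete table of the kind that discrete Bayesian network learning requires.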

m0_gen_data.m – transform data_discre.csv into a BN Toolbox format

m1_learn_bncit.m – learn a Bayesian network

m2_learn_bagging.m – learn a Bayesian network with bootstrap resampling

• Input: data_d – discrete data

• Parameters: rsp_size – resampling size

• Output: bncit_bs_model – the Bayesian network model ensemble

m3_ana_bagging.m – analyze the model ensemble. The output is mbrsp, the Markov blanket identified by Bayesian network learning with bootstrap resampling.
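For readers unfamiliar with the term, the Markov blanket of a node in a Bayesian network consists of its parents, its children, and the other parents of its children. The sketch below computes it from a directed adjacency matrix in base R. It only illustrates the definition and is not a port of m3_ana_bagging.m; the helper name markov_blanket and the convention dag[i, j] = 1 for an edge i -> j are assumptions made for this example.

```r
# Hypothetical helper (not part of BCA): Markov blanket of `target` in a DAG.
# dag[i, j] == 1 means there is an edge from node i to node j.
markov_blanket <- function(dag, target) {
  parents  <- which(dag[, target] == 1)
  children <- which(dag[target, ] == 1)
  # Spouses: every node that is a parent of at least one child of the target.
  spouses  <- which(rowSums(dag[, children, drop = FALSE]) > 0)
  setdiff(sort(unique(c(parents, children, spouses))), target)
}

# Toy network matching the OR-gate scenario: node 1 = a1-2, node 2 = a3-4,
# node 3 = fv, with edges 1 -> 3 and 2 -> 3.
dag <- matrix(0, 3, 3)
dag[1, 3] <- 1
dag[2, 3] <- 1
mb_fv <- markov_blanket(dag, 3)
```

In this toy network the Markov blanket of fv is the pair of nodes for a1-2 and a3-4, since both are its parents.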

5. Results for the simulated data

Users should detect two biomarkers (a1-2, a3-4), which are listed in the biomarkers.csv file. The resampling-based Bayesian network learning finds that the Bayesian network model ensemble includes a significant model [a1-2, a3-4] -> fv. Note that different resampling runs may generate slightly different results.

Here is the R output for the test dataset.

[pic]

Here is the Matlab output for the test dataset.

[pic]
