PHS 398 (Rev. 9/04), Continuation Page - SLI



Research Plan

a. Identification and Significance of the Problem or Opportunity

Given a large set of documents such as grant applications or scientific publications, how does one quickly gain an understanding of the key information they contain? This problem is faced by medical researchers, lawmakers and administrators alike. Portfolio analysis tools at NIH can reveal patterns across grant proposals and support statistical analyses of trends and other analytical metrics. We propose to construct a system that allows analysts, administrators and policy-makers to access and view the entire research portfolio as a large-scale, simple, intuitive map, on which they may zoom in to the level of individual grants and back out to the level of whole institutes.

During the 2007 fiscal year, NIH’s budget was $29.128 billion. For 2007 alone, over 60,000 individual project grants are listed in the CRISP database (which includes government funding agencies in addition to NIH), ranging from basic research into the mechanisms of cancer to new teaching courses. The ability to understand where public funding is being spent is a vital element of oversight within the government, but is complicated by the fact that each grant must be classified based on a hand-crafted scheme.

This has led to the development of the Research, Condition and Disease Categorization (RCDC), a comprehensive breakdown of different categories of the most important fields being funded. RCDC is both important and valuable: it strictly defines 215 high-level categories used to classify NIH spending. Although some subjects (like ‘cancer’) command billions of dollars of funding while others receive far less, each high-level topic label conceals a wealth of complexity about underlying research trends and topics in the field. We will build tools to analyze and visualize this complexity in a way that is intuitive, accessible, quantitative and scalable.

Figure 1 shows a screenshot from the application we will build: a web-based system that provides a simple navigable map of a document collection. This mockup is based on an existing prototype (see §c) and is only one possible product of our technology. To let users interact with this application through an ordinary web browser, the application will generate parts of the visualization on the server as static images and augment them with both additional metadata and quantitative metrics using the browser’s capabilities.
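The tiled, zoomable map described above follows the same scheme used by familiar web map services: the server pre-renders the visualization as a pyramid of fixed-size image tiles, and the browser requests only the tiles visible at the current zoom level. A minimal sketch of the tile arithmetic (function names and the 256-pixel tile size are illustrative conventions, not part of the proposed API):

```python
# Sketch of slippy-map-style tile arithmetic for a pre-rendered
# visualization. At zoom level z the map is a 2^z x 2^z grid of tiles.

TILE_SIZE = 256  # pixels per tile edge, a common choice

def tiles_at_zoom(zoom):
    """Total number of tiles the server must render at a zoom level."""
    n = 2 ** zoom
    return n * n

def tile_for_point(x, y, zoom):
    """Map a point in normalized [0, 1) map coordinates to the
    (column, row) of the tile that contains it."""
    n = 2 ** zoom
    col = min(int(x * n), n - 1)
    row = min(int(y * n), n - 1)
    return col, row

def visible_tiles(x0, y0, x1, y1, zoom):
    """Tiles intersecting the normalized viewport [x0,x1] x [y0,y1];
    this is the set a browser client would request from the server."""
    c0, r0 = tile_for_point(x0, y0, zoom)
    c1, r1 = tile_for_point(x1, y1, zoom)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

At zoom 0 a single tile covers the whole map; each zoom-in quadruples the tile count, so detail is added without ever shipping a full-resolution image to the client.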

b. Technical Objectives

We will build a prototype web-application that provides definitive proof-of-concept for the vision described above. This system will be able to ingest thousands to millions of documents and generate scalable maps that may be explored using any browser (see Figure 2). Our application will permit users to explore and understand the structure, content, and hidden relationships in many different types of documents, such as in-process applications, previously-funded grants, review papers, primary research articles, medical records, MEDLINE abstracts, patents and web pages.

Our specific objectives are:

a. Research and develop a robust, intuitive user interface that enables a user to generate meaningful visualizations

b. Research and characterize effective algorithms and data processing frameworks for visualizing biomedical research data in network format

c. Develop visualization prototype web-application and Application Programming Interface (API)

d. Review the strengths and weaknesses of the prototype and identify improvements needed in Phase II

Phase I of this project will conclude with a fully functional prototype web application, with a robust user interface designed from information gathered from NIH researchers and portfolio analysts, that can be used to visualize and analyze biomedical research data. In addition, we will deliver a relational database and XML data model, an Application Programming Interface (API), and a report summarizing feedback from these NIH analysts and researchers regarding the strengths and deficiencies of the prototype.

c. Work/Research Plan

1.   Research the design of a User Interface (UI) that allows a user to select and view data of interest using derived parametric data.

In the first stage, we will gather information about users and their tasks in order to design an effective user interface to support exploration of biomedical research data.  Procedure:

1. Interview four to six NIH researchers and policy analysts whose job responsibilities involve the 'areas of interest' in the topic announcement in order to gain a deeper understanding of their problems and needs.

2. Generate Use Cases from these interviews that describe specific ways of using the visualization tool to understand biomedical research or grant portfolios.

3. Survey current literature dealing with human-computer interaction and graphic design to identify specific guidelines that will be incorporated into the design of the user interface for this tool.

4. Design user interface (UI) mock-ups that encompass the needs found during the interview process and refine them iteratively in cooperation with said researchers.

The outcome of this stage will be: (1) a set of Use Cases defining the user requirements; (2) a set of UI mockups as a result of step 4; and (3) technology recommendations for implementing such a UI. The benchmark needed to move on to the next stage will be agreement between NIH researchers and project personnel that the generated UI mock-ups sufficiently demonstrate the manner in which the user interface will fulfill the Use Cases generated in step 2. The ChalkLabs team consisting of Shashikant Penumarthy, Bruce Herr, and Gavin LaRowe in conjunction with Dr. Burns will be responsible for executing this task. Work will be performed at ChalkLabs’ offices in Indiana using the internal development infrastructure as well as on site at the offices of the NIH researchers. ChalkLabs personnel will travel to the offices of NIH researchers and observe them at work to understand the environment in which they work, the tools they use and other specific issues relevant to the design of the software application.

2. Design and develop a data model that can be used to store, query, and transform biomedical research data in order to generate visualizations (Weeks 5-8, Location: ChalkLabs, Bloomington, IN): 

Since the data model of this system will be network-based, it must be both general enough to handle variations in structure and efficient enough to support the kinds of analysis needed for visualization. Procedure:

1. Survey biomedical research data sets such as grant applications to gain an understanding of commonly used data formats and attributes of those documents that may be important for exploring and understanding how they are related to each other.

2. Survey commonly used analysis algorithms that build networks from unstructured data. Although Phase I will only use the Topic Model, this step is important as it will provide knowledge needed to encompass other algorithms for analysis in the future.

3. Create a normalized conceptual data model that can represent the different types of networks and their attributes. This data model will subsequently be used to determine an initial Application Programming Interface (API) for Phase II.

4. Test the expressiveness and efficiency of the data model by loading test data into a database and performing frequently used network analysis and visualization operations on it.

The outcome of this stage will be both an XML model and a relational database model for representing biomedical research networks. The benchmark will be that common network operations performed in step 4 are easily expressed in terms of the data model developed. The ChalkLabs team consisting of Shashikant Penumarthy, Bruce Herr, and Gavin LaRowe will be responsible for executing this task. All work will be performed at ChalkLabs’ offices in Indiana using the internal development infrastructure.
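As a concrete illustration of the kind of relational model step 3 would produce, the sketch below stores an attributed network as node and edge tables plus a generic key/value attribute table, and runs a typical neighborhood query against it. All table, column, and data names here are illustrative, not the schema this project will deliver.

```python
import sqlite3

# Minimal relational model for an attributed network: nodes, edges,
# and a generic key/value attribute table shared by both.
SCHEMA = """
CREATE TABLE node (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE edge (src INTEGER REFERENCES node(id),
                   dst INTEGER REFERENCES node(id),
                   weight REAL);
CREATE TABLE attr (owner INTEGER, key TEXT, value TEXT);
"""

def build_demo(conn):
    """Load a tiny illustrative network of three grants."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO node VALUES (?, ?)",
                     [(1, "grant A"), (2, "grant B"), (3, "grant C")])
    conn.executemany("INSERT INTO edge VALUES (?, ?, ?)",
                     [(1, 2, 0.9), (1, 3, 0.4)])
    conn.execute("INSERT INTO attr VALUES (1, 'institute', 'NCI')")

def neighbors(conn, node_id, min_weight=0.0):
    """A frequently used network operation: the weighted neighborhood
    of a node, strongest connections first."""
    rows = conn.execute(
        """SELECT n.label, e.weight FROM edge e
           JOIN node n ON n.id = e.dst
           WHERE e.src = ? AND e.weight >= ?
           ORDER BY e.weight DESC""", (node_id, min_weight))
    return rows.fetchall()

conn = sqlite3.connect(":memory:")
build_demo(conn)
print(neighbors(conn, 1, 0.5))  # strong neighbors of "grant A"
```

The point of the generic attribute table is that new document types can add metadata without schema changes, which is what the "general enough to handle variations in structure" requirement asks for.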

3. Research and develop a visualization and interaction model.

The ‘visualization model’ defines the visualization design space by determining the available primitives, such as shapes, and the properties acting on those shapes (such as color and size). It is independent of the data model, but is designed so that the data model’s most important features can be visualized without resorting to complex transformations. The interaction model is then layered on top of the visualization model and defines the application’s response to user actions. Both the visualization and interaction models evolve to meet the changing needs of the data and users. Procedure:

1. Survey existing designs of network visualizations in scientific publications, visualization applications and visualization weblogs.

2. Identify visual primitives required to visualize networks based on the database schema developed in stage 2 and the UI research performed in stage 1.

3. Design a visualization model to accommodate variations in network structure and visual attributes while incorporating the recommendations formulated as a result of UI research performed in stage 1.

4. Design an interaction model that incorporates ideas from existing interaction metaphors as well as introduces new ones based on their perceived ease of use and feasibility of implementation.

The outcome of this stage will be a description of objects in the visualization system and their interactions that can be translated into source code by computer programmers. The benchmark needed to move on to the next stage will be that the visualization and interaction models integrate very well with the user interface designed in stage 1. The ChalkLabs team consisting of Shashikant Penumarthy, Bruce Herr, and Gavin LaRowe will be responsible for executing this task. All work will be performed at ChalkLabs’ offices in Indiana using the internal development infrastructure.

4. Integrate the Topic Model algorithm to analyze and generate similarity networks for visualization.

Topic Models are unsupervised models for generating topics from document collections (Griffiths and Steyvers, 2004), which may then be processed to provide visualizations. topicSeek LLC already has an efficient implementation of the Topic Model that will be used as part of the visualization tool. Procedure:

1. Identify the requirements for running Topic Model on a database and modify the programming interfaces appropriately.

2. Implement the necessary data converters to feed data to the algorithm from the database and store results of the analysis back into the database.

3. Test and optimize the implementation using data sets of varying sizes to determine deficiencies and strengths. 

The outcome of this stage will be a working implementation of the Topic Model that can use the efficient data model developed during stage 2 to perform text analysis. Dr. Newman will be responsible for developing the code for the Topic Modeling algorithm as well as the topics and similarity networks to be used for the subsequent visualizations. The benchmark needed to move on to the next stage will be that the model correctly analyzes thousands to millions of documents to produce meaningful topics. The ChalkLabs team consisting of Shashikant Penumarthy, Bruce Herr, and Gavin LaRowe will be responsible for integrating Dr. Newman’s work product into the visualization platform. Dr. Newman will perform his work on an internal development environment at topicSeek’s offices in California. Work performed by the ChalkLabs team will be performed at ChalkLabs’ offices in Indiana using the internal development infrastructure.
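To make the notion of a "similarity network" concrete: once the Topic Model has assigned each document a distribution over topics, pairwise similarity between those distributions can be thresholded into edges. The sketch below uses cosine similarity over toy topic vectors; the actual system will operate on Dr. Newman's Topic Model output and need not use this particular similarity measure.

```python
import math
from itertools import combinations

def cosine(p, q):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm

def similarity_network(doc_topics, threshold):
    """Edges between documents whose topic mixtures are similar enough."""
    edges = []
    for (d1, p), (d2, q) in combinations(sorted(doc_topics.items()), 2):
        s = cosine(p, q)
        if s >= threshold:
            edges.append((d1, d2, round(s, 3)))
    return edges

# Toy per-document topic proportions (3 topics), purely illustrative.
doc_topics = {
    "grant1": [0.8, 0.1, 0.1],   # mostly topic 0
    "grant2": [0.7, 0.2, 0.1],   # mostly topic 0 as well
    "grant3": [0.1, 0.1, 0.8],   # mostly topic 2
}
print(similarity_network(doc_topics, threshold=0.9))
```

Here the two topic-0-heavy grants are linked while the topic-2 grant stays isolated; at portfolio scale, the connected clusters of such a network become the regions of the map.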

5.   Research and develop the server-side analysis and rendering pipeline.

Here, we will automate and streamline the process of generating new visualizations. It will become possible to pipe many different data sets through the system and to analyze and visualize them without human intervention. Such automation will enable the creation of a “live” dashboard showing a map of, for example, the entire space of cancer research, updated periodically as new data becomes available. Procedure:

1. Identify the data format of inputs and outputs expected by each stage of the pipeline shown in Figure 2.

2. Design programmatic interfaces between stages of the pipeline for two-way transmission of data, command parameters and status messages, and develop test cases to determine conformance of the implementation to these interfaces.

3. Create programs that implement the interfaces designed in step 2 and create tools to manage the flow of data through the pipeline.

The outcome of this stage will be a software implementation of a server-side rendering pipeline that will accept data in flat files or databases and a set of parameters as input, and generate a set of static tiled images corresponding to the visualization as output. The benchmark needed to move on to the next stage will be that the pipeline can run from start to finish in a completely automated manner. The ChalkLabs team consisting of Shashikant Penumarthy, Bruce Herr, and Gavin LaRowe will be responsible for executing this task. All work will be performed at ChalkLabs’ offices in Indiana using the internal development infrastructure.
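The stage interfaces described above reduce to a simple contract: each stage consumes the previous stage's output, and a runner threads data through while recording per-stage status, which is what makes fully unattended runs possible. A minimal sketch (stage names and payloads are illustrative stand-ins, not the real ingestion, analysis, or rendering code):

```python
# Sketch of an automated analysis/rendering pipeline: each stage is a
# named function taking the previous stage's output; the runner threads
# data through and records a status message per stage.

def ingest(paths):
    """Stand-in for data ingestion: wrap each input file as a record."""
    return [{"id": i, "text": f"document from {p}"} for i, p in enumerate(paths)]

def analyze(docs):
    """Stand-in for topic modeling: tag each document with a topic id."""
    return [{**d, "topic": d["id"] % 2} for d in docs]

def render(docs):
    """Stand-in for tile rendering: one "tile" name per topic."""
    topics = sorted({d["topic"] for d in docs})
    return [f"tile_topic_{t}.png" for t in topics]

def run_pipeline(stages, data):
    """Run stages in order; stop and report on the first failure."""
    log = []
    for name, stage in stages:
        try:
            data = stage(data)
            log.append((name, "ok"))
        except Exception as exc:
            log.append((name, f"failed: {exc}"))
            break
    return data, log

tiles, log = run_pipeline(
    [("ingest", ingest), ("analyze", analyze), ("render", render)],
    ["a.txt", "b.txt", "c.txt"])
```

Because every stage speaks the same input/output contract, a scheduler can rerun the whole chain whenever new data arrives, which is exactly what a periodically updated "live" dashboard requires.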

6. Develop prototype web-application and Application Programming Interface (API).

At this stage, a proof-of-concept web-application that runs in the browser will be built to demonstrate the desired components and behavior as specified during stages #2 and #4. Procedure:

1. Identify life-cycle events that orchestrate the creation, update and deletion of objects that constitute a running prototypical web-based application.

2. Architect interfaces between client and server with respect to data, meta-data, command parameters, results and error responses while adhering to established standards in web-application development.

3. Develop test cases that will be used to determine the conformance of the implementation to the architecture developed in step 2.

4. Iteratively develop and test the client-side prototype.

5. Develop an Application Programming Interface (API) that can be used for future development.

The outcome of this stage will be a fully functional web-based prototype that can be used to explore and understand large document collections. The working software itself will be the benchmark that must be met in order to move on to the next stage. The ChalkLabs team consisting of Shashikant Penumarthy, Bruce Herr, and Gavin LaRowe will be responsible for executing this task. All work will be performed at ChalkLabs’ offices in Indiana using the internal development infrastructure and Amazon’s EC2 & S3 cloud computing infrastructure.
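The client/server contract in step 2 can be illustrated as a small JSON convention: each request names an action plus parameters, and each response carries either a result or a structured error, in line with established web-application practice. The action name, fields, and sample data below are illustrative, not the API this project will deliver.

```python
import json

# Toy server-side dispatcher illustrating a client/server contract:
# requests name an action plus parameters; responses are JSON carrying
# either a "result" or a structured "error".

NODES = {"n1": {"label": "cancer genomics", "degree": 12}}  # sample data

def handle(request_json):
    """Dispatch one client request and return a JSON response string."""
    req = json.loads(request_json)
    action, params = req.get("action"), req.get("params", {})
    if action == "node_metadata":
        node = NODES.get(params.get("id"))
        if node is None:
            return json.dumps({"error": {"code": 404, "message": "unknown node"}})
        return json.dumps({"result": node})
    return json.dumps({"error": {"code": 400, "message": "unknown action"}})
```

Keeping every response in one envelope shape means the browser client can handle success and failure uniformly, and the same convention extends naturally to the public API planned for step 5.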

7. Evaluate the usability of the prototype in the context of the Use Cases developed during stage 1.

At this stage, ChalkLabs will test the application with the researchers interviewed during stage 1 to evaluate how well the software satisfies the Use Cases they helped to develop. This serves several purposes: (1) it allows the researchers to revisit the application and make recommendations regarding whether certain features they deemed important were, in fact, useful or whether a different approach is needed; (2) it provides valuable feedback about the software to the builders of the tool; and (3) it may suggest other topics needing further exploration during Phase II of the development of this tool. Procedure:

1. Interview a few researchers and prepare an informal usability study comprising tasks that researchers should be able to perform with this tool.

2. Allow researchers to use the tool for the specified tasks using pre-loaded data and observe all aspects of their use of the tool.

The outcome of this stage will be a report describing what worked well, what didn’t work, and recommendations for improvements to be addressed during Phase II. Dr. Burns, Dr. Newman, and the ChalkLabs team consisting of Shashikant Penumarthy, Bruce Herr, and Gavin LaRowe will be responsible for executing this task. All work will be performed at the respective offices of all parties.

As an Indiana-based company, ChalkLabs is eligible for a dollar-for-dollar match on all SBIR Phase I awards from the Indiana Economic Development Corporation’s 21st Century Fund. These funds, up to $100,000, will be used to further the objectives outlined in the above work plan. A letter of support is attached.

|Objective |Months |Responsible Party |Benchmarks |

|NAME |POSITION TITLE |

|Shashikant Penumarthy |Research Scientist |

| |ChalkLabs, Bloomington, Indiana |

|eRA COMMONS USER NAME | |

|shashikantp | |

|EDUCATION/TRAINING (Begin with baccalaureate or other initial professional education, such as nursing, and include postdoctoral training.) |

|INSTITUTION AND LOCATION |DEGREE |YEAR(s) |FIELD OF STUDY |

| |(if applicable) | | |

|University of Mumbai |B.E. |2002 |Electrical Engineering |

|Indiana University |M.S. |2004 |Computer Science |

|Indiana University |Ph.D. |2009 (antic.) |Information Visualization |

A. Positions and Honors

Professional Experience

Fall 2008-present Research Scientist, ChalkLabs, Bloomington, IN

Summer 2007-Fall 2008 Visualization Consultant, Mind Alliance Systems, Roseland, NJ

Fall 2006-Summer 2007 Research Assistant, Professor Susan Herring, Indiana University

Summer 2006-Fall 2006 Visualization Research Intern, Microsoft Research, Redmond, WA

Fall 2003-Spring 2006 Research Assistant, Prof. Katy Borner (InfoVis Lab), Indiana University

B. Relevant publications

1. Börner, K., Penumarthy, S., Meiss, M., & Ke, W. (2006). Mapping the diffusion of scholarly knowledge among major US research institutions. Scientometrics, 68(3), 415-426.

2. Herr, B. W., Huang, W., Penumarthy, S., & Börner, K. (2008). Designing highly flexible and usable cyberinfrastructures for convergence. Annals of the New York Academy of Sciences, 1093 (Progress in Convergence: Technologies for Human Wellbeing), 161-179.

3. Penumarthy, S., & Börner, K. (2003). The ActiveWorld Toolkit: Analyzing and Visualizing Social Diffusion Patterns in 3D Virtual Worlds. Workshop on Virtual Worlds: Design and Research Directions, MIT, Boston MA.

4. Börner, K., & Penumarthy, S. (2004). Information Visualization Cyberinfrastructure. Position paper at the Workshop on Information Visualization Software Infrastructures, InfoVis 2004, Austin TX.

5. Huang, W., Herr, B., Penumarthy, S., Markines, B., & Börner, K. (2006). Cishell--A plug-in based software architecture and its usage to design an easy to use, easy to extend cyberinfrastructure for network scientists. Network Science Conference.

6. Börner, K., & Penumarthy, S. (2007). Spatio-Temporal information production and consumption of major US research institutions. Proceedings of ISSI, 1, 635-641.

7. Herr II, B. W., Duhon, R. J., Börner, K., Hardy, E. F., & Penumarthy, S. (2008) 113 years of physical review: Using flow maps to show temporal and topical citation patterns. Proceedings of the 12th International Conference on Information Visualization, Oct 19-24 Columbus OH.

C. Research Support

NSF IIS-0238261 Börner (PI) 10/01/03 - 09/30/08

National Science Foundation

CAREER: Visualizing Knowledge Domains

This project aims to bring the power of knowledge domain visualizations to any desktop connected to the Internet.

Role: Research Assistant and Technical Lead

Microsoft Research Susan Herring (PI) 09/01/2006 – 05/10/2007

9 Month Unrestricted Cash Gift

The project analyzed the effect of spam on the vitality of newsgroups in multiple languages over time.

Role: Research Assistant

|NAME |POSITION TITLE |

|Gully Alexander Peter Carey Burns |Neuroinformatics Research Scientist |

| |Information Sciences Institute |

| |University of Southern California |

|eRA COMMONS USER NAME | |

|GullyBurns | |

|EDUCATION/TRAINING (Begin with baccalaureate or other initial professional education, such as nursing, and include postdoctoral training.) |

|INSTITUTION AND LOCATION |DEGREE |YEAR(s) |FIELD OF STUDY |

| |(if applicable) | | |

|Imperial College, London, England |B.Sc. |1992 |Physics |

|Oxford University, Oxford, England |D.Phil. |1997 |Physiology |

A. Positions and Honors

Professional Experience

1992-1995 Research Assistant, Laboratory of Physiology, Oxford University, England.

1995-1997 Research Assistant, Neural Systems Group, Newcastle University, England.

1997-1999 Research Associate, Swanson Laboratory, Department of Neurobiology, USC.

1999-2006 Research Assistant Professor, Department of Neurobiology, USC.

2006-present Neuroinformatics Research Scientist, Information Sciences Institute, USC.

B. Relevant publications (selected from 33 publications)

1. Burns, G.A.P.C., D. Feng, and E.H. Hovy (2008), "Intelligent Approaches to Mining the Primary Research Literature: Techniques, Systems, and Examples", in "Computational Intelligence in Biomedicine". In Press, To appear in Series in Studies in Computational Intelligence, Springer-Verlag, Germany.

2. Burns, G., D. Feng, T. Ingulfsen, and E. Hovy (2007), "Infrastructure for Annotation-Driven Information Extraction from the Primary Scientific Literature: Principles and Practice". in 1st IEEE International Workshop on Service Oriented Technologies for Biological Databases and Tools (SOBDAT 2007). Salt-Lake City.

3. Burns, G., W.-C. Cheng, R.F. Thompson, and L. Swanson (2006), "The NeuARt II system: a viewing tool for neuroanatomical data based on published neuroanatomical atlases." BMC Bioinformatics: 7:531.

4. Burns, G.A. and W.C. Cheng (2006), "Tools for Knowledge Acquisition within the NeuroScholar system and their application to anatomical tract-tracing data". J Biomed Discov Collab, 1(1): p. 10.

5. Khan, A., J. Hahn, W.-C. Cheng, A. Watts, and G. Burns (2006), "NeuroScholar's Electronic Laboratory Notebook and its Application to Neuroendocrinology". Neuroinformatics, 4(2): p. 139-160.

6. Burns, G.A., A.M. Khan, S. Ghandeharizadeh, M.A. O'Neill, and Y.S. Chen (2003), "Tools and approaches for the construction of knowledge models from the neuroscientific literature". Neuroinformatics, 1(1): p. 81-109.

7. Burns, G., F. Bian, W.-C. Cheng, S. Kapadia, C. Shahabi, and S. Ghandeharizadeh (2002), "Software engineering tools and approaches for neuroinformatics: the design and implementation of the View-Primitive Data Model framework (VPDMf)". Neurocomputing, 44-46: p. 1049-1056.

8. Stephan, K.E., L. Kamper, A. Bozkurt, G.A. Burns, M.P. Young, and R. Kotter (2001), "Advanced database methodology for the Collation of Connectivity data on the Macaque brain (CoCoMac)". Philos Trans R Soc Lond B Biol Sci, 356(1412): p. 1159-86.

9. Burns, G., K. Stephan, B. Ludäscher, A. Gupta, and R. Kötter (2001), "Towards a federated neuroscientific knowledge management system using brain atlases". Neurocomputing, 38-40: p. 1633-1641.

10. Burns, G.A. (2001), "Knowledge management of the neuroscientific literature: the data model and underlying strategy of the NeuroScholar system". Philos Trans R Soc Lond B Biol Sci, 356(1412): p. 1187-208.

11. Stephan, K.E., C.C. Hilgetag, G.A. Burns, M.A. O'Neill, M.P. Young, and R. Kotter (2000), "Computational analysis of functional connectivity between areas of primate cerebral cortex". Philos Trans R Soc Lond B Biol Sci, 355(1393): p. 111-26.

12. Hilgetag, C.C., G.A. Burns, M.A. O'Neill, J.W. Scannell, and M.P. Young (2000), "Anatomical connectivity defines the organization of clusters of cortical areas in the macaque monkey and the cat". Philos Trans R Soc Lond B Biol Sci, 355(1393): p. 91-110.

13. Burns, G.A. and M.P. Young (2000), "Analysis of the connectional organization of neural systems associated with the hippocampus in rats". Philos Trans R Soc Lond B Biol Sci, 355(1393): p. 55-70.

C. Research Support

R01 GM 083871-1 (Burns) 4/1/2007 – 3/31/2012 4.80 calendar

BioScholar: a Biomedical Knowledge Engineering framework based on the published literature

This work is a continuation of the NeuroScholar project funded by NLM. The major goal of this project is to create a deployable knowledge management / engineering system for bench scientists that may be constructed, curated and maintained within a single laboratory.

HHSN271200800426P (Burns) 5/6/2008 – 8/5/2008 0.75 calendar

Topic Maps for CRISP

The major goal of this project is to build tools that permit users to browse online ‘topic-maps’ for the CRISP database. This is the forerunner to the current SBIR application.

n/a (Kesselman) 6/11/2008 – 6/10/2013 4.80 calendar

St. John's Health Center

Center for Health Informatics

The Center for Health Informatics is a large-scale, multidisciplinary center (incorporating intelligent systems, high-throughput networking and grid computing) with the mission to deliver turnkey information processing and delivery solutions to the clinical community. Dr. Burns plays a leadership role within the center’s approach to biomedical ontologies.

1 R01 LM07061-04 (Burns) 5/01/01-04/30/07

Knowledge Management of the Neuroscientific Literature

This project involves the construction of a knowledge management system for neuroscientific information contained in the literature. It incorporates ontological work, visualization and analysis development, and a study of the neural circuits underlying defensive behavior in the rat. This system, called NeuroScholar, is complete as a functional prototype and has been released to the community as an open-source project.

1-year E-Sciences unrestricted cash gift, Ghandeharizadeh (PI) 01/01/05-31/12/06

Microsoft

“Sangam, a system for integrating data to solve stress-circuitry-gene coupling”

This research is a spin-off from work on the NeuroScholar system involving the Proteus project software from the database laboratory of the Computer Science Department at the University of Southern California. The project is funded by a cash gift from Microsoft Research and is concerned with developing an ‘E-Science’ application built on integrating multiple sources of information into a single representation.

|NAME |POSITION TITLE |

|David Newman |Research Scientist |

| |Department of Computer Science |

| |University of California, Irvine |

|eRA COMMONS USER NAME | |

|DAVID NEWMAN | |

|EDUCATION/TRAINING (Begin with baccalaureate or other initial professional education, such as nursing, and include postdoctoral training.) |

|INSTITUTION AND LOCATION |DEGREE |YEAR(s) |FIELD OF STUDY |

| |(if applicable) | | |

|University of Melbourne, Australia |B.S. |1986 |Engineering |

|Princeton University |M.S. |1992 |Engineering |

|Princeton University |Ph.D. |1996 |Engineering |

A. Positions and Honors

Professional Experience

1990-1994 Research Assistant, Princeton University, NJ

1995-1996 Research Assistant, Brown University, RI

1997-1998 Postdoctoral Fellow, California Institute of Technology, CA

2001-2005 Research Scientist, Dept. of Earth System Science, University of California, Irvine, CA

2005-present Research Scientist, Dept. of Computer Science, University of California, Irvine, CA

Honors and Awards

2007 TeraGrid award to investigate large scale topic modeling using TeraGrid resources.

1996 Massachusetts Institute of Technology Young Investigator Award.

B. Relevant publications (selected from 17 publications)

1. Porteous, Newman, Ihler, Asuncion, Welling, Smyth. Fast Gibbs Sampling for Latent Dirichlet Allocation. In ACM SIGKDD Knowledge Discovery and Data Mining 2008.

2. Newman, Asuncion, Welling, Smyth. Distributed Inference for Latent Dirichlet Allocation. In Neural Information Processing Systems 2007.

3. Newman, Hage, Chemudugunta, Smyth. Subject Metadata Enrichment using Statistical Topic Models. In Joint Conference in Digital Libraries 2007.

4. Hage, Chapman, Newman. Enhancing Search and Browse Using Automated Clustering of Subject Metadata. In D-Lib Magazine July/August 2007.

5. Newman, Chemudugunta, Smyth, Steyvers. Analyzing Entities and Topics in News Articles Using Statistical Topic Models. In Intelligence and Security Informatics 2006.

6. Newman, Chemudugunta, Smyth, Steyvers. Statistical Entity-Topic Models. In ACM SIGKDD Knowledge Discovery and Data Mining 2006.

7. Newman, Smyth, Steyvers. Scalable Parallel Topic Models. In Journal of Intelligence Community Research and Development 2006.

8. Newman and Block. Probabilistic Topic Decomposition of an Eighteenth Century Newspaper. In Journal of the American Society for Information Science and Technology, 2006.

9. Teh, Newman, Welling. A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation. In Neural Information Processing Systems 2006.

C. Research Support

NIH NINDS Contract ($80,000) 6/2009 – 10/2009

Topic Maps for CRISP (NIH database of funded grants).

Role: PI

IMLS Research Award ($750,000) 10/2009 – 9/2011

Improving search and browse in digital libraries using topic modeling.

Role: PI

|NAME |POSITION TITLE |

|Bruce Herr |Information Visualization Engineer |

| |ChalkLabs, Bloomington, Indiana |

|eRA COMMONS USER NAME | |

|bherr2 | |

|EDUCATION/TRAINING (Begin with baccalaureate or other initial professional education, such as nursing, and include postdoctoral training.) |

|INSTITUTION AND LOCATION |DEGREE |YEAR(s) |FIELD OF STUDY |

| |(if applicable) | | |

|Indiana University |B.S. |2004 |Computer Science |

A. Positions and Honors

Professional Experience

Summer 2008-present ChalkLabs, Bloomington, IN

2004-Summer 2008 InfoVis Lab, Indiana University

B. Relevant publications

1. Katy Börner, Elisha F. Hardy, Bruce W. Herr II, Todd M. Holloway, W. Bradford Paley. (2007). Taxonomy Visualization in Support of the Semi-Automatic Validation and Optimization of Organizational Schemas. Journal of Informetrics. Vol. 1(3), 214-225, Elsevier.

2. Bruce W. Herr II, Weixia Huang, Shashikant Penumarthy, Katy Börner. (2007). Designing Highly Flexible and Usable Cyberinfrastructures for Convergence. In Bainbridge, William S. & Roco, Mihail C. (Eds.), Progress in Convergence - Technologies for Human Wellbeing (Vol. 1093, pp. 161-179), Annals of the New York Academy of Sciences, Boston, MA.

3. Bruce W. Herr II, Weimo Ke, Elisha F. Hardy, Katy Börner. (2007). Movies and Actors: Mapping the Internet Movie Database. Conference Proceedings of 11th Annual Information Visualization International Conference (IV 2007), Zürich, Switzerland, July 4-6, IEEE Computer Society Conference Publishing Services, pp. 465-469.

C. Research Support

NSF IIS-0238261 Borner (PI) 10/01/03 - 09/30/08

National Science Foundation

CAREER: Visualizing Knowledge Domains

This project aims to bring the power of knowledge domain visualizations to any desktop connected to the Internet.

Role: software developer

NSF IIS-0513650 Borner (PI) 09/01/05 - 08/31/08

National Science Foundation

SEI: NetWorkBench: A Large-Scale Network Analysis, Modeling and Visualization Toolkit for Biomedical, Social Science and Physics Research.

This project will design, evaluate, and operate a unique distributed, shared resources environment for large-scale network analysis, modeling, and visualization, named Network Workbench (NWB).

Role: software developer

|NAME |POSITION TITLE |

|Gavin LaRowe |Chief Technologist & CEO |

| |ChalkLabs, Bloomington, Indiana |

|eRA COMMONS USER NAME | |

|GL-CHALKLABS | |

|EDUCATION/TRAINING (Begin with baccalaureate or other initial professional education, such as nursing, and include postdoctoral training.) |

|INSTITUTION AND LOCATION |DEGREE |YEAR(s) |FIELD OF STUDY |

| |(if applicable) | | |

|University of Puget Sound |B.A. |1995 |Foreign Literatures & Computer Science |

|Indiana University |M.S. |2006 |Information Science |

A. Positions and Honors

Professional Experience

Summer 2008-present ChalkLabs, Bloomington, IN

Fall 2007-Summer 2008 Mind Alliance Systems, Roseland, NJ

Spring 2004-Fall 2007 InfoVis Lab, Indiana University

Honors and Awards

Fall 2004 Fellow, Swedish Collegium for Advanced Study, Uppsala, Sweden

Spring 2006 Fellow, Swedish Collegium for Advanced Study, Uppsala, Sweden

B. Relevant publications

1. Gavin LaRowe, Sumeet Ambre, Weimao Ke & Katy Börner (2008). The Scholarly Database and Its Utility for Scientometrics Research. To appear in the special issue of Scientometrics on the 11th ISSI.

2. LaRowe, Gavin, Ambre, Sumeet Adinath, Burgoon, John W., Ke, Weimao & Katy Börner. (2007). The Scholarly Database and Its Utility for Scientometrics Research. Torres-Salinas, D & Moed, H F. (Eds.), Proceedings of the 11th International Conference on Scientometrics and Informetrics, Madrid, Spain, June 25-27, pp. 457-462.

3. LaRowe, Gavin, Ichise, Ryutaro & Börner, Katy. (2007). Visualizing Japanese Co-Authorship Data. Proceedings of the 11th Annual Information Visualization International Conference, Zürich, Switzerland, July 4-6, IEEE Computer Society Conference Publishing Services, pp. 459-464.

C. Research Support

NSF IIS-0534909 Borner (PI) 3/15/2006 – 8/01/2007

National Science Foundation

COLLABORATIVE SYSTEMS: Social Networking Tools to Enable Collaboration in the Tobacco Surveillance, Epidemiology, and Evaluation Network (TSEEN)

The project is a pioneering effort at incorporating social network referral tools as an integral part of collaborative systems within the context of digital government. The project will extend theoretical understanding of the emergence of collaboration network structures involving multidimensional networks.

Role: DBA & Software Lead

h. Subcontractors/Consultants

ChalkLabs will subcontract work to topicSeek LLC and ISI. All work for topicSeek LLC will be performed solely by Dr. Newman, who will carry out the topic-modeling work described in Section c.4, "Integrate the Topic Model algorithm to analyze and generate similarity networks for visualization". ChalkLabs will assign specific data analysis and modeling tasks to topicSeek. Newman will be responsible for generating similarity networks of documents and for assisting in the design of the data model to be used in the software application. Newman's role is also indicated in an accompanying letter of support from topicSeek. All work for ISI will be performed by Dr. Burns, a subject-matter expert in biomedical research. ChalkLabs will use Burns' expertise to collect requirements from NIH researchers, design the user interface, perform usability testing, and create the final report, as described in Sections c.1 and c.6. Burns' role is also indicated in an accompanying letter of support from ISI.
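The similarity networks referenced in Section c.4 are specified only at a high level here. As one illustrative sketch (the function name, toy data, cosine-similarity measure, and threshold are our assumptions, not the proposal's specification), a document-similarity network can be derived from per-document topic proportions such as those produced by a topic model:

```python
import numpy as np

def similarity_network(theta, threshold=0.5):
    """Build a document-similarity network from per-document topic
    proportions (each row of `theta` sums to 1, e.g. topic-model output).
    Returns a list of (i, j, similarity) edges above `threshold`."""
    theta = np.asarray(theta, dtype=float)
    # Cosine similarity between every pair of topic-proportion vectors.
    unit = theta / np.linalg.norm(theta, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = theta.shape[0]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                edges.append((i, j, float(sim[i, j])))
    return edges

# Three toy "grant abstracts": two share a dominant topic, one differs.
theta = [
    [0.90, 0.05, 0.05],  # doc 0: mostly topic 0
    [0.85, 0.10, 0.05],  # doc 1: mostly topic 0
    [0.05, 0.05, 0.90],  # doc 2: mostly topic 2
]
edges = similarity_network(theta, threshold=0.9)
# Only docs 0 and 1 are linked; doc 2 remains an isolated node.
```

The resulting edge list can be handed directly to a graph-layout tool (e.g. DrL, cited in the bibliography) to produce the map visualization.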

i. Facilities and Equipment

Name: ChalkLabs, Bloomington, IN (offeror organization)

Primary Contact: Gavin LaRowe, Owner & CEO

Desc: ChalkLabs is a small business entity focused on advanced research and web application development and services, with strong emphases in network science, information visualization, and data mining.

Office: ChalkLabs is located in the IU Research and Technology Park in the Showers complex in Bloomington, IN. The primary mission of the Research and Technology Park is to 'establish a first-class research park at Bloomington that will be a focal point for future partnerships between university researchers and industry.' With over 52,000 square feet of leasable office space, the park houses many cutting-edge IU-related technology and research organizations, such as the Pervasive Technologies Lab, the Advanced Network Management Lab, the Open Systems Lab, and the Internet2 research offices. In addition, six other IT-related businesses, including Information In Place, Inc. and RightRez, occupy this space. ChalkLabs is also part of the Inventure technology incubator run by the Small Business Development Corporation (SBDC) in Bloomington, IN, which provides access to many local businesses, venture capital, and finance organizations affiliated with the SBDC.

Staff: As of 9/02/08, ChalkLabs has 5 FTEs. Based on current projections, this number will double by December of 2008.

Computer: ChalkLabs administers a robust network of Linux-, OS X-, and Windows-based servers and desktop machines dedicated to both research and development. All of these computers are protected by an uninterruptible power supply and backup generators for 24x7x365 operation. Research and development staff have an average of two workstations per staff member, connected to a high-speed switched 12Gbps ethernet LAN.
Aside from internal computing resources, various staff members have access to research grid-computing and cluster-computing infrastructures at Indiana University, such as 'Big Red' and 'AVIDD'. The Big Red cluster, IU's latest high-performance computing system, is a 512-node distributed shared-memory cluster designed around IBM's BladeCenter JS21. Each JS21 node contains two dual-core 2.5GHz PowerPC 970MP processors, 8GB ECC SDRAM, a 72GB SATA hard disk for local scratch space, and a PCI-X Myrinet 2000 adapter for high-bandwidth, low-latency MPI applications. In its initial configuration, the cluster runs SuSE Linux Enterprise Server 9, with IBM's LoadLeveler and the Moab Workload Manager for batch job management. Big Red users have access to a 266TB GPFS filesystem for analysis and temporary storage of large datasets, as well as native access via the Lustre client to the 535TB Data Capacitor. IBM's PowerPC 970MP processor contains two double-precision floating-point units per core; a single node contains four cores, each capable of four floating-point operations per cycle. The Myrinet 2000 interconnect provides a 2+2Gb/s low-latency (2.6-3 microseconds) network for MPI communication; each JS21 is equipped with a Myricom M3S-PCIXD-2-I adapter connected directly to one of two Myricom M3-CLOS-ENCL 256-port switches.
The AVIDD (Analysis and Visualization of Instrument-Driven Data) facility is a distributed 2.2 TeraFLOPS Linux cluster with more than 10TB of disk space, frequently used to parse and analyze very large data sets. IU also operates a 1.02 TFLOPS IBM SP, which includes a large-memory Regatta node (96GB RAM for this node alone); the Regatta node is often used for analysis of very large data sets and for importing external public data sources into Oracle databases.

_______

Name: Information Sciences Institute (ISI), Marina del Rey, CA

Primary Contact: Gully Burns, ISI

Desc.: ISI-USC is one of the premier institutes for advanced artificial intelligence research and new media applications to education and training, with more than 300 researchers working on innovative technology applications sponsored by DARPA, ARDA, NSF, NSA, and other agencies.

Office: ISI occupies office space in a 12-story office building located 20 minutes from the main USC campus; each floor includes multiple conference rooms and video-conference units. All team members have separate, near-adjoining offices on the north side of the fourth floor.

Staff:

Computer: The computer center has been an integral part of ISI since its founding in 1972. Today's Information Processing Center (IPC) maintains a state-of-the-art computing environment and staff to provide the technical effort required to support the performance of research. Resources include client platform and server hardware support, distributed print services, network and remote access support, operating systems and application software support, computer center operations, and help desk coverage. The IPC also acts as a technical liaison to the ISI community on issues of acquisition and integration of computing equipment and software.

The Center's servers are protected by an uninterruptible power supply and backup generator to ensure availability 24 hours a day, 365 days a year. A rich mix of computer and network equipment, along with modern software tools for the research community's use, provides a broad selection of capabilities, including Unix-based Sun servers and Windows-based Dell servers used for electronic mail and group calendaring, web services, and file and mixed application serving. File servers utilize high-performance RAID and automated backup to facilitate performance and data protection. Computer-room space is also available to researchers for hosting project-related servers. In addition, research staff have access to grid-enabled cluster computing and to USC's 5,400-CPU compute cluster with low-latency Myrinet interconnect, the largest academic supercomputing resource in Southern California. All printers (color and b/w) are networked and available for unrestricted use. This includes one color photocopier per floor.

Research project staff have an average of over one workstation per staff member, connected to a high-performance switched 10Gbps ethernet LAN backbone with 10Gbps connectivity to research networks such as Internet2, as well as additional network resources such as IP multicast, 802.11b and 802.11g wireless, H.323 point-to-point and multipoint videoconferencing, webcasting, and streaming media.

Bibliography

Adai, A. T., Date, S. V., Wieland, S., & Marcotte, E. M. (2004). LGL: Creating a map of protein function with an algorithm for visualizing very large biological networks. Journal of Molecular Biology, 340(1), 179-190.

Alvarez-Hamelin, J. I., Dall'Asta, L., Barrat, A., & Vespignani, A. (2005). K-core decomposition: A tool for the visualization of large scale networks. arXiv preprint cs.NI/0504107.

Batagelj, V., & Mrvar, A. (1998). Pajek: Program for large network analysis. Connections, 21(2), 47-57.

Bederson, B. B., Shneiderman, B., & Wattenberg, M. (2002). Ordered and quantum treemaps: Making effective use of 2D space to display hierarchies. ACM Transactions on Graphics (TOG), 21(4), 833-854.

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.

Börner, K., Sanyal, S., & Vespignani, A. (2007). Network science. Annual Review of Information Science and Technology, 41, 537.

Burns, G., Herr, B., Newman, D., Ingulfsen, T., Pantel, P., & Smyth, P. (2007). A snapshot of neuroscience: Unsupervised natural language processing of abstracts from the Society for Neuroscience 2006 annual meeting. Annual Meeting of the Society for Neuroscience, San Diego, CA, Program No. 100.6/XX26.

Davidson, G. S., Hendrickson, B., Johnson, D. K., Meyers, C. E., & Wylie, B. N. (1998). Knowledge mining with VxInsight: Discovery through interaction. Journal of Intelligent Information Systems, 11(3), 259-285.

Di Battista, G., Eades, P., Tamassia, R., & Tollis, I. G. (1998). Graph drawing: Algorithms for the visualization of graphs. Prentice Hall PTR Upper Saddle River, NJ, USA.

Ellson, J., Gansner, E. R., Koutsofios, E., North, S. C., & Woodhull, G. (2003). Graphviz and dynagraph: Static and dynamic graph drawing tools. Graph Drawing Software, 127-148.

Fruchterman, T. M. J., & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software- Practice and Experience, 21(11), 1129-1164.

Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101(Suppl 1), 5228-5235.

Lima, M. (2007). Visual complexity. Online at: . Last Accessed: Oct 13, 2008.

Martin, S., Brown, W. M., Klavans, R., & Boyack, K. W. (2007). DrL: Distributed Recursive (Graph) Layout. Journal of Graph Algorithms and Applications, 1(1), 1.

Moere, A. V. (2007). Information aesthetics. Online at: . Last Accessed: Oct 13, 2008

Newman, D., Chemudugunta, C., Smyth, P., & Steyvers, M. (2006). Analyzing entities and topics in news articles using statistical topic models. Proceedings of IEEE Intelligence and Security Informatics (LNCS), San Diego, CA.

Palantir Technologies. (n.d.). Palantir Technologies. Online at: . Last Accessed: Oct 13, 2008.

Plaisant, C., Grosjean, J., & Bederson, B. B. (2002). SpaceTree: Supporting exploration in large node link trees, design evolution and empirical evaluation. University of Maryland College Park, Human-Computer Interaction Lab.

Schroeder, W., Martin, K. M., & Lorensen, W. E. (1998). The visualization toolkit: An object-oriented approach to 3D graphics. Prentice-Hall, Inc. Upper Saddle River, NJ, USA.

Siek, J., Lee, L. Q., & Lumsdaine, A. (2002). The boost graph library: User guide and reference manual. Addison-Wesley.

TouchGraph, LLC. TouchGraph. Online at: . Last Accessed: Oct 13, 2008.

Tong, A. H., Lesage, G., Bader, G. D., Ding, H., Xu, H., Xin, X., Young, J., Berriz, G. F., Brost, R. L., Chang, M., Chen, Y., Cheng, X., Chua, G., Friesen, H., Goldberg, D. S., Haynes, J., Humphries, C., He, G., Hussein, S., Ke, L., Krogan, N., Li, Z., Levinson, J. N., Lu, H., Menard, P., Munyana, C., Parsons, A. B., Ryan, O., Tonikian, R., Roberts, T., Sdicu, A. M., Shapiro, J., Sheikh, B., Suter, B., Wong, S. L., Zhang, L. V., Zhu, H., Burd, C. G., Munro, S., Sander, C., Rine, J., Greenblatt, J., Peter, M., Bretscher, A., Bell, G., Roth, F. P., Brown, G. W., Andrews, B., Bussey, H., & Boone, C. (2004). Global mapping of the yeast genetic interaction network. Science, 303(5659), 808-813.

van Ham, F., & van Wijk, J. J. (2003). Beamtrees: Compact visualization of large hierarchies. Information Visualization, 2(1), 31-39.

Current Awards and Pending Proposals/Applications

None

-----------------------

Figure 1: Screenshot mockup of proposed application. This web-based system provides a navigable map of a document collection.

Figure 2: Diagram showing the high-level technical architecture of the proposed application.

Figure 3: Screenshot of the development prototype demonstrating navigation of grant abstracts, showing the relationship between NHLBI and NINDS grants relating to blood.
