Requirements for Statistical Analytics and Data Mining

OpenBudgets.eu: Fighting Corruption with Fiscal Transparency

Project Number: 645833

Start Date of Project: 01.05.2015

Duration: 30 months

Deliverable 2.3 Requirements for Statistical Analytics and Data Mining

Dissemination Level: Public

Due Date of Deliverable: Month 12, 30.04.2016

Actual Submission Date: 01.06.2016

Work Package: WP 2, Data Collection and Mining

Task: T 2.3

Type: Report

Approval Status: Final

Version: 1.0

Number of Pages: 32

Filename: D2.3 - Requirements for Statistical Analytics and Data Mining.docx

Abstract: In this deliverable we present requirements for statistical analytics and data mining in the OpenBudgets.eu (OBEU) platform. Based on user needs assessed and reported in previous OBEU deliverables, we formulate data mining and analytics tasks, discuss related tools and algorithms, and finally define the corresponding requirements.

The information in this document reflects only the author's views and the European Community is not liable for any use that may be made of the information contained therein. The information in this document is provided "as is" without guarantee or warranty of any kind, express or implied, including but not limited to the fitness of the information for a particular purpose. The user thereof uses the information at his/her sole risk and liability.

Project funded by the European Union's Horizon 2020 Research and Innovation Programme (2014-2020)

D2.3 - v.1.0

History

Version   Date         Reason                         Revised by
0.1       11.05.2016   Version for internal review    Kleanthis Koupidis
1.0       31.05.2016   Final version for submission   Christiane Engels

Author List

Organisation   Name                  Contact Information
Fraunhofer     Christiane Engels     christiane.engels@iais.fraunhofer.de
OKFGR          Charalampos Bratsas   char.brat@
OKFGR          Kleanthis Koupidis    koupidis.okfgr@
UBONN          Fathoni Musyaffa      musyaffa@iai.uni-bonn.de
UBONN          Fabrizio Orlandi      orlandi@iai.uni-bonn.de
UEP            David Chudán          david.chudan@vse.cz
UEP            Jaroslav Kuchař       jaroslav.kuchar@vse.cz
UEP            Jindřich Mynarz       mynarzjindrich@
UEP            Václav Zeman          vaclav.zeman@vse.cz

Executive Summary

In this deliverable we present the requirements for statistical analytics and data mining in the OpenBudgets.eu (OBEU) project. We first describe the methodology used to collect these requirements. We then identify the previous OBEU deliverables in which data mining and analytics needs were collected, and summarize those needs. Next, we map the needs onto corresponding data mining and analytics tasks and discuss appropriate algorithms for the identified tasks. Based on the collected tasks, we describe related tools. Finally, we formulate the list of requirements for data mining and statistical analytics, assigning a priority to each requirement.

Abbreviations and Acronyms

CSV    Comma-Separated Values
DCV    Data Cube Vocabulary
RDF    Resource Description Framework
OBEU   OpenBudgets.eu

Table of Contents

1 INTRODUCTION
2 PRELIMINARIES
  2.1 SEMANTIC MODEL
  2.2 METHODOLOGY
3 DATA MINING AND ANALYTICS NEEDS AND TASKS
  3.1 SOURCES OF COLLECTED DATA MINING AND ANALYTICS NEEDS
  3.2 COLLECTED DATA MINING AND ANALYTICS NEEDS
    3.2.1 Analysis of the required functionality of OpenBudgets.eu (D4.2)
    3.2.2 User Requirements Reports - First Cycle (D5.1)
    3.2.3 Needs Analysis Report (D6.2)
    3.2.4 Assessment Report (D7.1)
    3.2.5 Stakeholder identification and outreach plan (D8.3)
    3.2.6 Additional Needs
  3.3 DATA MINING AND ANALYTICS TASKS
  3.4 SUMMARY OF DATA MINING AND ANALYTICS TASKS
  3.5 DISCUSSION OF IDENTIFIED DATA MINING AND ANALYTICS TASKS
    3.5.1 Similarity Learning
    3.5.2 Rule/Pattern Mining
    3.5.3 Outlier/Anomaly Detection
    3.5.4 Clustering
    3.5.5 Graph/Network Analysis
    3.5.6 Pattern Matching
    3.5.7 Descriptive Statistics
    3.5.8 Comparative Analysis
    3.5.9 Time Series Analysis
4 TOOLS
  4.1 RAPIDMINER
  4.2 WEKA
  4.3 R
  4.4 PYTHON
  4.5 SPARQL
  4.6 OPENSPENDING
  4.7 EASYMINER
5 REQUIREMENTS FOR STATISTICAL ANALYTICS AND DATA MINING
  5.1 GENERAL FUNCTIONAL REQUIREMENTS
