


An assistance environment for the classification of knowledge on project risks

Rachid CHALAL, Razika OUAMANE

Laboratoire MCS “ Méthodes de Conception de Systèmes”,

INI (Institut National d’Informatique)

BP 68M Oued Smar, Alger, Algérie

Abstract: - Project risk management has been the subject of several recent studies. However, there is no well-mastered and widely shared lexicon for the risk management domain. On the other hand, tools assisting the construction of terminologies from corpora have undergone important development in recent years, whereas assistance tools for the classification of knowledge on risk remain few. Our work was carried out to address these shortcomings. It presents an assistance tool for the acquisition and the classification of knowledge on risk. The environment allows the generation of concepts and relationships from a corpus of texts, in order to lead to an ontology of the domain.

Key-Words: - Knowledge, concept, term, risk, relationship, classification.

1 Introduction

Project risk management has become, in recent years, a major concern for many enterprises. The analysis that we have made of project risk management [10], [4], [6], [5], [8] reveals that the classification of risks has become an integral part of project management. However, one kind of classification has been neglected: the one that belongs to the field of artificial intelligence and that consists of a classification of knowledge on risks, allowing the concepts of the domain to be generated and the vocabulary to be mastered. In the domain of risk management, there is no unified terminology available. Several terminologies exist, some coming from research, others from the trades (Risk Manager, Risk Controller, ...). It is therefore indispensable to work on a thematic classification of risks, leading both to assistance in normalizing the vocabulary used and to a classification of risks. The classification of risks will permit an improvement of the risk management process and an elimination of the conceptual and terminological confusion, thus tending towards a shared comprehension. Facing this problem, we developed an assistance tool. The tool acquires a set of concepts and of relationships (hierarchical and non-hierarchical) between concepts, allowing the definition of an ontology of the project risk management domain.

First, a set of definitions will be presented. Then we will present the method of construction of the ontology that allows the classification to be generated. We will continue with a presentation of the environment architecture, describing each of its modules. Finally we will conclude on the perspectives and the evolution of our work.

2 Definitions

2.1 Ontology

An ontology is the formal transcription of a body of knowledge: one identifies concepts and then relations between these concepts. This formalisation proves necessary in human-human communication mediated by a machine. Ontologies constitute powerful resources for sharing knowledge; they have generated great interest in artificial intelligence and in a wide variety of disciplines confronted with the problems of information integration and data interpretation [7].

2.2 Application domain: Project Risk Management

2.2.1 Definition of Risk Management

Risk Management includes the processes concerned with identifying, analyzing, and responding to risk. It includes maximizing the results of positive events and minimizing the consequences of adverse events [10],[4],[5],[8].

2.2.2 Classification of risks in projects

We can classify risks in different manners (a minimal data-structure sketch follows the list):

▪ The position in the project: internal and external risks [1]

▪ The nature: technical, organizational, human risks

▪ The origin: suppliers, clients, competitors [12]

▪ The consequence: cost overruns, failure to meet deadlines.
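As a purely illustrative aid, these classification axes can be represented as a simple data structure. The sketch below is a minimal Python representation; only the axis names and example categories come from the list above, while the dictionary layout and the function name are assumptions.

```python
# Minimal sketch: the classification axes above as a lookup table.
# The layout and the classify() helper are illustrative assumptions.
RISK_CLASSIFICATION = {
    "position in the project": ["internal", "external"],       # [1]
    "nature": ["technical", "organizational", "human"],
    "origin": ["suppliers", "clients", "competitors"],          # [12]
    "consequence": ["cost overrun", "missed deadline"],
}

def classify(risk_tags):
    """Return, for each axis, the categories matched by a set of tags."""
    return {axis: [c for c in cats if c in risk_tags]
            for axis, cats in RISK_CLASSIFICATION.items()}

print(classify({"external", "technical", "cost overrun"}))
# {'position in the project': ['external'], 'nature': ['technical'],
#  'origin': [], 'consequence': ['cost overrun']}
```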

3. The method of construction of an ontology for the project risk management domain

3.1 Context of the method

The proposed method places us at the confluence of several disciplines that have been increasingly intertwined for several years, as shown in the figure below (Fig. 1).


Fig. 1. Context of our method

3.2 Method description

The developed ontology will allow risks to be identified and appropriate tools to be adapted to classify them, by defining a lexicon for the risk domain that can be used on a large scale. This ontology will associate with risks a precise and well-defined semantics, classify them inside a hierarchy, and connect them by semantic relationships. A cooperative approach has been adopted, in which text analysis tools extract the terms and the characteristic relationships of the domain and thus facilitate the deduction of concepts; the cooperation of an expert then permits these results to be validated and corrected. A current trend [16] aims to place the corpus at the starting point of ontology construction. It is advised to keep links between a term and its occurrences in the corpus, so as to facilitate its comprehension. According to [3] the corpus is indeed the main source of knowledge at our disposal. We place ourselves fully in this perspective. The acquisition method (Fig. 2) is inspired by linguistic models based on the repeated segment method [13] and by the work of Z. Harris on distributional analysis [11].

Fig. 2. Detailed description of the method [14]

4. Architecture of the environment

The tool supports the method proposed above. It allows the acquisition and the classification of knowledge on risk from a corpus of texts. Its operation rests mainly on two modules: an extractor of terms and hierarchical relationships, TERMS, and an extractor of non-hierarchical relationships, LINKS (Fig. 3). We first acquire the terms designating the concepts of the domain. These terms are then exploited in the second module (LINKS) to acquire the non-hierarchical relationships of the domain, by exploring the linguistic contexts between them. An expert of the terminology used in the risk management domain is responsible for the construction procedure of the ontology.
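As a rough illustration of this two-module organization, the following Python sketch chains a TERMS step and a LINKS step. The function names, signatures and placeholder outputs are assumptions made for illustration, not the actual implementation.

```python
# Minimal sketch of the two-module pipeline described above (assumed API).

def terms_module(corpus: str):
    """Extract candidate terms and hierarchical (head / composed term) relations."""
    # ... repeated-segment extraction, filtering and expert validation ...
    terms = ["risque", "risque externe"]            # placeholder output
    hierarchy = [("risque", "risque externe")]      # (head term, composed term)
    return terms, hierarchy

def links_module(corpus: str, terms):
    """Extract non-hierarchical relations from the contexts between known terms."""
    # ... Term1 Verb Term2 and Term1 Expression Term2 patterns ...
    return [("risque financier", "impliquer", "risque juridique")]  # placeholder

corpus = "..."                                      # text corpus on project risks
terms, hierarchy = terms_module(corpus)
relations = links_module(corpus, terms)
print(terms, hierarchy, relations)
```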

Fig. 3. Architecture of the environment

5. TERMS Module

5.1 Preprocessing

At this level, a set of processing steps is carried out, described in the points hereafter. The result of this processing is the set of words of the corpus, as well as the associated filters.


Fig. 4. Reading and preprocessing of the corpus

5.1.1 Cutting the text into a sequence of words

The text is cut into a sequence of words using separator characters and spacing characters (mainly blank characters).
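A minimal sketch of this cutting step is given below. The exact separator set used by the tool is not specified in the text, so the regular expression here is an assumption.

```python
import re

# Minimal tokenization sketch: words (letters, including accented ones) are
# kept as tokens, every other non-blank character becomes a one-character token.
def tokenize(text: str):
    return re.findall(r"[A-Za-zÀ-ÿ]+|[^\sA-Za-zÀ-ÿ]", text)

print(tokenize("Le risque externe, nouveaux produits de substitution."))
# ['Le', 'risque', 'externe', ',', 'nouveaux', 'produits', 'de', 'substitution', '.']
```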

5.1.2 Abbreviations and spelling mistakes

A good number of composed expressions are used in texts in abbreviated form (for example UR for Unit Risk). Moreover, spelling mistakes on technical terms are frequently found in electronic sources (for example on the Internet). These elements must be taken into account if we do not want to lose interesting information.
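One simple way to take abbreviations into account is an expansion table applied to the token sequence, as sketched below; only the UR example comes from the text, the rest of the mechanism is an assumption (spelling correction would require an additional resource and is not shown).

```python
# Minimal sketch: expand known abbreviations before further processing.
# Only "UR" -> "Unit Risk" comes from the example above; the mechanism is illustrative.
ABBREVIATIONS = {"UR": "Unit Risk"}

def expand_abbreviations(tokens):
    return [ABBREVIATIONS.get(t, t) for t in tokens]

print(expand_abbreviations(["le", "UR", "du", "projet"]))
# ['le', 'Unit Risk', 'du', 'projet']
```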

5.1.3 Lemmatisation

To group singular and plural forms under the same lemma, we consider that the lemma is (a minimal sketch follows these two rules):

The infinitive form for verbs

The masculine singular of the lexeme for the other words
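A full lemmatiser needs a morphological lexicon; the sketch below only illustrates the two rules above with a tiny hand-made lookup table, which is an assumption made for the example.

```python
# Minimal sketch of the lemmatisation rules above: verbs are mapped to their
# infinitive, other words to their masculine singular form.  The lookup table
# is a tiny illustrative sample, not a real morphological lexicon.
LEMMAS = {
    "impliquent": "impliquer",   # verb -> infinitive
    "risques": "risque",         # plural -> masculine singular
    "externes": "externe",
    "nouveaux": "nouveau",
    "produits": "produit",
}

def lemmatize(tokens):
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]

print(lemmatize(["Risques", "externes", "impliquent"]))
# ['risque', 'externe', 'impliquer']
```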

5.1.4 List of words of the corpus

This list, noted Lvocable, is obtained from the corpus by computing the frequency of the different words. It contains the different words of the language (prepositions, verbs, adjectives, ...), as well as the punctuation marks and special characters used in the corpus.
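Assuming the tokenized and lemmatized corpus of the previous steps, the construction of Lvocable reduces to a frequency count, as in this minimal sketch:

```python
from collections import Counter

# Minimal sketch: Lvocable lists the distinct words of the corpus with their
# frequencies, punctuation and special characters included.
def build_lvocable(tokens):
    return Counter(tokens)                      # word -> number of occurrences

lvocable = build_lvocable(["le", "risque", ",", "le", "risque", "externe", "."])
print(lvocable.most_common())
# [('le', 2), ('risque', 2), (',', 1), ('externe', 1), ('.', 1)]
```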

5.1.5 Generation of filters

These filters will be used in the filtering of repeated segments. The first filter, Filgram (grammatical), includes the conjunctions, prepositions, pronouns, etc. The second filter, Filponc (punctuation and special characters), is established once and for all and can be used on any corpus. Finally, Filverb is the verbal filter, containing the verbs of the domain.

5.1.6 Constitution of a lexicalised and filtered list of the corpus: Llex

The lexicalised and filtered list of words of the corpus, Llex, is the starting list for the repeated segment calculation (indeed, every repeated segment has to begin with a word of this list). It is obtained from the words of Lvocable and from the Filgram and Filponc filters: Llex = Lvocable \ (Filgram + Filponc) (the filter words are removed from Lvocable).
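Represented as sets, this construction is a simple set difference, as in the sketch below; the filter contents shown are illustrative examples, not the actual Filgram and Filponc lists.

```python
# Minimal sketch of Llex = Lvocable \ (Filgram + Filponc).
# The filter contents are illustrative examples, not the real lists.
lvocable = {"le": 2, "risque": 2, ",": 1, "externe": 1, ".": 1}   # word -> frequency
filgram = {"le", "la", "de", "et", "un", "une"}                   # grammatical words
filponc = {",", ".", ";", ":", "(", ")"}                          # punctuation, special chars

def build_llex(lvocable, filgram, filponc):
    return {w: f for w, f in lvocable.items() if w not in filgram | filponc}

print(build_llex(lvocable, filgram, filponc))
# {'risque': 2, 'externe': 1}
```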

5.2 Terms extraction

5.2.1 Extraction of terms and of the tree of terms

The term acquisition method is based on the calculation of repeated segments [13]. The aim is to identify terms, which are nominal syntagms, and to structure them in tree form. To filter the repeated segments we use all the previously described filters. In addition, we use rules for eliminating useless segments (for example, cutting a segment in two). This method of repeated segment calculation is therefore easily transferred to other languages (for example English), since it suffices to use the filters and segment cutting rules of the new language without modifying the processing algorithms [15].
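The sketch below gives a much simplified version of this repeated segment calculation: word sequences that start with a word of Llex, do not cross a filtered word, and occur at least twice are kept. The maximum length, the frequency threshold and the cutting rule are assumptions made for the illustration, not the parameters of the actual tool.

```python
from collections import Counter

# Simplified sketch of repeated-segment extraction: keep word sequences that
# start with a word of Llex, stop at a filtered word, and occur at least twice.
def repeated_segments(tokens, llex, stop_words, max_len=5, min_freq=2):
    counts = Counter()
    for i, word in enumerate(tokens):
        if word not in llex:
            continue
        segment = []
        for token in tokens[i:i + max_len]:
            if token in stop_words:
                break
            segment.append(token)
            if len(segment) >= 2:                 # only multi-word segments
                counts[tuple(segment)] += 1
    return [(seg, freq) for seg, freq in counts.items() if freq >= min_freq]

tokens = ("risque externe nouveaux produits de substitution , "
          "le risque externe est important").split()
print(repeated_segments(tokens,
                        llex={"risque", "externe", "nouveaux",
                              "produits", "substitution"},
                        stop_words={"de", ",", "le", "est"}))
# [(('risque', 'externe'), 2)]
```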


Fig. 5. Visualisation of repeated segments

We summarize these stages in the following points:

▪ Calculation and filtering of repeated segments

▪ Filtering of segments by a specialist, constitution of the composed term list

▪ Structuring of terms in trees and calculation of heads; fusion of the list of term heads and the list of composed terms

▪ Locating new terms

▪ Constitution of the list of terms by including new terms.

5.2.2 Validation of results

The expert cooperates throughout the process: after each stage the expert is invited to validate the terms and the extracted relationships, with the possibility at any time of returning to the text. The expert acts in two stages: the first remains in the domain of lexical processing and exploits the data retained by the previous stage; the second focuses on the semantic interpretation and the structuring of concepts. In the course of this normalization, the mass of data is gradually reduced. An iterative process then allows other terms, not identified by the repeated segment method, to be inferred, since this method cannot detect sequences of terms that are not repeated in the text [14].


Fig. 6. Choice of concepts to retain

5.2.3 Constitution of the domain term list

In the above example describing a tree of terms, the segment "risque externe nouveaux produits de substitution" gives us the term "risque" as head, as well as a set of simple terms (externe, nouveau, produit, substitution) obtained after the process of structuring terms into a tree. To obtain the final term list, the simple terms (term heads, LISTSIMPLE) must be added to the composed terms (obtained segments, LISTCOMP). The result is stored in a list LISTRISK = LISTSIMPLE + simple terms of LISTCOMP. The term lexicon on risks is contained in LISTRISK.
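A minimal sketch of this constitution step is given below; the contents of LISTSIMPLE and LISTCOMP come from the example above, while the small grammatical stop list and the function name are illustrative assumptions.

```python
# Minimal sketch of LISTRISK = LISTSIMPLE + simple terms of LISTCOMP.
# The list contents and the small grammatical stop list are illustrative.
listsimple = ["risque"]                                          # term heads
listcomp = ["risque externe nouveau produit de substitution"]    # composed terms (lemmatised)
GRAMMATICAL = frozenset({"de", "des", "du", "le", "la"})

def build_listrisk(listsimple, listcomp):
    terms = set(listsimple)
    for segment in listcomp:
        # every non-grammatical word of a composed term is a simple term
        terms.update(w for w in segment.split() if w not in GRAMMATICAL)
    return sorted(terms)

print(build_listrisk(listsimple, listcomp))
# ['externe', 'nouveau', 'produit', 'risque', 'substitution']
```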

6. LINKS Module

We use the notion of distribution, which is more stable than the notion of sense, the latter being poorly defined. The distribution of an element is the set of contexts of this element in the corpus. It allows, for example, elements of the text belonging to the same linguistic category to be classified by using the properties of distributional analysis [11]. The goal is to automatically extract the syntactic schemas expressed between two terms. To avoid obtaining associations between terms that introduce noise, the search distance between two terms is automatically limited to 15 words.

We restrict the study to sentences that contain two terms, and distinguish the following patterns (a minimal extraction sketch follows this list):

▪ Term1 verb Term2

▪ Term1 Loc Term2 (Loc: verbal expression)
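The sketch below illustrates how such candidate patterns could be located: for each sentence, pairs of known terms separated by at most 15 words are found, and the words between them (a verb for the first pattern, a verbal expression for the second) are kept as the candidate relationship context. The sentence splitting and the single-word terms are simplifying assumptions.

```python
import re

# Minimal sketch: locate, in each sentence, two known terms separated by at
# most 15 words and keep the words between them as a candidate context.
def candidate_contexts(text, terms, max_distance=15):
    results = []
    for sentence in re.split(r"[.!?]", text):
        words = sentence.split()
        positions = [i for i, w in enumerate(words) if w in terms]
        for left, right in zip(positions, positions[1:]):
            if 0 < right - left <= max_distance:
                results.append((words[left], words[left + 1:right], words[right]))
    return results

text = "Le risque financier implique un risque juridique."
print(candidate_contexts(text, terms={"financier", "juridique"}))
# [('financier', ['implique', 'un', 'risque'], 'juridique')]
```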

6.1 Extraction of relationships of the type Term1 Verb Term2

This step extracts relationship schemas expressed by a verb between two terms. The hypothesis of the acquisition model is that the verb is the pivot of the sentence, expressing a linguistic relationship between two concepts of the domain [9].

The terms (arguments) appearing to the left and to the right of a verb are grouped into classes, represented as lists of terms.

If all the terms of a class have the same number of occurrences in the class, the first term of the list is chosen as the representative of the class (its name); otherwise, the most frequent term of the class is taken as its representative. For example:

Relationship: impliquer

Left class (Financier): financier, produit, personnel

Right class (Juridique): juridique, environnemental, technique

The relationship has the form: Class1 Relationship Class2
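A minimal sketch of this class-naming rule, applied to the example above, could look as follows; the rule (most frequent term, first term on a tie) comes from the text, while the function name is an assumption.

```python
from collections import Counter

# Minimal sketch of the class-naming rule: the class is named after its most
# frequent term; if all terms are equally frequent, the first term is kept.
def class_representative(terms):
    counts = Counter(terms)
    if len(set(counts.values())) == 1:          # all terms equally frequent
        return terms[0]
    return counts.most_common(1)[0][0]

left_class = ["financier", "produit", "personnel"]          # tie -> first term
right_class = ["juridique", "environnemental", "technique"]  # tie -> first term
print(class_representative(left_class), class_representative(right_class))
# financier juridique
```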

6.2 Extraction of relationships of the type Term1 Expression Term2

This step synthesizes the contexts situated between two terms of the same sentence. The goal is to extract relationship schemas expressed by an expression between two terms: expressions situated between two terms of LISTRISK are located, following the form Term1 Expression Term2 [2] (Fig. 7).


Fig. 7. Creation of relations

6.3 Validation of obtained relationships

The validation performed in the preceding stage allowed a list of terms to be validated and grouped hierarchically; this second validation allows them to be grouped into relationships of the type Class1 Argument Class2 (where the argument can be a verb or an expression). This stage simply consists in presenting the discovered relationships to the expert, who may retain, reject, or rename some of them (Fig. 8).


Fig. 8. Validation of relations

7. Conclusion

The environment will provide considerable assistance to the project manager, since it allows a faster and more exhaustive identification of risks. The risk ontology will reduce the terminological and conceptual confusion that exists in this domain and establish a shared comprehension.

This paper presents two main contributions in the domain of text processing and analysis.

Firstly, it confirms that quantitative linguistic methods are simple and easy to implement. Combined with an iterative and incremental process of textual data acquisition, they can be of great interest for the location and the classification of knowledge described in texts. Secondly, the adaptation of such a method to languages other than French is possible and straightforward, since the linguistic resources used do not depend on the availability of dictionaries or syntactic analyzers. The modular approach of the system makes it possible to add other modules and to integrate other sources of knowledge. This work can be completed and extended to different domains and applications; it has been presented as an experimental platform for the domain of project risk management. A possible application is the use of the system for the definition of a lexicon of terms in other domains; it can notably be adapted for other purposes such as thesaurus construction or indexing. Finally, the semiautomatic character of the system, which obliges the user to intervene to validate the results, could be appreciably improved, or even abandoned in favour of fully automatic processing.

References:

[1] Alquier A.M., Salles M., Tignol M.H., Early Project Organization and start-up techniques based on project management information system, Nordnet, Helsinki, 1999.

[2] Barthelemy T., Apprentissage automatique de schémas relationnels à partir de textes, Rapport de DEA. Laboratoire LIIA ENSAIS, Strasbourg, 1995.

[3] Bourigault D., Charlet J., Ontologies et textes Groupe terminologie et intelligence artificielle, IC’ 2000. France

[4] Chapman C., Stephen Ward S., Project Risk Management Process, Techniques and insights, John Wiley & Sons, 1996.

[5] Courtôt H., La gestion des risques dans les projets, Edition Economica, 1998.

[6] Declerck R.P, Emery P., Crener M.A, Le management stratégique des projets, Edition hommes et techniques 1997.

[7] Dieng-Kintz R., Corby O., Gandon F., Giboin A., Méthodes et outils pour la gestion des connaissances une approche pluridisciplinaire du knowledge management, Edition Dunod 2001.

[8] Duncan W.R., A guide to the project Management body of knowledge PMBOK, PMI Standard Committe, Project Management Institute, PA 19082 USA, 2000.

[9] Gardiner D. A. Verb-based relations and conceptual proximity in information retrieval, thesis submitted to the faculty of the graduate school in partial fulfillment of the requirements for the degree of doctor of philosophy, University of Minnesota, 1996.

[10] Giard V., La gestion des projets, Editions Economica, 1993.

[11] Harris Z. S., Language and Information, Columbia University Press, New York, 1988.

[12] Larmand G., La maîtrise des risques dans les contrats de vente, Edition Afnor Gestion, 1993.

[13] Lebart L., Salem A., Statistique textuelle, Edition Dunod. 1994.

[14] Ouamane R., Chalal R., Classification des risques dans les projets, 4e Conférence Francophone de MOdélisation et SIMulation : Organisation et Conduite d’Activités dans l’Industrie et les Services, MOSIM’03, Toulouse, France, 23-25 Avril 2003.

[15] Oueslati R., Apprentissage automatique de connaissance à partir de corpus, Equipe Inférence et Apprentissage, Laboratoire de Recherche en Informatique, Université Paris Sud, 1999.

[16] Sereno B., Corbel C., Girardot J.J., Le projet COSI : recherche d’informations assistée par les concepts, RJCIA2000 - Lyon, France, Septembre 2000.
