Intelligent information mapping and summarization (Pertinence Summarizer)

Connivences.info presents, in simple graphical form, the relationships between actors cited in news feeds covering national and international politics, economics, etc. Every week or every day you can discover, in the form of round tables, the "connivence maps" linking companies, politicians, etc.

These maps are interactive: clicking an actor on a map shows a list of all articles, with summaries, in which that actor is cited; clicking a link between two actors shows a list of all articles in which both actors are cited.
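The two click behaviours described above amount to two lookups: articles by actor, and articles by pair of actors. The following is only a minimal sketch of that idea; the input format, the function names and the sample data are assumptions made for illustration and do not describe how Connivences.info is actually implemented.

```python
from collections import defaultdict
from itertools import combinations


def build_connivence_index(articles):
    """Index articles by actor and by pair of co-cited actors.

    `articles` maps an article id to the actors cited in it
    (hypothetical input format, assumed for this sketch).
    """
    by_actor = defaultdict(set)
    by_link = defaultdict(set)
    for article_id, actors in articles.items():
        cited = sorted(set(actors))  # ignore repeated mentions
        for actor in cited:
            by_actor[actor].add(article_id)
        # Every unordered pair of co-cited actors forms a link on the map.
        for pair in combinations(cited, 2):
            by_link[pair].add(article_id)
    return by_actor, by_link


def articles_for_actor(by_actor, actor):
    """Articles shown when an actor is clicked."""
    return sorted(by_actor.get(actor, ()))


def articles_for_link(by_link, actor_a, actor_b):
    """Articles shown when the link between two actors is clicked."""
    key = tuple(sorted((actor_a, actor_b)))
    return sorted(by_link.get(key, ()))


# Invented sample data, for illustration only.
articles = {
    "art-1": ["Acme Corp", "J. Smith"],
    "art-2": ["J. Smith", "M. Jones"],
    "art-3": ["Acme Corp", "J. Smith", "M. Jones"],
}
by_actor, by_link = build_connivence_index(articles)
print(articles_for_actor(by_actor, "Acme Corp"))
print(articles_for_link(by_link, "J. Smith", "Acme Corp"))
```

Storing each link under a sorted pair means that the order in which the two actors are given does not matter.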
 

DEMO: Pertinence summarizes Google search results

In this DEMO, you can access the integrated summarization function applied to the output of the Google search engine.
Pertinence Summarizer can automatically and dynamically summarize a Google hit list. After launching a query, click the new "Summary" link and enter the login "google" and the password "google".

It has practical features:
- For all the terms of the query, the corresponding occurrences are highlighted and you can navigate between occurrences with the Tab key.
- Pertinence Summarizer can condense the text retrieved by Google to a percentage selected on the orange bar (supported formats: MS Word, PDF, TXT, HTML…)
 
The same function is available for French documents at: http://www.pertinence.net/google/fr

Pertinence’s automatic summarization functionality may be integrated into any other search engine over the Internet or an Intranet.

Active and real-time multilingual news thread with online summarization
ONLINE MULTILINGUAL TEXT SUMMARIZATION OF MULTILINGUAL NEWS

MULTILINGUAL WATCH APPLICATION with AUTOMATIC TEXT SUMMARIZATION Online

Text structuration leading to an automatic summarization system: RAFI (1999)
Abderrafih Lehmam

For 2001-2005, see: www.pertinence.net

Abstract

Automatic text summarization is of interest to the language industries. This work proposes a system that automatically and directly transforms a source text into a reduced target text. The system deals exclusively with scientific and technical texts. It is based on identifying specific expressions that make it possible to evaluate the relevance of the sentence containing them, which can then be selected for inclusion in the summary. The procedure consists of assigning a score to each sentence of the text and then eliminating those with the lowest scores. To produce the RAFI system (Automatic Summary based on Indicative Fragments), we drew on the linguistic tools of discourse analysis and the computing capacity of data processing equipment. The system could be adapted to the Internet.

KeyWords

Computational linguistics, Information extraction, Computer-assisted information retrieval, Scientific and technical discourse, Thesaurus, Automatic text summary, Documentary system, Cognition

Awards:

The RAFI system was the subject of a doctoral thesis defended at the University of Nancy 2 (December 1995). This work received the "Grand Prix de la Recherche 1996" awarded by the Société Industrielle de l'Est, France.

Introduction

The study undertaken in this work concerns the design and construction of an Automatic Text Summary (ATS) system that summarizes Scientific and Technical Texts (STT). According to UNESCO, about 30 000 scientific and technical articles are currently published each year; moreover, we often encounter this type of text when we "browse" the Internet. Summarizing an article is a complicated and drawn-out activity. On average, a "human summarizer" can summarize up to eight articles a day. This output is insufficient for the needs of researchers, who are eager for large amounts of information: despite their quality, "human summaries" remain very limited in number. That is why producing text summaries automatically is both interesting and necessary. Automatic text processing will save researchers time and, in a wider sense, increase the possibilities of research.

The digital texts present on various networks, and particularly on the Internet, constitute a very large amount of textual information. This information is not read frequently, but with a system that condenses or summarizes texts we can make use of it without losing too much time. Named RAFI (Résumé Automatique à Fragments Indicateurs), the system we have designed automatically transforms a long source text into several more condensed versions. Summaries produced by RAFI can then be used by readers without consulting the source text. This summary-generating system could be adapted to and used on the Internet. We present it below.

RAFI offers attractive possibilities for managing and screening scientific and technical information: it makes available to the user a summary that picks the sentences considered important from the original text, with a view to quick access to the information conveyed in the source text. With RAFI it is also possible to supply graded summaries of the same text, so that the user can obtain complete answers matching his expectations without reading the initial text. Installed on the Internet, RAFI would be a "summarizing engine", something currently missing on the worldwide network, because most search engines on the Web work by selecting words in response to a given query. RAFI can process a whole text by picking out the pertinent information.

All of these considerations lead us to believe more and more strongly in the need to intensify efforts to build ATS systems in order to improve on the few results obtained in this domain. Such efforts have been undertaken in the USA by Lehnert (1981), De Jong (1982) and Alterman & Bookman (1990), in England by Paice (1990) and in Germany by Hahn (1990); the corresponding systems process English or German respectively. The RAFI system, Lehmam (1995), processes the French language.

We first describe the RAFI system and then show how we built it, presenting the method that we devised and implemented for its design. It runs on personal computers.

Description of the system

The principle of the RAFI system is based on recognizing elements of the text and selecting the most pertinent ones by comparison with a pre-established knowledge base (thesaurus). The original text items are compared with the thesaurus and a score is attributed to them. Next, the system eliminates the sentences that have not reached the score indicated. The selected items then serve as the basis for building summaries whose lengths vary according to the score chosen.
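The scoring principle described above can be illustrated with a short sketch. This is not RAFI's actual implementation: the thesaurus entries, their weights, the naive sentence splitter and the additive scoring rule are all assumptions made for the example, and English indicator expressions are used for readability although RAFI processes French.

```python
import re

# Hypothetical thesaurus: indicator expression -> weight.
# Entries and values are invented for illustration only.
THESAURUS = {
    "we propose": 3,
    "the results show": 3,
    "in conclusion": 2,
    "this paper": 2,
    "for example": -1,
}


def split_sentences(text):
    """Naive splitter: cut after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def score_sentence(sentence, thesaurus=THESAURUS):
    """Sum the weights of the indicator expressions found in the sentence."""
    lowered = sentence.lower()
    return sum(w for phrase, w in thesaurus.items() if phrase in lowered)


def summarize(text, min_score, thesaurus=THESAURUS):
    """Keep, in original order, the sentences reaching the chosen score."""
    return [
        s for s in split_sentences(text)
        if score_sentence(s, thesaurus) >= min_score
    ]


text = (
    "This paper studies summaries. For example, cats are nice. "
    "We propose a scoring method. The results show it works."
)
for threshold in (2, 3):
    print(threshold, summarize(text, threshold))
```

Raising the threshold yields a shorter summary and lowering it a longer one, which is how a single scoring pass can produce several graded versions of the same text.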

Besides the automatic aspect, which makes it possible to generate a summary rapidly without analysing the original text, the system lets the user parameterize the summary according to the degree of information and linguistic polish required.

Adapted, in its current version, to the processing of scientific and technical texts, the RAFI system has already shown several possible applications, such as assistance to "extractors", assistance to researchers and documentalists in deciding whether to read a text, document management... Other uses can be envisaged, particularly the processing of textual data conveyed on the Internet.

Innovative character

The innovative character of the RAFI system stems from several points:

Traditional systems are based on statistical analysis of the source text. As a result, they rapidly became very large, and they are mainly concerned with micro-domains. Starting from the text's own structure, RAFI makes it possible to reconstitute the author's idea as he expressed it, whatever the scientific or technical theme.

Other systems are founded on content modeling, but they have other disadvantages. They model the text concerned in order to transform it into a knowledge base representing the source text. After this modeling, generating summaries requires laborious work. They are thus restricted to processing specialized texts (such as copyright). These systems produce only one summary. Through its scoring principle, RAFI allows multiple summaries and relies essentially on important contextual indicators; it bypasses the problems of syntactic analysis and so is not restricted to automatic summaries in highly specific domains.

RAFI, which runs on a PC, places no limit on the size of the summary output: the user is free to make the system generate as much information as he wishes, within a range of 0% to 100%.
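One plausible reading of this 0% to 100% range is that the user asks for a given share of the source sentences and the system keeps the best-scored ones. The sketch below illustrates that reading only; the function name, the rounding rule and the tie-breaking by position are assumptions, not a description of RAFI's behaviour.

```python
def select_by_percentage(sentences, scores, percent):
    """Keep roughly `percent` % of the sentences, best scores first.

    The kept sentences are returned in their original order.
    Assumption for this sketch: ties are broken in favour of the
    sentence that appears earlier in the text.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if len(sentences) != len(scores):
        raise ValueError("one score is required per sentence")
    keep = round(len(sentences) * percent / 100)
    # Rank sentence positions by descending score, then by position.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:keep])  # restore source order
    return [sentences[i] for i in chosen]


# Invented sentences and scores, for illustration only.
sentences = ["A.", "B.", "C.", "D."]
scores = [2, -1, 3, 3]
for percent in (0, 50, 75, 100):
    print(percent, select_by_percentage(sentences, scores, percent))
```

At 0% nothing is kept and at 100% the whole text is returned, so the percentage acts as a continuous setting between an empty summary and the full source.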

The innovation in the design of the RAFI system is embodied in software that runs on very widespread systems (PC or Macintosh type), making private use possible and not limiting its distribution.

The possibility of displaying different summaries on the same screen lets the user generate different summary versions freely and in a user-friendly manner, and choose the one that suits him best.

The RAFI system is very fast: it processes a 4275-word text (about 12 pages) in one minute and 10 seconds on just a 66 MHz PC. On faster and more powerful machines (Pentium II, III or IV PCs, for example), processing is faster still.

To sum up, the major innovative characteristics of RAFI are its simplicity of use, its user-friendliness, its speed of operation, its adaptability to the user's needs and the fact that it is a French-language system: the only automatic summary system that gives summaries tailored to the user's needs.

The economic advantage

A growing problem of our so-called "information society" is the lack of tools to assist with the processing and synthesis needed to find relevant information in the flood of information we face. For scientific and technical texts, the RAFI system offers a solution to this problem through its simple processes:

- text scanning,
- data acquisition by direct access (downloading large texts over the Internet, for example),
- summarizing the text (scores) and choosing the required summary type.

In relation to a network such as the Internet, the RAFI system automates the task of summarizing, or at the very least puts at the disposal of the "Internet navigator" an assistance system that can quickly supply a selection of the most important sentences. This enables him to create his own summary without analysing the original text and so to get through a large number of articles dealing with the same subject of study. A "navigator" may also use RAFI to decide whether or not to read a lengthy article. The capacity of this system, its ease of use and its modest equipment requirements all spell great economic advantages. It could be used by companies or individuals who process scientific and technical information, and particularly by regular Internet users once RAFI, which is perfectly compatible with this network, is installed there.

Large companies have documentation centers where this type of text has an important place. A presentation and demonstration took place before industrialists at the Ministry of Higher Education and Research on 13 May 1996, and another at EDF (Clamart site, Paris). Apart from these industries and organizations, another important area is the scientific press (La Recherche, Science et Vie, Pour la Science, Nature...) as well as all the specialized periodicals found in research laboratories. To this we can add the enormous flood of scientific information conveyed by the Internet, which is seldom read because consulting and reading it takes too much time. The possibility of installing RAFI on the Internet as a summary generator is therefore a very interesting one.

The idea of the RAFI system is to enable us to use more of the scientific and technical information written in digital form, simply and efficiently.

Conclusion

Automatic Text Summary (ATS), neglected for a long time, now interests many researchers in computational linguistics. With the rapid progress of computer science and with more and more texts being stored in digital form, it is only natural to consider ways of managing this mass of textual data. In this work, we have created a system that might contribute to the management of technical and scientific texts.

Scientific researchers can use RAFI to decide whether or not to read a more or less voluminous article; industrialists can use the system to select a technical product described in a text of several pages without having to read the whole text. The summaries produced by the system present certain sentences extracted from the source text, which provide sufficient indications about it. The indicative summary produced by the system is also useful for keeping a trace of the processed source text in archives. It can also serve as a basis for professional summary writers in larger documentation centers or in the scientific and technical information market, or for yet other purposes. Other uses can be imagined; in particular, releasing this system on the Internet appears to be readily feasible.

We wish to refine the results produced by RAFI in our future research in the domain of automatic text summary. In particular, we wish to include in the system new rules from the field of linguistics that might help make the summaries more coherent.

The RAFI system has been evaluated by the EDF research department in Paris (EDF/DER/GRETS/L and C). The result of the MLUCE protocol is given below: "RAFI gives quite satisfying results in the context of selecting texts presenting works and their results; it can also supply some indications during bibliographic research."

 

References

(2005, forthcoming) Abderrafih Lehmam, "Le résumé automatique des textes ; aspects linguistiques et computationnels", Editions L'Harmattan, Paris

(2004) Abderrafih Lehmam, Philippe Bouvet "Watch application, summarization and syndication in Arabic", Proceedings of the conference NEMLAR '04 "Arabic Language Resources and Tools Conference", pp. 157-163, 22-23 September 2004, Cairo, Egypt.

(2004) Abderrafih Lehmam , Philippe Bouvet "Un résumeur automatique de textes multilingues intégré dans une plateforme de veille; application à la langue arabe", Actes de la conférence JEP-TALN-RECITAL 2004 : Traitement automatique de la langue arabe écrite et parlée Arabic Language Processing - Text & Speech, pp. 111-122, Fès, Maroc

(2003) A. Lehmam & P. Bouvet, "Pertinence Information Network ; un système d’alerte, en multilingue" Séminaire DocForum Explorez les nouvelles voies de la recherche d'information! journée organisée par l'association DocForum et O. Andrieu (abondance.com) à l'ENS Lettres et Sciences Humaines – Lyon le 20 novembre 2003

(2003) A. Lehmam & P. Bouvet "Pertinence Information Network; Collecte, traitement, diffusion ciblée et exploitation de l'information" 9e Carrefour des Possibles de la FING (Fondation Internet Nouvelle Génération) ; Rendez-vous régulier au service des innovateurs et des utilisateurs des technologies de l'information et de la communication, 25 septembre 2003, Maison de la RATP, Paris

(2003) A. Lehmam & P. Bouvet "Pertinence Information Network : Agent d'alertes en multilingue ; l’alerte par syndication de contenu" WEBPublication - Publication dynamique sur Internet et Intranet, 2003 Paris

(2003) A. Lehmam & P. Bouvet "Pertinence Summarizer : un outil d’aide à la rédaction par la génération de résumés automatiques", Colloque COMTEC 2003, Gestion Documentaire – Archivage, mars 2003, Advancia, CCIP Paris

(2002) A. Lehmam & P. Bouvet, "Résumé de texte automatique : vers des solutions professionnelles", Journée d'Étude de l'Association pour le Traitement Automatique des LAngues (ATALA) "Le résumé de texte automatique : solutions et perspectives" organisée par Jean-Pierre Desclés (LaLICC - FRE 2520 CNRS - Université Paris-Sorbonne), Abderrafih Lehmam (Pertinence Mining, Paris) et Jean-Luc Minel (LaLICC - FRE 2520 CNRS - Université Paris-Sorbonne) ENST, 14 décembre 2002, Paris

(2003) A. Lehmam & P. Bouvet "Résumé automatique multilingue tenant compte de la thématique du texte", Séminaire ATILF (Analyse et Traitement Informatique de la Langue Française) (INaLF CNRS), 6 décembre 2002, Nancy

(2001) A. Lehmam & P. Bouvet "Évaluation, rectification et pertinence du résumé automatique de texte pour une utilisation en réseaux Internet et Intranet " 3ème Colloque du Chapitre français de l’ISKO 5-6 juillet 2001 à l’Université de Paris X "Filtrage et résumé automatique de l’information sur les réseaux" pp. 111-124

A. Lehmam (2000)"Résumé de texte automatique : des solutions opérationnelles", La Tribune des Industries de la Langue, de l'Information Électronique et du Multimédia, Janv-Juin , pp.50-58, OFIL, Paris.

A. Lehmam (1999) " Text structuration leading to an automatic summary system ", Information Processing & Management, 35, pp. 181-191, Elsevier Science Ltd, NJ, New York, USA

A. Lehmam (1998) Rapport linguistique du Projet RAFI; 103 pages

A. Lehmam (1998) Rapport informatique du Projet RAFI; 105 pages

A. Lehmam (1997b) Automatic summarization on the Web? RAFI: A system for summarizing using indicating fragments , 5ème Conference RIAO '97 Recherche d'Information Assistée par Ordinateur sur Internet, Université McGill, Montréal, Québec H3A 2T7, Canada, 25-26-27 Juin 1997, pp. 112-124.

A. Lehmam (1997a) Une structuration de texte conduisant à la construction d'un système de résumé automatique , 1ères JST FRANCIL '97 Journées Scientifiques et Techniques du Réseau Francophone de l'Ingénierie de la Langue de l'Aupelf-Uref. L'Ingénierie de la Langue : de la Recherche au Produit, Avignon, France, pp. 122-130, 15-16 avril

A. Lehmam (1997) "Le Résumé Electronique : l'expérience de RAFI", Echos n°5, pp. 12-24

A. Lehmam (1996c) Le résumé automatique à fragments indicateurs : un système d'aide au résumé humain, Actes du Colloque " Informatique et Langue Naturelle" I.N.L.'96, 9-10 octobre, Nantes, pp.355-373.

A. Lehmam (1996b) Construction d'un système de résumé automatique de textes de type scientifique et technique, Actes du Colloque "Rencontre des étudiants chercheurs en informatique pour le Traitement Automatique des Langues" RéciTAL'96, les 25, 26, 27 septembre 1996, Courcelles (Gif-sur-Yvette), Paris, pp.65-69.

A. Lehmam (1996a) Le système RAFI, Rapport de la journée "filtrage et résumé automatique de textes" organisée à Paris, le 13 mai 96, Secrétariat d'Etat à la Recherche : MENESR, Paris

A. Lehmam (1996) "Le résumé de texte automatique : des ambitions aux résultats actuels : le système RAFI", La Tribune des Industries de la Langue et de l'Information Électronique, Numéro Spécial: 20-21-22, pp.35-45, Juillet 1996, OFIL, Paris.

A. Lehmam (1995) Le résumé automatique de texte: réalisation d'un prototype procédant par extraction de phrases du texte source. Actes de la Première Rencontre des Jeunes Linguistes de France (Université du Littoral - Nord-Pas-de-Calais). Recueil de Recherches Linguistiques, Dunkerque, pp. 79-89.

 
