ICA – Intelligent Content Analysis

Automatic text analysis that turns unformatted data into structured data.

Melingo’s Intelligent Content Analysis (ICA) is an advanced system developed by Melingo, built on algorithm-based text analysis and entity extraction tools.  Given a text in Hebrew, Arabic or Persian, the system produces two outputs:

Complete analysis of the text – the system takes the input text and outputs an analysis of each word: its root, part of speech, prefix, tense, the word combination it belongs to, etc.

Textual entities found in the text – the system extracts the main entities appearing in the input text and sorts them into categories such as names, places, organizations and addresses, as well as non-verbal strings such as telephone numbers, car license numbers, credit card numbers, email addresses, URLs, etc.

The extracted entities are collected in a synopsis, where they are listed by type, subtype and number of appearances.
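To illustrate what a synopsis listed by type, subtype and number of appearances might look like, here is a minimal sketch in Python.  It is not Melingo's API: the `Entity` record, the `build_synopsis` function and the category names are all hypothetical, chosen only to show the grouping and counting idea.

```python
from collections import Counter
from typing import NamedTuple


class Entity(NamedTuple):
    """Hypothetical record for one extracted entity (not an ICA type)."""
    text: str
    type: str
    subtype: str


def build_synopsis(entities):
    """Count repeated entities and list them by type, then subtype.

    Within a subtype, the most frequent entities come first.
    Returns rows of (type, subtype, text, appearances).
    """
    counts = Counter(entities)
    rows = [(e.type, e.subtype, e.text, n) for e, n in counts.items()]
    return sorted(rows, key=lambda r: (r[0], r[1], -r[3], r[2]))


if __name__ == "__main__":
    found = [
        Entity("Haifa", "place", "city"),
        Entity("Melingo", "organization", "company"),
        Entity("Haifa", "place", "city"),
    ]
    for row in build_synopsis(found):
        print(row)
```

In this sketch an entity that appears twice is reported once, with an appearance count of 2.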

ICA runs under Windows and exposes an open interface for .NET, C++ and Java.  In practice it is an API: it lets the user make broad and flexible use of its output, and it is easily integrated into existing software.

System add-ons

UDK – User Defined Keyword

The UDK component is an add-on that lets the customer add and enrich organizational categories for entity extraction, in the form of a category dictionary that the customer adapts and manages.  With it, words or names can be assigned to new categories or added to existing ones.

For example, the user can define the word ‘lily’ as an entity in the category ‘weapons’, or as an entity in a new category defined according to the user's needs, for instance ‘flowers’ or ‘plants’.
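The idea of a customer-managed category dictionary can be sketched as follows.  This is an illustration only and does not reflect the actual UDK interface: the `CategoryDictionary` class and its methods are hypothetical names, and matching here is a simple case-insensitive lookup rather than ICA's morphological analysis.

```python
class CategoryDictionary:
    """Hypothetical user-managed mapping from categories to terms."""

    def __init__(self, built_in=None):
        # Start from the built-in categories, if any were given.
        self._categories = {
            name: {term.lower() for term in terms}
            for name, terms in (built_in or {}).items()
        }

    def add_term(self, term, category):
        """Add a term to an existing category, or create the category."""
        self._categories.setdefault(category, set()).add(term.lower())

    def categories_of(self, term):
        """Return every category the term belongs to, sorted by name."""
        key = term.lower()
        return sorted(c for c, terms in self._categories.items() if key in terms)


if __name__ == "__main__":
    udk = CategoryDictionary({"weapons": ["rifle"]})
    udk.add_term("lily", "weapons")   # extend an existing category
    udk.add_term("lily", "flowers")   # define a new category
    print(udk.categories_of("lily"))
```

The same word can belong to more than one category, which is why the lookup returns a list.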

Organizational lexicon

The organizational lexicon is another add-on to ICA that lets the customer influence the analysis results directly.  Where a word has several possible analyses (homonyms), the customer can score them, giving the highest score to the preferred reading.
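One way to picture score-based homonym selection is the short sketch below.  It is an assumption about how such scoring could work, not a description of ICA's internals: the `pick_analysis` function, the candidate format and the score values are all hypothetical.

```python
def pick_analysis(candidates, lexicon_scores, default_score=0.0):
    """Return the candidate analysis with the highest customer score.

    candidates      -- list of dicts with "surface" and "reading" keys
    lexicon_scores  -- dict mapping (surface, reading) to a score
    Readings missing from the lexicon get default_score; on a tie the
    earliest candidate in the list is returned.
    """
    return max(
        candidates,
        key=lambda c: lexicon_scores.get((c["surface"], c["reading"]), default_score),
    )


if __name__ == "__main__":
    candidates = [
        {"surface": "barak", "reading": "noun"},
        {"surface": "barak", "reading": "proper_name"},
    ]
    # The customer prefers the proper-name reading in their content.
    scores = {("barak", "proper_name"): 2.0}
    print(pick_analysis(candidates, scores))
```

With no customer scores at all, this sketch simply keeps the first candidate, so the default analysis order is preserved.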

Please contact us for a trial version or with any other question.

Characteristics of Melingo’s ICA

Recognizes entities from a wide world of concepts

ICA can identify central entities from many built-in categories, with no need for manual definitions.  These include names of countries, cities, people and organizations, medical terms, weapons and more.

Use of morphology

Concepts are identified in the text even when they appear in different conjugations, spellings and forms, so that the central concepts of the text are recognized according to their context.

Overcoming multiple meanings

The system performs a precise analysis of the text and resolves words with multiple meanings.  For example, the noun ‘barak’ (Hebrew for ‘lightning’) is identified and analyzed differently from the name ‘Barak’.

Concept personalization

ICA can be personalized and adapted to the customer's needs and content domain, giving preference to concepts from that domain.  The customer can also define new concepts to be identified as needed.

Support for many programming languages

The system works as an API with .NET, Java and C++ wrappers, so it is easy to integrate with systems written in these languages.

Implemented in large systems

ICA is currently in successful use within large-scale systems.

Possible uses for ICA

Analysis and comprehension of texts

Automatic extraction of keywords from the entire document

Cataloguing and labeling of documents

Identification of business opportunities - identification of texts dealing with a particular product

Summarizing documents

Formatting unformatted data

Integration of ICA in the search/indexing process

Example of Melingo’s ICA function

The following example shows the entity extraction capability of Melingo’s ICA.  An article was analyzed: the textual entities it contains were highlighted, their appearances were counted, and they were grouped into categories by subject.

[Image: ICA sample]

The following table shows an example of the text’s analysis.  The column on the left lists individual words from the article, and the other columns show the analysis of each word: its part of speech, base form, etc.

[Image: ICA sample, tokens table]
