ICA – Intelligent Content Analysis
Automatic text analysis and formatting of unformatted data.
Melingo’s Intelligent Content Analysis (ICA) is an advanced system developed by Melingo using algorithm-based text analysis and entity extraction tools. Given text in Hebrew, Arabic or Persian, the system produces two outputs:
Complete analysis of the text – for each word of the input text, the system outputs its root, part of speech, membership in a word combination, prefix, tense, etc.
Textual entities found in the text – the system extracts the main entities appearing in the input text and classifies them into categories such as names, places, organizations and addresses, as well as non-verbal strings such as telephone numbers, car license numbers, credit card numbers, email addresses, URLs, etc.
The extracted entities are compiled into a synopsis that lists them by type, subtype and number of occurrences.
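To make the idea of non-verbal strings and the synopsis concrete, here is a minimal Python sketch. It is not Melingo’s implementation or API: the regular expressions, the `(type, subtype)` labels and the function names `extract_entities` and `synopsis` are hypothetical, and the patterns are deliberately simplified (the phone pattern assumes an Israeli-style `0X-XXXXXXX` format). It finds a few kinds of non-verbal strings and then counts how often each distinct entity appears, most frequent first.

```python
import re
from collections import Counter

# Hypothetical patterns for a few non-verbal string subtypes. These are
# simplified illustrations, not Melingo's rules: they miss many valid
# formats and may accept some invalid ones.
PATTERNS = {
    ("non-verbal", "email"): re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    ("non-verbal", "url"): re.compile(r"https?://[^\s,]+"),
    ("non-verbal", "phone"): re.compile(r"\b0\d{1,2}-\d{7}\b"),
}


def extract_entities(text):
    """Return a list of (type, subtype, value) tuples found in text."""
    found = []
    for (etype, subtype), pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((etype, subtype, match.group()))
    return found


def synopsis(entities):
    """Count occurrences of each distinct (type, subtype, value) entity,
    sorted by descending count, then by the entity tuple itself."""
    counts = Counter(entities)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Regular expressions only cover the non-verbal strings; recognizing names, places and organizations depends on the linguistic analysis described above.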
ICA is available as an open interface on Windows for .NET, C++ and Java. It is an API, so users can make broad and flexible use of its output, and it can easily be combined with existing software.
UDK – User Defined Keyword
The UDK component is an add-on that lets the customer add and enrich organizational categories for entity extraction, in the form of a category dictionary adapted and managed by the customer. With it, words or names can be assigned to new categories or added to existing ones.
For example, the user can define the word ‘lily’ as an entity in the category ‘weapons’, or as an entity in a new category defined according to their needs, such as ‘flowers’ or ‘plants’.
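A customer-managed category dictionary of this kind can be modeled as a mapping from each term to the set of categories assigned to it. The sketch below assumes a hypothetical `UserDictionary` class (not part of any documented ICA API) and shows ‘lily’ placed in both an existing category and a new one.

```python
from collections import defaultdict


class UserDictionary:
    """Hypothetical customer-managed category dictionary (not Melingo's API).

    Maps each term to the set of categories the customer assigned it to,
    so one word can belong to an existing category and to a new one.
    """

    def __init__(self):
        self._categories = defaultdict(set)  # term -> {category, ...}

    def add(self, term, category):
        """Assign term to category; a new category is created implicitly."""
        self._categories[term.lower()].add(category)

    def categories_of(self, term):
        """Return the term's categories in sorted order (empty if unknown)."""
        return sorted(self._categories.get(term.lower(), set()))

    def tag(self, tokens):
        """Return (token, categories) pairs for tokens in the dictionary."""
        return [(t, self.categories_of(t))
                for t in tokens if t.lower() in self._categories]


# 'lily' in an existing category and in a customer-defined one.
udk = UserDictionary()
udk.add("lily", "weapons")
udk.add("lily", "flowers")
```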
The organizational lexicon component is another ICA add-on, which lets the customer directly influence the resulting analyses. For homonyms, the customer can score the possible analyses, giving a higher score to the desired one.
Please contact us for a trial version or with any other questions.
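One simple way to picture lexicon scoring is sketched below, assuming the analyzer proposes several candidate analyses with base scores and the customer’s lexicon supplies override scores for specific (word, analysis) pairs. The function `choose_analysis`, the score values and the override rule are all hypothetical. The example uses the unvocalized Hebrew word ספר, which can be read as ‘book’, ‘counted’ or ‘barber’, among other readings.

```python
def choose_analysis(word, candidates, lexicon_scores):
    """Pick one analysis for an ambiguous word (hypothetical scoring rule).

    candidates: list of (analysis, base_score) proposed by the analyzer.
    lexicon_scores: customer lexicon mapping (word, analysis) to a score
    that replaces the base score when present.
    Returns the highest-scoring analysis; ties keep the analyzer's order.
    """
    best, best_score = None, float("-inf")
    for analysis, base in candidates:
        score = lexicon_scores.get((word, analysis), base)
        if score > best_score:
            best, best_score = analysis, score
    return best


# Illustrative candidates for the unvocalized word ספר; scores are made up.
candidates = [("noun: book", 0.6), ("verb: counted", 0.3), ("noun: barber", 0.1)]
default_choice = choose_analysis("ספר", candidates, {})
custom_choice = choose_analysis("ספר", candidates, {("ספר", "noun: barber"): 0.9})
```

Without lexicon entries the analyzer’s preferred reading wins; a customer whose texts are about barbershops could raise the ‘barber’ reading so it is chosen instead.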
Characteristics of Melingo’s ICA
Recognizes entities from a wide range of concepts
Use of morphology
Overcoming multiple meanings
Support for many programming languages
Implemented in large systems
The system is currently used successfully in large systems.
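The morphology and multiple-meanings points can be illustrated with Hebrew prefixes: single letters such as ו (‘and’), ב (‘in’) and ל (‘to’) attach to the following word, so one written form may split in more than one way. For example, בצל can be read as the noun ‘onion’ or as ב + צל, ‘in (the) shade’. The sketch below enumerates prefix/stem splits against a small lexicon; the function name, the lexicon and the `max_prefixes` limit are assumptions, and it ignores prefix-ordering rules and the definite article ה that is dropped after ב, כ and ל, which a real analyzer would handle.

```python
# Hebrew single-letter prefixes: ו "and", ה "the", ב "in",
# כ "as", ל "to", מ "from", ש "that".
PREFIX_LETTERS = set("והבכלמש")


def segmentations(word, lexicon, max_prefixes=3):
    """Return every (prefix, stem) split where the prefix consists only of
    prefix letters and the stem is a known lexicon entry.

    The stem is always at least one letter long. Several results mean the
    written form is ambiguous and needs disambiguation.
    """
    results = []
    for i in range(min(max_prefixes, len(word) - 1) + 1):
        prefix, stem = word[:i], word[i:]
        if all(ch in PREFIX_LETTERS for ch in prefix) and stem in lexicon:
            results.append((prefix, stem))
    return results


# Tiny illustrative lexicon: onion, shade, house.
LEXICON = {"בצל", "צל", "בית"}
```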
Possible uses for ICA
Analysis and comprehension of texts
Automatic extraction of keywords from the entire document
Cataloguing and labeling of documents
Identification of business opportunities – identifying texts that deal with a particular product
Formatting unformatted data
Integration of ICA in the search/indexing process
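For the search/indexing use, one common approach is to index documents by base form rather than surface form, so a query for one inflection also matches documents containing others. The sketch assumes a `base_form` function standing in for the analyzer’s output; here it is a hypothetical lookup table (`TOY_BASE_FORMS`), not ICA itself.

```python
from collections import defaultdict


def build_index(docs, base_form):
    """Build an inverted index: base form -> ids of documents containing
    any inflection of it.

    docs: mapping of doc_id -> list of tokens.
    base_form: function returning a token's base form (a stand-in for
    whatever the morphological analyzer returns).
    """
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[base_form(token)].add(doc_id)
    return index


def search(index, query_token, base_form):
    """Return sorted ids of documents matching the query's base form."""
    return sorted(index.get(base_form(query_token), set()))


# Hypothetical lookup table: books, the book, in books -> book.
TOY_BASE_FORMS = {"ספרים": "ספר", "הספר": "ספר", "בספרים": "ספר"}


def base(token):
    """Toy analyzer: unknown tokens are treated as their own base form."""
    return TOY_BASE_FORMS.get(token, token)


docs = {1: ["ספרים", "חדשים"], 2: ["קראתי", "את", "הספר"], 3: ["עיתון"]}
idx = build_index(docs, base)
```

A query for בספרים (‘in books’) then finds both documents, although neither contains that exact form.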
Example of Melingo’s ICA functionality
The following example demonstrates the entity extraction capability of Melingo’s ICA. An article was analyzed, the textual entities it contains were highlighted, their occurrences were counted, and they were grouped into categories by subject.
The following table shows an example of the text’s analysis. The column on the left lists individual words from the article, and the other columns show the analysis of each word: its part of speech, base form, etc.
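A per-word analysis of the kind described above can be held in a simple record and rendered as a table. The `WordAnalysis` field names and the two sample rows are hand-written illustrations, not ICA output: ובבית (‘and in the house’) is split into the prefix וב and the base form בית, and כתבו is read here as the past-tense verb ‘they wrote’.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WordAnalysis:
    """One row of a per-word analysis table (field names are assumptions)."""
    word: str
    part_of_speech: str
    base_form: str
    prefix: str = ""
    tense: Optional[str] = None


def render_table(rows):
    """Render analyses as a plain-text table, one word per line;
    empty fields are shown as '-'."""
    header = ("Word", "POS", "Base form", "Prefix", "Tense")
    lines = [" | ".join(header)]
    for r in rows:
        lines.append(" | ".join((r.word, r.part_of_speech, r.base_form,
                                 r.prefix or "-", r.tense or "-")))
    return "\n".join(lines)


rows = [
    WordAnalysis("ובבית", "noun", "בית", prefix="וב"),
    WordAnalysis("כתבו", "verb", "כתב", tense="past"),
]
print(render_table(rows))
```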
Among our customers