In mid-February 2008 empolis provided CES, a Polish think-tank for Eastern analyses, with a unique knowledge-based search tool that gives intelligent access to around 4 million documents. The solution is based on the empolis:Information Access Suite (e:IAS).
The main design goal was to let analysts retrieve the information they need from a multilingual base of several million documents within seconds. The system was also required to provide coherent access to different data sources and to apply innovative language and knowledge tools to data processing.
The resulting system fulfilled all customer requirements and went beyond them. It is accessed from a Web browser and offers a personalized portlet-based GUI and access permissions. Documents can be entered manually, as analyses are prepared by CES employees, or automatically, by means of an integrated import from Polish, Russian and Ukrainian sources. Documents are indexed according to the knowledge model, which takes into account specific document types and additional custom characteristics of the data, e.g. multilingual synonyms or the transcription and transliteration of Russian names: searching for "Medwedew" will retrieve documents containing "Medvedev". Uniquely, the knowledge model is enhanced directly by changes in the document base, e.g. with synonyms contained in newly uploaded biographical entries.
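The name-matching behaviour described above can be illustrated with a minimal sketch. It does not show how e:IAS works internally; it only assumes that spelling variants are reduced to a shared key at both indexing and query time. The rule table, the tiny Cyrillic map, and the function names translit_key, build_index and search are all hypothetical and cover just enough letters for the example.

```python
import re

# Hypothetical, deliberately tiny rule set: collapse Latin spelling variants
# that different transliteration schemes produce for the same Cyrillic letter.
VARIANT_RULES = [
    ("w", "v"),  # German/Polish "w" vs. English "v" for Cyrillic "в"
    ("j", "y"),  # "j" vs. "y" for Cyrillic "й"
]

# Hypothetical Cyrillic-to-Latin map, limited to the letters used below.
CYRILLIC = {"м": "m", "е": "e", "д": "d", "в": "v"}


def translit_key(name: str) -> str:
    """Reduce a name to a canonical key shared by its spelling variants."""
    key = name.lower()
    key = "".join(CYRILLIC.get(ch, ch) for ch in key)
    for variant, canonical in VARIANT_RULES:
        key = key.replace(variant, canonical)
    return key


def build_index(docs: dict) -> dict:
    """Map each canonical token key to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text):
            index.setdefault(translit_key(token), set()).add(doc_id)
    return index


def search(index: dict, query: str) -> list:
    """Return sorted ids of documents whose tokens share the query's key."""
    return sorted(index.get(translit_key(query), set()))
```

With this sketch, a document containing "Medvedev" or "Медведев" is found by the query "Medwedew", because all three spellings reduce to the same key. A production system would need a far richer rule set and would have to avoid collapsing unrelated ordinary words.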
The search process is supported by numerous language tools, including stemmers for Polish, English, Russian and German. In addition to free-text queries in a "Google-like" syntax, the search can be fine-tuned using metadata forms. Retrieved results can be further filtered by related regions, languages, dates and several other document properties. To allow documents to be viewed regardless of their original format, the system integrates an automated converter from around 200 popular formats to HTML.
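As an illustration of the result filtering described above, the following sketch narrows a list of hits by region, language and date range. It is not the e:IAS API: the Hit record, its field names and the filter_hits function are assumptions made for this example only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hit:
    """Hypothetical search hit carrying a few metadata properties."""
    title: str
    region: str
    language: str
    published: date


def filter_hits(hits, region=None, language=None, date_from=None, date_to=None):
    """Keep only hits matching every filter that is set (None means 'any')."""
    result = []
    for hit in hits:
        if region is not None and hit.region != region:
            continue
        if language is not None and hit.language != language:
            continue
        if date_from is not None and hit.published < date_from:
            continue
        if date_to is not None and hit.published > date_to:
            continue
        result.append(hit)
    return result
```

Each filter is optional, so the same function serves a user who narrows results by one property as well as one who combines several.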
The main benefit of the system is that it integrates different sources of information into a single repository equipped with advanced processing capabilities. The previously used lists of derived word forms and asterisk-based searches were replaced by precise, multilingual stemming, and previously unsystematized geographical data are now organized as taxonomy filters. The older search system, implemented in the mid-1990s, was completely replaced by the empolis solution.









