The RASP system includes state-of-the-art modules for finding sentence boundaries, finding individual words, analyzing words to identify the word root and any suffixes, assigning part-of-speech labels to words in running text, and analyzing the grammatical relations between words and larger units within sentences.
Text analyzed with RASP can be combined with other open source machine learning classifiers, using either supervised or semi-supervised learning techniques, to classify text by topic, sentiment, genre, reference to specific entities, the strength of specific assertions, or many other facets. The resulting annotated text collections can be indexed using open source search engines at the document, sentence or word level to provide flexible, intuitive, interactive access to text snippets and passages, or to automatically create structured databases from text.
To date, RASP has been applied to billions of words of English text drawn from genres as diverse as biomedical scientific papers and second language learners’ examination scripts.
You are interested in finding nuggets of information like:
‘Google acquired YouTube’.
This information can be expressed in a wide variety of ways in written form.
Therefore, to extract this information from a large and/or complex dataset, your application will need to do more than keyword search: it will need natural language processing, which named entity recognisers and RASP can provide.
Your application should be able to find text that matches your query, such as:
The NYT today announced Google’s acquisition of video hosting service, YouTube
RASP output for this sentence:
Use RASP to recover sentences that have similar or identical meaning to your query.
We can do this by parsing text with RASP to output grammatical relations (GRs): sentences with similar or identical GRs are likely to have similar meaning, so matching on GRs finds text that expresses the same information as the query.
Thus RASP helps discover relevant text with high precision, going well beyond the capabilities of keyword search.
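The idea of matching on grammatical relations can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the relation triples are hand-written and simplified rather than actual RASP output, the nominal ‘acquisition’ is assumed to have already been normalised to the verb ‘acquire’ by some earlier step, and the names QUERY, CANDIDATES, gr_coverage and rank are hypothetical. It scores each candidate sentence by the fraction of the query's relations that it contains.

```python
from typing import Dict, FrozenSet, List, Tuple

# A grammatical relation as (relation type, head lemma, dependent lemma).
# This simplified triple format is an assumption, not RASP's output format.
GR = Tuple[str, str, str]

# Hand-written relations for the query 'Google acquired YouTube'.
QUERY: FrozenSet[GR] = frozenset({
    ("subj", "acquire", "Google"),
    ("dobj", "acquire", "YouTube"),
})

# Hand-written relations for two candidate sentences. Both contain the same
# keywords as the query, but only the first expresses the same relations.
CANDIDATES: Dict[str, FrozenSet[GR]] = {
    "The NYT today announced Google's acquisition of video hosting service, YouTube": frozenset({
        ("subj", "announce", "NYT"),
        ("dobj", "announce", "acquire"),
        ("subj", "acquire", "Google"),
        ("dobj", "acquire", "YouTube"),
    }),
    "YouTube acquired Google": frozenset({
        ("subj", "acquire", "YouTube"),
        ("dobj", "acquire", "Google"),
    }),
}


def gr_coverage(query: FrozenSet[GR], candidate: FrozenSet[GR]) -> float:
    """Return the fraction of the query's relations found in the candidate."""
    if not query:
        return 0.0
    return len(query & candidate) / len(query)


def rank(query: FrozenSet[GR],
         candidates: Dict[str, FrozenSet[GR]]) -> List[Tuple[float, str]]:
    """Return (score, sentence) pairs ordered from best to worst match."""
    scored = [(gr_coverage(query, grs), text) for text, grs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for score, text in rank(QUERY, CANDIDATES):
        print(f"{score:.2f}  {text}")
```

A keyword search cannot tell these two candidates apart, because both mention Google, YouTube and a form of ‘acquire’. The relation-based score separates them because the subject and object roles are reversed in the second sentence.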
RASP has been integrated into many academic applications, such as question answering, text anonymization, text summarization, information extraction, sentiment classification, machine translation, and database creation. Papers describing many of these applications can be found by searching for ‘RASP parsing’ in Google Scholar.
RASP is freely available to download for academic (non-commercial) applications under the GNU Lesser General Public License. RASP’s open-source release includes its five main modules (sentence splitter, tokenizer, tagger, morphological analyser and parser). The most recent version is 3.1, released in September 2012.
We provide commercial licences for the full RASP system and its individual components. Further, we provide consultancy and proof-of-concept prototype software development services using RASP and other, mostly open source, text processing and machine learning tools, including the TAP classifier and search engine.
We are willing to provide training in the use and customisation of RASP and are actively seeking technology partners to collaborate with.
The commercial version of RASP includes auxiliary tools, subcategorisation extraction software, XML functionality and Unicode handling, among other features, and provides integration with TAP and our search engine.
iLexIR also offers tight integration of the toolkit with its own timed aggregate perceptron (TAP) classifier, an innovative machine learning classifier with accuracy comparable to support vector machines but training time closer to that of a naive Bayes classifier.