Our team actively develops and owns its own Search Toolkit.

We are developing a scientific text mining system capable of performing complex information extraction over large datasets.

RASP-based Pipeline

We convert PDF papers to SciXML, parse these files with RASP, and then index the RASP-based XML with Solr/Lucene. This pipeline builds on our previous work within the FlySlip project.
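As a rough illustration of the indexing step, the sketch below flattens the grammatical relations of one parsed sentence into a document that could be serialised as JSON for Solr. Everything specific here is an assumption made for illustration: the field names (`id`, `sentence`, `gr`), the `(type, head, dependent)` tuple layout, and the helper names `build_solr_doc` and `to_update_payload` are hypothetical and do not describe the toolkit's actual schema. RASP's real output is XML, which is not parsed here.

```python
import json


def build_solr_doc(doc_id, sentence, relations):
    """Flatten (type, head, dependent) relations into one indexable document.

    The field names and the "type:head:dependent" encoding are hypothetical.
    """
    return {
        "id": doc_id,
        "sentence": sentence,
        # One string per relation, e.g. "ncsubj:express:gene".
        "gr": [f"{rel}:{head}:{dep}" for rel, head, dep in relations],
    }


def to_update_payload(docs):
    """Serialise a list of documents as a JSON array."""
    return json.dumps(docs)


# Hand-written relations standing in for parser output.
example = build_solr_doc(
    "paper1-s1",
    "The gene is expressed in the embryo.",
    [("ncsubj", "express", "gene"), ("iobj", "express", "in")],
)
payload = to_update_payload([example])
```

Storing each relation as a single string keeps the index schema flat, at the cost of needing a consistent encoding at query time. A real deployment would send `payload` to a Solr update handler over HTTP, which is omitted here.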

Simple Web-based Interface

We are actively developing a web-based interface that lets users perform complex information extraction.

Intuitive Search

Users will be able to build complex linguistic searches in an intuitive manner. That is, they will not need to understand the underlying linguistic structure of the searches the system helps them construct.
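One way to picture this is a small translation layer that turns simple form inputs into a structured query string. The sketch below is only an assumption about how such a layer might look: the function `build_query`, the index fields `gr` and `sentence`, and the relation labels `ncsubj` and `dobj` used as query tokens are hypothetical, chosen for illustration rather than taken from the toolkit.

```python
def build_query(verb, subject=None, obj=None):
    """Translate simple form inputs into a Lucene-style query string.

    The user supplies plain words; the relation encoding stays hidden.
    Field names and the "type:head:dependent" encoding are hypothetical.
    """
    clauses = []
    if subject:
        clauses.append(f'gr:"ncsubj:{verb}:{subject}"')
    if obj:
        clauses.append(f'gr:"dobj:{verb}:{obj}"')
    if not clauses:
        # No arguments given: fall back to a plain keyword match.
        clauses.append(f"sentence:{verb}")
    return " AND ".join(clauses)


simple = build_query("express")
with_subject = build_query("express", subject="gene")
full = build_query("express", subject="gene", obj="protein")
```

Here a user who enters "gene" as the subject and "express" as the verb never sees the relation syntax; the function assembles it on their behalf.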

Similar Image Searches

We have used the open-source LIRE project to integrate similar-image search into our generic search toolkit.

For a more detailed description of our technology, see this paper from the FlySlip project (an ongoing collaboration): (PDF).

Note: this paper shows screenshots of an older version of our prototype.

View videos documenting the use of this earlier prototype: (short), (long).