Monday, 31 January 2011

Search Based Applications and Concept Search


I have not written about search for a while, as my current role is less involved in search than my previous one, but then a couple of things came along at the same time.


Last week my colleague Martin White, of Intranet Focus, who is a true enterprise search guru, wrote about Search-based Applications. Martin's post was prompted by his review of a new book on Search-based Applications (SBA), but he also briefly mentions some of the latest and greatest in search technology trends, and I am sure he will not mind me quoting him verbatim to get the point across:
"For example, coming to a desktop near you before too long will be search tools based on topic modelling, which use Bayesian statistics and machine learning to infer the relationship between topics in a document.  What is fascinating about this technique is that it dates back to the development of latent semantic indexing in the late 1980s, which was then refined into probabilistic latent semantic indexing a decade later.  Now the buzz is about Latent Dirichlet Allocation (LDA), which itself has formed the basis for correlated topic models (CTM) and dynamic topic modelling (DTM)."
Not that most of us ever thought otherwise, but this is perhaps a timely reminder that there is more to intranet and enterprise search than the technology built into SharePoint!

Hot on the heels of Martin's post, another element of search has been brought to my attention. Today Clearwell Systems announced that they have extended their "transparent" search capabilities to concept search, giving us Transparent Concept Search.

If you're not totally clued up on the idea of concept search (how could you not be?), here is the introductory paragraph of the Wikipedia entry by way of brief explanation:

"A concept search (or conceptual search) is an automated information retrieval method that is used to search electronically stored unstructured text (for example, digital archives, email, scientific literature, etc.) for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query."

The emphasis on ideas is mine. Concept search is obviously applicable to the general ECM landscape (although Clearwell's focus is on e-Discovery). Now, I used to be one of the cynics who thought that just slapping (a good) search engine on top of your content was a recipe for disaster. I was (and mostly still am) of the opinion that metadata and information architecture are massively important, BUT could we at last be reaching the point where search engine performance lives up to the vendor rhetoric?
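To make "conceptually similar" a little more concrete, here is a toy sketch of concept search built on latent semantic indexing (LSI), the late-1980s technique Martin mentions above. It is my own illustration in plain Python (standard library only), not a description of how Clearwell or any other vendor actually implements concept search. The names `ConceptIndex` and `top_eigenpairs`, the five sample documents and the choice of k = 2 concepts are all hypothetical; a real engine would use a proper sparse SVD library, term weighting and a vastly larger corpus.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace (no stemming, no stop words).
    return text.lower().split()


def build_matrix(docs):
    """Term-by-document count matrix A, where A[t][j] counts term t in document j."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    a = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w, c in Counter(tokenize(d)).items():
            a[index[w]][j] = float(c)
    return vocab, index, a


def top_eigenpairs(m, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(m)
    m = [row[:] for row in m]  # work on a copy; deflation modifies it
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T M v gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break  # no further non-trivial concepts
        pairs.append((lam, v))
        # Deflate: subtract this component so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def cosine(x, y):
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(x, y)) / (nx * ny)


class ConceptIndex:
    """Toy LSI index: documents and queries are compared in a k-dimensional concept space."""

    def __init__(self, docs, k=2):
        self.vocab, self.index, a = build_matrix(docs)
        n_terms, n_docs = len(a), len(docs)
        # A^T A: its eigenvectors are the right singular vectors v_i of A.
        ata = [[sum(a[t][i] * a[t][j] for t in range(n_terms)) for j in range(n_docs)]
               for i in range(n_docs)]
        self.u = []                                   # left singular vectors u_i = A v_i / s_i
        self.doc_vecs = [[] for _ in range(n_docs)]   # rows of V_k S_k
        for lam, v in top_eigenpairs(ata, k):
            s = math.sqrt(lam)                        # singular value s_i = sqrt(lambda_i)
            self.u.append([sum(a[t][j] * v[j] for j in range(n_docs)) / s
                           for t in range(n_terms)])
            for j in range(n_docs):
                self.doc_vecs[j].append(s * v[j])

    def query_vector(self, text):
        # Project the query's term counts onto the concept axes (q^T U_k); unknown words are ignored.
        counts = Counter(w for w in tokenize(text) if w in self.index)
        return [sum(c * u[self.index[w]] for w, c in counts.items()) for u in self.u]

    def search(self, text):
        q = self.query_vector(text)
        scored = [(cosine(q, d), j) for j, d in enumerate(self.doc_vecs)]
        return sorted(scored, reverse=True)


docs = [
    "car engine",
    "automobile engine",
    "car automobile",
    "apple banana fruit",
    "banana fruit salad",
]
idx = ConceptIndex(docs, k=2)
for score, j in idx.search("car"):
    print(f"{score:.2f}  {docs[j]}")
```

Searching for "car" should rank "automobile engine" right alongside the two documents that literally contain "car", because the vehicle words co-occur across those three documents and collapse onto a single concept axis, while the fruit documents should score close to zero. A plain keyword match would only have returned the two documents containing the word itself.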

Content Intelligence is a term that has been bandied about since at least 2007, if not earlier. The idea is that you can use Business Intelligence concepts and technologies to analyze your corpus of unstructured information. Will concept search, LDA, CTM, DTM and the rest of the advanced search acronym soup make easy, reliable and therefore highly usable content intelligence available to us all?

Will real advances in search, combined with Moore's Law driving up processing power and the availability of 'cheap' storage running into petabytes and beyond, mean that Information Management and Content Management professionals will finally be able to stop worrying about business classification schemes, folder structures and metadata and just "throw a search engine on it"???
