It seems everybody has an enterprise search solution nowadays...vendors from all walks (taxonomy & ontology, content management, portals, document management, etc.) are touting the benefits of their product over others. Search is a huge component of effective collaboration because, after all...if you can't find the document/content/thread/issue/task/email that you were collaborating on, how can you collaborate on it again? Findability is king.
So how can you leverage search to collaborate better and find more relevant content? In this multi-part blog, let's review the basics of search products in this space:
Every search product has at least three major components: the Gatherer (Indexer), the Ranker and the User Experience (vendors may use different names for these parts of their search product). There's a plethora of administrative or higher-level stuff that I can't go into here, such as reporting (who is searching for what), index syndication, or auto-categorization & taxonomy development. These are also important topics when choosing a search engine, but we've got to start somewhere. I should note that whether we're talking web search, enterprise search or desktop search...all of these concepts apply.
The Gatherer / Indexer is responsible for going out, finding content and creating one or more indexes. All gatherers are limited in the types of files they can properly index...which really comes down to their ability to parse the content of those files and bring it into their index. In the Microsoft world, support for a new file type is added through an IFilter. Many more IFilters are available than the ones that ship out of the box with SharePoint, Microsoft's search solution. These IFilters are also used by other search products that leverage the Microsoft platform. A gatherer is also limited by the types of repositories it can reach. Some gatherers can only browse file shares and folders, while others can crawl links on websites. Yet others can crawl Line of Business systems such as document management products. As a final note about gatherers: robust products in this space have adaptive indexers, which become more and more intelligent over time at updating their index with the most critical, most used or most important information sooner than old or less frequently used information.
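To make the gatherer idea concrete, here is a minimal sketch in Python. It is not how SharePoint or any particular product works: the `PARSERS` table, the `build_index` function and the sample documents are all made up for illustration. The parsers stand in for the role an IFilter plays (turning a file type into plain text), and the "repository" is just an in-memory dictionary rather than a real file share or website.

```python
import os
import re
from collections import defaultdict

# Hypothetical parsers keyed by file extension. They play the role an
# IFilter plays on the Microsoft platform: turn raw content into plain text.
def parse_text(raw):
    return raw

def parse_html(raw):
    # Crude tag stripping, good enough for a sketch.
    return re.sub(r"<[^>]+>", " ", raw)

PARSERS = {".txt": parse_text, ".html": parse_html}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each word to the set of paths containing it.

    `documents` maps a path to its raw content. Paths whose file type
    has no parser are skipped and reported, not indexed.
    """
    index = defaultdict(set)
    skipped = []
    for path, raw in documents.items():
        parser = PARSERS.get(os.path.splitext(path)[1])
        if parser is None:
            skipped.append(path)
            continue
        for token in tokenize(parser(raw)):
            index[token].add(path)
    return index, skipped

# Made-up sample repository.
documents = {
    "notes/plan.txt": "Search project plan",
    "site/index.html": "<h1>Enterprise search</h1><p>Findability is king</p>",
    "specs/design.xyz": "a file type the gatherer cannot parse",
}
index, skipped = build_index(documents)
print(sorted(index["search"]))  # ['notes/plan.txt', 'site/index.html']
print(skipped)                  # ['specs/design.xyz']
```

The `.xyz` file never makes it into the index, which is exactly the limitation described above: if the gatherer can't parse a file type, that content can't be found.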
The Ranker is responsible for finding the best results based on the searched terms. Simple ranking looks only for "free text" matches to the searched words and returns some sort of simple result list (e.g. organized by date). Other, more complex ranking products use special algorithms to sort results and determine the relevancy of a result to the query. Good systems usually use metadata (such as the document description) to help drive up the relevancy of an item in the search results. Sometimes this is assisted by things such as thesauri, which look for similar words or alternate phrasings of a word (RCMP could also be "Royal Canadian Mounted Police" or "R.C.M.P."). Word stemming is another, more complex feature, which looks for variations of a word (e.g. ran, running, run, etc.). There are many more features that assist in ranking in various products. In SharePoint, you can define "best bets", which essentially move a search result up to the top of the list...other products have similar features.
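Here is a rough Python sketch of how those ranking pieces could fit together. Everything in it is an assumption for illustration: the `THESAURUS` and `BEST_BETS` tables, the `METADATA_WEIGHT` value, the sample documents and the very naive `stem` function are made up, and no real product's algorithm is being shown. The score is simply a count of matching terms, with matches in the description counting extra.

```python
import re

# Hypothetical thesaurus: each variant is rewritten to one canonical term.
THESAURUS = {
    "royal canadian mounted police": "rcmp",
    "r.c.m.p.": "rcmp",
}
# Hypothetical "best bets": a query term pinned to a document id.
BEST_BETS = {"rcmp": "intranet/rcmp-contacts"}
# Assumed boost for matches found in metadata (the description).
METADATA_WEIGHT = 3

def stem(word):
    # Very naive suffix stripping. Real stemmers handle far more cases,
    # and irregular forms such as "ran" are not handled here at all.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant left behind, e.g. "runn" -> "run".
    if len(word) >= 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

def terms(text):
    text = text.lower()
    for variant, canonical in THESAURUS.items():
        text = text.replace(variant, canonical)
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text)]

def score(query_terms, doc):
    body = terms(doc["body"])
    description = terms(doc["description"])
    return sum(
        body.count(t) + METADATA_WEIGHT * description.count(t)
        for t in query_terms
    )

def rank(query, docs):
    query_terms = terms(query)
    scored = [(score(query_terms, doc), doc_id) for doc_id, doc in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    results = [doc_id for s, doc_id in scored if s > 0]
    # Best bets jump to the top of the list regardless of score.
    pinned = list(dict.fromkeys(
        BEST_BETS[t] for t in query_terms
        if t in BEST_BETS and BEST_BETS[t] in docs
    ))
    return pinned + [d for d in results if d not in pinned]

# Made-up sample documents.
docs = {
    "news/parade": {
        "description": "Parade photos",
        "body": "The Royal Canadian Mounted Police led the parade.",
    },
    "intranet/rcmp-contacts": {
        "description": "Contact list",
        "body": "Phone numbers for regional offices.",
    },
    "hr/running-club": {
        "description": "RCMP running club",
        "body": "We ran on Tuesdays; runners welcome.",
    },
}
print(rank("R.C.M.P. runs", docs))
# ['intranet/rcmp-contacts', 'hr/running-club', 'news/parade']
```

The best bet lands first even though it doesn't contain the searched words, the running club outranks the parade story because its matches are in the description, and the parade story is only found at all because the thesaurus maps the spelled-out name to "rcmp".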
The User Experience is what most people are familiar with when they search. The UE has two parts: the search interface (e.g. simple search, advanced search) and the results interface. On Google, most people use the simple search box instead of moving to the advanced search page. Each search result contains a web page title and a snippet of text from the page (hopefully relevant), along with a link to search for more relevant items. Search results pages differ from product to product (and really should...what a user sees in the results is highly relevant to what they are searching for)...although Google seems to be defining the standards on this one. Yahoo still includes what they call a "Category" with (some of) their results (e.g. Companies > Technology > Habanero). While this type of classification (including some connection with a planned taxonomy or ontology) is probably very relevant in internal enterprise searches...it seems to be less and less relevant on the Internet (see also MSN Search).
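As a small illustration of the results side, here is one way a title-plus-snippet result could be assembled in Python. This is only a sketch: `make_snippet`, its `width` parameter and the sample page are hypothetical, and real engines pick and highlight snippets far more carefully.

```python
import re

def make_snippet(text, query, width=40):
    """Return the text around the first query-term hit.

    Keeps up to `width` characters on each side of the match, and falls
    back to the start of the text when no query term is found.
    """
    words = [re.escape(w) for w in query.split()]
    match = re.search("|".join(words), text, re.IGNORECASE) if words else None
    if match is None:
        return text[: 2 * width].strip()
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end].strip() + suffix

# Made-up sample page.
page = {
    "title": "Habanero",
    "text": (
        "We build portals and intranets. Our enterprise search review "
        "compares key products for websites."
    ),
}
result = {
    "title": page["title"],
    "snippet": make_snippet(page["text"], "search", width=20),
}
print(result["title"])
print(result["snippet"])
```

Cutting on a fixed character window can chop words in half at the edges, which is one reason a snippet is only "hopefully relevant".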
Last year, we reviewed some key search products for websites. We're about to update the report, but here are the 2004 results.