Using the NIFSTD Ontology to Improve PubMed Search Results
Hitesh Sabnani (University of California San Diego), Anita Bandrowski (University of California San Diego), Amarnath Gupta (University of California San Diego)
The standard NIF literature search facility deconstructs an article into its constituent parts (title, abstract, etc.) and measures the relevance of a search query by combining the partial scores of the match between the query term vector and each component term vector into a single matching score. The ranking function produces better search results than PubMed, but provides no semantic context for interpreting the search results. One can compute "clustered results", where an algorithm post-processes the results to partition them into groups so that results within a group are more "similar" to each other (e.g., using a cosine-distance metric) than to results in other groups. We show that this form of "blind" similarity-based clustered ranking gives no insight into the search results, because the clusters often center around arbitrary concepts that have no bearing on neuroscience. To improve the quality of results, we use the NIF ontology in a novel way. For every result (i.e., abstract) returned from the search, we perform an automatic mapping of terms to the NIF ontology such that each abstract maps to more than one ontology term. After all terms are mapped, we apply a novel graph clustering method to the ontology nodes mapped from the entire result set. The method allows clusters to overlap and takes into account taxonomic and partonomic relationships among terms, such that the number of conceptual overlaps between related terms (e.g., hippocampus and CA1) is minimized. The cluster centers are assigned to the terms with the largest betweenness centrality. Results within a cluster are ranked in the standard way.
We show that this technique offers deeper insight into the neuroscientific connection between the query and the search results.
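As an illustration of the cluster-center step only, the following is a minimal sketch of picking the term with the largest betweenness centrality from a set of mapped ontology terms. It is not the authors' implementation: the function names (`build_adjacency`, `betweenness_centrality`, `pick_center`) are hypothetical, the edge list is a toy fragment invented for the example rather than data from the NIF ontology, and the sketch assumes that taxonomic and partonomic relationships can be treated as undirected, unweighted edges. Betweenness is computed with Brandes' algorithm.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map from (term, term) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def betweenness_centrality(adj):
    """Unnormalized betweenness centrality (Brandes) for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in order of non-decreasing distance from s
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):         # accumulate dependencies, farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: c / 2 for v, c in bc.items()}

def pick_center(adj):
    """Return the term with the largest betweenness; ties are broken alphabetically."""
    scores = betweenness_centrality(adj)
    return max(sorted(scores), key=lambda v: scores[v])

# Toy, made-up fragment (assumption): part-of and is-a links flattened to undirected edges.
edges = [
    ("hippocampus", "CA1"),
    ("hippocampus", "CA3"),
    ("hippocampus", "dentate gyrus"),
    ("hippocampus", "limbic system"),
    ("limbic system", "amygdala"),
]
adj = build_adjacency(edges)
scores = betweenness_centrality(adj)
center = pick_center(adj)
print(center, scores[center])  # hippocampus 9.0
```

In this toy graph, "hippocampus" lies on the most shortest paths between the other terms, so it would label the cluster. How the clusters themselves are formed, and how overlaps between related terms are minimized, is not covered by this sketch.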