Text mining for cognitive paradigm annotation
Chayan Chakrabarti (University of New Mexico, Albuquerque, NM), George Luger (University of New Mexico, Albuquerque, NM), Angela Laird (University of Texas, San Antonio), Jessica Turner (Mind Research Network, Albuquerque, NM)
Methods. We have implemented an initial text mining approach on a subset of abstracts from the BrainMap database (http://www.brainmap.org) to automate the expert annotations drawn from the BrainMap schema and CogPO terms. We experiment with two categories of methods: those emphasizing the presence of high-entropy words, and those emphasizing the sequence in which words occur. High-entropy words are those that carry more discriminating information; they are likely to be technical terms relevant to the domain. In the second category, we examine the order in which certain words tend to occur in the corpus, rather than the words themselves.
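As an illustration of the two categories, the following minimal sketch scores a word by how much it reduces uncertainty about a document's annotation (information gain) and extracts word bigrams as a unit for sequence-based methods. The abstract does not state which entropy measure was used, so information gain is an assumption here; the function names, the toy documents, and the labels are all hypothetical and are not BrainMap data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, word):
    """Drop in label entropy from knowing whether `word` occurs in a document.

    Assumption: this is one possible reading of "discriminating information";
    the source does not specify the measure.
    """
    with_word = [lab for doc, lab in zip(docs, labels) if word in doc]
    without_word = [lab for doc, lab in zip(docs, labels) if word not in doc]
    n = len(labels)
    conditional = (len(with_word) / n) * entropy(with_word) \
        + (len(without_word) / n) * entropy(without_word)
    return entropy(labels) - conditional


def word_ngrams(tokens, n=2):
    """Consecutive word n-grams, the unit used by sequence-based methods."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy, made-up corpus: each document is the set of words it contains.
docs = [
    {"stroop", "task", "color", "subjects"},
    {"stroop", "interference", "subjects"},
    {"n-back", "working", "memory", "subjects"},
    {"n-back", "load", "subjects"},
]
labels = ["Stroop", "Stroop", "n-back", "n-back"]

gain_stroop = information_gain(docs, labels, "stroop")      # separates the labels
gain_subjects = information_gain(docs, labels, "subjects")  # occurs everywhere

bigrams = word_ngrams(["subjects", "performed", "the", "stroop", "task"])
```

In this toy corpus a domain term such as "stroop" receives a high score, while a word that appears in every document, such as "subjects", receives none, which matches the intuition that discriminating words tend to be technical terms.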
Results. We measured the performance of a basic K-nearest-neighbor (KNN) approach, applied to the title and abstract text of the corpus, in predicting the correct annotations. The results are better than chance, which is promising given the high-dimensional nature of the problem. We also evaluated some initial n-gram models, which were less successful on this sparse corpus. Our work points toward the use of semantic models more complex than simple distance among abstracts.
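A basic KNN annotator of the kind described might look like the sketch below. It assumes bag-of-words vectors, cosine similarity, and a majority vote among the k nearest abstracts; the abstract does not specify the distance measure, the value of k, or the preprocessing, so all of these are assumptions. The training texts and labels are invented for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts from whitespace-separated, lowercased text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k most similar training texts."""
    q = vectorize(query)
    ranked = sorted(train, key=lambda item: cosine(vectorize(item[0]), q),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up training set of (title and abstract text, annotation) pairs.
train = [
    ("color word interference during the stroop task", "Stroop"),
    ("stroop interference and conflict monitoring", "Stroop"),
    ("working memory load in the n-back task", "n-back"),
    ("n-back performance and working memory", "n-back"),
]

prediction = knn_predict(train, "conflict effects in a color word stroop task", k=3)
```

Because every distinct word is a dimension, real abstracts produce very sparse, high-dimensional vectors, which is one reason simple distance among abstracts may be a limited signal.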
Citations: Turner JA, Laird AR. (2012) The cognitive paradigm ontology: design and application. Neuroinformatics. 10(1):57-66. DOI: http://dx.doi.org/10.1007/s12021-011-9126-x