Speaker of Workshop 3
Will talk about: In Search of a Missing Link in the Data Deluge vs. Data Scarcity Debate
Amarnath Gupta is a Research Scientist at the San Diego Supercomputer Center of the University of California, San Diego, where he directs the Advanced Query Processing Lab. His current research interests are in emerging information systems, including graph data management, semantic information integration for scientific applications, ontological information management, information management in social networks, and the impact of high-performance computing platforms on information systems problems. He has been associated with several Neuroinformatics projects: he was an early architect of the Cell Centered Database and the BIRN project, and is currently the co-PI and technical design lead of the Neuroscience Information Framework (NIF).
Scientific data should be viewed at multiple levels: the numbers produced by instruments, data observed and collected by humans, results of different levels of transformations applied to data, inferences made from the data, and claims about scientific reality or hypotheses are all "data" at some level. Scientists regularly share their claims and hypotheses through their publications; some share portions of their data through databases, data sets contributed to public and private repositories, or supplements to publications. However, the proportion of unshared information is very high, especially when there is no publisher-driven mandate to make data public. Today, there is a growing body of sharing technologies and repositories with a wide range of data ingestion, storage, sharing and retrieval capabilities. In our experience with the Neuroscience Information Framework (NIF), we notice a wide variation in the kinds of data scientists do and do not want to share. I believe that the real solution to the data scarcity problem must come from "social accountability" measures that value the contributing scientist more than the non-contributing scientist. I propose that we create a set of "reputation scores" (like credit ratings) for scientists, computed from "accountability scores" that measure their data sharing and "influence scores" that measure the use of the data they share. Not surprisingly, the e-commerce and social network communities have developed reputation and trust management models which, with some specialization, can be applied toward tracking scientists' contributions. Such a reputation engine will track the scientific activities of a scientist by analyzing and correlating their paper and data publications. It will also accept ratings and annotations on publication and data objects made by the users of scientific research products, and combine these ratings with the tracking results to compute contextual reputation scores.
The ratings will be gathered and administered by independent third parties, and used by the community to measure the trustworthiness of scientists, their experiments and their claims. The talk will sketch the structure and operating principles of a hypothetical reputation engine, and show that an organization like the NIF can already provide enough information to construct such an engine. We believe that the adoption of reputation management technologies by the Neuroscience community can bring about a cultural shift in the domain of data and knowledge sharing.
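To make the proposal concrete, the following is a minimal sketch of how a reputation score might be combined from an accountability score, an influence score and user ratings. Everything in it is an illustrative assumption, not the design presented in the talk: the record fields, the two scoring formulas, the weights and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class ScientistRecord:
    """Hypothetical per-scientist activity record (all fields are assumptions)."""
    name: str
    papers_published: int
    datasets_shared: int   # data sets deposited in a repository
    dataset_reuses: int    # recorded uses of those data sets by others
    ratings: List[float] = field(default_factory=list)  # user ratings in [0, 1]


def accountability_score(r: ScientistRecord) -> float:
    """Assumed measure of sharing: data sets shared per paper, capped at 1."""
    if r.papers_published == 0:
        return 0.0
    return min(1.0, r.datasets_shared / r.papers_published)


def influence_score(r: ScientistRecord) -> float:
    """Assumed measure of use: average reuse per data set, squashed into [0, 1)."""
    if r.datasets_shared == 0:
        return 0.0
    reuse = r.dataset_reuses / r.datasets_shared
    return reuse / (1.0 + reuse)


def reputation_score(r: ScientistRecord,
                     w_acc: float = 0.4,
                     w_inf: float = 0.4,
                     w_rat: float = 0.2) -> float:
    """Weighted combination of the two scores and the mean user rating.

    The weights are arbitrary placeholders chosen only for illustration.
    """
    rating = mean(r.ratings) if r.ratings else 0.0
    return (w_acc * accountability_score(r)
            + w_inf * influence_score(r)
            + w_rat * rating)


sharer = ScientistRecord("A", papers_published=10, datasets_shared=5,
                         dataset_reuses=20, ratings=[0.8, 1.0])
non_sharer = ScientistRecord("B", papers_published=10, datasets_shared=0,
                             dataset_reuses=0)
print(reputation_score(sharer), reputation_score(non_sharer))
```

Under these assumed formulas, a scientist who shares data that others reuse scores higher than one with the same publication count who shares nothing, which is the ordering the "social accountability" argument calls for. A contextual score, as described above, would additionally condition these quantities on a topic or community.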