Title: Dimensions of drug information
Name: Sharp, Mark E. (author); Belkin, Nicholas J (chair); Saracevic, Tefko (internal member); Dalbello, Marija (internal member); Bodenreider, Olivier (outside member); Rutgers University, Graduate School - New Brunswick
Subject: Communication, Information and Library Studies;
Ontologies (Information retrieval);
Knowledge representation (Information theory)
Description: The high number, heterogeneity, and inadequate integration of drug information resources constitute barriers to many drug information usage scenarios. In the biomedical domain there is a rich legacy of knowledge representation in ontology-like structures that allows us to connect this problem both to the very mature field of library and information science classification research and to the very new field of ontology matching/merging (OM). We argue for a broad view of OM that makes room not only for the "pre-formal" phase/type of multi-ontology integration exemplified by RxNorm and the UMLS Metathesaurus, but also for an even earlier phase/type when "What is there?" in a domain has to deal with implicit and poorly structured "ontologies" that barely qualify as such. Such is the case in the drug domain. We introduce dimensions of drug information as an approach to early, pre-formal OM in the drug domain that draws inspiration and incorporates principles from facet analysis, domain analysis, and Semantic Web research on linked data and mashups. By surveying 23 publicly available drug information resources, we identified 39 dimensions relevant to four drug (sub)domains - pharmacy, chemistry, biology, and clinical medicine - and mapped them to the resources. An arbitrary four-domain, monohierarchical classification of the dimensions produced, by extension, a reasonable four-domain resource classification. Correspondence analysis and hierarchical cluster analysis also produced evidence of its partial validity. Detailed analysis of information on nine parent drug compounds from 15 resources refined this high-level dimensional mapping and identified hundreds of subdimensions, which could be expressed as a six-level hierarchy.
Based on these dimensions, we integrated this information in an experimental database and showed that it was useful (1) as a training set for automating the normalization of additional raw data from the same 15 sources, bringing the important goal of building an integrated, comprehensive (all drugs) database within reach, and (2) for satisfying a variety of use cases, some quite complex, derived from published literature representing the user types corresponding to our domain focus.
Note: Includes bibliographical references
Note: by Mark E. Sharp
Collection: Graduate School - New Brunswick Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.