Title: Sideffective - system to mine patient reviews
Name: Rajagopalan, Sangeetha (author), Imielinski, Tomasz (chair), Gerasoulis, Apostolos (internal member), Borgida, Alexander T (internal member), Rutgers University, Graduate School - New Brunswick
Description: Sideffective is a system that crawls, ranks, and analyzes patient testimonials about side effects of common medications. Because the value of any mining model lies in its data corpus, the data collection phase involved extensive crawling of large medical websites that host user forums. The raw files were then passed through site-specific parsing routines, yielding output that conforms to a well-defined data model. The system currently holds close to 400,000 user testimonials pertaining to more than 2,500 drugs. Sideffective aims to gather and aggregate this wealth of information, build useful associations, and present interesting observations and numeric validations, all in a user-friendly interface. The important issues we have tried to tackle are: extracting side effects without relying on pre-built lists, aggregating the distribution of different side effects for a given drug, site-specific search, ranking, and determining the negativity of reviews. The main focus of this thesis is the extraction and discovery of side effects from a user's review of a drug. Apache Lucene's Shingle Analyzer, which extracts terms and their frequencies, was used to generate more than 7 million phrases, of which the top 25,000 terms, each with a frequency greater than 100, were chosen for discovering side effects. After eliminating syntactically incorrect phrases, our method compares the frequency of occurrence of each term in a medical website domain with its frequency in a purely non-medical user website domain, which proves highly effective in extracting side effects. Using this technique, more than 600 unique side effects reported by users have been discovered without using any fixed lists. The extracted list is also used to mine and summarize patient reviews.
The aggregation and distribution tables we built effectively determine the top reactions exhibited by various drugs, and the reverse mapping of the same demonstrates symptom-to-drug associations. Our system also eliminates synonymous side effects, as well as cures that falsely appear as possible side effects.
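The extraction idea in the description (count word shingles, then compare how often each phrase occurs in medical text versus non-medical text) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: plain word n-grams stand in for Apache Lucene's shingle output, and the function names, the `min_freq` threshold, the add-one smoothing, and the ratio score are all assumptions made for illustration.

```python
import re
from collections import Counter


def shingles(text, min_n=1, max_n=3):
    """Yield word n-grams (shingles) of min_n..max_n words from text."""
    words = re.findall(r"[a-z']+", text.lower())
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def phrase_counts(docs, min_n=1, max_n=3):
    """Count shingle frequencies over a collection of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(shingles(doc, min_n, max_n))
    return counts


def domain_ratio(medical, general, min_freq=1, smoothing=1.0):
    """Score phrases by relative frequency in medical vs. general text.

    Hypothetical scoring: a high score means the phrase is much more
    common in the medical corpus, which makes it a side-effect candidate.
    """
    med_total = sum(medical.values())
    gen_total = sum(general.values())
    scores = {}
    for phrase, freq in medical.items():
        if freq < min_freq:
            continue  # skip rare phrases, mirroring a frequency cutoff
        med_rate = freq / med_total
        # Smoothing avoids division by zero for unseen phrases.
        gen_rate = (general.get(phrase, 0) + smoothing) / (gen_total + smoothing)
        scores[phrase] = med_rate / gen_rate
    return scores
```

Under these assumptions, a phrase such as "dry mouth" that is frequent in drug reviews but rare in general forum text scores far higher than a common phrase such as "this is", so sorting by score surfaces candidate side effects without a pre-built list.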
Note: Includes bibliographical references
Note: by Sangeetha Rajagopalan
Collection: Graduate School - New Brunswick Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.