Title: Sideffective - system to mine patient reviews
Name: Yalamanchi, Deepak (author), Imielinski, Tomasz (chair), Gerasoulis, Apostolos (internal member), Borgida, Alexander T (internal member), Rutgers University, Graduate School - New Brunswick
Description: Sideffective is a system to crawl, rank and analyze patient testimonials about side effects of common medications. Since the value of any mining model lies in its data corpus, the data collection phase involved extensive crawling of large medical websites made up of user forums. The raw files were then processed by site-specific parsing routines, yielding output that conforms to a well-defined data model. Currently, the system holds close to 400,000 user testimonials pertaining to more than 2,500 drugs/medicines. Sideffective aims to gather and aggregate this wealth of information, build useful associations, and present interesting observations and numeric validations, all in a user-friendly interface. The main issues we have tried to tackle are: extracting side effects without relying on pre-built lists, aggregating the distribution of different side effects for a given drug, site-specific search, ranking, and determining the negativity of reviews. The system was jointly built by Deepak Yalamanchi and Sangeetha Rajagopalan under the guidance of Prof. Tomasz Imielinski.

This thesis focuses mainly on sentiment analysis of patient reviews. While most existing sentiment analysis systems are predicated on POS (part-of-speech) tagging or Bayesian sentiment analysis methods, these cannot be applied directly to medical reviews, which generally carry a negative flavor. We therefore approached the problem by identifying the features in each sentence and calibrating the sentiment on a Negativity Meter based on their relation to sentiment words. A feature, as defined for the purpose of this thesis, can be a medicine, a side effect or a symptom. The sentiment of each feature is the aggregate of its polarities with respect to each sentiment word, where each polarity is inversely related to the distance between the feature and that sentiment word.
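A minimal Python sketch of the feature-level scoring described above, under stated assumptions: the sentiment lexicon `SENTIMENT_WORDS`, its scores, the simple whitespace tokenizer and the exact 1/distance weighting are hypothetical choices made for illustration, not the thesis's actual implementation. Features are assumed to be single tokens supplied by the caller (the thesis itself extracts them without pre-built lists).

```python
from typing import Dict, List, Set

# Hypothetical sentiment lexicon (assumption): negative words carry negative scores.
SENTIMENT_WORDS: Dict[str, float] = {
    "terrible": -2.0,
    "severe": -1.5,
    "bad": -1.0,
    "mild": 0.5,
    "good": 1.0,
    "great": 2.0,
}


def tokenize(sentence: str) -> List[str]:
    # Lowercase, split on whitespace and strip simple punctuation from each token.
    tokens = [t.strip(".,;:!?") for t in sentence.lower().split()]
    return [t for t in tokens if t]


def feature_polarities(sentence: str, features: Set[str]) -> Dict[str, float]:
    """Score each feature found in the sentence against every sentiment word.

    Each contribution is the sentiment word's score divided by its token
    distance from the feature (assumed 1/d weighting); contributions are summed.
    """
    tokens = tokenize(sentence)
    scores: Dict[str, float] = {}
    for i, tok in enumerate(tokens):
        if tok not in features:
            continue
        total = 0.0
        for j, word in enumerate(tokens):
            if j != i and word in SENTIMENT_WORDS:
                total += SENTIMENT_WORDS[word] / abs(i - j)
        # A feature mentioned twice accumulates both scores.
        scores[tok] = scores.get(tok, 0.0) + total
    return scores
```

For instance, `feature_polarities("Severe headache after lipitor, nausea was mild.", {"headache", "lipitor", "nausea"})` gives "headache" the most negative score, because it sits right next to "severe", while "nausea" is pulled toward neutral by the nearby "mild".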
Each sentence is then evaluated by the cumulative polarity of all the features it contains. The sentiment of a review is determined by first computing the sentiment of each sentence and then taking a weighted sum of the sentence scores. The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. Experiments correlating ratings from human reviewers (extracted from www.askapatient.com) with the negativity rating computed for each review yield conclusive results, demonstrating the effectiveness of the technique. We have also implemented a customized Lucene search over the data using a multi-review summarization approach and a ranking scheme based on the feature list: ranking priority is given to the review with the largest feature list.
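The sentence-level and review-level aggregation, together with the feature-list ranking rule, can be sketched in the same hedged way. Here each sentence is represented by a hypothetical dictionary mapping each feature to its polarity; the uniform default weights are an assumption, since the abstract does not say how sentence weights are chosen. The sketch covers only the ranking rule, not the Lucene search or summarization.

```python
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical representation: one {feature: polarity} dict per sentence.
Sentence = Dict[str, float]


def sentence_score(feature_scores: Sentence) -> float:
    # Cumulative polarity of all features found in the sentence.
    return sum(feature_scores.values())


def review_score(sentences: Sequence[Sentence],
                 weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of sentence scores; more negative means a more negative review."""
    scores = [sentence_score(s) for s in sentences]
    if not scores:
        return 0.0
    if weights is None:
        # Assumption: equal weights that sum to 1 (i.e. the mean sentence score).
        weights = [1.0 / len(scores)] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("one weight per sentence is required")
    return sum(w * s for w, s in zip(weights, scores))


def rank_reviews(reviews: Dict[str, Sequence[Sentence]]) -> List[str]:
    """Order review ids so that reviews with more distinct features come first."""
    def feature_count(review_id: str) -> int:
        features: Set[str] = set()
        for sentence in reviews[review_id]:
            features.update(sentence.keys())
        return len(features)

    # sorted() is stable, so ties keep their original order even with reverse=True.
    return sorted(reviews, key=feature_count, reverse=True)
```

With uniform weights, a review whose two sentences score -1.9 and -0.2 receives -1.05; passing explicit weights lets a caller emphasize, say, sentences that mention the medicine itself.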
Note: Includes bibliographical references
Note: by Deepak Yalamanchi
Collection: Graduate School - New Brunswick Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.