Title: Towards mistake-aware systems
Name: Oliveira, Fábio Abreu Dias de (author); Bianchini, Ricardo (chair); Martin, Richard (internal member); Nguyen, Thu (internal member); Candea, George (outside member); Rutgers University, Graduate School - New Brunswick
Computer system failures
Description: The complexity of today’s enterprise computer systems poses a major challenge to system administrators, who must manage a multitude of interrelated software components distributed in non-obvious ways across multiple computers. Not surprisingly, several studies have shown that human mistakes are an important source of outages and incorrect system behavior. To make matters worse, as computers permeate all aspects of our lives, higher demands are placed on the availability and correct operation of many computer systems. Given this state of affairs, we envisioned that systems must gracefully tolerate human mistakes made during system administration and operation. To realize our vision, we first studied human operator behavior and mistakes by means of live experiments with volunteers and a survey of database administrators. The results of this study led us to investigate two techniques for dealing with mistakes, namely validation of operator actions and model-based validation. Our research efforts culminate in a radically different approach, which we call mistake-aware systems management. We evaluate the effectiveness of validation of operator actions applied to databases, model-based validation, and mistake-aware systems management through a combination of live operator experiments, operator-emulation experiments, and mistake-injection experiments in a realistic prototype three-tier Internet service.
Note: Includes bibliographical references
Note: by Fábio Abreu Dias de Oliveira
Collection: Graduate School - New Brunswick Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.