TitleEnabling dynamic interactions in large scale applications and scientific workflows using semantically specialized shared DataSpaces
NameDocan, Ciprian (author), Parashar, Manish (chair), Marsic, Ivan (internal member), Zhang, Yanyong (internal member), Silver, Deborah (internal member), Allen, Gabrielle (outside member), Rutgers University, Graduate School - New Brunswick
SubjectElectrical and Computer Engineering,
Electronic data processing
DescriptionEmerging scientific and engineering applications use large-scale parallel machines to simulate, with higher accuracy, complex physical phenomena consisting of dynamically interacting processes. The workflows associated with these applications consist of parallel application codes that need to coordinate and interact at runtime. The interactions typically involve large volumes of data that must be exchanged and processed by the codes. The heterogeneous nature of the coupled codes, their numerical formulations, and their data decompositions lead to complex and dynamic interaction and data exchange patterns that are only defined at runtime. Moreover, these simulations often run on separate resources and progress at different rates, which adds to their complexity. Efficient and scalable implementation of these coupled application workflows presents several challenging programming, orchestration, coordination, and data exchange requirements. Existing programming frameworks, however, are rigid and provide limited support for the dynamic interactions manifested by these applications. For example, existing frameworks need to gather global application knowledge, impose tight synchronization between applications, or demand pre-defined and static interaction patterns that must be known prior to execution. These constraints can introduce significant performance penalties and can limit the expressiveness of application interaction programming. This thesis explores a new communication and coordination model to enable flexible and asynchronous coupling for coupled application workflows. It derives from the tuple-space model and provides the abstraction of a virtual distributed shared-space, which is customized for the application data domain. It enables applications to coordinate and exchange data by inserting and retrieving data objects. This model does not impose any synchronization requirements between independent applications.
Data stored in the space can be accessed by multiple applications, which can associatively query the space and retrieve data objects. Furthermore, it enables decoupled and dynamic interactions driven by application computations. This thesis presents DataSpaces, a prototype implementation of the distributed shared-space model. DataSpaces enables memory-to-memory application coupling and transparent data redistribution. It can complement existing workflow engines to enable in-memory data transports between distributed applications that run on separate resources as part of end-to-end scientific workflows. The thesis also presents ActiveSpaces, which extends DataSpaces and the shared-space model to enable in-transit data processing. It proposes and demonstrates a shift in the data processing paradigm by moving processing code closer to the data. ActiveSpaces provides programming support for defining data processing routines, and a runtime execution system to deploy and remotely execute these routines on the space. The research concepts and software frameworks have been deployed and evaluated using real application workflows in production runs on high-end computing systems.
NoteIncludes bibliographical references
Noteby Ciprian Docan
CollectionGraduate School - New Brunswick Electronic Theses and Dissertations
Organization NameRutgers, The State University of New Jersey
RightsThe author owns the copyright to this work.