The DOE SC program in Advanced Scientific Computing Research (ASCR) hereby announces its interest in basic research in computer science exploring innovative approaches to the management and storage of scientific data.
Supplementary Information
Modern scientific computing relies on processing a deluge of data coming from both experiments and simulations, with even relatively modest scientific activities generating petabytes of data. Planned upgrades of experimental facilities in the foreseeable future, combined with the increased computing capabilities of DOE’s exascale supercomputers and other state-of-the-art computing capabilities coming online over the next few years (for more information, see https://science.osti.gov/ascr/Facilities/User-Facilities/Upgrades), promise to compound the many challenges in storing and managing data such that it can be effectively used to fuel scientific discovery [2-12].
Traditional large-scale scientific data management has relied on the use of file formats optimized for simple access patterns on parallel, distributed file systems. These files have tended to be metadata poor and complicated to access, and they lack flexible indexing for efficient searching, so enabling new kinds of analysis often requires writing new, low-level code. Scientific workflows have also become increasingly complicated, integrating both simulation and the analysis of data from experiments, exploiting advanced machine-learning techniques, and requiring distributed, multi-stage processing. Additionally, significant opportunities exist to enhance trust and aid scientific reproducibility by enhancing our ability to record data provenance and verify data integrity. Fortunately, through a combination of past scientific-data-management investments and leveraging the growing ecosystem of big-data and database technologies, scientific endeavors have made significant improvements in their data management and use. While the ever-increasing scale of scientific data threatens that progress, new “smart” storage and networking technologies that provide embedded computational capabilities; novel methods for indexing, representing, and distributing data; and advanced techniques for interfacing with data management systems and integrating into programming environments promise significant breakthroughs. Moreover, new techniques for scientific data management can help integrate data into large scientific-data and computational ecosystems that embody the FAIR principles of Findability, Accessibility, Interoperability, and Reuse, thereby enabling collaborative, responsive science at yet-unprecedented scales.
Priority Research Directions
As highlighted by the recent ASCR Workshop on the Management and Storage of Scientific Data, building on the outcomes of prior community activities, including Storage Systems and I/O: Organizing, Storing, and Accessing Data for Scientific Discovery and the Office of Science Roundtable on Data for AI, and aligned with needs highlighted by interagency planning, important priority research directions are:
- “High-productivity interfaces for accessing scientific data efficiently” – Innovative interfaces to data-management capabilities allowing for flexible, high-performance access to large data sets, potentially federated across different kinds of memory, edge devices, and repositories, capturing relevant usage statistics, provenance, and other metadata.
- “Understanding the behavior of complex data management systems in DOE science” – Understanding how the behavior of users, application and system algorithms, and hardware can be combined and exploited to improve performance and resilience of scientific-data-management systems, recognizing that the relevant behaviors can change over time.
- “Rich metadata and provenance collection, management, search, and access” – Innovative methods for collecting and managing provenance and other metadata to support FAIR principles, resilience, and scientific reproducibility and discovery.
- “Reinventing data services for new applications, devices, and architectures” – Innovative methods to design scientific-data-management services for state-of-the-art storage and networking devices, including those providing computational capabilities.
Each pre-application and application must address, as its primary focus, one or more of these priority research directions.
If you intend to submit, complete the notification form in the InfoReady portal to provide your contact information and the title and a brief description of your project.
Questions concerning the limited submissions process may be submitted to limitedsubs@psu.edu.