(CLOSED) Management and Storage of Scientific Data (DE-FOA-0002725)

Sponsor Name: 
DOE
Description of the Award: 

The DOE SC program in Advanced Scientific Computing Research (ASCR) hereby announces its interest in basic research in computer science exploring innovative approaches to the management and storage of scientific data.


Supplementary Information
Modern scientific computing relies on processing a deluge of data coming from both experiments and simulations, with even relatively modest scientific activities generating petabytes of data. Planned upgrades of experimental facilities in the foreseeable future, combined with the increased computing capabilities of DOE’s exascale supercomputers and other state-of-the-art computing capabilities coming online over the next few years (for more information, see https://science.osti.gov/ascr/Facilities/User-Facilities/Upgrades), promise to compound the many challenges in storing and managing data such that it can be effectively used to fuel scientific discovery [2-12].

Traditional large-scale scientific data management has relied on the use of file formats optimized for simple access patterns on parallel, distributed file systems. These files have tended to be metadata poor and complicated to access, lacking flexible indexing for efficient searching, where enabling new kinds of analysis often requires writing new, low-level code. Scientific workflows have also become increasingly complicated, integrating both simulation and the analysis of data from experiments, exploiting advanced machine-learning techniques, and requiring distributed, multi-stage processing. Additionally, significant opportunities exist to enhance trust and aid scientific reproducibility by enhancing our ability to record data provenance and verify data integrity. Fortunately, through a combination of past scientific-data-management investments and leveraging the growing ecosystem of big-data and database technologies, scientific endeavors have made significant improvements in their data management and use. While the ever-increasing scale of scientific data threatens that progress, new “smart” storage and networking technologies that provide embedded computational capabilities; novel methods for indexing, representing, and distributing data; and advanced techniques for interfacing with data management systems and integrating into programming environments promise significant breakthroughs. Moreover, new techniques for scientific data management can help integrate data into large scientific-data and computational ecosystems that embody the FAIR principles of Findability, Accessibility, Interoperability, and Reuse, thereby enabling collaborative, responsive science at yet-unprecedented scales.

Priority Research Directions
As highlighted by the recent ASCR Workshop on the Management and Storage of Scientific Data, building on the outcomes of prior community activities, including Storage Systems and I/O: Organizing, Storing, and Accessing Data for Scientific Discovery and the Office of Science Roundtable on Data for AI, and aligned with needs highlighted by interagency planning, important priority research directions are:

  1. “High-productivity interfaces for accessing scientific data efficiently” – Innovative interfaces to data-management capabilities allowing for flexible, high-performance access to large data sets, potentially federated across different kinds of memory, edge devices, and repositories, capturing relevant usage statistics, provenance, and other metadata.
  2. “Understanding the behavior of complex data management systems in DOE science” – Understanding how the behavior of users, application and system algorithms, and hardware can be combined and exploited to improve performance and resilience of scientific-data-management systems, recognizing that the relevant behaviors can change over time.
  3. “Rich metadata and provenance collection, management, search, and access” – Innovative methods for collecting and managing provenance and other metadata to support FAIR principles, resilience, and scientific reproducibility and discovery.
  4. “Reinventing data services for new applications, devices, and architectures” – Innovative methods to design scientific-data-management services for state-of-the-art storage and networking devices, including those providing computational capabilities.

Each pre-application and application must address, as its primary focus, one or more of these priority research directions.

Limit (Number of applicants permitted per institution): 
2
Sponsor LOI Deadline: 
May 05, 2022
Sponsor Final Deadline: 
Jun 13, 2022
OSVPR Application or NOI Instructions: 

If you intend to submit, complete the notification form in the InfoReady portal to provide your contact information and the title and brief description of your project.

To be considered as a Penn State institutional nominee, please submit a notice of intent by the date provided directly below.
Penn State OSVPR NOI Deadline: 
Wednesday, April 20, 2022 - 4:00pm
This limited submission is in downselect: 
Penn State may only submit a specific number of proposals to this funding opportunity. The number of NOIs received requires that an internal competition take place; thus, a downselect process has commenced. No Penn State researchers may apply to this opportunity outside of this downselect process. To apply for this limited submission, please use this link:
For help or questions: 

Questions concerning the limited submissions process may be submitted to limitedsubs@psu.edu.

Notes: 
Mahmut Kandemir (CoE); PSU is eligible to submit one additional pre-application to DOE; contact limitedsubs@psu.edu if interested.