National Science Foundation (NSF) Big Data Regional Innovation Hubs: Establishing Spokes to Advance Big Data Applications (BD Spokes)

Sponsor Name: National Science Foundation (NSF)
Description of the Award: 

This solicitation expands upon the BD Hubs network established by the solicitations entitled Big Data Regional Innovation Hubs (BD Hubs): Accelerating the Big Data Innovation Ecosystem (NSF 15-562) and Big Data Regional Innovation Hubs (BD Spokes): Establishing Spokes to Advance Big Data Applications (NSF 16-510).

There are two proposal categories covered by this solicitation: SMALL and MEDIUM BD Spokes.

All (SMALL or MEDIUM) BD Spoke proposals submitted in response to this solicitation must include a Letter of Collaboration from a regional BD Hub. Proposals not including a Letter of Collaboration from a BD Hub will be returned without review. No exceptions will be made.

NORTHEAST: This region includes Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, and Vermont.

Overarching Themes

This BD Spokes solicitation supports Big Data activities in a specific topic area of interest to a corresponding regional BD Hub. The activities of a BD Spoke should address one or more of the following Big Data Innovation themes:

  • Accelerating progress towards societal grand challenges relevant to regional and national priority areas. Due to the pervasiveness of Big Data in virtually all national priority areas, the BD Spokes have the opportunity to bring rapid change in application areas by facilitating the creation of interdisciplinary and multidisciplinary data-intensive teams.
  • Helping to automate the Big Data lifecycle. Managing the end-to-end lifecycle of Big Data assets can be a tedious and manual task. Steps in the data lifecycle include: ingestion, validation, curation, quality assessment, anonymization, publication, active data management, and analysis (including information extraction, visualization, and annotation). Automated (or semi-automated) techniques are needed in order to keep up with the rapid data rates, large volumes, and immense heterogeneity of Big Data. Automation may also aid the reproducibility of data processing and analysis workflows. The data challenges and lessons learned by a BD Spoke on such automation efforts are expected to be shared with the BD Spoke's stakeholders as well as more broadly across the network of BD Hubs and Spokes.
  • Enabling access to and increasing the use of important and valuable available data assets, including international data sets, where relevant. Many valuable data sets are underutilized, and results from the analysis of such data are not shared, due to a variety of actual or perceived costs, including the cost of curation, the cost of data reuse, and attribution and intellectual property considerations. One of the desirable roles for a BD Spoke is to act as a catalyst for organizing and sharing data sets and related data services among a larger set of stakeholders, across disciplinary areas, within the geographic region, or across the national community. BD Spokes are expected to play an important role in supporting and promulgating open data and open source software policies within their projects, at the BD Hub level and across BD Spokes, to further facilitate the sharing of data and outcomes of analyses. In addition, issues of data security and privacy are expected to be addressed at the BD Hub level and across BD Spokes.


Topics and Application Areas

Proposed BD Spoke projects are expected to focus on their articulated regional challenges and opportunities. In particular, this solicitation welcomes submissions addressing the following areas of emphasis:

  • Education: Support innovations in software infrastructure and the use of education and learning data sets arising from both administrative data and information collected from interactive learning systems to improve learning outcomes. Projects could also propose to develop innovative education and/or workforce development and training programs that both broaden participation in Big Data research and development activities and enable a workforce for the 21st century. Workforce and training activities will be evaluated on their innovativeness and their ability to be replicated in new environments.
  • Data Intensive Research in the Social, Behavioral, and Economic Sciences: Accelerate research infrastructure and frameworks that integrate and operate on data from multiple sources including administrative data; scientific instruments from large-scale surveys, brain research, large-scale simulations, etc.; digitally-authored media, including text, images, audio, and emails; and streaming data from weblogs, videos, and financial/commercial transactions.
  • Data-driven Research in Chemistry: Encourage innovative partnerships that capitalize on the data revolution and utilize discovery-based science to verify scientific predictions and insights in chemistry. This area of emphasis looks for formation of new alliances to accelerate the discovery of new chemical species with predicted properties and/or new chemical reactions using approaches such as large-scale data analysis, data architectures, or machine learning. Proposal topics must be in alignment with the core research programs within the Division of Chemistry (CHE).
  • Neuroscience: Engage questions and opportunities in neuroscience that leverage BD Hub resources, such as enabling large scale, integrative modeling, sharing of diverse data and resources, and other neuroscience and neurotechnology approaches that require very large-scale, complex, or diverse data. Connections to other NSF programs on neuroscience research are welcome.
  • Data Analytics for Security: Better analytics and detection of security- and privacy-related patterns, anomalies, trends and changes in BD Spoke applications and/or regional data exchanges. Development of statistical, computational and/or interdisciplinary methods for improving BD Spoke security/privacy/trustworthiness through the management, exploration, analytics, mining, and visualization of structured or unstructured BD Spoke data from disparate sources.
  • Replicability and Reproducibility in Data Science: Facilitate robust and reliable science by improving the replicability and reproducibility of research instruments, procedures, codes and results.


Project Description (5-page limit): Include the title of the BD Spoke project, the name of the PI and the lead institution. Provide a summary description of the project. The project description must include:

Mission Statement: The Project Description must begin with a concise statement of the project vision.

Broader Impacts of the Proposed Work: The Project Description must provide a discussion of the broader impacts of the proposed activities.

Technical Description: The Project Description must provide a detailed explanation of the science and engineering undergirding the proposed project as it relates to technical aspects of Big Data upon which the project touches.

Roles and Responsibilities: The Project Description must list the specific roles of the collaborating universities, industry partners, government agencies, non-profits and other organizations involved.

Collaboration with host BD Hub and other BD Spokes: The Project Description must articulate how the BD Spoke will be managed and integrated into BD Hub activities.

Sponsor Final Deadline: Sep 18, 2017
To be considered as a Penn State institutional nominee, please submit a notice of intent by the date provided directly below.
This limited submission is in downselect: 
Penn State may submit only a specific number of proposals to this funding opportunity. The number of NOIs received requires that an internal competition take place; thus, a downselect process has commenced. No Penn State researchers may apply to this opportunity outside of this downselect process. To apply for this limited submission, please use this link:
OVPR Downselect Deadline: Friday, June 9, 2017 - 4:00pm