Since 1993, the NASA Earth Observing System Data and Information System (EOSDIS), with the help of its Distributed Active Archive Centers (DAACs), has provided free and open long-term measurements of our changing planet. These data, collected from satellites, aircraft, field campaigns, and on-the-ground instruments, help scientists better understand even the most remote regions of Earth. During the past 30 years, however, the ways in which data are collected, stored, and managed have shifted dramatically. The volume of data coming in from NASA missions has grown sharply, creating the need for a cost-effective, scalable storage system. To address this need, NASA is moving its Earth science data to the Earthdata Cloud, a commercial cloud environment hosted in Amazon Web Services. While the cloud offers multiple benefits, the move also poses challenges for users, including learning when and why to work in a cloud environment and how to migrate workflows to the cloud. The NSIDC DAAC and other DAACs, aware of these challenges, are already building up resources to help users adopt a cloud-based workflow as smoothly as possible.
The Openscapes Framework
As NASA and its DAACs move data to the cloud, two of their major priorities are to help researchers move their workflows and to enable more open science. In 2021, NASA awarded a three-year grant to Openscapes, an organization dedicated to “championing open practices in environmental science to help uncover data-driven solutions faster,” to help users transition their workflows to the cloud. The organization developed the NASA Openscapes Framework, a leadership training and community-building framework, for this purpose. The framework includes three components: engaging DAAC mentors, empowering science research teams, and amplifying open science leaders. Scientists and staff from the various NASA DAACs, including the National Snow and Ice Data Center (NSIDC) DAAC, act as mentors, lead workshops, and develop tutorials and guides within the Openscapes framework on how to access and work with NASA data in the cloud.
The mentors have one clear goal: help users move to the cloud. Once they started talking through how to best do that, they realized that they needed to develop clear resources for beginners on how to get started working in the cloud, like simple how-to guides and examples driven by user needs and science use cases. “Even the simplest questions, such as ‘What is the Earthdata Cloud?’ and ‘How do you work in the cloud?’ were not answered in any easy-to-access locations,” said Amy Steiker, a data services engineer at NSIDC and an NSIDC DAAC Openscapes mentor. “So, we worked together to address that.”
The Earthdata Cloud Cookbook takes shape
From these conversations, the Earthdata Cloud Cookbook, a learning-oriented resource to support scientific researchers who use NASA Earth data as NASA migrates data to the cloud, began to take shape. Said Steiker, “It organically developed as we pinpointed this need for a common set of how-to guides and tutorials on how to work in the Earthdata Cloud.”
The Earthdata Cloud Cookbook is being developed on GitHub, a “code hosting platform for version control and collaboration” that allows people to work together on projects from anywhere in the world. “GitHub enables greater collaboration,” said Steiker. “Instead of us building the Cookbook in a vacuum and throwing it out to the community, it’s all built in a collaborative way. You can see how we have built everything behind the scenes; it’s all well-documented and all of the code and infrastructure that goes into making the Cookbook is accessible and available.”
“The Cookbook is a work in progress,” said Steiker. “There’s this concept in Openscapes that’s a ‘future us’ mindset—the idea that we don’t want to build something once and never document it or know how to come back to it. We want to document it well so that people can collaborate and also come back to it later. It’s available for people to use but we also continue to iterate on it, and our process is all very open.”
Identifying user needs
Mentors want to know how to improve the Earthdata Cloud Cookbook, so the team has gathered feedback at related workshops, tutorials, and hands-on events. During these events, they have asked data users questions such as ‘What is challenging or confusing about working in the cloud?’ and ‘What are we missing in terms of resources?’ and applied the answers directly to the Cookbook. “We realized, for example, that some people just need to see these concepts presented in a visual way,” said Steiker. “So, we took that feedback and developed some cheat sheets to help with this.”
In addition, the mentors realized that most of their resources were using the Python programming language. However, many of their users also work in the R programming language. So they are working on developing tutorials in both languages to serve the most users possible. “We are not finished with these yet,” said Steiker. “But we are thinking about our user needs and how best to serve them.”
The mentor team hopes to keep improving the Earthdata Cloud Cookbook and the other resources available to users who are moving workflows to the cloud. To submit feedback, including requests for additional use cases or examples, users can open an issue in the GitHub repository. The Earthdata science community is also welcome to take part in “hackdays” aimed at adding to and improving the Cookbook. To keep up with announcements on future hackdays and other topics of discussion surrounding the Cookbook, users can follow the “discussions” area in the GitHub repository. To contribute to the Cookbook, users should check out the “Workflow for contributing to our Cookbook” page on GitHub.