By Michon Scott
In October 2020, NASA reported that its Earth science data holdings amounted to roughly twice the information stored by the U.S. Library of Congress. Over the next five years, NASA predicted, its data holdings would likely grow more than sixfold. That remarkable statistic covers only NASA Earth science data; it does not include data libraries assembled by the National Oceanic and Atmospheric Administration (NOAA), the National Science Foundation (NSF), or other science agencies and institutions.
As satellites, aerial surveys, science expeditions, and citizen-science networks continue to assemble data riches, the challenge grows to make all those observations findable and understandable. “You can’t do anything with data until you understand what the data are describing, how they’re describing it, or how you can read a data file,” says Julia Collins of the National Snow and Ice Data Center (NSIDC). “Unless you can actually use the data, they’re worthless. They might as well not exist.”
Making—and keeping—data holdings findable and usable is a multipronged effort, but one of the sharpest prongs is metadata: data about data. Collins is a software engineer and metadata architect who joined NSIDC in 2006. In this Q&A, she discusses her evolving role at NSIDC and at its parent institution, the Cooperative Institute for Research in Environmental Sciences (CIRES), as well as outside organizations. This interview has been lightly edited for length and clarity.
Q: How did you come to CIRES and NSIDC?
I was working for a defense subcontractor, and I thought, “I need to get out of this classified environment. So, I’ll either get a new job or go back to school.” As it turned out, I did both because I got into a completely unrelated degree, in kinesiology, through the master’s program at CU, and then I found a job with the Climate Diagnostics Center at CIRES. When NOAA began reorganizing and merging operations, I started looking for new places, ideally within CIRES. NSIDC had a 60-percent position available at the time, so I split my time between NSIDC and the CIRES Center for Environmental Technology. Then I gradually moved over to NSIDC full time.
CIRES is the result of a cooperative agreement between the University of Colorado Boulder and NOAA. As a CIRES employee, Collins was a member of the original CIRES Members’ Council (CMC).
Q: How did you get involved in the CMC?
I was part of the group that came up with its name and structure, which was modified over time. For several years, I was part of that original members’ council, and also served as the representative to the CIRES Fellows and Executive Committee meetings. It was an interesting way to get some insight into how CIRES works.
Part of the purpose of the CMC was to get all the NOAA Labs, for example, to be more aware that CIRES was a collaborative relationship, not a contractor relationship. It was an important distinction to recognize we were colleagues. By having a representative from the members’ council on the CIRES Fellows Council, there was more interaction, more feedback, and more ability to interact with the people running CIRES. While CIRES is a research institution, the largest group of people in CIRES are not research scientists or Ph.D. scientists.
Q: What did you learn from your time working with the CIRES Fellows?
I learned that the Fellows are trying to do the right thing in general, especially with hires. It was interesting to hear the discussion there. A lot of times there was even a recognition that a lot of women in science have some disadvantages that men in science do not. I remember one discussion where I was impressed that there was some effort made to understand why a woman applying for a position wouldn’t have the same CV [curriculum vitae], wouldn’t have as many publications, or wouldn’t have done this or that. So, I think some Fellows may have been a little bit ahead of the curve in recognizing that they couldn’t judge people just on the specifics of a traditional CV. But it also is an insight into how the academic world works. We’re not working in a business environment here by any means.
Q: What is metadata and why does it matter?
The simplest description of metadata is that it’s data about the data.
Metadata provides information that makes data usable. When you think about interdisciplinary use or interoperability of data, metadata is even more critical because it seems every discipline has its own way of describing, storing, or interacting with its data. That multiplies the hurdles you have to clear to compare data correctly. You want to be using the data in the way they’re intended to be used. So that’s why it’s important to understand what the data are.
Some of NSIDC’s software developers write code to take in low-level data, do the right things to tune the data, and then spit out a file that has higher-level data. I haven’t been involved with that, so much as I’ve been involved with describing the data and storing those descriptions in such a way that you can then easily search for the data and find and use the data. So, I’m involved in trying to make it easier for people to understand what is available to them.
Collins has supplemented her metadata management at NSIDC with involvement in the Research Data Alliance.
Q: What is the Research Data Alliance?
The Research Data Alliance has been going for about 10 years, and it’s an organization that focuses on research data of all kinds.
The organization has interest groups and working groups. The interest groups are long-lived, and the working groups are shorter. The organization has had a couple of interest groups on digital humanities, which have been interesting to follow, some on education, some on data management. I’ve had varying levels of involvement in a few different groups, including metadata management, and I try to stay abreast of developments in the citation group. The last few years, I have been the co-chair of the Software Source Code Interest Group. The push is treating software like a data product that also needs to be properly managed and archived and controlled.
At NSIDC, at the Research Data Alliance, and in other organizations, there’s a growing movement to use professional software development techniques for data generation, so that the software is reproducible. That’s as opposed to the older FORTRAN (Formula Translation) code that’s been handed down from advisor to advisee, and then the next round, and the next. When the software is more carefully maintained and controlled and described, there’s more trust in how the data were produced.
Q: What were the biggest tasks you worked on for ELOKA (the Exchange for Local Observations and Knowledge of the Arctic)?
For ELOKA, I formatted data so it could be reused. The data included observations coming from local communities. The focus here has generally been on the interfaces to get to the data.
Specifically for ELOKA, we did a little weather application, the Silalirijiit Project, that retrieves weather data from several sites and makes little plots, updated once an hour. That has been moved back to the community, and they’re managing the sites and data now.
SIZONet was another one we did that took observations from North Slope communities—on sea ice, weather, and wildlife—and added them to a database and then created a little interface to show summaries of that information. It was a combination of both handling the data and setting up the interface to get to the data.
Q: How have observations for ELOKA been collected?
People could collect data in different ways. They could use smartphones. They could be scribbling it down. They could just be remembering.
One of the interesting things that [former NSIDC scientist] Shari Fox worked on happened right about the time I started. She and some students developed a little handheld device, so that the people out hunting, if they saw something, could easily record observations. So, it probably had buttons for three or four different animals. If you saw three whales, you hit the button three times. It was a simple little handheld thing. It was brilliant.
The local and traditional knowledge gives you a different insight into mechanisms for data collection and meaning of information for subsistence communities. I consider that a really valuable part of what I’ve been able to do at NSIDC, just personally. I would never have thought about it had I not worked on those projects. To me, it’s just a scientific method. You observe, you collect, you analyze, and that’s the way it is. For those communities, what we would call data is just a part of their life. It’s information they’re using, refining, and handing down on a daily basis.
Q: What are the biggest changes you’ve seen in software development at NSIDC?
When I started, the software development group was smaller and more assigned to specific scientists. Over time, it’s been a struggle for the software group to figure out what we really are and how we want to work. It’s not always compatible with how we get funded for different projects.
We had a big push for Agile development around 2010, and some of that was successful. I think the push from the developers was in response to frustrations with how we were operating some projects, but it’s easy to go overboard on Agile or any software development practice. You can get way too serious about it. I think we found more of a happy medium for a lot of our projects.
Best practices for software development can collide with the academic way of doing things, which is entrenched and slowly changing. The academic approach says, “I write a proposal, I say what I’m going to do, I get money, and I do it. We don’t want people figuring out the technology ahead of time.” So, part of the struggle is getting people to consult with not only the developers, but also the writers and other groups before they write the proposals. We needed to have more voice, to give a little more context. So that has improved. We’re still not working in what you might consider a typical software development environment. But I think software development at NSIDC is a more cohesive group. We also have better technological support now. I no longer worry about whether my laptop will work as expected, or whether I’ll be able to reach the machines I need to do my job.
What exactly changed at NSIDC? I’m not sure. But I think overall, it’s a more professional organization, hopefully in good ways.
Q: What should people know about your time at NSIDC?
When I was little, I wanted to be a “scientist” when I grew up. Even though I ended up with a job description that doesn’t include the word “scientist,” working at NSIDC allows me to support the science and feel like I’m making a small, positive impact on our understanding of the world.