Spotlight

Libre frees polar data

An Arctic researcher removes a snow core sample. — Credit: Andrew Slater

Knowledge is the common wealth of humanity. --Adama Samassekou, Convener of the UN World Summit on the Information Society

By Jane Beitler

Science is increasingly based on data, yet the systems and tools for sharing data lag behind the immediacy and variety of the data being produced. The rapid changes being observed in Earth’s frozen regions (thawing ground, retreating glaciers, melting ice sheets, and waning summer sea ice extent) are a case in point. Scientists who collaborated during the recent International Polar Year identified ease of data sharing as key to helping scientific understanding keep pace with the speed of the climate-induced changes being observed. A project at NSIDC called Libre puts data sharing tools in the hands of researchers: tools that are free and easy to use, and that address some of the common sticking points that prevent open sharing.

Simplifying data sharing

Libre is a project devoted to liberating science data from traditional constraints of publication, location, and findability. Leveraging open-source technology and data management standards, Libre's Web-based tools and services make it easy for scientists to publish and advertise their data and share it with the world.

Data sharing diagram
Libre provides simple tools for data sharing that can be used by individual investigators and research projects of any size.

A major challenge recognized throughout the Earth science data community is uniform discovery of all data relevant to a particular user's needs. In most cases, relevant data may be spread across any number of discipline-specific repositories, national data centers, organizational repositories, and libraries. Or the data may not reside in any repository at all, but with an individual researcher, laboratory, or work group, which can make it difficult for another investigator to find or obtain the data. Libre allows data providers, whether an individual investigator with a single data set or a data repository with potentially hundreds, to advertise their holdings in a Web-discoverable way. Registries can then find those advertisements wherever they are located and aggregate the ones relevant to their particular user communities.

Tools and services

Libre offers several tools to these ends. First, the Libre Collection Caster is a Web-based tool that creates Atom feeds so that data providers can advertise their data sets, and so that users and computers on the internet can discover the data more quickly and efficiently. Because the Collection Caster uses Atom technology, the resulting Collection Cast, or feed, can be viewed with any desktop or browser-based feed reader.

Once data are exposed to the Web, Libre’s OpenSearch Application Programming Interface (API) provides a simple way to discover and access data holdings that have been found by Libre’s Web crawling and aggregation services. The API can be tailored for queries specific to a user's research, and returns Atom feeds listing the search results. For example, the Libre OpenSearch API underlies the NASA DAAC IceBridge Portal developed by NSIDC. From the Portal, users can subscribe to a feed listing all the related data sets that match their data query and that are known to the NSIDC system. Once subscribed, users are notified whenever a new data set matching the query criteria becomes available, or one of the existing data sets on their personal feed is updated.

Finally, Libre makes it easy for data providers to communicate clearly to others about appropriate reuse of their data. Libre’s badging tool makes it simple to declare your data open for broad use, while asserting that such data should be used according to the Ethical Norms of Data Sharing developed by the polar science community. The tool gives the data provider a Creative Commons badge to display on their Web pages and in their data documentation, conveying their wishes in a way readily understood by both people and data systems.
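For readers curious about what advertising a data set in an Atom feed might look like in practice, here is a minimal Python sketch using only the standard library. It is not Libre's Collection Caster and does not reproduce Libre's actual feed layout. The function names (build_feed, list_entries), the sample data set, and the example.org addresses are all hypothetical, chosen only to show the general shape of an Atom feed and how a feed reader or registry could pull titles and links out of it.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"  # standard Atom namespace


def build_feed(feed_title, feed_id, author, updated, datasets):
    """Return an Atom feed (as a string) with one entry per data set.

    The choice of elements is an illustrative assumption, not Libre's schema.
    """
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = feed_title
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = updated
    author_el = ET.SubElement(feed, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    for ds in datasets:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = ds["title"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = ds["url"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = ds["updated"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=ds["url"], rel="alternate")
        ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = ds["summary"]
    return ET.tostring(feed, encoding="unicode")


def list_entries(feed_xml):
    """Read a feed the way a reader or registry might: (title, link) pairs."""
    ns = {"atom": ATOM_NS}
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall("atom:entry", ns):
        title = entry.findtext("atom:title", namespaces=ns)
        link = entry.find("atom:link", ns)
        results.append((title, link.get("href") if link is not None else None))
    return results


# Hypothetical data set used only for illustration.
SAMPLE_DATASETS = [
    {
        "title": "Snow core densities",
        "url": "https://example.org/data/snow-cores",
        "updated": "2012-01-15T00:00:00Z",
        "summary": "Density measurements from Arctic snow core samples.",
    }
]

if __name__ == "__main__":
    xml_text = build_feed(
        "Example investigator data sets",
        "https://example.org/data/feed",
        "Example Investigator",
        "2012-01-15T00:00:00Z",
        SAMPLE_DATASETS,
    )
    print(xml_text)
    for title, href in list_entries(xml_text):
        print(title, href)
```

Because the feed is ordinary Atom, the same few lines of parsing would work on a feed produced by any provider, which is the property that lets registries aggregate advertisements from many sources.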