Apr 25, 2013

Strategy for Civil Earth Observations - Data Management for Societal Benefit

The US National Science and Technology Council recently released a National Strategy for Civil Earth Observations. The goal of this strategy is to provide a framework for developing a more detailed plan that would enable "stable, continuous, and coordinated global Earth observation capabilities for the benefit of society."

The strategy establishes a framework for evaluating Earth-observing systems and their information products, organized around 12 societal benefit areas: agriculture and forestry, biodiversity, climate, disasters, ecosystems (terrestrial and freshwater), energy and mineral resources, human health, ocean and coastal resources, space weather, transportation, water resources, and weather, plus reference measurements. The production and dissemination of information products should be based on the following principles:

  • Full and open access
  • Timeliness
  • Non-discrimination
  • Minimum cost
  • Preservation
  • Information quality
  • Ease of use

Data management in federal agencies responsible for Earth science data is described in terms of three components of the data life cycle: planning and production, data management, and usage. The last two components are the main focus of the data management strategy. The suggested activities for them are:

  • Data management
    • Data collection and processing - initial steps to store data and create usable data records.
    • Quality control - follow the principles of the “Quality Assurance Framework for Earth Observation” (QA4EO)
    • Documentation - recording the basic information available at the time of data collection, such as the sensor system, location, and time.
    • Dissemination - data should be offered in formats that are known to work with a broad range of scientific or decision-support tools. Common vocabularies, semantics, and data models should be employed.
    • Cataloging - establishing formal standards-based catalog services, building thematic or agency-specific portals, enabling commercial search engines to index data holdings, and implementing emerging techniques such as feeds, self-advertising data, and casting.
    • Preservation and stewardship - guarantee the authenticity and quality of digital holdings over time.
    • Usage tracking - measuring whether the data are actually being used; to enable better usage tracking, data should be made available through application programming interfaces (APIs).
    • Final disposition - not all data and derived products need to be archived; widely accessible derived products may adequately replace the raw data and the processing algorithms.
  • Usage activities
    • Discovery - enabled by dissemination, cataloging and documentation activities.
    • Analysis - includes quick evaluations to assess the usefulness of a data set as well as full scientific analysis.
    • Product generation - creating new products by averaging, combining, differencing, interpolating, or assimilating data.
    • User feedback - mechanisms to provide feedback to improve usability and resolve data-related issues.
    • Citation - different data products, e.g., classifications, model runs, data subsets, etc., need to be citable.
    • Tagging - identify a data set as relevant to some event, phenomenon, purpose, program, or agency without needing to modify the original metadata.
    • Gap analysis - the determination by users that more data are needed, which influences the requirements-gathering for new data life cycles.
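
The tagging activity, labeling a data set as relevant to an event, program, or agency without modifying its original metadata, amounts to keeping annotations in a separate store that points at the catalog records. Below is a minimal Python sketch under that assumption; the class names DatasetRecord and TagStore, the data set identifiers, and the tag values are all hypothetical and are not taken from the strategy. Freezing the metadata dataclass makes the "do not modify the original metadata" rule explicit: tags accumulate only in the side store.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Original metadata as published by the provider (hypothetical fields, read-only)."""
    dataset_id: str
    title: str
    provider: str


class TagStore:
    """Stand-off tags kept apart from the original metadata records (hypothetical design)."""

    def __init__(self) -> None:
        # dataset_id -> set of (tag, tagged_by) pairs; who tagged is kept for provenance
        self._tags: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add_tag(self, dataset_id: str, tag: str, tagged_by: str) -> None:
        self._tags[dataset_id].add((tag, tagged_by))

    def tags_for(self, dataset_id: str) -> set[str]:
        # .get() avoids creating empty entries in the defaultdict
        return {tag for tag, _ in self._tags.get(dataset_id, set())}

    def datasets_with(self, tag: str) -> list[str]:
        return sorted(d for d, pairs in self._tags.items() if any(t == tag for t, _ in pairs))


# Usage: two hypothetical catalog entries tagged by different users
catalog = {
    "sst-daily": DatasetRecord("sst-daily", "Daily sea surface temperature", "Agency A"),
    "precip-hourly": DatasetRecord("precip-hourly", "Hourly precipitation", "Agency B"),
}
tags = TagStore()
tags.add_tag("sst-daily", "flood-event-2012", tagged_by="user1")
tags.add_tag("precip-hourly", "flood-event-2012", tagged_by="user2")
tags.add_tag("sst-daily", "climate-program-x", tagged_by="program-office")

print(tags.datasets_with("flood-event-2012"))  # ['precip-hourly', 'sst-daily']
print(catalog["sst-daily"].title)              # original metadata is untouched
```

Because each tag records who applied it, tags from different users, programs, or agencies can coexist and later be filtered by source without any change to the provider's records.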

Each activity raises a lot of questions and challenges. The activities of cataloging, usage tracking, final disposition, tagging and gap analysis are particularly interesting. They raise questions that are rarely addressed in the data management literature. Does anybody use data that are being shared? Do all the data need to be preserved? How can we avoid duplicates and unnecessary modifications of metadata if data are being re-used? To what extent do we need to serve immediate user interests versus the future possibilities for research?
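
The first question, whether shared data are used at all, is where the usage-tracking activity and the preference for API access meet: when every request passes through an API, each one can be logged. The Python sketch below assumes a hypothetical access log (the AccessEvent fields and the data set identifiers are invented for illustration) and derives two simple indicators: request counts and distinct clients per data set, and the catalog entries that nobody has requested.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """One logged API request for a data set (hypothetical log format)."""
    dataset_id: str
    client_id: str
    endpoint: str  # e.g. "subset" or "download"


def usage_summary(events: list[AccessEvent]) -> dict[str, tuple[int, int]]:
    """Return {dataset_id: (request_count, distinct_clients)}."""
    requests = Counter(e.dataset_id for e in events)
    clients: dict[str, set[str]] = defaultdict(set)
    for e in events:
        clients[e.dataset_id].add(e.client_id)
    return {d: (requests[d], len(clients[d])) for d in requests}


def unused(catalog_ids: list[str], events: list[AccessEvent]) -> list[str]:
    """Catalog entries that never appear in the access log."""
    seen = {e.dataset_id for e in events}
    return sorted(set(catalog_ids) - seen)


# Usage with a small hypothetical log
events = [
    AccessEvent("sst-daily", "c1", "subset"),
    AccessEvent("sst-daily", "c1", "download"),
    AccessEvent("sst-daily", "c2", "subset"),
    AccessEvent("precip-hourly", "c3", "download"),
]
catalog_ids = ["sst-daily", "precip-hourly", "soil-moisture-monthly"]

summary = usage_summary(events)
print(summary)                        # {'sst-daily': (3, 2), 'precip-hourly': (1, 1)}
print(unused(catalog_ids, events))    # ['soil-moisture-monthly']
```

Counting distinct clients alongside raw requests is a deliberate choice: a single automated client can otherwise inflate the numbers and hide the fact that few people actually use a data set.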