Jan 13, 2014

Identifier Test-bed Activities Report (ESIP Federation)

Below is a brief summary of a recent report to the ESIP Federation's Data Stewardship Committee that evaluated identifier schemes for Earth system science data and information (see also executive summary and links). The report appears to be a hands-on continuation of the 2011 paper "On the utility of identification schemes for digital earth science data: an assessment and recommendations" by Ruth Duerr and others (link).

The paper introduced four use cases and three assessment criteria:

Use cases:
  • unique identification (identify a piece of data, no matter which copy)
  • unique location (locate an authoritative copy)
  • citable location (identify cited data)
  • scientifically unique identification (to tell whether two data instances have the same info even if the formats are different)
Assessment criteria:
  • Technical value (e.g., scalability, interoperability, security, compatibility, technological viability)
  • User value (e.g., publishers' commitment, transparency)
  • Archive value (e.g., maintenance, cost, versatility)

The report took those use cases, expanded the assessment criteria, and used both to test implementations of nine identification schemes (DOI, ARK, UUID, XRI, OID, Handles, PURL, LSID, and URI/URN/URL) on two datasets: the Glacier Photo Collection from the National Snow and Ice Data Center (JPEG and TIFF images) and a numerical data set from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) sensor.

Report recommendations:

  • UUIDs are most appropriate as unique identifiers; any other use requires additional effort.
  • DOI, ARK and Handles are the most suitable as unique locators; DOI and ARK also support citable locators. Handles need a dedicated local server. ARKs are cheaper than the others, but DOIs are accepted by publishers.
  • PURL has no means for creating opaque identifiers and the API support for batch operations is poor.
  • The rest of the ID schemes are less suitable.
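
To make the first recommendation concrete, here is a minimal Python sketch of why a UUID works as a unique identifier but needs extra effort for anything else. It is my own illustration, not code from the report: the `registry` dictionary, the `register` and `resolve` helpers, and the example.org URL are all hypothetical. The point is that a UUID says nothing about where the data lives, so locating a copy requires a separate lookup service that somebody has to build and maintain.

```python
import uuid
from typing import Dict, Optional

# Hypothetical in-memory registry mapping identifiers to locations.
# A UUID carries no location information, so this mapping must live elsewhere.
registry: Dict[str, str] = {}


def mint_identifier() -> str:
    """Return a random (version 4) UUID as an opaque identifier string."""
    return str(uuid.uuid4())


def register(location: str) -> str:
    """Mint a UUID for a data item and record where a copy can be found."""
    identifier = mint_identifier()
    registry[identifier] = location
    return identifier


def resolve(identifier: str) -> Optional[str]:
    """Look up the recorded location, or None if the identifier is unknown."""
    return registry.get(identifier)


# Assumed example location; not a real dataset URL.
photo_id = register("https://example.org/glacier-photos/photo-001.tif")
photo_location = resolve(photo_id)
```

Minting the identifier is a one-liner; the registry and resolver are the "effort" part, and they are what schemes such as DOI, ARK and Handles provide as a service.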

The overall conclusion seems to be that DOI and ARK are generally the better choices, but that a system needs to support multiple ID schemes. From the report I couldn't quite tell whether any of the ID schemes can support the fourth use case - scientifically unique identification. The paper argued that "none of the identifier schemes assessed here even minimally address this use case".
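
To illustrate what the fourth use case asks for, here is a toy Python sketch of a content-based fingerprint that comes out the same for two copies of the same numbers stored in different formats. This is my own illustration under strong assumptions, not something proposed in the report or the paper: the helper names (`fingerprint`, `values_from_csv`, `values_from_json`) are hypothetical, the data is a flat list of numbers, and the canonical form is simply each value converted to a Python float and joined with commas.

```python
import csv
import hashlib
import io
import json


def fingerprint(values):
    """SHA-256 digest over a canonical text form of a sequence of numbers."""
    # Canonical form (an assumption): every value as a Python float repr,
    # joined by commas, so that 2 and 2.0 hash identically.
    canonical = ",".join(repr(float(v)) for v in values)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def values_from_csv(text):
    """Read all cells of a CSV document as floats, row by row."""
    return [float(cell) for row in csv.reader(io.StringIO(text)) for cell in row]


def values_from_json(text):
    """Read a JSON array of numbers as floats."""
    return [float(v) for v in json.loads(text)]


# Two copies of the same values in different formats.
csv_copy = "1.5,2.0,3.25\n"
json_copy = "[1.5, 2, 3.25]"

same_content = (
    fingerprint(values_from_csv(csv_copy))
    == fingerprint(values_from_json(json_copy))
)
```

Real Earth science data is much harder: precision, units, ordering, missing values, and metadata all affect whether two instances carry "the same info", which may be part of why none of the assessed schemes address this use case.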