DIKW: Data, Information, Knowledge, Wisdom: Identifier Test-bed Activities Report (ESIP Federation)

Below is a brief summary of a recent report to the ESIP Federation's Data Stewardship Committee that evaluated identifier schemes for Earth system science data and information (see also executive summary and links). The report seems to be a hands-on continuation of the 2011 paper "On the utility of identification schemes for digital earth science data: an assessment and recommendations" by Ruth Duerr and others (link).
The paper introduced four use cases and three assessment criteria:

Use cases:

unique identification (identify a piece of data, no matter which copy)

unique location (locate an authoritative copy)

citable location (identify cited data)

scientifically unique identification (to tell whether two data instances have the same info even if the formats are different)

Assessment criteria:

Technical value (e.g., scalability, interoperability, security, compatibility, technological viability)

User value (e.g., publishers' commitment, transparency)

Archive value (e.g., maintenance, cost, versatility)

The report took those use cases, expanded the assessment criteria, and used them to test the implementation of nine identification schemes (DOI, ARK, UUID, XRI, OID, Handles, PURL, LSID, and URI/URN/URL) on two datasets: the Glacier Photo Collection from the National Snow and Ice Data Center (JPEG and TIFF images) and a numerical data set from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) sensor.
Report recommendations:

UUIDs are most appropriate as unique identifiers; any other use requires effort.
DOI, ARK and Handles are the most suitable as unique locators, DOI and ARK also support citable locators. Handles need a local dedicated server. ARKs are cheaper than others, but DOIs are accepted by publishers.
PURL has no means for creating opaque identifiers and the API support for batch operations is poor.
The rest of the ID schemes are less suitable.
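
To make the first recommendation concrete, here is a minimal sketch of UUIDs used as unique identifiers, written with Python's standard uuid module. It is my own illustration, not code from the report; the archive URL and the file name are hypothetical. It also hints at why "any other use requires effort": a UUID says nothing about where the data lives, so the archive has to keep its own lookup table.

```python
import uuid

# Random (version 4) UUID: opaque and needs no registry,
# but it carries no information about where the data lives.
file_id = uuid.uuid4()

# Name-based (version 5) UUID: the same namespace and name always
# produce the same identifier. The archive URL and the file name
# are hypothetical, used only for illustration.
ARCHIVE_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://archive.example.org/")
photo_id = uuid.uuid5(ARCHIVE_NS, "glacier_photo_0001.tif")

# Using a UUID as a locator means maintaining a mapping like this yourself.
locations = {
    photo_id: "https://archive.example.org/photos/glacier_photo_0001.tif",
}

print(file_id)
print(photo_id, "->", locations[photo_id])
```

The version 5 variant is handy when the same file may be registered twice, since both registrations end up with the same identifier.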

The overall conclusion seems to be that DOI and ARK are generally better, but that a system needs to support multiple ID schemes. From the report I didn't quite get whether any of the ID schemes can support the fourth use case, scientifically unique identification. The paper argued that "none of the identifier schemes assessed here even minimally address this use case".
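
For a feel of what the fourth use case asks for, here is a rough sketch of one possible approach: fingerprint the values rather than the file. This is not something the report or the paper proposes, and none of the assessed schemes work this way. The function names, the tiny sample data, and the choice of six decimal digits of precision are all assumptions of mine.

```python
import csv
import hashlib
import io
import json


def content_fingerprint(rows):
    """Hash numeric rows in a canonical text form, independent of file format.

    Assumption: values are rounded to 6 decimal digits in scientific notation.
    """
    canonical = [[format(float(v), ".6e") for v in row] for row in rows]
    payload = json.dumps(canonical, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def rows_from_csv(text):
    # Each non-empty CSV line becomes a list of strings.
    return [row for row in csv.reader(io.StringIO(text)) if row]


def rows_from_json(text):
    # Expects a JSON array of arrays of numbers.
    return json.loads(text)


# The same four numbers, stored in two different formats (made-up sample data).
csv_copy = "1.0,2.50\n3,4e0\n"
json_copy = "[[1, 2.5], [3.0, 4]]"

same = (
    content_fingerprint(rows_from_csv(csv_copy))
    == content_fingerprint(rows_from_json(json_copy))
)
print("same scientific content:", same)
```

A real implementation would also have to settle on units, ordering, missing values, and metadata, which is probably why this use case is so hard for a general-purpose identifier scheme.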

Jan 13, 2014
