Feature Story

On the shoulders of giants: Why include data citations in research?

By Laura Naranjo

When pursuing a new line of study, where does a researcher begin? Most start by poring through existing research, digging into books and journals and drilling down into the appended reference lists. References have long been a staple of scholarly literature, allowing researchers to cite the findings their work builds on. Google Scholar has even adopted the phrase “Stand on the shoulders of giants” on its home page, paraphrasing Isaac Newton.

But most research also involves data collection or analysis, and locating that supporting data can be another story. Unlike literature sources, data sources have not been commonly or thoroughly cited.

Giving data its due

Until recently, data citations often got short shrift. The rise of informatics, or the science of preserving data and making it accessible, promoted policies that ensure data are archived and protected. Gone are the days when data might be relegated to a hard drive in a basement somewhere. Researchers and their funding agencies now share a duty to preserve data, and citing the data and making them available are part of that responsibility. NSIDC has a long and successful tradition not only as a data archive, housing everything from glacier photographs to near-real-time satellite data, but also as a model for data citation. For more than ten years, the user guide that accompanies every data product has included a full citation. In addition, NSIDC offers guidelines for citing data on its Use and Copyright page.

Like citations for books and other resources, data citations give readers a way to locate the exact data used. A data citation includes a product title, the year the data were produced or updated, the author or data collector, the data distributor, and a link to or description of where to find the files. Because most of the data NSIDC offers are accessible online, in 2012 NSIDC also began including a digital object identifier (DOI). A DOI is a unique string of characters assigned to an individual item such as a journal article, publication, web page, or data set. DOIs are a stable, permanent way to locate the data: the DOI stays the same even if the data are moved. DOIs are part of each NSIDC data citation, and link to each product’s web page.
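As a rough illustration of how the citation elements listed above fit together, the short Python sketch below assembles them into a single reference string. It is not NSIDC's recommended format. The class name, field names, element order, and the sample author, title, distributor, and DOI are all made up for the example. The only outside fact it relies on is that a DOI can be turned into a link by appending it to the public resolver address https://doi.org/.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Citation elements named in the article; field names are illustrative."""
    author: str       # author or data collector
    year: int         # year the data were produced or updated
    title: str        # data product title
    distributor: str  # organization that distributes the data
    doi: str          # bare DOI string (placeholder value used below)

    def doi_url(self) -> str:
        # A DOI becomes a stable link when appended to the doi.org resolver.
        return f"https://doi.org/{self.doi}"

    def render(self) -> str:
        # Assumed layout: Author. Year. Title. Distributor. DOI link.
        parts = [self.author, str(self.year), self.title, self.distributor]
        text = " ".join(part.rstrip(".") + "." for part in parts)
        return f"{text} {self.doi_url()}"


# Hypothetical example values, not a real data set or a real DOI.
example = DataCitation(
    author="Doe, J.",
    year=2015,
    title="Example Snow Depth Data Set",
    distributor="Example Data Center",
    doi="10.1234/EXAMPLE",
)
print(example.render())
```

Because the link is built from the DOI rather than from a file location, the citation would stay valid even if the data moved to a different server.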

Screen capture of NSIDC data product page showing the recommended data citation
To help promote data citations, NSIDC includes a full citation as part of each data set’s documentation. — Credit: NSIDC

Measuring the impact

Citations are a goldmine of information for those engaged in similar research. The ability to quickly ferret out a resource makes it much less tedious to replicate a finding or understand how a researcher came to a particular conclusion. And the benefits extend beyond the research itself. Citing data also credits the data producer, making it easier to track how other researchers are using the data, and how often. In 2006, for instance, 106 studies cited data distributed by NSIDC. By 2015, that number had grown to 641, about six times as many, a testament not only to NSIDC’s impact on snow and ice research, but also to the increased use of data citations.

As data play a growing and ever more public role in research, citations become critical. Funding agencies increasingly request that data acquired during research be safely archived and made publicly available. Citing data helps funding agencies measure a data product’s reach, and can provide impetus to further fund a line of research. Being able to locate and understand the data is also critical to verifying or reproducing a researcher’s results.

Almost all research relies on those who have made previous discoveries. Future success hinges on making every one of those discoveries, including the underlying data, available to others. NSIDC retains teams of scientists, project leads, and data specialists who work with in-house technical writers to make sure data citations increasingly find their way onto reference lists, lending their weight to future research. If researchers must stand on the shoulders of others, including data citations gives them a leg up.

Chart showing increasing number of publications citing NSIDC-distributed data
NSIDC has included a citation in each data product’s documentation. Providing a citation makes it easier for researchers to include the data in their reference lists, and the number of citations of NSIDC-distributed data has grown over the past ten years. — Credit: NSIDC

For more information

NSIDC Use and Copyright: Citing NSIDC Scientific Data Sets