Data Archiving and Citation

(updated 23 June 2016)

1. Introduction

In a 2013 policy statement, the American Meteorological Society (AMS) affirmed its commitment to promoting full, open, and timely access to environmental data, associated metadata, and derived data products within the Earth system science community. As part of this commitment, AMS expects that all scholarly papers published in its journals contain sufficiently detailed references to publicly available sources of information (literature and data) and methods such that independent research can test the paper’s scientific conclusions. This expectation assumes that the data and metadata upon which the conclusions rest are properly cited and are readily available to the scientific community and the general public. At the time of submission, authors will be queried as to whether their datasets are archived and cited/referenced, and peer-review editors are encouraged to check whether this AMS expectation is being met.

2. Archiving data

AMS encourages authors to archive their data in a repository that can ensure the longevity and continued utility of datasets. Datasets provide the evidence used to justify the conclusions of the publication. Data may be generated by various means, including observation, computation, or experiment. AMS discourages the archiving of data on personal servers and websites because of their lack of permanence.

Data should be archived if they underlie the scientific findings in a paper. The best option is to archive data with an established data center. Many data repositories exist for different kinds of data. The American Geophysical Union (AGU) maintains a list of suggested data centers for geoscience data. A larger and more general list of data repositories is hosted at the Registry of Research Data Repositories (re3data). If data centers are not available or appropriate for particular datasets, authors are encouraged to investigate other data archiving options, including their local institutional library. AMS journals also allow data to be published as supplements to articles, although this option should be considered only after data centers and university libraries are investigated. If none of these options are available, authors are expected to provide a transparent process to make the data available to anyone upon request.

AMS recognizes that in some instances sharing data may not be feasible. For example, sensitive data collected about human subjects should be shared only if stringent safeguards exist to ensure confidentiality and protect the identity of subjects. Authors are expected to comply with applicable institutional review board and funding agency policies and regulations when collecting human subject data. Any other limitations or restrictions on sharing data, such as proprietary or other legal restrictions, must be reported to the journal editor for consideration at the time of submission.

3. Creating data citations

Citations to datasets should be provided for any datasets used to produce a scientific paper. Data citations should be listed in the reference list of the associated article. Data citations should enable readers to identify and find the dataset(s) related to a given publication. The Federation of Earth Science Information Partners (ESIP) has produced data citation recommendations for the Earth system sciences. Please refer to their website for general guidance and for information on each individual citation element listed below. Please also check with your data providers, because many data providers request that data users cite their data in particular ways. Datasets that are not curated or cannot be reliably made available to anybody requesting data should not be cited in AMS publications but should be noted through an in-text reference to unpublished data, as noted below.

a. In-text data citation style format

In-text citations to datasets should be formatted in the same way as citations to other publication types, using the author’s name and year of publication [e.g., “dataset produced by Knutti (2014),” or “as shown by an earlier dataset (Knutti 2014)”]. When dataset authors are organizations with lengthy names, abbreviate the author names appropriately [e.g., use “(NCEP 2005)” instead of “(National Centers for Environmental Prediction 2005)”].

If the citation is for a reference with two authors, use both author names [e.g., “Yeager and Large (2008)”]. References with three or more authors are always cited as the first author’s name followed by “et al.” [e.g., “Lawrimore et al. (2011)”].

Data that are not curated or cannot be reliably made available upon request should be referenced as “unpublished data,” giving the name(s) of the person(s) who provided the data [e.g., “(V. Ferrera 2000, unpublished data)” or “V. Ferrera (2000, unpublished data)”]. In this example, the year is when the paper author obtained the data from the cited person(s). If the unpublished data are the authors’ own data, the authors’ names should be listed with the year of dataset creation.
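The in-text rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not an AMS tool: the function name `in_text_citation` and its parameters are hypothetical, and the caller is assumed to supply surnames (or an abbreviated organization name) already in the desired form.

```python
def in_text_citation(names, year, narrative=False, unpublished=False):
    """Build an in-text data citation following the rules described above.

    names       -- list of author surnames, or a single abbreviated
                   organization name (hypothetical input convention)
    year        -- year of publication (or year obtained, if unpublished)
    narrative   -- True for "Knutti (2014)", False for "(Knutti 2014)"
    unpublished -- True appends ", unpublished data" after the year
    """
    if not names:
        raise ValueError("at least one author or organization is required")

    if len(names) == 1:
        label = names[0]
    elif len(names) == 2:
        # Two authors: use both names.
        label = f"{names[0]} and {names[1]}"
    else:
        # Three or more authors: first author followed by "et al."
        label = f"{names[0]} et al."

    suffix = ", unpublished data" if unpublished else ""
    if narrative:
        return f"{label} ({year}{suffix})"
    return f"({label} {year}{suffix})"


if __name__ == "__main__":
    print(in_text_citation(["Knutti"], 2014))
    print(in_text_citation(["Yeager", "Large"], 2008, narrative=True))
    print(in_text_citation(["Lawrimore", "Menne", "Gleason"], 2011, narrative=True))
    print(in_text_citation(["V. Ferrera"], 2000, unpublished=True))
```

The sketch does not abbreviate organization names automatically; choosing an appropriate abbreviation such as “NCEP” remains the author’s decision.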

b. Reference list data citation style format

The basic format for datasets is the following: Dataset authors/producers, data release date: Dataset title, version. Data archive/distributor, access date in standard AMS format, data locator/identifier (doi or URL).

Other citation elements should be used where applicable, such as dataset editors, subsets used, and the data archive or distributor physical location. Reference examples are given below.
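As an illustration of the basic format, the following Python sketch assembles a reference-list entry from its elements. It is a hedged example under stated assumptions rather than an official formatter: the class name `DatasetReference` and its field names are hypothetical, the author string is assumed to be preformatted, and optional elements such as editors or subsets are not handled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Month names are spelled out explicitly so the output does not
# depend on the system locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


@dataclass
class DatasetReference:
    """Elements of the basic dataset reference format (hypothetical field names)."""
    authors: str           # preformatted, e.g. "Knutti, R."
    release_year: int      # data release date (year)
    title: str             # dataset title
    archive: str           # data archive/distributor
    accessed: date         # access date
    locator: str           # e.g. "doi:10.1594/WDCC/ETHR8" or a URL
    version: Optional[str] = None

    def format(self) -> str:
        # Append the version to the title only when one is given.
        title = self.title
        if self.version is not None:
            title = f"{title}, version {self.version}"
        # Access date in day-month-year order, e.g. "14 October 2014".
        accessed = (
            f"{self.accessed.day} {MONTHS[self.accessed.month - 1]} "
            f"{self.accessed.year}"
        )
        return (
            f"{self.authors}, {self.release_year}: {title}. "
            f"{self.archive}, accessed {accessed}, {self.locator}."
        )


if __name__ == "__main__":
    ref = DatasetReference(
        authors="Knutti, R.",
        release_year=2014,
        title="IPCC Working Group I AR5 snapshot: The rcp85 experiment",
        archive="DKRZ World Data Center for Climate",
        accessed=date(2014, 10, 14),
        locator="doi:10.1594/WDCC/ETHR8",
    )
    print(ref.format())
```

Keeping each citation element as a separate field makes it easier to confirm with the data provider that nothing required (release date, version, identifier) has been omitted before the string is assembled.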

c. Citing processed/derived data

Findings presented in scientific articles are often the product of multistage workflows that involve combining, extracting, processing, and deriving datasets. Information generated by numerical simulation models should also be regarded as derived data. In these cases, data citations should point to any datasets obtained from an external source and, if possible, to the final derived dataset(s), provided they are archived in a reliable location. The goal is to provide transparency and traceability for the results of computational processes and models. In particular cases, it may be more appropriate to provide citation and access to processing or model software than to the output data themselves. Questions about this should be discussed with the journal editors.

d. Citing papers that describe a dataset vs citing datasets themselves

The most common current “data citation” practice is to cite a published paper that describes a dataset or presents findings that are based on a dataset. The problems with this approach are 1) such papers may not link directly to the datasets themselves and 2) such papers might be out of synchronization with the datasets themselves, particularly as the datasets may be updated or revised over time. The general recommendation is to cite both the paper and the datasets themselves, because the two citations accomplish different things. The paper citation links to an important (but incomplete) source of information about the dataset, and the data citation links directly to the datasets and associated metadata.

e. Acknowledging datasets vs citing datasets

Authors may have previously used the acknowledgement section to state the name and source of data used in a submitted paper. This is no longer considered to be the proper method of dataset citation, and authors are advised to cite datasets as outlined in this document. An acknowledgement may be included if explicitly requested by the dataset provider, but because acknowledgements often do not contain detailed information about the dataset, the recommendation is to also create a formal citation as outlined here. If authors are unsure of the details needed for a dataset citation, they are encouraged to contact the dataset provider for the specific information.

f. Dataset reference examples

Cavalieri, D. J., C. L. Parkinson, P. Gloersen, and H. J. Zwally, 1996: Sea ice concentrations from Nimbus-7 SMMR and DMSP SSM/I-SSMIS passive microwave data, version 1. Subset used: Northern Hemisphere daily data (updated yearly), NASA National Snow and Ice Data Center Distributed Active Archive Center, accessed 10 February 2016, doi:10.5067/8GQ8LZQVL0VL.

CERES Science Team, 2015: CERES SYN1deg, version Edition 3A. Subset: Daily, March 2000–May 2013, NASA Atmospheric Science Data Center, accessed 11 February 2016, doi:10.5067/Terra+Aqua/CERES/SYN1degDAY_L3.003A.

CloudSat, 2007: 2B-CLDCLASS-LIDAR P_R04. Subset: 2007–2010, CloudSat Data Processing Center, accessed 19 February 2016. [Available online at]

Comiso, J. C., 2000: Bootstrap sea ice concentrations from Nimbus-7 SMMR and DMSP SSM/I-SSMIS, version 2. Subset used: Northern Hemisphere daily data (updated yearly), National Snow and Ice Data Center Distributed Active Archive Center, accessed 10 February 2016, doi:10.5067/J6JQLS9EJ5HU.

Dutton, E. G., D. Halliwell, A. Herber, M. Maturilli, and V. Kustov, 2014: Basic measurements of radiation from the Baseline Surface Radiation Network (BSRN) of five stations in the years 1993 to 2013 for the December, January, and February seasons, reference list of 142 datasets. PANGAEA, accessed 1 June 2015, doi:10.1594/PANGAEA.150003.

ECMWF, 2009: ERA-Interim project. National Center for Atmospheric Research Computational and Information Systems Laboratory Research Data Archive, accessed 30 March 2014, doi:10.5065/D6CR5RD9.

Eicken, H., R. Gradinger, T. Heinrichs, M. Johnson, A. Lovecraft, and M. Kaufman, 2012: SMMR and SSM/I derived dates of Arctic sea ice surface melt/freeze. UCAR/NCAR–CISL–ACADIS, doi:10.5065/D6KW5CXQ.

GPM Science Team, 2014: GPM GMI Level 1C Common Calibrated Brightness Temperatures Collocated, version 03. NASA Goddard Earth Science Data and Information Services Center, accessed 10 February 2016. [Available online at]

Hopkins, R., T. Addis, and the Puma Ocean Racing Team, 2015: Sea surface temperature (SST) and surface current data collected from the Mar Mostro during the around-the-world Volvo Ocean Race (VOR) from 2011-11-05 to 2012-07-12, version 1.1. NCEI Accession 0130694, NOAA/National Centers for Environmental Information, accessed 18 August 2015. [Available online at]

Iguchi, T., and R. Meneghini, 2014: GPM DPR Level 2A DPR Environment, V03 (GPM 2ADPR), version 03. NASA Goddard Earth Science Data and Information Services Center, accessed 8 April 2015. [Available online at]

Knutti, R., 2014: IPCC Working Group I AR5 snapshot: The rcp85 experiment. DKRZ World Data Center for Climate, accessed 14 October 2014, doi:10.1594/WDCC/ETHR8.

Lawrimore, J. H., M. J. Menne, B. E. Gleason, C. N. Williams, D. B. Wuertz, R. S. Vose, and J. Rennie, 2011: Global Historical Climatology Network–Monthly (GHCN-M), version 3. NOAA National Climatic Data Center, accessed 14 October 2014, doi:10.7289/V5X34VDR.

Mocko, D., 2013: NASA/GSFC/HSL, NLDAS Forcing Data L4 Monthly Climatology 0.125 x 0.125 degree, version 001. Goddard Earth Sciences Data and Information Services Center, accessed 10 February 2016, doi:10.5067/AAX6TSE317FP.

NOAA/NCEP, 1995: NCEP/NCAR Global Reanalysis 8-day Forecast Products. NCAR Computational and Information Systems Laboratory Research Data Archive, accessed 19 February 2016. [Available online at]

NOAA/NCDC, 2013: VIIRS Climate Raw Data Record (C-RDR) from Suomi NPP, version 1. Subset used: October 2007–September 2008, NOAA/National Climatic Data Center, accessed 14 October 2014, doi:10.7289/V57P8W90.

NOAA/NCEP, 2000: NCEP FNL Operational Model Global Tropospheric Analyses, continuing from July 1999 (updated daily). NCAR Computational and Information Systems Laboratory Research Data Archive, accessed 14 October 2014, doi:10.5065/D6M043C6.

NOAA/OSPO, 2014: GHRSST Level 2P Global Skin Sea Surface Temperature from the Visible Infrared Imaging Radiometer Suite (VIIRS) on the Suomi NPP satellite created by the NOAA Advanced Clear-Sky Processor for Ocean (ACSPO). Version 2.3. PO.DAAC, accessed 11 February 2016. [Available online at]

OWLES, 2014: NCEP Global Forecast System Model. Subset used: 0600:00 UTC 29 January 2014, University Corporation for Atmospheric Research Earth Observing Laboratory, accessed 10 February 2016. [Available online at]

Perovich, D. K., J. A. Richter-Menge, B. Elder, T. Arbetter, K. Claffey, and C. Polashenski, 2013: Mass Balance: Buoy 2015F. Subset used: 13 August 2015–present, Cold Regions Research and Engineering Laboratory, accessed 10 February 2016. [Available online at]

Saha, S., and Coauthors, 2010: NCEP Climate Forecast System Reanalysis (CFSR) Selected Hourly Time-Series Products, January 1979 to December 2010. NCAR Computational and Information Systems Laboratory Research Data Archive, accessed 11 February 2016, doi:10.5065/D6513W89.

Water and Atmospheric Resources Monitoring Program, 1998: Illinois Climate Network. Prairie Research Institute Illinois State Water Survey, accessed 14 October 2014, doi:10.13012/J8MW2F2Q.