Data Archiving and Citation

(updated 1 August 2017)

Please refer to the AMS Dataset References page for guidelines and examples of how to reference and cite data according to AMS style.

1. Introduction

In a 2013 policy statement, the American Meteorological Society (AMS) affirmed its commitment to promoting full, open, and timely access to environmental data, associated metadata, and derived data products within the Earth system science community. As part of this commitment, "AMS expects that all scholarly papers published in its journals contain sufficiently detailed references to publicly available sources of information (literature and data) and methods such that independent research can test the paper’s scientific conclusions."

This expectation assumes that the data and metadata on which the conclusions rest are properly cited and readily available to the scientific community and the general public. At the time of submission, authors will be asked to confirm that their datasets are properly archived and cited, and editors handling peer review are encouraged to check whether this AMS expectation is being met.

In January 2015, AMS became a signatory to the statement of commitment developed by the Coalition on Publishing Data in the Earth and Space Sciences (COPDESS). This group connects Earth and space science publishers and data facilities to help translate the aspirations of open, available, and useful data from policy into practice. In addition to developing a statement of commitment, now signed by most leading publishers and repositories, COPDESS provides a directory of repositories for publishers and recommended best practices around data and identifiers. These best practices have been incorporated into these guidelines as much as possible.

2. Archiving data

AMS strongly encourages authors to archive their data in an established repository that follows best practices and can ensure the longevity and continued utility of datasets. Datasets provide the evidence used to justify the conclusions of a publication, and they may be generated by various means, including observation, computation, or experiment. Lists of repositories in the Earth and space sciences are available from COPDESS and the Registry of Research Data Repositories (re3data), and the American Geophysical Union (AGU) also maintains a list of suggested data centers for geoscience data. AMS discourages archiving data on personal servers and websites because they lack permanence.

Data should be archived if they underlie the scientific findings in a paper. If data centers are not available or appropriate for particular datasets, authors are encouraged to investigate other archiving options, including their local institutional library. AMS journals also allow data to be published as supplements to articles, although this option should be considered only after data centers and institutional libraries have been investigated. If none of these options is available, authors are expected to provide a transparent process for making the data available to anyone upon request.

AMS recognizes that in some instances sharing data may not be feasible. For example, sensitive data collected about human subjects should be shared only if stringent safeguards exist to ensure confidentiality and protect the identity of subjects. Authors are expected to comply with applicable institutional review board and funding agency policies and regulations when collecting human subject data. Any other limitations or restrictions on sharing data, such as proprietary or other legal restrictions, must be reported to the journal editor for consideration at the time of submission.

3. Citing data

AMS journals endorse the Joint Declaration of Data Citation Principles. Any dataset used to produce a scientific paper should be cited. Data citations should be listed in the reference section of the associated article and should enable readers to identify and locate the dataset(s) related to a given publication. Each citation should follow emerging practices and include as much of the following information as possible: dataset or software authors/producers; release date; title; version; archive/distributor; locator/identifier (DOI preferred, otherwise URL); and year. Examples of citations are available on the AMS Dataset References page.

Datasets or software should not be formally referenced or listed in the acknowledgments section. Instead, the acknowledgments should include a general statement indicating where the data are available and noting any issues regarding availability (e.g., all the data used are listed in the references or archived in xxx repositories). The Federation of Earth Science Information Partners (ESIP) has produced data citation recommendations for the Earth system sciences; please refer to its website for general guidance, examples, and information on each of the citation elements listed above.

Please also check with your data providers, because many request that users cite their data in particular ways. Datasets that are not curated or cannot reliably be made available to anyone who requests them should not be cited in AMS publications; instead, they should be noted through an in-text reference to unpublished data, as noted below.

a. Citing processed/derived data

Findings presented in scientific articles are often the product of multistage workflows that involve combining, extracting, processing, and deriving datasets. Information generated by numerical simulation models should also be regarded as derived data. In these cases, authors should cite any datasets obtained from external sources and, if possible, the final derived dataset(s), provided they are archived in a reliable location. The goal is to provide transparency and traceability for the results of computational processes and models. In some cases, it may be more appropriate to cite and provide access to the processing or model software than to the output data themselves. Questions about this should be discussed with the journal editors.

b. Citing papers that describe a dataset vs citing datasets themselves

The most common current “data citation” practice is to cite a published paper that describes a dataset or presents findings based on it. The problems with this approach are that 1) such papers may not link directly to the datasets, and 2) such papers may fall out of sync with the datasets, particularly as the datasets are updated or revised over time. The general recommendation is to cite both the paper and the datasets, because the two citations accomplish different things: the paper citation links to an important (but incomplete) source of information about the dataset, and the data citation links directly to the datasets and associated metadata.

c. Acknowledging datasets vs citing datasets

In the past, authors may have used the acknowledgments section to state the name and source of data used in a submitted paper. This is no longer considered the proper method of dataset citation, and authors are advised to cite datasets as outlined in this document. An acknowledgment may be included if the dataset provider explicitly requests one, but because acknowledgments often lack detailed information about the dataset, authors should also create a formal citation as outlined here. Authors who are unsure of the details needed for a dataset citation are encouraged to contact the dataset provider.