The guidance presented here is designed to help authors make the data, software, and documentation supporting the research presented in AMS journals as open and accessible as possible to readers and users, in accordance with the FAIR (Findable, Accessible, Interoperable, and Reusable) Principles (Wilkinson et al. 2016). The guidance stems from the updated 2019 AMS policy statement “Full, Open, and Timely Access to Data” and the 2021 professional guidance statement “Software Preservation, Stewardship, and Reuse” (and the references therein), as well as the existing policy statement “Best Practices for Data Management.” Our data and software policies are designed to be flexible enough that no author is excluded from submitting to our journals, particularly because of limited resources. In general, data and software should be as open as possible and as closed as necessary. Please see the Background page for more information.
- Requirements for Authors
- About Archiving Data and Software
- Referencing and Citing Data and Software
- Availability Statement
1. Requirements for Authors
While AMS is committed to the FAIR principles cited above and on the Background page, our data and software policies are designed to be flexible enough that no author is excluded from submitting to our journals, particularly because of limited resources. Special circumstances should be discussed with the journal Editor and explained in the Availability Statement section of the manuscript. Where appropriate, authors should also consider relevant ethical principles concerning data, such as the CARE principles (Collective benefit, Authority to control, Responsibility, Ethics) for Indigenous data governance (Carroll et al. 2020).
Requirements for authors include the following:
- Confirm during initial submission that they are aware of the AMS data and software policies, including the expectation that datasets and software used or derived in the reported work are archived and cited/referenced properly.
- Archive core research outputs (data, software, samples, etc.) in valid FAIR-aligned repositories, if possible. This includes assigning and using persistent identifiers, such as DOIs, for as much of the relevant archived data, software, and documentation as possible. Please see the Data and Software Archiving Guidance section below for more information about identifying valid repositories. Note that article supplemental material should no longer be used as a primary archive for data and software, and “data and software available from the corresponding author” statements are to be avoided.
- In cases where archiving is not possible, the Availability Statement should explain why and describe what resources are available to help other researchers understand how the reported research was conducted.
- Include an Availability Statement section in the submitted manuscript immediately following the Acknowledgments section. The Availability Statement should describe where the data and software underlying the findings for the article are archived and documented, and how they can be accessed and reused. See the Availability Statement Examples section for more information.
- Include properly formatted citations to the deposited data and software mentioned in the Availability Statement, with corresponding entries in the References section. Formal citations are important for properly crediting the creation and use of data and software. See the Data and Software Reference and Citation Examples section for more information.
Editors and reviewers are asked during peer review to check that the above requirements have been adequately addressed through relevant citations, identifiers, and links, and/or other information provided in the Availability Statement. More information is available on the “Reviewer Guidelines for AMS Journals” information web page.
As laid out in the AMS data policy statement, the spectrum of what constitutes “data” is diverse and includes in situ and remotely sensed observations, environmental simulations or predictions generated by numerical models, and data products derived from integrations of observational and model-generated sources. Associated software and computer model codes should also be archived if possible. For more information please see the Models and Simulations section below (section 2c). Questions regarding these data policy guidelines should be directed to [email protected].
2. About Archiving Data and Software
In accordance with AMS’s commitment “to promoting full, open, and timely access to the environmental data, associated metadata, and derived data products that underlie scientific findings,” authors should make every attempt to archive core research outputs (data, software, etc.) in FAIR-aligned repositories, where possible and appropriate, following the Enabling FAIR Data principles. In general, supplemental material should no longer be used as a primary archive for data and software. Statements such as “data and software available from the corresponding author” or “data available upon request” are also to be avoided: they have been shown to be inefficient (Tedersoo et al. 2021), and they do not support the sharing of data and software that is important for open science.
a. Data and Software Archiving Guidance
When choosing where to archive data and software, consider the following options:
- Funder requirements. Funders may have specific repositories that must be used as a condition of the grant under which the work was performed.
- Domain repositories. A repository that specializes in data for your scientific domain is highly recommended, as it maximizes the likelihood that the deposited data will be findable, accessible, interoperable, reusable, and well supported.
- Institutional repositories. Many universities and research organizations provide local data management support, usually through the institutional library.
- High-performance computing centers. In research using computer models and simulations that generate and/or analyze high-volume data, the operations team at the center may have options and recommendations for data management, storage, and preservation.
- General repositories, such as Figshare, Dryad, Zenodo, Mendeley Data, and Open Science Framework, may be used if a domain repository or one of the other repositories above is unavailable or inappropriate. The generalist repository comparison chart can help researchers identify an appropriate generalist repository.
If none of the above options are available or appropriate, authors must provide a transparent and equitable process for accessing the supporting data.
A new tool available to help researchers find an appropriate repository is https://commons.datacite.org/. It builds on the earlier Repository Finder tool, originally developed by DataCite for the Enabling FAIR Data project. The tool draws on the content of re3data.org, a registry of repositories, to let authors search by topic, and it lists repositories that currently accept data supporting publications, including those that are certified and support the FAIR principles. More information about the Enabling FAIR Data guidelines is available on the project FAQ page.
Software. As stated in the AMS Software Preservation, Stewardship, and Reuse guidance statement, the following recommendations apply no matter how closed or protected the software is:
- Use a collaborative software development platform (e.g., GitHub or Bitbucket) to manage software code changes, and support public access capabilities when possible.
- Assign a clear version number to your software.
- Ensure that your software and its documentation are preserved in a trusted archival repository that assigns persistent identifiers such as DOIs (e.g., Zenodo or Figshare), so that the version used to support your research outcomes can be shared, or link to an archived snapshot of your software (e.g., in Software Heritage).
- Intellectual property (IP) concerns and restrictions exist for software in both public and private sectors. Assign a license that describes terms of software reuse and access. Check with your institution and/or sponsor for guidance on choosing an appropriate software license.
- If software cannot be publicly shared due to IP and/or licensing considerations, include a reference to a publication that describes the underlying logic and methods of the software source code, if possible.
AMS strongly prefers research data and software to be made available under open licenses that permit unrestricted and free reuse. AMS does not require transfer of copyright for research data or software.
While embargoes on data and software sharing are generally discouraged, there may be situations where they can be appropriately applied. These situations must be discussed with and approved by the Editor and included in the Availability Statement.
b. Social Science Data and Data Restrictions
AMS recognizes that social science data and other data involving humans may be subject to restrictions, up to and including being unavailable. Authors must comply with applicable institutional review board and funding agency policies and regulations when collecting human subject data.
Authors using data that are subject to restricted access or that are unavailable, such as those with proprietary or other legal restrictions, should provide an explanation to the journal editor at initial submission and in the Availability Statement section of the manuscript, which immediately follows the Acknowledgments section. More information about availability statements can be found in the section below and on the Availability Statement Examples page.
c. Models and Simulations
For articles that use dynamical models to produce simulations or predictions with large volumes of output data, the resources available for preserving and sharing these outputs may be limited. Authors may therefore be able to preserve and share only a smaller, selected subset of model outputs. A recommended resource for deciding which model outputs (e.g., simulation workflow outputs) to preserve and share is the rubric and guidance developed by the EarthCube Research Coordination Network (RCN) project “What About Model Data? Determining Best Practices for Preservation and Replicability,” and discussed in associated resources such as Mullendore et al. (2021) and Schuster et al. (2022).
Additionally, authors should preserve, share, and cite all elements of their simulation workflow, including the initialization and forcing data, preprocessing code, simulation configuration and codes, and postprocessing and data analysis codes, to support research replicability. Ideally, authors should preserve and share these workflow components through community-accepted, trusted repositories that provide persistent identifiers (e.g., DOIs) and metadata describing the workflow components using community standards. In some cases, selected components of the simulation workflow, including model codes and initialization and forcing data, may be provided by third parties; please cite those third-party resources in such cases. If a published paper contains a complete description of a model or complementary software used in simulation research, please cite that paper as well. Citations should accurately identify the authors/creators of a model and complementary software.
3. Referencing and Citing Data and Software
Authors should cite and link to the data and software in the article, following the general guidelines below, which are derived from FORCE11’s Joint Declaration of Data Citation Principles and the Earth Science Information Partners (ESIP) Guidelines. Citations should use the unique, resolvable, and persistent identifiers provided by the repository in which the data are archived. [FORCE11 (The Future of Research Communications and e-Scholarship) is an international coalition of researchers, librarians, publishers, and research funders working to reform or enhance the research publishing and communication system. ESIP is a community of data and information technology practitioners who come together to coordinate Earth science interoperability efforts.]
Specific examples can be viewed on the Data and Software Reference and Citation Examples page.
Citations should appear in the body of the article with a corresponding entry in the reference list. Citations should include persistent identifiers in well-formed references to data and software so that they can be accurately tracked. Software used in the research should also be cited following the FORCE11 Software Citation Principles, which similarly recommend depositing the software in an archival repository and including the repository-provided persistent identifier in the citation and reference.
Specific citation guidance can also be found in Katz et al. (2021).
The guidance in the following paragraphs applies to both data and software.
Citing dataset and software references in text
The in-text citations for dataset and software references should be formatted the same as other publication types, using the author’s name and year of publication [e.g., “dataset produced by Smith (2018),” or “as shown by an earlier dataset (Smith 2018)”].
When authors consist of organizations with lengthy names, abbreviate the author names appropriately [e.g., use “(NCEP 2005)” instead of “(National Centers for Environmental Prediction 2005)”]. If the citation is for a reference with two authors, use both author names [e.g., “Yeager and Large (2008)”]. References with three or more authors are always cited as the first author’s name followed by “et al.” [e.g., “Lawrimore et al. (2011)”].
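As an illustration only, the author-count rules above can be expressed as a small Python function. The function name `in_text_citation` and its parameters are hypothetical (AMS does not provide such a tool), and the sketch assumes the caller passes surnames or already-abbreviated organization names such as “NCEP”.

```python
from typing import Sequence, Union


def in_text_citation(authors: Sequence[str], year: Union[int, str], parenthetical: bool = False) -> str:
    """Format an author-year in-text citation (hypothetical helper).

    authors: surnames, or abbreviated organization names such as "NCEP".
    parenthetical: True gives "(Smith 2018)", False gives "Smith (2018)".
    """
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 1:
        names = authors[0]                        # one author: name only
    elif len(authors) == 2:
        names = f"{authors[0]} and {authors[1]}"  # two authors: both names
    else:
        names = f"{authors[0]} et al."            # three or more: first author + "et al."
    return f"({names} {year})" if parenthetical else f"{names} ({year})"


if __name__ == "__main__":
    print(in_text_citation(["Smith"], 2018))                     # Smith (2018)
    print(in_text_citation(["NCEP"], 2005, parenthetical=True))  # (NCEP 2005)
    print(in_text_citation(["Yeager", "Large"], 2008))           # Yeager and Large (2008)
```

Only the first author's name is ever used for three or more authors, so the full author list matters only for the reference list entry, not for the in-text form.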
Unpublished or inaccessible data and software
Data and software that are not curated or cannot be reliably made available should not be included in the references and should be cited directly in the text as “unpublished data/software,” giving the names of the person(s) who provided the data or software and the year in which it was provided. If the unpublished data and software are the authors’ own, the authors’ names should be listed with the year of dataset/software creation: J. Weatherly (2017, unpublished data/software); (J. Weatherly 2017, unpublished data/software).
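A similar hypothetical helper can produce the two “unpublished” forms shown above. The function name and the allowed `kind` values are assumptions made for this sketch, not part of AMS style guidance.

```python
ALLOWED_KINDS = ("data", "software", "data/software")


def unpublished_citation(name: str, year: int, kind: str = "data", parenthetical: bool = False) -> str:
    """Cite uncurated or unavailable data/software directly in the text (hypothetical helper).

    name: person(s) who provided or created the data/software, e.g. "J. Weatherly".
    year: year provided (or year created, for the authors' own data/software).
    """
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"kind must be one of {ALLOWED_KINDS}")
    label = f"unpublished {kind}"
    if parenthetical:
        return f"({name} {year}, {label})"   # e.g. (J. Weatherly 2017, unpublished data)
    return f"{name} ({year}, {label})"       # e.g. J. Weatherly (2017, unpublished data)
```

Because such sources are not curated, no entry is added to the reference list; the in-text string is the entire citation.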
Citing processed/derived data
Findings presented in scientific articles are often the product of multistage workflows that involve combining, extracting, processing, and deriving datasets. Output generated by numerical simulation models should also be regarded as derived data. In these cases, authors should cite any dataset(s) obtained from external sources and, if possible, the final derived dataset(s) if they are archived in a FAIR-aligned community repository. The goal is to provide transparency and traceability for the results of computational processes and models. In some cases, as described in the Models and Simulations section, it may be more appropriate to provide citation of and access to the processing or model software and configurations than to the output data themselves. Questions about these cases should be discussed with the journal editors.
Citing papers that describe a dataset/software vs citing a dataset/software
Avoid citing only the published paper that describes a dataset/software or presents findings that are based on a dataset/software. Such papers may not link directly to the dataset/software and/or might be out of synchronization with the dataset/software, particularly when a dataset/software is updated or revised. It is best practice to cite both the descriptive paper and the dataset/software: the paper citation links to an important (but incomplete) source of information about the dataset/software, and the data/software citation links directly to the dataset/software and associated metadata.
Acknowledging a dataset/software vs citing a dataset/software
The Acknowledgments section may include a brief statement if explicitly requested by the dataset/software provider, but the author must also create a formal citation so that detailed information about the dataset/software is included. Authors who are unsure of the details needed for a dataset/software citation should contact the dataset/software provider for the specific information.
a. Data and Software Reference and Citation Examples
Refer to the AMS Data and Software Reference and Citation Examples page for specific examples of how to reference and cite data and software according to AMS style. References should include as much of the following information as possible: dataset or software authors/producers; release date; title; version; archive/distributor; locator/identifier (a persistent identifier such as a DOI is preferred); and year.
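The sketch below assembles a reference from the components listed above and turns a DOI into a resolvable https://doi.org/ link. It is not an official formatter: the class and function names are hypothetical, the punctuation is an assumption loosely modeled on author-year reference style, and the release year is used once rather than repeated. The Data and Software Reference and Citation Examples page remains the authoritative source for AMS formatting.

```python
from dataclasses import dataclass
from typing import Optional

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def doi_url(identifier: str) -> str:
    """Return a resolvable https://doi.org/ link for a DOI; pass other identifiers through unchanged."""
    ident = identifier.strip()
    for prefix in DOI_PREFIXES:
        if ident.lower().startswith(prefix):
            ident = ident[len(prefix):]
            break
    # DOI names begin with the "10." directory indicator.
    return "https://doi.org/" + ident if ident.startswith("10.") else identifier.strip()


@dataclass
class DataReference:
    """Hypothetical container for the reference components listed in the guidance."""
    authors: str                   # authors/producers, already formatted (e.g., "Smith, J.")
    year: str                      # release date/year
    title: str
    distributor: str               # archive or distributor (e.g., repository name)
    identifier: str                # locator; a DOI is preferred
    version: Optional[str] = None  # omitted from the output when None


def format_reference(ref: DataReference) -> str:
    """Join the components in the listed order; the punctuation is an assumption."""
    head = f"{ref.authors}, {ref.year}: {ref.title}."
    tail = [f"Version {ref.version}"] if ref.version else []
    tail += [ref.distributor, doi_url(ref.identifier)]
    return f"{head} {', '.join(tail)}."
```

A bare DOI, a `doi:` form, and a full https://doi.org/ link all yield the same resolvable link, which keeps references well formed for tracking; identifiers that are not DOIs are passed through unchanged.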
4. Availability Statement
Authors are expected to include a separate Availability Statement section in their manuscript, immediately following the Acknowledgments section, that describes the data, software, and other research objects (e.g., notebooks) from the reported work and how they can be accessed (listing specific restrictions, if any).
The Statement should include:
- A brief description of the types of data or software
- Repository name(s) where they are deposited
- Version (for software)
- DOI or other persistent identifier link to the data or software
- Link to a publicly accessible collaborative development platform (for software; e.g., GitHub)
- Access conditions
- Licensing/permissions (e.g., Creative Commons Attribution)
- In-text citation with a corresponding entry in the References, if possible
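Authors may find it useful to check a draft statement against the list above before submission. The following Python sketch is a hypothetical drafting aid, not an AMS submission tool: the field names are assumptions, and the software-only items are checked only when the statement covers software. A flagged item may still be legitimately absent, provided the restriction is explained in the statement.

```python
from typing import Dict, List

# Items from the Availability Statement list; the dictionary keys are hypothetical field names.
COMMON_ITEMS = {
    "description": "brief description of the data or software",
    "repository": "repository name(s)",
    "identifier": "DOI or other persistent identifier link",
    "access": "access conditions",
    "license": "licensing/permissions",
    "citation": "in-text citation with a reference entry",
}
SOFTWARE_ITEMS = {
    "version": "software version",
    "dev_platform": "link to a public collaborative development platform",
}


def missing_items(statement: Dict[str, str], includes_software: bool) -> List[str]:
    """Return labels of listed items that are absent or blank in a draft statement."""
    required = dict(COMMON_ITEMS)
    if includes_software:
        required.update(SOFTWARE_ITEMS)
    return [label for key, label in required.items()
            if not statement.get(key, "").strip()]
```

For example, a draft that covers only data and leaves the license field blank returns `["licensing/permissions"]`.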
In special cases where access to the above research objects is restricted, authors are required to describe these restrictions in the Availability Statement. In short, authors should provide unrestricted access to all data, software, and materials underlying the reported findings to the greatest extent possible, wherever ethical or legal constraints do not apply.
In other cases, where the data or model output cannot be archived because of their size or nature, the Availability Statement should point to the available documentation and other resources so that it presents a transparent roadmap for replicating the work.
a. Availability Statement Examples
See the Availability Statement Examples page for more details.