MetPy and Machine Learning: Unit Handling and Metadata Management Throughout the Machine Learning Lifecycle

Many researchers and professionals now use machine learning workflows that require raw arrays of data, stripped of any attached metadata. This course presents workflows and best practices for using MetPy before and after ML steps to create and maintain units, attributes, and other metadata across the full life cycle of ML workflows and analysis. The workshop also includes a short discussion period on ‘CF Conventions for ML’.

105th AMS Annual Meeting
New Orleans Ernest N. Morial Convention Center
January 11, 2025, 8:00 AM to 5:00 PM and January 12, 2025, 8:00 AM to 3:30 PM Central Time (in person)

Registration for this course will open in October.

Course Description:

Units, attributes, and metadata (CF Conventions and more) are important for all meteorological and atmospheric data analysis. This information is generally lost once data pass through standard Python machine learning frameworks (scikit-learn, Keras, TensorFlow, PyTorch). This course will explore how best to handle this metadata before and after machine learning workflows, and will also cover what metadata might be helpful to attach to ML outputs, e.g., ‘CF Conventions for ML’. The course will also showcase recent MetPy plotting improvements for visualizing ML outputs.

Participants will:

  • Learn and apply best practices for handling metadata, units, and other attributes across full-cycle machine learning workflows using MetPy.
  • Explore Python plotting techniques using the latest improvements to MetPy.
  • Troubleshoot common errors when attaching units and working with Xarray.
  • Discuss ‘CF for Machine Learning’ and which attributes to attach for reproducibility and interpretability.

This course is not an introduction to ML methods, nor does it teach which ML model is best suited to particular data or research questions. Previous experience with the scientific Python ecosystem for atmospheric science (e.g., MetPy, Xarray, and an ML or numerical modeling package of your choice) is highly encouraged, as the course does not introduce those tools either. Participants who are completely unfamiliar with Python and Xarray are not advised to register.


If you have questions regarding the course, please contact Thomas Martin.

Instructors:

  • Drew Camron, NSF Unidata
  • Max Grover, Argonne National Laboratory
  • Ryan May, NSF Unidata
  • Thomas Martin, NSF Unidata