Machine Learning in Python for Environmental Science Problems

The AMS Machine Learning in Python for Environmental Science Problems Short Course is an introductory course for researchers who want to learn how methods from machine learning (ML) and data science can be applied to environmental research questions. Participants will work with real-world data and develop ML pipelines in Python using Jupyter notebooks.

This course will have both a beginner track and an intermediate track. The beginner track will provide an introduction to machine learning, covering supervised and unsupervised learning along with an introduction to deep learning. Participants will be guided through example code and notebooks so they can try out machine learning methods for themselves. They will become familiar with the ML pipeline, starting with an investigation of the dataset and its features. Then, participants will learn how to configure and train models for tabular and image-based datasets. Finally, participants will learn techniques for evaluating and comparing models to select the one that fits their needs.

The intermediate track will assume that the participant is familiar with the basic ML pipeline and has some experience with model development. Topics will include automated hyperparameter tuning and statistical testing of model performance. Environmental science modelers are increasingly using explainable AI (XAI) techniques to debug their models and to see whether they can extract scientific insight from what the model has learned. However, there are many pitfalls in doing so, especially with complex models. In this course, participants will learn about some of the pitfalls that can cause XAI methods to give misleading explanations, and some techniques to mitigate these issues.
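
As a rough illustration of what statistical testing of model performance can look like, the sketch below runs a paired sign-flip permutation test on the per-sample errors of two models. It is not taken from the course materials: the function name `paired_permutation_test`, its parameters, and the choice of this particular test are assumptions made for the example.

```python
import random


def paired_permutation_test(errors_a, errors_b, n_permutations=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-sample errors.

    Hypothetical helper for illustration: errors_a[i] and errors_b[i] are the
    errors of two models on the same test sample i.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("error lists must have the same length")
    if not errors_a:
        raise ValueError("error lists must not be empty")

    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs) / len(diffs))

    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the two models are exchangeable, so each
        # paired difference is equally likely to carry either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1

    # The add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_permutations + 1)
```

A small p-value suggests that the difference in mean error between the two models is unlikely to come from chance pairing alone. The test assumes that the test samples are independent, which often does not hold for spatially or temporally correlated environmental data.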

January 28, 2024 at 8:00 AM - 3:45 PM Eastern Time (Hybrid) - Baltimore Convention Center

Course Description:

Course participants will learn:

All stages of the ML pipeline, including data preprocessing, training ML models, and evaluating the results.
A comparison of several popular ML architectures. Students will be able to select the architectures (including loss functions, etc.) that are appropriate for their modeling task.
An understanding of data bias and the potential social impact of forecasts that use biased data.
Strategies for handling extremely imbalanced datasets.
Techniques for working with arbitrary raster data, including dilated convolution and attention mechanisms, as well as more sophisticated metrics that account for the spatial properties of the data when computing model performance.
Techniques to help develop trustworthy models, including both interpretable modeling techniques and post-hoc explanation methods.
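
To give a flavor of one item in the list above, handling imbalanced datasets, here is a minimal sketch of inverse-frequency class weighting, one common strategy among several. It is not drawn from the course materials; the helper name `balanced_class_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * class_count).

    Hypothetical helper for illustration. Rare classes receive large weights,
    so each class contributes equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy example: 98 "no event" samples and 2 "event" samples.
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)
```

With these toy labels the rare class receives a weight of 25.0 and the common class a weight of roughly 0.51, so both classes carry the same total weight in a weighted loss. Resampling and specialized loss functions are alternatives that may work better for a given dataset.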

If you have questions regarding the course, please contact Kara Lamb or Evan Krell.

Instructors:

Kara Lamb

Columbia University

Evan Krell

Texas A&M University - Corpus Christi

Hamid Kamangir

UC Davis

Maria Molina

University of Maryland