Climate Modeling and Analysis

''This page is about the intersection of climate science and machine learning. For an overview of climate science as a whole, please see the Wikipedia page on this topic. For an overview of physics-driven climate models, please see the Wikipedia page or this Carbon Brief Q&A.''

Climate models are physics-driven numerical models that represent the components of the climate system (e.g., atmosphere, land, ocean, and sea ice), which are coupled through feedbacks and exchanges of carbon, water, and energy. Climate models range in complexity from simple zero- or one-dimensional energy balance models to fully coupled, comprehensive Earth system models (ESMs) and General Circulation Models (GCMs). These models can simulate historical climate change and are used in the assessment reports of the Intergovernmental Panel on Climate Change (IPCC), which provide policy-relevant information on the current state of the climate system and on future climate projections under different emission scenarios. Climate models also help with assessing climate risks (see Policy, Markets, and Decision Science and Climate Change Adaptation).
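At the simple end of this hierarchy, a zero-dimensional energy balance model tracks a single global-mean temperature T and balances absorbed sunlight against outgoing longwave radiation: C dT/dt = (1 - albedo) * S0 / 4 - epsilon * sigma * T^4. The following is a minimal sketch, assuming illustrative textbook-style values: solar constant S0 = 1361 W m^-2, planetary albedo 0.3, a heat capacity roughly equal to a 100 m ocean mixed layer, and an effective emissivity of 0.61 that stands in for the greenhouse effect. These values do not come from any particular climate model.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant (W m^-2 K^-4)


def absorbed_solar(s0=1361.0, albedo=0.3):
    """Global-mean absorbed shortwave flux (W m^-2); S0/4 spreads sunlight over the sphere."""
    return (1.0 - albedo) * s0 / 4.0


def equilibrium_temperature(s0=1361.0, albedo=0.3, emissivity=0.61):
    """Temperature (K) at which outgoing longwave balances absorbed shortwave."""
    return (absorbed_solar(s0, albedo) / (emissivity * SIGMA)) ** 0.25


def integrate(t_init, years, s0=1361.0, albedo=0.3, emissivity=0.61,
              heat_capacity=4.0e8, dt_days=1.0):
    """Step C dT/dt = ASR - emissivity * sigma * T^4 forward with explicit Euler.

    heat_capacity (J m^-2 K^-1) is roughly 100 m of ocean mixed layer
    (an illustrative assumption). Returns the temperature (K) after `years`.
    """
    dt = dt_days * 86400.0  # time step in seconds
    temp = t_init
    for _ in range(int(years * 365 / dt_days)):
        net_flux = absorbed_solar(s0, albedo) - emissivity * SIGMA * temp**4
        temp += dt * net_flux / heat_capacity
    return temp


if __name__ == "__main__":
    print(f"Blackbody (emissivity=1): {equilibrium_temperature(emissivity=1.0):.1f} K")
    print(f"Effective emissivity 0.61: {equilibrium_temperature():.1f} K")
    print(f"After 50 years from 250 K: {integrate(250.0, 50):.1f} K")
```

With these assumptions, the script should print about 254.6 K for a blackbody Earth and about 288.1 K with the reduced emissivity. The time-stepped run should relax to the same equilibrium. Comprehensive ESMs and GCMs replace this single equation with resolved three-dimensional dynamics and many coupled components.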

Recent advances in computational resources create opportunities to use ML methods to aid climate model development and output analysis, and to make use of the vast amount of observational data.

Climate modelling and climate data analysis

 * Accelerating climate models: Because climate models have coarse resolution, small-scale processes cannot be represented in them directly. These processes are therefore often approximated by parametrisations, for example of clouds, convection, aerosols, and dynamic vegetation changes, among many other components of GCMs. Traditional approaches to representing these processes in GCMs are computationally expensive. ML can help by emulating some of these sub-grid processes, such as vegetation changes, cloud parametrisation, and convection.
 * Physically-constrained ML projections: Hybrid modelling incorporates physical constraints into data-driven ML or deep learning models. It is a promising way to exploit the large amounts of data available from observational products while respecting the physical constraints of the climate system, which can make projections more robust and help them extrapolate beyond the training data. The output of physics-driven GCMs can be used for a "perfect model test" of the ML model before it is applied to observations to make projections. In such a test, a simulation is treated as a known "truth" against which the ML model can be checked.
 * Climate model evaluation and narrowing down future climate projections: Climate models can be extremely complex, involving interactions and feedbacks among different components of the climate system. Climate projections are often based on the output of 20 or more different climate models, which together span a wide range of projected future climates. However, some climate models share components, so the models are not truly independent and the multi-model mean does not combine independent estimates. ML can help identify and leverage relationships between variables within climate models. Combined with observed climate changes (i.e., an observational constraint), these relationships could narrow the spread of future climate projections.
 * Downscaling climate models: Climate models are often run on a coarse grid to keep computational costs manageable. Downscaling climate projections to finer grids or specific regions provides important information for local impact assessments. ML and deep learning can help interpolate coarse climate model output and approximate the fine-scale regional response.


 * Data assimilation: Assimilating diverse observation-based data sources can improve climate models, and machine learning can transform raw sensor output into more relevant derived data. Relevant applications include sensor calibration, analyzing information in remote sensing data, and assimilating climate model output with observations. Well-curated benchmark datasets have the potential to advance research on several geoscience problems.
 * Filling in gaps in the observations: The historical record provides valuable information for evaluating how well climate models reproduce observed changes. However, early historical observations in particular are available only for sparse regions. ML can help fill these gaps to provide a complete record of climate variables such as ocean carbon uptake or surface air temperature. Methods include neural networks, Kriging, and Empirical Orthogonal Functions (EOFs).
 * Detection and attribution of anthropogenic climate change: Separating the forced signal (due to anthropogenic climate change) from the "noise" of natural climate variability is challenging, because the observational record is only a single realization of the climate. Large ensemble simulations are one way to separate the anthropogenic signal from the total response, which combines natural and anthropogenic signals. In these simulations, a given climate model is run many times with different initial conditions but identical radiative forcing. ML methods offer another avenue for addressing this signal-to-noise problem, helping to detect the anthropogenic signal and attribute it to a given forcing. Statistical learning has even made it possible to detect anthropogenic climate change from the weather of a single day.
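The large-ensemble approach in the detection-and-attribution item can be illustrated with synthetic data. All members share the same forcing, so the ensemble mean estimates the forced signal, and the spread of members around that mean estimates internal variability. The following is a minimal sketch, assuming a hypothetical linear forced trend plus independent Gaussian noise. The trend, noise level, ensemble size, and the 2-sigma "time of emergence" threshold are illustrative choices, not values from any real ensemble.

```python
import random
import statistics


def synthetic_large_ensemble(n_members=100, n_years=60, trend=0.02,
                             noise_sd=0.25, seed=0):
    """Toy large ensemble: every member shares the same forced trend (K/yr)
    but has its own random 'internal variability'. White noise is a
    simplification; real internal variability is autocorrelated."""
    rng = random.Random(seed)
    forced = [trend * year for year in range(n_years)]
    members = [[f + rng.gauss(0.0, noise_sd) for f in forced]
               for _ in range(n_members)]
    return forced, members


def separate_signal(members):
    """Ensemble mean estimates the forced response; member deviations from it
    estimate internal variability (returned as a standard deviation)."""
    n_years = len(members[0])
    ens_mean = [statistics.fmean(m[t] for m in members) for t in range(n_years)]
    deviations = [m[t] - ens_mean[t] for m in members for t in range(n_years)]
    return ens_mean, statistics.pstdev(deviations)


def time_of_emergence(signal, noise_sd, k=2.0):
    """First index where the estimated signal exceeds k times the noise (None if never)."""
    for t, s in enumerate(signal):
        if s > k * noise_sd:
            return t
    return None


if __name__ == "__main__":
    forced, members = synthetic_large_ensemble()
    ens_mean, noise = separate_signal(members)
    print(f"estimated noise sd: {noise:.3f} K")
    print(f"signal emerges after ~{time_of_emergence(ens_mean, noise)} years")
```

With only one realization, as in the observational record, the same separation is not possible. This is the signal-to-noise gap that the ML methods described above aim to address.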

Forecasting of seasonal variations and extreme events
For use of ML in weather forecasting, see weather prediction.
 * Seasonal forecasting: Seasonal variations, such as those due to the El Niño/Southern Oscillation (ENSO), are difficult to predict using traditional methods. ML and deep learning can be useful for multi-year ENSO forecasting.
 * Extreme events predictions: Storms, droughts, fires, floods, and other extreme events are expected to become stronger and more frequent as climate change progresses. Machine learning can be used to refine what are otherwise coarse-grained forecasts (e.g., generated from climate or weather prediction models) of these extreme weather events. These high-resolution forecasts can guide improvements in system robustness and resilience.

Textbooks

 * Introduction to Climate Dynamics and Climate Modeling (2010): A technical treatment of the climate system, energy balance, climate modeling, and climate perturbations. Available here.
 * Principles of Planetary Climate (2010): An introduction to the physics of climate, with examples in Python.

Other

 * Intergovernmental Panel on Climate Change (IPCC) Assessment Reports (e.g., AR5) and the IPCC Special Report on Global Warming of 1.5 °C
 * Carbon Brief explainers on How do climate models work, How well have climate models projected global warming, and The next generation of climate models (CMIP6).
 * Oxford Research Encyclopedia of Climate Science: A collection of articles on the climate system, the impacts of climate change, and the methods used in climate science.
 * TED Talks about how climate models work and what they can be used for, by Dr Gavin Schmidt and Dr Kate Marvel.

Online Courses and Course Materials

 * An Introduction to Climate Modeling (2014): A video lesson from Climate Literacy's YouTube channel.
 * A climate modelling course: Hands-on lecture notes and Python code by Prof. Brian E. J. Rose.
 * Advanced courses in climate data science: Research Computing in Earth Science, Introduction to Physical Oceanography, and Geophysical Fluid Dynamics (with Python code) by Prof. Ryan Abernathey.

Major conferences

 * American Geophysical Union (AGU) Fall Meeting: An annual conference organised by the AGU, usually held in December at venues that vary across the USA.
 * European Geosciences Union (EGU) General Assembly: An annual conference organised by the EGU, usually held in April or early May in Vienna, Austria.
 * International Meeting on Statistical Climatology (IMSC): Held approximately every three years, usually around June or July, in different locations worldwide. IMSC is organised by statisticians, climatologists, and atmospheric scientists with the aim of transferring knowledge among these communities (e.g., see the previous meeting).
 * Climate Informatics (CI): An annual workshop series, usually held in September in different locations worldwide (e.g., see the previous meetings).

Major journals
Applications of machine learning in different domains of climate science appear in various climate-related journals, such as:


 * Bulletin of the American Meteorological Society (BAMS): A journal published by the American Meteorological Society (AMS).
 * Earth System Dynamics (ESD): An open-access journal of the European Geosciences Union (EGU).
 * Environmental Research Letters (ERL): An open-access journal published by IOP Publishing.
 * Environmental Data Science: A new open-access interdisciplinary journal.
 * Geophysical Research Letters (GRL): A journal of the American Geophysical Union.
 * Journal of Climate: A journal published by the American Meteorological Society (AMS).
 * Proceedings of the National Academy of Sciences (PNAS): A wide-reaching journal often featuring climate science.
 * Springer Nature journals: These often feature climate science topics and, increasingly, applications of machine learning, e.g., Nature Climate Change, Nature Geoscience, Nature Communications (open access), Communications Earth and Environment (open access), and npj Climate and Atmospheric Science (open access).
 * Artificial Intelligence for the Earth Systems (AIES): A new journal focusing on AI published by the American Meteorological Society (AMS).

Major societies and organizations

 * American Geophysical Union (AGU): An organization supporting work across the geophysical sciences.
 * Climate Informatics (CI): An organization dedicated to computing in climate science.
 * European Geosciences Union (EGU): An organization supporting research in Earth, planetary, and space science in Europe.
 * Intergovernmental Panel on Climate Change (IPCC): A United Nations body that assesses climate change and provides policy-relevant information. The IPCC publishes Assessment Reports and Special Reports that comprehensively summarise state-of-the-art developments and findings.

Libraries and Tools

 * Pangeo: An open-source community platform for big-data geoscience, built on Python tools.
 * Pangeo also maintains a list of packages useful for atmospheric, ocean, and climate science.
 * Climate model data is typically distributed in netCDF4 format. These files can be converted to CSV files or pandas DataFrames, but be aware that the data lie on 3D grids over the sphere, which are often irregular and whose grid cells differ in area. Xarray is a commonly used Python package for post-processing climate data.
 * Climate Data Operators (CDO) and netCDF Operators (NCO) are command-line tools for manipulating netCDF files and calculating climatologies.
 * AI2 Climate Modeling: The ai2cm climate modelling toolbox from the Allen Institute for AI provides Python wrappers for working with climate and weather models.
 * Copernicus Climate Data Store: The EU Copernicus programme provides an API and a Toolbox for working with a range of freely available data, such as raw satellite sensor data.
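Grid cells on a latitude-longitude grid shrink toward the poles, so a plain average over all grid points overweights high latitudes. The following sketch shows two routine post-processing steps in plain Python: an area-weighted spatial mean and a monthly climatology with anomalies. It assumes a regular latitude-longitude grid, stored as nested lists `field[lat_index][lon_index]`. On irregular grids, use the grid-cell areas distributed with the model output as weights instead (e.g., the CMIP variable `areacella`).

```python
import math


def area_weighted_mean(field, lats_deg):
    """Mean of field[lat_index][lon_index] on a regular latitude-longitude
    grid, weighting each latitude row by cos(latitude), which is proportional
    to grid-cell area on such a grid."""
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(field, lats_deg):
        w = math.cos(math.radians(lat))
        total += w * sum(row)
        weight_sum += w * len(row)
    return total / weight_sum


def monthly_climatology(values, months):
    """Average value for each calendar month (1-12) across all years."""
    by_month = {}
    for value, month in zip(values, months):
        by_month.setdefault(month, []).append(value)
    return {m: sum(v) / len(v) for m, v in by_month.items()}


def anomalies(values, months, climatology):
    """Departures from the monthly climatology (removes the mean seasonal cycle)."""
    return [value - climatology[month] for value, month in zip(values, months)]


if __name__ == "__main__":
    # Equator at 1.0, mid-latitudes at 0.0: the unweighted mean would be 1/3.
    field = [[0.0] * 4, [1.0] * 4, [0.0] * 4]
    print(area_weighted_mean(field, [-60.0, 0.0, 60.0]))
```

Suppose you have an Xarray DataArray `da` whose coordinates are named `lat`, `lon`, and `time` (the names are an assumption). The same steps are typically written as `da.weighted(np.cos(np.deg2rad(da.lat))).mean(("lat", "lon"))` and `da.groupby("time.month").mean("time")`, and CDO and NCO offer command-line equivalents.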

Data
The NCAR Climate Data Guide is a useful resource for learning about different datasets and where to find source data for different components of the climate system, including the atmosphere, ocean, and land-based climate indices, as well as observation-based products from satellite and reanalysis data sources.

The output from climate models includes various simulations driven by historical and different future emission scenarios. Some simulations are idealised (e.g., in response to CO2-only forcing) to aid inter-comparison of models, or focus on comparing specific components of the climate system (e.g., land or ocean carbon uptake). The Coupled Model Intercomparison Project (CMIP) is an international exercise in which modelling centres compare the output of their climate models for a common set of scenarios and simulations. Other simulations vary a model's physics (e.g., perturbed-physics ensembles) or its initial conditions (often referred to as large ensemble simulations). In large ensembles, differences among the outputs of a single model arise from natural climate variability alone. These different types of simulations are used to explore the range of possible future climate projections and to quantify different sources of uncertainty.
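One simple way to quantify these sources of uncertainty is a variance partition over projections for a single future period:
- spread among members of one model estimates internal variability (this needs initial-condition ensembles with at least two members);
- spread among model means within a scenario estimates model uncertainty;
- spread among scenario means estimates scenario uncertainty.

The following is a minimal sketch, assuming a hypothetical nested-dictionary layout. All numbers are invented. Published partitioning methods differ in detail, so treat this as illustrative only.

```python
import statistics


def partition_uncertainty(projections):
    """Split projection spread into internal, model, and scenario variance.

    projections[scenario][model] is a list of ensemble-member values for one
    future period. Single-member lists contribute zero internal variance,
    which underestimates that term.
    """
    internal_vars, model_vars, scenario_means = [], [], []
    for models in projections.values():
        model_means = []
        for members in models.values():
            model_means.append(statistics.fmean(members))
            internal_vars.append(statistics.pvariance(members))
        model_vars.append(statistics.pvariance(model_means))
        scenario_means.append(statistics.fmean(model_means))
    return {
        "internal": statistics.fmean(internal_vars),
        "model": statistics.fmean(model_vars),
        "scenario": statistics.pvariance(scenario_means),
    }


# Hypothetical warming values (K) for one future period; invented numbers.
EXAMPLE = {
    "low_emissions": {"model_a": [1.0, 1.2], "model_b": [1.4, 1.6]},
    "high_emissions": {"model_a": [2.0, 2.2], "model_b": [2.6, 2.8]},
}

if __name__ == "__main__":
    parts = partition_uncertainty(EXAMPLE)
    total = sum(parts.values())  # fractions below are of this summed variance
    for name, var in parts.items():
        print(f"{name}: {var:.3f} K^2 ({100 * var / total:.0f}% of total)")
```

In this toy example, scenario differences dominate. With real CMIP and large-ensemble output, the balance among the three terms depends on the variable, the region, and the lead time.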

Key resources for accessing climate and weather data:


 * The Coupled Model Intercomparison Project (CMIP): A gateway to climate models in use and development, available here. CMIP is associated with the Earth System Grid Federation, which also provides data analysis tools and tutorials: https://esgf.llnl.gov/.
 * Large ensemble simulations for different climate models, also referred to as Single Model Initial-condition Large Ensembles (SMILEs), including the Community Earth System Model (CESM) Large Ensemble Project.
 * Climate and weather datasets for ML research are listed here.
 * The Earth and climate science community is also working to create benchmark datasets.
 * Google Cloud Weather and Climate Datasets: Petabyte-scale weather and climate datasets from sources like NOAA’s NEXRAD and NASA/USGS’s Landsat, made available for free as part of Google Cloud’s Public Datasets Program.
 * EARTHDATA: NASA's gateway to Earth Science data. Data are available at multiple levels of processing.