Understanding how CMOR3 works

CMOR3 writes netCDF files to disk following a strict set of rules. Some of these are hardcoded in CMOR itself, but most requirements are defined by a Controlled Vocabulary (CV) file. Because CMOR was originally developed for CMIP6, the only CV file available at first was CMIP6_CV.json; CV files based on other projects are now also available.

A CV file is composed of two parts:
  • a list of required attributes

  • controlled vocabularies to define valid values for the attributes (optional)

Not all attributes need predefined values; whether they do depends on the conventions being applied.
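To make the two parts concrete, here is a minimal Python sketch of checking a set of global attributes against a CV. It assumes a layout like the one used by CMIP6_CV.json, where a top-level "CV" object holds a "required_global_attributes" list plus one entry per controlled attribute (a list of allowed values, or a dictionary keyed by them). The inline CV and the function name check_attributes are hypothetical, and CMOR performs its own, more detailed validation.

```python
from typing import Any


def check_attributes(attrs: dict[str, Any], cv: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (missing required attributes, attributes whose values the CV does not allow).

    Assumes a CMIP6_CV.json-like layout: cv["CV"]["required_global_attributes"] lists
    the required names, and any other key under cv["CV"] holding a list or dict gives
    the allowed values for that attribute (for a dict, its keys are the allowed values).
    """
    body = cv["CV"]
    required = body.get("required_global_attributes", [])
    missing = [name for name in required if name not in attrs]

    invalid = []
    for name, value in attrs.items():
        allowed = body.get(name)
        if name == "required_global_attributes" or allowed is None:
            continue  # no controlled vocabulary for this attribute: any value is accepted
        if isinstance(allowed, (list, dict)) and value not in allowed:
            invalid.append(name)
    return missing, invalid


# A tiny hypothetical CV: two required attributes, only one of them controlled.
example_cv = {
    "CV": {
        "required_global_attributes": ["title", "license"],
        "license": ["https://creativecommons.org/licenses/by/4.0/"],
    }
}

print(check_attributes({"title": "CM2 Seb"}, example_cv))
# (['license'], [])
print(check_attributes({"title": "CM2 Seb", "license": "proprietary"}, example_cv))
# ([], ['license'])
```

Here title is required but free-form, while license must come from the vocabulary, mirroring the point that not every required attribute has predefined values.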

Note

A generic CV file, ACDD_CV.json, is also available, with a minimal set of required attributes based on the ACDD conventions. NCI uses the ACDD together with the CF conventions when publishing data.

The CV file defines the conventions to use, while the CMOR tables (also JSON files) define the variables to produce. Finally, the experiment configuration file lists the required and optional attributes for a specific simulation; these are used to create the netCDF global attributes.

CMOR tables

The CMOR tables are lists of variable definitions, including their names and attributes. For CMIP6, each table is a combination of realm, frequency and vertical levels. For example, Omon holds monthly ocean data, 6hrPlev holds 6-hourly data on pressure levels, etc.

Each variable has a specific cmor-name, which is the key to its definition in the JSON file; this can differ from the actual variable name used in the output file (the out_name). This makes it possible, for example, to define two tas variables with different frequencies in the same table. The cmor-name and cmor-table are the fields used in the mappings table to identify which variable definition to apply.
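The distinction between the cmor-name (the key) and the output name can be sketched in Python as a lookup keyed on the (cmor-table, cmor-name) pair. The table name MyTable, the second cmor-name tas10min, its frequency values and the helper resolve are all hypothetical, chosen only to show two tas definitions living side by side in one table.

```python
# Hypothetical in-memory tables: {table_name: {cmor_name: variable attributes}}
tables = {
    "MyTable": {
        "tas": {"out_name": "tas", "frequency": "1hr", "units": "K"},
        "tas10min": {"out_name": "tas", "frequency": "10min", "units": "K"},
    }
}


def resolve(cmor_table: str, cmor_name: str) -> dict:
    """Return the variable definition selected by a (cmor-table, cmor-name) pair."""
    try:
        return tables[cmor_table][cmor_name]
    except KeyError:
        raise KeyError(f"{cmor_name!r} is not defined in table {cmor_table!r}") from None


hourly = resolve("MyTable", "tas")
ten_min = resolve("MyTable", "tas10min")
# Both definitions write a variable called "tas", but at different frequencies.
print(hourly["out_name"], hourly["frequency"])    # tas 1hr
print(ten_min["out_name"], ten_min["frequency"])  # tas 10min
```

Keying on the pair rather than on out_name is what lets the two entries coexist.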

A CMOR table is provided as a JSON file with two main keys:
  • Header

  • variable_entry

The variable_entry is a dictionary keyed by cmor-name; each value is a dictionary of that variable's attributes.

CMOR table example
{
    "Header": {
        "data_specs_version": "01.00.33",
        "cmor_version": "3.5",
        "table_id": "Table SImon",
        "realm": "seaIce",
        "table_date": "18 November 2020",
        "missing_value": "1e20",
        "int_missing_value": "-999",
        "product": "model-output",
        "approx_interval": "30.00000",
        "generic_levels": "",
        "mip_era": "CMIP6",
        "Conventions": "CF-1.7 CMIP-6.2"
    },
    "variable_entry": {
        "sfdsi": {
            "frequency": "mon",
            "modeling_realm": "seaIce",
            "standard_name": "downward_sea_ice_basal_salt_flux",
            "units": "kg m-2 s-1",
            "cell_methods": "area: time: mean where sea_ice (comment: mask=siconc)",
            "cell_measures": "area: areacello",
            "long_name": "Downward Sea Ice Basal Salt Flux",
            "comment": "This field is physical, and it arises since sea ice has a nonzero salt content, so it exchanges salt with the liquid ocean upon melting and freezing.",
            "dimensions": "longitude latitude time",
            "out_name": "sfdsi",
            "type": "real",
            "positive": "down",
            "valid_min": "",
            "valid_max": "",
            "ok_min_mean_abs": "",
            "ok_max_mean_abs": ""
        },
        "siage": {
            "frequency": "mon",
            "modeling_realm": "seaIce",
            "standard_name": "age_of_sea_ice",
            "units": "s",
            "cell_methods": "area: time: mean where sea_ice (comment: mask=siconc)",
            "cell_measures": "area: areacello",
            "long_name": "Age of Sea Ice",
            "comment": "Age of sea ice",
            "dimensions": "longitude latitude time",
            "out_name": "siage",
            "type": "real",
            "positive": "",
            "valid_min": "",
            "valid_max": "",
            "ok_min_mean_abs": "",
            "ok_max_mean_abs": ""
        }
    }
}
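Because a table is plain JSON, it can be inspected with the Python standard library. The sketch below assumes the Header / variable_entry layout shown above; the function names load_table and summarise_table and the file name in the usage comment are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_table(path: str | Path) -> dict:
    """Read a CMOR table (a JSON file with "Header" and "variable_entry" keys)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarise_table(table: dict) -> dict[str, tuple[str, str, str]]:
    """Map each cmor-name in the table to its (out_name, frequency, units)."""
    return {
        cmor_name: (entry["out_name"], entry["frequency"], entry["units"])
        for cmor_name, entry in table["variable_entry"].items()
    }


# Usage, with a hypothetical local file holding exactly the example table above:
# table = load_table("SImon_example.json")
# print(table["Header"]["table_id"])  # Table SImon
# print(summarise_table(table))
# {'sfdsi': ('sfdsi', 'mon', 'kg m-2 s-1'), 'siage': ('siage', 'mon', 's')}
```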
Definitions of coordinates, grids and formula terms are stored in separate tables. See:

We included all the original CMIP6 tables and a few custom ones in the repository, under src/data/cmor_tables/. There are custom tables for CM2 variables not yet included in the CMIP6 tables, and tables for the AUS2200 AMIP run configurations. The AUS2200 produces a lot of output at higher frequencies, as well as variables that aren't covered by the original tables. Similarly, users can define new tables if they want to post-process variables not yet included, or to adapt some of the available variable definitions. See New variable for more information.
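A new table follows the same Header / variable_entry layout. As a minimal sketch, a custom table could be assembled by reusing the Header of an existing table and adding adapted entries. The helper derive_table, the table_id, the cmor-name siage_daily and the file name in the final comment are hypothetical; the New variable section remains the reference for what mopper actually expects.

```python
import copy
import json


def derive_table(base: dict, header_updates: dict, new_entries: dict) -> dict:
    """Build a new CMOR-style table from an existing one.

    The Header is copied from `base` and patched with `header_updates`;
    `variable_entry` holds only `new_entries` (keyed by cmor-name).
    Deep copies keep the base table unchanged.
    """
    header = copy.deepcopy(base["Header"])
    header.update(header_updates)
    return {"Header": header, "variable_entry": copy.deepcopy(new_entries)}


# Trimmed-down version of the SImon example table.
base = {
    "Header": {"table_id": "Table SImon", "realm": "seaIce",
               "approx_interval": "30.00000", "mip_era": "CMIP6"},
    "variable_entry": {
        "siage": {"frequency": "mon", "units": "s", "out_name": "siage",
                  "dimensions": "longitude latitude time"},
    },
}

# Hypothetical adaptation: the same output variable, but daily instead of monthly.
daily = copy.deepcopy(base["variable_entry"]["siage"])
daily["frequency"] = "day"
custom = derive_table(
    base,
    {"table_id": "Table SIday_custom", "approx_interval": "1.00000"},
    {"siage_daily": daily},
)
print(json.dumps(custom, indent=4))
# To save it:
# with open("SIday_custom.json", "w", encoding="utf-8") as f:
#     json.dump(custom, f, indent=4)
```

Note that header fields such as approx_interval describe the whole table, so they need updating along with the entries.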

Experiment input file

This provides user-supplied metadata and configuration directives used by CMOR, including which controlled vocabulary (CV), grids and coordinate definitions to use and values for the attributes describing the model and simulation.

We simplified this process so that the user only has to pass one configuration file to control all the necessary inputs. The mop setup command then creates the experiment file expected by CMOR from this configuration and the selected CV file. This is described in the Getting started section.

Example of experiment input file
{
    "Conventions": "CF-1.7, ACDD-1.3",
    "_AXIS_ENTRY_FILE": "ACDD_coordinate.json",
    "_FORMULA_VAR_FILE": "ACDD_formula_terms.json",
    "_control_vocabulary_file": "/scratch/v45/pxp581/MOPPER_output/cy286/CMIP6_CV.json",
    "calendar": "proleptic_gregorian",
    "contact": "sam.green@unsw.edu.au",
    "creator_email": "s.mckenna@unsw.edu.au",
    "creator_name": "Sebastion McKenna",
    "date_created": "2023-02-01",
    "experiment": "cy286",
    "experiment_id": "cy286",
    "frequency": "",
    "frequencyvariable_id": "",
    "geospatial_lat_max": -6.83,
    "geospatial_lat_min": -48.79,
    "geospatial_lon_max": 158.98,
    "geospatial_lon_min": 107.52,
    "institution": "University of New South Wales",
    "keywords": "Climate change processes, Adverse weather events, Cloud physics",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "outpath": "/scratch/v45/pxp581/MOPPER_output/cy286",
    "output_file_template": "<variable_id><source_id><experiment_id><frequency>",
    "output_path_template": "<product_version><frequency>",
    "product_version": "v1.0",
    "source": "ACCESS-CM2 (2019): aerosol: UKCA-GLOMAP-mode, atmos: MetUM-HadGEM3-GA7.1 (N96; 192 x 144 longitude/latitude; 85 levels; top level 85 km), atmosChem: none, land: CABLE2.5, landIce: none, ocean: ACCESS-OM2 (GFDL-MOM5, tripolar primarily 1deg; 360 x 300 longitude/latitude; 50 levels; top grid cell 0-10 m), ocnBgchem: none, seaIce: CICE5.1.2 (same grid as ocean)",
    "source_id": "ACCESS-CM2",
    "time_coverage_end": "1010-01-01",
    "time_coverage_start": "0951-01-01",
    "title": "CM2 Seb"
}
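The two *_template attributes in this example tell CMOR how to build output paths and file names from global attributes. The sketch below imitates that expansion, assuming (as in CMOR's CMIP6 usage) that each <token> is replaced by the attribute of the same name, file-name tokens are joined with underscores, and path tokens become nested directories under outpath. The function expand and the variable_id and frequency values are hypothetical, and the time-range suffix CMOR adds to file names is left out.

```python
import re
from pathlib import PurePosixPath

TOKEN = re.compile(r"<([^>]+)>")


def expand(template: str, attrs: dict[str, str]) -> list[str]:
    """Return the attribute values named by the <token>s in a CMOR-style template."""
    return [attrs[name] for name in TOKEN.findall(template)]


attrs = {
    "outpath": "/scratch/v45/pxp581/MOPPER_output/cy286",
    "output_path_template": "<product_version><frequency>",
    "output_file_template": "<variable_id><source_id><experiment_id><frequency>",
    "product_version": "v1.0",
    "source_id": "ACCESS-CM2",
    "experiment_id": "cy286",
    # Hypothetical per-variable values; these are not fixed in the experiment file above.
    "variable_id": "tas",
    "frequency": "mon",
}

directory = PurePosixPath(attrs["outpath"], *expand(attrs["output_path_template"], attrs))
filename = "_".join(expand(attrs["output_file_template"], attrs)) + ".nc"
print(directory / filename)
# /scratch/v45/pxp581/MOPPER_output/cy286/v1.0/mon/tas_ACCESS-CM2_cy286_mon.nc
```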

Troubleshooting

CMOR can fail with a segmentation fault because a required attribute is missing. This can be hard to diagnose, as your job might hang or crash without an error message. We have taken as much care as possible to ensure that mopper creates CMOR-compliant tables and configuration files; however, we cannot currently fix this issue, because there is no way to propagate the error from the CMOR C library to the Python interface.
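Because the failure happens inside the C library, Python cannot catch it as an exception. One possible workaround, sketched below for POSIX systems, is to run each CMOR step in a child process and inspect how it ended: a negative return code -N means the child was killed by signal N, and a timeout flags a likely hang. The helper run_isolated is hypothetical and is not part of mopper; the usage simulates a crash with os.kill instead of calling CMOR.

```python
from __future__ import annotations

import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float | None = None) -> str:
    """Run Python code in a child process and describe how it ended (POSIX only)."""
    try:
        proc = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "timed out (possible hang)"
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means the child was killed by signal N.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with code {proc.returncode}"


# Simulate a CMOR segmentation fault by sending SIGSEGV to the child itself.
print(run_isolated("import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"))
# killed by SIGSEGV
```

The parent survives the crash and can log which step failed, which is more informative than a silently dead job.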

Warning

If you get the following warning in your cmor_log:

Warning: while closing variable 0 (htovgyre, table Omon)
! we noticed you wrote 0 time steps for the variable,
! but its time axis 0 (time) has 2 time steps

it can usually be safely ignored; see the relevant GitHub issue.
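To check that the warnings in a cmor_log are only of this kind, a short script can pick them out. The regular expression below is written against the single message quoted above and assumes the text appears exactly as shown (line breaks and extra spaces are tolerated); other CMOR versions may word it differently. The names CLOSING_WARNING and closing_warnings are hypothetical.

```python
import re

# Matches the "wrote N time steps" warning quoted above; \s* tolerates line breaks.
CLOSING_WARNING = re.compile(
    r"Warning: while closing variable \d+ \((?P<var>[^,]+), table (?P<table>[^)]+)\)"
    r"\s*!\s*we noticed you wrote (?P<written>\d+) time steps for the variable,"
    r"\s*!\s*but its time axis \d+ \(\w+\) has (?P<expected>\d+) time steps"
)


def closing_warnings(log_text: str) -> list[tuple[str, str, int, int]]:
    """Return (variable, table, steps written, steps on axis) for each matching warning."""
    return [
        (m["var"], m["table"], int(m["written"]), int(m["expected"]))
        for m in CLOSING_WARNING.finditer(log_text)
    ]


log = ("Warning: while closing variable 0 (htovgyre, table Omon)\n"
       "! we noticed you wrote 0 time steps for the variable,\n"
       "! but its time axis 0 (time) has 2 time steps\n")
print(closing_warnings(log))
# [('htovgyre', 'Omon', 0, 2)]
```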

AUS2200 version

The AUS2200 configuration outputs some variables at a 10-minute frequency. To limit the amount of storage needed, the 4D variables were saved on only one model level (or as a reduction over all levels). Consequently, most of the 10-minute variables use the UM codes of the original 4D variables but represent different physical quantities.

While we have created correct mappings for these variables at all the available frequencies, the mopdb template output will match some of them to both the correct and an incorrect mapping, as the tool can't distinguish between different uses of a UM code in the same version. It's up to the user to check for duplicates and select the relevant one.
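Finding the duplicates can be partly automated. The sketch below assumes the template has been read into rows with hypothetical keys input_vars (the UM field used), frequency and cmor_var; the real mopdb template columns may be named differently, so adjust the keys accordingly (if the template is a CSV file, csv.DictReader yields rows in this shape). It reports every UM field and frequency combination mapped to more than one variable, leaving the choice of the correct mapping to the user.

```python
from collections import defaultdict


def find_duplicates(rows: list[dict[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Group rows by (input_vars, frequency); keep groups with more than one cmor_var."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for row in rows:
        groups[(row["input_vars"], row["frequency"])].append(row["cmor_var"])
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical rows: the same UM field matched to two different 10-minute definitions.
rows = [
    {"cmor_var": "var_a", "input_vars": "fld_s00i004", "frequency": "10min"},
    {"cmor_var": "var_b", "input_vars": "fld_s00i004", "frequency": "10min"},
    {"cmor_var": "var_c", "input_vars": "fld_s03i236", "frequency": "1hr"},
]
print(find_duplicates(rows))
# {('fld_s00i004', '10min'): ['var_a', 'var_b']}
```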