Climate and Forecast Conventions version 1.12-draft has no DOI yet: 10.5281/zenodo.FFFFFF
This document is dedicated to the public domain following the Creative Commons Zero v1.0 Universal Deed.
The Climate and Forecast Conventions website https://cfconventions.org/ contains additional resources and provides further information.
DON’T use the following reference to cite this version of the document, as it is only shown as a draft:
Eaton, B., Gregory, J., Drach, B., Taylor, K., Hankin, S. et al. (2024). NetCDF Climate and Forecast (CF) Metadata Conventions (1.12-draft). CF Community. https://doi.org/10.5281/zenodo.FFFFFF
- About the authors
- Abstract
- 1. Introduction
- 2. NetCDF Files and Components
- 3. Description of the Data
- 4. Coordinate Types
- 5. Coordinate Systems and Domain
- 5.1. Independent Latitude, Longitude, Vertical, and Time Axes
- 5.2. Two-Dimensional Latitude, Longitude, Coordinate Variables
- 5.3. Reduced Horizontal Grid
- 5.4. Timeseries of Station Data
- 5.5. Trajectories
- 5.6. Horizontal Coordinate Reference Systems, Grid Mappings, and Projections
- 5.7. Scalar Coordinate Variables
- 5.8. Domain Variables
- 5.9. Mesh Topology Variables
- 6. Labels and Alternative Coordinates
- 7. Data Representative of Cells
- 8. Reduction of Dataset Size
- 8.1. Packed Data
- 8.2. Lossless Compression by Gathering
- 8.3. Lossy Compression by Coordinate Subsampling
- 8.3.1. Tie Points and Interpolation Subareas
- 8.3.2. Coordinate Interpolation Attribute
- 8.3.3. Interpolation Variable
- 8.3.4. Subsampled, Interpolated and Non-Interpolated Dimensions
- 8.3.5. Tie Point Mapping Attribute
- 8.3.6. Tie Point Dimension Mapping
- 8.3.7. Tie Point Index Mapping
- 8.3.8. Interpolation Parameters
- 8.3.9. Interpolation of Cell Boundaries
- 8.3.10. Interpolation Method Implementation
- 8.4. Lossy Compression via Quantization
- 9. Discrete Sampling Geometries
- Appendix A: Attributes
- Appendix B: Standard Name Table Format
- Appendix C: Standard Name Modifiers
- Appendix D: Parametric Vertical Coordinates
- Atmosphere natural log pressure coordinate
- Atmosphere sigma coordinate
- Atmosphere hybrid sigma pressure coordinate
- Atmosphere hybrid height coordinate
- Atmosphere smooth level vertical (SLEVE) coordinate
- Ocean sigma coordinate
- Ocean s-coordinate
- Ocean s-coordinate, generic form 1
- Ocean s-coordinate, generic form 2
- Ocean sigma over z coordinate
- Ocean double sigma coordinate
- Appendix E: Cell Methods
- Appendix F: Grid Mappings
- Appendix G: Revision History
- Appendix H: Annotated Examples of Discrete Geometries
- H.1. Point Data
- H.2. Time Series Data
- H.2.1. Orthogonal multidimensional array representation of time series
- H.2.2. Incomplete multidimensional array representation of time series
- H.2.3. Single time series, including deviations from a nominal fixed spatial location
- H.2.4. Contiguous ragged array representation of time series
- H.2.5. Indexed ragged array representation of time series
- H.3. Profile Data
- H.4. Trajectory Data
- H.5. Time Series of Profiles
- H.6. Trajectory of Profiles
- Appendix I: The CF data model
- Introduction
- Design criteria of the CF data model
- Elements of CF-netCDF
- The CF data model
- Field construct
- Domain construct
- Domain axis construct and the data array
- Coordinates: dimension coordinate and auxiliary constructs
- Coordinate reference construct
- Domain ancillary construct
- Cell measure construct
- Domain topology construct
- Cell connectivity construct
- Field ancillary constructs
- Cell method construct
- Appendix J: Coordinate Interpolation Methods
- Appendix K: Mesh Topologies
- Revision History
- Version 1.12 (04 December 2024)
- Version 1.11 (05 December 2023)
- Version 1.10 (31 August 2022)
- Version 1.9 (10 September 2021)
- Version 1.8 (11 February 2020)
- Version 1.7 (7 August 2017)
- Version 1.6 (5 December 2011)
- Version 1.5 (25 October 2010)
- Version 1.4 (27 February 2009)
- Version 1.3 (4 May 2008)
- Version 1.2 (4 May 2008)
- Version 1.1 (17 January 2008)
- Version 1.0 (28 October 2003)
- Bibliography
List of Tables
3.1. Prefixes for decimal multiples and submultiples of units
3.2. Flag Variable Bits (from Example)
3.3. Flag Variable Bit 2 and Bit 3 (from Example)
7.1. Dimensionality, description, and additional required attributes for geometry_types.
9.1. Logical structure and mandatory coordinates for discrete sampling geometry featureTypes
9.2. The storage of a data variable using the orthogonal multidimensional array representation (subscripts in CDL order)
9.3. The storage of data using the incomplete multidimensional array representation (subscripts in CDL order)
9.4. The storage of data using the contiguous ragged representation (subscripts in CDL order)
9.5. The storage of data using the indexed ragged representation (subscripts in CDL order)
A.1. Attributes
C.1. Standard Name Modifiers
D.1. Consistent sets of values for the standard_names of formula terms and the computed_standard_name
E.1. Cell Methods
F.1. Grid Mapping Attributes
I.1. The elements of the CF-netCDF conventions
I.2. The constructs of the CF data model
J.1. Conversions and formulas used in the definitions of subsampling interpolation methods
K.2. Mesh topology attributes
List of Figures
4.1. Figure 4.1
7.1. Figure 7.1
7.2. Figure 7.2
8.1. Figure 8.1
8.2. Figure 8.2
8.3. Figure 8.3
8.4. Figure 8.4
I.1. Figure I.1
I.2. Figure I.2
I.3. Figure I.3
I.4. Figure I.4
I.5. Figure I.5
J.1. Figure J.1
J.2. Figure J.2
J.3. Figure J.3
J.4. Figure J.4
J.5. Figure J.5
List of Examples
2.1. String Variable Representations
3.1. Use of units_metadata to distinguish temperature quantities
3.2. Use of standard_name
3.3. Ancillary instrument data
3.4. Ancillary quality flag data
3.5. A flag variable, using flag_values
3.6. A flag variable, using flag_masks
3.7. A region variable, using flag_values
3.8. A flag variable, using flag_masks and flag_values
4.1. Latitude axis
4.2. Longitude axis
4.3. Atmosphere sigma coordinate
4.4. Example of a time coordinate variable
4.5. Use of units_metadata and calendar to define the treatment of leap seconds
4.6. Perpetual time axis
4.7. Paleoclimate time axis
5.1. Independent coordinate variables
5.2. Two-dimensional coordinate variables
5.3. Reduced horizontal grid
5.6. Rotated pole grid
5.7. "Lambert conformal projection"
5.8. Latitude and longitude on a spherical Earth
5.9. Latitude and longitude on the WGS 1984 datum
5.10. British National Grid
5.11. Latitude and longitude on the WGS 1984 datum + CRS WKT
5.12. British National Grid + Newlyn Datum in CRS WKT format
5.13. British National Grid + Newlyn Datum + referenced WGS84 Geodetic in CRS WKT format
5.14. "Multiple forecasts from a single analysis"
5.15. A domain with independent coordinate variables.
5.16. A domain with a rotated pole grid and a scalar coordinate variable.
5.17. A domain containing cell areas for a spherical geodesic grid.
5.18. A domain with no explicit dimensions.
5.19. A domain containing a timeseries geometry.
5.20. A domain containing a timeseries of station data in the indexed ragged array representation.
5.21. A two-dimensional UGRID mesh topology variable
6.1. Northward heat transport in Atlantic Ocean
6.1.2. Taxon names and identifiers
6.2. Model level numbers
7.1. Cells on a time axis
7.2. Cells in a non-latitude-longitude horizontal grid
7.3. Specifying formula_terms when a parametric coordinate variable has bounds.
7.4. Cell areas for a spherical geodesic grid
7.5. Methods applied to a timeseries
7.6. Surface air temperature variance
7.7. Mean surface temperature over land and sensible heat flux averaged separately over land and sea.
7.8. Thickness of sea-ice and snow on sea-ice averaged over sea area.
7.9. Climatological seasons
7.10. Decadal averages for January
7.11. Temperature for each hour of the average day
7.12. Extreme statistics and spell-lengths
7.13. Temperature for each hour of the typical climatological day
7.14. Monthly-maximum daily precipitation totals
7.15. Timeseries with geometry.
7.16. Polygons with holes
8.1. Horizontal compression of a three-dimensional array
8.2. Compression of a three-dimensional field
8.3. Two-dimensional tie point interpolation
8.4. One-dimensional tie point interpolation of two-dimensional domain.
8.5. Multiple interpolation variables with interpolation parameter attributes.
8.6. Combining a grid mapping and coordinate interpolation, with time as a non-interpolated dimension.
8.7. Interpolation of the 2D cell boundaries corresponding to Figure 8.4
8.8. Quantization performed by BitRound algorithm in libnetcdf
8.9. Quantization performed by Granular BitRound algorithm in NCO
B.1. A name table containing three entries
H.1. "Point data"
H.2. Timeseries with common element times in a time coordinate variable using the orthogonal multidimensional array representation.
H.3. Timeseries of station data in the incomplete multidimensional array representation.
H.4. A single timeseries.
H.5. A single timeseries with time-varying deviations from a nominal point spatial location
H.6. Timeseries of station data in the contiguous ragged array representation.
H.7. Timeseries of station data in the indexed ragged array representation.
H.8. "Atmospheric sounding profiles for a common set of vertical coordinates stored in the orthogonal multidimensional array representation."
H.9. Data from a single atmospheric sounding profile.
H.10. Atmospheric sounding profiles for a common set of vertical coordinates stored in the contiguous ragged array representation.
H.11. Atmospheric sounding profiles for a common set of vertical coordinates stored in the indexed ragged array representation.
H.12. Trajectories recording atmospheric composition in the incomplete multidimensional array representation.
H.13. A single trajectory recording atmospheric composition.
H.14. Trajectories recording atmospheric composition in the contiguous ragged array representation.
H.15. Trajectories recording atmospheric composition in the indexed ragged array representation.
H.16. Time series of atmospheric sounding profiles from a set of locations stored in a multidimensional array representation.
H.17. Time series of atmospheric sounding profiles from a set of locations stored in an orthogonal multidimensional array representation.
H.18. Time series of atmospheric sounding profiles from a single location stored in a multidimensional array representation.
H.19. Time series of atmospheric sounding profiles from a set of locations stored in a ragged array representation.
H.20. Time series of atmospheric sounding profiles along a set of trajectories stored in a multidimensional array representation.
H.21. Time series of atmospheric sounding profiles along a trajectory stored in a multidimensional array representation.
H.22. Time series of atmospheric sounding profiles along a set of trajectories stored in a ragged array representation.
I.1. A single CF-netCDF variable corresponding to two data model constructs.
About the authors
- Brian Eaton, NCAR
- Jonathan Gregory, University of Reading and UK Met Office Hadley Centre
- Bob Drach, PCMDI, LLNL
- Karl Taylor, PCMDI, LLNL
- Steve Hankin, PMEL, NOAA
- John Caron, UCAR
- Rich Signell, USGS
- Phil Bentley, UK Met Office Hadley Centre
- Greg Rappa, MIT
- Heinke Höck, DKRZ
- Alison Pamment, BADC
- Martin Juckes, BADC
- Martin Raspaud, SMHI
- Randy Horne, Excalibur Laboratories, Inc., Melbourne Beach Florida USA
- Jon Blower, University of Reading
- Timothy Whiteaker, University of Texas
- David Blodgett, USGS
- Charlie Zender, University of California, Irvine
- Daniel Lee, EUMETSAT
- David Hassell, NCAS and University of Reading
- Alan D. Snow, Corteva Agriscience
- Tobias Kölling, MPIM
- Dave Allured, CIRES/University of Colorado/NOAA/PSL
- Aleksandar Jelenak, HDF Group
- Anders Meier Soerensen, EUMETSAT
- Lucile Gaultier, OceanDataLab
- Sylvain Herlédan, OceanDataLab
- Fernando Manzano, Puertos del Estado
- Lars Bärring, SMHI
- Christopher Barker, NOAA
- Sadie Bartholomew, NCAS and University of Reading
Many others have contributed to the development of CF through their participation in discussions about proposed changes.
Abstract
This document describes the CF conventions for climate and forecast metadata designed to promote the processing and sharing of files created with the netCDF Application Programmer Interface [NetCDF]. The conventions define metadata that provide a definitive description of what the data in each variable represents, and of the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities.
The CF conventions generalize and extend the COARDS conventions [COARDS]. The extensions include metadata that provides a precise definition of each variable via specification of a standard name, describes the vertical locations corresponding to dimensionless vertical coordinate values, and provides the spatial coordinates of non-rectilinear gridded data. Since climate and forecast data are often not simply representative of points in space/time, other extensions provide for the description of coordinate intervals, multidimensional cells and climatological time coordinates, and indicate how a data value is representative of an interval or cell. This standard also relaxes the COARDS constraints on dimension order and specifies methods for reducing the size of datasets.
1. Introduction
1.1. Goals
The NetCDF library [NetCDF] is designed to read and write data that has been structured according to well-defined rules and is easily ported across various computer platforms. The netCDF interface enables but does not require the creation of self-describing datasets. The purpose of the CF conventions is to require conforming datasets to contain sufficient metadata that they are self-describing in the sense that each variable in the file has an associated description of what it represents, including physical units if appropriate, and that each value can be located in space (relative to earth-based coordinates) and time.
An important benefit of a convention is that it enables software tools to display data and perform operations on specified subsets of the data with minimal user intervention. It is possible to provide the metadata describing how a field is located in time and space in many different ways that a human would immediately recognize as equivalent. The purpose in restricting how the metadata is represented is to make it practical to write software that allows a machine to parse that metadata and to automatically associate each data value with its location in time and space. It is equally important that the metadata be easy for human users to write and to understand.
This standard is intended for use with climate and forecast data, for atmosphere, surface and ocean, and was designed with model-generated data particularly in mind. We recognise that there are limits to what a standard can practically cover; we restrict ourselves to issues that we believe to be of common and frequent concern in the design of climate and forecast metadata. Our main purpose, therefore, is to propose a clear, adequate and flexible definition of the metadata needed for climate and forecast data. Although this is specifically a netCDF standard, we feel that most of the ideas are of wider application. The metadata objects could be contained in file formats other than netCDF. Conversion of the metadata between files of different formats will be facilitated if conventions for all formats are based on similar ideas.
This convention is designed to be backward compatible with the COARDS conventions [COARDS], by which we mean that a conforming COARDS dataset also conforms to the CF standard. Thus new applications that implement the CF conventions will be able to process COARDS datasets.
We have also striven to maximize conformance to the COARDS standard, that is, wherever the COARDS metadata conventions provide an adequate description we require their use. Extensions to COARDS are implemented in a manner such that the content that doesn’t depend on the extensions is still accessible to applications that adhere to the COARDS standard.
1.2. Principles for design
The following principles are followed in the design of these conventions:
- CF-netCDF metadata is designed to make datasets self-describing as far as practically possible. A self-describing dataset is one which can be interpreted without need for reference to resources outside itself, and the CF principle is to minimise that need. Therefore CF-netCDF does not use codes, but instead relies on controlled vocabularies containing terms that are chosen to be self-explanatory (but more detailed definitions of them are provided in CF documents).
- The conventions are changed only as actually required by common use-cases, and not for needs which cannot be anticipated with certainty.
- In order to keep them logical, consistent in approach and as simple as possible, the netCDF conventions are devised with and within the conceptual framework of the CF data model, and new standard names are constructed as far as possible to follow the syntax and vocabulary of existing standard names.
- The conventions should be practicable for both producers and users of data.
- The metadata should be both easily readable by humans and easily parsable by programs.
- To avoid potential inconsistency within the metadata, the conventions should minimise redundancy.
- The conventions should minimise the possibility for mistakes by data-writers and data-readers.
- Conventions are provided to allow data-producers to describe the data they wish to produce, rather than attempting to prescribe what data they should produce; consequently most CF conventions are optional.
- Because many datasets remain in use for a long time after production, it is desirable that metadata written according to previous versions of the convention should also be compliant with and have the same interpretation under later versions.
- Because all previous versions must generally continue to be supported in software for the sake of archived datasets, and in order to limit the complexity of the conventions, there is a strong preference against introducing any new capability to the conventions when there is already some method that can adequately serve the same purpose (even if a different method would arguably be better than the existing one).
1.3. Terminology
The terms in this document that refer to components of a netCDF file are defined in the NetCDF User’s Guide (NUG) [NUG]. Some of those definitions are repeated below for convenience.
- ancestor group
-
A group from which the referring group is descended via direct parent-child relationships
- auxiliary coordinate variable
-
Any netCDF variable that contains coordinate data, but is not a coordinate variable (in the sense of that term defined by the [NUG] and used by this standard - see below). Unlike coordinate variables, there is no relationship between the name of an auxiliary coordinate variable and the name(s) of its dimension(s).
- boundary variable
-
A boundary variable is associated with a variable that contains coordinate data. When a data value provides information about conditions in a cell occupying a region of space/time or some other dimension, the boundary variable provides a description of cell extent.
- CDL syntax
-
The ASCII format used to describe the contents of a netCDF file is called CDL (network Common Data form Language). This format represents arrays using the indexing conventions of the C programming language, i.e., index values start at 0, and in multidimensional arrays, when indexing over the elements of the array, it is the last declared dimension that is the fastest varying in terms of file storage order. The netCDF utilities ncdump and ncgen use this format (see NUG section on CDL syntax). All examples in this document use CDL syntax.
- cell
-
A region in one or more dimensions whose boundary can be described by a set of vertices recorded in boundary variables. The term interval is sometimes used for one-dimensional cells. A two-dimensional cell is analogous to a pixel in a raster graphic, but is a more general concept (see Section 1.4, "Overview").
- calendar
-
A CF calendar defines an ordered set of valid datetimes with integer seconds.
- coordinate variable
-
A coordinate variable is a one-dimensional variable with the same name as its dimension e.g., time(time). In CF, a coordinate variable must be of a numeric data type (note that NUG section on coordinate variables does not have this requirement). The coordinate values must be in strict monotonic order (all values are different, and they are arranged in either consistently increasing or consistently decreasing order). Missing values are not allowed in coordinate variables. To avoid confusion with coordinate variables, CF does not permit a one-dimensional string-valued variable to have the same name as its dimension.
- datetime
-
The set of numbers which together identify an instant of time, namely its year, month, day, hour, minute and second, where the second may have a fraction but the others are all integer.
- grid mapping variable
-
A variable used as a container for attributes that define a specific grid mapping. The type of the variable is arbitrary since it contains no data.
- interpolation variable
-
A variable used as a container for attributes that define a specific interpolation method for uncompressing tie point variables. The type of the variable is arbitrary since it contains no data.
- latitude dimension
-
A dimension of a netCDF variable that has an associated latitude coordinate variable.
- local apex group
-
The nearest (to a referring group) ancestor group in which a dimension of an out-of-group coordinate is defined. The word "apex" refers to position of this group at the vertex of the tree of groups formed by it, the referring group, and the group where a coordinate is located.
- longitude dimension
-
A dimension of a netCDF variable that has an associated longitude coordinate variable.
- most rapidly varying dimension
-
The dimension of a multidimensional variable for which elements are adjacent in storage. When netCDF is represented in CDL, the most rapidly varying dimension is the last one e.g. x in float data(z,y,x). C and Python NumPy use the same order as CDL, also called "row-major order", but Fortran uses the opposite convention, also called "column-major order", so that when netCDF variables are accessed in Fortran the most rapidly varying dimension is the first one.
- multidimensional coordinate variable
-
An auxiliary coordinate variable that is multidimensional.
- nearest item
-
The item (variable or group) that can be reached via the shortest traversal of the file from the referring group following the rules set forth in the Section 2.7, "Groups".
- out-of-group reference
-
A reference to a variable or dimension that is not contained in the referring group.
- path
-
Paths must follow the UNIX style path convention and may begin with either a '/', '..', or a word.
- quantization variable
-
A variable used as a container for attributes that define a specific quantization algorithm. The type of the variable is arbitrary since it contains no data.
- recommendation
-
Recommendations in this convention are meant to provide advice that may be helpful for reducing common mistakes. In some cases we have recommended rather than required particular attributes in order to maintain backwards compatibility with COARDS. An application must not depend on a dataset’s adherence to recommendations.
- referring group
-
The group in which a reference to a variable or dimension occurs.
- scalar coordinate variable
-
A scalar variable (i.e. one with no dimensions) that contains coordinate data. Depending on context, it may be functionally equivalent either to a size-one coordinate variable (Section 5.7, "Scalar Coordinate Variables") or to a size-one auxiliary coordinate variable (Section 6.1, "Labels" and Section 9.2, "Collections, instances, and elements").
- sibling group
-
Any group with the same parent group as the referring group
- spatiotemporal dimension
-
A dimension of a netCDF variable that is used to identify a location in time and/or space.
- tie point variable
-
A netCDF variable that contains coordinates that have been compressed by sampling. There is no relationship between the name of a tie point variable and the name(s) of its dimension(s).
- time dimension
-
A dimension of a netCDF variable that has an associated time coordinate variable.
- vertex dimension
-
The dimension of a boundary variable along which the vertices of each cell are ordered.
- vertical dimension
-
A dimension of a netCDF variable that has an associated vertical coordinate variable.
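Two of the definitions above lend themselves to a short illustration: the strict monotonicity required of coordinate variable values, and the CDL storage order in which the last declared dimension varies fastest. The following is a minimal sketch in plain Python, not part of the conventions. The function names are hypothetical, and NaN is used here only as a stand-in for a missing value (a real netCDF variable would signal missing data through its attributes).

```python
from math import isnan
from typing import Sequence


def is_valid_coordinate_values(values: Sequence[float]) -> bool:
    """Return True if the values contain no NaN and are strictly monotonic."""
    if any(isnan(v) for v in values):
        return False  # NaN stands in for a missing value in this sketch
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing


def storage_offset(index: Sequence[int], shape: Sequence[int]) -> int:
    """Offset of an element when the last declared dimension varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        offset = offset * n + i
    return offset


# For a variable declared as float data(z,y,x) with sizes (2, 3, 4),
# stepping x moves one element, stepping y moves 4, stepping z moves 12.
shape = (2, 3, 4)
steps = [storage_offset(idx, shape) for idx in ((0, 0, 1), (0, 1, 0), (1, 0, 0))]
```

Elements that differ only in the index of x are adjacent in storage, which is what makes x the most rapidly varying dimension in this example.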
1.4. Overview
No variable or dimension names are standardized by this convention. Instead we follow the lead of the [NUG] and standardize only the names of attributes and some of the values taken by those attributes. Variable or dimension names can either be a single variable name or a path to a variable. The overview provided in this section will be followed with more complete descriptions in following sections. Appendix A, Attributes contains a summary of all the attributes used in this convention.
Files using this version of the CF Conventions must set the [NUG] defined attribute Conventions to contain the string value "CF-1.12-draft" to identify datasets that conform to these conventions.
The general description of a file’s contents should be contained in the following attributes: title, history, institution, source, comment and references (Section 2.6.2, "Description of file contents").
For backwards compatibility with COARDS none of these attributes is required, but their use is recommended to provide human readable documentation of the file contents.
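As an illustration of the two preceding paragraphs, a reader of a file might check its global attributes roughly as follows. This is a sketch only: the global attributes are assumed to have been read into a plain dictionary already, the function name is hypothetical, and the handling of several convention names (separated by blanks, or by commas when any comma is present) reflects our reading of the NUG rather than anything stated in this section.

```python
from typing import List, Mapping, Tuple

# Attributes recommended for describing the file contents.
RECOMMENDED = ("title", "history", "institution", "source", "comment", "references")


def check_global_attributes(
    attrs: Mapping[str, str], wanted: str = "CF-1.12-draft"
) -> Tuple[bool, List[str]]:
    """Return (declares the wanted convention, recommended attributes missing)."""
    conventions = attrs.get("Conventions", "")
    # Assumption: names are comma-separated if a comma appears, else blank-separated.
    separator = "," if "," in conventions else None
    names = [name.strip() for name in conventions.split(separator)]
    missing = [name for name in RECOMMENDED if not attrs.get(name)]
    return wanted in names, missing


declared, missing = check_global_attributes(
    {"Conventions": "CF-1.12-draft", "title": "Example dataset", "source": "model X"}
)
```

Here `declared` is true and `missing` lists the four description attributes that were not supplied. Their absence is not an error, since none of them is required.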
Each variable in a netCDF file has an associated description which is provided by the attributes units, long_name, and standard_name.
The units and long_name attributes are defined in the [NUG] and the standard_name attribute is defined in this document.
The units attribute is required for all variables that represent dimensional quantities (except for boundary variables defined in Section 7.1, "Cell Boundaries").
The values of the units attributes are character strings that are recognized by UNIDATA’s UDUNITS package [UDUNITS] (with exceptions allowed as discussed in Section 3.1, "Units").
The long_name and standard_name attributes are used to describe the content of each variable.
For backwards compatibility with COARDS neither is required, but use of at least one of them is strongly recommended.
The use of standard names will facilitate the exchange of climate and forecast data by providing unambiguous identification of variables most commonly analyzed.
Four types of coordinates receive special treatment by these conventions: latitude, longitude, vertical, and time. Every variable must have associated metadata that allows identification of each such coordinate that is relevant. Two independent parts of the convention allow this to be done. There are conventions that identify the variables that contain the coordinate data, and there are conventions that identify the type of coordinate represented by that data.
There are two methods used to identify variables that contain coordinate data.
The first is to use the [NUG]-defined "coordinate variables."
The use of coordinate variables is required for all dimensions that correspond to one dimensional space or time coordinates.
In cases where coordinate variables are not applicable, the variables containing coordinate data are identified by the coordinates attribute.
Once the variables containing coordinate data are identified, further conventions are required to determine the type of coordinate represented by each of these variables.
Latitude, longitude, and time coordinates are identified solely by the value of their units attribute.
Vertical coordinates with units of pressure may also be identified by the units attribute.
Other vertical coordinates must use the attribute positive which determines whether the direction of increasing coordinate value is up or down.
Because identification of a coordinate type by its units involves the use of an external package [UDUNITS], we provide the optional attribute axis for a direct identification of coordinates that correspond to latitude, longitude, vertical, or time axes.
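The identification rules just described can be sketched as a small classifier. This is a deliberately simplified illustration and not a conforming implementation: it compares units strings literally instead of consulting UDUNITS, the sets of unit spellings are a partial selection chosen for the example, and the function name is hypothetical. The axis attribute is used only as a fallback, following the summary given in this overview.

```python
from typing import Mapping, Optional

# Partial lists of unit spellings, for illustration only (UDUNITS accepts more).
LATITUDE_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N"}
LONGITUDE_UNITS = {"degrees_east", "degree_east", "degree_E", "degrees_E"}
PRESSURE_UNITS = {"Pa", "hPa", "bar", "millibar", "decibar", "atm"}
AXIS_TO_TYPE = {"X": "longitude", "Y": "latitude", "Z": "vertical", "T": "time"}


def coordinate_type(attrs: Mapping[str, str]) -> Optional[str]:
    """Guess the coordinate type from the units, positive and axis attributes."""
    units = attrs.get("units", "")
    if units in LATITUDE_UNITS:
        return "latitude"
    if units in LONGITUDE_UNITS:
        return "longitude"
    if " since " in units:  # e.g. "days since 1990-01-01"
        return "time"
    if units in PRESSURE_UNITS:
        return "vertical"
    if attrs.get("positive", "").lower() in {"up", "down"}:
        return "vertical"
    # Fall back on the optional axis attribute; None if nothing matches.
    return AXIS_TO_TYPE.get(attrs.get("axis", ""))


kinds = [
    coordinate_type({"units": "degrees_north"}),
    coordinate_type({"units": "days since 1990-01-01"}),
    coordinate_type({"units": "m", "positive": "up"}),
]
```

A height coordinate in metres cannot be recognised from its units alone, which is why the positive attribute is needed in the third case.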
Latitude, longitude, and time are defined by internationally recognized standards, and hence, identifying the coordinates of these types is sufficient to locate data values uniquely with respect to time and a point on the earth’s surface.
On the other hand identifying the vertical coordinate is not necessarily sufficient to locate a data value vertically with respect to the earth’s surface.
In particular a model may output data on the parametric (usually dimensionless) vertical coordinate used in its mathematical formulation.
To achieve the goal of being able to spatially locate all data values, this convention provides a mapping, via the standard_name and formula_terms attributes of a parametric vertical coordinate variable, between its values and dimensional vertical coordinate values that can be uniquely located with respect to a point on the earth’s surface (Section 4.3.3, "Parametric Vertical Coordinate"; Appendix D, Parametric Vertical Coordinates).
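To make the mapping concrete, the sketch below takes the atmosphere sigma coordinate as an example, for which pressure is obtained as p = ptop + sigma * (ps - ptop). The code assumes a single horizontal location and a single time so that ps is a scalar, and it assumes that formula_terms is a blank-separated sequence of "term: variable" pairs. The function names and the variable names in the example string are hypothetical.

```python
from typing import Dict, List, Sequence


def parse_formula_terms(formula_terms: str) -> Dict[str, str]:
    """Split a string such as 'sigma: lev ps: PS' into {term: variable name}."""
    tokens = formula_terms.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected 'term: variable' pairs")
    terms = {}
    for key, name in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"term {key!r} must end with a colon")
        terms[key[:-1]] = name
    return terms


def sigma_to_pressure(sigma: Sequence[float], ps: float, ptop: float) -> List[float]:
    """Pressure at each sigma level for one location: ptop + sigma * (ps - ptop)."""
    return [ptop + s * (ps - ptop) for s in sigma]


terms = parse_formula_terms("sigma: lev ps: PS ptop: PTOP")
pressure = sigma_to_pressure([0.0, 0.5, 1.0], ps=100000.0, ptop=1000.0)
```

With a surface pressure of 100000 Pa and a model top at 1000 Pa, the dimensionless levels 0, 0.5 and 1 correspond to 1000 Pa, 50500 Pa and 100000 Pa.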
It is often the case that data values are not representative of single points in time, space and other dimensions, but rather of intervals or multidimensional cells.
CF defines a bounds attribute to specify the extent of intervals or cells.
Because both the [NUG] and [COARDS] define coordinate variables but not cells or bounds, many applications assume that gridpoints are always located at the centers of their cells.
This assumption does not hold in CF. If bounds are not provided, the location of the gridpoint within the cell is undefined, and nothing can be assumed about the location and extent of the cell.
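The point that a gridpoint need not sit at the centre of its cell can be illustrated with a one-dimensional axis whose bounds are given as one (lower, upper) pair per coordinate value. The helper below is a hypothetical sketch, not part of the conventions, and the numbers are invented for the example.

```python
from typing import List, Sequence, Tuple


def relative_positions(
    coords: Sequence[float], bounds: Sequence[Tuple[float, float]]
) -> List[float]:
    """Position of each gridpoint within its cell: 0 = lower bound, 0.5 = centre, 1 = upper bound."""
    positions = []
    for value, (lower, upper) in zip(coords, bounds):
        if lower == upper:
            raise ValueError("cell has zero extent")
        positions.append((value - lower) / (upper - lower))
    return positions


# Three cells of unequal width; only the first and third gridpoints are centred.
coords = [0.5, 2.0, 6.0]
bounds = [(0.0, 1.0), (1.0, 2.0), (2.0, 10.0)]
positions = relative_positions(coords, bounds)
```

The second gridpoint lies on the upper edge of its cell. Without the bounds, an application would have no basis for deciding whether that value describes the interval from 1 to 2 or some other interval around it.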
A two-dimensional cell is analogous to a pixel in a raster graphic, but is a more general concept. Pixels in a raster are evenly spaced in each dimension and arranged in a logically rectangular array. Two-dimensional cells in a CF field do not necessarily satisfy either of those conditions, though they commonly do. Furthermore, as an alternative to cells in two dimensions, CF defines a convention for the case where each data value is associated with a geographical feature that is described by one or more points, lines or polygons.
When data that is representative of cells can be described by simple statistical methods (for instance, mean or maximum), those methods can be indicated using the cell_methods attribute.
An important application of this attribute is to describe climatological and diurnal statistics.
Methods for reducing the total volume of data include both packing and compression.
Packing reduces the data volume by reducing the precision of the stored numbers.
It is implemented using the attributes add_offset and scale_factor which are defined in the [NUG].
Compression on the other hand loses no precision, but reduces the volume by not storing missing data.
The attribute compress is defined for this purpose.
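The difference between the two approaches can be sketched in a few lines. Packing is shown with the usual relation unpacked = packed * scale_factor + add_offset. Compression is shown as a list of the positions of the retained points in the flattened array together with their values. The function names and the fill value of -999 are assumptions made for this illustration, and the details of how the compress attribute is attached to variables are left to Section 8.

```python
from typing import List, Sequence, Tuple


def pack(value: float, scale_factor: float, add_offset: float) -> int:
    """Store a value as an integer; precision is limited to scale_factor."""
    return round((value - add_offset) / scale_factor)


def unpack(packed: int, scale_factor: float, add_offset: float) -> float:
    """Recover an approximation of the original value."""
    return packed * scale_factor + add_offset


def gather(flat_values: Sequence[float], fill: float) -> Tuple[List[int], List[float]]:
    """Keep only non-missing points, remembering their positions."""
    indices = [i for i, v in enumerate(flat_values) if v != fill]
    return indices, [flat_values[i] for i in indices]


def scatter(indices: Sequence[int], values: Sequence[float], size: int, fill: float) -> List[float]:
    """Rebuild the full array, restoring the fill value where nothing was stored."""
    full = [fill] * size
    for i, v in zip(indices, values):
        full[i] = v
    return full


# Packing loses precision: 287.654 comes back as approximately 287.65.
packed = pack(287.654, scale_factor=0.01, add_offset=273.15)
restored = unpack(packed, scale_factor=0.01, add_offset=273.15)

# Gathering loses nothing: the stored values are returned exactly.
field = [-999.0, 1.5, -999.0, 2.5, 3.5, -999.0]
indices, kept = gather(field, fill=-999.0)
rebuilt = scatter(indices, kept, size=len(field), fill=-999.0)
```

In this example half of the points are missing, so only three values and three indices need to be stored in place of six values.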
1.5. Relationship to the COARDS Conventions
These conventions generalize and extend the COARDS conventions [COARDS]. A major design goal has been to maintain backward compatibility with COARDS. Hence applications written to process datasets that conform to these conventions will also be able to process COARDS conforming datasets. We have also striven to maximize conformance to the COARDS standard so that datasets that only require the metadata that was available under COARDS will still be able to be processed by COARDS conforming applications. But because of the extensions that provide new metadata content, and the relaxation of some COARDS requirements, datasets that conform to these conventions will not necessarily be recognized by applications that adhere to the COARDS conventions. The features of these conventions that allow writing netCDF files that are not COARDS conforming are summarized below.
COARDS standardizes the description of grids composed of independent latitude, longitude, vertical, and time axes. In addition to standardizing the metadata required to identify each of these axis types, COARDS requires (time, vertical, latitude, longitude) as the CDL order for the dimensions of a variable, with longitude being the most rapidly varying dimension (the last dimension in CDL order). Because of I/O performance considerations it may not be possible for models to output their data in conformance with the COARDS requirement. The CF convention places no rigid restrictions on the order of dimensions; however, we encourage data producers to make the extra effort to stay within the COARDS standard order. The use of non-COARDS axis ordering will render files inaccessible to some applications and limit interoperability. Often a buffering operation can be used to minimize performance penalties when axis ordering in model code does not match the axis ordering of a COARDS file.
COARDS addresses the issue of identifying dimensionless vertical coordinates, but does not provide any mechanism for mapping the dimensionless values to dimensional ones that can be located with respect to the earth’s surface.
For backwards compatibility we continue to allow (but do not require) the units
attribute of dimensionless vertical coordinates to take the values "level", "layer", or "sigma_level".
But we recommend that the standard_name
and formula_terms
attributes be used to identify the appropriate definition of the dimensionless vertical coordinate (see Section 4.3.3, "Parametric Vertical Coordinate").
The CF conventions define attributes which enable the description of data properties that are outside the scope of the COARDS conventions. These new attributes do not violate the COARDS conventions, but applications that only recognize COARDS conforming datasets will not have the capabilities that the new attributes are meant to enable. Briefly the new attributes allow:
- Identification of quantities using standard names.
- Description of dimensionless vertical coordinates.
- Associating dimensions with auxiliary coordinate variables.
- Linking data variables to scalar coordinate variables.
- Associating dimensions with labels.
- Description of intervals and cells.
- Description of properties of data defined on intervals and cells.
- Description of climatological statistics.
- Data compression for variables with missing values.
1.6. UGRID Conventions
These conventions implicitly incorporate parts of the UGRID conventions for storing unstructured (or flexible mesh) data in netCDF files using mesh topologies [UGRID]. Only version 1.0 of the UGRID conventions is allowed. The UGRID conventions description is referenced from, rather than rewritten into, this document and the canonical description of how to store mesh topologies is only to be found at [UGRID]. A summary indicating how UGRID relates to other parts of the CF conventions, and which features of UGRID are excluded from CF, can be found in Section 5.9, "Mesh Topology Variables". To reduce the chance of ambiguities arising from their accidental re-use, all of the UGRID standardized attributes are specified in Appendix K, Mesh Topology Attributes and Appendix A, Attributes.
The UGRID conventions have their own conformance document, which should be used in conjunction with the CF conformance document when checking the validity of datasets.
2. NetCDF Files and Components
The components of a netCDF file are described in section 2 of the [NUG]. In this section we describe conventions associated with filenames and the basic components of a netCDF file. We also introduce new attributes for describing the contents of a file.
2.2. Data Types
Data variables must be one of the following data types: string, char, byte, unsigned byte, short, unsigned short, int, unsigned int, int64, unsigned int64, float or real, and double (which are all the netCDF external data types supported by netCDF-4).
The string
type, which has variable length, is only available in files using the netCDF version 4 (netCDF-4) format.
The char
and string
types are not intended for numeric data.
One byte numeric data should be stored using the byte
or unsigned byte
data types.
It is possible to treat the byte
and short
types as unsigned by using the [NUG] convention of indicating the unsigned range using the valid_min
, valid_max
, or valid_range
attributes.
In many situations, any integer type may be used.
When the phrase "integer type" is used in this document, it should be understood to mean byte, unsigned byte, short, unsigned short, int, unsigned int, int64, or unsigned int64.
A text string can be stored either in a variable-length string
or in a fixed-length char
array.
In both cases, text strings must be represented in Unicode Normalization Form C (NFC, section 3.11 and Annex 15 of the Unicode standard) and encoded according to UTF-8.
A text string consisting only of ASCII characters is guaranteed to conform with this requirement, because the ASCII characters are a subset of Unicode, and their NFC UTF-8 encodings are the same as their one-byte ASCII codes (decimal 0-127, hexadecimal 00-7F).
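The requirement above can be illustrated with a minimal Python sketch that uses only the standard library. The function name `to_cf_text` is hypothetical and is not part of CF or of any netCDF library; it simply normalizes a string to NFC and encodes it as UTF-8 before it would be written to a file.

```python
import unicodedata


def to_cf_text(text):
    """Return the NFC-normalized, UTF-8 encoded bytes for a text string.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFC", text).encode("utf-8")


# "e" followed by a combining acute accent (a decomposed form, not NFC).
decomposed = "Cafe\u0301"
encoded = to_cf_text(decomposed)

# ASCII-only text is unchanged by normalization and encodes to its ASCII bytes.
ascii_only = to_cf_text("May")
```

After normalization the decomposed sequence becomes the single precomposed character U+00E9, so two visually identical strings compare equal byte for byte.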
Before version 1.12, CF did not require UTF-8 encoding, and did not provide or endorse any convention to record what encoding was used.
However, if the text string is stored in a char
variable, the encoding might be recorded by the _Encoding
attribute, although this is not a CF or NUG convention.
An n-dimensional array of strings may be implemented as a variable or an attribute of type string
with n dimensions (only n=1 is allowed for an attribute) or as a variable of type char
with n+1 dimensions, where the most rapidly varying dimension (the last dimension in CDL order) is large enough to contain the longest string in the variable.
For example, a char
variable containing the names of the months would be dimensioned (12,9) in order to accommodate "September", the month with the longest name.
The other strings, such as "May", would be padded with trailing NULL or space characters so that every array element is filled.
A string
variable to store the same information would be dimensioned (12), with each element of the array containing a string of the appropriate length.
The CDL example below shows one variable of each type.
dimensions:
  strings = 30 ;
  strlen = 10 ;
variables:
  char char_variable(strings,strlen) ;
    char_variable:long_name = "strings of type char" ;
  string str_variable(strings) ;
    str_variable:long_name = "strings of type string" ;
The examples in this document that use string-valued variables alternate between these two forms.
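As a rough illustration of the fixed-length char representation described above, the following Python sketch pads the month names to a common length. The helper names are hypothetical, the padding character is assumed to be NULL, and string lengths are measured in UTF-8 bytes on the assumption that each char element holds one byte.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def to_char_array(strings, pad=b"\x00"):
    """Encode strings as equal-length byte rows, padded on the right.

    Returns the rows and the (n, strlen) shape a char variable would need.
    """
    encoded = [s.encode("utf-8") for s in strings]
    strlen = max(len(b) for b in encoded)
    rows = [b.ljust(strlen, pad) for b in encoded]
    return rows, (len(rows), strlen)


def from_char_row(row):
    """Strip trailing NULL or space padding and decode one row."""
    return row.rstrip(b"\x00 ").decode("utf-8")


rows, shape = to_char_array(MONTHS)
```

Here `shape` is (12, 9), matching the dimensions given in the text, and `from_char_row` recovers the original string from a padded row.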
2.3. Naming Conventions
It is recommended that variable, dimension, attribute and group names begin with a letter and be composed of letters, digits, and underscores.
By the word letters we mean the standard ASCII letters uppercase A to Z and lowercase a to z.
By the word digits we mean the standard ASCII digits 0 to 9, and similarly underscores means the standard ASCII underscore _.
Note that this is in conformance with the COARDS conventions, but is more restrictive than the netCDF interface which allows almost all Unicode characters encoded as multibyte UTF-8 characters (NUG Appendix B).
The netCDF interface also allows leading underscores in names, but the NUG states that this is reserved for system use.
Case is significant in netCDF names, but it is recommended that names should not be distinguished purely by case, i.e., if case is disregarded, no two names should be the same. It is also recommended that names should be obviously meaningful, if possible, as this renders the file more effectively self-describing.
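The naming recommendations above could be checked along the following lines. This is a sketch, not a conformance checker: the function names are hypothetical, and the regular expression encodes only the "begin with a letter; letters, digits, and underscores" recommendation.

```python
import re

# A letter first, then any number of ASCII letters, digits, or underscores.
_RECOMMENDED_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_recommended_name(name):
    """True if the name follows the recommended character set."""
    return _RECOMMENDED_NAME.fullmatch(name) is not None


def case_collisions(names):
    """Return groups of names that differ only by case."""
    by_lower = {}
    for name in names:
        by_lower.setdefault(name.lower(), []).append(name)
    return [group for group in by_lower.values() if len(group) > 1]
```

For example, `case_collisions(["lat", "Lat", "lon"])` reports `lat` and `Lat` as a pair that should not coexist under the recommendation.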
This convention does not standardize any variable or dimension names.
Attribute names and their contents, where standardized, are given in English in this document and should appear in English in conforming netCDF files for the sake of portability.
Languages other than English are permitted for variables, dimensions, and non-standardized attributes.
The contents of some standardized attributes are string values that are not standardized, and thus are not required to be in English.
For example, a description of what a variable represents may be given in a non-English language using the long_name
attribute (see Section 3.2, "Long Name") whose contents are not standardized, but a description given by the standard_name
attribute (see Section 3.3, "Standard Name") must be taken from the standard name table which is in English.
2.4. Dimensions
A variable may have any number of dimensions, including zero, and the dimensions must all have different names. COARDS strongly recommends limiting the number of dimensions to four, but we wish to allow greater flexibility. The dimensions of the variable define the axes of the quantity it contains. Dimensions other than those of space and time may be included. Several examples can be found in this document. Under certain circumstances, one may need more than one dimension in a particular quantity. For instance, a variable containing a two-dimensional probability density function might correlate the temperature at two different vertical levels, and hence would have temperature on both axes.
If any or all of the dimensions of a variable have the interpretations of "date or time" (T), "height or depth" (Z), "latitude" (Y), or "longitude" (X) then we recommend, but do not require (see Section 1.5, "Relationship to the COARDS Conventions"), those dimensions to appear in the relative order T, then Z, then Y, then X in the CDL definition corresponding to the file.
All other dimensions should, whenever possible, be placed to the left of the spatiotemporal dimensions.
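The recommended ordering can be expressed as a small check. In this sketch the input is assumed to be a list of axis labels in CDL order, using "T", "Z", "Y", and "X" for the spatiotemporal axes and any other string for other dimensions; the function name is hypothetical.

```python
_ORDER = {"T": 0, "Z": 1, "Y": 2, "X": 3}


def follows_recommended_order(axes):
    """True if other dimensions come first, followed by T, Z, Y, X in that relative order."""
    seen_spatiotemporal = False
    last_rank = -1
    for axis in axes:
        rank = _ORDER.get(axis)
        if rank is None:
            # A non-spatiotemporal dimension to the right of T/Z/Y/X.
            if seen_spatiotemporal:
                return False
        else:
            # Spatiotemporal axes must appear in strictly increasing rank.
            if rank <= last_rank:
                return False
            seen_spatiotemporal = True
            last_rank = rank
    return True
```

Because the ordering is a recommendation and not a requirement, a result of `False` indicates reduced COARDS compatibility, not a violation of CF.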
Dimensions may be of any size, including unity. When a single value of some coordinate applies to all the values in a variable, the recommended means of attaching this information to the variable is by use of a dimension of size unity with a one-element coordinate variable. It is also acceptable to use a scalar coordinate variable which eliminates the need for an associated size one dimension in the data variable. The advantage of using either a coordinate variable or an auxiliary coordinate variable is that all its attributes can be used to describe the single-valued quantity, including boundaries. For example, a variable containing data for temperature at 1.5 m above the ground has a single-valued coordinate supplying a height of 1.5 m, and a time-mean quantity has a single-valued time coordinate with an associated boundary variable to record the start and end of the averaging period.
2.5. Variables
This convention does not standardize variable names.
NetCDF variables that contain coordinate data are referred to as coordinate variables, auxiliary coordinate variables, scalar coordinate variables, or multidimensional coordinate variables.
2.5.1. Missing data, valid and actual range of data
NUG Appendix A, Attribute Conventions, provides the _FillValue, missing_value, valid_min, valid_max, and valid_range attributes to indicate missing data.
Missing data is allowed in data variables and auxiliary coordinate variables.
Generic applications should treat the data as missing where any auxiliary coordinate variables have missing values; special-purpose applications might be able to make use of the data.
Missing data is not allowed in coordinate variables.
The NUG conventions for missing data changed significantly between version 2.3 and version 2.4.
Since version 2.4 the NUG defines missing data as all values outside of the valid_range
, and specifies how the valid_range
should be defined from the _FillValue
(which has library specified default values) if it hasn’t been explicitly specified.
If only one missing value is needed for a variable then we recommend that this value be specified using the _FillValue
attribute.
Doing this guarantees that the missing value will be recognized by generic applications that follow either the before or after version 2.4 conventions.
The scalar attribute with the name _FillValue
and of the same type as its variable is recognized by the netCDF library as the value used to pre-fill disk space allocated to the variable.
This value is considered to be a special value that indicates undefined or missing data, and is returned when reading values that were not written.
The _FillValue
should be outside the range specified by valid_range
(if used) for a variable.
The netCDF library defines a default fill value for each data type (See the "Note on fill values" in NUG Appendix B, File Format Specifications).
The missing values of a variable with scale_factor
and/or add_offset
attributes (see Section 8.1, "Packed Data") are interpreted relative to the variable’s external values (a.k.a. the packed values, the raw values, the values stored in the netCDF file), not the values that result after the scale and offset are applied.
Applications that process variables that have attributes to indicate both a transformation (via a scale and/or offset) and missing values should first check that a data value is valid, and then apply the transformation.
Note that values that are identified as missing should not be transformed.
Since the missing value is outside the valid range it is possible that applying a transformation to it could result in an invalid operation.
For example, the default _FillValue
is very close to the maximum representable value of IEEE single precision floats, and multiplying it by 100 produces an "Infinity" (using single precision arithmetic).
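The order of operations described above (test for missing values on the stored values first, then apply the transformation) might be sketched as follows. The function name and its keyword arguments are hypothetical, and missing values are represented by `None` purely for illustration.

```python
def unpack(packed_values, scale_factor=1.0, add_offset=0.0,
           fill_value=None, valid_range=None):
    """Unpack stored values, leaving missing values untransformed (as None).

    Missing-value tests use the packed (stored) values, not the unpacked ones.
    """
    unpacked = []
    for value in packed_values:
        is_missing = fill_value is not None and value == fill_value
        if valid_range is not None:
            low, high = valid_range
            is_missing = is_missing or not (low <= value <= high)
        if is_missing:
            unpacked.append(None)  # never scale or offset a missing value
        else:
            unpacked.append(value * scale_factor + add_offset)
    return unpacked
```

For instance, with `scale_factor=0.01`, `add_offset=273.15`, and `fill_value=-32767`, the stored value 100 unpacks to approximately 274.15, while -32767 is reported as missing without being transformed.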
This convention defines a two-element vector attribute actual_range
for variables containing numeric data.
If the variable is packed using the scale_factor
and add_offset
attributes (see Section 8.1, "Packed Data"), the elements of the actual_range
should have the type intended for the unpacked data.
The elements of actual_range
must be exactly equal to the minimum and the maximum data values which occur in the variable (when unpacked if packing is used), and both must be within the valid_range
if specified.
If the data is all missing or invalid, the actual_range
attribute cannot be used.
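A minimal sketch of how a writer might derive actual_range is given below. It assumes the data have already been unpacked and that missing or invalid values are represented by `None`; the function name is hypothetical.

```python
def compute_actual_range(unpacked_values):
    """Return (min, max) of the non-missing values, or None if there are none.

    A result of None means the actual_range attribute should be omitted.
    """
    present = [v for v in unpacked_values if v is not None]
    if not present:
        return None
    return (min(present), max(present))
```

Returning `None` for all-missing data mirrors the rule that the attribute cannot be used in that case.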
2.6. Attributes
This standard describes many attributes (some mandatory, others optional), but a file may also contain non-standard attributes. Such attributes do not represent a violation of this standard. Application programs should ignore attributes that they do not recognise or which are irrelevant for their purposes. Conventional attribute names should be used wherever applicable. Non-standard names should be as meaningful as possible. Before introducing an attribute, consideration should be given to whether the information would be better represented as a variable. In general, if a proposed attribute requires ancillary data to describe it, is multidimensional, requires any of the defined netCDF dimensions to index its values, or requires a significant amount of storage, a variable should be used instead. When this standard defines string attributes that may take various prescribed values, the possible values are generally given in lower case. However, applications programs should not be sensitive to case in these attributes. Several string attributes are defined by this standard to contain "blank-separated lists". Consecutive words in such a list are separated by one or more adjacent spaces. The list may begin and end with any number of spaces. See Appendix A, Attributes for a list of attributes described by this standard.
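The rules for blank-separated lists and case-insensitive prescribed values can be sketched in a few lines of Python. Both function names are hypothetical helpers, not part of any netCDF library.

```python
def parse_blank_separated(value):
    """Split a blank-separated list, tolerating repeated, leading, and trailing spaces."""
    return [word for word in value.split(" ") if word]


def attribute_value_matches(value, expected):
    """Compare a prescribed attribute value without sensitivity to case."""
    return value.strip().lower() == expected.lower()
```

Splitting on a single space and discarding empty strings treats any run of adjacent spaces as one separator.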
2.6.1. Identification of Conventions
Files that follow this version of the CF Conventions must indicate this by setting the [NUG] defined global attribute Conventions to a string value that contains "CF-1.12-draft".
The Conventions version number contained in that string can be used to find the web-based version of this document from the netCDF Conventions web page.
Subsequent versions of the CF Conventions will not make invalid a compliant usage of this or earlier versions of the CF terms and forms.
It is possible for a netCDF file to adhere to more than one set of conventions, even when there is no inheritance relationship among the conventions. In this case, the value of the Conventions attribute may be a single text string containing a list of the convention names separated by blank space (recommended) or commas (if a convention name contains blanks). This is the Unidata recommended syntax from NetCDF Users Guide, Appendix A. If the string contains any commas, it is assumed to be a comma-separated list.
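The parsing rule for a multi-convention Conventions string (comma-separated if any comma is present, otherwise blank-separated) could be implemented roughly as below. The function names are hypothetical, and the "CF-" prefix test is an assumption based on the form of the string given above.

```python
def parse_conventions(value):
    """Split a Conventions attribute into individual convention names."""
    if "," in value:
        # Any comma means the whole string is a comma-separated list.
        names = [name.strip() for name in value.split(",")]
    else:
        names = value.split()
    return [name for name in names if name]


def cf_version(value):
    """Return the text after 'CF-' for the first CF entry, or None if absent."""
    for name in parse_conventions(value):
        if name.startswith("CF-"):
            return name[len("CF-"):]
    return None
```

With this rule, `"CF-1.12, My Convention"` yields two names, the second of which contains a blank.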
When CF is listed with other conventions, this asserts the same full compliance with CF requirements and interpretations as if CF was the sole convention. It is the responsibility of the data-writer to ensure that all common metadata is used with consistent meaning between conventions.
The UGRID conventions, which are fully incorporated into the CF conventions, do not need to be included in the Conventions
attribute.
2.6.2. Description of file contents
The following attributes are intended to provide information about where the data came from and what has been done to it. This information is mainly for the benefit of human readers. The attribute values are all character strings. For readability in ncdump outputs it is recommended to embed newline characters into long strings to break them into lines. For backwards compatibility with COARDS none of these global attributes is required.
The [NUG] defines title
and history
to be global attributes.
We wish to allow the newly defined attributes, i.e., institution
, source
, references
, and comment
, to be either global or assigned to individual variables.
When an attribute appears both globally and as a variable attribute, the variable’s version has precedence.
title
-
A succinct description of what is in the dataset.
institution
-
Specifies where the original data was produced.
source
-
The method of production of the original data. If it was model-generated, source should name the model and its version, as specifically as could be useful. If it is observational, source should characterize it (e.g., "surface observation" or "radiosonde").
history
-
Provides an audit trail for modifications to the original data. Well-behaved generic netCDF filters will automatically append their name and the parameters with which they were invoked to the global history attribute of an input netCDF file. We recommend that each line begin by indicating the date and time of day that the program was executed.
references
-
Published or web-based references that describe the data or methods used to produce it.
comment
-
Miscellaneous information about the data or methods used to produce it.
2.6.3. External Variables
The global external_variables
attribute is a blank-separated list of the names of variables which are named by attributes in the file but which are not present in the file.
These variables are to be found in other files (called "external files") but CF does not provide conventions for identifying the files concerned.
The only attribute for which CF standardises the use of external variables is cell_measures
.
2.7. Groups
Groups provide a powerful mechanism to structure data hierarchically. This convention does not standardize group names. It may be of benefit to name groups in such a way that human readers can interpret them. However, files that conform to this standard shall not require software to interpret or decode information from group names. References to out-of-group variables and dimensions shall be found by applying the scoping rules outlined below.
2.7.1. Scope
The scoping mechanism is in keeping with the following principle:
"Dimensions are scoped such that they are visible to all child groups. For example, you can define a dimension in the root group, and use its dimension id when defining a variable in a sub-group."
Any variable or dimension can be referred to, as long as it can be found with one of the following search strategies:
- Search by absolute path
- Search by relative path
- Search by proximity
These strategies are explained in detail in the following sections.
If any dimension of an out-of-group variable has the same name as a dimension of the referring variable, the two must be the same dimension (i.e. they must have the same netCDF dimension ID).
Search by absolute path
A variable or dimension specified with an absolute path (i.e., with a leading slash "/") is at the indicated location relative to the root group, as in a UNIX-style file convention.
For example, a coordinates
attribute of /g1/lat
refers to the lat
variable in group /g1
.
Search by relative path
As in a UNIX-style file convention, a variable or dimension specified with a relative path (i.e., containing a slash but not with a leading slash, e.g. child/lat
) is at the location obtained by affixing the relative path to the absolute path of the referring attribute.
For example, a coordinates
attribute of g1/lat
refers to the lat
variable in subgroup g1
of the current (referring) group.
Upward path traversals from the current group are indicated with the UNIX convention.
For example, ../g1/lat
refers to the lat
variable in the sibling group g1
of the current (referring) group.
Search by proximity
A variable or dimension specified with no path (for example, lat
) refers to the variable or dimension of that name, if there is one, in the referring group.
If not, the ancestors of the referring group are searched for it, starting from the direct ancestor and proceeding toward the root group, until it is found.
A special case exists for coordinate variables. Because coordinate variables must share dimensions with the variables that reference them, the ancestor search is executed only until the local apex group is reached. For coordinate variables that are not found in the referring group or its ancestors, a further strategy is provided, called lateral search. The lateral search proceeds downwards from the local apex group width-wise through each level of groups until the sought coordinate is found. The lateral search algorithm may only be used for [NUG] coordinate variables; it shall not be used for auxiliary coordinate variables.
Note
Use of the lateral search strategy to find coordinate variables is discouraged. It is allowed mainly for backwards compatibility with existing datasets, and may be deprecated in future versions of the standard.
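The three basic search strategies (absolute path, relative path, and proximity) can be sketched as follows. This is an illustrative model only: the file is assumed to be represented as a dictionary mapping absolute group paths to sets of variable names, the function names are hypothetical, and the lateral search for coordinate variables is deliberately not implemented.

```python
import posixpath


def _exists(groups, variable_path):
    """True if the absolute variable path names a variable in the model."""
    group, name = posixpath.split(variable_path)
    return name in groups.get(group, set())


def resolve_reference(groups, referring_group, reference):
    """Return the absolute path of the referenced variable, or None if not found."""
    if reference.startswith("/"):
        # Search by absolute path.
        candidate = posixpath.normpath(reference)
        return candidate if _exists(groups, candidate) else None
    if "/" in reference:
        # Search by relative path, including upward traversal with "..".
        candidate = posixpath.normpath(posixpath.join(referring_group, reference))
        return candidate if _exists(groups, candidate) else None
    # Search by proximity: the referring group first, then each ancestor up to the root.
    group = referring_group
    while True:
        candidate = posixpath.join(group, reference)
        if _exists(groups, candidate):
            return candidate
        if group == "/":
            return None
        group = posixpath.dirname(group)


example_groups = {
    "/": {"time"},
    "/g1": {"lat"},
    "/g1/sub": {"data"},
    "/g2": {"lat", "lon"},
}
```

In this model a bare `lat` referenced from `/g1/sub` resolves to `/g1/lat`, whereas `lon` is not found, because it lives only in the sibling group `/g2` and would require a path or the lateral search.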
2.7.2. Application of attributes
The following attributes are optional for non-root groups. They are allowed in order to provide additional provenance and description of the subsidiary data. They do not override attributes from parent groups.
- title
- history
If these attributes are present, they may be applied additively to the parent attributes of the same name. If a file containing groups is modified, the user or application need only update these attributes in the root group, rather than traversing all groups and updating all attributes that are found with the same name. In the case of conflicts, the root group attribute takes precedence over per-group instances of these attributes.
The following attributes may only be used in the root group and shall not be duplicated or overridden in child groups:
- Conventions
- external_variables
Furthermore, per-variable attributes must be attached to the variables to which they refer. They may not be attached to a group, even if all variables within that group use the same attribute and value.
If attributes are present within groups without being attached to a variable, these attributes apply to the group where they are defined, and to that group’s descendants, but not to ancestor or sibling groups. If a group attribute is defined in a parent group, and one of the child groups redefines the same attribute, the definition within the child group applies for the child and all of its descendants.
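The inheritance rule for group attributes in the preceding paragraph amounts to "the nearest definition on the path to the root applies". A minimal sketch follows, assuming group attributes are held in a dictionary keyed by absolute group path; the function name is hypothetical, and the special handling of title, history, Conventions, and external_variables described earlier is not modelled.

```python
import posixpath


def resolve_group_attribute(group_attributes, group_path, name):
    """Return the value that applies to group_path, or None if undefined."""
    path = group_path
    while True:
        attributes = group_attributes.get(path, {})
        if name in attributes:
            return attributes[name]
        if path == "/":
            return None
        path = posixpath.dirname(path)  # move to the parent group


example_attributes = {
    "/": {"institution": "Centre A"},
    "/g1": {"institution": "Centre B"},
    "/g1/sub": {},
    "/g2": {},
}
```

Here `/g1/sub` inherits the redefinition made in `/g1`, while the sibling group `/g2` sees only the root value.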
3. Description of the Data
The attributes described in this section are used to provide a description of the content and the units of measurement for each variable.
We continue to support the use of the units
and long_name
attributes as defined in COARDS.
We extend COARDS by adding the optional standard_name
attribute which is used to provide unique identifiers for variables.
This is important for data exchange since one cannot necessarily identify a particular variable based on the name assigned to it by the institution that provided the data.
The standard_name
attribute can be used to identify variables that contain coordinate data.
But since it is an optional attribute, applications that implement these standards must continue to be able to identify coordinate types based on the COARDS conventions.
3.1. Units
The units
attribute is required for all variables that represent dimensional quantities (except for boundary variables defined in Section 7.1, "Cell Boundaries" and climatology variables defined in Section 7.4, "Climatological Statistics").
The units
attribute is permitted but not required for dimensionless quantities (see Section 3.1.1, "Dimensionless units").
The value of the units
attribute is a string that can be recognized by the UDUNITS package [UDUNITS], with the exceptions that are given in Section 3.1.1, "Dimensionless units" and Section 3.1.3, "Scale factors and offsets".
Note that case is significant in the units
strings.
Note also that CF depends on UDUNITS only for the definition of legal units
strings.
CF does not assume or require that the UDUNITS software will be used for units
conversion.
In most units
conversions, the sole operation on the data is multiplication by a scale factor.
Special treatment is required in converting the units
of variables that involve temperature (Section 3.1.2, "Temperature units") and the units
of time coordinate variables (Section 4.4, "Time Coordinate").
The COARDS convention prohibits the unit degrees
altogether, but this unit is not forbidden by the CF convention because it may in fact be appropriate for a variable containing, say, solar zenith angle.
The unit degrees
is also allowed on coordinate variables such as the latitude and longitude coordinates of a transformed grid.
In this case the coordinate values are not true latitudes and longitudes, which must always be identified using the more specific forms of degrees
as described in Section 4.1, "Latitude Coordinate" and Section 4.2, "Longitude Coordinate".
3.1.1. Dimensionless units
A variable with no units
attribute is assumed to be dimensionless.
However, a units
attribute specifying a dimensionless unit may optionally be included.
The canonical unit (see also Section 3.3, "Standard Name") for dimensionless quantities that represent fractions, or parts of a whole, is 1
.
The UDUNITS package defines a few dimensionless units, such as percent
, ppm
(parts per million, 1e-6), and ppb
(parts per billion, 1e-9).
As an alternative to the canonical units
of 1
or some other unitless number, the units
for a dimensionless quantity may be given as a ratio of dimensional units, for instance mg kg-1
for a mass ratio of 1e-6, or microlitre litre-1
for a volume ratio of 1e-6. Data-producers are invited to consider whether this alternative would be more helpful to the users of their data.
The CF convention supports dimensionless units that are UDUNITS compatible, with one exception, concerning the dimensionless units defined by UDUNITS for volume ratios, such as ppmv
and ppbv
.
These units are allowed in the units
attribute by CF only if the data variable has no standard_name
.
These units are prohibited by CF if there is a standard_name
, because the standard_name
defines whether the quantity is a volume ratio, so the units
are needed only to indicate a dimensionless number.
Information describing a dimensionless physical quantity itself (e.g.
"area fraction" or "probability") does not belong in the units
attribute, but should be given in the long_name
or standard_name
attributes (see Section 3.2, "Long Name" and Section 3.3, "Standard Name"), in the same way as for physical quantities with dimensional units.
As an exception, to maintain backwards compatibility with COARDS, the text strings level
, layer
, and sigma_level
are allowed in the units
attribute, in order to indicate dimensionless vertical coordinates.
This use of units
is not compatible with UDUNITS, and is deprecated by this standard because conventions for more precisely identifying dimensionless vertical coordinates are available (see Section 4.3.2, "Dimensionless Vertical Coordinate").
The UDUNITS syntax that allows scale factors and offsets to be applied to a unit is not supported by this standard, except for the case of specifying a reference time (see Section 4.4, "Time Coordinate").
The application of any scale factors or offsets to data should be indicated by the scale_factor
and add_offset
attributes.
Use of these attributes for data packing, which is their most important application, is discussed in detail in Section 8.1, "Packed Data".
3.1.2. Temperature units
The units
of temperature imply an origin (i.e. zero point) for the associated measurement scale.
When the temperature value is the degree of warmth with respect to the origin of the measurement scale, we call it an on-scale temperature.
When units
of on-scale temperature are converted, the data may require the addition of an offset as well as multiplication by a scale factor, because the physical meaning of a numerical value of zero for an on-scale temperature depends on the unit of measurement.
On-scale temperature is unique among quantities in the respect that the origin and the unit of measurement are both defined by the units
and therefore cannot be chosen independently.
For all other quantities, the origin and the unit of measurement are independent.
Converting the unit of measurement alone, without changing the origin, does not change the meaning of zero.
For example (using bold to indicate a numerical data value), 0 kilogram
is the same mass as 0 pound
, and 0 seconds since 1970-1-1
means the same as 0 days since 1970-1-1
, but 0 degC
is not the same temperature as 0 degF
(= -17.8 degC
), because these two temperature units
implicitly refer to measurement scales which have different origins.
On the other hand, when the temperature value is a temperature difference, which compares two on-scale temperatures with the same origin, the value of that origin is irrelevant as it cancels out when taking the difference.
Therefore to convert the units
of a temperature difference requires only multiplication by a scale factor, without the addition of an offset.
The units
attribute does not distinguish between on-scale temperatures and temperature differences.
This ambiguity also affects units of temperature raised to some power e.g. K^2
or multiplied by other units e.g. W m-2 K-1
, degF/foot
or degC m s-1
.
A standard_name
(Section 3.3, "Standard Name") or standard_name
modifier (Appendix C, Standard Name Modifiers) may clarify the intention, but they are optional.
Some statistical operations described by the cell_methods
attribute (Section 7.3, "Cell Methods"; Appendix E, Cell Methods) imply that temperature must be interpreted as temperature difference, but this attribute is optional too.
In order to convert the units
correctly, it is essential to know whether a temperature is on-scale or a difference.
Therefore this standard strongly recommends that any variable whose units
involve a temperature unit should also have a units_metadata
attribute to make the distinction.
This attribute must have one of the following three values: temperature: on_scale, temperature: difference, temperature: unknown.
The units_metadata
attribute, standard_name
modifier (Appendix C, Standard Name Modifiers) and cell_methods
attribute (Appendix E, Cell Methods) must be consistent if present.
A variable must not have a units_metadata
attribute if it has no units
attribute or if its units
do not involve a temperature unit.
units_metadata to distinguish temperature quantities
variables:
  float Tonscale;
    Tonscale:long_name="global-mean surface temperature";
    Tonscale:standard_name="surface_temperature";
    Tonscale:units="degC";
    Tonscale:units_metadata="temperature: on_scale";
    Tonscale:cell_methods="area: mean";
  float Tdifference;
    Tdifference:long_name="change in global-mean surface temperature relative to pre-industrial";
    Tdifference:standard_name="surface_temperature";
    Tdifference:units="degC";
    Tdifference:units_metadata="temperature: difference";
    Tdifference:cell_methods="area: mean";
With temperature: unknown
, correct conversion of the units
cannot be guaranteed.
This value of units_metadata
indicates that the data-writer does not know whether the temperature is on-scale or a difference.
If the units_metadata
attribute is not present, the data-reader should assume temperature: unknown
.
The units_metadata
attribute was introduced in CF 1.11.
In data written according to versions before 1.11, temperature: unknown
should be assumed for all units
involving temperature, if it cannot be deduced from other metadata.
We note (for guidance only regarding temperature: unknown
, not as a CF convention) that the UDUNITS software assumes temperature: on_scale
for units
strings containing only a unit of temperature, and temperature: difference
for units
strings in which a unit of temperature is raised to any power other than unity, or multiplied or divided by any other unit.
With temperature: on_scale
, correct conversion can be guaranteed only for pure temperature units
.
If the quantity is an on-scale temperature multiplied by some other quantity, it is not possible to convert the data from the units
given to any other units
that involve a temperature with a different origin, given only the units
.
For instance, when temperature is on-scale, a value in kg degree_C m-2
can be converted to a value in kg K m-2
only if we know separately the values in degree_C
and kg m-2
of which it is the product.
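The effect of units_metadata on conversion of pure temperature units can be illustrated with a minimal Python sketch. The helper name convert_temperature and the small TEMPERATURE_UNITS table are assumptions made for illustration only; they are not part of CF or UDUNITS, and only three units are covered.

```python
# Size of one unit in kelvin, and the offset (in kelvin) of the scale's zero.
# Illustrative table only; a real application would rely on UDUNITS.
TEMPERATURE_UNITS = {
    "K": (1.0, 0.0),
    "degC": (1.0, 273.15),
    "degF": (5.0 / 9.0, 459.67 * 5.0 / 9.0),
}


def convert_temperature(value, from_units, to_units, units_metadata=None):
    """Convert a pure temperature value, honouring units_metadata."""
    # A missing attribute is treated as "temperature: unknown".
    kind = units_metadata or "temperature: unknown"
    from_scale, from_offset = TEMPERATURE_UNITS[from_units]
    to_scale, to_offset = TEMPERATURE_UNITS[to_units]
    if kind == "temperature: difference":
        # Differences need only the scale factor, never the offset.
        return value * from_scale / to_scale
    if kind == "temperature: on_scale":
        # On-scale temperatures need both the scale factor and the offset.
        kelvin = value * from_scale + from_offset
        return (kelvin - to_offset) / to_scale
    # With "temperature: unknown" a correct conversion cannot be guaranteed.
    raise ValueError("cannot convert safely with %r" % kind)
```

In this sketch a value of 1.5 degC converts to 274.65 K when it is on-scale but to 1.5 K when it is a difference, which is why the distinction matters.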
3.1.3. Scale factors and offsets
UDUNITS recognises the SI prefixes for decimal multiples and submultiples of units shown in the table below, and allows them to be applied to non-SI units as well.
UDUNITS offers a syntax for indicating arbitrary scale factors and offsets to be applied to a unit.
(Note that this is different from the scale factors and offsets used for converting between units
, as discussed for temperature in Section 3.1.2, "Temperature units".)
This UDUNITS syntax for arbitrary transformation of units
is not supported by the CF standard, except for the case of specifying reference time (Section 4.4, "Time Coordinate").
The application of any scale factors or offsets to data should be indicated by the scale_factor
and add_offset
attributes.
Use of these attributes for data packing, which is their most important application, is discussed in detail in Section 8.1, "Packed Data".
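As a minimal sketch of how a reader might apply these attributes, the following Python function multiplies stored values by scale_factor and then adds add_offset, which is the reading convention assumed here (see Section 8.1, "Packed Data" for the authoritative rules). The function name unpack is hypothetical.

```python
def unpack(packed_values, scale_factor=1.0, add_offset=0.0):
    """Return unpacked values: packed * scale_factor + add_offset.

    Defaults of 1.0 and 0.0 leave the data unchanged, mirroring the case
    in which the attributes are absent.
    """
    return [value * scale_factor + add_offset for value in packed_values]
```

For example, stored integers 0, 100 and 200 with scale_factor 0.01 and add_offset 273.15 would be read as approximately 273.15, 274.15 and 275.15.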
Factor | Prefix | Abbreviation | Factor | Prefix | Abbreviation |
---|---|---|---|---|---|
1e1 | deca, deka | da | 1e-1 | deci | d |
1e2 | hecto | h | 1e-2 | centi | c |
1e3 | kilo | k | 1e-3 | milli | m |
1e6 | mega | M | 1e-6 | micro | u |
1e9 | giga | G | 1e-9 | nano | n |
1e12 | tera | T | 1e-12 | pico | p |
1e15 | peta | P | 1e-15 | femto | f |
1e18 | exa | E | 1e-18 | atto | a |
1e21 | zetta | Z | 1e-21 | zepto | z |
1e24 | yotta | Y | 1e-24 | yocto | y |
3.2. Long Name
The long_name
attribute is defined by the [NUG] to contain a long descriptive name which may, for example, be used for labeling plots.
For backwards compatibility with COARDS this attribute is optional.
But it is highly recommended that either this or the standard_name
attribute defined in the next section be provided for all data variables and variables containing coordinate data, in order to make the file self-describing.
If a variable has no long_name
attribute then an application may use, as a default, the standard_name
if it exists, or the variable name itself.
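The fallback described above can be sketched in a few lines of Python. The function name display_name and the use of a plain dictionary for the variable's attributes are assumptions made for illustration.

```python
def display_name(variable_name, attributes):
    """Pick a label for plots: long_name, else standard_name, else the name."""
    return (
        attributes.get("long_name")
        or attributes.get("standard_name")
        or variable_name
    )
```

An application is free to choose a different default; this sketch only mirrors the order suggested in the text.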
3.3. Standard Name
A fundamental requirement for exchange of scientific data is the ability to describe precisely the physical quantities being represented.
To some extent this is the role of the long_name
attribute as defined in the [NUG].
However, usage of long_name
is completely ad-hoc.
For many applications it is desirable to have a more definitive description of the quantity, which allows users of data from different sources (some of which might be models and others observational) to determine whether quantities are in fact comparable.
For this reason each variable may optionally be given a "standard name", whose meaning is defined by this convention.
There may be several variables in a dataset with any given standard name, and these may be distinguished by other metadata, such as coordinates (Chapter 4, Coordinate Types) and cell_methods
(Section 7.3, "Cell Methods").
A standard name is associated with a variable via the attribute standard_name
which takes a string value comprised of a standard name optionally followed by one or more blanks and a standard name modifier (a string value from Appendix C, Standard Name Modifiers).
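A reader might separate the two parts of the attribute value as in the following Python sketch. The function name split_standard_name is hypothetical, and the sketch does not check either part against the standard name table or Appendix C.

```python
def split_standard_name(value):
    """Split a standard_name attribute into (standard name, modifier).

    The modifier is None when the attribute holds a standard name only.
    """
    parts = value.split()  # any run of blanks separates the two parts
    if not parts or len(parts) > 2:
        raise ValueError("unexpected standard_name value: %r" % value)
    name = parts[0]
    modifier = parts[1] if len(parts) == 2 else None
    return name, modifier
```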
The set of permissible standard names is contained in the standard name table. The table entry for each standard name contains the following:
- standard name
  The name used to identify the physical quantity. A standard name contains no whitespace and is case sensitive.
- canonical units
  Representative units of the physical quantity. Unless it is dimensionless, a variable with a standard_name attribute must have units which are physically equivalent (not necessarily identical) to the canonical units, possibly modified by an operation specified by the standard name modifier (see below and Appendix C, Standard Name Modifiers) or by the cell_methods attribute (see Section 7.3, "Cell Methods" and Appendix E, Cell Methods) or both.
  Units of time coordinates (Section 4.4, "Time Coordinate"), whose units attribute includes the word since, are not physically equivalent to time units that do not include since in the units. To mark this distinction, the canonical unit given for quantities used for time coordinates is s since 1958-1-1. The reference datetime in the canonical unit (the beginning of the day i.e. midnight on 1st January 1958 at 0 degrees_east) is not restrictive; the time coordinate variable’s own units may contain any reference datetime (after since) that is valid in its calendar. (We use 1958-1-1 because it is the beginning of International Atomic Time, and a valid datetime in all CF calendars; see also Section 4.4.3, "Leap Seconds".) In both kinds of time units attribute (with or without since), any unit for measuring time can be used i.e. any unit which is physically equivalent to the SI base unit of time, namely the second.
- description
  The description is meant to clarify the qualifiers of the fundamental quantities such as which surface a quantity is defined on or what the flux sign conventions are. We don’t attempt to provide precise definitions of fundamental physical quantities (e.g., temperature) which may be found in the literature. The description may define rules on the variable type, attributes and coordinates which must be complied with by any variable carrying that standard name (such as in Example 3.5).
The standard name table is located at
https://cfconventions.org/Data/cf-standard-names/current/src/cf-standard-name-table.xml,
written in compliance with the XML format, as described in Appendix B, Standard Name Table Format.
Knowledge of the XML format is only necessary for application writers who plan to directly access the table.
A formatted text version of the table is provided at
https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html,
and this table may be consulted in order to find the standard name that should be assigned to a variable.
Some standard names (e.g. region
, Section 6.1.1, "Geographic Regions", and area_type
, Statistics applying to portions of cells) are used to indicate quantities which are permitted to take only certain standard values.
This is indicated in the definition of the quantity in the standard name table, accompanied by a list or a link to a list of the permitted values.
Standard names by themselves are not always sufficient to describe a quantity.
For example, a variable may contain data to which spatial or temporal operations have been applied.
Or the data may represent an uncertainty in the measurement of a quantity.
These quantity attributes are expressed as modifiers of the standard name.
Modifications due to common statistical operations are expressed via the cell_methods
attribute (see Section 7.3, "Cell Methods" and Appendix E, Cell Methods).
Other types of quantity modifiers are expressed using the optional modifier part of the standard_name
attribute.
The permissible values of these modifiers are given in Appendix C, Standard Name Modifiers.
Example: use of standard_name
float psl(lat,lon) ;
  psl:long_name = "mean sea level pressure" ;
  psl:units = "hPa" ;
  psl:standard_name = "air_pressure_at_sea_level" ;
The description in the standard name table entry for air_pressure_at_sea_level
clarifies that "sea level" refers to the mean sea level, which is close to the geoid in sea areas.
3.4. Ancillary Data
When one data variable provides metadata about the individual values of another data variable it may be desirable to express this association by providing a link between the variables.
For example, instrument data may have associated measures of uncertainty.
The attribute ancillary_variables
is used to express these types of relationships.
It is a string attribute whose value is a blank separated list of variable names.
The nature of the relationship between variables associated via ancillary_variables
must be determined by other attributes.
The variables listed by the ancillary_variables
attribute will often have the standard name of the variable which points to them including a modifier (Appendix C, Standard Name Modifiers) to indicate the relationship.
The dimensions of an ancillary variable must be the same as, or a subset of, the dimensions of the variable to which it is related, in any order, with one exception:
If an ancillary variable of a data variable that has been compressed by gathering (Section 8.2, "Lossless Compression by Gathering") does not span the compressed dimension, then its dimensions may be any subset of the data variable’s uncompressed dimensions, i.e. any of the dimensions of the data variable except the compressed dimension, and any of the dimensions listed by the compress
attribute of the compressed coordinate variable.
float q(time) ;
  q:standard_name = "specific_humidity" ;
  q:units = "g/g" ;
  q:ancillary_variables = "q_error_limit q_detection_limit" ;
float q_error_limit(time) ;
  q_error_limit:standard_name = "specific_humidity standard_error" ;
  q_error_limit:units = "g/g" ;
float q_detection_limit(time) ;
  q_detection_limit:standard_name = "specific_humidity detection_minimum" ;
  q_detection_limit:units = "g/g" ;
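The following Python sketch shows one way an application might read the ancillary_variables attribute and check the general dimension rule. The function names are hypothetical, attributes are assumed to be held in a dictionary, and the exception for variables compressed by gathering is deliberately not handled.

```python
def ancillary_names(attributes):
    """Return the variable names listed by ancillary_variables, if any."""
    return attributes.get("ancillary_variables", "").split()


def dimensions_compatible(parent_dimensions, ancillary_dimensions):
    """True if the ancillary dimensions are a subset of the parent's.

    The order of dimensions is not restricted, so sets are compared.
    """
    return set(ancillary_dimensions) <= set(parent_dimensions)
```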
Alternatively, ancillary_variables
may be used as status flags indicating the operational status of an instrument producing the data or as quality flags indicating the results of a quality control test, or some other quantitative quality assessment, performed against the measurements contained in the source variable.
In these cases, the flag variable will include a standard name that differs from that of the source variable and indicates the specific type of flag the variable represents.
The standard names table includes many names intended to be used in this situation, both general names meant to be used to flexibly represent any type of status or quality assessment, as well as names for specific quality control tests commonly applied to geophysical phenomena timeseries data. Several examples are listed below:
- status_flag and quality_flag: general flag categories for instrument status or quality assessment
- climatology_test_quality_flag, flat_line_test_quality_flag, gap_test_quality_flag, spike_test_quality_flag: a subset of standard name flags used to indicate the results of commonly-used geophysical timeseries data quality control tests (consult the standard names table for a full list of published flags)
- aggregate_quality_flag: flag indicating an aggregate summary of all quality tests performed on the data variable, both automated and manual (i.e. a master quality flag for a particular variable)
The following example illustrates the use of three of these flags to represent two independent quality control tests and an aggregate flag that combines the results of the two tests.
float salinity(time, z);
  salinity:units = "1";
  salinity:long_name = "Salinity";
  salinity:standard_name = "sea_water_practical_salinity";
  salinity:ancillary_variables = "salinity_qc_generic salinity_qc_flat_line_test salinity_qc_agg";
int salinity_qc_generic(time, z);
  salinity_qc_generic:long_name = "Salinity Generic QC Process Flag";
  salinity_qc_generic:standard_name = "quality_flag";
int salinity_qc_flat_line_test(time, z);
  salinity_qc_flat_line_test:long_name = "Salinity Flat Line Test Flag";
  salinity_qc_flat_line_test:standard_name = "flat_line_test_quality_flag";
int salinity_qc_agg(time, z);
  salinity_qc_agg:long_name = "Salinity Aggregate Flag";
  salinity_qc_agg:standard_name = "aggregate_quality_flag";
Note that the ancillary variables in this example are simplified to exclude flag_values
, flag_masks
and flag_meanings
attributes, described in Section 3.5, "Flags", that they would ordinarily require.
3.5. Flags
The attributes flag_values
, flag_masks
and flag_meanings
are intended to make variables that contain flag values self describing.
Status codes and Boolean (binary) condition flags may be expressed with different combinations of flag_values
and flag_masks
attribute definitions.
The flag_values
and flag_meanings
attributes describe a status flag consisting of mutually exclusive coded values.
The flag_values
attribute is the same type as the variable to which it is attached, and contains a list of the possible flag values.
The flag_meanings
attribute is a string whose value is a blank separated list of descriptive words or phrases, one for each flag value.
Each word or phrase should consist of characters from the alphanumeric set and the following five: '_', '-', '.', '+', '@'.
If multi-word phrases are used to describe the flag values, then the words within a phrase should be connected with underscores.
The following example illustrates the use of flag values to express a speed quality with an enumerated status code.
Example: use of flag_values
byte current_speed_qc(time, depth, lat, lon) ;
  current_speed_qc:long_name = "Current Speed Quality" ;
  current_speed_qc:standard_name = "status_flag" ;
  current_speed_qc:_FillValue = -128b ;
  current_speed_qc:valid_range = 0b, 2b ;
  current_speed_qc:flag_values = 0b, 1b, 2b ;
  current_speed_qc:flag_meanings = "quality_good sensor_nonfunctional outside_valid_range" ;
Note that the data variable containing current speed has an ancillary_variables attribute with a value containing current_speed_qc.
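A reader could translate a stored value into its meaning as in this Python sketch. The function name decode_flag_value is hypothetical; flag_values is assumed to be available as a sequence of integers and flag_meanings as the raw attribute string.

```python
def decode_flag_value(value, flag_values, flag_meanings):
    """Return the meaning of a mutually exclusive status code, or None."""
    meanings = flag_meanings.split()  # blank separated list
    if len(meanings) != len(flag_values):
        raise ValueError("flag_values and flag_meanings differ in length")
    for flag_value, meaning in zip(flag_values, meanings):
        if value == flag_value:
            return meaning
    return None  # value is not one of the declared codes
```

With the attributes of the example above, a stored value of 1 would decode to sensor_nonfunctional.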
The flag_masks and flag_meanings attributes describe a number of independent Boolean conditions using bit field notation by setting unique bits in each flag_masks value. The flag_masks attribute is the same type as the variable to which it is attached, and contains a list of values matching unique bit fields. The flag_meanings attribute is defined as above, one for each flag_masks value. A flagged condition is identified by performing a bitwise AND of the variable value and each flag_masks value; a non-zero result indicates a true condition. Thus, any or all of the flagged conditions may be true, depending on the variable bit settings. The following example illustrates the use of flag_masks to express six sensor status conditions.
Example: use of flag_masks
byte sensor_status_qc(time, depth, lat, lon) ;
  sensor_status_qc:long_name = "Sensor Status" ;
  sensor_status_qc:standard_name = "status_flag" ;
  sensor_status_qc:_FillValue = 0b ;
  sensor_status_qc:valid_range = 1b, 63b ;
  sensor_status_qc:flag_masks = 1b, 2b, 4b, 8b, 16b, 32b ;
  sensor_status_qc:flag_meanings = "low_battery processor_fault memory_fault disk_fault software_fault maintenance_required" ;
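The bitwise test for independent Boolean conditions can be sketched in Python as follows. The function name decode_flag_masks is hypothetical, and the masks are assumed to have been read into a list of non-negative integers.

```python
def decode_flag_masks(value, flag_masks, flag_meanings):
    """Return every condition whose mask bits are set in value."""
    meanings = flag_meanings.split()
    return [
        meaning
        for mask, meaning in zip(flag_masks, meanings)
        if value & mask  # a non-zero result indicates a true condition
    ]
```

With the six masks of the example above, a stored value of 5 (binary 000101) would report low_battery and memory_fault.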
A variable with standard name of region
, area_type
or any other standard name which requires string-valued values from a defined list may use flags together with flag_values
and flag_meanings
attributes to record the translation to the string values.
The following example illustrates this using integer flag values for a variable with standard name region
and flag_values
selected from the standardized region names (see section 6.1.1).
Example: use of flag_values for a region variable
int basin(lat, lon);
  basin:standard_name = "region";
  basin:flag_values = 1, 2, 3;
  basin:flag_meanings = "atlantic_arctic_ocean indo_pacific_ocean global_ocean";
data:
  basin = 1, 1, 1, 1, 2, ..... ;
The flag_masks
, flag_values
and flag_meanings
attributes, used together, describe a blend of independent Boolean conditions and enumerated status codes.
The flag_masks
and flag_values
attributes are both the same type as the variable to which they are attached.
A flagged condition is identified by a bitwise AND of the variable value and each flag_masks
value; a result that matches the flag_values
value indicates a true
condition.
Repeated flag_masks
define a bit field mask that identifies a number of status conditions with different flag_values
.
The flag_meanings
attribute is defined as above, one for each flag_masks
bit field and flag_values
definition.
Each flag_values
and flag_masks
value must coincide with a flag_meanings
value.
The following example illustrates the use of flag_masks
and flag_values
to express two sensor status conditions and one enumerated status code.
Example: use of flag_masks and flag_values
byte sensor_status_qc(time, depth, lat, lon) ;
  sensor_status_qc:long_name = "Sensor Status" ;
  sensor_status_qc:standard_name = "status_flag" ;
  sensor_status_qc:_FillValue = 0b ;
  sensor_status_qc:valid_range = 1b, 15b ;
  sensor_status_qc:flag_masks = 1b, 2b, 12b, 12b, 12b ;
  sensor_status_qc:flag_values = 1b, 2b, 4b, 8b, 12b ;
  sensor_status_qc:flag_meanings = "low_battery hardware_fault offline_mode calibration_mode maintenance_mode" ;
In this case, mutually exclusive values are blended with Boolean values to maximize use of the available bits in a flag value.
The table below represents the four binary digits (bits) expressed by the sensor_status_qc
variable in the previous example.
Bit 0 and Bit 1 are Boolean values indicating a low battery condition and a hardware fault, respectively. The next two bits (Bit 2 and Bit 3) express an enumeration indicating abnormal sensor operating modes. Thus, if Bit 0 is set, the battery is low and if Bit 1 is set, there is a hardware fault - independent of the current sensor operating mode.
Bit 3 (MSB) | Bit 2 | Bit 1 | Bit 0 (LSB) |
---|---|---|---|
 |  | H/W Fault | Low Batt |
The remaining bits (Bit 2 and Bit 3) are decoded as follows:
Bit 3 | Bit 2 | Mode |
---|---|---|
0 | 1 | offline_mode |
1 | 0 | calibration_mode |
1 | 1 | maintenance_mode |
The "12b" flag mask is repeated in the sensor_status_qc
flag_masks
definition to explicitly declare the recommended bit field masks to repeatedly AND with the variable value while searching for matching enumerated values.
An application determines if any of the conditions declared in the flag_meanings
list are true
by simply iterating through each of the flag_masks
and AND’ing them with the variable.
When a result is equal to the corresponding flag_values
element, that condition is true
.
The repeated flag_masks
enable a simple mechanism for clients to detect all possible conditions.
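The iteration described above can be sketched in Python. The function name decode_flags is hypothetical; flag_masks and flag_values are assumed to be lists of non-negative integers of equal length.

```python
def decode_flags(value, flag_masks, flag_values, flag_meanings):
    """Return the meanings whose masked value equals the paired flag value."""
    meanings = flag_meanings.split()
    return [
        meaning
        for mask, flag_value, meaning in zip(flag_masks, flag_values, meanings)
        if (value & mask) == flag_value
    ]
```

Using the masks 1, 2, 12, 12, 12 and values 1, 2, 4, 8, 12 from the sensor_status_qc example, a stored value of 9 (binary 1001) yields low_battery and calibration_mode: Bit 0 is set, and the two-bit mode field holds the pattern 10.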
4. Coordinate Types
The commonest use of coordinate variables is to locate the data in space and time, but coordinates may be provided for any other continuous geophysical quantity (e.g. density, temperature, radiation wavelength, zenith angle of radiance, sea surface wave frequency) or discrete category (see Section 4.5, "Discrete Axis", e.g. area type, model level number, ensemble member number) on which the data variable depends.
Four types of coordinates receive special treatment by these conventions: latitude, longitude, vertical, and time.
We continue to support the special role that the units
and positive
attributes play in the COARDS convention to identify coordinate type.
As an extension to COARDS, we strongly recommend that a parametric (usually dimensionless) vertical coordinate variable should be associated, via standard_name
and formula_terms
attributes, with its explicit definition, which provides a mapping between its values and dimensional vertical coordinate values that can be uniquely located with respect to a point on the earth’s surface.
Because identification of a coordinate type by its units is complicated by requiring the use of an external package [UDUNITS], we provide two optional methods that yield a direct identification.
The attribute axis
may be attached to a coordinate variable and given one of the values X
, Y
, Z
or T
which stand for a longitude, latitude, vertical, or time axis respectively.
Alternatively the standard_name
attribute may be used for direct identification.
But note that these optional attributes are in addition to the required COARDS metadata.
To identify generic spatial coordinates we recommend that the axis
attribute be attached to these coordinates and given one of the values X
, Y
or Z
.
The values X
and Y
for the axis attribute should be used to identify horizontal coordinate variables.
If both X- and Y-axis are identified, X-Y-up
should define a right-handed coordinate system, i.e. rotation from the positive X direction to the positive Y direction is anticlockwise if viewed from above.
We strongly recommend that coordinate variables be used for all coordinate types whenever they are applicable.
The methods of identifying coordinate types described in this section apply both to coordinate variables and to auxiliary coordinate variables named by the coordinates
attribute (see Chapter 5, Coordinate Systems and Domain).
The values of a coordinate variable or auxiliary coordinate variable indicate the locations of the gridpoints. The locations of the boundaries between cells are indicated by bounds variables (see Section 7.1, "Cell Boundaries").
4.1. Latitude Coordinate
Variables representing latitude must always explicitly include the units
attribute; there is no default value.
The recommended value of the units
attribute is the string degrees_north
. Also accepted are degree_north
, degree_N
, degrees_N
, degreeN
, and degreesN
.
float lat(lat) ;
  lat:long_name = "latitude" ;
  lat:units = "degrees_north" ;
  lat:standard_name = "latitude" ;
Application writers should note that the UDUNITS package does not recognize the directionality implied by the "north" part of the unit specification.
It only recognizes its size, i.e., 1 degree is defined to be pi/180 radians.
Hence, determination that a coordinate is a latitude type should be done via a string match between the given unit and one of the acceptable forms of degrees_north
.
Optionally, the latitude type may be indicated additionally by providing the standard_name
attribute with the value latitude
, and/or the axis
attribute with the value Y
.
Coordinates of latitude with respect to a rotated pole should be given units of degrees
, not degrees_north
or equivalents, because applications which use the units to identify axes would have no means of distinguishing such an axis from real latitude, and might draw incorrect coastlines, for instance.
4.2. Longitude Coordinate
Variables representing longitude must always explicitly include the units
attribute; there is no default value.
The recommended value of the units
attribute is the string degrees_east
. Also accepted are degree_east
, degree_E
, degrees_E
, degreeE
, and degreesE
.
float lon(lon) ;
  lon:long_name = "longitude" ;
  lon:units = "degrees_east" ;
  lon:standard_name = "longitude" ;
Application writers should note that the UDUNITS package has limited recognition of the directionality implied by the "east" part of the unit specification.
It defines degrees_east
to be pi/180 radians, and hence equivalent to degrees_north
.
We recommend the determination that a coordinate is a longitude type should be done via a string match between the given unit and one of the acceptable forms of degrees_east
.
Optionally, the longitude type may be indicated additionally by providing the standard_name
attribute with the value longitude
, and/or the axis
attribute with the value X
.
Coordinates of longitude with respect to a rotated pole should be given units of degrees
, not degrees_east
or equivalents, because applications which use the units to identify axes would have no means of distinguishing such an axis from real longitude, and might draw incorrect coastlines, for instance.
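The string matching recommended in Sections 4.1 and 4.2 can be sketched in Python as below. The function name horizontal_coordinate_type is hypothetical, and the sketch inspects only the units attribute, not standard_name or axis.

```python
LATITUDE_UNITS = {
    "degrees_north", "degree_north", "degree_N",
    "degrees_N", "degreeN", "degreesN",
}
LONGITUDE_UNITS = {
    "degrees_east", "degree_east", "degree_E",
    "degrees_E", "degreeE", "degreesE",
}


def horizontal_coordinate_type(attributes):
    """Return "latitude", "longitude" or None from an exact units match."""
    units = attributes.get("units")
    if units in LATITUDE_UNITS:
        return "latitude"
    if units in LONGITUDE_UNITS:
        return "longitude"
    return None  # e.g. plain "degrees" on a rotated-pole grid
```

Because rotated-pole coordinates carry units of degrees, they are not mistaken for true latitude or longitude by this test.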
4.3. Vertical (Height or Depth) Coordinate
Variables representing dimensional height or depth axes must always explicitly include the units
attribute; there is no default value.
The direction of positive (i.e., the direction in which the coordinate values are increasing), whether up or down, cannot in all cases be inferred from the units.
The direction of positive is useful for applications displaying the data.
For this reason the attribute positive
as defined in the COARDS standard is required if the vertical axis units are not a valid unit of pressure (as determined by the UDUNITS package [UDUNITS]) — otherwise its inclusion is optional.
The positive
attribute may have the value up
or down
(case insensitive).
This attribute may be applied to either coordinate variables or auxiliary coordinate variables that contain vertical coordinate data.
For example, if an oceanographic netCDF file encodes the depth of the surface as 0 and the depth of 1000 meters as 1000 then the axis would use attributes as follows:
axis_name:units = "meters" ;
axis_name:positive = "down" ;
If, on the other hand, the depth of 1000 meters were represented as -1000 then the value of the positive
attribute would have been up
.
If the units
attribute value is a valid pressure unit the default value of the positive
attribute is down
.
A vertical coordinate will be identifiable by:
- units of pressure; or
- the presence of the positive attribute with a value of up or down (case insensitive).
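These two criteria can be sketched in Python as follows. The function name is_vertical_coordinate is hypothetical. The PRESSURE_UNITS set is only an illustrative stand-in containing a few common spellings; a real application would ask UDUNITS whether the units are a valid unit of pressure.

```python
# Illustrative subset only; not a substitute for a UDUNITS check.
PRESSURE_UNITS = {
    "bar", "millibar", "decibar", "atmosphere", "atm", "pascal", "Pa", "hPa",
}


def is_vertical_coordinate(attributes):
    """Apply the two identification rules for vertical coordinates."""
    if attributes.get("units") in PRESSURE_UNITS:
        return True
    positive = attributes.get("positive")
    # The value of positive is compared case-insensitively.
    return isinstance(positive, str) and positive.lower() in ("up", "down")
```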
Optionally, the vertical type may be indicated additionally by providing the standard_name
attribute with an appropriate value, and/or the axis
attribute with the value Z
.
If both positive
and standard_name
are provided, it is recommended that they should be consistent.
For instance, if a depth of 1000 metres is represented by -1000 and positive
is up
, it would be inconsistent to give the standard_name
as depth
, whose definition (vertical distance below the surface) implies positive down.
If an application detects such an inconsistency, the user should be warned, and the positive
attribute should be used to determine the sign convention.
Recommendations: The positive
attribute should be consistent with the sign convention implied by the definition of the standard_name
, if both are provided.
4.3.1. Dimensional Vertical Coordinate
Variables representing dimensional vertical coordinates for depth or height must always explicitly include the units
attribute.
The acceptable units for a vertical (depth or height) coordinate variable must be a UDUNITS [UDUNITS] representation of one of the following:
- units of pressure. For vertical axes the most commonly used of these include bar, millibar, decibar, atmosphere (atm), pascal (Pa), and hPa.
- units of length. For vertical axes the most commonly used of these include meter (metre, m), and kilometer (km).
- other units that may under certain circumstances reference vertical position such as units of density or temperature.
Plural forms are also acceptable.
4.3.2. Dimensionless Vertical Coordinate
The units
attribute is not required for dimensionless coordinates.
For backwards compatibility with COARDS we continue to allow the units
attribute to take one of the values: level
, layer
, or sigma_level
.
These values are not recognized by the UDUNITS package, and are considered a deprecated feature in the CF standard.
4.3.3. Parametric Vertical Coordinate
In some cases dimensional vertical coordinates are a function of horizontal location as well as parameters which depend on vertical location, and therefore cannot be stored in the one-dimensional vertical coordinate variable, which in most of these cases is dimensionless.
The standard_name
of the parametric (usually dimensionless) vertical coordinate variable can be used to find the definition of the associated computed (always dimensional) vertical coordinate in Appendix D, Parametric Vertical Coordinates.
The definition provides a mapping between the parametric vertical coordinate values and computed values that can positively and uniquely indicate the location of the data.
The formula_terms
attribute can be used to associate terms in the definitions with variables in a netCDF file, and the computed_standard_name
attribute can be used to supply the standard_name
of the vertical coordinate values computed according to the definition.
To maintain backwards compatibility with COARDS the use of these attributes is not required, but is strongly recommended.
Some of the definitions may be supplemented with information stored in the grid_mapping
variable about the datum used as a vertical reference (e.g. geoid, other geopotential datum or reference ellipsoid; see Section 5.6, "Horizontal Coordinate Reference Systems, Grid Mappings, and Projections" and Appendix F, Grid Mappings).
float lev(lev) ;
  lev:long_name = "sigma at layer midpoints" ;
  lev:positive = "down" ;
  lev:standard_name = "atmosphere_sigma_coordinate" ;
  lev:formula_terms = "sigma: lev ps: PS ptop: PTOP" ;
  lev:computed_standard_name = "air_pressure" ;
In this example the standard_name
value atmosphere_sigma_coordinate
identifies the following definition from Appendix D, Parametric Vertical Coordinates which specifies how to compute pressure at gridpoint (n,k,j,i)
where j
and i
are horizontal indices, k
is a vertical index, and n
is a time index:
p(n,k,j,i) = ptop + sigma(k)*(ps(n,j,i)-ptop)
The formula_terms
attribute associates the variable lev
with the term sigma
, the variable PS
with the term ps
, and the variable PTOP
with the term ptop
.
Thus the pressure at gridpoint (n,k,j,i)
would be calculated by
p(n,k,j,i) = PTOP + lev(k)*(PS(n,j,i)-PTOP)
The computed_standard_name
attribute indicates that the values in variable
p
would have a standard_name
of air_pressure
.
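The calculation above can be sketched in plain Python. The function name sigma_to_pressure is hypothetical, and nested lists stand in for the netCDF variables: lev is indexed by k, ps by (n, j, i), and ptop is a scalar.

```python
def sigma_to_pressure(lev, ps, ptop):
    """Return p[n][k][j][i] = ptop + lev[k] * (ps[n][j][i] - ptop)."""
    return [
        [
            [
                [ptop + sigma * (surface - ptop) for surface in row]
                for row in ps_at_time
            ]
            for sigma in lev
        ]
        for ps_at_time in ps
    ]
```

With ptop = 1000, a surface pressure of 100000 and sigma = 0.5, the computed pressure is 1000 + 0.5 * 99000 = 50500, in the same units as ps and ptop.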
4.4. Time Coordinate
A time coordinate is a number which identifies an instant along the continuous physical dimension of time, whether in reality or in a model.
The instant can equivalently be identified by its datetime, which is a set of numbers comprising year, month, day, hour, minute and second, where the second may have a fraction but the others are all integer.
The time coordinate and the datetime are interconvertible given the calendar
attribute of the time coordinate variable (Section 4.4.2, "Calendar") and its units
attribute (containing the time unit of the coordinate values and the reference datetime, Section 4.4.1, "Time Coordinate Units").
Variables containing time coordinates must always explicitly include the units
attribute, formatted as described in Section 4.4.1, "Time Coordinate Units".
There is no default value for the units
.
A coordinate variable is identifiable as a time coordinate variable from its units
alone.
Optionally, a time coordinate variable may be indicated additionally by providing the standard_name
attribute with an appropriate value, and/or the axis
attribute with the value T
.
double time(time) ;
  time:axis = "T"; // optional
  time:standard_name = "time" ; // optional
  time:units = "days since 1990-1-1 0:0:0" ; // mandatory
4.4.1. Time Coordinate Units
The units
attribute of a time coordinate variable takes a string value that follows the formatting requirements of the [UDUNITS] package (e.g. Example of a time coordinate variable).
It must comprise a unit of measure that is physically equivalent to the SI base unit of time (i.e. the second), followed by the word since
and a reference datetime.
The format of the units
string implies that the time coordinate equals the length of the time interval from the instant identified by the reference datetime to the instant identified by the time coordinate.
This is exactly true in all cases except when leap seconds occur between the two instants in the standard
, proleptic_gregorian
, and julian
calendars.
See Section 4.4.3, "Leap Seconds".
The acceptable units of measure for time are given by UDUNITS.
The most commonly used of these strings (and their abbreviations) are day
(d
), hour
(hr
, h
), minute
(min
) and second
(sec
, s
).
Plural forms are also acceptable.
UDUNITS defines a year
to be exactly 365.242198781 days (the interval between 2 successive passages of the sun through vernal equinox).
It is not a calendar year. UDUNITS defines a month
to be exactly year/12
, which is not a calendar month.
The CF standard follows UDUNITS in the definition of units, but we recommend that year
and month
should not be used, because of the potential for mistakes and confusion.
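To make the size of these fixed units concrete, the following Python sketch evaluates the UDUNITS year and month from the definitions quoted above. The constant names are chosen for this illustration only and are not part of UDUNITS or CF.

```python
# UDUNITS definitions quoted in the text above (fixed units, not calendar units).
SECONDS_PER_DAY = 86400
UDUNITS_YEAR_DAYS = 365.242198781

# Hypothetical names, used for illustration only.
UDUNITS_YEAR_SECONDS = UDUNITS_YEAR_DAYS * SECONDS_PER_DAY
UDUNITS_MONTH_SECONDS = UDUNITS_YEAR_SECONDS / 12

# A UDUNITS month is roughly 30.44 days, which matches no calendar month,
# so a coordinate in "months since ..." does not step from the start of one
# calendar month to the start of the next.
udunits_month_days = UDUNITS_MONTH_SECONDS / SECONDS_PER_DAY
```

Because neither unit is a whole number of days, time coordinates expressed in them generally do not fall on day boundaries, which is one source of the confusion mentioned above.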
UDUNITS defines a minute
as 60 seconds
, an hour
as 3600 seconds
and a day
as 86400 seconds
.
These are not calendar units.
When a leap second is inserted into UTC, the minute, hour and day affected differ by one second from their usual durations according to clock time, but the UDUNITS and CF minute
, hour
and day
do not; they are fixed units of measure.
See also Section 4.4.3, "Leap Seconds".
UDUNITS permits a number of alternatives to the word since
in the units of time coordinates.
All the alternatives have exactly the same meaning in UDUNITS.
For compatibility with other software, CF strongly recommends that since
should be used.
The reference datetime string (appearing after the identifier since
) is required.
It may include date alone, or date and time, or date, time and time zone offset.
Its format is y-m-d [H:M:S [Z]], where […] indicates an optional element,
-
y is year, m month, d day, H hour and M minute, which are all integers of one or more digits, and y may be prefixed with a sign (but note that some CF calendars do not permit negative years; see Section 4.4.2, "Calendar"),
-
S is second, which may be integer or floating point (see Section 4.4.3, "Leap Seconds" regarding S>59),
-
Z is the time zone offset with respect to UTC. This is an interval of time, specified in one of the formats described below. Only numbers (digits, +, - and :) are allowed in Z, not time zone names or acronyms.
The default time zone offset is zero.
In a time zone with zero offset, time (approximately) equals mean solar time for 0 degrees_east
of longitude.
(Although this may be exact in a model, in reality the time with zero time zone offset differs by some seconds from mean solar time; see the discussion of UTC and leap seconds in Section 4.4.2, "Calendar".)
If both time and time zone offset are omitted the time is 00:00:00 (the beginning of the day i.e. midnight at 0 degrees_east
).
Thus, units = "days since 1990-1-1"
means the same as units = "days since 1990-1-1 0:0:0"
.
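As an illustration of the format y-m-d [H:M:S [Z]] and its defaults, the following Python sketch splits a units string into its unit of measure and reference datetime. The function name, the returned dictionary layout and the regular expression are assumptions made for this example; it handles only the word since and does not reproduce the full UDUNITS grammar.

```python
import re

# Illustrative pattern for "<unit> since y-m-d [H:M:S [Z]]" (not the UDUNITS parser).
_UNITS_RE = re.compile(
    r"^\s*(?P<unit>\S+)\s+since\s+"
    r"(?P<y>[+-]?\d+)-(?P<m>\d+)-(?P<d>\d+)"
    r"(?:\s+(?P<H>\d+):(?P<M>\d+):(?P<S>\d+(?:\.\d*)?)"
    r"(?:\s+(?P<Z>[+-]?[\d:]+))?)?\s*$"
)

def parse_time_units(units):
    """Return the parts of a time units string, applying the CF defaults."""
    match = _UNITS_RE.match(units)
    if match is None:
        raise ValueError(f"not a recognised time units string: {units!r}")
    g = match.groupdict()
    return {
        "unit": g["unit"],
        "year": int(g["y"]),
        "month": int(g["m"]),
        "day": int(g["d"]),
        # Omitted time defaults to 00:00:00.
        "hour": int(g["H"] or 0),
        "minute": int(g["M"] or 0),
        "second": float(g["S"] or 0.0),
        # Omitted time zone offset defaults to zero.
        "offset": g["Z"] or "0",
    }
```

With this sketch, parse_time_units("days since 1990-1-1") and parse_time_units("days since 1990-1-1 0:0:0") return the same dictionary, reflecting the equivalence stated above.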
The time zone offset Z must be in one of the following four formats, any of which may be prefixed with a sign:
-
H, the hour alone, of one or two digits e.g. -6, 2, +11, which is sufficient for many time zones.
-
H:M, where H is hour and M minute, each of one or two digits, e.g. 5:30.
-
four digits, of which the first pair are the hours and the second the minutes e.g. 0530.
-
three digits, of which the first is the hour (0-9) e.g. 530.
For example, seconds since 1992-10-8 15:15:42.5 -6:00
indicates seconds since October 8th, 1992 at 3 hours, 15 minutes and 42.5 seconds in the afternoon, in a time zone where the datetime is six hours behind the default.
Subtracting the time zone offset from a given datetime converts it to the equivalent datetime with zero time zone offset e.g. 1989-12-31 18:00:00 -6
identifies the same instant as 1990-1-1 0:0:0
.
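The four offset formats and the subtraction rule can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes that a leading sign applies to the whole offset (hours and minutes together).

```python
import re
from datetime import datetime, timedelta

def offset_minutes(z):
    """Convert a time zone offset string Z to signed minutes (illustrative helper)."""
    match = re.fullmatch(
        r"([+-]?)(?:(\d{1,2}):(\d{1,2})|(\d{4})|(\d{3})|(\d{1,2}))", z
    )
    if match is None:
        raise ValueError(f"not a valid time zone offset: {z!r}")
    sign, h_colon, m_colon, four, three, hour_only = match.groups()
    if h_colon is not None:        # H:M
        hours, minutes = int(h_colon), int(m_colon)
    elif four is not None:         # four digits: HHMM
        hours, minutes = int(four[:2]), int(four[2:])
    elif three is not None:        # three digits: HMM
        hours, minutes = int(three[0]), int(three[1:])
    else:                          # H alone
        hours, minutes = int(hour_only), 0
    total = hours * 60 + minutes
    # Assumption: the sign applies to hours and minutes together.
    return -total if sign == "-" else total

def to_zero_offset(local, z):
    """Subtract the offset to obtain the equivalent datetime with zero offset."""
    return local - timedelta(minutes=offset_minutes(z))
```

For instance, to_zero_offset(datetime(1989, 12, 31, 18, 0, 0), "-6") gives datetime(1990, 1, 1, 0, 0, 0), the same instant as in the example above.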
4.4.2. Calendar
The calendar defines the set of valid datetimes and their order.
Note that the CF meaning of "calendar" refers to datetimes, not to dates alone.
Datetimes which are not permitted in a given calendar are prohibited both in the time coordinate values and in the reference datetime string in the units
.
It is recommended that the calendar be specified by the calendar
attribute of the time coordinate variable.
The values currently defined for calendar
are listed below.
Because the calendars have different sets of valid dates, and different treatments of leap seconds (see below in this section, and Section 4.4.3, "Leap Seconds"), a given time coordinate value with given units
can represent different datetimes in different calendars; conversely, a given datetime is represented by different time coordinate values in different calendars.
Moreover, in different calendars a given datetime can identify a different instant in the continuous physical dimension of time.
The lengths of the months in the Gregorian calendar are used in all calendars except 360_day
, none
(see Section 4.4.4, "Time Coordinates with no Annual Cycle") and explicitly defined calendars (see Section 4.4.5, "Explicitly Defined Calendar").
The calendars differ in their treatment of leap years (when there are 29 days in February instead of 28).
Leap seconds are adjustments made at irregular and unpredictable intervals in Coordinated Universal Time (UTC).
In response to slight variations in the Earth’s rotation speed, positive or negative leap seconds are inserted in order to keep UTC close to mean solar time at 0 degrees_east
i.e. the time zone with the default (zero) time zone offset in UDUNITS and CF (see Section 4.4.1, "Time Coordinate Units").
When a single positive leap second is introduced at the end of a minute, that minute contains 61 seconds.
The net number of leap seconds added to UTC between 1958-1-1 and 2025-1-1 is 37.
The CF calendars differ in their treatment of leap seconds (see Section 4.4.3, "Leap Seconds").
In the julian
and the default standard
calendar, dates in years before year 0 (i.e. before 0-1-1 0:0:0) are not allowed, and the year in the reference datetime of the units must not be negative.
In these calendars, year zero has a special use to indicate a climatology (see Section 7.4, "Climatological Statistics"), but this use of year zero is deprecated.
In other calendars, year 0 is the year before year 1, and negative years are allowed.
standard
-
Mixed Gregorian/Julian calendar as defined by UDUNITS. This is the default. A deprecated alternative name for this calendar is gregorian. The Gregorian and Julian calendars have the same lengths of their months; they differ only in respect of the rules that decide which years are leap years. In the standard calendar, datetimes after and including 1582-10-15 0:0:0 are in the Gregorian calendar, in which a year is a leap year if either (i) it is divisible by 4 but not by 100 or (ii) it is divisible by 400. Datetimes before (and excluding) 1582-10-5 0:0:0 are in the Julian calendar, in which any year that is divisible by 4 is a leap year. Year 1 AD or CE in the standard calendar is also year 1 of the julian calendar. Negative years are invalid in time coordinates and reference datetimes in the standard calendar. In the standard calendar, 1582-10-15 0:0:0 is exactly 1 day later than 1582-10-4 0:0:0. Therefore datetimes in the range from (and including) 1582-10-5 0:0:0 until (but excluding) 1582-10-15 0:0:0 are invalid, and must not be used as reference in units. It is recommended that a reference datetime before the discontinuity should not be used for datetimes after the discontinuity, and vice-versa. See also Section 4.4.3, "Leap Seconds".
proleptic_gregorian
-
A calendar with the Gregorian rules for leap years extended to dates before 1582-10-15. All dates consistent with these rules are allowed, both before and after 1582-10-15 0:0:0. See also Section 4.4.3, "Leap Seconds".
julian
-
Julian calendar, in which a year is a leap year if it is divisible by 4, even if it is also divisible by 100. Year 1 AD or CE in the julian calendar is also year 1 of the standard calendar. Negative years are invalid in time coordinates and reference datetimes in the julian calendar. See also Section 4.4.3, "Leap Seconds".
utc
-
A Gregorian calendar with leap seconds as prescribed by UTC. Datetimes before 1958-01-01 0:0:0 are not allowed in this calendar. Datetimes in the future are not allowed in this calendar, because it is unknown when future leap seconds will occur. When a datetime is converted to a time coordinate value or vice-versa in this calendar, any leap seconds (positive or negative) must be counted that occurred in the interval between the datetime and the reference datetime in the units. For any given instant, the utc datetime is behind the tai datetime, where "behind" means the same as it does when describing a timezone to the west as being behind one to the east. The difference between the two datetimes for a given instant of time is the net number of leap seconds introduced since 1958-01-01. The difference was zero at that instant, when both calendars began. This means that a given datetime in the utc calendar represents an instant that is earlier than the same datetime in the tai calendar. See also Section 4.4.3, "Leap Seconds".
tai
-
A Gregorian calendar without leap seconds that is based on International Atomic Time (TAI). Datetimes before 1958-01-01 0:0:0 are not allowed in this calendar. For any given instant, the tai datetime is ahead of the utc datetime, where "ahead" means the same as it does when describing a timezone to the east as being ahead of one to the west. The difference between the two datetimes for a given instant of time is the net number of leap seconds introduced since 1958-01-01. The difference was zero at that instant, when both calendars began. This means that a given datetime in the tai calendar represents an instant that is later than the same datetime in the utc calendar. See also Section 4.4.3, "Leap Seconds".
noleap or 365_day
-
A calendar with no leap years, i.e., all years are 365 days long, and there are no leap seconds.
all_leap or 366_day
-
A calendar in which every year is a leap year, i.e., all years are 366 days long, and there are no leap seconds.
360_day
-
A calendar in which all years are 360 days, and divided into 30 day months, and there are no leap seconds.
none
-
To be used when there is no annual cycle. See Section 4.4.4, "Time Coordinates with no Annual Cycle".
Any other value may be given to the calendar
attribute to describe an explicitly defined calendar. See Section 4.4.5, "Explicitly Defined Calendar".
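The leap-year rules of the calendars listed above can be summarised in a short Python sketch. The function name is hypothetical; the sketch covers only the calendars whose leap-year rule is stated in this section and raises an error for 360_day, none and explicitly defined calendars.

```python
def is_leap_year(year, calendar="standard"):
    """Return True if `year` has a 29-day February in the given CF calendar."""
    if calendar in ("noleap", "365_day"):
        return False
    if calendar in ("all_leap", "366_day"):
        return True
    julian_rule = year % 4 == 0
    gregorian_rule = julian_rule and (year % 100 != 0 or year % 400 == 0)
    if calendar == "julian":
        return julian_rule
    if calendar in ("proleptic_gregorian", "utc", "tai"):
        return gregorian_rule
    if calendar in ("standard", "gregorian"):
        # 1582 is not a leap year under either rule, so switching rules at the
        # year boundary gives the same answer as switching on 1582-10-15.
        return gregorian_rule if year >= 1582 else julian_rule
    raise ValueError(f"no leap-year rule defined here for calendar {calendar!r}")
```

For example, the year 1500 is a leap year in the standard and julian calendars but not in proleptic_gregorian, which is one reason a given datetime can map to different time coordinate values in different calendars.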
4.4.3. Leap Seconds
This section describes how to deal properly with leap seconds. Most people ignore the existence of leap seconds, including many data producers and the CF standard before version 1.12. As a result, the time coordinates of two real-world observational datasets could disagree by some number of seconds if one has taken leap seconds into account and the other has not. Practically speaking, this means that if you are working with real-world data, and if it’s important for your time coordinates to be accurate to the second, you need to care about leap seconds. Otherwise, you need only to be aware that the difference between two time coordinates might not exactly equal the duration of the time interval between the two instants, but could be inaccurate by a number of seconds, if leap seconds are involved. Relatedly, two instants with the same time of day on different days, which would always be separated by a multiple of 86400 seconds if there were no leap seconds, will have a few more seconds between them if leap seconds intervene.
Each calendar defines a set of valid combinations of the six numbers year-month-day-hour-minute-second. We refer to this set as the calendar’s "set of datetimes". Fractions of seconds are allowed in all calendars in addition to the integer number of seconds. In this section, we use the word timeline to mean "continuous physical dimension of time". The valid datetimes identify discrete instants along the timeline, in that sense.
You need to know the set of datetimes defined by the calendar in order to compute time coordinate values from datetimes and vice-versa.
Ignoring fractional seconds in datetimes, a time coordinate value expressed in seconds equals the number of valid (integer-second) datetimes after (not including) the reference datetime in the units
up to (and including) the datetime that the time coordinate represents.
For instance, in units
of seconds since 2024-9-14 11:12:00
, the time coordinate for the datetime 2024-9-14 11:12:03
is 3
, because there are three datetimes (2024-9-14 11:12:01
, 2024-9-14 11:12:02
, 2024-9-14 11:12:03
) following 2024-9-14 11:12:00
up to and including 2024-9-14 11:12:03
.
The coordinate for 2024-9-14 11:11:58
is -2
, because there are two valid datetimes (2024-9-14 11:11:59
, 2024-9-14 11:11:58
) from 2024-9-14 11:12:00
to (and including) 2024-9-14 11:11:58
, and the count is negative because it goes backwards.
The signed difference between the fractional seconds of the datetime and the reference is added to the time coordinate after counting the seconds.
This paragraph may appear to be excessively elaborate in describing a usually obvious procedure, but it is necessary to be very careful about it when there are leap seconds.
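The counting rule described above can be sketched in Python. The sketch relies on Python's datetime type, which uses proleptic Gregorian dates and has no leap seconds, so it corresponds to a calendar whose set of datetimes excludes leap seconds; the function names are illustrative.

```python
from datetime import datetime, timedelta

def count_valid_datetimes(reference, target):
    """Count integer-second datetimes after `reference` up to and including
    `target`; the count is negative when `target` precedes `reference`."""
    if (target - reference).microseconds != 0:
        raise ValueError("arguments must be a whole number of seconds apart")
    step = 1 if target >= reference else -1
    count, current = 0, reference
    while current != target:
        current += timedelta(seconds=step)
        count += step
    return count

def seconds_since(reference, target):
    """Equivalent shortcut, which also handles fractional seconds."""
    return (target - reference).total_seconds()

reference = datetime(2024, 9, 14, 11, 12, 0)
after = count_valid_datetimes(reference, datetime(2024, 9, 14, 11, 12, 3))
before = count_valid_datetimes(reference, datetime(2024, 9, 14, 11, 11, 58))
```

Here after and before reproduce the values 3 and -2 from the text. The explicit loop is written only to mirror the definition; in a calendar without leap-second datetimes, plain subtraction gives the same result.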
The utc
calendar is the only calendar which includes leap seconds in its set of datetimes.
In all other calendars, datetimes within leap seconds are not valid.
Therefore reference datetimes in the units
attribute must not contain seconds equal to or greater than 60 unless the calendar
is utc
.
The standard
, proleptic_gregorian
, and julian
calendars each have two variants.
In one variant the timeline does not include leap seconds.
In the other variant, the timeline includes leap seconds, even though they are not included in the valid set of datetimes.
To resolve the ambiguity between the variants of these calendars, the units_metadata
attribute should be defined as well as the calendar
attribute, as described later in this section.
For standard
, proleptic_gregorian
, and julian
calendars, there are the following cases:
-
The calendar is being used for a timeline in which leap seconds do not exist. This is the case for a model simulation that defines every day as having a constant length of 86400 seconds.
-
The calendar is being used for a timeline in which leap seconds exist, and they are correctly accounted for in the datetimes represented by the time coordinates. This could be the case for observations from a platform with equipment which records UTC datetimes and has prior knowledge of when new leap seconds are to be introduced, so that it is able to apply a new leap second at the appropriate time. It could equally be the case for a model whose timesteps include leap seconds.
-
The calendar is being used for a timeline in which leap seconds exist, but some or all leap seconds might not have been correctly accounted for in the datetimes. This could be the case for observations from a platform whose time recording equipment has a delay in applying a new leap second.
-
It may be unknown whether leap seconds exist in the timeline.
Except in the utc
calendar, when a time coordinate value is calculated from a datetime, or the reverse, it is assumed that the coordinate value increases by exactly 60 seconds from the start of any minute (identified by year, month, day, hour, minute, all being integers) to the start of the next minute, because leap seconds are not valid datetimes.
In other words, leap seconds (positive or negative) are never counted in the standard
, proleptic_gregorian
, and julian
calendars.
When these calendars are being used for timelines with leap seconds (i.e. cases 2 and 3 and perhaps case 4), the assumption of 60-second minutes has the following consequences:
-
It is impossible to identify any instant during a leap second (i.e. between the end of the 60th second of the last minute of one hour and the start of the first second of the next hour) by a time coordinate e.g. 2016-12-31 23:59:60.5 cannot be represented by a time coordinate value.
-
A datetime in the excluded range must not be used as a reference datetime e.g. seconds since 2016-12-31 23:59:60 is not a permitted value for units.
-
The coordinate value does not count any leap seconds which occurred between the reference datetime and the datetime represented by the coordinate. For instance, 60 seconds after 23:59:00 always means 00:00:00 on the next day, even if there is a leap second at 23:59:60, which makes the actual interval 61 seconds between 23:59:00 and 00:00:00 on the next day.
Because of the last point, the difference between two coordinate values with the same units
string does not exactly equal the length of the interval between instants they represent if there were any leap seconds between them.
This discrepancy can happen in cases 2, 3 and 4 of the standard
, proleptic_gregorian
, and julian
calendars.
By contrast, in case 1 of those calendars (i.e. a timeline without leap seconds), and in all other calendars, the difference between two time coordinate values with the same units
string is always equal to the length of time between the instants they represent.
Furthermore, an inaccuracy results from converting a time coordinate to a datetime if the interval includes leap seconds which were not known when the time coordinate was calculated (possible in case 3 or 4).
It is important to be aware of these disadvantages of the standard
, proleptic_gregorian
and julian
calendars when used with timelines including leap seconds.
If it is essential for leap seconds to be counted in time coordinates, so that they exactly equal time intervals, you must use the utc
calendar.
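The discrepancy between a coordinate difference and the true interval can be sketched in Python for case 2 (leap seconds exist and are correctly reflected in the datetimes). The leap-second table below is deliberately partial, listing only the positive leap seconds at the ends of 2015-06-30 and 2016-12-31, and all names are hypothetical; a real application would need a complete, maintained table.

```python
from datetime import datetime

# Assumption: a partial table. Each entry is the first datetime after a
# positive leap second (i.e. the midnight that follows 23:59:60).
AFTER_POSITIVE_LEAP_SECOND = [datetime(2015, 7, 1), datetime(2017, 1, 1)]

def coordinate_difference(start, end):
    """Difference of time coordinates in seconds; leap seconds are not counted."""
    return (end - start).total_seconds()

def elapsed_seconds(start, end):
    """True interval for start <= end, adding the leap seconds that intervene."""
    leaps = sum(1 for t in AFTER_POSITIVE_LEAP_SECOND if start < t <= end)
    return coordinate_difference(start, end) + leaps

start = datetime(2016, 12, 31, 23, 59, 0)
end = datetime(2017, 1, 1, 0, 0, 0)
by_coordinates = coordinate_difference(start, end)   # 60.0
true_interval = elapsed_seconds(start, end)          # 61.0
```

An application that needs exact intervals from standard, proleptic_gregorian or julian time coordinates would apply a correction of this kind, whereas the utc calendar builds the leap seconds into the coordinate values themselves.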
For many applications of the standard
, proleptic_gregorian
, and julian
calendars, these inaccuracies are too small to matter, but there are some applications where it is necessary to know about them.
Therefore it is recommended that for the standard
, proleptic_gregorian
, and julian
calendars the appropriate treatment of leap seconds should be indicated by giving the time coordinate variable a units_metadata
attribute containing a leap_seconds
keyword with one of the permitted values none
, utc
or unknown
.
none
means that leap seconds do not exist in the timeline (i.e. case 1), utc
means that leap seconds exist in the timeline and the time coordinates correctly represent the datetimes (i.e. case 2), and unknown
means that the data-writer did not know or did not record whether leap seconds exist in the timeline, or how they are treated if they do exist (i.e. cases 3 and 4).
If the units_metadata
attribute is not present, or does not contain the leap_seconds
keyword, the data-reader should assume leap_seconds: unknown
.
A variable’s units_metadata
attribute may only contain the leap_seconds
keyword if the variable’s calendar is one of standard
, proleptic_gregorian
, or julian
.
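A data-reader might apply these rules as in the following Python sketch. The function name and its return convention (None when the keyword does not apply) are assumptions for this illustration, as is treating the deprecated gregorian name like standard.

```python
import re

# Calendars for which the leap_seconds keyword is permitted.
# Assumption: the deprecated alias "gregorian" is treated like "standard".
LEAP_SECOND_CALENDARS = {"standard", "gregorian", "proleptic_gregorian", "julian"}

def leap_seconds_treatment(calendar=None, units_metadata=None):
    """Return 'none', 'utc' or 'unknown', or None if the keyword does not apply."""
    calendar = calendar or "standard"          # standard is the default calendar
    match = re.search(r"\bleap_seconds:\s*(\w+)", units_metadata or "")
    if calendar not in LEAP_SECOND_CALENDARS:
        if match is not None:
            raise ValueError(
                f"leap_seconds keyword is not permitted with calendar {calendar!r}"
            )
        return None
    if match is None:
        return "unknown"                       # default when the keyword is absent
    value = match.group(1)
    if value not in ("none", "utc", "unknown"):
        raise ValueError(f"invalid leap_seconds value: {value!r}")
    return value
```

For example, a standard-calendar variable with no units_metadata attribute yields "unknown", while units_metadata = "leap_seconds: none" yields "none".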
Example 4.5. Use of units_metadata and calendar to define the treatment of leap seconds
variables:
  float time_tai ;
    time_tai:standard_name = "time" ;
    time_tai:long_name = "Satellite data" ;
    time_tai:calendar = "tai" ;
    time_tai:units = "seconds since 2016-12-31 23:59:58" ;
  float time_stdnone ;
    time_stdnone:standard_name = "time" ;
    time_stdnone:long_name = "Model data with no leap seconds" ;
    time_stdnone:calendar = "standard" ;
    time_stdnone:units = "seconds since 2016-12-31 23:59:58" ;
    time_stdnone:units_metadata = "leap_seconds: none" ;
  float time_stdutc ;
    time_stdutc:standard_name = "time" ;
    time_stdutc:long_name = "Model data with leap seconds or obs data with accurate UTC" ;
    time_stdutc:calendar = "standard" ;
    time_stdutc:units = "seconds since 2016-12-31 23:59:58" ;
    time_stdutc:units_metadata = "leap_seconds: utc" ;
  float time_utc ;
    time_utc:standard_name = "time" ;
    time_utc:long_name = "Time signal from UK National Physical Laboratory" ;
    time_utc:calendar = "utc" ;
    time_utc:units = "seconds since 2016-12-31 23:59:58" ;
  float time_unknown ;
    time_unknown:standard_name = "time" ;
    time_unknown:long_name = "Obs data with unreliable information on leap seconds" ;
    time_unknown:calendar = "standard" ;
    time_unknown:units = "seconds since 2016-12-31 23:59:58" ;
    time_unknown:units_metadata = "leap_seconds: unknown" ;
data:
  // time coordinate variable and the datetime it represents
  time_tai = 2; // 2017-1-1 0:0:0 because no leap seconds in the timeline
  time_stdnone = 2; // 2017-1-1 0:0:0 because no leap seconds in the timeline
  time_stdutc = 2; // 2017-1-1 0:0:0 because the leap second is not counted
  time_utc = 2; // leap second 2016-12-31 23:59:60
  time_unknown = 2; // unknown whether 2016-12-31 23:59:60 or 2017-1-1 0:0:0
This example shows five scalar time coordinate variables.
Although they all have the value 2 and the same units
attribute, they do not all refer to the same datetime, as shown in the comments on their data values, because they have different treatments of the leap second that was added to the UTC calendar at the end of 2016.
The first four of them correspond to the instants marked 2 seconds since 2016-12-31 23:59:58
in Figure 4.1.
The value of 2
seconds for time_stdnone
, time_utc
and time_tai
can be correctly interpreted as the length of the interval from the reference datetime 2016-12-31 23:59:58 to the datetime indicated in the comment.
In both time_stdnone
and time_stdutc
, the time coordinate represents 2017-1-1 0:0:0, because 2016-12-31 23:59:60 is not permitted in the standard
calendar, hence only two valid datetimes with integer seconds are counted (2016-12-31 23:59:59 and 2017-1-1 0:0:0).
However, the timeline for time_stdutc
does include the leap second, so the time interval from the reference datetime 2016-12-31 23:59:58 to 2017-1-1 0:0:0 is actually three seconds, not two as indicated by the time coordinate value.
This is an example of the standard
calendar not counting a leap second in the coordinate value, with the consequence that the difference between time coordinates does not exactly equal the duration of the interval.
An application may choose either to ignore this inaccuracy or to correct for it when calculating the length of intervals which include the leap second.
In the case of time_unknown
, we cannot convert the time coordinate to a datetime with certainty, because we do not know whether 2017-1-1 0:0:0 is two or three seconds after 2016-12-31 23:59:58.
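The different decodings of the value 2 can be reproduced with a small Python sketch. It assumes integer-second coordinates, a reference datetime earlier than every entry in the table, and a one-entry leap-second table covering only the end of 2016; the function names are hypothetical. Because Python's datetime cannot hold a second value of 60, the utc decoder returns a string.

```python
from datetime import datetime, timedelta

# Assumption: one-entry table; the midnight that follows the 2016 leap second.
MIDNIGHT_AFTER_LEAP_SECOND = [datetime(2017, 1, 1)]

def decode_without_leap_seconds(seconds, reference):
    """Decoding for calendars whose datetimes exclude leap seconds."""
    return (reference + timedelta(seconds=seconds)).strftime("%Y-%m-%d %H:%M:%S")

def decode_utc(seconds, reference):
    """Decoding for the utc calendar, where 23:59:60 is a valid datetime."""
    counted = 0
    for midnight in sorted(MIDNIGHT_AFTER_LEAP_SECOND):
        # Coordinate value at which this leap second begins.
        start = int((midnight - reference).total_seconds()) + counted
        if seconds < start:
            break
        if seconds == start:
            leap_minute = midnight - timedelta(seconds=1)
            return leap_minute.strftime("%Y-%m-%d %H:%M:") + "60"
        counted += 1
    return (reference + timedelta(seconds=seconds - counted)).strftime(
        "%Y-%m-%d %H:%M:%S"
    )

reference = datetime(2016, 12, 31, 23, 59, 58)
as_standard = decode_without_leap_seconds(2, reference)  # "2017-01-01 00:00:00"
as_utc = decode_utc(2, reference)                        # "2016-12-31 23:59:60"
```

In the utc calendar the value 3 is needed to reach 2017-1-1 0:0:0, which is the three-second interval discussed above.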
Figure 4.1. Time coordinates with units="seconds since 2016-12-31 23:59:58" for various choices of the calendar attribute and leap_seconds keyword.
This illustration shows that a given time coordinate value (the numbers in columns at the bottom right) can represent different datetimes in different calendars. However, the illustration cannot show another important point to keep in mind, that a given datetime may identify different instants in different calendars.
The diagonal lines depict the timelines of the calendars.
Along each line, a filled circle marks the instant on the timeline that begins each second in the set of datetimes allowed by the calendar.
There is no meaning in the slight left-right displacement of the circles at each second, which is done only so they can all be seen; they are supposed to be exactly coincident.
As explained in the text of this section, the time coordinate in seconds is the count of valid datetimes (= the number of circles) that occur along the timeline after the reference datetime 2016-12-31 23:59:58
(which is the first circle on the line in every case, hence with a count of zero as shown in the column below its group of circles), up to and including the datetime represented.
The instants marked 2 seconds since 2016-12-31 23:59:58
are the ones represented by the first four time coordinate variables of Example 4.5.
A leap second was added to the UTC calendar at the end of 2016.
The duration of the leap second is shown by the shading.
The utc
calendar is the only one in which datetimes in the leap second are valid; hence the black circle is the only marker of 2016-12-31 23:59:60
.
The grey timeline of the utc
variant of the standard
calendar includes the leap second as well, but datetimes in the leap second are not valid in that calendar, so there is no grey circle for it.
The leap second does not appear in the timelines of the tai
calendar and the none
variant of the standard
calendar.
Their timelines (red and purple) skip over the leap second, and they have no circle for it.
For those timelines, please imagine the diagram having the shaded rectangle cut out, and the cut edges joined, making the red and purple lines continuous, passing smoothly from 2016-12-31 23:59:59 to 2017-1-1 00:00:00 as for all the other seconds.
4.4.4. Time Coordinates with no Annual Cycle
The calendar
attribute may be set to none
in climate experiments that simulate a fixed time of year.
The time of year is indicated by the date in the reference time of the units
attribute.
The time coordinates that might apply in a perpetual July experiment are given in the following example.
variables:
  double time(time) ;
    time:long_name = "time" ;
    time:units = "days since 1-7-15 0:0:0" ;
    time:calendar = "none" ;
data:
  time = 0., 1., 2., ...;
Here, all days simulate the conditions of 15th July, so it does not make sense to give them different dates. The time coordinates are interpreted as 0, 1, 2, etc. days since the start of the experiment.
4.4.5. Explicitly Defined Calendar
If none of the calendars defined in Section 4.4.2, "Calendar" applies (e.g., calendars appropriate to a different paleoclimate era), a calendar can be explicitly defined, in terms of permissible year-month-day combinations.
To do this, the lengths of each month are explicitly defined with the month_lengths
attribute of the time axis:
month_lengths
-
A vector of size 12, specifying the number of days in the months from January to December (in a non-leap year).
If leap years are included, then two other attributes of the time axis must also be defined:
leap_year
-
An example of a leap year. It is assumed that all years that differ from this year by a multiple of four are also leap years. If this attribute is absent, it is assumed there are no leap years.
leap_month
-
A value in the range 1-12, specifying which month is lengthened by a day in leap years (1=January). If this attribute is not present, February (2) is assumed. This attribute is ignored if
leap_year
is not specified.
When an explicitly defined calendar is being used, the calendar may be described by giving a value not defined in Section 4.4.2, "Calendar" to the calendar
attribute; alternatively, the attribute may be omitted.
double time(time) ;
  time:long_name = "time" ;
  time:units = "days since 1-1-1 0:0:0" ;
  time:calendar = "126 kyr B.P." ;
  time:month_lengths = 34, 31, 32, 30, 29, 27, 28, 28, 28, 32, 32, 34 ;
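The way month_lengths, leap_year and leap_month combine can be sketched in Python. The function name and argument layout are assumptions made for this illustration; the month lengths are those of the example above.

```python
def month_length(year, month, month_lengths, leap_year=None, leap_month=2):
    """Days in `month` (1-12) of `year` for an explicitly defined calendar."""
    if len(month_lengths) != 12:
        raise ValueError("month_lengths must contain 12 values")
    days = month_lengths[month - 1]
    # leap_month is ignored unless leap_year is specified; every year that
    # differs from leap_year by a multiple of four is also a leap year.
    if leap_year is not None and month == leap_month and (year - leap_year) % 4 == 0:
        days += 1
    return days

# Month lengths from the "126 kyr B.P." example above.
lengths = [34, 31, 32, 30, 29, 27, 28, 28, 28, 32, 32, 34]
days_in_year = sum(month_length(1, m, lengths) for m in range(1, 13))
```

With these values the non-leap year has 365 days. Supplying, say, leap_year=4 would lengthen February to 32 days in years 4, 8, 12 and so on, since February is the default leap_month.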
4.5. Discrete Axis
The spatiotemporal coordinates described in sections 4.1-4.4 are continuous variables, and other geophysical quantities may likewise serve as continuous coordinate variables, for instance density, temperature or radiation wavelength. By contrast, for some purposes there is a need for an axis of a data variable which indicates either an ordered list or an unordered collection, and does not correspond to any continuous coordinate variable. Consequently such an axis may be called “discrete”. A discrete axis has a dimension but might not have a coordinate variable. Instead, there might be one or more auxiliary coordinate variables with this dimension (see preamble to section 5). Following sections define various applications of discrete axes, for instance section 6.1.1 “Geographical regions”, section 7.3.3 “Statistics applying to portions of cells”, section 9.3 “Representation of collections of features in data variables”.
5. Coordinate Systems and Domain
A data variable’s dimensions are used to locate data values in time and space or as a function of other independent variables. This is accomplished by associating these dimensions with the relevant set of latitude, longitude, vertical, time and any non-spatiotemporal coordinates. This section presents two methods for making that association: the use of coordinate variables, and the use of auxiliary coordinate variables.
Any of a variable’s dimensions that is an independently varying latitude, longitude, vertical, or time dimension (see Section 1.3, "Terminology") and that has a size greater than one must have a corresponding coordinate variable, i.e., a one-dimensional variable with the same name as the dimension (see examples in Chapter 4, Coordinate Types). This is the only method of associating dimensions with coordinates that is supported by [COARDS].
Any longitude, latitude, vertical or time coordinate which depends on more than one spatiotemporal dimension must be identified by the coordinates
attribute of the data variable.
The value of the coordinates
attribute is a blank separated list of the names of auxiliary coordinate variables.
There is no restriction on the order in which the auxiliary coordinate variables appear in the coordinates
attribute string.
The dimensions of an auxiliary coordinate variable must be a subset of the dimensions of the variable with which the coordinate is associated, with three exceptions.
First, string-valued coordinates (Section 6.1, "Labels") will have a dimension for maximum string length if the coordinate variable has a type of char
rather than a type of string
.
Second, if an auxiliary coordinate variable of a data variable that has been compressed by gathering (Section 8.2, "Lossless Compression by Gathering") does not span the compressed dimension, then its dimensions may be any subset of the data variable’s uncompressed dimensions, i.e. any of the dimensions of the data variable except the compressed dimension, and any of the dimensions listed by the compress
attribute of the compressed coordinate variable.
Third, in the ragged array representations of data (Chapter 9, Discrete Sampling Geometries), special methods are needed to connect the data and coordinates.
We recommend that the name of a multidimensional coordinate variable should not match the name of any of its dimensions because that precludes supplying a coordinate variable for the dimension. This practice also avoids potential bugs in applications that determine coordinate variables by only checking for a name match between a dimension and a variable and not checking that the variable is one dimensional.
If the longitude, latitude, vertical or time coordinate is multi-valued, varies in only one dimension, and varies independently of other spatiotemporal coordinates, it is not permitted to store it as an auxiliary coordinate variable.
This is both to enhance conformance to COARDS and to facilitate the use of generic applications that recognize the [NUG] convention for coordinate variables.
An application that is trying to find the latitude coordinate of a variable should always look first to see if any of the variable’s dimensions correspond to a latitude coordinate variable.
If the latitude coordinate is not found this way, then the auxiliary coordinate variables listed by the coordinates
attribute should be checked.
Note that it is permissible, but optional, to list coordinate variables as well as auxiliary coordinate variables in the coordinates
attribute.
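The search order described above can be sketched in Python. The dataset is modelled here as a plain dictionary mapping variable names to their dimensions and attributes, a layout invented for this illustration rather than any library's API. Latitude is recognised by its units or standard_name as described in Chapter 4, Coordinate Types.

```python
# Units strings that identify latitude (see Chapter 4, Coordinate Types).
LATITUDE_UNITS = {
    "degrees_north", "degree_north", "degree_N",
    "degrees_N", "degreeN", "degreesN",
}

def is_latitude(var):
    attrs = var["attrs"]
    return attrs.get("units") in LATITUDE_UNITS or attrs.get("standard_name") == "latitude"

def find_latitude(dataset, data_name):
    """Return the name of the latitude coordinate of a data variable, or None."""
    data = dataset[data_name]
    # 1. Coordinate variables: one-dimensional, named after their dimension.
    for dim in data["dims"]:
        var = dataset.get(dim)
        if var is not None and var["dims"] == (dim,) and is_latitude(var):
            return dim
    # 2. Auxiliary coordinate variables named by the coordinates attribute.
    for name in data["attrs"].get("coordinates", "").split():
        var = dataset.get(name)
        if var is not None and is_latitude(var):
            return name
    return None

# Hypothetical dataset with two-dimensional latitude and longitude.
dataset = {
    "T": {"dims": ("time", "y", "x"), "attrs": {"coordinates": "lon lat"}},
    "lat": {"dims": ("y", "x"), "attrs": {"units": "degrees_north"}},
    "lon": {"dims": ("y", "x"), "attrs": {"units": "degrees_east"}},
}
found = find_latitude(dataset, "T")   # "lat", via the coordinates attribute
```

Checking that a candidate coordinate variable is one-dimensional, rather than matching on name alone, also guards against the naming pitfall mentioned earlier in this section.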
If the longitude, latitude, vertical or time coordinate is single-valued, it may be stored either as a coordinate variable with a dimension of size one, or as a scalar coordinate variable (Section 5.7, "Scalar Coordinate Variables").
If an axis
attribute is attached to an auxiliary coordinate variable, it can be used by applications in the same way the axis
attribute attached to a coordinate variable is used.
However, it is not permissible for a data variable to have both a coordinate variable and an auxiliary coordinate variable, or more than one of either type of variable, having an axis
attribute with any given value e.g. there must be no more than one axis
attribute for X
for any data variable.
Note that if the axis
attribute is not specified for an auxiliary coordinate variable, it may still be possible to determine if it is a spatiotemporal dimension from its own units or standard_name
, or from the units and standard_name
of the coordinate variable corresponding to its dimensions (see Chapter 4, Coordinate Types).
For instance, auxiliary coordinate variables which lie on the horizontal surface can be identified as such by their dimensions being horizontal.
Horizontal dimensions are those whose coordinate variables have an axis
attribute of X
or Y
, or a units
attribute indicating latitude and longitude.
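As a rough sketch of this test, assuming the same kind of hypothetical dictionary model of a dataset and the latitude and longitude unit spellings listed in Chapter 4:

```python
LAT_LON_UNITS = {
    "degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN",
    "degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE",
}


def is_horizontal_dimension(variables, dim):
    """A dimension is horizontal if its coordinate variable says so."""
    coord = variables.get(dim)
    if coord is None or coord["dims"] != (dim,):
        return False  # no coordinate variable for this dimension
    attrs = coord["attrs"]
    return attrs.get("axis") in ("X", "Y") or attrs.get("units") in LAT_LON_UNITS


def lies_on_horizontal_surface(variables, name):
    """True if every dimension of the named variable is horizontal."""
    dims = variables[name]["dims"]
    return len(dims) > 0 and all(is_horizontal_dimension(variables, d) for d in dims)


grid = {
    "xc": {"dims": ("xc",), "attrs": {"axis": "X", "units": "m"}},
    "yc": {"dims": ("yc",), "attrs": {"axis": "Y", "units": "m"}},
    "lev": {"dims": ("lev",), "attrs": {"units": "hPa"}},
    "lon": {"dims": ("yc", "xc"), "attrs": {"units": "degrees_east"}},
    "sigma": {"dims": ("lev", "yc", "xc"), "attrs": {}},
}
print(lies_on_horizontal_surface(grid, "lon"))
print(lies_on_horizontal_surface(grid, "sigma"))
```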
To geo-reference data horizontally with respect to the Earth, a grid mapping variable may be provided by the data variable, using the grid_mapping
attribute.
If the coordinate variables for a horizontal grid are not longitude and latitude, then a grid_mapping variable provides the information required to derive longitude and latitude values for each grid location.
If no grid mapping variable is referenced by a data variable, then longitude and latitude coordinate values shall be supplied in addition to the required coordinates.
For example, the Cartesian coordinates of a map projection may be supplied as coordinate variables and, in addition, two-dimensional latitude and longitude variables may be supplied via the coordinates
attribute on a data variable.
The use of the axis
attribute with values X
and Y
is recommended for the coordinate variables (see Chapter 4, Coordinate Types).
It is sometimes not practical to specify the latitude-longitude location of data which is representative of geographic regions with complex boundaries. For this purpose, provision is made in Section 6.1.1, "Geographic Regions" for indicating the region by a standardized name.
5.1. Independent Latitude, Longitude, Vertical, and Time Axes
When each of a variable’s spatiotemporal dimensions is a latitude, longitude, vertical, or time dimension, then each axis is identified by a coordinate variable.
dimensions: lat = 18 ; lon = 36 ; pres = 15 ; time = 4 ; variables: float xwind(time,pres,lat,lon) ; xwind:long_name = "zonal wind" ; xwind:units = "m/s" ; float lon(lon) ; lon:long_name = "longitude" ; lon:units = "degrees_east" ; float lat(lat) ; lat:long_name = "latitude" ; lat:units = "degrees_north" ; float pres(pres) ; pres:long_name = "pressure" ; pres:units = "hPa" ; double time(time) ; time:long_name = "time" ; time:units = "days since 1990-1-1 0:0:0" ;
xwind(n,k,j,i)
is associated with the coordinate values lon(i)
, lat(j)
, pres(k)
, and time(n)
.
5.2. Two-Dimensional Latitude, Longitude, Coordinate Variables
The latitude and longitude coordinates of a horizontal grid that was not defined as a Cartesian product of latitude and longitude axes can sometimes be represented using two-dimensional coordinate variables.
These variables are identified as coordinates by use of the coordinates
attribute.
dimensions: xc = 128 ; yc = 64 ; lev = 18 ; variables: float T(lev,yc,xc) ; T:long_name = "temperature" ; T:units = "K" ; T:coordinates = "lon lat" ; float xc(xc) ; xc:axis = "X" ; xc:long_name = "x-coordinate in Cartesian system" ; xc:units = "m" ; float yc(yc) ; yc:axis = "Y" ; yc:long_name = "y-coordinate in Cartesian system" ; yc:units = "m" ; float lev(lev) ; lev:long_name = "pressure level" ; lev:units = "hPa" ; float lon(yc,xc) ; lon:long_name = "longitude" ; lon:units = "degrees_east" ; float lat(yc,xc) ; lat:long_name = "latitude" ; lat:units = "degrees_north" ;
T(k,j,i)
is associated with the coordinate values lon(j,i)
, lat(j,i)
, and lev(k)
.
The vertical coordinate is represented by the coordinate variable lev(lev)
and the latitude and longitude coordinates are represented by the auxiliary coordinate variables lat(yc,xc)
and lon(yc,xc)
which are identified by the coordinates
attribute.
Note that coordinate variables are also defined for the xc
and yc
dimensions.
This facilitates processing of this data by generic applications that don’t recognize the multidimensional latitude and longitude coordinates.
5.3. Reduced Horizontal Grid
A "reduced" longitude-latitude grid is one in which the points are arranged along constant latitude lines with the number of points on a latitude line decreasing toward the poles.
Storing this type of gridded data in two-dimensional arrays wastes space, and results in the presence of missing values in the 2D coordinate variables.
We recommend that this type of gridded data be stored using the compression scheme described in Section 8.2, "Lossless Compression by Gathering".
Compression by gathering preserves structure by storing a set of indices that allows an application to easily scatter the compressed data back to two-dimensional arrays.
The compressed latitude and longitude auxiliary coordinate variables are identified by the coordinates
attribute.
dimensions: londim = 128 ; latdim = 64 ; rgrid = 6144 ; variables: float PS(rgrid) ; PS:long_name = "surface pressure" ; PS:units = "Pa" ; PS:coordinates = "lon lat" ; float lon(rgrid) ; lon:long_name = "longitude" ; lon:units = "degrees_east" ; float lat(rgrid) ; lat:long_name = "latitude" ; lat:units = "degrees_north" ; int rgrid(rgrid); rgrid:compress = "latdim londim";
PS(n)
is associated with the coordinate values lon(n)
, lat(n)
.
Compressed grid index (n)
would be assigned to 2D index (j,i)
(C index conventions) where
j = rgrid(n) / 128
i = rgrid(n) - 128*j
Notice that even if an application does not recognize the compress
attribute, the grids stored in this format can still be handled, by an application that recognizes the coordinates
attribute.
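The scatter operation can be sketched in a few lines. The function and argument names are illustrative only, and the division is integer division, as implied by the C index conventions mentioned above.

```python
def scatter(values, rgrid, nlat, nlon, fill=None):
    """Scatter compressed values back onto a full (nlat, nlon) grid.

    values[n] is stored at 2D index (j, i), where rgrid[n] is the
    zero-based flattened index j*nlon + i.
    """
    grid = [[fill] * nlon for _ in range(nlat)]
    for n, flat_index in enumerate(rgrid):
        j = flat_index // nlon          # integer division
        i = flat_index - nlon * j
        grid[j][i] = values[n]
    return grid


# Tiny hypothetical reduced grid: 2 latitude rows, at most 4 longitudes per row.
rgrid = [1, 2, 4, 5, 6, 7]
ps = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
for row in scatter(ps, rgrid, nlat=2, nlon=4):
    print(row)
```

Grid points that are absent from the reduced grid keep the fill value.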
5.4. Timeseries of Station Data
This section has been superseded by the treatment of time series as a type of discrete sampling geometry in Chapter 9.
5.5. Trajectories
This section has been superseded by the treatment of trajectories as a type of discrete sampling geometry in Chapter 9.
5.6. Horizontal Coordinate Reference Systems, Grid Mappings, and Projections
A grid mapping variable may be referenced by a data variable in order to explicitly declare the coordinate reference system (CRS) used for the horizontal spatial coordinate values. For example, if the horizontal spatial coordinates are latitude and longitude, the grid mapping variable can be used to declare the figure of the earth (WGS84 ellipsoid, sphere, etc.) they are based on. If the horizontal spatial coordinates are easting and northing in a map projection, the grid mapping variable declares the map projection CRS used and provides the information needed to calculate latitude and longitude from easting and northing.
When the horizontal spatial coordinate variables are not longitude and latitude, it is required that further information is provided to geo-locate the horizontal position. A grid mapping variable provides this information.
If no grid mapping variable is provided and the coordinate variables for a horizontal grid are not longitude and latitude, then it is required that the latitude and longitude coordinates are supplied via the coordinates
attribute.
Such coordinates may be provided in addition to the provision of a grid mapping variable, but that is not required.
A grid mapping variable provides the description of the mapping via a collection of attached attributes.
It is of arbitrary type since it contains no data.
Its purpose is to act as a container for the attributes that define the mapping.
The one attribute that all grid mapping variables must have is grid_mapping_name
, which takes a string value that contains the mapping’s name.
The other attributes that define a specific mapping depend on the value of grid_mapping_name
.
The valid values of grid_mapping_name
along with the attributes that provide specific map parameter values are described in Appendix F, Grid Mappings.
The grid mapping variables are associated with the data and coordinate variables by the grid_mapping
attribute.
This attribute is attached to data variables so that variables with different mappings may be present in a single file.
The attribute takes a string value with two possible formats.
In the first format, it is a single word, which names a grid mapping variable.
In the second format, it is a blank-separated list of words <gridMappingVariable>: <coordinatesVariable> [<coordinatesVariable> …] [<gridMappingVariable>: <coordinatesVariable>…]
, which identifies one or more grid mapping variables, and with each grid mapping associates one or more coordinatesVariables, i.e. coordinate variables or auxiliary coordinate variables.
Where an extended <gridMappingVariable>: <coordinatesVariable> [<coordinatesVariable>]
entity is defined, then the order of the <coordinatesVariable>
references within the definition provides an explicit order for these coordinate value variables, which is used if they are to be combined into individual coordinate tuples.
This order is only significant if crs_wkt
is also specified within the referenced grid mapping variable.
Explicit 'axis order' is important when the grid mapping variable contains an attribute crs_wkt
as it is mandated by the OGC CRS-WKT standard that coordinate tuples with correct axis order are provided as part of the reference to a Coordinate Reference System.
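Both formats of the attribute value can be read with a small parser. The following is a minimal sketch, not a reference implementation; the function name and the returned structure are illustrative choices.

```python
def parse_grid_mapping(value):
    """Parse a grid_mapping attribute value.

    Returns a dict mapping each grid mapping variable name to the ordered
    list of coordinate variable names associated with it, or to None when
    the simple single-word format is used.
    """
    tokens = value.split()
    if not any(token.endswith(":") for token in tokens):
        # First format: a single word naming the grid mapping variable.
        if len(tokens) != 1:
            raise ValueError("expected exactly one grid mapping variable name")
        return {tokens[0]: None}
    # Second format: "<gridMappingVariable>: <coordinatesVariable> ..." groups.
    result = {}
    current = None
    for token in tokens:
        if token.endswith(":"):
            current = token[:-1]
            result[current] = []
        elif current is None:
            raise ValueError("coordinate name appears before any grid mapping name")
        else:
            result[current].append(token)
    return result


print(parse_grid_mapping("rotated_pole"))
print(parse_grid_mapping("crsOSGB: x y crsWGS84: lat lon"))
```

The lists keep the order in which the coordinate variables were written, which is the order that matters when crs_wkt is present.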
Using the simple form, where the grid_mapping
attribute is only the name of a grid mapping variable, 2D latitude and longitude coordinates for a projected coordinate reference system use the same geographic coordinate reference system (ellipsoid and prime meridian) as the projection is projected from.
The grid_mapping
variable may identify datums (such as the reference ellipsoid, the geoid or the prime meridian) for horizontal or vertical coordinates.
Therefore a grid mapping variable may be needed when the coordinate variables for a horizontal grid are longitude and latitude.
The grid_mapping_name
of latitude_longitude
should be used in this case.
The expanded form of the grid_mapping
attribute is required if one wants to store coordinate information for more than one coordinate reference system.
In this case each coordinate or auxiliary coordinate is defined explicitly with respect to no more than one grid_mapping
variable.
This syntax may be used to explicitly link coordinates and grid mapping variables where only one coordinate reference system is used.
In this case, all coordinates and auxiliary coordinates of the data variable not named in the grid_mapping
attribute are unrelated to any grid mapping variable.
All coordinate names listed in the grid_mapping
attribute must be coordinate variables or auxiliary coordinates of the data variable.
In order to make use of a grid mapping to directly calculate latitude and longitude values it is necessary to associate the coordinate variables with the independent variables of the mapping.
This is done by assigning a standard_name
to the coordinate variable.
The appropriate values of the standard_name
depend on the grid mapping and are given in Appendix F, Grid Mappings.
dimensions: rlon = 128 ; rlat = 64 ; lev = 18 ; variables: float T(lev,rlat,rlon) ; T:long_name = "temperature" ; T:units = "K" ; T:coordinates = "lon lat" ; T:grid_mapping = "rotated_pole" ; char rotated_pole ; rotated_pole:grid_mapping_name = "rotated_latitude_longitude" ; rotated_pole:grid_north_pole_latitude = 32.5 ; rotated_pole:grid_north_pole_longitude = 170. ; float rlon(rlon) ; rlon:long_name = "longitude in rotated pole grid" ; rlon:units = "degrees" ; rlon:standard_name = "grid_longitude"; float rlat(rlat) ; rlat:long_name = "latitude in rotated pole grid" ; rlat:units = "degrees" ; rlat:standard_name = "grid_latitude"; float lev(lev) ; lev:long_name = "pressure level" ; lev:units = "hPa" ; float lon(rlat,rlon) ; lon:long_name = "longitude" ; lon:units = "degrees_east" ; float lat(rlat,rlon) ; lat:long_name = "latitude" ; lat:units = "degrees_north" ;
A CF compliant application can determine that rlon and rlat are longitude and latitude values in the rotated grid by recognizing the standard names grid_longitude
and grid_latitude
.
Note that the units of the rotated longitude and latitude axes are given as degrees
.
This should prevent a COARDS compliant application from mistaking the variables rlon
and rlat
to be actual longitude and latitude coordinates.
The entries for these names in the standard name table indicate the appropriate sign conventions for the units of degrees
.
dimensions: y = 228; x = 306; time = 41; variables: int Lambert_Conformal; Lambert_Conformal:grid_mapping_name = "lambert_conformal_conic"; Lambert_Conformal:standard_parallel = 25.0; Lambert_Conformal:longitude_of_central_meridian = 265.0; Lambert_Conformal:latitude_of_projection_origin = 25.0; double y(y); y:units = "km"; y:long_name = "y coordinate of projection"; y:standard_name = "projection_y_coordinate"; double x(x); x:units = "km"; x:long_name = "x coordinate of projection"; x:standard_name = "projection_x_coordinate"; double lat(y, x); lat:units = "degrees_north"; lat:long_name = "latitude coordinate"; lat:standard_name = "latitude"; double lon(y, x); lon:units = "degrees_east"; lon:long_name = "longitude coordinate"; lon:standard_name = "longitude"; int time(time); time:long_name = "forecast time"; time:units = "hours since 2004-06-23T22:00:00Z"; float Temperature(time, y, x); Temperature:units = "K"; Temperature:long_name = "Temperature @ surface"; Temperature:missing_value = 9999.0; Temperature:coordinates = "lat lon"; Temperature:grid_mapping = "Lambert_Conformal";
An application can determine that x
and y
are the projection coordinates by recognizing the standard names projection_x_coordinate
and projection_y_coordinate
.
The grid mapping variable Lambert_Conformal
contains the mapping parameters as attributes, and is associated with the Temperature
variable via its grid_mapping
attribute.
dimensions: lat = 18 ; lon = 36 ; variables: double lat(lat) ; double lon(lon) ; float temp(lat, lon) ; temp:long_name = "temperature" ; temp:units = "K" ; temp:grid_mapping = "crs" ; int crs ; crs:grid_mapping_name = "latitude_longitude" ; crs:semi_major_axis = 6371000.0 ; crs:inverse_flattening = 0 ;
dimensions: lat = 18 ; lon = 36 ; variables: double lat(lat) ; double lon(lon) ; float temp(lat, lon) ; temp:long_name = "temperature" ; temp:units = "K" ; temp:grid_mapping = "crs" ; int crs ; crs:grid_mapping_name = "latitude_longitude"; crs:longitude_of_prime_meridian = 0.0 ; crs:semi_major_axis = 6378137.0 ; crs:inverse_flattening = 298.257223563 ;
dimensions: z = 100; y = 100000 ; x = 100000 ; variables: double x(x) ; x:standard_name = "projection_x_coordinate" ; x:long_name = "Easting" ; x:units = "m" ; double y(y) ; y:standard_name = "projection_y_coordinate" ; y:long_name = "Northing" ; y:units = "m" ; double z(z) ; z:standard_name = "height_above_reference_ellipsoid" ; z:long_name = "height_above_osgb_newlyn_datum_masl" ; z:units = "m" ; double lat(y, x) ; lat:standard_name = "latitude" ; lat:units = "degrees_north" ; double lon(y, x) ; lon:standard_name = "longitude" ; lon:units = "degrees_east" ; float temp(z, y, x) ; temp:standard_name = "air_temperature" ; temp:units = "K" ; temp:coordinates = "lat lon" ; temp:grid_mapping = "crsOSGB: x y crsWGS84: lat lon" ; float pres(z, y, x) ; pres:standard_name = "air_pressure" ; pres:units = "Pa" ; pres:coordinates = "lat lon" ; pres:grid_mapping = "crsOSGB: x y crsWGS84: lat lon" ; int crsOSGB ; crsOSGB:grid_mapping_name = "transverse_mercator"; crsOSGB:semi_major_axis = 6377563.396 ; crsOSGB:inverse_flattening = 299.3249646 ; crsOSGB:longitude_of_prime_meridian = 0.0 ; crsOSGB:latitude_of_projection_origin = 49.0 ; crsOSGB:longitude_of_central_meridian = -2.0 ; crsOSGB:scale_factor_at_central_meridian = 0.9996012717 ; crsOSGB:false_easting = 400000.0 ; crsOSGB:false_northing = -100000.0 ; crsOSGB:unit = "metre" ; int crsWGS84 ; crsWGS84:grid_mapping_name = "latitude_longitude"; crsWGS84:longitude_of_prime_meridian = 0.0 ; crsWGS84:semi_major_axis = 6378137.0 ; crsWGS84:inverse_flattening = 298.257223563 ;
5.6.1. Use of the CRS Well-known Text Format
An optional grid mapping attribute called crs_wkt
may be used to specify multiple coordinate system properties in so-called well-known text format (usually abbreviated to CRS WKT or OGC WKT).
The CRS WKT format is widely recognised and used within the geoscience software community.
As such it represents a versatile mechanism for encoding information about a variety of coordinate reference system parameters in a highly compact notational form.
The translation of CF coordinate variables to/from OGC Well-Known Text (WKT) format is shown in Examples 5.11 and 5.12 below and described in detail in
https://github.com/cf-convention/cf-conventions/wiki/Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements.
The crs_wkt
attribute should comprise a text string that conforms to the WKT syntax as specified in reference [OGC_WKT-CRS].
If desired the text string may contain embedded newline characters to aid human readability.
However, any such characters are purely cosmetic and do not alter the meaning of the attribute value.
It is envisaged that the value of the crs_wkt
attribute typically will be a single line of text, one intended primarily for machine processing.
Other than the requirement to be a valid WKT string, the CF convention does not prescribe the content of the crs_wkt
attribute since it will necessarily be context-dependent.
Where a crs_wkt
attribute is added to a grid_mapping
, the extended syntax for the grid_mapping
attribute enables the list of variables containing coordinate values being referenced to be explicitly stated and the CRS WKT Axis order to be explicitly defined.
The explicit definition of WKT CRS Axis order is expected by the OGC standards for referencing by coordinates.
Software implementing these standards is likely to expect to receive coordinate value tuples, with the correct coordinate value order, along with the coordinate reference system definition that those coordinate values are defined with respect to.
The order of the <coordinatesVariable>
references within the grid_mapping
attribute definition defines the order of elements within a derived coordinate value tuple.
This enables an application reading the data from a file to construct an array of coordinate value tuples, where each tuple is ordered to match the specification of the coordinate reference system being used whilst the array of tuples is structured according to the netCDF definition.
It is the responsibility of the data producer to ensure that the <coordinatesVariable>
list is consistent with the CRS WKT definition of CS AXIS, with the correct number of entries in the correct order (note: this is not a conformance requirement as CF conformance is not dependent on CRS WKT parsing).
For example, a file has two coordinate variables, lon and lat, and a grid mapping variable crs
with an associated crs_wkt
attribute; the WKT definition defines the AXIS order as ["latitude", "longitude"]
.
The grid_mapping
attribute is thus given a value crs: lat lon
to define that where coordinate pairs are required, these shall be ordered (lat, lon), to be consistent with the provided crs_wkt
string (and not order inverted).
A 2-D array of (lat, lon) tuples can then be explicitly derived from the combination of the lat and lon variables.
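A minimal sketch of that derivation, assuming one-dimensional lat and lon coordinate variables and a data variable dimensioned (lat, lon); the function name and sample values are illustrative.

```python
def coordinate_tuples(order, lat, lon):
    """Build a 2D array of coordinate tuples.

    The array is structured by the netCDF dimensions (lat, lon), while the
    elements of each tuple follow `order`, the sequence of coordinate
    variable names given in the grid_mapping attribute.
    """
    rows = []
    for lat_value in lat:
        row = []
        for lon_value in lon:
            point = {"lat": lat_value, "lon": lon_value}
            row.append(tuple(point[name] for name in order))
        rows.append(row)
    return rows


lat = [10.0, 20.0]
lon = [0.0, 5.0, 10.0]
# grid_mapping = "crs: lat lon" gives the order ["lat", "lon"].
for row in coordinate_tuples(["lat", "lon"], lat, lon):
    print(row)
```

Passing the order ["lon", "lat"] instead would leave the array shape unchanged and only swap the elements within each tuple.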
The crs_wkt
attribute is intended to act as a supplement to other single-property CF grid mapping attributes (as described in Appendix F); it is not intended to replace those attributes.
If data producers omit the single-property grid mapping attributes in favour of the crs_wkt
attribute, software which cannot interpret crs_wkt
will be unable to use the grid_mapping
information.
Therefore the CRS should be described as thoroughly as possible with the single-property grid mapping attributes as well as by crs_wkt
.
In cases where CRS property values can be represented by both a single-property grid mapping attribute and the crs_wkt
attribute, the single-property attribute should be provided, and if both are provided, the onus is on data producers to ensure that their property values are consistent.
Therefore information from either one (or both) may be read in by the user without needing to check both.
However, if the two values of a given property are different, the CRS information cannot be interpreted accurately and users should inform the provider so the issue can be addressed.
For example, if the semi-major axis length of the ellipsoid defined by the grid mapping attribute semi_major_axis
disagrees with the crs_wkt
attribute (via the WKT SPHEROID[…]
element), the value of this attribute cannot be interpreted accurately.
Naturally if the two values are equal then no ambiguity arises.
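A data producer might guard against such a mismatch with a check like the one below. It is a deliberately simplified sketch: the regular expression is not a WKT parser, it only looks at the first ELLIPSOID or SPHEROID element, it assumes the WKT length is in the same unit as semi_major_axis, and the tolerance is an arbitrary assumption.

```python
import math
import re

# Matches ELLIPSOID["name", a, 1/f ...] or the older SPHEROID[...] spelling.
_ELLIPSOID = re.compile(
    r'(?:ELLIPSOID|SPHEROID)\s*\[\s*"[^"]*"\s*,'
    r'\s*([-+0-9.eE]+)\s*,\s*([-+0-9.eE]+)'
)


def wkt_ellipsoid(crs_wkt):
    """Return (semi_major_axis, inverse_flattening) from a WKT string, or None."""
    match = _ELLIPSOID.search(crs_wkt)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def semi_major_axis_consistent(attrs, rel_tol=1e-9):
    """Compare the semi_major_axis attribute with the value inside crs_wkt."""
    found = wkt_ellipsoid(attrs.get("crs_wkt", ""))
    if found is None or "semi_major_axis" not in attrs:
        return True  # nothing to compare
    return math.isclose(attrs["semi_major_axis"], found[0], rel_tol=rel_tol)


wkt = ('GEODCRS["WGS 84", DATUM["World Geodetic System 1984", '
       'ELLIPSOID["WGS 84",6378137,298.257223563, LENGTHUNIT["metre",1.0]]]]')
print(semi_major_axis_consistent({"semi_major_axis": 6378137.0, "crs_wkt": wkt}))
print(semi_major_axis_consistent({"semi_major_axis": 6371000.0, "crs_wkt": wkt}))
```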
Likewise, in those cases where the value of a CRS WKT element should be used consistently across the CF-netCDF community (names of projections and projection parameters, for example) then, the values shown in https://github.com/cf-convention/cf-conventions/wiki/Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements should be preferred; these are derived from the OGP/EPSG registry of geodetic parameters, which is considered to represent the definitive authority as regards CRS property names and values.
Example 5.11 illustrates how the coordinate system properties specified via the crs
grid mapping variable in Example 5.9 might be expressed using a crs_wkt
attribute.
Example 5.12 also illustrates the addition of the crs_wkt
attribute, but here the attribute is added to the crs
variable of a simplified variant of Example 5.10.
For brevity in Example 5.11, only the grid mapping variable and its grid_mapping_name
and crs_wkt
attributes are included; all other elements are as per Example 5.9.
Names of projection PARAMETERs
follow the spellings used in the EPSG geodetic parameter registry.
Example 5.12 illustrates how certain WKT elements - all of which are optional - can be used to specify CRS properties not covered by existing CF grid mapping attributes, including:
-
use of the
VERT_DATUM
element to specify vertical datum information -
use of additional
PARAMETER
elements (albeit not essential ones in this example) to define the location of the false origin of the projection -
use of
AUTHORITY
elements to specify object identifier codes assigned by an external authority, OGP/EPSG in this instance
... float data(latitude, longitude) ; data:grid_mapping = "crs: latitude longitude" ; ... int crs ; crs:grid_mapping_name = "latitude_longitude"; crs:longitude_of_prime_meridian = 0.0 ; crs:semi_major_axis = 6378137.0 ; crs:inverse_flattening = 298.257223563 ; crs:crs_wkt = GEODCRS["WGS 84", DATUM["World Geodetic System 1984", ELLIPSOID["WGS 84",6378137,298.257223563, LENGTHUNIT["metre",1.0]]], PRIMEM["Greenwich",0], CS[ellipsoidal,3], AXIS["(lat)",north,ANGLEUNIT["degree",0.0174532925199433]], AXIS["(lon)",east,ANGLEUNIT["degree",0.0174532925199433]], AXIS["ellipsoidal height (h)",up,LENGTHUNIT["metre",1.0]]] ...
Note: To enhance readability of these examples, the WKT value has been split across multiple lines and embedded quotation marks (") left unescaped - in real netCDF files such characters would need to be escaped.
In CDL, within the CRS WKT definition string, newlines would need to be encoded within the string as \n
and double quotes as \"
.
Also for readability, we have dropped the quotation marks which would delimit the entire crs_wkt
string.
This pseudo CDL will not parse directly.
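The escaping described in this note can be applied mechanically when generating CDL. The helper below is a sketch with an illustrative name; escaping backslashes first is an added assumption, made so that any backslash already present in the text survives the round trip.

```python
def cdl_quote(text):
    """Return text as a double-quoted CDL string literal."""
    escaped = (text.replace("\\", "\\\\")   # assumption: protect existing backslashes
                   .replace('"', '\\"')     # embedded double quotes become \"
                   .replace("\n", "\\n"))   # embedded newlines become \n
    return '"' + escaped + '"'


wkt = 'GEODCRS["WGS 84",\n  PRIMEM["Greenwich",0]]'
print("crs:crs_wkt = " + cdl_quote(wkt) + " ;")
```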
dimensions: lat = 648 ; lon = 648 ; y = 18 ; x = 36 ; variables: double x(x) ; x:standard_name = "projection_x_coordinate" ; x:units = "m" ; double y(y) ; y:standard_name = "projection_y_coordinate" ; y:units = "m" ; float temp(y, x) ; temp:long_name = "temperature" ; temp:units = "K" ; temp:coordinates = "lat lon" ; temp:grid_mapping = "crs: x y" ; int crs ; crs:grid_mapping_name = "transverse_mercator" ; crs:longitude_of_central_meridian = -2. ; crs:false_easting = 400000. ; crs:false_northing = -100000. ; crs:latitude_of_projection_origin = 49. ; crs:scale_factor_at_central_meridian = 0.9996012717 ; crs:longitude_of_prime_meridian = 0. ; crs:semi_major_axis = 6377563.396 ; crs:inverse_flattening = 299.324964600004 ; crs:projected_coordinate_system_name = "OSGB 1936 / British National Grid" ; crs:geographic_coordinate_system_name = "OSGB 1936" ; crs:horizontal_datum_name = "OSGB_1936" ; crs:reference_ellipsoid_name = "Airy 1830" ; crs:prime_meridian_name = "Greenwich" ; crs:towgs84 = 375., -111., 431., 0., 0., 0., 0. 
; crs:crs_wkt = "COMPOUNDCRS["OSGB 1936 / British National Grid + ODN", PROJCRS["OSGB 1936 / British National Grid", BASEGEODCRS["OSGB 1936", DATUM["OSGB 1936", ELLIPSOID["Airy 1830", 6377563.396, 299.3249646, LENGTHUNIT["metre",1.0]] ], PRIMEM ["Greenwich", 0], UNIT ["degree", 0.0174532925199433] ], CONVERSION["OSGB", METHOD["Transverse Mercator"], PARAMETER["False easting", 400000, LENGTHUNIT["metre",1.0]], PARAMETER["False northing", -100000, LENGTHUNIT["metre",1.0]], PARAMETER["Longitude of natural origin", -2.0, ANGLEUNIT["degree",0.0174532925199433]], PARAMETER["Latitude of natural origin", 49.0, ANGLEUNIT["degree",0.0174532925199433]], PARAMETER["Longitude of false origin", -7.556, ANGLEUNIT["degree",0.0174532925199433]], PARAMETER["Latitude of false origin", 49.766, ANGLEUNIT["degree",0.0174532925199433]], PARAMETER["Scale factor at natural origin", 0.9996012717, SCALEUNIT["Unity",1.0]] ], CS[Cartesian, 2], AXIS["easting (X)",east], AXIS["northing (Y)",north], LENGTHUNIT["metre",1.0], ID["EPSG",27700] ], VERTCRS["Newlyn", VDATUM["Ordnance Datum Newlyn"], CS[vertical,1], AXIS["gravity-related height (H)",up], LENGTHUNIT["metre",1.0], ID["EPSG",5701] ] ]" ; ...
Note: There are unescaped double quotes and newlines and the quotation marks which would delimit the entire crs_wkt
string are missing in this example.
This is to enhance readability, but it means that this pseudo CDL will not parse directly.
The preceding two examples (5.11 and 5.12) may be combined, if the data provider desires to provide explicit latitude and longitude coordinates as well as projection coordinates and to provide CRS WKT referencing for both sets of coordinates. This is demonstrated in example 5.13.
... double x(x) ; x:standard_name = "projection_x_coordinate" ; x:units = "m" ; double y(y) ; y:standard_name = "projection_y_coordinate" ; y:units = "m" ; double lat(y, x) ; lat:standard_name = "latitude" ; lat:units = "degrees_north" ; double lon(y, x) ; lon:standard_name = "longitude" ; lon:units = "degrees_east" ; float temp(y, x) ; temp:long_name = "temperature" ; temp:units = "K" ; temp:coordinates = "lat lon" ; temp:grid_mapping = "crs_osgb: x y crs_wgs84: lat lon" ; ... int crs_wgs84 ; crs_wgs84:grid_mapping_name = "latitude_longitude"; crs_wgs84:crs_wkt = ... int crs_osgb ; crs_osgb:grid_mapping_name = "transverse_mercator" ; crs_osgb:crs_wkt = ... ...
Note: There are unescaped double quotes and newlines and the quotation marks which would delimit the entire crs_wkt
string are missing in this example.
This is to enhance readability, but it means that this pseudo CDL will not parse directly.
5.7. Scalar Coordinate Variables
When a variable has an associated coordinate which is single-valued, that coordinate may be represented as a scalar variable (i.e. a data variable which has no netCDF dimensions).
Since there is no associated dimension these scalar coordinate variables should be attached to a data variable via the coordinates
attribute.
The use of scalar coordinate variables is a convenience feature which avoids adding size one dimensions to variables. A numeric scalar coordinate variable has the same information content and can be used in the same contexts as a size one numeric coordinate variable. Similarly, a string-valued scalar coordinate variable has the same meaning and purposes as a size one string-valued auxiliary coordinate variable (Section 6.1, "Labels"). Note however that use of this feature with a latitude, longitude, vertical, or time coordinate will inhibit COARDS conforming applications from recognizing them.
Once a name is used for a scalar coordinate variable it cannot be used for a 1D coordinate variable. For this reason we strongly recommend against using a name for a scalar coordinate variable that matches the name of any dimension in the file.
If a data variable has two or more scalar coordinate variables, they are regarded as though they were all independent coordinate variables with dimensions of size one. If two or more single-valued coordinates are not independent, but have related values (this might be the case, for instance, for time and forecast period, or vertical coordinate and model level number, Section 6.2, "Alternative Coordinates"), they should be stored as coordinate or auxiliary coordinate variables of the same size one dimension, not as scalar coordinate variables.
dimensions: lat = 180 ; lon = 360 ; time = UNLIMITED ; variables: double atime ; atime:standard_name = "forecast_reference_time" ; atime:units = "hours since 1999-01-01 00:00" ; double time(time); time:standard_name = "time" ; time:units = "hours since 1999-01-01 00:00" ; double lon(lon) ; lon:long_name = "station longitude"; lon:units = "degrees_east"; double lat(lat) ; lat:long_name = "station latitude" ; lat:units = "degrees_north" ; double p500 ; p500:long_name = "pressure" ; p500:units = "hPa" ; p500:positive = "down" ; float height(time,lat,lon); height:long_name = "geopotential height" ; height:standard_name = "geopotential_height" ; height:units = "m" ; height:coordinates = "atime p500" ; data: time = 6., 12., 18., 24. ; atime = 0. ; p500 = 500. ;
In this example both the analysis time and the single pressure level are represented using scalar coordinate variables.
The analysis time is identified by the standard name forecast_reference_time
while the valid time of the forecast is identified by the standard name time
.
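An application could pick out the scalar coordinate variables of a data variable as sketched below. The dictionary model of the dataset is hypothetical, and the second helper simply applies the naming recommendation given earlier in this section.

```python
def scalar_coordinates(variables, data_name):
    """Names listed by the coordinates attribute that have no dimensions."""
    names = variables[data_name]["attrs"].get("coordinates", "").split()
    return [n for n in names if n in variables and variables[n]["dims"] == ()]


def names_clashing_with_dimensions(dimensions, scalar_names):
    """Scalar coordinate names that match a dimension name (recommended against)."""
    return [n for n in scalar_names if n in dimensions]


dimensions = {"lat", "lon", "time"}
variables = {
    "atime": {"dims": (), "attrs": {"standard_name": "forecast_reference_time"}},
    "time": {"dims": ("time",), "attrs": {"standard_name": "time"}},
    "p500": {"dims": (), "attrs": {"units": "hPa", "positive": "down"}},
    "height": {"dims": ("time", "lat", "lon"),
               "attrs": {"coordinates": "atime p500"}},
}
scalars = scalar_coordinates(variables, "height")
print(scalars)
print(names_clashing_with_dimensions(dimensions, scalars))
```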
5.8. Domain Variables
A domain describes data locations and cell properties. It defines cells that span a collection of dimensions with cell coordinates, cell measures, and coordinate reference systems.
A data variable defines its domain via its own attributes, but a domain variable provides the description of a domain in the absence of any data values. The variable should be a scalar (i.e. it has no dimensions) of arbitrary type, and the value of its single element is immaterial. It acts as a container for the attributes that define the domain. The purpose of a domain variable is to provide domain information to applications that have no need of data values at the domain’s locations, thus removing any ambiguity when retrieving a domain from a dataset. Ancillary variables and cell methods are not part of the domain, because they are only defined in relation to data values.
The domain variable supports the same attributes as are allowed on a data variable for describing a domain, with exactly the same meanings and syntaxes, as described in Appendix A, Attributes. If an attribute is needed by a particular data variable to describe its domain, then that attribute would also be needed by the equivalent domain variable.
The dimensions of the domain must be stored with the dimensions
attribute, and the presence of a dimensions
attribute will identify the variable as a domain variable.
Therefore the dimensions
attribute must not be present on any variables that are to be interpreted as data variables.
It is necessary to list these dimensions, rather than inferring them from the contents of the other attributes, as it cannot be guaranteed that the referenced variables span all of the required dimensions (as could be the case for a discrete axis, for instance).
The value of the dimensions
attribute is a blank separated list of the dimension names.
There is no restriction on the order in which the dimensions appear in the dimensions
attribute string.
If a domain has no named dimensions then the value of the dimensions
attribute must be an empty string, as could be the case if the dimensions of the domain are all defined implicitly by scalar coordinate variables.
The dimensions listed by the dimensions
attribute constrain the dimensions that may be spanned by variables referenced from any of the other attributes, in the same way that the array dimensions perform that role for a data variable.
For instance, all variables named by the cell_measures
attribute (Section 7.2, "Cell Measures") of a domain variable must span a subset of zero or more of the dimensions given by the dimensions
attribute.
It is optional for coordinate variables to be listed by a domain variable’s coordinates
attribute.
Any coordinate variable that shares its name with a dimension given by the dimensions
attribute will be considered as part of the domain definition.
It is recommended that a domain variable has a long_name
attribute to describe its contents.
It is recommended that a domain variable does not have any of the attributes marked in Appendix A, Attributes as applicable to data variables except those which are also marked as applicable to domain variables.
Multiple domain variables may exist in a file with, or without, data variables. Note that the attributes of a data variable that describe its domain cannot be replaced by a reference to a domain variable.
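The rules above for the dimensions attribute can be illustrated with a minimal Python sketch. The function names parse_dimensions and spans_subset are hypothetical and are not part of any CF software; the sketch only assumes that the attribute value and the dimension names of a referenced variable are already available as Python strings and tuples.

```python
def parse_dimensions(attr_value):
    """Split a domain variable's `dimensions` attribute into dimension names."""
    # str.split() with no argument splits on runs of blanks and returns []
    # for an empty string, i.e. a domain with no named dimensions.
    return attr_value.split()


def spans_subset(var_dims, domain_dims):
    """True if a referenced variable spans only dimensions of the domain."""
    return set(var_dims) <= set(domain_dims)


# Hypothetical check, using the names of the "Domain with cell measures" example.
domain_dims = parse_dimensions("time cell")
print(spans_subset(("cell",), domain_dims))       # cell_area(cell)
print(spans_subset(("cell", "nv"), domain_dims))  # nv is not a domain dimension
```

Because the order of names in the attribute is unrestricted, the sketch compares sets rather than sequences.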
dimensions:
  lat = 18 ;
  lon = 36 ;
  pres = 15 ;
  time = 4 ;
variables:
  char domain ;
    domain:dimensions = "time pres lat lon" ;
    domain:long_name = "Domain with independent coordinate variables" ;
  float lon(lon) ;
    lon:long_name = "longitude" ;
    lon:units = "degrees_east" ;
  float lat(lat) ;
    lat:long_name = "latitude" ;
    lat:units = "degrees_north" ;
  float pres(pres) ;
    pres:long_name = "pressure" ;
    pres:units = "hPa" ;
  double time(time) ;
    time:long_name = "time" ;
    time:units = "days since 1990-1-1 0:0:0" ;
In this example the data variable xwind
from the Independent coordinate variables example has been replaced by the domain variable domain
.
dimensions:
  rlon = 128 ;
  rlat = 64 ;
  lev = 18 ;
variables:
  char domain ;
    domain:dimensions = "lev rlat rlon" ;
    domain:coordinates = "lon lat time" ;
    domain:grid_mapping = "rotated_pole" ;
    domain:long_name = "Domain with grid mapping and scalar coordinate" ;
  char rotated_pole ;
    rotated_pole:grid_mapping_name = "rotated_latitude_longitude" ;
    rotated_pole:grid_north_pole_latitude = 32.5 ;
    rotated_pole:grid_north_pole_longitude = 170. ;
  double time ;
    time:standard_name = "time" ;
    time:units = "days since 2000-12-01 00:00" ;
  float rlon(rlon) ;
    rlon:long_name = "longitude in rotated pole grid" ;
    rlon:units = "degrees" ;
    rlon:standard_name = "grid_longitude" ;
  float rlat(rlat) ;
    rlat:long_name = "latitude in rotated pole grid" ;
    rlat:units = "degrees" ;
    rlat:standard_name = "grid_latitude" ;
  float lev(lev) ;
    lev:long_name = "pressure level" ;
    lev:units = "hPa" ;
  float lon(rlat,rlon) ;
    lon:long_name = "longitude" ;
    lon:units = "degrees_east" ;
  float lat(rlat,rlon) ;
    lat:long_name = "latitude" ;
    lat:units = "degrees_north" ;
dimensions:
  cell = 2562 ; // number of grid cells
  time = 12 ;
  nv = 6 ;      // maximum number of cell vertices
variables:
  char domain ;
    domain:dimensions = "time cell" ;
    domain:coordinates = "lon lat" ;
    domain:cell_measures = "area: cell_area" ;
    domain:long_name = "Domain with cell measures" ;
  float lon(cell) ;
    lon:long_name = "longitude" ;
    lon:units = "degrees_east" ;
    lon:bounds = "lon_vertices" ;
  float lat(cell) ;
    lat:long_name = "latitude" ;
    lat:units = "degrees_north" ;
    lat:bounds = "lat_vertices" ;
  float time(time) ;
    time:long_name = "time" ;
    time:units = "days since 1979-01-01" ;
  float cell_area(cell) ;
    cell_area:long_name = "area of grid cell" ;
    cell_area:standard_name = "cell_area" ;
    cell_area:units = "m2" ;
  float lon_vertices(cell, nv) ;
  float lat_vertices(cell, nv) ;
In this example the data variable PS
from the Cell areas for a spherical geodesic grid example has been replaced by the domain variable domain
.
dimensions:
variables:
  char domain ;
    domain:dimensions = "" ;
    domain:coordinates = "t" ;
    domain:long_name = "Domain with no explicit dimensions" ;
  double t ;
    t:standard_name = "time" ;
    t:units = "days since 2021-01-01" ;
dimensions:
  instance = 2 ;
  node = 5 ;
  time = 4 ;
variables:
  char domain ;
    domain:dimensions = "instance time" ;
    domain:coordinates = "lat lon" ;
    domain:grid_mapping = "datum" ;
    domain:geometry = "geometry_container" ;
    domain:long_name = "Domain with a geometry variable" ;
  int time(time) ;
  double lat(instance) ;
    lat:units = "degrees_north" ;
    lat:standard_name = "latitude" ;
    lat:nodes = "y" ;
  double lon(instance) ;
    lon:units = "degrees_east" ;
    lon:standard_name = "longitude" ;
    lon:nodes = "x" ;
  int datum ;
    datum:grid_mapping_name = "latitude_longitude" ;
    datum:longitude_of_prime_meridian = 0.0 ;
    datum:semi_major_axis = 6378137.0 ;
    datum:inverse_flattening = 298.257223563 ;
  int geometry_container ;
    geometry_container:geometry_type = "line" ;
    geometry_container:node_count = "node_count" ;
    geometry_container:node_coordinates = "x y" ;
  int node_count(instance) ;
  double x(node) ;
    x:units = "degrees_east" ;
    x:standard_name = "longitude" ;
    x:axis = "X" ;
  double y(node) ;
    y:units = "degrees_north" ;
    y:standard_name = "latitude" ;
    y:axis = "Y" ;
In this example the data variable someData
from the Timeseries with geometry example has been replaced by the domain variable domain
.
dimensions:
  station = 23 ;
  obs = UNLIMITED ;
  name_strlen = 23 ;
variables:
  char domain ;
    domain:dimensions = "obs" ;
    domain:coordinates = "time lat lon alt station_name" ;
    domain:long_name = "Domain with a discrete sampling geometry" ;
  float lon(station) ;
    lon:standard_name = "longitude" ;
    lon:long_name = "station longitude" ;
    lon:units = "degrees_east" ;
  float lat(station) ;
    lat:standard_name = "latitude" ;
    lat:long_name = "station latitude" ;
    lat:units = "degrees_north" ;
  float alt(station) ;
    alt:long_name = "vertical distance above the surface" ;
    alt:standard_name = "height" ;
    alt:units = "m" ;
    alt:positive = "up" ;
    alt:axis = "Z" ;
  char station_name(station, name_strlen) ;
    station_name:long_name = "station name" ;
    station_name:cf_role = "timeseries_id" ;
  int station_info(station) ;
    station_info:long_name = "some kind of station info" ;
  int stationIndex(obs) ;
    stationIndex:long_name = "which station this obs is for" ;
    stationIndex:instance_dimension = "station" ;
  double time(obs) ;
    time:standard_name = "time" ;
    time:long_name = "time of measurement" ;
    time:units = "days since 1970-01-01 00:00:00" ;
attributes:
  :featureType = "timeSeries" ;
In this example the data variables humidity
and temp
from the Timeseries of station data in the indexed ragged array representation example have been replaced by the domain variable domain
.
5.9. Mesh Topology Variables
A mesh topology variable defines the geospatial topology of cells arranged in two or three dimensions in real space but indexed by a single dimension. It explicitly describes the topological relationships between cells, i.e. spatial relationships which do not depend on the cell locations, via a mesh of connected nodes. A mesh topology variable may provide the topology for one or more domains, defined at the nodes, edges, or faces of the mesh. See the Domain topology construct and Cell connectivity construct descriptions in the CF data model for more details, including on how the mesh relates to the cells of the domain.
The canonical definitions of mesh topology variables and location index set variables are given externally by the UGRID conventions [UGRID], but their standardized attributes, many of which are optional, are listed in Appendix K, Mesh Topology Attributes and Appendix A, Attributes. Some features of the UGRID conventions [UGRID] are not currently recognized by the CF conventions: mesh topology volume cells (that are used to describe fully three-dimensional unstructured mesh topologies); and the "boundary node connectivity" variable (that specifies an index variable identifying the nodes that define where boundary conditions have been provided).
A data or domain variable may use one of a mesh topology variable’s domains by referencing the mesh topology variable with the mesh
attribute, together with the identity of the required domain, provided by the location
attribute (see example A two-dimensional UGRID mesh topology variable).
The variables containing the coordinate values for cells indexed by the mesh topology are defined by the mesh topology variable but are equivalent to one-dimensional auxiliary coordinate variables, and so may also be provided by the data or domain variable’s coordinates
attribute.
Note that the mesh topology variable allows cell bounds to be provided without any cell coordinate values, via its node_coordinates
attribute.
A location index set variable defines a subset of locations of a mesh topology variable, e.g. only special locations like weirs and gates.
It is provided as a space saving device to prevent the need to redefine parts of an existing mesh topology variable, and as such is logically equivalent to a mesh topology variable.
A data or domain variable references a location index set variable via its location_index_set
attribute.
dimensions:
  node = 5 ; // Number of mesh nodes
  edge = 6 ; // Number of mesh edges
  face = 2 ; // Number of mesh faces
  two = 2 ;  // Number of nodes per edge
  four = 4 ; // Maximum number of nodes per face
  time = 12 ;
variables:
  // Mesh topology variable
  integer mesh ;
    mesh:cf_role = "mesh_topology" ;
    mesh:long_name = "Topology of a 2-d unstructured mesh" ;
    mesh:topology_dimension = 2 ;
    mesh:node_coordinates = "mesh_node_x mesh_node_y" ;
    mesh:edge_node_connectivity = "mesh_edge_nodes" ;
    mesh:face_node_connectivity = "mesh_face_nodes" ;
  // Mesh node coordinates
  double mesh_node_x(node) ;
    mesh_node_x:standard_name = "longitude" ;
    mesh_node_x:units = "degrees_east" ;
  double mesh_node_y(node) ;
    mesh_node_y:standard_name = "latitude" ;
    mesh_node_y:units = "degrees_north" ;
  // Mesh connectivity variables
  integer mesh_face_nodes(face, four) ;
    mesh_face_nodes:long_name = "Maps each face to its 3 or 4 corner nodes" ;
  integer mesh_edge_nodes(edge, two) ;
    mesh_edge_nodes:long_name = "Maps each edge to the 2 nodes it connects" ;
  // Coordinate variables
  float time(time) ;
    time:standard_name = "time" ;
    time:units = "days since 2004-06-01" ;
  // Data at mesh faces
  double volume_at_faces(time, face) ;
    volume_at_faces:standard_name = "air_density" ;
    volume_at_faces:units = "kg m-3" ;
    volume_at_faces:mesh = "mesh" ;
    volume_at_faces:location = "face" ;
  // Data at mesh edges
  double flux_at_edges(time, edge) ;
    flux_at_edges:standard_name = "northward_wind" ;
    flux_at_edges:units = "m s-1" ;
    flux_at_edges:mesh = "mesh" ;
    flux_at_edges:location = "edge" ;
  // Data at mesh nodes
  double height_at_nodes(time, node) ;
    height_at_nodes:standard_name = "sea_surface_height_above_geoid" ;
    height_at_nodes:units = "m" ;
    height_at_nodes:mesh = "mesh" ;
    height_at_nodes:location = "node" ;
A two-dimensional UGRID mesh topology variable for the mesh depicted in Figure I.5, with data variables defined at face, edge and node elements of the mesh. All optional attributes have been omitted.
6. Labels and Alternative Coordinates
6.1. Labels
Character strings can be used to provide a name or label for each element of an axis. This is particularly useful for discrete axes (section 4.5). For instance, if a data variable contains time series of observational data from a number of observing stations, it may be convenient to provide the names of the stations as labels for the elements of the station dimension (Section H.2, "Time Series Data"). There are several other uses for labels in CF. For instance, Northward heat transport in Atlantic Ocean shows the use of labels to indicate geographic regions.
Character strings labelling the elements of an axis are regarded as string-valued auxiliary coordinate variables.
The coordinates
attribute of the data variable names the variable that contains the string array.
An application processing the variables listed in the coordinates
attribute can recognize a string-valued auxiliary coordinate variable because it has a type of char
or string
.
If the variable has a type of char
, the inner dimension (last dimension in CDL terms) is the maximum length of each string, and the other dimensions are axis dimensions.
If an auxiliary coordinate variable has a type of string
and has no dimensions, or has a type of char
and has only one dimension (the maximum length of the string), it is a string-valued scalar coordinate variable (see Section 5.7, "Scalar Coordinate Variables").
As such, it has the same information content and can be used in the same contexts as a string-valued auxiliary coordinate variable of a size one dimension.
This is a convenience feature.
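As an illustration of the recognition rules above, the following Python sketch classifies a variable from its netCDF type and shape. The function name label_kind and its return values are hypothetical. The case of a char variable with no dimensions is not addressed by the text above, so the sketch treats it as unrecognized, which is an assumption.

```python
def label_kind(nc_type, shape):
    """Classify a variable named by `coordinates` from its type and shape.

    Returns "scalar" for a string-valued scalar coordinate variable,
    "auxiliary" for a string-valued auxiliary coordinate variable, and
    None if the variable is not recognized as string-valued.
    """
    if nc_type == "string":
        axis_dims = len(shape)
    elif nc_type == "char":
        if len(shape) == 0:
            return None  # assumption: a dimensionless char is not covered
        # The last (innermost) dimension is the maximum string length.
        axis_dims = len(shape) - 1
    else:
        return None
    return "scalar" if axis_dims == 0 else "auxiliary"


print(label_kind("char", (2, 80)))  # e.g. taxon_name(taxon,string80)
print(label_kind("char", (80,)))
print(label_kind("string", ()))
```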
6.1.1. Geographic Regions
When data is representative of geographic regions which can be identified by names but which have complex boundaries that cannot practically be specified using longitude and latitude boundary coordinates, a labeled axis should be used to identify the regions.
We recommend that the names be chosen from the list of standardized region names whenever possible.
To indicate that the label values are standardized the variable that contains the labels must be given the standard_name
attribute with the value region
.
Suppose we have data representing northward heat transport across a set of zonal slices in the Atlantic Ocean. Note that the standard names to describe this quantity do not include location information. That is provided by the latitude coordinate and the labeled axis:
dimensions:
  time = 20 ;
  lat = 5 ;
  lbl = 1 ;
variables:
  float n_heat_transport(time,lat,lbl);
    n_heat_transport:units="W";
    n_heat_transport:coordinates="geo_region";
    n_heat_transport:standard_name="northward_ocean_heat_transport";
  double time(time) ;
    time:long_name = "time" ;
    time:units = "days since 1990-1-1 0:0:0" ;
  float lat(lat) ;
    lat:long_name = "latitude" ;
    lat:units = "degrees_north" ;
  string geo_region(lbl) ;
    geo_region:standard_name="region" ;
data:
  geo_region = "atlantic_ocean" ;
  lat = 10., 20., 30., 40., 50. ;
6.1.2. Taxon Names and Identifiers
A taxon is a named level within a biological classification, such as a class, genus or species. Quantities dependent on taxa have generic standard names containing the phrase "organisms_in_taxon", and the taxa are identified by auxiliary coordinate variables.
The taxon auxiliary coordinate variables are string-valued.
The plain-language name of the taxon must be contained in a variable with standard_name
of biological_taxon_name
.
A Life Science Identifier (LSID) may be contained in a variable with standard_name
of biological_taxon_lsid
.
This is a URN with the syntax "urn:lsid:<Authority>:<Namespace>:<ObjectID>[:<Version>]".
The <Authority> element identifies the reference classification, and the permitted authorities are restricted by the LSID governance.
It is strongly recommended in CF that the authority chosen is World Register of Marine Species (WoRMS) for oceanographic data and Integrated Taxonomic Information System (ITIS) for freshwater and terrestrial data.
WoRMS LSIDs are built from the WoRMS AphiaID taxon identifier such as "urn:lsid:marinespecies.org:taxname:104464" for AphiaID 104464.
This may be converted to a URL by adding prefixes such as https://www.lsid.info/.
ITIS LSIDs are built from the ITIS Taxonomic Serial Number (TSN), such as "urn:lsid:itis.gov:itis_tsn:180543".
The biological_taxon_name
auxiliary coordinate variable included for human readability is mandatory.
The biological_taxon_lsid
auxiliary coordinate variable included for software agent readability is optional, but strongly recommended.
If both are present then each biological_taxon_name
coordinate must exactly match the name resolved from the biological_taxon_lsid
coordinate.
If LSIDs are available for some taxa in a dataset then the biological_taxon_lsid
auxiliary coordinate variable should be included and missing data given for those taxa that do not have an identifier.
A skeleton example for taxonomic abundance time series.
dimensions:
  time = 100 ;
  string80 = 80 ;
  taxon = 2 ;
variables:
  float time(time);
    time:standard_name = "time" ;
    time:units = "days since 2019-01-01" ;
  float abundance(time,taxon) ;
    abundance:standard_name = "number_concentration_of_biological_taxon_in_sea_water" ;
    abundance:coordinates = "taxon_lsid taxon_name" ;
  char taxon_name(taxon,string80) ;
    taxon_name:standard_name = "biological_taxon_name" ;
  char taxon_lsid(taxon,string80) ;
    taxon_lsid:standard_name = "biological_taxon_lsid" ;
data:
  time = // 100 values ;
  abundance = // 200 values ;
  taxon_name = "Calanus finmarchicus", "Calanus helgolandicus" ;
  taxon_lsid = "urn:lsid:marinespecies.org:taxname:104464", "urn:lsid:marinespecies.org:taxname:104466" ;
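The LSID syntax quoted in this section, "urn:lsid:<Authority>:<Namespace>:<ObjectID>[:<Version>]", can be split into its elements with a short Python sketch. The names LSID_PATTERN and parse_lsid are hypothetical, and the sketch assumes that no element contains a colon. It checks only the syntax and does not resolve the identifier or compare it with the taxon name.

```python
import re

# Elements of an LSID; the version element is optional.
LSID_PATTERN = re.compile(
    r"^urn:lsid:(?P<authority>[^:]+):(?P<namespace>[^:]+)"
    r":(?P<object_id>[^:]+)(?::(?P<version>[^:]+))?$",
    re.IGNORECASE,
)


def parse_lsid(text):
    """Return the LSID elements as a dict, or None if the syntax does not match."""
    match = LSID_PATTERN.match(text)
    return match.groupdict() if match else None


worms = parse_lsid("urn:lsid:marinespecies.org:taxname:104464")
print(worms["authority"], worms["object_id"])
```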
6.2. Alternative Coordinates
In some situations a dimension may have alternative sets of coordinate values. Since there can only be one coordinate variable for the dimension (the variable with the same name as the dimension), any alternative sets of values have to be stored in auxiliary coordinate variables. For such alternative coordinate variables, there are no mandatory attributes, but they may have any of the attributes allowed for coordinate variables.
Levels on a vertical axis may be described by both the physical coordinate and the ordinal model level number.
float xwind(sigma,lat);
  xwind:coordinates="model_level";
float sigma(sigma); // physical height coordinate
  sigma:long_name="sigma";
  sigma:positive="down";
int model_level(sigma); // model level number at each height
  model_level:long_name="model level number";
  model_level:positive="up";
7. Data Representative of Cells
When gridded data does not represent the point values of a field but instead represents some characteristic of the field within cells of non-zero size, a complete description of the variable should include metadata that describes the domain or extent of each cell, and the characteristic of the field that the cell values represent. The commonest cases are one-dimensional cells along spatiotemporal axes, for instance cells along a time axis for consecutive months whose values contain monthly means. The methods presented in Section 7.1, "Cell Boundaries" and Section 7.3, "Cell Methods" describe cases in which each grid point is associated with a cell consisting of a single one-dimensional interval, a single two-dimensional polygonal area, or in general a single n-dimensional volume in the n-dimensional space described by its coordinate variables.
It is possible for a single data value to be the result of an operation whose domain is a disjoint set of intervals or areas. This is true for many types of climatological statistic; for example, the mean January temperature for the years 1971-2000 is computed from the 30 individual months of January, which are a set of discontiguous time-intervals. Climatological statistics are of such importance that we provide special methods for describing their associated computational domains in Section 7.4, "Climatological Statistics". As an alternative to n-dimensional volumes with bounds, we provide Section 7.5, "Geometries", for the case of geospatial applications in which each data value pertains to a single real-world feature, such as a river, watershed or country, represented by one or more points, lines or polygons.
7.1. Cell Boundaries
To delimit the cells, the bounds
attribute may be added to the appropriate coordinate variable(s).
The value of bounds
is the name of the variable that contains the vertices of the cell boundaries.
We refer to this type of variable as a "boundary variable."
If cell boundaries are provided, it is recommended that each gridpoint should lie somewhere within or upon the boundaries of its own cell.
If cell boundaries are not provided (using the bounds
attribute), an application can make no assumption about the location or extent of the cells.
Without a boundary variable, it is unknown whether adjacent cells are contiguous, separated by a gap, or overlapping.
If the data value pertains to the gridpoint alone, rather than to an interval, area or n-dimensional volume of non-zero size, it is recommended to indicate this with a cell_methods
entry of point
(Section 7.3, "Cell Methods").
In that case, the cell is irrelevant to the data and the bounds are arbitrary.
Nonetheless, the bounds may still be included, for instance because the grid is shared by other data variables that pertain to cells, or to provide some indication of cells to generic applications for graphical purposes.
A cell of truly zero size can be indicated by giving it coincident boundaries.
A boundary variable must have one more dimension than its associated coordinate or auxiliary coordinate variable. We refer to the additional dimension as the "vertex dimension". The vertex dimension must be the most rapidly varying dimension (the last dimension in CDL order), and its size is the maximum number of cell vertices.
The vertex dimension must be of size two if the associated variable is one-dimensional (Section 7.1.1, "Bounds for one-dimensional coordinate variables"), and of size greater than two if the associated variable has more than one dimension (Section 7.1.2, "Bounds for horizontal coordinate variables with four-sided cells").
For grids constructed from cells that do not all have the same number of sides (e.g., a grid with some rectangular cells and some triangular cells), the vertex dimension must be at least as large as the maximum number of cell vertices (Section 7.1.3, "Bounds for coordinate variables with p-sided cells in two spatial dimensions").
For cells with fewer vertices than the size of vertex dimension, the unneeded elements must appear as the last elements in the vertex dimension and must be assigned the _FillValue
.
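The padding rule for cells with fewer vertices than the size of the vertex dimension can be sketched in Python as follows. The helper pad_vertices and the fill value -9999.0 are hypothetical; in a real dataset the fill value would be whatever the boundary variable's _FillValue attribute specifies.

```python
FILL = -9999.0  # hypothetical _FillValue of the boundary variable


def pad_vertices(cells, fill=FILL):
    """Pad each cell's vertex list to the maximum vertex count.

    Unneeded elements are placed last in the vertex dimension and
    set to the fill value.
    """
    width = max(len(cell) for cell in cells)
    return [list(cell) + [fill] * (width - len(cell)) for cell in cells]


# Longitudes of one four-sided and one three-sided cell (illustrative values).
lon_vertices = pad_vertices([[0.0, 1.0, 1.0, 0.0], [2.0, 3.0, 2.5]])
print(lon_vertices)
```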
CF can currently describe boundaries for cells which have one or two spatial dimensions, but does not provide conventions to describe the boundaries of cells with three spatial dimensions.
Such conventions are under consideration in [UGRID].
A boundary variable inherits the values of some attributes from its parent coordinate variable. If a coordinate variable has any of the attributes marked "BI" (for "inherit") in the "Use" column of Appendix A, Attributes, they are assumed to apply to its bounds variable as well. It is recommended that BI attributes not be included on a boundary variable. If a BI attribute is included, it must also be present in the parent variable, and it must exactly match the parent attribute’s data type and value. A bounds variable may have any of the attributes marked "BO" (for "own") in the "Use" column of Appendix A, Attributes. These attributes take precedence over any corresponding attributes of the parent variable. In these cases, the parent variable’s attribute does not apply to the bounds variable, regardless of whether the latter has its own attribute.
7.1.1. Bounds for one-dimensional coordinate variables
For a one-dimensional coordinate variable of size N, the boundary variable is an array of shape (N,2). The bounds for cell i are the elements B(i,0) and B(i,1) of the boundary variable B. Element C(i) of the coordinate variable C should lie between the boundaries of the cell, or upon one of them i.e. B(i,0) - C(i) and B(i,1) - C(i) should not have the same sign, though one of them could be zero (Figure 7.1).
If N > 1, the bounds of each cell must be ordered consistently with the coordinates i.e. B(i,0) < B(i,1) for all i if C(i) < C(i + 1), and B(i,0) > B(i,1) for all i if C(i) > C(i + 1).
If any two cells are contiguous, their shared boundary must be represented identically in each instance where it occurs in the boundary variable. This means that in the common case of N non-overlapping contiguous intervals, N - 1 of the boundaries are duplicated, because they are shared by adjacent intervals. This representation has the advantage that it is general enough to handle, without modification, non-contiguous intervals, as well as intervals on an axis using the unlimited dimension.
dimensions:
  time = 60;
  nv = 2; // number of vertices
variables:
  float time(time);
    time:standard_name = "time";
    time:units = "days since 2024-11-8 09:00:00";
    time:bounds = "time_bnds";
  float time_bnds(time,nv);
The boundary variable time_bnds
associates a time point i
with the time interval whose boundaries are time_bnds(i,0)
and time_bnds(i,1)
.
The instant time(i)
should be contained within the interval, or be at one end of it.
For instance, with i=2
we might have time(2)=10.5
, time_bnds(2,0)=10.0
, time_bnds(2,1)=11.0
.
If the times are increasing e.g. time(3)
= 11.5
> 10.5
= time(2)
, which implies time(i+1)
> time(i)
for all i
because coordinates must be monotonic, the bounds must also be increasing for all i
, e.g. time_bnds(2,1)
> time_bnds(2,0)
.
If adjacent intervals are contiguous, the shared endpoint must be identical.
For example, if the interval i=3
begins at 11.0
days, when interval i=2
ends, the values in time_bnds(3,0)
and time_bnds(2,1)
must be exactly the same.
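The three conditions on one-dimensional bounds described in this section (the coordinate lies within or upon its bounds, the bounds are ordered like the coordinates, and contiguous cells share an identical boundary value) can be sketched in Python. The function names are hypothetical. The values for intervals 2 and 3 follow the text above, except for the upper bound 12.0 of the second interval, which is an assumed value.

```python
def coordinate_within_bounds(c, lower, upper):
    """B(i,0) - C(i) and B(i,1) - C(i) must not have the same sign."""
    return (lower - c) * (upper - c) <= 0


def bounds_ordered_like_coordinates(coords, bounds):
    """For N > 1, every cell's bounds must run in the direction of the coordinates."""
    increasing = coords[1] > coords[0]
    return all((b0 < b1) if increasing else (b0 > b1) for b0, b1 in bounds)


def contiguous(bounds, i):
    """Cells i and i+1 are contiguous if their shared boundary is identical."""
    return bounds[i][1] == bounds[i + 1][0]


coords = [10.5, 11.5]                 # time(2), time(3)
bounds = [(10.0, 11.0), (11.0, 12.0)] # 12.0 is an assumed value
print(all(coordinate_within_bounds(c, b0, b1)
          for c, (b0, b1) in zip(coords, bounds)))
print(bounds_ordered_like_coordinates(coords, bounds))
print(contiguous(bounds, 0))
```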
Figure 7.1. Order of lonbnd(i,0) and lonbnd(i,1) as well as of latbnd(i,0) and latbnd(i,1) in the case of one-dimensional horizontal coordinate axes. Tuples (lon(i),lat(j)) represent grid cell centers. The four grid cell vertices are given by (lonbnd(i,0),latbnd(j,0)), (lonbnd(i,1),latbnd(j,0)), (lonbnd(i,1),latbnd(j,1)) and (lonbnd(i,0),latbnd(j,1)).
7.1.2. Bounds for horizontal coordinate variables with four-sided cells
There is a common case of a rectangular horizontal grid, with four-sided cells, whose two axes are not latitude and longitude (e.g. it uses a map projection from Section 5.6, "Horizontal Coordinate Reference Systems, Grid Mappings, and Projections" or a curvilinear grid, such as the tripolar ocean grid).
In that case, two-dimensional auxiliary coordinate variables in latitude lat(n,m)
and longitude lon(n,m)
may be provided as well.
Since the sides of the cells do not generally have constant latitude or longitude, all four vertices must be specified individually.
Therefore the boundary variables for the two-dimensional auxiliary coordinate variables are given in the form latbnd(n,m,4)
and lonbnd(n,m,4)
, where the trailing index runs over the four vertices of the cells.
dimensions:
  imax = 128;
  jmax = 64;
  nv = 4;
variables:
  float lat(jmax,imax);
    lat:long_name = "latitude";
    lat:units = "degrees_north";
    lat:bounds = "lat_bnds";
  float lon(jmax,imax);
    lon:long_name = "longitude";
    lon:units = "degrees_east";
    lon:bounds = "lon_bnds";
  float lat_bnds(jmax,imax,nv);
  float lon_bnds(jmax,imax,nv);
The boundary variables lat_bnds
and lon_bnds
associate a gridpoint (j,i)
with the cell determined by the vertices (lat_bnds(j,i,n),lon_bnds(j,i,n))
, n=0,..,3
.
The gridpoint location, (lat(j,i),lon(j,i))
, should be contained within this region.
The vertices must be ordered such that, when visiting the vertices in order, the four-sided perimeter of the cell is traversed anticlockwise on the lon-lat surface as seen from above.
If i-j-upward is a right-handed coordinate system (like lon-lat-upward), this can be arranged as in Figure 7.2.
Let us call the side of cell (j,i)
facing cell (j,i-1)
the "i-1
" side, the side facing cell (j,i+1)
the "i+1
" side, and similarly for "j-1
" and "j+1
".
Then we can refer to the vertex formed by sides i-1
and j-1
as (j-1,i-1)
.
With this notation, the four vertices are indexed as follows: 0=(j-1,i-1)
, 1=(j-1,i+1)
, 2=(j+1,i+1)
, 3=(j+1,i-1)
.
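Whether the vertices of a cell are listed anticlockwise can be checked with the sign of the shoelace (signed area) formula, as in the Python sketch below. This is a planar check under stated assumptions: longitude and latitude are treated as Cartesian x and y, and the cell does not cross the longitude wrap-around or contain a pole. The function names are hypothetical.

```python
def signed_area(lons, lats):
    """Shoelace formula; positive when the vertices are traversed anticlockwise."""
    n = len(lons)
    total = 0.0
    for k in range(n):
        k_next = (k + 1) % n  # wrap around to close the polygon
        total += lons[k] * lats[k_next] - lons[k_next] * lats[k]
    return total / 2.0


def is_anticlockwise(lons, lats):
    return signed_area(lons, lats) > 0.0


# Vertices 0..3 of a unit cell in the order (j-1,i-1), (j-1,i+1), (j+1,i+1), (j+1,i-1).
print(is_anticlockwise([0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]))
```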
Figure 7.2. Order of lonbnd(j,i,0) to lonbnd(j,i,3) and of latbnd(j,i,0) to latbnd(j,i,3) in the case of two-dimensional horizontal coordinate axes. Tuples (lon(j,i),lat(j,i)) represent grid cell centers and tuples (lonbnd(j,i,n),latbnd(j,i,n)) represent the grid cell vertices.
The bounds can be used to decide whether cells are contiguous via the following relationships.
In these equations the variable bnd
is used generically to represent either the latitude or longitude boundary variable.
For 0 < j < n and 0 < i < m,
  If cells (j,i) and (j,i+1) are contiguous, then
    bnd(j,i,1)=bnd(j,i+1,0) and bnd(j,i,2)=bnd(j,i+1,3)
  If cells (j,i) and (j+1,i) are contiguous, then
    bnd(j,i,3)=bnd(j+1,i,0) and bnd(j,i,2)=bnd(j+1,i,1)
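The contiguity relationships above translate directly into a Python sketch operating on nested lists indexed as bnd[j][i][n]. The function names and the helper unit_grid_bounds, which builds a small regular grid for illustration, are hypothetical. A pair of cells would be regarded as contiguous only if the test holds for both the latitude and the longitude boundary variables.

```python
def contiguous_in_i(bnd, j, i):
    """Cells (j,i) and (j,i+1) share the vertices on their common side."""
    return (bnd[j][i][1] == bnd[j][i + 1][0]
            and bnd[j][i][2] == bnd[j][i + 1][3])


def contiguous_in_j(bnd, j, i):
    """Cells (j,i) and (j+1,i) share the vertices on their common side."""
    return (bnd[j][i][3] == bnd[j + 1][i][0]
            and bnd[j][i][2] == bnd[j + 1][i][1])


def unit_grid_bounds(nj, ni):
    """Hypothetical helper: bounds of a regular grid of unit cells."""
    lon_bnd = [[[i, i + 1, i + 1, i] for i in range(ni)] for j in range(nj)]
    lat_bnd = [[[j, j, j + 1, j + 1] for i in range(ni)] for j in range(nj)]
    return lat_bnd, lon_bnd


lat_bnd, lon_bnd = unit_grid_bounds(2, 2)
print(contiguous_in_i(lat_bnd, 0, 0) and contiguous_in_i(lon_bnd, 0, 0))
print(contiguous_in_j(lat_bnd, 0, 0) and contiguous_in_j(lon_bnd, 0, 0))
```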
7.1.3. Bounds for coordinate variables with p-sided cells in two spatial dimensions
In the general case of a grid composed of polygonal cells in two spatial dimensions with p
sides and vertices, or a mixture of polygons where p
is the maximum number of sides and vertices, the grid could have one, two or more dimensions, depending on how it is organised logically (e.g. as a 1-D list or a 2-D rectangular arrangement).
The boundary variables for the auxiliary coordinate variables are dimensioned (…,m,p)
, giving coordinates for the p
vertices of each cell, where (…,m)
are the dimensions of the auxiliary coordinate variables.
If the cells are in a horizontal plane, the vertices must be traversed anticlockwise in the lon-lat plane as viewed from above.
The starting vertex is not specified.
Two-dimensional horizontal coordinate variables with four-sided cells (Section 7.1.2, "Bounds for horizontal coordinate variables with four-sided cells") are a particular case, with p=4
for boundary variables dimensioned (n,m,p)
, where n
and m
are horizontal dimensions.
See also Section 7.5, "Geometries" for conventions describing horizontal cells with more complicated geometry and topology.
7.1.4. Boundaries and Formula Terms
If a parametric coordinate variable with a formula_terms
attribute (section 4.3.2) also has a bounds
attribute, its boundary variable must have a formula_terms
attribute too.
In this case the same terms would appear in both (as specified in Appendix D), since the transformation from the parametric coordinate values to physical space is realized through the same formula.
For any term that depends on the vertical dimension, however, the variable names appearing in the formula terms would differ from those found in the formula_terms
attribute of the coordinate variable itself because the boundary variables for formula terms are two-dimensional while the formula terms themselves are one-dimensional.
Whenever a formula_terms
attribute is attached to a boundary variable, the formula terms may additionally be identified using a second method: variables appearing in the vertical coordinates' formula_terms
may be declared to be coordinate, scalar coordinate or auxiliary coordinate variables, and those coordinates may have bounds
attributes that identify their boundary variables.
In that case, the bounds
attribute of a formula terms variable must be consistent with the formula_terms
attribute of the boundary variable.
Software digesting legacy datasets (constructed prior to version 1.7 of this standard) may have to rely in some cases on the first method of identifying the formula term variables and in other cases, on the second.
Starting from version 1.7, however, the first method will be sufficient.
Specifying formula_terms when a parametric coordinate variable has bounds.
float eta(eta) ;
  eta:long_name = "eta at full levels" ;
  eta:positive = "down" ;
  eta:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
  eta:formula_terms = "a: A b: B ps: PS p0: P0" ;
  eta:bounds="eta_bnds" ;
float eta_bnds(eta, 2) ;
  eta_bnds:formula_terms = "a: A_bnds b: B_bnds ps: PS p0: P0" ; // This attribute is mandatory
float A(eta) ;
  A:long_name = "'a' coefficient for vertical coordinate at full levels" ;
  A:units = "Pa" ;
  A:bounds = "A_bnds" ; // This attribute is included for the optional second method
float B(eta) ;
  B:long_name = "'b' coefficient for vertical coordinate at full levels" ;
  B:units = "1" ;
  B:bounds = "B_bnds" ; // This attribute is included for the optional second method
float A_bnds(eta, 2) ;
float B_bnds(eta, 2) ;
float PS(lat, lon) ;
  PS:units = "Pa" ;
float P0 ;
  P0:units = "Pa" ;
float temp(eta, lat, lon) ;
  temp:standard_name = "air_temperature" ;
  temp:units = "K";
  temp:coordinates = "A B" ; // This attribute is included for the optional second method
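The consistency requirement between the two methods can be illustrated with a Python sketch that parses formula_terms strings and compares them with the bounds attributes of the term variables. The function names parse_terms and bounds_terms_consistent are hypothetical, and the sketch assumes that the attribute values have already been read into Python strings and a dictionary.

```python
def parse_terms(attr_value):
    """Parse "term: variable" pairs from a formula_terms string into a dict."""
    words = attr_value.split()
    return {term.rstrip(":"): var
            for term, var in zip(words[0::2], words[1::2])}


def bounds_terms_consistent(coord_terms, bounds_terms, bounds_attrs):
    """Check the second method against the first.

    bounds_attrs maps a term variable name to the value of its bounds
    attribute, for those term variables that have one.
    """
    for term, var in coord_terms.items():
        expected = bounds_attrs.get(var)
        if expected is not None and bounds_terms.get(term) != expected:
            return False
    return True


eta_terms = parse_terms("a: A b: B ps: PS p0: P0")
eta_bnds_terms = parse_terms("a: A_bnds b: B_bnds ps: PS p0: P0")
print(bounds_terms_consistent(eta_terms, eta_bnds_terms,
                              {"A": "A_bnds", "B": "B_bnds"}))
```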
7.2. Cell Measures
For some calculations, information is needed about the size, shape or location of the cells that cannot be deduced from the coordinates and bounds without special knowledge that a generic application cannot be expected to have. For instance, in computing the mean of several cell values, it is often appropriate to "weight" the values by area. When computing an area-mean each grid cell value is multiplied by the grid-cell area before summing, and then the sum is divided by the sum of the grid-cell areas. Area weights may also be needed to map data from one grid to another in such a way as to preserve the area mean of the field. The preservation of area-mean values while regridding may be essential, for example, when calculating surface heat fluxes in an atmospheric model with a grid that differs from the ocean model grid to which it is coupled.
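The area-mean calculation described above can be written as a minimal Python sketch. The function name area_weighted_mean is hypothetical, and the numbers are illustrative only.

```python
def area_weighted_mean(values, areas):
    """Multiply each cell value by its area, sum, and divide by the total area."""
    weighted_sum = sum(v * a for v, a in zip(values, areas))
    return weighted_sum / sum(areas)


# Two cells: the larger cell contributes proportionally more to the mean.
print(area_weighted_mean([1.0, 3.0], [1.0, 3.0]))  # 2.5
```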
In many cases the areas can be calculated from the cell bounds, but there are exceptions.
Consider, for example, a spherical geodesic grid composed of contiguous, roughly hexagonal cells.
The vertices of the cells can be stored in the variable identified by the bounds attribute, but the cell perimeter is not uniquely defined by its vertices (because the vertices could, for example, be connected by straight lines, or, on a sphere, by lines following a great circle, or, in general, in some other way).
Thus, given the cell vertices alone, it is generally impossible to calculate the area of a grid cell.
This is why it may be necessary to store the grid-cell areas in addition to the cell vertices.
In other cases, the grid-cell volume might be needed and might not be easily calculated from the coordinate information. In ocean models, for example, it is not uncommon to find "partial" grid cells at the bottom of the ocean. In this case, rather than (or in addition to) indicating grid-cell area, it may be necessary to indicate volume.
To indicate extra information about the spatial properties of a variable’s grid cells, a cell_measures attribute may be defined for a variable.
This is a string attribute comprising a list of blank-separated pairs of words of the form "measure: name".
For the moment, "area" and "volume" are the only defined measures, but others may be supported in future.
The "name" is the name of the variable containing the measure values, which we refer to as a "measure variable".
The dimensions of a measure variable must be the same as or a subset of the dimensions of the variable to which it is related (their order is not restricted), with one exception.
If a cell measure variable of a data variable that has been compressed by gathering (Section 8.2, "Lossless Compression by Gathering") does not span the compressed dimension, then its dimensions may be any subset of the data variable’s uncompressed dimensions, i.e. any of the dimensions of the data variable except the compressed dimension, and any of the dimensions listed by the compress attribute of the compressed coordinate variable.
In the case of area, for example, the field itself might be a function of longitude, latitude, and time, but the variable containing the area values would only include longitude and latitude dimensions (and the dimension order could be reversed, although this is not recommended).
The variable must have a units attribute and may have other attributes such as a standard_name.
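As an illustration of the "measure: name" syntax, the following sketch splits a cell_measures string into a dictionary and checks the basic dimension rule. The helper names `parse_cell_measures` and `dims_are_subset` are hypothetical, and the sketch deliberately ignores the exception for variables compressed by gathering.

```python
def parse_cell_measures(attr):
    """Parse blank-separated 'measure: name' pairs into a dict."""
    tokens = attr.split()
    if len(tokens) % 2 != 0:
        raise ValueError("cell_measures must consist of 'measure: name' pairs")
    measures = {}
    for key, name in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"expected 'measure:' but found {key!r}")
        measures[key[:-1]] = name
    return measures


def dims_are_subset(measure_dims, data_dims):
    """True if every measure dimension is also a data dimension (order ignored)."""
    return set(measure_dims) <= set(data_dims)


# Hypothetical variable names, used only for illustration.
parsed = parse_cell_measures("area: cell_area volume: cell_volume")
ok = dims_are_subset(("lat", "lon"), ("time", "lat", "lon"))
```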
For rectangular longitude-latitude grids, the area of grid cells can be calculated from the bounds: the area of a cell is proportional to the product of the difference in the longitude bounds of the cell and the difference between the sine of each latitude bound of the cell.
In this case supplying grid-cell areas via the cell_measures attribute is unnecessary because it may be assumed that applications can perform this calculation, using their own value for the radius of the Earth.
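The calculation for a rectangular longitude-latitude cell can be sketched as follows, using the proportionality stated above with the squared radius as the constant. The function name `latlon_cell_area` and the default radius of 6371000 m are assumptions made for this example; the conventions leave the choice of radius to the application.

```python
import math


def latlon_cell_area(lon_bounds_deg, lat_bounds_deg, radius_m=6371000.0):
    """Area of a longitude-latitude cell on a sphere.

    area = R**2 * |lon1 - lon0| * |sin(lat1) - sin(lat0)|, angles in radians.
    The default radius is an assumed value, not one set by the conventions.
    """
    lon0, lon1 = (math.radians(x) for x in lon_bounds_deg)
    lat0, lat1 = (math.radians(x) for x in lat_bounds_deg)
    return radius_m ** 2 * abs(lon1 - lon0) * abs(math.sin(lat1) - math.sin(lat0))


# Sanity check on a unit sphere: one cell covering the whole globe has area 4*pi.
whole_sphere = latlon_cell_area((0.0, 360.0), (-90.0, 90.0), radius_m=1.0)
```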
A variable referenced by cell_measures is not required to be present in the file containing the data variable.
If the cell_measures variable is located in another file (an "external file"), rather than in the file where it is referenced, it must be listed in the external_variables attribute of the referencing file (Section 2.6.3).
dimensions:
  cell = 2562 ; // number of grid cells
  time = 12 ;
  nv = 6 ; // maximum number of cell vertices
variables:
  float PS(time,cell) ;
    PS:units = "Pa" ;
    PS:coordinates = "lon lat" ;
    PS:cell_measures = "area: cell_area" ;
  float lon(cell) ;
    lon:long_name = "longitude" ;
    lon:units = "degrees_east" ;
    lon:bounds = "lon_vertices" ;
  float lat(cell) ;
    lat:long_name = "latitude" ;
    lat:units = "degrees_north" ;
    lat:bounds = "lat_vertices" ;
  float time(time) ;
    time:long_name = "time" ;
    time:units = "days since 1979-01-01 0:0:0" ;
  float cell_area(cell) ;
    cell_area:long_name = "area of grid cell" ;
    cell_area:standard_name = "cell_area" ;
    cell_area:units = "m2" ;
  float lon_vertices(cell,nv) ;
  float lat_vertices(cell,nv) ;
7.3. Cell Methods
To describe the characteristic of a field that is represented by cell values, we define the cell_methods attribute of the variable.
This is a string attribute comprising a list of blank-separated words of the form "name: method".
Each "name: method" pair indicates that for an axis identified by name, the cell values representing the field have been determined or derived by the specified method.
For example, if data values have been generated by computing time means, then this could be indicated with cell_methods="t: mean", assuming here that the name of the time dimension variable is "t".
In the specification of this attribute, name can be a dimension of the variable, a scalar coordinate variable, a valid standard name, or the word "area".
(See Section 7.3.4, "Cell methods when there are no coordinates" concerning the use of standard names in cell_methods.)
The values of method should be selected from the list in Appendix E, Cell Methods, which includes point, sum, mean, among others.
Case is not significant in the method name.
Some methods (e.g., variance) imply a change of units of the variable, as is indicated in Appendix E, Cell Methods.
It must be remembered that the method applies only to the axis designated in cell_methods by name, and different methods may apply to other axes.
If, for instance, a precipitation value in a longitude-latitude cell is given the method maximum for these axes, it means that it is the maximum within these spatial cells, and does not imply that it is also the maximum in time.
Furthermore, it should be noted that if any method other than "point" is specified for a given axis, then bounds should also be provided for that axis (except for the relatively rare exceptions described in Section 7.3.4, "Cell methods when there are no coordinates").
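To make the "name: method" structure concrete, here is a minimal sketch of a tokenizer for cell_methods strings. The function name `parse_cell_methods` and the dictionary layout are hypothetical. The sketch lower-cases the method because case is not significant, keeps any trailing words or parenthesized text unparsed in `extra`, and does not validate methods against Appendix E.

```python
def parse_cell_methods(attr):
    """Split a cell_methods string into entries of names, method and extras."""
    entries = []
    current = None
    depth = 0  # nesting depth of parentheses
    for token in attr.split():
        is_name = depth == 0 and token.endswith(":") and not token.startswith("(")
        if is_name:
            # A 'name:' token starts a new entry, or extends the name list
            # of an entry that has no method yet (e.g. "lat: lon: mean").
            if current is None or current["method"] is not None:
                current = {"names": [], "method": None, "extra": []}
                entries.append(current)
            current["names"].append(token[:-1])
        elif current is None:
            raise ValueError(f"unexpected token before any name: {token!r}")
        elif current["method"] is None and depth == 0:
            current["method"] = token.lower()
        else:
            current["extra"].append(token)
        depth += token.count("(") - token.count(")")
    if any(entry["method"] is None for entry in entries):
        raise ValueError("every 'name:' group must be followed by a method")
    return entries


simple = parse_cell_methods("t: mean")
ordered = parse_cell_methods("lon: maximum time: mean")
```

Entries are returned in the order in which they appear in the string, which matters when more than one method is recorded.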
The default interpretation for variables that do not have the cell_methods attribute specified depends on whether the quantity is extensive (which depends on the size of the cell) or intensive (which does not).
Suppose, for example, the quantities "accumulated precipitation" and "precipitation rate" each have a time axis.
A variable representing accumulated precipitation is extensive in time because it depends on the length of the time interval over which it is accumulated.
For correct interpretation, it therefore requires a time interval to be completely specified via a boundary variable (i.e., via a bounds attribute for the time axis).
In this case the default interpretation is that the cell method is a sum over the specified time interval.
This can be (optionally) indicated explicitly by setting the cell method to sum.
A precipitation rate on the other hand is intensive in time and could equally well represent either an instantaneous value or a mean value over the time interval specified by the cell.
In this case the default interpretation for the quantity would be "instantaneous" (which, optionally, can be indicated explicitly by setting the cell method to point).
More often, however, cell values for intensive quantities are means, and this should be indicated explicitly by setting the cell method to mean and specifying the cell bounds.
Because the default interpretation for an intensive quantity differs from that of an extensive quantity and because this distinction may not be understood by some users of the data, it is recommended that every data variable include for each of its dimensions and each of its scalar coordinate variables the cell_methods information of interest (unless this information would not be meaningful).
It is especially recommended that cell_methods be explicitly specified for each spatio-temporal dimension and each spatio-temporal scalar coordinate variable.
Consider 12-hourly timeseries of pressure, temperature and precipitation from a number of stations, where pressure is measured instantaneously, maximum temperature for the preceding 12 hours is recorded, and precipitation is accumulated in a rain gauge. For a period of 48 hours from 6 a.m. on 19 April 1998, the data is structured as follows:
dimensions:
  time = UNLIMITED; // (5 currently)
  station = 10;
  nv = 2;
variables:
  float pressure(time,station);
    pressure:long_name = "pressure";
    pressure:units = "kPa";
    pressure:cell_methods = "time: point";
  float maxtemp(time,station);
    maxtemp:long_name = "temperature";
    maxtemp:units = "K";
    maxtemp:cell_methods = "time: maximum";
  float ppn(time,station);
    ppn:long_name = "depth of water-equivalent precipitation";
    ppn:units = "mm";
    ppn:cell_methods = "time: sum";
  double time(time);
    time:long_name = "time";
    time:units = "h since 1998-4-19 6:0:0";
    time:bounds = "time_bnds";
  double time_bnds(time,nv);
data:
  time = 0., 12., 24., 36., 48.;
  time_bnds = -12.,0., 0.,12., 12.,24., 24.,36., 36.,48.;
Note that in this example the time axis values coincide with the end of each interval.
It is sometimes desirable, however, to use the midpoint of intervals as coordinate values for variables that are representative of an interval.
An application may simply obtain the midpoint values by making use of the boundary data in time_bnds.
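A minimal sketch of that midpoint calculation, using the time_bnds values from the example above (the function name `cell_midpoints` is hypothetical):

```python
def cell_midpoints(bounds):
    """Return the midpoint of each (lower, upper) bounds pair."""
    return [0.5 * (lower + upper) for lower, upper in bounds]


# Bounds from the station example, in hours since 1998-4-19 6:0:0.
time_bnds = [(-12.0, 0.0), (0.0, 12.0), (12.0, 24.0), (24.0, 36.0), (36.0, 48.0)]
midpoints = cell_midpoints(time_bnds)
```

Here `midpoints` evaluates to -6, 6, 18, 30 and 42 hours, whereas the stored coordinate values (0 to 48 hours) mark the end of each interval.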
7.3.1. Statistics for more than one axis
If more than one cell method is to be indicated, they should be arranged in the order they were applied.
The left-most operation is assumed to have been applied first.
Suppose, for example, that within each grid cell a quantity varies in both longitude and time and that these dimensions are named "lon" and "time", respectively.
Then values representing the time-average of the zonal maximum are labeled cell_methods="lon: maximum time: mean" (i.e. find the largest value at each instant of time over all longitudes, then average these maxima over time); values of the zonal maximum of time-averages are labeled cell_methods="time: mean lon: maximum".
If the methods could have been applied in any order without affecting the outcome, they may be put in any order in the cell_methods attribute.
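The following sketch shows, on a tiny made-up field, that the two orderings generally give different results. The values are hypothetical and chosen only so that the difference is visible.

```python
from statistics import mean

# Hypothetical values indexed as field[time][lon].
field = [
    [1.0, 5.0],  # time step 0
    [4.0, 2.0],  # time step 1
]

# cell_methods = "lon: maximum time: mean":
# take the maximum over longitude at each time, then average over time.
zonal_max_per_time = [max(row) for row in field]
mean_of_zonal_max = mean(zonal_max_per_time)

# cell_methods = "time: mean lon: maximum":
# average over time at each longitude, then take the maximum over longitude.
time_mean_per_lon = [mean(column) for column in zip(*field)]
max_of_time_mean = max(time_mean_per_lon)
```

For this field the first ordering yields 4.5 and the second 3.5, so the order recorded in cell_methods carries real information.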
If a data value is representative of variation over a combination of axes, a single method should be prefixed by the names of all the dimensions involved (listed in any order, since in this case the order must be immaterial).
Dimensions should be grouped in this way only if there is an essential difference from treating the dimensions individually.
For instance, the standard deviation of topographic height within a longitude-latitude gridbox could have cell_methods="lat: lon: standard_deviation".
(Note also, that in accordance with the recommendation of the following paragraph, this could be equivalently and preferably indicated by cell_methods="area: standard_deviation".)
This is not the same as cell_methods="lon: standard_deviation lat: standard_deviation", which would mean finding the standard deviation along each parallel of latitude within the zonal extent of the gridbox, and then the standard deviation of these values over latitude.
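The difference between the grouped and the sequential form can be illustrated numerically. This is only a sketch: the heights are made up, and the use of the population standard deviation (`statistics.pstdev`) is an assumption of the example, since the text above does not say which estimator applies.

```python
from statistics import pstdev

# Hypothetical sub-grid heights within one gridbox, indexed as heights[lat][lon].
heights = [
    [1.0, 2.0],
    [3.0, 6.0],
]

# "lat: lon: standard_deviation": one deviation over all points in the box.
combined = pstdev([h for row in heights for h in row])

# "lon: standard_deviation lat: standard_deviation":
# deviation along each parallel first, then the deviation of those values.
per_parallel = [pstdev(row) for row in heights]
sequential = pstdev(per_parallel)
```

The grouped form gives about 1.87 for these values and the sequential form gives 0.5.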
To indicate variation over horizontal area, it is recommended that instead of specifying the combination of horizontal dimensions, the special string "area" be used.
The common case of an area-mean can thus be indicated by cell_methods="area: mean" (rather than, for example, "lon: lat: mean").
The horizontal coordinate variables to which "area" refers are in this case not explicitly indicated in cell_methods but can be identified, if necessary, from attributes attached to the coordinate variables, scalar coordinate variables, or auxiliary coordinate variables, as described in Chapter 4, Coordinate Types.
7.3.2. Recording the spacing of the original data and other information
To indicate more precisely how the cell method was applied, extra information may be included in parentheses ( ) at the end of the word list describing the method, after the operation and any where, over and within phrases.
This information includes standardized and non-standardized parts.
Currently the only standardized information is to provide the typical interval between the original data values to which the method was applied, in the situation where the present data values are statistically representative of original data values which had a finer spacing.
The syntax is (interval: value unit), where value is a numerical value and unit is a string that can be recognized by UNIDATA’s UDUNITS package [UDUNITS].
The unit will usually be dimensionally equivalent to the unit of the corresponding dimension, but this is not required (which allows, for example, the interval for a standard deviation calculated from points evenly spaced in distance along a parallel to be reported in units of length even if the zonal coordinate of the cells is given in degrees).
Recording the original interval is particularly important for standard deviations.
For example, the standard deviation of daily values could be indicated by cell_methods="time: standard_deviation (interval: 1 day)" and of annual values by cell_methods="time: standard_deviation (interval: 1 year)".
If the cell method applies to a combination of axes, they may have a common original interval e.g. cell_methods="lat: lon: standard_deviation (interval: 10 km)".
Alternatively, they may have separate intervals, which are matched to the names of axes by position e.g. cell_methods="lat: lon: standard_deviation (interval: 0.1 degree_N interval: 0.2 degree_E)", in which 0.1 degree applies to latitude and 0.2 degree to longitude.
If there is both standardized and non-standardized information, the non-standardized follows the standardized information and the keyword comment:.
If there is no standardized information, the keyword comment: should be omitted.
For instance, an area-weighted mean over latitude could be indicated as lat: mean (area-weighted) or lat: mean (interval: 1 degree_north comment: area-weighted).
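The parenthesized syntax can be sketched as a small parser. The function name `parse_method_info` and its return layout are hypothetical. The sketch treats the unit as an opaque string rather than validating it with UDUNITS, and assumes that the comment text does not itself contain the word "interval:".

```python
def parse_method_info(text):
    """Split '(interval: value unit ... comment: text)' into its parts."""
    inner = text.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1].strip()
    if not inner.startswith("interval:"):
        # No standardized information: the whole string is the comment.
        return {"intervals": [], "comment": inner or None}
    standardized, separator, comment = inner.partition("comment:")
    intervals = []
    for chunk in standardized.split("interval:")[1:]:
        value, unit = chunk.split(None, 1)
        intervals.append((float(value), unit.strip()))
    return {
        "intervals": intervals,
        "comment": comment.strip() if separator else None,
    }


both = parse_method_info("(interval: 1 hr comment: sampled instantaneously)")
per_axis = parse_method_info("(interval: 0.1 degree_N interval: 0.2 degree_E)")
comment_only = parse_method_info("(area-weighted)")
```

When several intervals are present, they are returned in the order written so that a caller can match them to the axis names by position.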
A dimension of size one may be the result of "collapsing" an axis by some statistical operation, for instance by calculating a variance from time series data.
We strongly recommend that dimensions of size one be retained (or scalar coordinate variables be defined) to enable documentation of the method (through the cell_methods attribute) and its domain (through the bounds attribute).
The variance of the diurnal cycle on 1 January 1990 has been calculated from hourly instantaneous surface air temperature measurements. The time dimension of size one has been retained.
dimensions:
  lat=90;
  lon=180;
  time=1;
  nv=2;
variables:
  float TS_var(time,lat,lon);
    TS_var:long_name="surface air temperature variance";
    TS_var:units="K2";
    TS_var:cell_methods="time: variance (interval: 1 hr comment: sampled instantaneously)";
  float time(time);
    time:units="days since 1990-01-01 00:00:00";
    time:bounds="time_bnds";
  float time_bnds(time,nv);
data:
  time=.5;
  time_bnds=0.,1.;
Notice that a parenthesized comment in the cell_methods attribute provides the nature of the samples used to calculate the variance.
7.3.3. Statistics applying to portions of cells
By default, the statistical method indicated by cell_methods is assumed to have been evaluated over the entire horizontal area of the cell.
Sometimes, however, it is useful to limit consideration to only a portion of a cell (e.g. a mean over the sea-ice area).
Cell portions are referred to by means of standardised area_type strings, maintained in the area-type table, using one of two conventions.
The first convention is a method that can be used for the common case of a single area-type.
In this case, the cell_methods attribute may include a string of the form "name: method where type".
Here name could, for example, be area and type may be any of the standardised area_type strings.
As an example, if the method were mean and the area_type were sea_ice, then the data would represent a mean over only the sea ice portion of the grid cell.
If the data writer expects type to be interpreted as one of the standard area_type strings, then none of the variables in the netCDF file should be given a name identical to that of the string (because the second convention, described in the next paragraph, takes precedence).
The second convention is the more general.
In this case, the cell_methods entry is of the form "name: method where typevar".
Here typevar is a string-valued auxiliary coordinate variable or string-valued scalar coordinate variable (see Section 6.1, "Labels") with a standard_name of area_type.
The variable typevar contains the name(s) of the selected portion(s) of the grid cell to which the method is applied.
These name(s) must be a subset of the standardised area_type strings.
This convention can accommodate cases in which a method is applied to more than one area type and the result is stored in a single data variable (with a dimension which ranges across the various area types).
It provides a convenient way to store output from land surface models, for example, since they deal with many area types within each surface gridbox (e.g., vegetation, bare_ground, snow, etc.).
dimensions:
  lat=73;
  lon=96;
  maxlen=20;
  ls=2;
variables:
  float surface_temperature(lat,lon);
    surface_temperature:cell_methods="area: mean where land";
  float surface_upward_sensible_heat_flux(ls,lat,lon);
    surface_upward_sensible_heat_flux:coordinates="land_sea";
    surface_upward_sensible_heat_flux:cell_methods="area: mean where land_sea";
  char land_sea(ls,maxlen);
    land_sea:standard_name="area_type";
data:
  land_sea="land","sea";
If the method is mean, various ways of calculating the mean can be distinguished in the cell_methods attribute with a string of the form "mean where type1 [over type2]".
Here, type1 can be any of the possibilities allowed for typevar or type (as specified in the two paragraphs preceding the example above).
The same options apply to type2, except it is not allowed to be the name of an auxiliary coordinate variable with a dimension greater than one (ignoring the possible dimension accommodating the maximum string length).
A cell_methods attribute with a string of the form "mean where type1 over type2" indicates the mean is calculated by summing over the type1 portion of the cell and dividing by the area of the type2 portion.
In particular, a cell_methods string of the form "mean where all_area_types over type2" indicates the mean is calculated by summing over all types of area within the cell and dividing by the area of the type2 portion.
(Note that all_area_types is one of the valid strings permitted for a variable with the standard_name area_type.)
If "over type2" is omitted, the mean is calculated by summing over the type1 portion of the cell and dividing by the area of this portion.
variables:
  float sea_ice_thickness(lat,lon);
    sea_ice_thickness:cell_methods="area: mean where sea_ice over sea";
    sea_ice_thickness:standard_name="sea_ice_thickness";
    sea_ice_thickness:units="m";
  float snow_thickness(lat,lon);
    snow_thickness:cell_methods="area: mean where sea_ice over sea";
    snow_thickness:standard_name="lwe_thickness_of_surface_snow_amount";
    snow_thickness:units="m";
In the case of sea-ice thickness, the phrase “where sea_ice” could be replaced by “where all_area_types” without changing the meaning since the integral of sea-ice thickness over all area types is obviously the same as the integral over the sea-ice area only. In the case of snow thickness, “where sea_ice” differs from “where all_area_types” because “where sea_ice” excludes snow on land from the average.
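The arithmetic behind "mean where type1 over type2" can be sketched for a single grid cell. Everything here is illustrative: the function name `mean_where`, the portion labels, the areas and the thicknesses are made up, and treating "sea" as the sum of the sea_ice and ice_free_sea portions is an assumption of this example.

```python
def mean_where(values, areas, where, over=None):
    """Sum value * area over the `where` portions, divide by the `over` area.

    If `over` is omitted, the divisor is the area of the `where` portions.
    """
    numerator = sum(values[t] * areas[t] for t in where)
    divisor_types = where if over is None else over
    return numerator / sum(areas[t] for t in divisor_types)


# Hypothetical portions of one grid cell (areas in arbitrary units).
areas = {"sea_ice": 20.0, "ice_free_sea": 60.0, "land": 20.0}
sea = ["sea_ice", "ice_free_sea"]  # assumed make-up of "sea"
everything = list(areas)           # stands in for all_area_types

ice_thickness = {"sea_ice": 2.0, "ice_free_sea": 0.0, "land": 0.0}
snow_thickness = {"sea_ice": 0.1, "ice_free_sea": 0.0, "land": 0.3}

ice_where_ice = mean_where(ice_thickness, areas, ["sea_ice"], over=sea)
ice_where_all = mean_where(ice_thickness, areas, everything, over=sea)
snow_where_ice = mean_where(snow_thickness, areas, ["sea_ice"], over=sea)
snow_where_all = mean_where(snow_thickness, areas, everything, over=sea)
```

For sea-ice thickness the two variants agree because the quantity is zero outside the sea-ice portion; for snow thickness they differ because snow on land enters the sum only in the all_area_types case.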
7.3.4. Cell methods when there are no coordinates
To provide an indication that a particular cell method is relevant to the data without having to provide a precise description of the corresponding cell, the "name" that appears in a "name: method" pair may be an appropriate standard_name (which identifies the dimension) or the string, "area" (rather than the name of a scalar coordinate variable or a dimension with a coordinate variable).
This convention cannot be used, however, if the name of a dimension or scalar coordinate variable is identical to name.
There are two situations where this convention is useful.
First, it allows one to provide some indication of the method when the cell coordinate range cannot be precisely defined.
For example, a climatological mean might be based on any data that exists, and, in general, the data might not be available over the same time periods everywhere.
In this case, the time range would not be well defined (because it would vary, depending on location), and it could not be precisely specified through a time dimension’s bounds.
Nevertheless, useful information can be conveyed by a cell_methods entry of "time: mean" (where time, it should be noted, is a valid standard_name).
(As required by this convention, it is assumed here that for the data referred to by this cell_methods attribute, "time" is not a dimension or coordinate variable.)
Second, for a few special dimensions, this convention allows one to indicate (without explicitly defining the coordinates) that the method applies to the domain covering the entire permitted range of those dimensions. This is allowed only for longitude, latitude, and area (indicating a combination of horizontal coordinates). For longitude, the domain is indicated according to this provision by the string "longitude" (rather than the name of a longitude coordinate variable), and this implies that the method applies to all possible longitudes (i.e., from 0E to 360E). For latitude, the string "latitude" is used and implies the method applies to all possible latitudes (i.e., from 90S to 90N). For area, the string "area" is used and implies the method applies to the whole world.
In the second case if, in addition, the data variable has a dimension with a corresponding labeled axis that specifies a geographic region (Section 6.1.1, "Geographic Regions"), the implied range of longitude and latitude is the valid range for each specified region, or in the case of area the domain is the geographic region.
For example, there could be a cell_methods entry of "longitude: mean", where longitude is not the name of a dimension or coordinate variable (but is one of the special cases given above).
That would indicate a mean over all longitudes.
Note, however, that if in addition the data variable had a scalar coordinate variable with a standard_name of region and a value of atlantic_ocean, it would indicate a mean over longitudes that lie within the Atlantic Ocean, not all longitudes.
We recommend that whenever possible, cell bounds should be supplied by giving the variable a dimension of size one and attaching bounds to the associated coordinate variable.
7.4. Climatological Statistics
Climatological statistics may be derived from corresponding portions of the annual cycle in a set of years, e.g., the average January temperatures in the climatology of 1961-1990, where the values are derived by averaging the 30 Januarys from the separate years. Portions of the climatological cycle are specified by references to dates within the calendar year. However, a calendar year is not a well-defined unit of time, because it differs between leap years and other years, and among calendars. Nonetheless for practical purposes we wish to compare statistics for months or seasons from different calendars, and to make climatologies from a mixture of leap years and other years. Hence we provide special conventions for indicating dates within the climatological year. Climatological statistics may also be derived from corresponding portions of a range of days, for instance the average temperature for each hour of the average day in April 1997. In addition the two concepts may be used at once, for instance to indicate not April 1997, but the average April of the five years 1995-1999.
Climatological variables have a climatological time axis.
Like an ordinary time axis, a climatological time axis may have a dimension of unity (for example, a variable containing the January average temperatures for 1961-1990), but often it will have several elements (for example, a climatological time axis with a dimension of 12 for the climatological average temperatures in each month for 1961-1990, a dimension of 3 for the January mean temperatures for the three decades 1961-1970, 1971-1980, 1981-1990, or a dimension of 24 for the hours of an average day).
Intervals of climatological time are conceptually different from ordinary time intervals; a given interval of climatological time represents a set of subintervals which are not necessarily contiguous.
To indicate this difference, a climatological time coordinate variable does not have a bounds attribute.
Instead, it has a climatology attribute, which names a variable with dimensions (n,2), n being the dimension of the climatological time axis.
Using the units and calendar of the time coordinate variable, element (i,0) of the climatology variable specifies the beginning of the first subinterval and element (i,1) the end of the last subinterval used to evaluate the climatological statistics with index i in the time dimension.
The time coordinates should be values that are representative of the climatological time intervals, such that an application which does not recognise climatological time will nonetheless be able to make a reasonable interpretation.
For compatibility with the COARDS standard, a climatological time coordinate in the default standard and julian calendars may be indicated by setting the datetime reference string in the time coordinate’s units attribute to midnight at 0 degrees_east on 1 January in year 0 (i.e., since 0-1-1).
This convention is deprecated because it does not provide any information about the intervals used to compute the climatology, and there may be inconsistencies among software packages in the interpretation of the time coordinates with a reference time of year 0.
Use of year 0 for this purpose is impossible in all other calendars, because year 0 is a valid year.
A climatological axis may use different statistical methods to represent variation among years, within years and within days.
For example, the average January temperature in a climatology is obtained by averaging both within years and over years.
This is different from the average January-maximum temperature and the maximum January-average temperature.
For the former, we first calculate the maximum temperature in each January, then average these maxima; for the latter, we first calculate the average temperature in each January, then find the largest one.
As usual, the statistical operations are recorded in the cell_methods attribute, which may have two or three entries for the climatological time dimension.
Valid values of the cell_methods attribute must be in one of the forms from the following list.
The intervals over which various statistical methods are applied are determined by decomposing the date and time specifications of the climatological time bounds of a cell, as recorded in the variable named by the climatology attribute.
(The date and time specifications must be calculated from the time coordinates expressed in units of "time interval since reference date and time".)
In the descriptions that follow we use the abbreviations y, m, d, H, M, and S for year, month, day, hour, minute, and second respectively.
The suffix 0 indicates the earlier bound and 1 the later bound.
- time: method1 within years time: method2 over years
  method1 is applied to the time intervals (mdHMS0-mdHMS1) within individual years and method2 is applied over the range of years (y0-y1).
- time: method1 within days time: method2 over days
  method1 is applied to the time intervals (HMS0-HMS1) within individual days and method2 is applied over the days in the interval (ymd0-ymd1).
- time: method1 within days time: method2 over days time: method3 over years
  method1 is applied to the time intervals (HMS0-HMS1) within individual days and method2 is applied over the days in the interval (md0-md1), and method3 is applied over the range of years (y0-y1).
The methods which can be specified are those listed in Appendix E, Cell Methods and each entry in the cell_methods attribute may also, as usual, contain non-standardised information in parentheses after the method.
For instance, a mean over ENSO years might be indicated by "time: mean over years (ENSO years)".
When considering intervals within years, if the earlier climatological time bound is later in the year than the later climatological time bound, it implies that the time intervals for the individual years run from each year across January 1 into the next year e.g. DJF intervals run from December 1 0:00 to March 1 0:00. Analogous situations arise for daily intervals running across midnight from one day to the next.
When considering intervals within days, if the earlier time of day is equal to the later time of day, then the method is applied to a full 24 hour day.
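The decomposition of climatological bounds for the "within years ... over years" form, including intervals that run across January 1, can be sketched as below. The function name `yearly_subintervals` is hypothetical. The sketch relies on Python's `datetime`, so it assumes a proleptic Gregorian calendar, does not handle a bound on 29 February, and rejects bounds whose month-day-time parts are equal because the text above does not define that case for years.

```python
from datetime import datetime


def yearly_subintervals(start, end):
    """Expand climatological bounds into the per-year intervals that
    method1 of 'method1 within years ... method2 over years' applies to."""
    start_part = (start.month, start.day, start.time())
    end_part = (end.month, end.day, end.time())
    if start_part == end_part:
        raise ValueError("equal within-year bounds are not handled in this sketch")
    # If the earlier bound falls later in the year than the later bound,
    # each interval runs across January 1 into the following year.
    wraps = start_part > end_part
    last_start_year = end.year - 1 if wraps else end.year
    intervals = []
    for year in range(start.year, last_start_year + 1):
        lower = start.replace(year=year)
        upper = end.replace(year=year + 1 if wraps else year)
        intervals.append((lower, upper))
    return intervals


# Bounds for seasons spanning March 1960 to February 1991.
mam = yearly_subintervals(datetime(1960, 3, 1), datetime(1990, 6, 1))
djf = yearly_subintervals(datetime(1960, 12, 1), datetime(1991, 3, 1))
```

Both calls return 31 intervals; the DJF intervals each start on December 1 and end on March 1 of the following year.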
We have tried to make the examples in this section easier to understand by translating all time coordinate values to date and time formats. This is not currently valid CDL syntax.
This example shows the metadata for the average seasonal-minimum temperature for the four standard climatological seasons MAM JJA SON DJF, made from data for March 1960 to February 1991.
dimensions:
  time=4;
  nv=2;
variables:
  float temperature(time,lat,lon);
    temperature:long_name="surface air temperature";
    temperature:cell_methods="time: minimum within years time: mean over years";
    temperature:units="K";
  double time(time);
    time:climatology="climatology_bounds";
    time:units="days since 1960-1-1";
  double climatology_bounds(time,nv);
data: // time coordinates translated to datetime format
  time="1960-4-16", "1960-7-16", "1960-10-16", "1961-1-16" ;
  climatology_bounds="1960-3-1", "1990-6-1",
                     "1960-6-1", "1990-9-1",
                     "1960-9-1", "1990-12-1",
                     "1960-12-1", "1991-3-1" ;
Average January precipitation totals are given for each of the decades 1961-1970, 1971-1980, 1981-1990.
dimensions:
  time=3;
  nv=2;
variables:
  float precipitation(time,lat,lon);
    precipitation:long_name="precipitation amount";
    precipitation:cell_methods="time: sum within years time: mean over years";
    precipitation:units="kg m-2";
  double time(time);
    time:climatology="climatology_bounds";
    time:units="days since 1901-1-1";
  double climatology_bounds(time,nv);
data: // time coordinates translated to datetime format
  time="1965-1-15", "1975-1-15", "1985-1-15" ;
  climatology_bounds="1961-1-1", "1970-2-1",
                     "1971-1-1", "1980-2-1",
                     "1981-1-1", "1990-2-1" ;
Hourly average temperatures are given for April 1997.
dimensions:
  time=24;
  nv=2;
variables:
  float temperature(time,lat,lon);
    temperature:long_name="surface air temperature";
    temperature:cell_methods="time: mean within days time: mean over days";
    temperature:units="K";
  double time(time);
    time:climatology="climatology_bounds";
    time:units="hours since 1997-4-1";
  double climatology_bounds(time,nv);
data: // time coordinates translated to datetime format
  time="1997-4-1 0:30", "1997-4-1 1:30", ... "1997-4-1 23:30" ;
  climatology_bounds="1997-4-1 0:00", "1997-4-30 1:00",
                     "1997-4-1 1:00", "1997-4-30 2:00",
                     ...
                     "1997-4-1 23:00", "1997-5-1 0:00" ;
Number of frost days during NH winter 2007-2008, and maximum length of spells of consecutive frost days.
A "frost day" is defined as one during which the minimum temperature falls below freezing point (0 degC).
This is described as a climatological statistic, in which the minimum temperature is first calculated within each day, and then the number of days or spell lengths meeting the specified condition are evaluated.
In this operation, the standard name is also changed; the original data are air_temperature.
variables:
  float n1(lat,lon);
    n1:standard_name="number_of_days_with_air_temperature_below_threshold";
    n1:coordinates="threshold time";
    n1:cell_methods="time: minimum within days time: sum over days";
  float n2(lat,lon);
    n2:standard_name="spell_length_of_days_with_air_temperature_below_threshold";
    n2:coordinates="threshold time";
    n2:cell_methods="time: minimum within days time: maximum over days";
  float threshold;
    threshold:standard_name="air_temperature";
    threshold:units="degC";
  double time;
    time:climatology="climatology_bounds";
    time:units="days since 2000-6-1";
  double climatology_bounds(time,nv);
data: // time coordinates translated to datetime format
  time="2008-1-16 6:00";
  climatology_bounds="2007-12-1 6:00", "2008-3-1 6:00";
  threshold=0.;
This is a modified version of the previous example, "Temperature for each hour of the average day". It now applies to April from a 1961-1990 climatology.
variables:
  float temperature(time,lat,lon);
    temperature:long_name="surface air temperature";
    temperature:cell_methods="time: mean within days ",
      "time: mean over days time: mean over years";
    temperature:units="K";
  double time(time);
    time:climatology="climatology_bounds";
    time:units="days since 1961-1-1";
  double climatology_bounds(time,nv);
data: // time coordinates translated to datetime format
  time="1961-4-1 0:30", "1961-4-1 1:30", ..., "1961-4-1 23:30" ;
  climatology_bounds="1961-4-1 0:00", "1990-4-30 1:00",
                     "1961-4-1 1:00", "1990-4-30 2:00",
                     ...
                     "1961-4-1 23:00", "1990-5-1 0:00" ;
The maximum of the daily precipitation amounts is given for each of the three months June, July and August 2000. The first daily total applies to the period from 6 a.m. on 1 June to 6 a.m. on 2 June, and the 30th to the period from 6 a.m. on 30 June to 6 a.m. on 1 July. The maximum of these 30 values is stored under time index 0 in the precipitation array.
dimensions:
  time=3;
  nv=2;
variables:
  float precipitation(time,lat,lon);
    precipitation:long_name="Accumulated precipitation";
    precipitation:cell_methods="time: sum within days time: maximum over days";
    precipitation:units="kg";
  double time(time);
    time:climatology="climatology_bounds";
    time:units="days since 2000-6-1";
  double climatology_bounds(time,nv);
data: // time coordinates translated to datetime format
  time="2000-6-16", "2000-7-16", "2000-8-16" ;
  climatology_bounds="2000-6-1 6:00:00", "2000-7-1 6:00:00",
                     "2000-7-1 6:00:00", "2000-8-1 6:00:00",
                     "2000-8-1 6:00:00", "2000-9-1 6:00:00" ;
7.5. Geometries
For many geospatial applications, data values are associated with a geometry, which is a spatial representation of a real-world feature, for instance a time-series of areal average precipitation over a watershed. Polygonal cells with an arbitrary number of vertices can be described using Section 7.1, "Cell Boundaries", but in that case every cell must have the same number of vertices and must be a single polygon ring. In contrast, each geometry may have a different number of nodes, the geometries may be lines (as alternatives to points and polygons), and they may be multipart, i.e., include several disjoint parts. While line and point geometries don’t describe an interval along a dimension as the traditional cell bounds described above do, they do describe the extent of a geometry or real-world feature so are included in this section. The approach described here specifies how to encode such geometries following the pattern in 9.3.3 Contiguous ragged array representation and attach them to variables in a way that is consistent with the cell bounds approach.
All geometries are made up of one or more nodes. The geometry type specifies the set of topological assumptions to be applied to relate the nodes (see Table 7.1). For example, multipoint and line geometries are nearly the same except nodes are interpreted as being connected for lines. Lines and polygons are also nearly the same except that the first and last nodes are assumed to be connected for polygons. Note that CF does not require the first and last node to be identical but allows them to be coincident if desired. Polygons that have holes, such as waterbodies in a land unit, are encoded as a collection of polygon ring parts, each identified as exterior or interior polygons. Multipart geometries, such as multiple lines representing the same river or multiple islands representing the same jurisdiction, are encoded as collections of unconnected points, lines, or polygons that are logically grouped into a single geometry.
Any data variable can be given a geometry attribute that indicates the geometry for the quantity held in the variable.
One of the dimensions of the data variable must be the number of geometries to which the data applies.
As shown in Example 7.15, if the data variable has a discrete sampling geometry, the number of geometries is the length of the instance dimension (Section 9.2).
dimensions:
  instance = 2 ;
  node = 5 ;
  time = 4 ;
variables:
  int time(time) ;
    time:units = "days since 2000-01-01" ;
  double lat(instance) ;
    lat:units = "degrees_north" ;
    lat:standard_name = "latitude" ;
    lat:nodes = "y" ;
  double lon(instance) ;
    lon:units = "degrees_east" ;
    lon:standard_name = "longitude" ;
    lon:nodes = "x" ;
  int datum ;
    datum:grid_mapping_name = "latitude_longitude" ;
    datum:longitude_of_prime_meridian = 0.0 ;
    datum:semi_major_axis = 6378137.0 ;
    datum:inverse_flattening = 298.257223563 ;
  int geometry_container ;
    geometry_container:geometry_type = "line" ;
    geometry_container:node_count = "node_count" ;
    geometry_container:node_coordinates = "x y" ;
  int node_count(instance) ;
  double x(node) ;
    x:units = "degrees_east" ;
    x:standard_name = "longitude" ;
    x:axis = "X" ;
  double y(node) ;
    y:units = "degrees_north" ;
    y:standard_name = "latitude" ;
    y:axis = "Y" ;
  double someData(instance, time) ;
    someData:coordinates = "time lat lon" ;
    someData:grid_mapping = "datum" ;
    someData:geometry = "geometry_container" ;
  // global attributes:
    :featureType = "timeSeries" ;
data:
  time = 1, 2, 3, 4 ;
  lat = 30, 50 ;
  lon = 10, 60 ;
  someData =
    1, 2, 3, 4,
    1, 2, 3, 4 ;
  node_count = 3, 2 ;
  x = 30, 10, 40, 50, 50 ;
  y = 10, 30, 40, 60, 50 ;
The time series variable, someData, is associated with line geometries via the geometry attribute. The first line geometry is comprised of three nodes, while the second has two nodes. Client applications unaware of CF geometries can fall back to the lat and lon variables to locate feature instances in space. In this example, lat and lon coordinates are identical to the first node in each line geometry, though any representative point could be used.
A geometry container variable acts as a container for attributes that describe a set of geometries.
The geometry attribute of the data variable contains the name of a geometry container variable.
The geometry container variable must hold geometry_type and node_coordinates attributes.
The grid_mapping and coordinates attributes can be carried by the geometry container variable provided they are also carried by the data variables associated with the container.
The geometry_type attribute indicates the type of geometry present.
Its allowable values are: point, line, polygon.
Multipart geometries are allowed for all three geometry types.
For example, polygon geometries could include single part geometries like the State of Colorado and multipart geometries like the State of Hawaii.
The node_coordinates attribute contains the blank-separated names of the variables that contain geometry node coordinates (one variable for each spatial dimension).
The geometry node coordinate variables must each have an axis attribute whose allowable values are X, Y, and Z.
If a coordinates attribute is carried by the geometry container variable or its parent data variable, then those coordinate variables that have a meaningful correspondence with node coordinates are indicated as such by a nodes attribute that names the corresponding node coordinates, but only if the grid_mapping associated with the geometry node variables is the same as that of the coordinate variables.
If a different grid mapping is used, then the provided coordinates must not have the nodes attribute.
Whether linked to normal CF space-time coordinates with a nodes attribute or not, inclusion of such coordinates is recommended to maintain backward compatibility with software that has not implemented geometry capabilities.
The geometry node coordinate variables must all have the same single dimension, which is the total number of nodes in all the geometries. The nodes must be stored consecutively for each geometry and in the order of the geometries, and within each multipart geometry the nodes must be stored consecutively for each part and in the order of the parts. Polygon exterior rings must be stored before any interior rings they may contain. Nodes for polygon exterior rings must be ordered using the right-hand rule, e.g., anticlockwise in the lon-lat plane as viewed from above. Polygon interior rings must be in clockwise order. They are put in opposite orders to facilitate calculation of area and consistency with the typical implementation pattern.
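The ring ordering rule can be checked with a signed area calculation. The following Python sketch uses the shoelace formula and treats the node coordinates as planar (x, y) values, which is an assumption made only for illustration; it is not a procedure defined by this convention. The function name signed_area is hypothetical, and the sample rings are the first two parts of the polygon example given later in this section.

```python
def signed_area(xs, ys):
    """Planar signed area of a ring by the shoelace formula.

    The result is positive for anticlockwise rings and negative for
    clockwise rings. The last node is connected back to the first, so the
    ring may be given with or without a repeated closing node.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around to close the ring
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return total / 2.0


# Exterior ring and interior ring (hole) of the first polygon.
exterior = signed_area([20, 10, 0], [0, 15, 0])
interior = signed_area([5, 10, 15], [5, 10, 5])
print(exterior, interior)  # 150.0 -25.0
```

Because the two ring types have opposite signs, summing the signed areas of all rings of a polygon gives its area with the holes already subtracted.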
When more than one geometry instance is present, the geometry container variable must have a node_count attribute that contains the name of a variable indicating the count of nodes per geometry.
The node count is the total number of nodes in all the parts.
The exception is when all geometries are single part point geometries, in which case a node count is not needed since each geometry contains a single node.
However in that case, the dimension of the node coordinate variables must be one of the dimensions of the data variable (because it serves also as the instance dimension for geometries).
For multipart lines, multipart polygons, and polygons with holes, the geometry container variable must have a part_node_count attribute that indicates a variable of the count of nodes per geometry part.
Note that because multipoint geometries always have a single node per part, the part_node_count is not required for point geometry types.
The single dimension of the part node count variable must equal the total number of parts in all the geometries.
For polygon geometries with holes, the geometry container variable must have an interior_ring attribute that contains the name of a variable that indicates if the polygon parts are interior rings (i.e., holes) or not.
This interior ring variable must contain the value 0 to indicate an exterior ring polygon and 1 to indicate an interior ring polygon.
The single dimension of the interior ring variable must be the same dimension as that of the part node count variable.
The geometry types included in this convention are listed in Table 7.1.
geometry_type | Dimensionality | Description of Geometry Instance | Additional required attributes on geometry container variable |
---|---|---|---|
point | 0 | A collection of one or more points, where a point is a single location in space | node_count (if multipart geometries are present) |
line | 1 | A collection of one or more lines, where a line is an ordered set of data points connected by linearly interpolating between points | node_count, part_node_count (if multipart geometries are present) |
polygon | 2 | A collection of one or more polygons, where a polygon is a planar surface comprised of an exterior ring and zero or more interior rings (i.e., holes), where a ring is a closed line (i.e., the last point in the line is assumed to be connected to the first point) | node_count, part_node_count (if holes or multipart geometries are present), interior_ring (if holes are present) |
This example demonstrates all potential attributes and variables for encoding geometries.
dimensions:
  node = 12 ;
  instance = 2 ;
  part = 4 ;
  time = 4 ;
variables:
  int time(time) ;
    time:units = "days since 2000-01-01" ;
  double x(node) ;
    x:units = "degrees_east" ;
    x:standard_name = "longitude" ;
    x:axis = "X" ;
  double y(node) ;
    y:units = "degrees_north" ;
    y:standard_name = "latitude" ;
    y:axis = "Y" ;
  double lat(instance) ;
    lat:units = "degrees_north" ;
    lat:standard_name = "latitude" ;
    lat:nodes = "y" ;
  double lon(instance) ;
    lon:units = "degrees_east" ;
    lon:standard_name = "longitude" ;
    lon:nodes = "x" ;
  float geometry_container ;
    geometry_container:geometry_type = "polygon" ;
    geometry_container:node_count = "node_count" ;
    geometry_container:node_coordinates = "x y" ;
    geometry_container:grid_mapping = "datum" ;
    geometry_container:coordinates = "lat lon" ;
    geometry_container:part_node_count = "part_node_count" ;
    geometry_container:interior_ring = "interior_ring" ;
  int node_count(instance) ;
  int part_node_count(part) ;
  int interior_ring(part) ;
  float datum ;
    datum:grid_mapping_name = "latitude_longitude" ;
    datum:semi_major_axis = 6378137. ;
    datum:inverse_flattening = 298.257223563 ;
    datum:longitude_of_prime_meridian = 0. ;
  double someData(instance, time) ;
    someData:coordinates = "time lat lon" ;
    someData:grid_mapping = "datum" ;
    someData:geometry = "geometry_container" ;
  // global attributes:
    :featureType = "timeSeries" ;
data:
  time = 1, 2, 3, 4 ;
  x = 20, 10, 0, 5, 10, 15, 20, 10, 0, 50, 40, 30 ;
  y = 0, 15, 0, 5, 10, 5, 20, 35, 20, 0, 15, 0 ;
  lat = 25, 7 ;
  lon = 10, 40 ;
  node_count = 9, 3 ;
  part_node_count = 3, 3, 3, 3 ;
  interior_ring = 0, 1, 0, 0 ;
  someData =
    1, 2, 3, 4,
    1, 2, 3, 4 ;
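To show how a reader might use node_count, part_node_count and interior_ring together, the following Python sketch regroups the flat node arrays of the example above into geometries, parts and nodes. It is a minimal illustration under the storage rules of this section, not a reference implementation; the function name split_geometries and the dictionary layout of the result are hypothetical.

```python
def split_geometries(x, y, node_count, part_node_count, interior_ring):
    """Group flat node arrays into a list of geometries.

    Each geometry is a list of parts, and each part is a dict holding its
    (x, y) nodes and a flag saying whether it is an interior ring.
    """
    geometries = []
    node = 0  # index of the next unread node
    part = 0  # index of the next unread part
    for n_nodes in node_count:
        parts = []
        remaining = n_nodes
        while remaining > 0:
            n = part_node_count[part]
            nodes = list(zip(x[node:node + n], y[node:node + n]))
            parts.append({"interior": bool(interior_ring[part]), "nodes": nodes})
            node += n
            part += 1
            remaining -= n
        if remaining != 0:
            raise ValueError("part_node_count is inconsistent with node_count")
        geometries.append(parts)
    return geometries


# Values copied from the data section of the example.
x = [20, 10, 0, 5, 10, 15, 20, 10, 0, 50, 40, 30]
y = [0, 15, 0, 5, 10, 5, 20, 35, 20, 0, 15, 0]
geoms = split_geometries(x, y, [9, 3], [3, 3, 3, 3], [0, 1, 0, 0])

for i, parts in enumerate(geoms):
    kinds = ["interior" if p["interior"] else "exterior" for p in parts]
    print(i, kinds)
# 0 ['exterior', 'interior', 'exterior']
# 1 ['exterior']
```

The first geometry is therefore a multipart polygon: a triangle with a hole, plus a second disjoint triangle. The second geometry is a single triangle.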
8. Reduction of Dataset Size
There are three methods for reducing dataset size: packing, lossless compression, and lossy compression. By packing we mean altering the data in a way that reduces its precision (but has no other effect on accuracy). By lossless compression we mean techniques that store the data more efficiently and result in no loss of precision or accuracy. By lossy compression we mean techniques that either store the data more efficiently and retain its precision but result in some loss in accuracy, or techniques that intentionally reduce data precision to improve the efficiency of subsequent lossless compression.
Lossless compression only works in certain circumstances, e.g., when a variable contains a significant amount of missing or repeated data values.
In this case it is possible to make use of standard utilities, e.g., UNIX compress or GNU gzip, to compress the entire file after it has been written.
In this section we offer an alternative compression method that is applied on a variable by variable basis.
This has the advantage that only one variable need be uncompressed at a given time.
The disadvantage is that generic utilities that don’t recognize the CF conventions will not be able to operate on compressed variables.
8.1. Packed Data
At the current time the netCDF interface does not provide for packing data.
However a simple packing may be achieved through the use of the optional [NUG] defined attributes scale_factor and add_offset.
After the data values of a variable have been read, they are to be multiplied by the scale_factor, and have add_offset added to them.
If both attributes are present, the data are scaled before the offset is added.
When scaled data are written, the application should first subtract the offset and then divide by the scale factor.
The units of a variable should be representative of the unpacked data.
This standard is more restrictive than the [NUG] with respect to the use of the scale_factor and add_offset attributes; ambiguities and precision problems related to data type conversions are resolved by these restrictions.
When packed data is written, the scale_factor and add_offset attributes must be of the same type as the unpacked data, which must be either float or double. Data of type float must be packed into one of these types: byte, unsigned byte, short, unsigned short. Data of type double must be packed into one of these types: byte, unsigned byte, short, unsigned short, int, unsigned int.
When packed data is read, it should be unpacked to the type of the scale_factor and add_offset attributes, which must have the same type if both are present. For guidance only, we suggest that packed data which does not conform to the rules of this section regarding the types of the data variable and attributes should be unpacked to double type, in order to minimise the risk of loss of precision.
When data to be packed contains missing values the attributes that indicate missing values (_FillValue, valid_min, valid_max, valid_range) must be of the same data type as the packed data.
See Section 2.5.1, "Missing data, valid and actual range of data" for a discussion of how applications should treat variables that have attributes indicating both missing values and transformations defined by a scale and/or offset.
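The packing arithmetic described in this section can be sketched in a few lines of Python. The order of operations (scale, then add the offset on reading; subtract the offset, then divide on writing) follows the text above. Everything else is an assumption made for illustration: the target type is short, the value -32768 is reserved as _FillValue, the way scale_factor and add_offset are chosen from a known data range is one common practice rather than a requirement, and the function names are hypothetical. Python floats stand in for the double type of the unpacked data and of the two attributes.

```python
FILL = -32768  # assumed _FillValue, of the same type (short) as the packed data


def packing_params(vmin, vmax):
    """Choose scale_factor and add_offset so that [vmin, vmax] maps onto
    the shorts -32767..32767, leaving -32768 free for FILL.
    Assumes vmax > vmin."""
    scale_factor = (vmax - vmin) / (2 * 32767)
    add_offset = (vmax + vmin) / 2
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Writing: subtract the offset, then divide by the scale factor."""
    if value is None:  # None marks a missing value in this sketch
        return FILL
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    """Reading: multiply by the scale factor, then add the offset."""
    if packed == FILL:
        return None
    return packed * scale_factor + add_offset


scale_factor, add_offset = packing_params(250.0, 320.0)  # hypothetical range in K
stored = pack(300.123, scale_factor, add_offset)
restored = unpack(stored, scale_factor, add_offset)
print(abs(restored - 300.123) <= scale_factor / 2 + 1e-9)  # True
```

The round trip changes the value by at most half of scale_factor, which is the reduction in precision that packing implies.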
8.2. Lossless Compression by Gathering
To save space in the netCDF file, it may be desirable to eliminate points from data arrays that are invariably missing. Such a compression can operate over one or more adjacent axes, and is accomplished with reference to a list of the points to be stored. The list is constructed by considering a mask array that only includes the axes to be compressed, and then mapping this array onto one dimension without reordering. The list is the set of indices in this one-dimensional mask of the required points. In the compressed array, the axes to be compressed are all replaced by a single axis, whose dimension is the number of wanted points. The wanted points appear along this dimension in the same order they appear in the uncompressed array, with the unwanted points skipped over. Compression and uncompression are executed by looping over the list.
The list is stored as the coordinate variable for the compressed axis of the data variable.
Thus, the list variable and its dimension have the same name.
If any auxiliary coordinate variable has all the dimensions to be compressed, adjacent and in the same order as in the data variable, and if the auxiliary coordinate variable has missing data at all the points which are to be eliminated from the data variable, then the affected dimensions can optionally be replaced by the list dimension for the auxiliary coordinate variable just as for the data variable.
The list variable has a string attribute compress, containing a blank-separated list of the dimensions which were affected by the compression in the order of the CDL declaration of the uncompressed array.
The presence of this attribute identifies the list variable as such.
The list, the original dimensions and coordinate variables (including boundary variables), and the compressed variables with all the attributes of the uncompressed variables are written to the netCDF file.
The uncompressed variables can be reconstituted exactly as they were using this information.
The list variable must not have an associated boundary variable.
We eliminate sea points at all depths in a longitude-latitude-depth array of soil temperatures.
In this case, only the longitude and latitude axes would be affected by the compression.
We construct a list landpoint(landpoint) containing the indices of land points.
dimensions:
  lat=73;
  lon=96;
  landpoint=2381;
  depth=4;
variables:
  int landpoint(landpoint);
    landpoint:compress="lat lon";
  float landsoilt(depth,landpoint);
    landsoilt:long_name="soil temperature";
    landsoilt:units="K";
  float depth(depth);
  float lat(lat);
  float lon(lon);
data:
  landpoint=363, 364, 365, ...;
Since landpoint(0)=363, for instance, we know that landsoilt(*,0) maps on to point 363 of the original data with dimensions (lat,lon).
This corresponds to indices (3,75), i.e., 363 = 3*96 + 75.
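The index arithmetic and the looping over the list can be sketched in Python as follows. This is a minimal illustration that works on flattened lists in row-major (lat, lon) order; the function names gather and scatter, the use of None for eliminated points, and the small 2 x 3 field are all hypothetical.

```python
def gather(values, mask):
    """Return (index list, compressed values) for a flattened field.

    values and mask are flat sequences in row-major (lat, lon) order, and
    mask[i] is True where the point is wanted.
    """
    index_list = [i for i, wanted in enumerate(mask) if wanted]
    return index_list, [values[i] for i in index_list]


def scatter(index_list, compressed, size, missing=None):
    """Rebuild the flattened uncompressed field, filling skipped points."""
    values = [missing] * size
    for i, v in zip(index_list, compressed):
        values[i] = v
    return values


# Element 363 of the flattened (lat, lon) = (73, 96) array from the example.
nlon = 96
lat_index, lon_index = divmod(363, nlon)
print(lat_index, lon_index)  # 3 75

# Hypothetical 2 x 3 field in which None marks sea points.
field = [281.0, None, 279.5, None, None, 283.2]
landpoint, landsoilt = gather(field, [v is not None for v in field])
print(landpoint)  # [0, 2, 5]
print(scatter(landpoint, landsoilt, len(field)) == field)  # True
```

When more than two dimensions are compressed, the same flat index is split by repeated division, taking the dimensions in the order listed in the compress attribute with the last one varying fastest.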
We compress a longitude-latitude-depth field of ocean salinity by eliminating points below the sea-floor. In this case, all three dimensions are affected by the compression, since there are successively fewer active ocean points at increasing depths.
variables:
  float salinity(time,oceanpoint);
  int oceanpoint(oceanpoint);
    oceanpoint:compress="depth lat lon";
  float depth(depth);
  float lat(lat);
  float lon(lon);
  double time(time);
This information implies that the salinity field should be uncompressed to an array with dimensions (depth,lat,lon).
In the example "A single timeseries with time-varying deviations from a nominal point spatial location", two auxiliary coordinate variables are compressed as described in this section, although their data variable is not.
8.3. Lossy Compression by Coordinate Subsampling
For some applications the coordinates of a data variable can require considerably more storage than the data itself. Space may be saved in the netCDF file by storing a subsample of the coordinates that describe the data. The uncompressed coordinate and auxiliary coordinate variables can be reconstituted by interpolation, from the subsampled coordinate values to the domain of the data (i.e. the target domain). This process will likely result in a loss in accuracy (as opposed to precision) in the uncompressed variables, due to rounding and approximation errors in the interpolation calculations, but it is assumed that these errors will be small enough to not be of concern to users of the uncompressed dataset. The creator of the compressed dataset can control the accuracy of the reconstituted coordinates through the degree of subsampling and the choice of interpolation method, see Appendix J, Coordinate Interpolation Methods.
The subsampled coordinates are called tie points and are stored in tie point coordinate variables.
In addition to the tie point coordinate variables themselves, metadata defining the coordinate interpolation method is stored in attributes of the data variable and of the associated interpolation variable. The partitioning of metadata between the data variable and the interpolation variable has been designed to minimise redundancy and maximise the reusability of the interpolation variable within a dataset.
The metadata that define the interpolation formula and its inputs are complete, so that the results of the coordinate reconstitution process are well defined and of a predictable accuracy.
8.3.1. Tie Points and Interpolation Subareas
Reconstitution of the uncompressed coordinate and auxiliary coordinate variables is based on interpolation. To accomplish this, the target domain is segmented into smaller interpolation subareas, for each of which the interpolation method is applied independently. For one-dimensional interpolation, an interpolation subarea is defined by two tie points, one at each end of the interpolation subarea; for two-dimensional interpolation, an interpolation subarea is defined by four tie points, one at each corner of a rectangular area aligned with the domain axes; etc. For the reconstitution of the uncompressed coordinate and auxiliary coordinate variables within an interpolation subarea, the interpolation method is permitted to access its defining tie points, and no others.
As an interpolation method relies on the regularity and continuity of the coordinate values within each interpolation subarea, special attention must be given to the case when uncompressed coordinates contain discontinuities. A discontinuity could be an overlap or a gap in the coordinates' coverage, or a change in cell size or cell alignment. As an example, such discontinuities are common in remote sensing data and may be caused by combinations of the instrument scan motion, the motion of the sensor platform and changes in the instrument scan mode. When discontinuities are present, the domain is first divided into multiple continuous areas, each of which is free of discontinuities. When no discontinuities are present, the whole domain is a single continuous area. Following this step, each continuous area is segmented into interpolation subareas. The process of generating interpolation subareas, both for a domain without discontinuities and for a domain with discontinuities, is illustrated in Figure 8.1 and described in more detail in Appendix J, Coordinate Interpolation Methods.
For each interpolated dimension, i.e. a target domain dimension for which coordinate interpolation is required, the locations of the tie point coordinates are defined by a corresponding tie point index variable, which also indicates the locations of the continuous areas (Section 8.3.7, "Tie Point Index Mapping").
The interpolation subareas within a continuous area do not overlap, ensuring that each coordinate of an interpolated dimension is computed from a unique interpolation subarea.
These interpolation subareas, however, share the tie point coordinates that define their common boundaries.
Such a shared tie point coordinate can only be located in one of a pair of adjacent interpolation subareas, which is always the first of the pair in index space.
For instance, in Figure 8.1, the interpolation subarea labelled (0,0) contains all four of its tie point coordinates, and the interpolation subarea (0,1) only contains two of them.
When applied for a given interpolation subarea, interpolation methods (such as those described in Appendix J, Coordinate Interpolation Methods) must ensure that reconstituted coordinate points are only generated inside the interpolation subarea being processed, even if some of the tie point coordinates lie outside of that interpolation subarea.
Adjacent interpolation subareas that are in different continuous areas never share tie point coordinates, as a consequence of the grid discontinuity between them. This results in a different number of tie point coordinates in the two cases shown in Figure 8.1.
For each interpolated dimension, the number of interpolation subareas is equal to the number of tie points minus the number of continuous areas.
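The counting rule above, and the idea that each interpolation subarea is reconstituted from its own tie points only, can be illustrated for a single interpolated dimension with the following Python sketch. It rests on several assumptions that are not established in this subsection: that a boundary between continuous areas appears as a pair of adjacent tie point index values differing by exactly one (Section 8.3.7 defines the actual encoding), that the first tie point index is 0, and that plain linear interpolation is used in place of the methods defined in Appendix J. The function names and the sample values are hypothetical.

```python
def count_continuous_areas(tie_point_indices):
    """Count continuous areas, assuming that adjacent indices differing by
    exactly one mark a boundary between two areas (an assumption here)."""
    pairs = zip(tie_point_indices, tie_point_indices[1:])
    return 1 + sum(1 for a, b in pairs if b - a == 1)


def interpolate_1d(tie_point_indices, tie_point_values):
    """Linearly reconstitute one coordinate value per target index."""
    out = [None] * (tie_point_indices[-1] + 1)
    out[tie_point_indices[0]] = tie_point_values[0]
    points = list(zip(tie_point_indices, tie_point_values))
    for (ia, va), (ib, vb) in zip(points, points[1:]):
        # Only the two tie points bounding this stretch are used.
        for i in range(ia + 1, ib):
            s = (i - ia) / (ib - ia)
            out[i] = va + s * (vb - va)
        out[ib] = vb  # tie points are stored values, copied unchanged
    return out


indices = [0, 4, 8, 9, 13]            # hypothetical tie point index values
values = [0.0, 4.0, 8.0, 20.0, 24.0]  # hypothetical tie point coordinates

n_areas = count_continuous_areas(indices)
n_subareas = len(indices) - n_areas
print(n_areas, n_subareas)  # 2 3

coords = interpolate_1d(indices, values)
print(coords[7:11])  # [7.0, 8.0, 20.0, 21.0]
```

Here five tie points in two continuous areas give three interpolation subareas, and the jump from 8.0 to 20.0 between target indices 8 and 9 is the discontinuity, across which no interpolation takes place.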
Tie point coordinate variables for both coordinate and auxiliary coordinate variables must be defined as numeric data types and are not allowed to have missing values.