Dear Alejandro,
The two CMIP variables I'm talking about are cfadDbze94, currently defined as "CFAD (Cloud Frequency Altitude Diagrams) are joint height - radar reflectivity (or lidar scattering ratio) distributions.", and cfadLidarsr532, which has the same definition. If they are not joint distributions, we clearly have a problem with these definitions.
From your reply I understand now that these are univariate distributions giving the frequency of different radar reflectivities in different height bands. Coming from radar/lidar instruments (or an emulator of these instruments), there are multiple observations in each GCM-scale height band. Presumably, there are also multiple profiles in the GCM-scale grid square, so that we have a frequency distribution over sub-grid scale variability in the vertical and the horizontal? Or is it actually evaluated at a spatial point?
If this is the case, you are right and we just need to correct the definitions in the CMIP tables (though there is still a case for introducing a frequency_distribution standard name for other variables, but that should be another thread). I would favour a slightly more verbose and explicit definition, e.g.
"CFAD (Cloud Frequency Altitude Diagrams) are frequency distributions of radar reflectivity (or lidar scattering ratio) as a function of altitude. cfadDbze94 is defined as the simulated relative frequency of radar reflectivity in sampling volumes defined by altitude bins and model grid cells."
Note that I'm using "altitude" rather than "height" to match the standard names: in the CF Convention, "altitude" means height above the geoid, and "height" means height above the surface.
Is that an accurate definition?
regards,
Martin
Dear Martin,
Thanks for your detailed explanation. I'd like to add a bit more information. These variables are not joint distributions; they are 1D distributions for different ranges of Z. The question is, does "histogram_of_X[_over_Z]" mean that the Z coordinate has to be completely collapsed? It is not clear to me that the current definition implies that. If Z is not completely collapsed, you can then end up with a function of the form frequency(lat,lon,X,Z2), where the coordinate Z is only partially collapsed into bins described by Z2. I'm using Z2 here to show explicitly when the Z coordinate represents bins. This would look like a joint histogram, but it is not. I think that your proposal of dropping "_over_Z" from the standard name works for a joint distribution, but not for a collection of 1D distributions along Z, unless there is a way of distinguishing between the two cases with the use of attributes.
Another detail is that these histograms provide relative frequencies (values between 0 and 1, not counts), not absolute frequencies. Is that inconsistent with the current definition of histogram in CF?
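To make the distinction concrete, here is a minimal NumPy sketch. It assumes synthetic sub-grid samples in a single grid cell, and the bin edges and variable names are illustrative only (they are not the CMIP or COSP definitions). It contrasts (a) a histogram of counts with Z fully collapsed, (b) a CFAD-like collection of 1D relative-frequency distributions, one per altitude bin Z2, and (c) a true joint distribution normalised over X and Z2 together.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sub-grid samples inside ONE model grid cell (assumed values).
n_samples = 10_000
altitude = rng.uniform(0.0, 12_000.0, n_samples)                   # m
dbze = np.clip(rng.normal(-10.0, 12.0, n_samples), -50.0, 30.0)    # dBZ

# Illustrative bin edges, not the CMIP/COSP ones.
z_edges = np.linspace(0.0, 12_000.0, 5)   # 4 altitude bins ("Z2")
x_edges = np.linspace(-50.0, 30.0, 9)     # 8 reflectivity bins ("X")

# Counts per (altitude bin, reflectivity bin); shape (4, 8).
counts, _, _ = np.histogram2d(altitude, dbze, bins=[z_edges, x_edges])

# (a) Z completely collapsed: a single histogram of counts, frequency(X).
collapsed_counts = counts.sum(axis=0)

# (b) CFAD-like: each altitude bin is normalised separately, so every
#     row is its own 1D relative-frequency distribution, frequency(X, Z2).
cfad = counts / counts.sum(axis=1, keepdims=True)

# (c) Joint distribution of X and Z2: normalised over both dimensions.
joint = counts / counts.sum()
```

Arrays (b) and (c) have the same shape, which is why (b) can look like a joint histogram; they differ only in normalisation. Each row of `cfad` sums to 1, whereas `joint` sums to 1 over the whole array.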
Regards,
Alejandro
> -----Original Message-----
> From: martin.juckes at stfc.ac.uk [mailto:martin.juckes at stfc.ac.uk]
> Sent: 12 October 2016 19:05
> To: cf-metadata at cgd.ucar.edu
> Cc: Bodas-Salcedo, Alejandro
> Subject: Usage of histogram_of_X_over_Z
>
> Hello,
>
> There are two standard names of the form histogram_of_..... in the CF Standard
> Name list (at version 36):
> histogram_of_backscattering_ratio_over_height_above_reference_ellipsoid and
> histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid
> . Both of these were used in CMIP5 and are set to be used in CMIP6, but the usage
> does not appear to match the standard name descriptions.
>
> The possible confusion is over the role of different coordinates. The CF definitions
> say '"histogram_of_X[_over_Z]" means histogram (i.e. number of counts for each
> range of X) of variations (over Z) of X.' This implies to me that you start with a
> function of Z and possibly other coordinates and end up with a function of X and the
> other coordinates. E.g. if the source data is X(lat,lon,Z), then the histogram data will
> be of the form frequency(lat,lon,X).
>
> In the two CMIP5/CMIP6 draft variables (cfadLidarsr532, cfadDbze94) using these
> standard names the "Z" coordinate which is included in the standard name
> ("height_above_reference_ellipsoid") is one of the coordinates of the histogram data
> variable. Both these variables appear to be joint distributions (frequency of X and Y
> values) over sub-grid variability as a function of latitude, longitude and time.
>
> I've been reviewing these existing definitions in some detail because there are some
> new distribution variables in the request and I'd like to make sure that we have a
> consistent approach.
>
> If we need to describe a variable which carries a joint distribution of X and Y, then
> the variable will have to use X and Y as coordinates, so perhaps we can simplify the
> process by leaving them out of the standard name. Similarly the "over_Z" part of the
> name would be better expressed as a cell_methods construct. This line of reasoning
> suggests using a new standard name such as "frequency_distribution" (units "1").
> The only difficulty is that the frequency distribution might be a function of the
> quantities X and Y (scattering ratio and cloud top height for cfadLidarsr532) and also
> of latitude, longitude and time. There should be some way of distinguishing the
> different roles of these 5 coordinates: it is the distribution of X and Y as a function of
> latitude, longitude and time. I think this could be done conveniently by introducing a
> single new attribute, e.g. "bin_coords: X Y".
>
> "frequency_distribution" could be used for single or joint distributions.
>
> My questions to the list are:
> (1) am I missing something in my interpretation of the existing histogram_of_...
> names?
> (2) if not, is the adoption of a "frequency_distribution" standard name an appropriate
> way forward?
>
> regards,
> Martin
Received on Thu Oct 13 2016 - 06:04:57 BST