Opened 13 years ago

Closed 10 years ago

#33 closed enhancement (fixed)

cell_methods for statistical indices

Reported by: Heinke
Owned by: jonathan
Priority: medium
Milestone:
Component: cf-conventions
Version:
Keywords: count
Cc:

Description

  1. Title

New cell_method 'count over days'

  2. Moderator

Jonathan Gregory

  3. Proposing

New entry for table E.1. Cell Methods:

name='count over days' Units=1 description='the number of occurrences of a condition specified in the standard name'

The units are integer.

  4. Use Cases

For statistical indices we would like to add the following new standard_names:

number_of_days_with_air_temperature_below_threshold

examples:

frost_days min<0 degC = fd (IPCC AR4 and AR5)

ice days max<0 degC

frost days where no snow

number_of_days_with_air_temperature_above_threshold

examples:

summer days max>25 degC

tropical nights min>20 degC

number_of_days_with_lwe_thickness_of_precipitation_amount_above_threshold

examples:

heavy precipitation >10 mm = r10 (IPCC AR4 and AR5)

very heavy precipitation >20 mm

wet days >1 mm

number_of_days_with_wind_speed_above_threshold

examples:

strong breeze days max > 10.5 m/s

strong gale days max > 20.5 m/s

hurricane days max > 32.5 m/s

To specify the standard_name the CDLs should have the following form:

// number of days with daily minimum below 0 degC = frost_days = fd (IPCC AR4 and AR5)
  float n1(lat,lon);
    n1:standard_name="number_of_days_with_temperature_below_threshold";
    n1:coordinates="threshold time";
    n1:cell_methods="time: count over days";
  float threshold;
    threshold:standard_name="air_temperature";
    threshold:units="degC";
    threshold:cell_methods="time: minimum within days";
data:
  threshold=0.;

Best regards Heinke

Change History (10)

comment:1 in reply to: ↑ description Changed 13 years ago by jonathan

Dear Heinke

Thanks for making this proposal, and sorry for the slow response. I either didn't receive the email or (more likely) accidentally deleted it, and I have only just noticed the ticket.

I would suggest that the addition proposed to Table E.1 is actually just count (as indicated by the definition you have given). The over days part is already allowed by the syntax for climatological time.

I think both the within days and over days belong in the cell_methods for the variable. This too is the existing syntax for climatological time. So I think the example should read as below, and I propose it should be added to Section 7.4:

Example 7.11. Number of frost days during NH winter 2007-2008. A "frost day" is defined as one during which the minimum temperature falls below freezing point (0 degC). This is described as a climatological statistic, in which the minimum temperature is first calculated within each day, and then the number of days meeting the specified condition is counted. In this operation, the standard name is also changed; the original data are air_temperature.

variables:
  float n1(lat,lon);
    n1:standard_name="number_of_days_with_air_temperature_below_threshold";
    n1:coordinates="threshold time";
    n1:cell_methods="time: minimum within days time: count over days";
  float threshold;
    threshold:standard_name="air_temperature";
    threshold:units="degC";
  double time;
    time:climatology="climatology_bounds";
    time:units="days since 2000-6-1";
  double climatology_bounds(time,nv);
data: // time coordinates translated to date/time format
  time="2008-1-16 6:00";
  climatology_bounds="2007-12-1 6:00", "2000-8-2 6:00";
  threshold=0.; 
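
As an illustration only, the two-step operation described by this cell_methods string (minimum within days, then a count over days) could be sketched in Python as follows. The function name `count_frost_days` and the sample data are hypothetical and not part of the proposal; the sketch assumes sub-daily air temperatures in degC at a single grid point.

```python
from datetime import datetime


def count_frost_days(samples, threshold_degc=0.0):
    """Count days whose minimum temperature is below the threshold.

    samples: iterable of (datetime, temperature in degC) pairs.
    """
    # "time: minimum within days": reduce each calendar day to its minimum.
    daily_min = {}
    for when, temp in samples:
        day = when.date()
        if day not in daily_min or temp < daily_min[day]:
            daily_min[day] = temp
    # "time: count over days": count the days that meet the condition.
    return sum(1 for t in daily_min.values() if t < threshold_degc)


# Hypothetical sub-daily temperatures for three days.
samples = [
    (datetime(2007, 12, 1, 0), 1.5),
    (datetime(2007, 12, 1, 12), 4.0),
    (datetime(2007, 12, 2, 0), -2.0),
    (datetime(2007, 12, 2, 12), 3.0),
    (datetime(2007, 12, 3, 0), -0.5),
    (datetime(2007, 12, 3, 12), -0.1),
]
frost_days = count_frost_days(samples)
```

The threshold is passed as a separate argument, mirroring the way the CDL keeps the threshold value in a scalar coordinate variable rather than in the standard name.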

The required standard names will also need to be proposed. Would you like to include an initial list in this ticket?

As moderator, I invite comments on this proposal. You will probably have noticed the discussion Heinke, Alison and I have had on the subject of statistical indices on the CF email list over the last many months - years, even.

Best wishes

Jonathan

comment:2 Changed 13 years ago by Heinke

Dear Jonathan,

I didn't receive your email either, and I have only just noticed your response to the ticket. Is there any option to enable email notifications?

> I would suggest that the addition proposed to Table E.1 is actually just count (as indicated by the definition you have given). The over days part is already allowed by the syntax for climatological time.

I agree.

Thank you for improving my example and cell method. This looks good to me.

> The required standard names will also need to be proposed. Would you like to include an initial list in this ticket?

I included the initial list in this ticket, but it is no problem to repeat it here with some additions.

number_of_days_with_air_temperature_below_threshold
number_of_days_with_air_temperature_above_threshold
number_of_days_with_lwe_thickness_of_precipitation_amount_above_threshold
number_of_days_with_wind_speed_above_threshold

maximum_spell_length_of_days_with_air_temperature_below_threshold

(example: consecutive frost days)

maximum_spell_length_of_days_with_air_temperature_above_threshold

(example: consecutive summer days)

maximum_spell_length_of_days_with_lwe_thickness_of_precipitation_amount_above_threshold

(example: consecutive wet days)

maximum_spell_length_of_days_with_lwe_thickness_of_precipitation_amount_below_threshold

(example: consecutive dry days, IPCC AR4 and AR5 cdd)

season_length_with_air_temperature_threshold

(example: growing season length, IPCC AR4 and AR5 gsl. The season starts when T(day) > 5 degC for > 5 d and ends when T(day) < 5 degC for > 5 d. The 5 days could be fixed for this standard name (as part of the name?) or would need a variable.)

This is only the first part of the statistical indices. For the 'percentile' statistical indices and some special parameters (intra-annual extreme temperature range, IPCC etr) we need a new approach.

Best wishes

Heinke

comment:3 Changed 13 years ago by jonathan

Dear Heinke

Thanks for the examples. But, having seen them, I am now having second thoughts about whether this is the right approach. Sorry to be discouraging!

The "maximum spell length" is more complicated than just "count". It implies applying the condition (wrt threshold), identifying all the spell lengths, then finding the maximum one. We could not describe this as "count"; we would need a different cell method for it of an unprecedented sort, that converts any quantity to a new quantity with units of time. I am not sure cell_methods is the correct way to describe this. The season length is an even more complicated quantity, and it's not clear to me we could describe it with cell_methods at all.
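
To make the three steps concrete (apply the condition, identify all spell lengths, take the maximum), here is a minimal Python sketch. The function names and the sample series are hypothetical, and the input is assumed to be one value per day that has already been reduced within each day (for example, the daily minimum temperature).

```python
def spell_lengths(daily_values, threshold, below=True):
    """Return the length of every run of consecutive days meeting the condition."""
    lengths = []
    run = 0
    for value in daily_values:
        # Step 1: apply the condition with respect to the threshold.
        meets = value < threshold if below else value > threshold
        if meets:
            run += 1
        else:
            # Step 2: a spell has just ended, so record its length.
            if run:
                lengths.append(run)
            run = 0
    if run:
        lengths.append(run)  # spell still open at the end of the period
    return lengths


def maximum_spell_length(daily_values, threshold, below=True):
    """Step 3: the longest spell in days, or 0 if no day meets the condition."""
    return max(spell_lengths(daily_values, threshold, below), default=0)


# Hypothetical daily minimum temperatures in degC.
daily_min = [1.0, -1.0, -2.0, 0.5, -0.3, -4.0, -1.2, 2.0]
lengths = spell_lengths(daily_min, 0.0)
longest = maximum_spell_length(daily_min, 0.0)
```

Note that the result is a number of days regardless of the units of the input, which is the change of quantity described above.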

I wonder if, after all, we should use standard_names alone for these complex quantities, but continue with the idea of naming thresholds (such as air temperature) in the standard name while providing their values in (scalar) coordinate variables. That might be sufficient. In that case we could use your standard names but not introduce a new kind of cell method and we wouldn't need this ticket after all.

If anyone else has an opinion, that would be helpful.

Best wishes

Jonathan

comment:4 Changed 13 years ago by Heinke

Dear Jonathan,

> I wonder if, after all, we should use standard_names alone for these complex quantities, but continue with the idea of naming thresholds (such as air temperature) in the standard name while providing their values in (scalar) coordinate variables. That might be sufficient. In that case we could use your standard names but not introduce a new kind of cell method and we wouldn't need this ticket after all.

You are confusing me. Do you mean that we should use the description below, which I made first, without a new cell method? (I would change it slightly: n1:cell_methods="time: minimum within days" and no threshold:cell_methods.)

// number of days with daily minimum below 0 degC = frost_days = fd (IPCC AR4 and AR5)
  float n1(lat,lon);
    n1:standard_name="number_of_days_with_variable_below_threshold";
    n1:coordinates="threshold time";
    n1:cell_methods="time: sum over days";
  float threshold;
    threshold:standard_name="air_temperature";
    threshold:units="degC";
    threshold:cell_methods="time: minimum within days";
data:
  threshold=0.;

If yes, I would agree with you. It is not so complex and contains the information needed to construct the statistical formula. I made the ticket for the new cell method 'count' because you asked me to do so. I can live with either, but we need a decision.

> The "maximum spell length" is more complicated than just "count".

This is true, but maximum_spell_length_of_days_with_air_temperature_below_threshold means the greatest number of consecutive days with air_temperature_below_threshold in a given time period. The "greatest number of consecutive days" part of the formula is included in the standard name through the term maximum_spell_length_of_days (season_length could be defined accordingly). With this part of the standard name the formula is uniquely defined. I thought that we don't want to describe the whole formula with the cell methods, only the variable part, in order to structure the standard names.

Best wishes Heinke

comment:5 follow-up: Changed 13 years ago by jonathan

Dear Heinke

I am not deliberately confusing :-) but I do find this a difficult issue; it's because it's difficult that we've been discussing it for so long, I suppose.

Now I think that we do not need a new cell method of count. I proposed it earlier because it appeared to be a general way to describe how often conditions were satisfied. But perhaps it's better to be less general in order to address our present needs. I think we could propose these as standard names:

number_of_days_with_air_temperature_below_threshold
number_of_days_with_air_temperature_above_threshold
number_of_days_with_lwe_thickness_of_precipitation_amount_above_threshold
number_of_days_with_wind_speed_above_threshold

spell_length_of_days_with_air_temperature_below_threshold
spell_length_of_days_with_air_temperature_above_threshold
spell_length_of_days_with_lwe_thickness_of_precipitation_amount_above_threshold
spell_length_of_days_with_lwe_thickness_of_precipitation_amount_below_threshold

The definitions of these standard names will require some special conditions. Possibly they could be described in the standard name guidelines. I would suggest:

A variable whose standard name has the form number_of_days_with_X_below|above_threshold must have a coordinate variable or scalar coordinate variable with a standard name of X to supply the threshold(s). It must have a climatological time variable, and a cell_methods entry for within days which describes the processing of quantity X before the threshold is applied. A number_of_days is an extensive quantity in time, and the cell_methods entry for over days should be sum.

A variable whose standard name has the form spell_length_of_days_with_X_below|above_threshold must have a coordinate variable or scalar coordinate variable with a standard name of X to supply the threshold(s). It must have a climatological time variable, and a cell_methods entry for within days which describes the processing of quantity X before the threshold is applied. A spell_length_of_days is an intensive quantity in time, and the cell_methods entry for over days can be any of the methods listed in Appendix E appropriate for intensive quantities e.g. maximum, minimum or mean.
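
The two suggested rules could be checked mechanically. The following Python sketch is one possible reading of them, not an existing CF checker: the function `check_variable`, its arguments, and the set `INTENSIVE_METHODS` (limited here to the three methods named as examples) are all assumptions made for illustration, and the requirement for a climatological time variable is not checked.

```python
import re

# Matches the two proposed families of standard names and captures X.
PATTERN = re.compile(
    r"^(number_of_days|spell_length_of_days)_with_(.+)_(below|above)_threshold$"
)
# Assumption: only the methods named as examples in the text above.
INTENSIVE_METHODS = {"maximum", "minimum", "mean"}


def check_variable(attrs, coord_standard_names):
    """Return a list of problems; an empty list means the checks passed.

    attrs: dict of the data variable's attributes.
    coord_standard_names: standard names of its (scalar) coordinate variables.
    """
    problems = []
    match = PATTERN.match(attrs.get("standard_name", ""))
    if match is None:
        return problems  # the rules do not apply to this variable
    kind, quantity, _direction = match.groups()
    if quantity not in coord_standard_names:
        problems.append(f"no coordinate with standard_name {quantity!r}")
    methods = attrs.get("cell_methods", "")
    within = re.search(r"time:\s*(\w+) within days", methods)
    over = re.search(r"time:\s*(\w+) over days", methods)
    if within is None:
        problems.append("missing 'within days' cell_methods entry")
    if over is None:
        problems.append("missing 'over days' cell_methods entry")
    elif kind == "number_of_days" and over.group(1) != "sum":
        problems.append("'over days' method should be sum")
    elif kind == "spell_length_of_days" and over.group(1) not in INTENSIVE_METHODS:
        problems.append("'over days' method should suit an intensive quantity")
    return problems


n1_attrs = {
    "standard_name": "number_of_days_with_air_temperature_below_threshold",
    "cell_methods": "time: minimum within days time: sum over days",
}
n1_problems = check_variable(n1_attrs, {"air_temperature", "time"})
```

Calling the function with a `count over days` entry, or without an `air_temperature` coordinate, would return a non-empty list describing what is missing.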

Thus we could have mean or minimum spell lengths as well as maximum spell lengths, using the same standard name. But if you think that is an unnecessary generality, we can put the maximum in the standard name as you proposed. In that case I am not sure what the cell_methods should say for over days.

Although this would not be a change to the convention, the example would still be useful, I think.

Example 7.11. Number of frost days during NH winter 2007-2008, and maximum length of spells of consecutive frost days. A "frost day" is defined as one during which the minimum temperature falls below freezing point (0 degC). This is described as a climatological statistic, in which the minimum temperature is first calculated within each day, and then the number of days or spell lengths meeting the specified condition are evaluated. In this operation, the standard name is also changed; the original data are air_temperature.

variables:
  float n1(lat,lon);
    n1:standard_name="number_of_days_with_air_temperature_below_threshold";
    n1:coordinates="threshold time";
    n1:cell_methods="time: minimum within days time: sum over days";
  float n2(lat,lon);
    n2:standard_name="spell_length_of_days_with_air_temperature_below_threshold";
    n2:coordinates="threshold time";
    n2:cell_methods="time: minimum within days time: maximum over days";
  float threshold;
    threshold:standard_name="air_temperature";
    threshold:units="degC";
  double time;
    time:climatology="climatology_bounds";
    time:units="days since 2000-6-1";
  double climatology_bounds(time,nv);
data: // time coordinates translated to date/time format
  time="2008-1-16 6:00";
  climatology_bounds="2007-12-1 6:00", "2000-8-2 6:00";
  threshold=0.; 

Is this OK? I'm not really moderating this discussion; it's continuing the discussion you and I had on the email list. It would be very helpful to have someone else comment on it, or moderate it. Perhaps Alison could consider it, but she will be short of time until the school holidays end, so we may have to wait. But if you and I agree, that would be good progress!

The growing season length is a more complicated proposition.

Best wishes

Jonathan

comment:6 Changed 12 years ago by apamment

Dear Jonathan and Heinke,

I support the introduction of a new example 7.11, as given in Jonathan's last posting under this ticket, into the conventions document to explain the use of the proposed new classes of standard names number_of_days_with_X_below|above_threshold and spell_length_of_days_with_X_below|above_threshold.

It would be best if the names could be proposed again to the list, along with Jonathan's proposals for amending the Standard Name Guidelines. I think that the names, the guidelines and the example taken all together form a sound proposal that will enable many of the 'statistic indices' quantities to be described in CF metadata.

Best wishes, Alison

comment:7 Changed 12 years ago by taylor13

Dear Jonathan, Heinke, and Alison,

I have only read through this superficially, but I'm struck that this is a case where my recently suggested idea to "parse" the standard name into components might simplify and generalize things. I am therefore inclined to agree with Jonathan that no change should be made to the convention, and since we do not yet have the ability to represent standard names in terms of components, then perhaps adding the example with the proposed new standard names is a good idea.

Karl

comment:8 in reply to: ↑ 5 Changed 12 years ago by jonathan

With Alison's and Karl's support (thank you), this proposal now qualifies for approval according to the rules. The only change to the convention is to include a new example. Here I restate the proposed change to the convention, and I will propose the standard names and modifications to the guidelines by email to the CF list. If there are no further comments on this ticket, as moderator I will close the ticket and conclude that the change should be made, if and when the discussion on the email list accepts the new standard names.

Jonathan

New example, to be inserted after Example 7.10.

Example 7.11. Extreme statistics and spell-lengths. Number of frost days during NH winter 2007-2008, and maximum length of spells of consecutive frost days. A "frost day" is defined as one during which the minimum temperature falls below freezing point (0 degC). This is described as a climatological statistic, in which the minimum temperature is first calculated within each day, and then the number of days or spell lengths meeting the specified condition are evaluated. In this operation, the standard name is also changed; the original data are air_temperature.

variables:
  float n1(lat,lon);
    n1:standard_name="number_of_days_with_air_temperature_below_threshold"; 
    n1:coordinates="threshold time";
    n1:cell_methods="time: minimum within days time: sum over days";
  float n2(lat,lon);
    n2:standard_name="spell_length_of_days_with_air_temperature_below_threshold";
    n2:coordinates="threshold time";
    n2:cell_methods="time: minimum within days time: maximum over days";
  float threshold;
    threshold:standard_name="air_temperature";
    threshold:units="degC";
  double time;
    time:climatology="climatology_bounds";
    time:units="days since 2000-6-1";
  double climatology_bounds(time,nv);
data: // time coordinates translated to date/time format
  time="2008-1-16 6:00";
  climatology_bounds="2007-12-1 6:00", "2000-8-2 6:00";
  threshold=0.; 

comment:9 Changed 11 years ago by jonathan

  • Owner changed from cf-conventions@… to jonathan
  • Status changed from new to assigned
  • Summary changed from New cell_method 'count over days' to cell_methods for statistical indices

This discussion was successfully concluded a long while ago, so as moderator I declare this change to be accepted. I am also changing the summary of the ticket to reflect where the discussion ended.

The CF convention should be modified as described above, by the insertion of a new example. That means a new CF minor version number. Heinke Höck and Alison Pamment should be named as additional authors of the standard document. (Alison worked on this proposal in previous discussions on the email list.)

The proposal also requires some new standard names and additions to the standard name guidelines, described above.

Jonathan

comment:10 Changed 10 years ago by painter1

  • Resolution set to fixed
  • Status changed from assigned to closed