
[CF-metadata] example H4 (was Pre-proposal for "charset")

From: Jonathan Gregory <j.m.gregory>
Date: Fri, 3 Mar 2017 14:25:18 +0000

Dear David

As Bob has separately pointed out, my last posting might be less clear than it
should be because the sentence starting "What if you had" was garbled. I think
H4 is not ambiguous if that's the entire contents of the file, but mostly the
CF examples don't imply that. Is there a rule that you can't have data vars
from more than one station in the file? If it's legal to do that, consider:

dimensions:
  time = 100233 ;
  name_strlen = 23 ;
variables:
  float lon1 ;
    lon1:standard_name = "longitude";
    lon1:long_name = "station longitude";
    lon1:units = "degrees_east";
  float lat1 ;
    lat1:standard_name = "latitude";
    lat1:long_name = "station latitude" ;
    lat1:units = "degrees_north" ;
  float lon2 ;
    lon2:standard_name = "longitude";
    lon2:long_name = "station longitude";
    lon2:units = "degrees_east";
  float lat2 ;
    lat2:standard_name = "latitude";
    lat2:long_name = "station latitude" ;
    lat2:units = "degrees_north" ;
  char station_name(name_strlen) ;
    station_name:long_name = "station name" ;
    station_name:cf_role = "timeseries_id";
  double time(time) ;
    time:standard_name = "time";
    time:long_name = "time of measurement" ;
    time:units = "days since 1970-01-01 00:00:00" ;
    time:missing_value = -999.9;
  float temp1(time) ;
    temp1:standard_name = "air_temperature" ;
    temp1:units = "Celsius" ;
    temp1:coordinates = "time lat1 lon1" ;
    temp1:_FillValue = -999.9;
  float temp2(time) ;
    temp2:standard_name = "air_temperature" ;
    temp2:units = "Celsius" ;
    temp2:coordinates = "time lat2 lon2" ;
    temp2:_FillValue = -999.9;
attributes:
  :featureType = "timeSeries";

You cannot tell which of the locations is described by the station_name. That
is my objection to example H4.
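
To make the point concrete, here is a rough Python sketch. It is only an
illustration: the file is modelled as a plain dictionary rather than read with
any netCDF library, and data_variables and associated_data_vars are
hypothetical helper names, not part of any CF tool. It looks for data variables
tied to the cf_role variable either through the coordinates attribute or
through a shared dimension other than the string length.

```python
# Hypothetical, simplified model of the CDL above: name -> (dimensions, attributes).
variables = {
    "lon1": ((), {"standard_name": "longitude"}),
    "lat1": ((), {"standard_name": "latitude"}),
    "lon2": ((), {"standard_name": "longitude"}),
    "lat2": ((), {"standard_name": "latitude"}),
    "station_name": (("name_strlen",), {"cf_role": "timeseries_id"}),
    "time": (("time",), {"standard_name": "time"}),
    "temp1": (("time",), {"coordinates": "time lat1 lon1"}),
    "temp2": (("time",), {"coordinates": "time lat2 lon2"}),
}


def data_variables(variables):
    """Names of variables carrying a coordinates attribute (the data variables here)."""
    return [name for name, (_, attrs) in variables.items() if "coordinates" in attrs]


def associated_data_vars(variables, id_name, string_dims=("name_strlen",)):
    """Data variables explicitly tied to id_name.

    A tie is either a listing in the coordinates attribute or a shared
    dimension other than the string-length dimension.
    """
    id_dims = set(variables[id_name][0]) - set(string_dims)
    linked = []
    for name in data_variables(variables):
        dims, attrs = variables[name]
        listed = id_name in attrs["coordinates"].split()
        shares_dim = bool(id_dims & set(dims))
        if listed or shares_dim:
            linked.append(name)
    return linked


# Find the variable with cf_role = "timeseries_id" and ask what it is tied to.
id_vars = [n for n, (_, a) in variables.items() if a.get("cf_role") == "timeseries_id"]
links = associated_data_vars(variables, id_vars[0])
print(id_vars, links)
```

Under these assumptions the list of linked data variables comes back empty:
nothing in the metadata says whether station_name goes with temp1 or temp2.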

Best wishes

Jonathan


----- Forwarded message from David Hassell <david.hassell at ncas.ac.uk> -----

> Date: Fri, 3 Mar 2017 09:02:01 +0000
> From: David Hassell <david.hassell at ncas.ac.uk>
> To: Jonathan Gregory <j.m.gregory at reading.ac.uk>
> CC: Bob Simons - NOAA Federal <bob.simons at noaa.gov>, CF Metadata
> <cf-metadata at cgd.ucar.edu>, "Signell, Richard" <rsignell at usgs.gov>
> Subject: Re: example H4 (was Pre-proposal for "charset")
>
> Hello,
>
> I still think that H4 is OK: The featureType attribute is set and the
> station_name variable has the cf_role attribute, so I think that it is
> clear that it applies to all of the variables dimensioned "time" - just as
> clear as if there were an explicit station dimension. Even if the
> featureType were not set in the file (which is allowed for orthogonal
> multidimensional array representations) then the presence of cf_role
> attribute would still be enough, I think.
>
> That said, I still support Jonathan's first suggestion:
>
> > require there to be a station dimension if you want to include any
> > station-related variables in the file which aren't listed in the
> > coordinates attribute.
>
> I am currently writing software to parse DSGs, and having to infer these
> relationships (rather than getting them directly from the coordinates
> attribute) is, for me, challenging the goal that the conventions make it
> "practical to write software" (section 1.1)!
>
> All the best,
>
> David
>
> On 2 March 2017 at 19:26, Jonathan Gregory <j.m.gregory at reading.ac.uk>
> wrote:
>
> > Dear Bob et al.
> >
> > Yes, this thread is currently not about your charset proposal, but about H4,
> > because of its being an example. My request for an example arises because I
> > guess that when CF metadata contains char arrays, it should already be clear
> > from their purpose (within CF) whether they are characters or strings.
> >
> > However that doesn't seem to work in the case of H4, as you say. As David says,
> > > I think that such a variable contains spatial information for the
> > > implied instance dimension. Therefore I think that the "cf_role" variable
> > > should be an auxiliary coordinate variable and treated accordingly, thus
> > > removing the ambiguity that Jonathan points out.
> > ... but it isn't. I recall discussing this point with Steve Hankin and John
> > Caron when ch 9 was being drafted, but I hadn't seen the full implications.
> > John argued that we shouldn't require all station variables to be listed in
> > the coordinates attribute, thus making them formally auxiliary coordinate
> > variables, in case there were lots of them. He argued that association by
> > sharing the same dimension is sufficient. Although it's not like the rest of
> > the CF convention, I would say, association in this way *is* sufficient, when
> > there *is* a station dimension. In this case there isn't!
> >
> > What if you had, in the same file, two data variables, containing a
> > timeseries
> > timeseries from a single locations, with no station dimension, so each
> > timeseries has just a time-dimension. You can include some station identifier
> > with cf_role="timeseries_id", but there's no way to know which station it
> > identifies, so this is useless. As far as I know it's legal - or have I
> > overlooked something? If it's legal now, I would say this is the first case
> > we've had where the CF convention is actually defective - I mean, not just the
> > text, but the design of the convention - so we must change it.
> >
> > It could be fixed by any of these methods:
> >
> > * require there to be a station dimension if you want to include any
> > station-related variables in the file which aren't listed in the
> > coordinates attribute.
> >
> > * require all station variables to be listed in the coordinates attribute
> > (if there is no station dimension).
> >
> > * require all data variables in the file to apply to the same station, so
> > that the file deals with only one location (if there is no station dimension).
> >
> > Of these, I dislike the last one, because it puts a requirement on the file
> > as a whole, whereas most of CF deals with data variables, for which files are
> > just containers.
> >
> > Best wishes
> >
> > Jonathan
> >
> >
> > On Wed, Mar 01, 2017 at 08:39:10AM +0000, David Hassell wrote:
> > > Date: Wed, 1 Mar 2017 08:39:10 +0000
> > > From: David Hassell <david.hassell at ncas.ac.uk>
> > > To: Bob Simons - NOAA Federal <bob.simons at noaa.gov>
> > > CC: Jonathan Gregory <j.m.gregory at reading.ac.uk>, CF Metadata
> > > <cf-metadata at cgd.ucar.edu>, "Signell, Richard" <rsignell at usgs.gov>
> > > Subject: Re: [CF-metadata] example Re: Pre-proposal for "charset"
> > >
> > > Hello,
> > >
> > > On the validity of H4 ...
> > >
> > > The problem for me (in H4 and Bob's file) is that the "cf_role" variable is
> > > not listed by the coordinates attribute of any of the data variables to
> > > which it applies. But should it be? To my reading the conventions are a
> > > little vague. On one hand it says that *all* spatiotemporal coordinates
> > > should be listed in this fashion (section 9.5), but also doesn't say
> > > explicitly that the "cf_role" variable should an auxiliary coordinate
> > > "station_info" variable in H4.)
> > >
> > > This sounds like a defect to me. The conventions could perhaps replace
> > > "Where feasible a variable with the attribute cf_role should be included"
> > > with "Where feasible *an auxiliary coordinate* variable with the attribute
> > > cf_role should be included".
> > >
> > > All the best,
> > >
> > > David
> > >
> > > On 28 February 2017 at 21:22, Bob Simons - NOAA Federal
> > > <bob.simons at noaa.gov> wrote:
> > >
> > > > [This has degenerated into a debate about whether a given file is a valid
> > > > CF DSG file, but I'll continue.]
> > > >
> > > > Perhaps I am misunderstanding this, but it is very hard for me to
> > > > interpret H4 as "defective", as if it were a minor error within an
> > > > otherwise valid CF DSG file.
> > > >
> > > > The description right above it says "When the intention of a data variable
> > > > is to contain only a single time series, *the preferred encoding* is a
> > > > special case of the multidimensional array representation." [emphasis mine]
> > > >
> > > > That reference to an encoding that is a special case of the
> > > > multidimensional array representation is almost certainly a reference to an
> > > > entire paragraph in section 9.2 which starts with
> > > > "If there is only a single feature to be stored in a data variable, there
> > > > is no need for an instance dimension and it is permitted to omit it." and
> > > > then discusses that.
> > > >
> > > > So the whole point of H4 is to give an example where there is an *implied*
> > > > time series dimension, not an actual one. [I'm not happy about this type
> > > > of file either, but CF DSG seems to explicitly allow it, and even actively
> > > > encourage it. I'm just trying to follow the rules.]
> > > >
> > > > ---
> > > > And if H4 is defective, several groups that now make this type of file are
> > > > going to be surprised to hear it.
> > > >
> > > >
> > > > On Tue, Feb 28, 2017 at 12:16 PM, Jonathan Gregory <
> > > > j.m.gregory at reading.ac.uk> wrote:
> > > >
> > > >> Dear Rich, Bob
> > > >>
> > > >> Rich asked
> > > > >> > What is the difference that makes example CF H.4 okay, but not Bob's
> > > > >> > example?
> > > >>
> > > > >> That's a good question. You're right, I think example H4 has the same
> > > > >> problem! I didn't read it carefully enough - sorry. The station_name:cf_role
> > > > >> attribute in H4 says it's a station ID, but there's no association with the
> > > > >> data variable. If you had more than one data variable in the file, as in
> > > > >> Bob's example as he gave it originally, you couldn't tell which one it
> > > > >> belonged to, so it can't be used as identification. In examples H6 and H7,
> > > > >> however, where there are several timeseries, there is an instance (station)
> > > > >> dimension, and the timeseries_id variable is
> > > > >> station_name(station, name_strlen). So here it's clear that the
> > > > >> station_name belongs to the stations, and you can infer that it's an array
> > > > >> of strings, not a 2D character array.
> > > >>
> > > >> Therefore I think H4 and H5 are defective.
> > > >>
> > > > >> In Bob's example, he has several data variables (five, I think) each of
> > > > >> size 996, and a variable
> > > >>
> > > >> char timeseries(timeseries=10);
> > > >> :cf_role = "timeseries_id";
> > > >> :long_name = "timeseries";
> > > >>
> > > > >> I don't know what this refers to - that's my problem. Does it belong to
> > > > >> any of the data variables? The dimension timeseries is not otherwise used.
> > > >>
> > > >> Best wishes
> > > >>
> > > >> Jonathan
> > > >>
> > > >>
> > > >> On Mon, Feb 27, 2017 at 02:41:49PM -0500, Signell, Richard wrote:
> > > >> > Date: Mon, 27 Feb 2017 14:41:49 -0500
> > > >> > From: "Signell, Richard" <rsignell at usgs.gov>
> > > >> > To: Jonathan Gregory <j.m.gregory at reading.ac.uk>
> > > >> > Subject: Re: [CF-metadata] Pre-proposal for "charset"
> > > >> >
> > > >> > On Mon, Feb 27, 2017 at 1:11 PM, Jonathan Gregory
> > > >> > <j.m.gregory at reading.ac.uk> wrote:
> > > >> > > Dear Bob
> > > >> > >
> > > > >> > > That's right, there doesn't have to be an instance dimension. The
> > > > >> > > problem with the file is that the variable you're concerned with
> > > > >> > > (timeseries) isn't linked to any of the other variables, so its
> > > > >> > > purpose is not clear.
> > > >> > >
> > > >> >
> > > >> > Jonathan,
> > > >> >
> > > >> >
> > > >> >
> > > >> > #1 CF Example H.4
> > > > >> > http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#example-h.4
> > > >> >
> > > >> > dimensions:
> > > >> > time = 100233 ;
> > > >> > name_strlen = 23 ;
> > > >> >
> > > >> > variables:
> > > >> > float lon ;
> > > >> > lon:standard_name = "longitude";
> > > >> > lon:long_name = "station longitude";
> > > >> > lon:units = "degrees_east";
> > > >> > float lat ;
> > > >> > lat:standard_name = "latitude";
> > > >> > lat:long_name = "station latitude" ;
> > > >> > lat:units = "degrees_north" ;
> > > >> > float alt ;
> > > >> > alt:long_name = "vertical distance above the surface" ;
> > > >> > alt:standard_name = "height" ;
> > > >> > alt:units = "m";
> > > >> > alt:positive = "up";
> > > >> > alt:axis = "Z";
> > > >> > char station_name(name_strlen) ;
> > > >> > station_name:long_name = "station name" ;
> > > >> > station_name:cf_role = "timeseries_id";
> > > >> >
> > > >> > double time(time) ;
> > > >> > time:standard_name = "time";
> > > >> > time:long_name = "time of measurement" ;
> > > >> > time:units = "days since 1970-01-01 00:00:00" ;
> > > >> > time:missing_value = -999.9;
> > > >> > float humidity(time) ;
> > > > >> > humidity:standard_name = "specific_humidity" ;
> > > > >> > humidity:coordinates = "time lat lon alt" ;
> > > > >> > humidity:_FillValue = -999.9;
> > > > >> > float temp(time) ;
> > > > >> > temp:standard_name = "air_temperature" ;
> > > >> > temp:units = "Celsius" ;
> > > >> > temp:coordinates = "time lat lon alt" ;
> > > >> > temp:_FillValue = -999.9;
> > > >> >
> > > >> > attributes:
> > > >> > :featureType = "timeSeries";
> > > >> >
> > > >> >
> > > >> > #2 Bob's example:
> > > >> >
> > > >> > netcdf summary_allTB2007.nc {
> > > >> > dimensions:
> > > >> > timeseries = 10;
> > > >> > time = 996;
> > > >> > variables:
> > > >> > char timeseries(timeseries=10);
> > > >> > :cf_role = "timeseries_id";
> > > >> > :long_name = "timeseries";
> > > >> >
> > > >> > double time(time=996);
> > > >> > :units = "seconds since 1970-01-01T00:00:00Z";
> > > >> > :standard_name = "time";
> > > >> > :long_name = "time";
> > > >> > :calendar = "gregorian";
> > > >> > :axis = "T";
> > > >> >
> > > >> > double latitude;
> > > >> > :valid_min = -90.0; // double
> > > >> > :valid_max = 90.0; // double
> > > >> > :axis = "Y";
> > > >> > :long_name = "latitude";
> > > >> > :standard_name = "latitude";
> > > >> > :units = "degrees_north";
> > > >> >
> > > >> > double longitude;
> > > >> > :valid_min = -180.0; // double
> > > >> > :valid_max = 180.0; // double
> > > >> > :axis = "X";
> > > >> > :long_name = "longitude";
> > > >> > :standard_name = "longitude";
> > > >> > :units = "degrees_east";
> > > >> >
> > > >> > double depth;
> > > >> > :positive = "down";
> > > >> > :axis = "Z";
> > > >> > :valid_min = 0.0; // double
> > > >> > :valid_max = 10971.0; // double
> > > >> > :long_name = "depth";
> > > >> > :standard_name = "depth";
> > > >> > :units = "m";
> > > >> >
> > > >> > char platform;
> > > >> > :long_name = "MVCO ASIT";
> > > >> >
> > > >> > char instrument;
> > > >> > :long_name = "Imaging FlowCytobot";
> > > >> >
> > > >> > double crs;
> > > >> > :grid_mapping_name = "latitude_longitude";
> > > >> > :longitude_of_prime_meridian = 0.0; // double
> > > >> > :semi_major_axis = 6378137.0; // double
> > > >> > :inverse_flattening = 298.257223563; // double
> > > >> > :epsg_code = "EPSG:4326";
> > > >> >
> > > >> > double Phaeocystis(time=996);
> > > >> > :_FillValue = -9999.9; // double
> > > >> > :long_name = "Phaeocystis";
> > > >> > :standard_name = "Phaeocystis";
> > > >> > :units = "1";
> > > >> > :coordinates = "time depth latitude longitude";
> > > >> > :grid_mapping = "crs";
> > > >> > :platform = "platform";
> > > >> > :instrument = "instrument";
> > > >> >
> > > >> > // global attributes:
> > > >> > :featureType = "timeSeries";
> > > >> > :cdm_data_type = "TimeSeries";
> > > >> > :Conventions = "CF-1.6";
> > > > >> > :summary = "Phytoplankton concentration by class derived from images collected by Imaging FlowCytobot\n";
> > > >> > :institution = "WHOI";
> > > >> > :title = "Phytoplankton concentration by class";
> > > >> > }
> > > >> >
> > > >> > --
> > > >> > Dr. Richard P. Signell (508) 457-2229
> > > >> > USGS, 384 Woods Hole Rd.
> > > >> > Woods Hole, MA 02543-1598
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Sincerely,
> > > >
> > > > Bob Simons
> > > > IT Specialist
> > > > Environmental Research Division
> > > > NOAA Southwest Fisheries Science Center
> > > > 99 Pacific St., Suite 255A (New!)
> > > > Monterey, CA 93940 (New!)
> > > > Phone: (831)333-9878 (New!)
> > > > Fax: (831)648-8440
> > > > Email: bob.simons at noaa.gov
> > > >
> > > > The contents of this message are mine personally and
> > > > do not necessarily reflect any position of the
> > > > Government or the National Oceanic and Atmospheric Administration.
> > > > <>< <>< <>< <>< <>< <>< <>< <>< <><
> > > >
> > > >
> > > > _______________________________________________
> > > > CF-metadata mailing list
> > > > CF-metadata at cgd.ucar.edu
> > > > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> > > >
> > > >
> > >
> > >
> > > --
> > > David Hassell
> > > National Centre for Atmospheric Science
> > > Department of Meteorology, University of Reading,
> > > Earley Gate, PO Box 243, Reading RG6 6BB
> > > Tel: +44 118 378 5613
> > > http://www.met.reading.ac.uk/
> >
>
>
>
> --
> David Hassell
> National Centre for Atmospheric Science
> Department of Meteorology, University of Reading,
> Earley Gate, PO Box 243, Reading RG6 6BB
> Tel: +44 118 378 5613
> http://www.met.reading.ac.uk/

----- End forwarded message -----
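
The first remedy listed in the thread, a station dimension, can be sketched in
the same style. This is a minimal illustration under stated assumptions: the
dictionary below is a made-up layout loosely following the
station_name(station, name_strlen) form mentioned for H6 and H7, and
shares_instance_dim is a hypothetical helper, not an existing API.

```python
# Hypothetical model: every station-related variable has the instance
# dimension "station", so association by shared dimension is possible.
variables = {
    "lon": (("station",), {"standard_name": "longitude"}),
    "lat": (("station",), {"standard_name": "latitude"}),
    "station_name": (("station", "name_strlen"), {"cf_role": "timeseries_id"}),
    "time": (("time",), {"standard_name": "time"}),
    "temp": (("station", "time"), {"coordinates": "lat lon"}),
}


def shares_instance_dim(variables, id_name, other_name, string_dims=("name_strlen",)):
    """True if other_name has a dimension of id_name other than the string length."""
    id_dims = set(variables[id_name][0]) - set(string_dims)
    return bool(id_dims & set(variables[other_name][0]))


# station_name and temp both have the "station" dimension, so they are linked.
print(shares_instance_dim(variables, "station_name", "temp"))  # True
# time has no "station" dimension, so it is not a station-related variable.
print(shares_instance_dim(variables, "station_name", "time"))  # False
```

With the instance dimension present, the identifier is tied to the data
variable without needing to appear in its coordinates attribute.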
Received on Fri Mar 03 2017 - 07:25:18 GMT
