[CF-metadata] Proposals for Versioning CF Conventions and Standard Names on Github

From: Chris Barker <chris.barker>
Date: Tue, 23 Sep 2014 17:26:45 -0700

On Tue, Sep 23, 2014 at 9:16 AM, Jonathan Gregory
<j.m.gregory at reading.ac.uk> wrote:

> I think I must have missed a point somewhere. Version control is not the
> same as branches, is it?


well, we have a vocabulary issue here:

Applying version numbers to something and what those version numbers mean
is one thing.

On the other hand: "version control" is usually used to describe a software
system that helps you manage the various versions (numbered or not) of
files involved with a software system. CVS, SVN, git, Hg are all version
control systems, often also called Source code management systems (SCM),
because they were all designed to be used for software development.

However, despite being designed for software development, such systems can
be very helpful for developing other sorts of documents -- particularly if
they are large-ish documents with multiple parts that may be in multiple
files and multiple people want to contribute to them.

So: I _think_ what is being proposed here is that the CF-2.*.* version of
the CF standards document be developed as a community effort in an SCM,
specifically git.

Also (and a bit orthogonally) that the entire endeavor be managed in a
project management system, specifically gitHub. gitHub provides a hosting
service and nice front end to git, as well as an issue tracking system,
etc, for managing the project.

Currently, CF is using TRAC for the project management -- gitHub issues
provide functionality similar to TRAC tickets, though I think better suited
to this sort of thing.

However, as far as I know, the CF document itself is not managed in an SCM
-- so that would be the real change here. TRAC can be linked to an SCM,
including SVN and git, but it's not as clean a linkage with as nice an
interface as gitHub.

Another trick is that while git (or SVN, or...) can be used to manage any
kind of document, it really works best with simple text formats -- i.e. OK
with LaTeX, great with RST or Markdown, can be done, but painful, with XML.
The reason for this is that git uses line-by-line comparisons and merging
of changed documents -- binary formats cannot be usefully diffed or merged,
and XML is very easy to break if you swap out lines.

(there may be an XML-aware diff-ing and merging tool that one could plug
into git -- I have no idea)
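
As a rough illustration of the line-by-line point above, here is a minimal
Python sketch using the standard difflib and xml.etree modules. The element
names and the sentence are made up for the example, and this is not what git
does internally, only the same line-oriented idea: the tool sees two
independent changed lines, and taking just one of them leaves XML that no
longer parses.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical fragment; the tags and text are placeholders.
base = [
    "<section>",
    "<para>",
    "Units must be given.",
    "</para>",
    "</section>",
]

# One contributor renames the element on its opening and closing lines.
edited = [
    "<section>",
    "<note>",
    "Units must be given.",
    "</note>",
    "</section>",
]


def line_diff(old, new):
    """Return a unified diff computed line by line."""
    return list(difflib.unified_diff(old, new, "base", "edited", lineterm=""))


def is_well_formed(lines):
    """True if the joined lines parse as XML."""
    try:
        ET.fromstring("\n".join(lines))
        return True
    except ET.ParseError:
        return False


# Keep only the added/removed lines, dropping the "---"/"+++" file headers.
changed = [
    line
    for line in line_diff(base, edited)
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# Simulate a merge that picks up only the first of the two changed lines.
half_applied = list(base)
half_applied[1] = edited[1]

print("\n".join(line_diff(base, edited)))
print("base well-formed:        ", is_well_formed(base))
print("edited well-formed:      ", is_well_formed(edited))
print("half-applied well-formed:", is_well_formed(half_applied))
```

With RST or Markdown the same half-applied edit would still be a readable
document, which is roughly why those formats are less painful in an SCM.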

So there are various proposals on the table, that are semi-independent (and
if they are not on the table then I am putting them there..;-) )

1) We manage the development of the CF-2.x.x document in a source code
management system, much like a software project

2) We manage the project on gitHub -- using gitHub issues to discuss, and
the whole fork-pull-request thing for updating the document.

3) We move discussion to gitHub issues, rather than TRAC tickets.

4) We use something other than DocBook XML for the document - XML does not
lend itself to an SCM, or to editing by non-experts.



> We already have version control and maybe we could
> add a third digit to it if we corrected defects between versions.


We have versioning, not version control -- see above.


> I do not see a need for branches in developing the convention. In software
> development you need branches when different changes overlap and are being
> developed concurrently. That has hardly ever been the case for the CF
> convention, as far as I remember, though I think there might at the moment
> be a couple of trac tickets that modify the same part of the document.


There may be little need for branching, yes -- but actually branching is
easy and fast with git, and you never know whether what you are doing
conflicts with what someone else is doing until you merge. But anyway, git
and gitHub are pretty useful, even if you don't branch much.
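
To make the "until you merge" point concrete, here is a small Python sketch
of a three-way merge. It is deliberately simplified and is not git's actual
algorithm: it assumes both edits only replace lines in place (no insertions
or deletions), and the function name and sample lines are invented for the
example. Two edits to different lines combine cleanly; two different edits
to the same line are reported as a conflict.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`, one line at a time.

    Simplifying assumption: all three versions have the same number of
    lines, i.e. edits replace lines but never insert or delete them.
    Returns (merged_lines, conflict_line_indices).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            # Both sides agree (or neither touched the line).
            merged.append(o)
        elif o == b:
            # Only "theirs" changed this line.
            merged.append(t)
        elif t == b:
            # Only "ours" changed this line.
            merged.append(o)
        else:
            # Both changed the same line differently: keep the base
            # text and flag the line for a human to resolve.
            merged.append(b)
            conflicts.append(i)
    return merged, conflicts


# Placeholder document lines.
base = ["first line", "second line", "third line"]

# Edits to different lines merge without any fuss.
merged, conflicts = three_way_merge(
    base,
    ["FIRST line", "second line", "third line"],
    ["first line", "second line", "THIRD line"],
)
print(merged, conflicts)

# Edits to the same line are only discovered to clash at merge time.
merged, conflicts = three_way_merge(
    base,
    ["first line", "second line (my wording)", "third line"],
    ["first line", "second line (your wording)", "third line"],
)
print(merged, conflicts)
```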

And we may want to use branches to keep separate documents apart (though I
don't personally think that's the way to use them...)


> Moreover in the end you have to reconcile concurrent developments, and I
> would say that in the case of the CF convention it would hardly make sense
> to develop two changes separately and then reconcile them subsequently - it
> would be much more sensible to reconcile and probably combine them as they
> were being discussed, I would argue.


well, the git mantra is to branch often, but also to merge often... I'm not
sure the mechanism makes a lot of difference here anyway.


> Thus I think we are fine with the existing system that agrees
> changes independently, and then combines them all to make a new version.
>

that's pretty much what would happen with branches (or without...)

> I don't know about what software systems are best suited for it. I think
> that trac is a good system for it, because it records the whole discussion
> and it's easy for anyone to read and contribute to it without understanding
> anything except simple text markup (and even that is inessential).


gitHub issues are very similar in this regard. One thing they do better is
link to email.


> I'm still unclear about my previous question. Is it envisaged that many
> people might prepare a new version of the document with a trac ticket
> implemented in it, and then request to upload it? Who would do the
> proof-reading and give the final OK that the change was as agreed in the
> ticket?


I think the idea here is a different workflow. If the document is managed
in git, then:

Discussion of an idea would happen in a gitHub issue. When someone (anyone)
had a proposal for changing the text of the document, they would fork and
clone the repo, make the change in their copy, and issue a pull request. Then
anyone could comment on that change via the pull request system. Whoever (I
presume a small group of people) had the authority to make the final
decision would merge (or not) the change when they thought it was ready.
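
The workflow above can be sketched as a toy model in Python. To be clear,
this is not gitHub's API: the class names, method names, and the people in
the usage example are all hypothetical, and the document is reduced to a
single string. It only shows the roles: anyone can propose and comment, but
only someone on the maintainer list can merge.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """A proposed replacement text, plus the discussion attached to it."""
    author: str
    new_text: str
    comments: list = field(default_factory=list)
    state: str = "open"  # "open" or "merged"


@dataclass
class Repo:
    """Toy stand-in for the shared repository holding the document."""
    text: str
    maintainers: set
    pull_requests: list = field(default_factory=list)

    def open_pull_request(self, author, new_text):
        # Anyone may propose a change made in their own copy.
        pr = PullRequest(author, new_text)
        self.pull_requests.append(pr)
        return pr

    def comment(self, pr, who, remark):
        # Anyone may comment on a proposal.
        pr.comments.append((who, remark))

    def merge(self, pr, who):
        # Only the small group with authority may accept the change.
        if who not in self.maintainers:
            raise PermissionError(f"{who} cannot merge")
        if pr.state != "open":
            raise ValueError("pull request is not open")
        self.text = pr.new_text
        pr.state = "merged"


# Hypothetical usage: one contributor, one reviewer, one maintainer.
repo = Repo(text="old wording", maintainers={"editor"})
pr = repo.open_pull_request("contributor", "new wording")
repo.comment(pr, "reviewer", "Looks right to me.")
repo.merge(pr, "editor")
print(repo.text, pr.state)
```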


> It would be helpful to know what the folk at PCMDI who manage the current
> system think.


Indeed.


> Is this way of doing it better than having a single editor, as we do now?
> The convention is not like a large software package. It is something we all
> write together, in effect, rather than something we all contribute to
> independently.


I think a software package is very much something we write together. And
this sort of system/workflow is a good way to do that. I would be
interested in what the folks currently managing the document think, but I
think a system without a bottleneck at one small group of people would be
much better -- particularly in the early stages as things change fast.

I've been involved with a wide variety of approaches to creating and
managing a document that a lot of people contribute to -- and they all
really suck compared to a software version control system.

Does this make it any more clear?

-- 
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker at noaa.gov
Received on Tue Sep 23 2014 - 18:26:45 BST