Proposal for RC --> NSIDC data transfer methodology: Rev. 1.11

Bruce Raup braup at nsidc.org
Thu Sep 27 10:22:02 MDT 2001


Hello all,

We need your feedback.  Below is a proposed scheme for transferring data
from Regional Centers to NSIDC.  We would appreciate any comments or
questions on it, sent either to me directly, or to this list.  The latter
is preferred, though, as it tends to stimulate further illuminating
discussion.

Thanks in advance,
Bruce

$Revision: 1.11 $
$Id: data_transfer_specification.txt,v 1.11 2001-09-27 10:00:22-06 braup Exp braup $


Title:  A Methodology for Transferring GLIMS Analysis Products from
Regional Centers to NSIDC.

Prepared by:  Bruce Raup and Siri Jodha Singh Khalsa,
              National Snow and Ice Data Center

CONTENTS

1.  INTRODUCTION
2.  SCALAR INFORMATION
3.  VELOCITY VECTORS
4.  LITERATURE REFERENCES
5.  GLACIER OUTLINES, OTHER VECTOR INFORMATION, AND TOPOLOGY
6.  PUTTING IT ALL TOGETHER

---------------------------------------------------------------------------

1.  INTRODUCTION

The following text outlines our proposal for transferring the results of
analyses performed at the Regional Centers to NSIDC for ingest into the
GLIMS database.  It specifies the formats and conventions for the files
that would convey this data.  Multiple files will be required to transfer
the results of a single analysis.

The GLIMS database is designed to hold different types of information about
glaciers, necessitating different transfer formats.  The identifying and
descriptive information will be conveyed in an ASCII file using
Parameter-Value Language, or PVL.  Velocity vectors will also be in an
ASCII file, but in tab-separated-value (TSV) format.  The data that make up
glacier outlines, centerlines, and tiepoint regions will be in one or more
ESRI shapefiles, with accompanying index and database files.  Point
measurements can be included in the ASCII PVL file or in a shapefile.
Reference information (journal articles, etc.) will be in "tagged reference
format", along the lines of the Endnote text-based format.  Each of these
formats is described in more detail below.

2.  SCALAR INFORMATION

Most of the attributes in the GLIMS database can be represented in simple
"parameter = value" form.  This includes all the attributes in these
tables:

Glacier_Static
Glacier_Dynamic
Tiepoint_Region
Area_Histogram
Area_Histogram_Data
Vector_Set
Image
Instrument
Band
Point_Measurement
Reference_Point
Institution

The proposed format is:
> Optional Label1
table1.field1 = a string
table1.field2 = a number     #this is a comment
table1.field3 = another string
#here is a comment on a separate line
> Optional Label2
table2.field1 = a number
table2.field2 = a string
table2.field3 = another number

Notes:

a. A line beginning with the ">" character begins a record (row) of a
   particular table.  This separates the entries for tables that have more
   than one row per analysis.

b. The "#" character begins comments, unless it is within quotation marks.
   Characters from the "#" to the end of the line are considered to be the
   comment.

c. Spaces on either side of the "=" are optional, and any number of them
   are allowed.

d. A submission will not have to duplicate information that doesn't vary,
   such as details of the institution or the instrument.  Thus, just
   institution_id or instrument_id will be adequate.
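
To make notes a-c concrete, here is a minimal Python sketch of a reader for
this format.  It is only an illustration, not the NSIDC ingest code: the
function names and the choice to keep every value as an uninterpreted
string are assumptions, as is the handling of fields that appear before any
">" line.

```python
def strip_comment(line):
    """Remove a '#' comment unless the '#' sits inside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == '#' and not in_quotes:
            return line[:i]
    return line


def parse_pvl(text):
    """Return a list of records: {'label': str or None, 'fields': dict}."""
    records = []
    current = None
    for raw in text.splitlines():
        line = strip_comment(raw).strip()
        if not line:
            continue                      # blank or comment-only line
        if line.startswith('>'):
            # A '>' line begins a new record (row); the label is optional.
            current = {'label': line[1:].strip() or None, 'fields': {}}
            records.append(current)
            continue
        if current is None:
            # Assumption: fields before any '>' line form an unlabeled record.
            current = {'label': None, 'fields': {}}
            records.append(current)
        key, sep, value = line.partition('=')
        if not sep:
            raise ValueError(f"expected 'parameter = value', got: {raw!r}")
        # Any number of spaces around '=' is allowed, so strip both sides.
        current['fields'][key.strip()] = value.strip()
    return records


sample = '''> Optional Label1
table1.field1 = a string
table1.field2 = 42     #this is a comment
#here is a comment on a separate line
> Optional Label2
table2.field1=7
table2.field2 = "glacier #3"
'''
records = parse_pvl(sample)
```

Values are left as strings here (quotation marks included); converting them
to the column types of each table would happen in a later step.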

3.  VELOCITY VECTORS

For entries into the "Vector" table we propose a tab-separated-value
format.  Since a velocity analysis might consist of tens of thousands of
rows, we decided that a separate file in the more compact TSV format,
rather than PVL, would be more efficient.  The format of this ASCII file
will be:

> Optional Label
velocity data set identifier
x_value1 (tab) y_value1 (tab) delta_x1 (tab) delta_y1 (newline)
x_value2 (tab) y_value2 (tab) delta_x2 (tab) delta_y2 (newline)

Notes:

a. A line beginning with the ">" character begins a set of vectors. The
   next line contains the velocity data set identifier which must match a
   value for the field name "vel_set_id" in the "Vector_Set" table of a PVL
   file for the glacier it refers to.

b. The number of lines in a velocity set will be compared to the value for
   "num_vecs" in the corresponding entry in the PVL file for the
   "Vector_Set" table, if present.  If different, an error message will be
   issued and the RC contacted.
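
The checks in notes a and b could be sketched in Python roughly as follows.
This is an illustration under assumptions, not the actual ingest code: the
function names are made up, the identifier "vs_001" is hypothetical, and
the declared counts are assumed to have been read already from the
"Vector_Set" entries of the PVL file.

```python
def parse_velocity_tsv(text):
    """Return {vel_set_id: [(x, y, dx, dy), ...]} from the TSV layout."""
    sets = {}
    current_id = None
    expect_id = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith('>'):
            expect_id = True      # the next non-blank line is the set identifier
            continue
        if expect_id:
            current_id = line
            sets[current_id] = []
            expect_id = False
            continue
        if current_id is None:
            raise ValueError(f"line {lineno}: vector row before any '>' line")
        parts = line.split('\t')
        if len(parts) != 4:
            raise ValueError(f"line {lineno}: expected 4 tab-separated values")
        sets[current_id].append(tuple(float(p) for p in parts))
    return sets


def check_counts(sets, num_vecs_by_id):
    """Return messages for sets whose row count differs from num_vecs."""
    problems = []
    for set_id, declared in num_vecs_by_id.items():
        actual = len(sets.get(set_id, []))
        if actual != declared:
            problems.append(
                f"{set_id}: num_vecs={declared} but file has {actual} rows")
    return problems


sample = "> Optional Label\nvs_001\n10.0\t20.0\t0.5\t-0.25\n11.0\t21.0\t0.75\t-0.5\n"
vector_sets = parse_velocity_tsv(sample)
messages = check_counts(vector_sets, {'vs_001': 3})   # one mismatch reported
```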

4.  LITERATURE REFERENCES

The "Reference_Document" table will store information about journal
articles and other reference documents.  The database is designed to hold
bibliographic information in two separate tables:  one table holds the
reference data in a tagged format, and the other table holds a dictionary
of all the tags.  The tags that we will use are those in the Endnote export
file format, described at
"http://www.ecst.csuchico.edu/~jacobsd/bib/formats/".  A Regional Center
need only supply references with those tags.  We define an additional
"custom" tag (%4) to associate with each entry a list of glacier IDs.

An example file containing one record for a journal article might look
like:

# Begin journal file excerpt
%0 Journal Article
%A Bishop, M.P.
%A Shroder, J.F.
%A Hickman, B.L.
%A Copland, Luke
%D 1997
%T Scale-dependent analysis of satellite imagery for characterization of
   glacier surfaces in the Karakoram Himalaya
%J Geomorphology
%V 591
%! Scale-dependent analysis of satellite imagery for characterization of
   glacier surfaces in the Karakoram Himalaya
%F Y
%K Batura
%4 G_E090000N25000
%4 G_E090010N25010
%4 G_E090020N25020
%4 G_E090030N25030

%0 Journal Article
%F raup:2000
%A Raup, Bruce H.
%A Kieffer, Hugh H.
%A Hare, Trent M.
%A Kargel, Jeffrey S.
%T Generation of Data Acquisition Requests for the ASTER Satellite
   Instrument for Monitoring a Globally Distributed Target: Glaciers
%J IEEE Transactions On Geoscience and Remote Sensing
%V 38
%N 2
%P 1105--1112
%D 2000
%8 March

# End journal file excerpt

Notes:

a. A reference record can begin simply with the Endnote fields, or can
   include any number of %4 tags to associate glacier IDs with the
   reference.

b. There exist freely available tools for converting between various
   formats, including Endnote and TeX's format BibTeX.
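
As an illustration of how such a file might be read, here is a minimal
Python sketch.  It assumes that records are separated by blank lines, that
a line not starting with "%" continues the previous value (as in the
wrapped titles above), and that lines starting with "#" are comments; none
of these conventions is fixed by this proposal, and the function name is
made up.

```python
def parse_references(text):
    """Return a list of records; each maps a tag such as '%A' to its values."""
    records = []
    current = None     # record being built
    last_tag = None    # tag of the most recent value, for continuation lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current, last_tag = None, None    # a blank line ends the record
            continue
        if line.startswith('#'):
            continue                          # assumption: '#' lines are comments
        if line.startswith('%') and len(line) >= 2:
            tag, value = line[:2], line[2:].strip()
            if current is None:
                current = {}
                records.append(current)
            current.setdefault(tag, []).append(value)
            last_tag = tag
        elif current is not None and last_tag is not None:
            # Continuation of a long value, e.g. a wrapped title.
            current[last_tag][-1] += ' ' + line
        else:
            raise ValueError(f"unexpected line outside a record: {raw!r}")
    return records


sample = """%0 Journal Article
%A Bishop, M.P.
%A Shroder, J.F.
%D 1997
%T Scale-dependent analysis of satellite imagery for characterization of
   glacier surfaces in the Karakoram Himalaya
%J Geomorphology
%4 G_E090000N25000
%4 G_E090010N25010
"""
refs = parse_references(sample)
glacier_ids = refs[0].get('%4', [])   # glacier IDs attached to the first record
```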

5.  GLACIER OUTLINES, OTHER VECTOR INFORMATION, AND TOPOLOGY

The GLIMS database has been designed to store the topology, i.e. the
spatial relationships, between glacier outlines.  These relationships are
difficult to represent outside of a GIS and we have decided that the
transfer format will not preserve topology, but rather the topology will be
reconstructed upon ingest into the database.

The scheme proposed for conveying outlines, centerlines, terminus
positions, ELAs, etc., recognizes the "segment" as a fundamental entity in
the GLIMS database.  Each segment has attributes defining its type and the
material on either side of the segment.

The transfer format chosen, ESRI shapefiles, is a well-documented,
widely-supported format for transferring the geometry and attributes of
spatial features.  Because it is anticipated that many regional centers will
use GIS packages to perform their analyses, shapefiles should be a common
option for outputting the results of those analyses.  In addition, there
are freely available tools for handling shapefiles.  One, called Shapelib
(http://gdal.velocet.ca/projects/shapelib/) is an open-source C library for
creating and manipulating shapefiles.

Whereas the GLIMS database will require that glacier outlines be closed
polygons, these polygons will typically be sent to NSIDC as a set of
segments, using the Shapefile entity type "PolyLineZ".  This is a
convenient way in which attributes can be assigned to different segments of
the polygon.  Glacier outlines that do not have such attribute information
(e.g. data digitized from historical maps) may be sent as complete
polygons, using the Shapefile entity type "PolygonZ".

The tables "Glacier_Outline", "Center_Line", and "Tiepoint_Region_Outline"
all use the "Segment" table which in turn uses the "Segment_Point" table to
store the coordinates of the vertices of the segment.  Data for these
tables would be generated from information in the shapefile.

The shapefiles should use feature type PolyLineZ (called ARCZ in shapelib)
to store segments.  X, Y, and optionally Z coordinates are defined for each
vertex and are written to the shapefile.  A fourth value, the M variable,
can also be stored.  It is being reserved for future use.

Each segment can have a number of attributes associated with it.  Two are
mandatory:  one to specify whether the segment is part of an outline or a
centerline, and one to specify the associated glacier ID or tiepoint region
ID.  Any of the other fields of the "Segment" table, such as
"coord_system", "segment_left_material", etc. should be assigned as
attributes to the segment.  These and other segment attributes are written
to a dBASE (.dbf) file when the shapefile is created.

Shapefile names must follow the 8.3 filename convention (an ESRI
restriction), and the three files (.shp, .shx, .dbf) should share the same
basename (the part of the filename to the left of the '.').  A single
shapefile can hold all the segments associated with one analysis session.

Notes:

a. All glacier outlines, centerlines, and tiepoint regions for a given
   analysis session can be stored in one shapefile.

b. Glacier and tiepoint region outlines (as closed polygons) can be pieced
   together upon ingest by extracting all the segments, in order, that have
   the same outline ID (from the .dbf file).  Entries for the
   "Glacier_Outline" and "Tiepoint_Region_Outline" tables can be easily
   created in this way.

c. A given segment may be in the shapefile more than once if it is shared.
   In this case, the different instances will have different values for
   the outline ID attribute.  At ingest time, NSIDC will find duplicate
   segments and store them in the database only once.

d. Similarly, NSIDC will need to eliminate duplicate points for storage in
   the Segment_Point table.
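
Notes b and c can be illustrated with a short Python sketch.  It works on
segments already read from the shapefile into memory (reading the .shp and
.dbf files themselves is left out), and it rests on assumptions: the
attribute name "outline_id", the IDs "A" and "B", and the rule that a
segment traversed in the opposite direction counts as a duplicate are all
hypothetical.

```python
def assemble_outlines(segments):
    """Group segments by outline ID, in file order, and join them into rings."""
    rings = {}
    for seg in segments:
        ring = rings.setdefault(seg['outline_id'], [])
        pts = list(seg['points'])
        if ring and pts and ring[-1] == pts[0]:
            pts = pts[1:]          # drop the vertex shared with the previous segment
        ring.extend(pts)
    for ring in rings.values():
        if ring and ring[0] != ring[-1]:
            ring.append(ring[0])   # close the polygon explicitly
    return rings


def unique_segments(segments):
    """Return each distinct geometry once; a reversed copy is a duplicate."""
    seen = {}
    for seg in segments:
        pts = tuple(seg['points'])
        key = min(pts, pts[::-1])  # direction-independent key (an assumption)
        seen.setdefault(key, pts)
    return list(seen.values())


# Two adjacent outlines sharing the edge between (1, 0) and (1, 1).
segments = [
    {'outline_id': 'A', 'points': [(0, 0), (1, 0)]},
    {'outline_id': 'A', 'points': [(1, 0), (1, 1)]},
    {'outline_id': 'A', 'points': [(1, 1), (0, 1), (0, 0)]},
    {'outline_id': 'B', 'points': [(1, 1), (1, 0)]},
    {'outline_id': 'B', 'points': [(1, 0), (2, 0), (2, 1), (1, 1)]},
]
rings = assemble_outlines(segments)
stored = unique_segments(segments)   # the shared edge is kept only once
```

Treating a reversed segment as a duplicate would also require swapping its
left and right material attributes at ingest; the sketch ignores attributes
entirely.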

6.  PUTTING IT ALL TOGETHER

An analysis session will generate a number of files.
Parameter-value-language files should have the suffix .pvl.
Tab-separated-value files should be named 'tablename.tsv'.  Bibliographic
files can be called something like "lit_refs.en".

An example submission might consist of six files:

alaska.pvl
alaska.shp
alaska.shx
alaska.dbf
alaska_vectors.tsv
ak_lit_refs.en

These could all be packaged into a gzipped tar file or a zip file.
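
Before packaging, an RC might want to check the file set.  The following
Python sketch shows one way to do that and to build the gzipped tar file
with the standard "tarfile" module.  The function names and the particular
checks (complete .shp/.shx/.dbf trio, shapefile basename of at most eight
characters) are assumptions drawn from the text above, not an agreed
validation procedure.

```python
import os
import tarfile

SHAPEFILE_PARTS = ('.shp', '.shx', '.dbf')


def check_submission(filenames):
    """Return a list of problems found in a proposed set of filenames."""
    by_ext = {}
    for name in filenames:
        base, ext = os.path.splitext(os.path.basename(name))
        by_ext.setdefault(ext.lower(), []).append(base)
    shape_bases = set()
    for ext in SHAPEFILE_PARTS:
        shape_bases.update(by_ext.get(ext, []))
    problems = []
    for base in sorted(shape_bases):
        missing = [e for e in SHAPEFILE_PARTS if base not in by_ext.get(e, [])]
        if missing:
            problems.append(f"{base}: missing {', '.join(missing)}")
        if len(base) > 8:
            problems.append(f"{base}: basename longer than 8 characters")
    return problems


def package(paths, archive_path):
    """Write the given files into a gzipped tar archive."""
    with tarfile.open(archive_path, 'w:gz') as tar:
        for path in paths:
            tar.add(path, arcname=os.path.basename(path))


files = ['alaska.pvl', 'alaska.shp', 'alaska.shx', 'alaska.dbf',
         'alaska_vectors.tsv', 'ak_lit_refs.en']
problems = check_submission(files)    # empty list for this file set
# package(files, 'alaska.tar.gz')     # run where the files actually exist
```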


===========================================================================

Questions/Comments:

- How are analysisIDs created? An analysisID is associated with a
  particular glacier and a particular analysis session done by an RC.  An
  RC will submit information on many glaciers at one time.

  Answer: This should be done at ingest time, simply assigning the next
  sequential number.  RCs will provide the glacier ID and the time of
  analysis.  The data type for the analysisID should be capable of holding
  very large numbers.  Current designation as "int" is probably not
  sufficient.

- We need to finalize the form of, and the algorithm for generating, glacier
  and tiepoint region IDs (both of which RCs must generate).

- This scheme assumes that either 1) all coordinates are in Lon/Lat, or 2)
  all N/E coordinates share the same reference point.  Is this a problem?

- What about images?  Can they be supplied as .jpg, .gif, or .png files, with
  the filename tying each image to either an analysis or a glacier_id?

- How will we represent hierarchy or heritage (this glacier was one part of
  glacier X) in the transfer?

  Answer:  In the Glacier_Static table, there should be a field called
  something like "parent_icemass_id" to point to that glacier X.  We've
  talked about having such a pointer, but I don't think we've implemented
  it yet.  If implemented this way, then the info will simply be in the PVL
  file in the Glacier_Static section.

SELECTED GLOSSARY

Analysis
  one snapshot of one glacier

Analysis session
  a set of analyses from one region and one time, the results of which are
  generally submitted as one unit


-- 
Bruce Raup
National Snow and Ice Data Center                     Phone:  303-492-8814
University of Colorado, 449 UCB                       Fax:    303-492-2468
Boulder, CO  80309-0449                                    braup at nsidc.org






