Proposal for RC --> NSIDC data transfer methodology: Rev. 1.11
Luke Copland
luke.copland at ualberta.ca
Mon Oct 1 15:17:10 MDT 2001
Hi Bruce and others,
I just had a read through the data transfer proposal (below), and it
ties in very closely with what we've been thinking and working on here
recently. Polyline-based shapefiles seem to be the best way to transfer
the data, and provide a great way to store associated attribute
information. We just have a couple of questions/comments:
1. Is it worthwhile being able to transfer images in geotiff (*.tif)
format? These images are georeferenced, so it would be easier to relate
them back to the shapefiles than with non-georeferenced jpg/gif imagery.
Would it be feasible to store the files as .tifs, but convert them to
jpgs/gifs when they are requested for viewing in a web browser?
2. Are you still planning on differentiating between areas of a glacier
boundary which are delineated by rock versus those defined by an ice
divide? If so, what are the terms that should be used in the attribute
table?
3. Can elevation datasets be stored in the database if they are
available? If so, then in what format? I'm guessing that TSV files could
work, or perhaps geotiffs (which is the format we mainly use here to
transfer elevation information).
As an aside, we're still working on the difficult question of defining
exactly what a glacier is, and whether to consider tributaries as
separate entities. We've found the Basin utility in ArcView incredibly
useful for defining glacier drainage basins, and the location of ice
divides on large ice caps. We're tending towards the idea of working
upwards from a glacier terminus, and counting all tributaries that feed
ice and water into the terminus as part of one single glacier. I'm
currently working on defining and cataloguing the glaciers on Manson Ice
Cap, Ellesmere Island, as a way to work through potential problems and
to demonstrate our methods to others. Once done, I'll send the
information on to NSIDC so that it can be input to the database and any
issues can be identified.
Cheers,
Luke.
--
Dr. Luke Copland
Department of Earth & Atmospheric Sciences
University of Alberta
Edmonton, Alberta T6G 2E3, Canada
Tel: 780 707 5583, Fax: 780 492 7598
luke.copland at ualberta.ca, http://arctic.eas.ualberta.ca/luke
Manager, GLIMS regional centre for the Canadian High Arctic
Bruce Raup wrote:
>
> Hello all,
>
> We need your feedback. Below is a proposed scheme for transferring data
> from Regional Centers to NSIDC. We would appreciate any comments or
> questions on it, sent either to me directly, or to this list. The latter
> is preferred, though, as it tends to stimulate further illuminating
> discussion.
>
> Thanks in advance,
> Bruce
>
> $Revision: 1.11 $
> $Id: data_transfer_specification.txt,v 1.11 2001-09-27 10:00:22-06 braup Exp braup $
>
> Title: A Methodology for Transferring GLIMS Analysis Products from
> Regional Centers to NSIDC.
>
> Prepared by: Bruce Raup and Siri Jodha Singh Khalsa,
> National Snow and Ice Data Center
>
> CONTENTS
>
> 1. INTRODUCTION
> 2. SCALAR INFORMATION
> 3. VELOCITY VECTORS
> 4. LITERATURE REFERENCES
> 5. GLACIER OUTLINES, OTHER VECTOR INFORMATION, AND TOPOLOGY
> 6. PUTTING IT ALL TOGETHER
>
> ---------------------------------------------------------------------------
>
> 1. INTRODUCTION
>
> The following text outlines our proposal for transferring the results of
> analyses performed at the Regional Centers to NSIDC for ingest into the
> GLIMS data base. It specifies the formats and conventions for the files
> that would convey this data. Multiple files will be required to transfer
> the results of a single analysis.
>
> The GLIMS database is designed to hold different types of information about
> glaciers, necessitating different transfer formats. The identifying and
> descriptive information will be conveyed in an ASCII file using
> Parameter-Value Language, or PVL. Velocity vectors will also be in an
> ASCII file, but in tab-separated-value (TSV) format. The data that make up
> glacier outlines, centerlines, and tiepoint regions will be in one or more
> ESRI shapefiles, with accompanying index and database files. Point
> measurements can be included in the ASCII PVL file or in a shapefile.
> Reference information (journal articles, etc.) will be in "tagged reference
> format", along the lines of the Endnote text-based format. Each of these
> formats is described in more detail below.
>
> 2. SCALAR INFORMATION
>
> Most of the attributes in the GLIMS database can be represented in simple
> "parameter = value" form. This includes all the attributes in these
> tables:
>
> Glacier_Static
> Glacier_Dynamic
> Tiepoint_Region
> Area_Histogram
> Area_Histogram_Data
> Vector_Set
> Image
> Instrument
> Band
> Point_Measurement
> Reference_Point
> Institution
>
> The proposed format is:
> > Optional Label1
> table1.field1 = a string
> table1.field2 = a number #this is a comment
> table1.field3 = another string
> #here is a comment on a separate line
> > Optional Label2
> table2.field1 = a number
> table2.field2 = a string
> table2.field3 = another number
>
> Notes:
>
> a. A line beginning with the ">" character begins a record (row) of a
> particular table. This separates the entries for tables that have more
> than one row per analysis.
>
> b. The "#" character begins comments, unless it is within quotation marks.
> Characters from the "#" to the end of the line are considered to be the
> comment.
>
> c. Spaces on either side of the "=" are optional, and any number of them
> are allowed.
>
> d. A submission will not have to duplicate information that doesn't vary,
> such as details of the institution or the instrument. Thus, just
> institution_id or instrument_id will be adequate.
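>
> As an illustration only (not part of this specification), the following
> Python sketch shows one way an ingest script might read such a file.
> The function names are hypothetical:
>
>   def strip_comment(line):
>       # "#" begins a comment unless it appears inside double quotes (note b)
>       in_quotes = False
>       for i, ch in enumerate(line):
>           if ch == '"':
>               in_quotes = not in_quotes
>           elif ch == '#' and not in_quotes:
>               return line[:i]
>       return line
>
>   def parse_pvl(path):
>       # Returns a list of (label, {"table.field": "value"}) records.
>       records, current = [], None
>       for raw in open(path):
>           line = strip_comment(raw).strip()
>           if not line:
>               continue
>           if line.startswith('>'):             # ">" begins a record (note a)
>               current = (line[1:].strip(), {})
>               records.append(current)
>               continue
>           key, _, value = line.partition('=')  # spaces around "=" optional (note c)
>           if current is None:
>               current = ('', {})
>               records.append(current)
>           current[1][key.strip()] = value.strip()
>       return records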
>
> 3. VELOCITY VECTORS
>
> For entries into the "Vector" table we propose a tab-separated-value
> format. Since a velocity analysis might consist of tens of thousands of
> rows, we decided that a separate file using the more compact TSV format, as
> opposed to PVL, would be more efficient. The format of this ASCII file will be
>
> > Optional Label
> velocity data set identifier
> x_value1 (tab) y_value1 (tab) delta_x1 (tab) delta_y1 (newline)
> x_value2 (tab) y_value2 (tab) delta_x2 (tab) delta_y2 (newline)
>
> Notes:
>
> a. A line beginning with the ">" character begins a set of vectors. The
> next line contains the velocity data set identifier, which must match a
> value of the field "vel_set_id" in the "Vector_Set" table of a PVL
> file for the glacier it refers to.
>
> b. The number of lines in a velocity set will be compared to the value for
> "num_vecs" in the corresponding entry in the PVL file for the
> "Vector_Set" table, if present. If different, an error message will be
> issued and the RC contacted.
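>
> As an illustration only, a minimal Python sketch of reading such a file
> into memory (names are hypothetical):
>
>   def read_velocity_sets(path):
>       # Returns {vel_set_id: [(x, y, dx, dy), ...]}.
>       sets, vel_set_id, expect_id = {}, None, False
>       for raw in open(path):
>           line = raw.rstrip('\n')
>           if not line.strip():
>               continue
>           if line.lstrip().startswith('>'):    # ">" begins a set of vectors (note a)
>               expect_id = True
>               continue
>           if expect_id:                        # the next line holds the identifier
>               vel_set_id, expect_id = line.strip(), False
>               sets[vel_set_id] = []
>               continue
>           x, y, dx, dy = (float(v) for v in line.split('\t')[:4])
>           sets[vel_set_id].append((x, y, dx, dy))
>       return sets
>
> The length of each list can then be compared against "num_vecs" from the
> corresponding "Vector_Set" entry in the PVL file, as described in note b.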
>
> 4. LITERATURE REFERENCES
>
> The "Reference_Document" table will store information about journal
> articles and other reference documents. The database is designed to hold
> bibliographic information in two separate tables: one table holds the
> reference data in a tagged format, and the other table holds a dictionary
> of all the tags. The tags that we will use are those in the Endnote export
> file format, described at
> "http://www.ecst.csuchico.edu/~jacobsd/bib/formats/". A Regional Center
> need only supply references with those tags. We define an additional
> "custom" tag (%4) to associate with each entry a list of glacier IDs.
>
> An example file containing one record for a journal article might look
> like:
>
> # Begin journal file excerpt
> %0 Journal Article
> %A Bishop, M.P.
> %A Shroder, J.F.
> %A Hickman, B.L.
> %A Copland, Luke
> %D 1997
> %T Scale-dependent analysis of satellite imagery for characterization of
> glacier surfaces in
> the Karakoram Himalaya
> %J Geomorphology
> %V 591
> %! Scale-dependent analysis of satellite imagery for characterization of
> glacier surfaces in the Karakoram Himalaya
> %F Y
> %K Batura
> %4 G_E090000N25000
> %4 G_E090010N25010
> %4 G_E090020N25020
> %4 G_E090030N25030
>
> %0 Journal Article
> %F raup:2000
> %A Raup, Bruce H.
> %A Kieffer, Hugh H.
> %A Hare, Trent M.
> %A Kargel, Jeffrey S.
> %T Generation of Data Acquisition Requests for the ASTER Satellite
> Instrument for Monitoring a Globally Distributed Target: Glaciers
> %J IEEE Transactions On Geoscience and Remote Sensing
> %V 38
> %N 2
> %P 1105--1112
> %D 2000
> %8 March
>
> # End journal file excerpt
>
> Notes:
>
> a. A reference record can begin simply with the Endnote fields, or can
> include any number of %4 tags to associate glacier IDs with the
> reference.
>
> b. There exist freely available tools for converting between various
> formats, including Endnote and TeX's format BibTeX.
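>
> As an illustration only, a minimal Python sketch of splitting such a file
> into records and collecting the glacier IDs (names are hypothetical):
>
>   def parse_tagged_refs(path):
>       # A blank line ends a record; repeated tags such as %A (author) and
>       # %4 (glacier ID) collect into lists.
>       records, current, last_tag = [], {}, None
>       for raw in open(path):
>           line = raw.rstrip('\n')
>           if line.startswith('#'):             # comment lines as in the excerpt
>               continue
>           if not line.strip():                 # blank line ends the current record
>               if current:
>                   records.append(current)
>               current, last_tag = {}, None
>               continue
>           if line.startswith('%'):
>               last_tag, _, value = line.partition(' ')
>               current.setdefault(last_tag, []).append(value.strip())
>           elif last_tag:                       # continuation of a wrapped value
>               current[last_tag][-1] += ' ' + line.strip()
>       if current:
>           records.append(current)
>       return records
>
>   # e.g. glacier IDs cited by the first reference:
>   # parse_tagged_refs('ak_lit_refs.en')[0].get('%4', [])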
>
> 5. GLACIER OUTLINES, OTHER VECTOR INFORMATION, AND TOPOLOGY
>
> The GLIMS database has been designed to store the topology, i.e. the
> spatial relationships, between glacier outlines. These relationships are
> difficult to represent outside of a GIS and we have decided that the
> transfer format will not preserve topology, but rather the topology will be
> reconstructed upon ingest into the database.
>
> The scheme proposed for conveying outlines, centerlines, terminus
> positions, ELAs, etc., recognizes the "segment" as a fundamental entity in
> the GLIMS database. Each segment has attributes defining its type and the
> material on either side of the segment.
>
> The transfer format chosen, ESRI shapefiles, is a well-documented,
> widely-supported format for transferring the geometry and attributes of
> spatial features. Because it is anticipated that many Regional Centers will
> use GIS packages to perform their analyses, shapefiles should be a common
> option for outputting the results of those analyses. In addition, there
> are freely available tools for handling shapefiles. One, called Shapelib
> (http://gdal.velocet.ca/projects/shapelib/) is an open-source C library for
> creating and manipulating shapefiles.
>
> Whereas the GLIMS database will require that glacier outlines be closed
> polygons, these polygons will typically be sent to NSIDC as a set of
> segments, using the Shapefile entity type "PolyLineZ". This is a
> convenient way in which attributes can be assigned to different segments of
> the polygon. Glacier outlines that do not have such attribute information
> (e.g. data digitized from historical maps) may be sent as complete
> polygons, using the Shapefile entity type "PolygonZ".
>
> The tables "Glacier_Outline", "Center_Line", and "Tiepoint_Region_Outline"
> all use the "Segment" table which in turn uses the "Segment_Point" table to
> store the coordinates of the vertices of the segment. Data for these
> tables would be generated from information in the shapefile.
>
> The shapefiles should use feature type PolyLineZ (called ARCZ in shapelib)
> to store segments. X, Y, and optionally Z coordinates are defined for each
> vertex and are written to the shapefile. Also, a fourth measurement, the M
> variable, can be stored. This is being reserved for future use.
>
> Each segment can have a number of attributes associated with it. Two are
> mandatory: one to specify whether the segment is part of an outline or a
> centerline, and one to specify the associated glacier ID or tiepoint region
> ID. Any of the other fields of the "Segment" table, such as
> "coord_system", "segment_left_material", etc. should be assigned as
> attributes to the segment. These and other segment attributes are written
> to a dBASE (.dbf) file when the shapefile is created.
>
> Shapefiles must be in the 8.3 format (an ESRI restriction), and the three
> files (.shp, .shx, .dbf) should have the same basename prefix (the part of
> the filename left of the '.'). This one shapefile can hold all the
> segments associated with one analysis session.
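>
> As an illustration only, a sketch of writing one such segment. It assumes
> the third-party Python package "pyshp" (any GIS, or the Shapelib C library
> mentioned above, would do equally well), and the attribute field names
> shown are hypothetical, not fixed by this proposal:
>
>   import shapefile                        # the third-party "pyshp" package
>
>   w = shapefile.Writer('outlines', shapeType=shapefile.POLYLINEZ)
>   w.field('LINE_TYPE', 'C', size=16)      # outline vs. centerline (mandatory)
>   w.field('GLAC_ID', 'C', size=24)        # glacier/tiepoint region ID (mandatory)
>   w.field('LEFT_MAT', 'C', size=16)       # e.g. segment_left_material (optional)
>
>   vertices = [(90.001, 25.000, 5210.0),   # (x, y, z) for each vertex
>               (90.004, 25.002, 5195.0),
>               (90.008, 25.005, 5170.0)]
>   w.linez([vertices])                     # one PolyLineZ shape with a single part
>   w.record('outline', 'G_E090000N25000', 'rock')
>   w.close()                               # writes outlines.shp, .shx, and .dbf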
>
> Notes:
>
> a. All glacier outlines, centerlines, and tiepoint regions for a given
> analysis session can be stored in one shapefile.
>
> b. Glacier and tiepoint region outlines (as closed polygons) can be pieced
> together upon ingest by extracting all the segments, in order, that have
> the same outline ID (from the .dbf file). Entries for the
> "Glacier_Outline" and "Tiepoint_Region_Outline" tables can be easily
> created in this way (a sketch of this reconstruction follows these notes).
>
> c. A given segment may be in the shapefile more than once if it is shared.
> In this case, the different instances will have different values for
> the outline ID attribute. At ingest time, NSIDC will find duplicate
> segments and store them in the database only once.
>
> d. Similarly, NSIDC will need to eliminate duplicate points for storage in
> the Segment_Point table.
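>
> As an illustration of note b only, a Python sketch of reassembling closed
> outlines from segments (the "order" value is hypothetical; it could come
> from a .dbf attribute or simply from record order in the shapefile):
>
>   def assemble_outlines(segments):
>       # segments: list of (outline_id, order, [(x, y, z), ...]) as read from
>       # the .shp/.dbf pair.  Returns {outline_id: closed vertex list}.
>       outlines = {}
>       for outline_id, order, verts in sorted(segments, key=lambda s: (s[0], s[1])):
>           pieces = outlines.setdefault(outline_id, [])
>           for v in verts:
>               if not pieces or pieces[-1] != v:   # drop duplicated joint vertices
>                   pieces.append(v)
>       for pieces in outlines.values():
>           if pieces and pieces[0] != pieces[-1]:  # close the polygon
>               pieces.append(pieces[0])
>       return outlines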
>
> 6. PUTTING IT ALL TOGETHER
>
> An analysis session will generate a number of files.
> Parameter-value-language files should have the suffix .pvl.
> Tab-separated-value files should be named 'tablename.tsv'. Bibliographic
> files can be called something like "lit_refs.en".
>
> An example submission might consist of six files:
>
> alaska.pvl
> alaska.shp
> alaska.shx
> alaska.dbf
> alaska_vectors.tsv
> ak_lit_refs.en
>
> These could all be packaged into a gzipped tar file or a zip file.
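>
> For example, using only the Python standard library (the archive name is
> arbitrary):
>
>   import tarfile
>
>   files = ['alaska.pvl', 'alaska.shp', 'alaska.shx', 'alaska.dbf',
>            'alaska_vectors.tsv', 'ak_lit_refs.en']
>   with tarfile.open('alaska_submission.tar.gz', 'w:gz') as tar:
>       for name in files:
>           tar.add(name)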
>
> ===========================================================================
>
> Questions/Comments:
>
> - How are analysisIDs created? An analysisID is associated with a
> particular glacier and a particular analysis session done by an RC. An
> RC will submit information on many glaciers at one time.
>
> Answer: This should be done at ingest time, simply assigning the next
> sequential number. RCs will provide the glacier ID and the time of
> analysis. The data type for the analysisID should be capable of holding
> very large numbers. Current designation as "int" is probably not
> sufficient.
>
> - We need to finalize the form of, and the algorithm for generating,
>   glacier and tiepoint region IDs (both of which RCs must generate).
>
> - This scheme assumes that either 1) all coordinates are in Lon/Lat, or 2)
> all N/E coordinates share the same reference point. Is this a problem?
>
> - What about images? Could they be supplied in .jpg, .gif, or .png format,
>   with the filename tying them to either an analysis or a glacier_id?
>
> - How will we represent hierarchy or heritage (this glacier was one part of
> glacier X) in the transfer?
>
> Answer: In the Glacier_Static table, there should be a field called
> something like "parent_icemass_id" to point to that glacier X. We've
> talked about having such a pointer, but I don't think we've implemented
> it yet. If implemented this way, then the info will simply be in the PVL
> file in the Glacier_Static section.
>
> SELECTED GLOSSARY
>
> Analysis
> one snapshot of one glacier
>
> Analysis session
> a set of analyses from one region and one time, the results of which are
> generally submitted as one unit
>
> --
> Bruce Raup
> National Snow and Ice Data Center Phone: 303-492-8814
> University of Colorado, 449 UCB Fax: 303-492-2468
> Boulder, CO 80309-0449 braup at nsidc.org