Data Compilation

Sort/Merge Routines

All original data, except those corresponding to ID=1 and ID=5 (Asian data through 1987 and all data from 1988 onward), were received already sorted by decreasing pressure. The data for ID=1 and ID=5 were not sorted by decreasing pressure and therefore required special processing. Each original sounding was sorted into five categories:

  1. temperature, dewpoint depression, geopotential height and winds at mandatory pressure levels;
  2. temperature, dewpoint depression and winds at variable pressure levels;
  3. winds at variable pressure levels;
  4. winds at variable geopotential heights (this category was generally reported infrequently); and
  5. tropopause data.

Pressure levels are often repeated in this format. For example, while the first level in categories 2, 3, and 4 is always the surface, each category contains different information. At other times, what was reported as a mandatory level was also repeated as a significant level. Typically, two to four repeat levels were found for each sounding. To make these data compatible with the soundings from other sources, each sounding was passed through a sort/merge program.

The sort/merge program worked as follows: Each sounding was first sorted by decreasing pressure, then by increasing geopotential height. Repeat levels were flagged, based on identical values of pressure or geopotential height. The most complete record possible for each repeat level was then assembled. The first occurrence of any repeat value and its associated quality flag was retained (see Appendix 4). A drawback of the technique is that although two levels may be identified as repeats, the temperature or wind values associated with them are sometimes slightly different. Retaining only the first occurrence will occasionally result in retaining a bad value. While in theory this problem could have been eliminated by also inspecting each quality flag, the large volume of processing made this impractical. Spot checks indicate that this is only a minor problem. An artifact of the technique, however, is that since some of the quality flags indicate whether or not there is agreement between values at repeat levels (but do not indicate which level is correct), any retained flags relating to between-category checks should be ignored (see Appendix 4).
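The sort/merge steps described above can be sketched roughly as follows. This is an illustrative Python sketch, not the original program. The record layout (dictionaries with keys p, z, t and wspd), the use of None as a missing-value marker, and the function name sort_merge are all assumptions made for the example. Quality flags are not modeled, and a level is compared only with the previously kept level.

```python
def sort_merge(levels):
    """Sort levels and merge repeats, keeping the first non-missing value per field.

    Each level is a dict; None marks a missing value (an assumption of this sketch).
    """
    def sort_key(level):
        p, z = level.get("p"), level.get("z")
        # Decreasing pressure first, then increasing geopotential height.
        # Levels lacking a pressure (or height) sort after those that have one.
        return (p is None, -(p or 0.0), z is None, z or 0.0)

    merged = []
    for level in sorted(levels, key=sort_key):
        prev = merged[-1] if merged else None
        p, z = level.get("p"), level.get("z")
        is_repeat = prev is not None and (
            (p is not None and p == prev.get("p"))
            or (z is not None and z == prev.get("z"))
        )
        if is_repeat:
            # Retain the first occurrence of each value; only fill in gaps.
            for field, value in level.items():
                if prev.get(field) is None:
                    prev[field] = value
        else:
            merged.append(dict(level))
    return merged


sounding = [
    {"p": 850.0, "z": 1500.0, "t": -5.0, "wspd": None},
    {"p": 1000.0, "z": 100.0, "t": 2.0, "wspd": 5.0},
    {"p": 850.0, "z": None, "t": -5.2, "wspd": 12.0},
]
merged_sounding = sort_merge(sounding)
```

In this example the two 850-mb levels are merged into one record. The first temperature (-5.0) is kept even though the repeat level reports -5.2, which illustrates the drawback noted above.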


Data through 1991 were subjected to additional rudimentary error-checking routines. The design of the error-checking procedure used through 1991 was guided by three considerations:

  1. A desire to produce the highest-quality database possible;
  2. Recognition that the large amount of raw data (10^6 soundings) prohibits exhaustive error-checking procedures, such as manually inspecting each sounding;
  3. A desire to retain all data values present in the original soundings, thus allowing users to make their own determination of data quality.

The first step involved flagging obvious errors, such as negative wind speeds, negative geopotential heights, etc. All soundings were then passed through a seasonally adjustable limits check to screen additional errors. No guarantee is made that this routine identifies all errors, and it is recognized that, on occasion, values flagged as errors may in fact be valid. As such, all obvious and probable errors are simply flagged, and the original data are retained.

The seasonally adjustable limits check worked as follows. All data from all stations for 1987 were stratified by season into 15 atmospheric layers bounded by pressure levels. Seasons are defined as December through February (winter), March through May (spring), June through August (summer) and September through November (autumn). Since all checks are based on pressure, no check of the pressure values themselves was performed.
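The stratification by season and layer might be expressed as in the short Python sketch below. The season definitions follow the text. The function names and the handling of values that fall exactly on a layer boundary are assumptions, and the boundary pressures in the example are placeholders, not the 15 layers actually used.

```python
def season_of(month):
    """Map a calendar month (1-12) to the season used in the limits check."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    if month in (12, 1, 2):
        return "winter"
    if month <= 5:
        return "spring"
    if month <= 8:
        return "summer"
    return "autumn"


def layer_index(pressure, bounds):
    """Return the index of the first layer containing `pressure`, or None.

    `bounds` lists layer boundary pressures in decreasing order, so layer i
    spans bounds[i] (base) to bounds[i + 1] (top). A pressure exactly on a
    shared boundary is assigned to the lower layer (an assumption).
    """
    for i in range(len(bounds) - 1):
        if bounds[i] >= pressure >= bounds[i + 1]:
            return i
    return None


# Placeholder boundaries for illustration only (not the archive's layers).
example_bounds = [1000.0, 850.0, 700.0]
example_layer = layer_index(900.0, example_bounds)
```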

Initial frequency histograms of geopotential height, temperature, wind direction and wind speed were compiled for each layer and for each season in 1987. Extreme outliers were eliminated by discarding values more than four standard deviations from the respective means. Since wind speed does not follow a normal distribution, these values were first converted into log wind speed. Means and standard deviations for the remaining data were then recomputed. Taking these as representative of the means and standard deviations of the complete data set, we then flagged any value lying more than four standard deviations above or below the sample mean for the respective season and layer. The means and standard deviations used in the limits check are given in Appendix 3.
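A minimal Python sketch of this two-pass procedure is given below, for one variable in one season and layer. It is not the original code. The function names are hypothetical, the population standard deviation is an assumption (the text does not say which estimator was used), and the log option assumes strictly positive values, since the text does not describe how calm winds were treated.

```python
import math
import statistics


def trimmed_limits(values, n_sd=4.0, use_log=False):
    """Return (mean, sd) recomputed after discarding extreme outliers.

    Values more than n_sd standard deviations from the initial mean are
    dropped before the mean and standard deviation are recomputed.
    """
    data = [math.log(v) for v in values] if use_log else list(values)
    mean0 = statistics.fmean(data)
    sd0 = statistics.pstdev(data)
    kept = [x for x in data if abs(x - mean0) <= n_sd * sd0]
    return statistics.fmean(kept), statistics.pstdev(kept)


def is_flagged(value, mean, sd, n_sd=4.0, use_log=False):
    """True if value lies more than n_sd standard deviations from the mean."""
    x = math.log(value) if use_log else value
    return abs(x - mean) > n_sd * sd


# Synthetic values for illustration: 30 plausible readings and one gross error.
sample = [10.0] * 15 + [12.0] * 15 + [1000.0]
sample_mean, sample_sd = trimmed_limits(sample)
```

For wind speed, both functions would be called with use_log=True so that the limits are computed and applied in log space.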

An implicit assumption in the error-checking routine is that the layer mean is representative of the mean for any level within that layer. Since a large number of atmospheric layers are used in the check, this is a tolerable assumption for testing temperature, wind direction and wind speed. It was found to be inappropriate for geopotential height, however, because pressure decays logarithmically with increasing elevation.

To check the geopotential heights, we made use of this logarithmic relationship: Taking the log of pressure at the bottom (P1) and top (P2) of the layer in which the observed pressure (P) fell, a weight W was calculated:

W = [LOG(P)-LOG(P2)]/[LOG(P1)-LOG(P2)]. (1)

Next, ZL1 and ZH1 are defined, respectively, as the lowest and highest allowable geopotential heights at the base of the layer, and similarly, ZL2 and ZH2 as the lowest and highest allowable geopotential heights at the top of the layer (with limits taken as +/- 4 standard deviations from the mean). We then calculated ZL and ZH, which, by incorporating the weight W, define the limits of allowable geopotential height for the value of P:

ZL = W*[ZL1-ZL2] + ZL2 (2)

ZH = W*[ZH1-ZH2] + ZH2 (3)

The allowable value of Z at pressure level P in the sounding is thus

ZL <= Z <= ZH. (4)
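The geopotential height check can be sketched in Python as shown below. The sketch assumes that the limits are interpolated linearly in log pressure between the base of the layer (ZL1, ZH1 at P1) and the top of the layer (ZL2, ZH2 at P2), so that W = 1 at the base and W = 0 at the top. The function names are hypothetical. Natural logarithms are used here; because W is a ratio of log differences, the base of the logarithm does not affect the result.

```python
import math


def height_limits(p, p1, p2, zl1, zh1, zl2, zh2):
    """Return (ZL, ZH), the allowable height limits at pressure p.

    p1, p2   : pressure at the base and top of the layer (p1 > p2)
    zl1, zh1 : lowest and highest allowable height at the base
    zl2, zh2 : lowest and highest allowable height at the top
    """
    w = (math.log(p) - math.log(p2)) / (math.log(p1) - math.log(p2))
    zl = w * (zl1 - zl2) + zl2
    zh = w * (zh1 - zh2) + zh2
    return zl, zh


def height_is_allowed(z, p, p1, p2, zl1, zh1, zl2, zh2):
    """True if ZL <= z <= ZH at pressure p."""
    zl, zh = height_limits(p, p1, p2, zl1, zh1, zl2, zh2)
    return zl <= z <= zh


# Illustrative numbers only, not values from Appendix 3.
example_limits = height_limits(500.0, 1000.0, 250.0, 0.0, 400.0, 9000.0, 11000.0)
```

With these illustrative numbers, 500 mb lies halfway between 1000 mb and 250 mb in log pressure, so the limits fall midway between those at the base and those at the top of the layer.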

As part of the error-checking routine, the 1000-mb mandatory level was deleted if the corresponding geopotential height was less than the reported station elevation. Frequently, situations were encountered in which a level was reported (i.e., pressure or geopotential height data were given), but all other variables had missing value codes. These levels were deleted.
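The two deletion rules above could be written along the following lines. This Python sketch is illustrative only: the dictionary record layout, the use of None as the missing-value code, and the function name are assumptions, not details of the original processing.

```python
def drop_unusable_levels(levels, station_elevation):
    """Remove levels that the error-checking routine would have deleted.

    A level is dropped if it is the 1000-mb level with a geopotential height
    below the station elevation, or if every variable other than pressure and
    geopotential height is missing (None in this sketch).
    """
    kept = []
    for level in levels:
        p, z = level.get("p"), level.get("z")
        if p == 1000.0 and z is not None and z < station_elevation:
            continue
        others = [v for k, v in level.items() if k not in ("p", "z")]
        if all(v is None for v in others):
            continue
        kept.append(level)
    return kept


example_levels = [
    {"p": 1000.0, "z": 20.0, "t": 1.0, "wspd": 3.0},    # below station: dropped
    {"p": 925.0, "z": 700.0, "t": None, "wspd": None},  # nothing reported: dropped
    {"p": 850.0, "z": 1400.0, "t": -4.0, "wspd": None},
]
usable_levels = drop_unusable_levels(example_levels, station_elevation=50.0)
```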

Since atmospheric moisture can be highly variable, with considerable uncertainties in cold arctic conditions, the dewpoint depression data were simply screened to flag negative values and any values exceeding an arbitrary high threshold of 35 degrees C. Those wishing to use these data are referred to Elliott and Gaffen (1991), who provide an overview of problems in rawinsonde moisture data.
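The dewpoint depression screen amounts to a simple range test, sketched below in Python. The function name and the treatment of missing values (None, never flagged) are assumptions of this sketch; the 35 degree C threshold is the one given in the text.

```python
def dewpoint_depression_flagged(dpd, upper=35.0):
    """True if a dewpoint depression (degrees C) is negative or exceeds `upper`."""
    if dpd is None:
        return False  # missing values are not flagged in this sketch
    return dpd < 0.0 or dpd > upper


example_flags = [dewpoint_depression_flagged(v) for v in (-1.0, 0.0, 12.5, 35.0, 40.0)]
```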

*Experience has shown that these error checks are of limited value to the user. They have hence been dropped for 1992 onward. FORTRAN codes providing more comprehensive error checks are available from M. Serreze.