This post will address some issues I’ve known about for a while, but never got around to discussing formally or systematically. The issues are

- how the chosen base period of a spatially-averaged temperature time series affects the trend
- how the spatial coverage, which is a function of data coverage and base period choice, affects the trend
- how adjusting the base period of one series of spatially-averaged anomalies to another base period affects its trend

These issues are very relevant because many blogs compare multiple global temperature time series that use different base periods, so all but one of the series have to be adjusted. Adjusting a time series of anomalies moves all of the anomalies for a specific calendar month up or down by a constant amount. Although the shift is month-specific, the difference between the original and the adjusted series is just a repeating annual cycle, which has essentially no effect on the resulting trend.

For a single time series representing a single station or point on the Earth, adjusting the series incurs no error. However, when dealing with spatially-averaged anomalies, the logic behind adjusting the base period is erroneous. Consider a time series of spatially-averaged temperature anomalies relative to 1961-1990. We want to compare it to another time series of spatially-averaged anomalies that are relative to 1981-2010. To adjust the first series to the base period of the second series, we calculate the first series’ mean January anomaly from 1981-2010, the mean February anomaly from 1981-2010, etc. Then this long-term monthly mean series is subtracted from the first series. Now both series average to zero over 1981-2010. The problem is that there is an implicit assumption that the January anomalies from 1981-2010 all represent the same spatial extent. Furthermore, when subtracting this mean January anomaly from all the January anomalies, that same assumption is being made for each January anomaly. This means that the January 1880 anomaly represents the same spatial extent as the January 1956 or January 1987 anomaly. This is obviously wrong. But what difference does it make?
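The adjustment described above can be sketched in a few lines. This is a minimal illustration, not the code used for this post: the function name `rebaseline`, its arguments, and the synthetic series are all assumptions made for the example, and it assumes a monthly series that starts in January and covers whole years.

```python
import numpy as np

def rebaseline(anoms, first_year, base_start, base_end):
    """Shift monthly anomalies so each calendar month averages zero
    over base_start..base_end (hypothetical helper).

    anoms: 1-D monthly values, January of first_year first, whole years only;
           NaN marks a missing month.
    """
    by_year = np.asarray(anoms, dtype=float).reshape(-1, 12)  # rows = years, cols = months
    i0 = base_start - first_year
    i1 = base_end - first_year + 1
    monthly_mean = np.nanmean(by_year[i0:i1], axis=0)         # one offset per calendar month
    return (by_year - monthly_mean).ravel()

# Synthetic series for 1961-2010: a steady warming plus noise (not real data)
rng = np.random.default_rng(0)
n_years = 50
t = np.arange(n_years * 12) / 12.0
series = 0.02 * t + rng.normal(0.0, 0.1, t.size)

adjusted = rebaseline(series, 1961, 1981, 2010)

# The shift applied to each January (February, ...) is the same in every year
offsets = (series - adjusted).reshape(-1, 12)
base_means = adjusted.reshape(-1, 12)[20:50].mean(axis=0)     # 1981-2010 rows
```

Here `offsets` holds the same twelve numbers in every row, which is the "repeating annual cycle" mentioned earlier, and `base_means` is zero for every calendar month. Nothing in this procedure knows how many grid cells went into each monthly value.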

To investigate, I’ll use GHCN v2, CRUTEM 3 and GISTEMP’s (1,200 km) gridded data. I interpolated the GISTEMP data onto a fixed-offset 5° x 5° grid to match GHCN and CRU’s spatial resolution. I calculated anomalies at the grid cell level relative to nine base periods, from 1901-1930 through 1981-2010 in steps of ten years, and required at least 20 years of data to be present to calculate a valid long-term mean. To properly calculate the spatial averages of these land-only data, I used a land mask to adjust the grid cell weights to avoid overweighting grid cells with both land and ocean present. However, for the GISTEMP data, I also calculated an additional average without adjusting for the grid cell land fraction. This is done because GISTEMP’s land-only data extends far beyond land because of the 1,200 km smoothing algorithm and I don’t want that data masked out. The figure below shows the GISTEMP temperature anomaly for December 2010 to illustrate how far beyond land the data extends.

The presence of a few islands within 1,200 km of a grid cell center can fill in a lot of ocean area. The global averages of the three data sets I’m using are calculated differently by their respective creators. From IPCC’s AR4 WGI report,

The global average for CRUTEM3 is a land-area weighted sum (0.68 × NH + 0.32 × SH). For NCDC it is an area-weighted average of the grid-box anomalies where available worldwide. For GISS it is the average of the anomalies for the zones 90°N to 23.6°N, 23.6°N to 23.6°S and 23.6°S to 90°S with weightings 0.3, 0.4 and 0.3, respectively, proportional to their total areas.

As an aside, the GISTEMP method isn’t as simple as it sounds. They apply the reference station method at each of the 100 equal-area sub-grid cells inside each of the 80 primary equal-area grid cells. From these data, they calculate zonal means and then the global mean. Also, as per Dr. Ruedy, the formulation above is no longer used. They now compute the global mean as 0.3*T(23.6:90) + 0.2*T(0:23.6) + 0.2*T(-23.6:0) + 0.3*T(-90:-23.6). This change was made to make the global mean consistent with the mean of the hemispheric means, since the GISTEMP group had received lots of inquiries about apparent inconsistencies between these means. For this analysis, I will calculate the global means using the old formulation and I won’t attempt to implement any reference station method.
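The weightings quoted above are simple enough to write down directly. The sketch below only encodes those published weights. The function names are made up for illustration, and the 0.6/0.4 split used to build each hemispheric mean is my own inference from the 0.3 and 0.2 zone weights, not something stated by GISS.

```python
import math

def global_from_hemispheres(nh, sh):
    """CRUTEM3-style land-area weighted sum of hemispheric means."""
    return 0.68 * nh + 0.32 * sh

def global_from_zones_old(north, tropics, south):
    """Older GISS formulation: 90N-23.6N, 23.6N-23.6S, 23.6S-90S."""
    return 0.3 * north + 0.4 * tropics + 0.3 * south

def global_from_zones_new(north, trop_n, trop_s, south):
    """Newer GISS formulation with the tropical band split at the equator."""
    return 0.3 * north + 0.2 * trop_n + 0.2 * trop_s + 0.3 * south

# Arbitrary zonal anomalies (not real data), north to south
north, trop_n, trop_s, south = 1.0, 2.0, 3.0, 4.0

# Hemispheric means built from the same zones (assumed weights 0.3/0.5 and 0.2/0.5)
nh = 0.6 * north + 0.4 * trop_n
sh = 0.6 * south + 0.4 * trop_s

new_global = global_from_zones_new(north, trop_n, trop_s, south)
mean_of_hemispheres = 0.5 * (nh + sh)
print(math.isclose(new_global, mean_of_hemispheres))
```

With the band split at the equator, the global mean equals the mean of the two hemispheric means by construction. The old formulation only guarantees that when the single tropical value happens to be the simple average of its northern and southern halves, which need not hold when data coverage differs between them.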

I ran my calculations using all three methods to see how sensitive the results are to the spatial averaging method. I calculated trends for two periods: 1880-2010 and 1981-2010. The figure below shows the trends calculated for these two periods as a function of the chosen base period for all three averaging methods, starting with simple area-weighting.

Recall that adjusting the base period of spatially-averaged anomalies doesn’t change the trend. Here we see that the trend *should* change to reflect the differing resultant spatial coverage. Comparing trends among adjusted anomalies might lead to false conclusions about their relative sizes. The most proper way is to adjust anomalies at the grid cell level.
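A toy example makes the distinction concrete. Everything here is synthetic and simplified: two equally weighted grid cells, annual rather than monthly values, ordinary least squares for the trend, and a made-up cell whose record ends in 1990. The helper names are assumptions for this sketch only.

```python
import numpy as np

years = np.arange(1951, 2011)
cell_a = 10.0 + 0.01 * (years - 1951)                                   # complete record
cell_b = np.where(years <= 1990, 5.0 + 0.03 * (years - 1951), np.nan)   # record ends in 1990
temps = np.vstack([cell_a, cell_b])                                     # shape (cells, years)

def cell_anomalies(temps, base_start, base_end, min_years=20):
    """Anomalies per cell; a cell with too few base-period years becomes all NaN."""
    in_base = (years >= base_start) & (years <= base_end)
    n_valid = np.sum(~np.isnan(temps[:, in_base]), axis=1)
    clim = np.nansum(temps[:, in_base], axis=1) / np.maximum(n_valid, 1)
    clim[n_valid < min_years] = np.nan
    return temps - clim[:, None]

def spatial_mean(anoms):
    """Equal-weight mean over whichever cells have data in each year."""
    valid = ~np.isnan(anoms)
    return np.where(valid, anoms, 0.0).sum(axis=0) / valid.sum(axis=0)

def trend_per_decade(series):
    return 10.0 * np.polyfit(years, series, 1)[0]

# Spatial average relative to 1951-1980 (both cells qualify)
avg_old = spatial_mean(cell_anomalies(temps, 1951, 1980))

# Option 1: adjust the averaged series to 1981-2010 by a constant shift
in_new = (years >= 1981) & (years <= 2010)
avg_shifted = avg_old - avg_old[in_new].mean()

# Option 2: recompute anomalies per cell relative to 1981-2010, then average
avg_regridded = spatial_mean(cell_anomalies(temps, 1981, 2010))

print(trend_per_decade(avg_old), trend_per_decade(avg_shifted),
      trend_per_decade(avg_regridded))
```

The first two trends are identical because a constant shift cannot change a slope. The third differs because the second cell has only ten years of data in 1981-2010, fails the 20-year requirement, and drops out of the average entirely, leaving only the first cell's 0.1 °C/decade.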

GHCN and CRUTEM show very similar variability in their trends for the 1880-2010 period, though CRUTEM's trends are consistently larger. For the 1981-2010 period, these two data sets generally show larger trends with more recent base periods. Masking out the “excess” data in GISTEMP consistently increases its trend. For the four most recent base periods, starting with 1951-1980, the three data sets (including GISTEMP masked) show convergence for the 1981-2010 period.

Calculating the global mean from hemispheric means, GHCN and CRUTEM still show a strong correlation over the longest period as well as increasing trends in the recent period. GISTEMP’s base period sensitivity diminishes significantly relative to the simple area-weighting case. The above noted convergence remains for GHCN and CRUTEM, but GISTEMP pulls a bit higher than before.

The zonal mean-derived global averages show the trends for the recent 30-year period in the strictly land-only data are noticeably lower relative to the other two averaging methods. To try to understand what explains some of the differences between the trends among the four data sets, we can look at the fraction of land area accounted for in each series’ spatial average. First, let’s look at GHCN’s spatial coverage for each base period.

This figure should be familiar. It is very similar to the well-known GHCN station count. Note that the earlier the base period, the more stable the spatial coverage is from about 1920 to 1990. The two base periods that offer the best overall coverage, and similar coverage to each other, are 1951-1980 and 1961-1990. The presence of a strong annual cycle in the 1971-2000 series suggests that for many grid cells, calculating the climatology failed for at least one month, so those grid cells go missing for that same month or months each year. All the series show drops in coverage at 1990, ranging from small to large. Now let’s look at CRUTEM’s spatial coverage.

CRU and GHCN show strong similarities but what stands out is that the 1990s drop is more gentle in CRUTEM than GHCN. One could use the word “decline” instead of “drop”. A prominent annual cycle also shows up in the 1971-2000 base period. Now we’ll look at GISTEMP’s coverage.

Yes, you are reading the graph correctly. GISTEMP’s land surface area is greater than the total land surface! I won’t dwell on this one because of the obvious difficulties in interpreting such numbers. To be able to make more realistic interpretations, we can look at the spatial coverage of the GISTEMP data when it’s restricted to land-only.

GISTEMP’s 1,200 km smoothing method gives it a head start of almost 0.5 over CRUTEM and GHCN. At about 1920, the spatial coverage stabilizes for the first five base periods. The remaining base periods jump to full coverage at about 1955. The masked GISTEMP trends for 1981-2010 corresponding to the four most recent base periods are very stable because the spatial coverage isn’t changing nearly as much as in the other two data sets.
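For reference, here is one way the land-masked weights and the coverage fraction discussed above could be computed. This is a hedged sketch rather than my actual processing code: the function names, the cos(latitude) approximation to grid cell area, and the tiny 2 x 2 grid are all assumptions for illustration. In particular, the unmasked option assumes coverage counts the full area of every cell with data while still dividing by total land area.

```python
import numpy as np

def cell_areas(lat_centers, n_lon):
    """Relative grid cell areas, proportional to cos(latitude)."""
    return np.cos(np.deg2rad(np.asarray(lat_centers, dtype=float)))[:, None] * np.ones(n_lon)

def weighted_mean(field, weights):
    """Average a 2-D anomaly field over cells with data (NaN = missing)."""
    valid = ~np.isnan(field)
    w = np.where(valid, weights, 0.0)
    return np.sum(np.where(valid, field, 0.0) * w) / w.sum()

def coverage_fraction(field, lat_centers, land_fraction, mask_to_land=True):
    """Area with data divided by total land area."""
    area = cell_areas(lat_centers, field.shape[1])
    land_area = area * land_fraction
    counted = land_area if mask_to_land else area
    return np.sum(counted * ~np.isnan(field)) / land_area.sum()

# Toy grid: two latitude rows straddling the equator, two longitudes
lats = [2.5, -2.5]
land = np.array([[1.0, 0.5],
                 [0.0, 0.5]])          # fraction of each cell that is land
field = np.array([[1.0, 3.0],
                  [np.nan, 5.0]])      # one ocean cell has no data
smoothed = np.array([[1.0, 3.0],
                     [4.0, 5.0]])      # smoothing has filled in the ocean cell

mean_masked = weighted_mean(field, cell_areas(lats, 2) * land)
cov_masked = coverage_fraction(smoothed, lats, land, mask_to_land=True)
cov_unmasked = coverage_fraction(smoothed, lats, land, mask_to_land=False)
print(mean_masked, cov_masked, cov_unmasked)
```

Under that assumption, the unmasked coverage exceeds 1 whenever data extends over ocean, which is how a "land" data set can appear to cover more than the total land surface.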

The figure below shows the calculated linear trends (°C/decade) vs. the maximum spatial coverage (fraction of total land area) for each data set for each trend for each spatial averaging method. Also given are the correlation coefficient and a p-value.
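As a rough guide to how such numbers can be produced, the sketch below computes a Pearson correlation with NumPy and estimates a two-sided p-value by permutation. The permutation test is a stand-in chosen so the example needs nothing beyond NumPy. It is not necessarily how the p-values in the figure were obtained, and the nine coverage/trend pairs are synthetic placeholders, one per base period.

```python
import numpy as np

def correlation_with_pvalue(x, y, n_perm=10000, seed=0):
    """Pearson r plus a two-sided permutation p-value (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        # Shuffle one variable to break any real association
        r_perm = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r_perm) >= abs(r) - 1e-12:
            hits += 1
    p = (hits + 1) / (n_perm + 1)      # add-one correction keeps p above zero
    return r, p

# Synthetic placeholders: maximum coverage and trend for nine base periods
coverage = np.linspace(0.55, 0.95, 9)
wiggle = 0.002 * np.array([1, -1, 1, -1, 1, -1, 1, -1, 1])
trend = 0.30 - 0.10 * coverage + wiggle

r, p = correlation_with_pvalue(coverage, trend)
print(r, p)
```

With only nine points per data set, a p-value from any method should be read loosely. One or two base periods can move the correlation a lot.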

Over the full 1880-2010 period using data calculated with simple area-weighted averages, CRUTEM and GISTEMP (masked) show numerically and statistically insignificant correlation between their respective data sets’ trends and maximum spatial coverage. GHCN and GISTEMP (unmasked), however, do show numerically and somewhat statistically significant correlations. Interestingly, the more “spatial” coverage GISTEMP has, the lower the trend.

The hemispheric-mean derived data show a similar correlation for GHCN while GISTEMP’s correlation goes up a bit. GISTEMP continues to show lower trends with greater spatial coverage.

With the zonal-mean derived data, GISTEMP masked now behaves more like its unmasked counterpart. GHCN and CRUTEM’s figures remain very constant across the different spatial averaging techniques. Now we’ll look at the figures for the recent 30-year period.

With simple averaging, the linear trends are all showing numerically and statistically significant correlation with the spatial coverage. GISTEMP masked and unmasked show an almost perfect negative correlation!

The hemispheric-mean derived data show that the relationships between GHCN, CRU and GISTEMP’s trends and spatial coverage don’t noticeably change with this averaging method. The stand-out is GISTEMP masked, whose correlation has flipped sign!

The zonal-mean derived data show similar correlations in magnitude and sign as the other two methods. I’ll conclude that the trend vs. spatial coverage relationship is not particularly sensitive to the spatial-averaging method. This came as a surprise to me because I thought that calculating global means from at least hemispheric means if not zonal means would greatly dampen the variability in the temperature anomaly due to changing spatial coverage. I guess I was wrong.