Skip to content

pjbartlein/mp-interp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

mp-interp

Mean-preserving interpolation

This repository contains Fortran 90/95 code that implements four approaches for mean-preserving interpolation as might be applied, for example, to the interpolation of pseudo-daily values of climate data from monthly means, that when averaged, reproduce those monthly means. (The often-used linear interpolation approach does not do so.) The four methods implemented here include those by:

  • Epstein, E.S. (1991), On obtaining daily climatological values from monthly means, J. Climate 4:365-368;
  • Harzallah, A. (1995) The interpolation of data series using a constrained iterating technique, Monthly Weather Review 123:2251-2254;
  • Killworth, P.D. (1996) Time interpolation of forcing fields in ocean models, J. Physical Oceanography 26:136-143; and
  • Rymes, M.D. and D.R. Meyers (2001) Mean preserving algorithm for smoothly interpolating averaged data, Solar Energy 71:225-231.

In the discussion here, the example cases are considered to be single- or multiple-year time series of monthly data, from which we seek mean-preserving daily values, i.e. daily values that when averaged (or accumulated) over a particular month yield a mean value equal to the monthly input data, something that in general simple linear interpolation does not provide. The approaches can be generalized to other situations if we regard the individual years as "outer intervals", months as "inner intervals", and days as "subintervals," for which interpolated values are sought. For example, the Harzallah (1991) approach was illustrated using a time series of annual values (the inner intervals, in this case), from which interpolated monthly subinterval values that preserved the annual means were sought.

The Epstein (1991) approach is based on a harmonic analysis (or harmonic regression) of, for example, a year's worth of monthly mean "control" values, where the basic harmonic analysis is recast as the integration of the control values over the year; this assures preservation of the means of the interpolated values. The function fit to the control data is then evaluated at "target" subinterval (e.g. daily) values. The method is periodic, in the sense that the curve defined by the interpolated daily values "wraps around". It is calculated directly; no iterative fitting or improvement is involved. Like all of these methods, it is prone to "overshooting" (which, in practice, is a necessary feature (as opposed to a problem) for reproducing the means). Because it is applied a year at a time, when applied to a time series of many years of monthly data, small discontinuities in the interpolated daily data will occur between years. In the implementation here, these are "fixed" by smoothing the interpolated daily over the end of one year and beginning of the following year.

The Harzallah (1995) approach involves fitting a spline to the control (e.g. monthly) data, and evaluating the spline at target (e.g. daily) values. Interpolated daily values using the initial spline fit are not necessarily mean preserving. The method then finds the "residuals" between the initial (e.g. monthly) means and the means of the interpolated (e.g. daily) values, and fits a second spline to those values. This procedure is repeated until the residuals are acceptably small, and then the interpolated values for a particular "target" day are taken as the sum of the interpolated values across iterations. The method is not periodic, but can be made so by padding the beginning and end of a year with data from the end and beginning of the year, respectively. Year-to-year continuity can be had by padding the beginning of any particular year with data from the end of the previous year, and padding the end with data from the next year. Harzallah used natural cubic splines, and additional spline-fitting methods are implemented here (e.g. Akima's method and a piecewise-cubic Hermite spline).

The Killworth (1996) approach is a direct, as opposed to iterative approach, and is also non periodic. It uses linear interpolation, applied to a set of "adjusted" (or "pseudodata") control values, that when interpolated, produce values that preserve the means of the original control data. The adjustment is made by inverting a tridiagonal matrix (with elements chosen to smooth the data), and postmultiplying it by the original control-data vector to yield the adjusted control data. The interpolated values form linear seqments, which relative to the other approaches may seem too unrealistic. Like the Harzallah approach, the Killowrth approach can be made periodic via padding.

The Rymes and Meyers (2001) approach involves iteratively "correcting" a initial set of interpolated values for each "subinterval" (e.g. day). Their approach is unique, it that it allows for the imposition of upper and lower bounds for the interpolated values, which avoids strong-arm corrections of interpolated values for variables like precipitation, which should not be negative. The number of iterations required may be quite large, but the number can be substantially reduced by beginning with initial linearly interpolated values.

All of these methods are related to some extent. For example, the updating equation (eqn. 3) in Rymes and Meyers nmnew1 = nAnmold1 resembles eqn 4 in Killworth, nAnd'1 = nd1, and it also the case that cubic splines (as in Harzallah) can be fit by solving an equation like nAnxn = nb1 for x. (In each case, A is a tridiagonal matrix.). What differs among the methods is the iterative refinement of the interpolated values.

Fortran implementation

The approaches are implemented here as modules that contain a subroutine that manages the application of the specific algorithms to series of, for example, years, which are referred to here as "outer intervals". For each outer interval, the subroutine collects the appropriate "inner interval" (e.g. monthly) data, along with the number of "subinterals" or target points for which interpolated data are desired. These data are then passed to another subroutine that gets the interpolated values for the outer interval. These modules are named mp_interp_epstein_subs.f90, mp_interp_harzallah_subs.f90, mp_interp_killworth_subs.f90, and mp_interp_rymes_and_meyers_subs.f90, respectively.

There is a module of utility subroutines, including one called enforce_mean(). None of the approaches are exactly mean-preserving. The Harzallah (1991) and Rymes and Meyers (2001) approaches are iterative, and only approach mean-preserving using a reasonable number of iterations. At first glance, the Epstein (1991) approach should be, but the algorithm was derived assuming that the inner intervals (e.g. months) are all of the same length. In the Killworth (1996) approach, rounding errors in the matrix inversion contribute to inexact matching of the means. In the Epstein (1991) and Killworth (1996) approaches, the differences between the control means and those calculated using the interpolated data are very small. All of the approaches have problems with variables like precipitation, which must be non-negative. In the applications here, negative interpolated values are simply set to zero for such variables. The role of the enforce_mean() subroutine is to adjust the subinterval interpolated values so that they match their respective inner-interval value. This is done by simply apportioning the difference across the (non-zero) subinterval values. In practice this could lead to small month-to-month discontinuities among the daily values, but these are not really noticeable.

The enforce_mean() subroutine includes an argument tol (tolerance) which specifies difference between the observed monthly values and the means of the pseudo-daily interpolated values that when exceeded, trigger the adjustment. Values of tol are typically 0.01 or 0.001 times the mean values of the data. Values of tol that are too large will lead to apparent differences between and "unadjusted" and "adjusted" mean values. Values of tol that are too small may lead to numerical instability when the actual differences between the observed and interpolated means are small. This can be easily seen in time-series plots or maps of the adjusted and unadjusted values.

The Harzallah (1995) and Killworth (1996) approaches require some external (to the code here) procedures. The Harzallah approach requires a spline-fitting routine, and the method has been tested with three: 1) a cubic spline implemented by John Burkhardt (https://people.sc.fsu.edu/~jburkardt/f_src/spline/spline.html, last accessed 8 Dec 2020), 2) a piecewise cubic Hermite spline also implemented by Burkhardt (in the same collection), and 3) Akima's method (akima697, https://calgo.acm.org/, last accessed 8 Dec 2020). Harzallah's approach was developed using cubic splines, but the piece-wise cubic Hermite spline, which enforces monotonicity of the interpolated values between control points, seems to produce less-dramatic "overshoots". Akima's method looks like compromise between the two.

KIllworth's (1996) method requires a matrix-inversion subroutine. Here the functions DGETRF(), and DGETRI() from the lapack95 library were used.

Validation

The four approaches were tested by attempting to reproduce some of the figures in the original articles. Figure 1 in Epstein (1991), Fig. 2 in Harzallah (1995), Fig. 5a in Killworth (1996), and Figs. 4 and 5 in Rymes and Meyers (2001) were extracted from .pdfs of the articles, and the mean inner-interval values (i.e. annual values in Harzallah (1995), monthly values in the others) were digitized. These values are shown in blue in the figures in the folder /test_methods/plots/. The interpolated subinterval values are shown by small red dots, and the inner-interval means of those values are shown by larger red dots. The differences between the published examples and their reproduction here are small, and probably related to the digitization of the original.

Comparison of the approaches

The four approaches were compared using "observed" daily and monthly-mean data for a twenty-year period (1997-2016) drawn from the NCEP-DOE Reanalysis 2 (https://www.psl.noaa.gov/data/gridded/data.ncep.reanalysis2.html). Two variables were used, near-surface (2 m) air temperature (e.g. air.2m.gauss.*year*.nc, where *year* is replaced by an appropriate value), also known as tas, and precipitation rate (e.g. prate.sfc,gauss.*year*.nc), also known as pr. Two gridpoints were selected, one from a region with a Mediterranean-like climate (California, 238.75°E, 38.75°N) and the other with a monsoonal climate (West Africa, 3.75°E, 8.75°N), to represent two instances of high seasonal contrast in precipitation with many daily zeros, which might be expected to produce problems for mean-preserving interpolation. The input data, both daily and monthly, are in the folder /data/source/, and the output data used to construct the figures are in `/data/interp/'.

The folder /figs contains a set of 16 plots, one for each method, variable and region, plus a set of four summary plots showing all methods for each variable and region. Each figure shows data for the full twenty-year period, plus a five-year period blown up to show more detail. The original daily values are plotted in the background in gray in each figure. For the sixteen method by variable by region plots (e.g. epstein_tas_monsoon.pdf, the input monthly mean data, formed by averaging the daily data, are plotted in two ways, as a step-plot (in black), and as black dot at the midpoint of each month. The interpolated (pseudo-) daily data are plotted in blue, and the monthly means of the interpolated daily data in red. Owing to the application of the enforce_mean() subroutine, the black and red dots can be seen to be coincident.

There are some subtle differences among the approaches that can be seen in the comparison plots (e.g. all_pr_med.pdf): The Epstein approach appears to produce the greatest amount of overshooting, but even so, it is not astonishing. The Rymes and Meyers approach appears to procduce the least amount of overshooting, but takes about twice as long as the others to execute. The Killworth approach, as mentioned previous, produces straight-line segments with abrupt changes in slope, which may in some contexts be regarded as too unrealistic. The Harzallah approach (in purple on the four comparison plots) appears to be a "middle-of-the-road" choice, with modest overshoots, and also a fast execution time.

Overall, there is more similarity than difference among the approaches; all produce satisfactory results. It should be noted, howver that none of the approaches produce pseudo-daily values that "look" like the observed daily values -- the interpolation methods are not "weather generators", but instead simply perform the taks of producing daily interpolated values that preserve the monthly mean values they are derived from.

Programs

The modules can be found in the folder /f90/modules/, and demonstration programs for producing the results described below can e found in the folder /f90/demo_programs/. The program test_epstein.f90, for example, was used to regenerate Fig. 1 in Epstein (1991), the program demo_ts_epstein.f90 was used to regenerate the data for the comparison figures, and the program demo_1yr_epstein.f90 produces simple output at the console for a single year of data.

The modules mp_interp_epstein_subs.f90, mp_interp_harzallah_subs.f90, are the same as those in the PaleoCalAdj repository at [https://github.com/pjbartlein/PaleoCalAdjust]. The module pseudo_daily_interp_subs.f90, is also the same, and includes some subprograms that are not actually used here.

The programs were run on Windows 10, using the Intel Fortran Compiler version 19.1 in the Microsoft Visual Studio Community 2019 development environment, and on MacOS Catalina using the gcc gfortran compiler version 10.2.0, in the Eclipse for Scientific Computing IDE version 2020-09 (4.17.0).

About

Mean-preserving interpolation

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published