Soil

With the lognormal assumption, problems with error structure diminished, and more reasonable prediction intervals were obtained. While the differences between distributional assumptions diminished when aggregating data of single chambers to an annual value, differences were important on short timescales and were especially pronounced when aggregating across chambers to plot level.

Hence we recommend as a good practice that researchers report plot-level fluxes
with uncertainties based on the lognormal assumption.
Model data integration studies should compare
predictions and observations of soil

Instantaneous measurements of soil

Derivation of ecosystem-scale

One challenge is spatial heterogeneity paired with a limited number of
measurement locations, which together constrain the
precision of the plot-level aggregated flux

A second challenge is posed by a large component of random error.
It originates from intrinsic fine-scale process variation such as microbial
metabolic pathways, gas diffusion, or microbial population dynamics
and, to a smaller extent, from instrumentation error
and flux calculations

The error distribution model becomes important for studies of
model data integration, inverse modeling, and data assimilation

In this study, we tackle this second challenge of analyzing and aggregating flux data associated with random error. We evaluate the assumption of random error being lognormally distributed as an alternative to the assumption of additive random error from a normal or Laplace distribution.

The lognormal distribution describes measurements with a more or less
skewed distribution. It is defined as a continuous probability distribution of
a random variable whose logarithm is normally distributed. Such distributions
often arise when values cannot be
negative, as is usually the case with soil

The objectives of this study are, first, to demonstrate that using the lognormal
assumption leads to improved analysis of soil

Using observed fluxes of four automated soil

Data were collected at the ES-LMa FLUXNET site near
Majadas de
Tiétar, Extremadura, Spain (39

Each measurement has uncertainty, and this uncertainty can be characterized by
a density distribution. For similar environmental conditions, observed fluxes (

Equations (1) and (2c) are extreme cases
of a hierarchical model that accounts for both types of error
(Appendix

Error terms are the difference between observed fluxes and a true basic flux. The true flux is unknown but can be estimated by the average flux under similar environmental conditions.

A simple method of estimating the absolute error terms is daily differencing, excluding days with and after rain events

An alternative method is the lookup table approach (LUT).
It is commonly used in the marginal
distribution sampling method

A third alternative is modeling the base flux by its relationship
with ancillary observations, such as temperature.
We tried modeling the

When using the lognormal assumption, daily differencing was applied to the
log-transformed observed fluxes,
whereas for the LUT approach the difference between observed
and mean flux was computed
with the log-transformed values
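The daily differencing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name, the use of `None` to mark missing or rain-affected records, and the division by the square root of 2 are assumptions. The division presumes that both records carry independent errors of equal variance.

```python
import math


def daily_difference_errors(fluxes, records_per_day=48, log_scale=False):
    """Estimate error terms by differencing records that are one day apart.

    fluxes: half-hourly fluxes; None marks a missing or excluded record
            (for example a day with or after rain).
    Returns one error estimate per valid pair of records.
    """
    errors = []
    for i in range(len(fluxes) - records_per_day):
        a, b = fluxes[i], fluxes[i + records_per_day]
        if a is None or b is None:
            continue  # skip pairs with a missing or excluded record
        if log_scale:
            if a <= 0 or b <= 0:
                continue  # the logarithm is undefined for non-positive fluxes
            a, b = math.log(a), math.log(b)
        # Assumption: the difference of two independent errors with equal
        # variance has twice that variance, hence the scaling by sqrt(2).
        errors.append((b - a) / math.sqrt(2))
    return errors
```

Calling the function with `log_scale=True` applies the differencing to the log-transformed fluxes, which corresponds to the lognormal assumption.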

The aggregation across time (Sect.

The correlation cannot be computed by the uncertainties of individual fluxes,

Gaps in the flux time series have to be filled before
computing the annual aggregated flux. Shorter gaps were filled using the LUT
with a window size up to

For the plot-level annual aggregation, we estimated the fluxes during long gaps by the mean flux of the other chambers. Using this mean of the other chambers is not fully statistically valid, because one should correct for the chamber offsets that vary slowly across time. However, this was the best estimate we could get for the dataset used.

We are interested in the value and the uncertainty
of the flux aggregated across time and across the replicate chambers of the
recorded measurement.
Across chambers we analyze a sample of four replicates.
Across time, we are concerned with the propagation of the random variability
induced by the random variations (measurement error and process variation)
of the individual measurements (Eq.

The uncertainty of the aggregated value – here, the mean across
several soil

If instrumentation error (IE) dominates, the error is usually well described
by independent normal distributions with a mean of 0 (Eq. 1c) with the
well-known error propagation rules (Eq. 5).

However, for time series one must usually consider autocorrelation,
where successive measurements are not independent of each other,
i.e., where knowing the random error of one measurement
holds information for predicting the error of other measurements close in time.
One has to add covariance terms when summing variances.
For autocorrelated series this leads to formulas dependent on the effective
number of observations (Eq.

In the studied case,

Hence, the uncertainty (

For gap-filled records the residual error is missing. Hence, those records do not contribute to the number of effective observations. However, they are included in computing the mean aggregated flux.
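The following sketch illustrates how autocorrelation and gap-filled records could enter the uncertainty of a mean flux under the normal assumption. It is a hedged illustration. The formula for the effective number of observations is a commonly used one, the truncation of the autocorrelation function at the first non-positive coefficient is an assumption, and all function names are hypothetical. For simplicity, residuals of measured records are treated as one consecutive series.

```python
import math


def autocorrelation(resid, max_lag):
    """Empirical autocorrelation coefficients for lags 1..max_lag."""
    n = len(resid)
    mean = sum(resid) / n
    c0 = sum((r - mean) ** 2 for r in resid) / n
    return [
        sum((resid[t] - mean) * (resid[t + k] - mean) for t in range(n - k)) / n / c0
        for k in range(1, max_lag + 1)
    ]


def effective_n(resid):
    """Effective number of observations, n / (1 + 2 * sum((1 - k/n) * rho_k))."""
    n = len(resid)
    weighted_sum = 0.0
    for k, rho in enumerate(autocorrelation(resid, n - 1), start=1):
        if rho <= 0:
            break  # assumption: use coefficients only up to the first non-positive one
        weighted_sum += (1 - k / n) * rho
    return n / (1 + 2 * weighted_sum)


def aggregate_mean(fluxes, resid):
    """Mean flux over all records and its standard error.

    fluxes: all records, including gap-filled ones.
    resid:  residual per record, None for gap-filled records.
    """
    measured = [r for r in resid if r is not None]
    n_eff = effective_n(measured)  # gap-filled records do not count here
    m = sum(measured) / len(measured)
    sd = math.sqrt(sum((r - m) ** 2 for r in measured) / (len(measured) - 1))
    return sum(fluxes) / len(fluxes), sd / math.sqrt(n_eff)
```

Gap-filled records enter the mean but not the effective number of observations, so long gap-filled periods do not artificially shrink the uncertainty.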

An overview of the properties of the lognormal distribution is provided in
Appendix

For aggregating fluxes across chambers,
we first log-transformed each observed flux,

For aggregating fluxes of a single chamber across time, we considered the error
term in each half-hourly measurement as a realization of a lognormally
distributed random variable. The propagation of the error to the sum of such
random variables (Eq. A7a) requires the distribution
parameters.
Hence, these parameters,

For observations of low fluxes, the instrumentation error component cannot be neglected and the lognormal assumption is violated. Such observations were treated as gap-filled for most aggregation scenarios; i.e., they contributed to the expected value but not to the error propagation for the mean flux.

To help researchers apply these concepts to their own data, we provide well-documented code in two publicly available packages for the R language.

Computing fluxes from series of concentrations measured inside chambers is
provided by the package

Utilities dealing with lognormally distributed data are provided with the package

The distribution of error terms obtained by daily differencing had strong
tails, while
when applying the daily differencing to log-transformed values,

Quantile–quantile plots compare the sample quantiles of observation error
to theoretical distribution quantiles. The closer the points to the
displayed

On the original scale the error magnitude (standard deviation of error terms across days) scales with flux magnitude (top). Log transformation avoids this problem (bottom). Columns correspond to different chambers.

We compared the aggregation of half-hourly fluxes across four neighboring
chambers using the
lognormal assumption (Sect.

Observed fluxes for neighboring chambers (symbols) and aggregated across chambers: expected values (lines) and 95 % prediction interval bounds (shaded areas). Crosses denote gap-filled values. The lognormal approach avoided negative lower prediction interval bounds during hot moments, for example during rain events on 4 April.

The expected value of the aggregated fluxes across the 48 half-hourly
measurements per day was the same across distributional assumptions. It
corresponded to the mean of the observed values.
The width of the 95 % prediction interval was similar for most records but
differed in a few cases (Fig.

Instances where the lognormal assumption resulted in much wider prediction
intervals occurred on days with very low fluxes.
In these cases the process variation, which scales with the flux, is small
compared
to the instrumentation error, and the assumption that error is dominated by the
multiplicative
component (Eq. 2c) is violated.
Those cases need to be treated differently.
One way of counteracting the resulting overestimation of
uncertainty is setting a gap-filling flag for the uncertainty estimate of very
small fluxes (Fig.

The difference in width of the 95 % prediction intervals for daily
aggregates between distributional assumptions is shown
by box plots on an absolute scale

Contrary to the short-term aggregation, the distribution of annually aggregated
fluxes of
each chamber did not differ much between the two approaches (Fig.

The combined temporal annual and cross-chamber aggregation to
the plot level can be done with two alternatives.
With one alternative, temporal aggregation (using either the normal or lognormal
assumption) is done first, and aggregation across replicates (using the lognormal
assumption) is then done using the annual estimates of each chamber.
With the second alternative, the aggregation across replicates is done first for
each half hour across all chambers, and these plot-level fluxes are then
aggregated across time.
The latter “replicate-first” alternative yielded lower uncertainty estimates
(standard deviation of 0.005 instead of 0.02

With the lognormal assumption the distribution of random error can be
inspected on a log scale rather than on the original scale. This improves two
distributional problems

Although the increase of variance could be handled alternatively
by an explicit error model in generalized regression or flexible cost functions in model inversion

The increase of variance with flux magnitude also created the pattern of
apparent Laplace distribution (Fig.

We assumed that, if a lognormally distributed process variation dominates the
observation error of single chambers, then such a process variation also
dominates the differences between chambers. Hence, we also assumed a lognormal
distribution of measurements across several chambers.
With only four replicates, we cannot inspect distributional properties.
However, using the lognormal assumption was
especially important for periods of high variability across chambers,
which occurred at
the Majadas de Tiétar site mostly during the dry summer period, similar
to findings of

At the Majadas site, we attribute negative fluxes to measurement error;
however, negative fluxes can be real, especially at sandy alkaline soils
with low decomposition, i.e., with sparse vegetation. There are abiotic causes
for these negative fluxes: carbonate dissolution, soil air shrinkage with
temperature and pressure, and

Nevertheless, the lognormal assumption is not applicable on karstic soils with a high proportion of conditions with real negative fluxes. However, if the proportion of observations with conditions for negative fluxes is low, these conditions can be flagged and the observations can be handled similarly to gap-filled records or records where measurement error dominates, which contribute to the expected value but not to the uncertainty estimate.

Annually aggregated mean flux estimates (symbols) and their 95 % prediction interval bounds (bars). The intervals are of similar width under the normal and the lognormal assumption. The x axis denotes different chamber locations. The aggregation excluded long gaps, which led to different aggregation periods and differences across chambers.

Further, we explored consequences of aggregating measurements
of a single chamber across
time using the lognormal assumption compared to classical aggregation using the
normal assumption. A single chamber measurement representing
a time period can be assumed
to be a normal or a lognormal random variable. These assumptions resulted in
different aggregated uncertainties when aggregating across a few days (Fig.

However, we encountered a problem when fluxes were very low,
where the instrumentation error component
becomes dominant and the lognormal assumption is violated.
If the lognormal
assumption is applied to such cases, time aggregation leads to overestimation
of uncertainty, because it overestimates the multiplicative error.
Those records need to be flagged similarly to gap-filled records
before aggregation using the lognormal approach (Sect.

When half-hourly measurements of a single chamber were aggregated to longer timescales such as to annual aggregates, the differences in uncertainty bounds
between distributional assumptions decreased (Fig.

Also the skewness disappeared (Fig.

Overall, we suggest using the lognormal assumption for aggregating fluxes across replicate chambers but the normal assumption for aggregating half-hourly observations of a single chamber across time when the number of records exceeds, say, 40.
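As a hedged illustration of why the distributional assumption matters across replicate chambers, the sketch below compares 95 % intervals for the spread across four chambers under both assumptions. The chamber values are hypothetical, the function names are assumptions, and the intervals describe the spread of individual chambers rather than the uncertainty of the plot-level mean. With only four replicates, a t quantile would be more appropriate than the normal quantile used here.

```python
import math
from statistics import NormalDist, mean, stdev

Z = NormalDist().inv_cdf(0.975)  # about 1.96


def normal_interval(fluxes):
    """95 % interval assuming additive, normally distributed variation."""
    m, s = mean(fluxes), stdev(fluxes)
    return m - Z * s, m + Z * s


def lognormal_interval(fluxes):
    """95 % interval assuming multiplicative, lognormal variation."""
    logs = [math.log(f) for f in fluxes]
    mu, sigma = mean(logs), stdev(logs)
    # Back-transforming bounds from the log scale keeps them positive.
    return math.exp(mu - Z * sigma), math.exp(mu + Z * sigma)


# Hypothetical fluxes of four chambers during a hot moment
chambers = [0.5, 0.8, 1.2, 6.0]
print(normal_interval(chambers))     # lower bound is negative
print(lognormal_interval(chambers))  # lower bound stays positive
```

With one chamber much higher than the others, the normal interval extends below zero, whereas the back-transformed lognormal interval cannot.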

When deciding whether to first aggregate across chambers or across time,
the “cross-chamber-first” alternative
(Sect.

Our finding on the suitability of the model of a multiplicative, lognormal
process variation sheds new light on the process variation, i.e., the as-yet-unattributed soil processes that
generate random fluctuations in soil

We propose the alternative hypothesis based on small-scale spatial heterogeneity
and stochasticity of the temperature sensitivity of chemical reactions involved.
Metabolic rates associated with microbial communities differ across
micrometer distances in their temperature sensitivity, and these metabolic
rates in turn largely drive respiration and soil

To obtain plot-level estimates of soil

Estimate error terms by daily differencing or, preferably, by the LUT approach.

Fill gaps in the data and flag gap-filled records.

Flag low-flux conditions where instrumentation error dominates or where real negative fluxes can occur.

Aggregate data of single chambers across time.
For confidence or prediction intervals, account for autocorrelation.
Use the lognormal assumption if the aggregation runs over a limited number
of observations, fewer than 40, say.
Handle flagged values so that they contribute to
the estimated flux but not to the flux uncertainty
(Sects.

For plot-level estimates, aggregate the time-aggregated estimates across chambers using the lognormal assumption as the last step.

In model data integration, compare predictions and observations of
soil

The presented methodology and
tools
will help researchers to better analyze soil

This section compiles the properties of the lognormal distribution that are most relevant to using the lognormal assumption when aggregating observations.

The density of the lognormal distribution is described by two parameters
(Eq.

Traditionally, parameters are
given on a log scale, where the location parameter

The first two moments, i.e., the expected value and the variance, are given by
Eq. (A2).
The
expected value is larger than the median,

Equation (A2b) relates the standard deviation
to the relative error, i.e., the coefficient of variation:

The parameters of the distribution can be estimated from the log-transformed
sample (Eq. A4).
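The relations between the log-scale parameters, the moments, and the quantiles can be sketched with textbook formulas for the lognormal distribution. The function names are hypothetical, and the estimator shown (sample mean and sample standard deviation of the log-transformed values) is an assumption that may differ in detail from Eq. (A4).

```python
import math
from statistics import NormalDist, mean, stdev


def estimate_lognormal(sample):
    """Estimate the log-scale parameters (mu, sigma) from a positive sample."""
    logs = [math.log(x) for x in sample]
    return mean(logs), stdev(logs)


def lognormal_moments(mu, sigma):
    """Median, expected value, variance, and coefficient of variation."""
    return {
        "median": math.exp(mu),
        "mean": math.exp(mu + sigma**2 / 2),
        "variance": (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2),
        "cv": math.sqrt(math.exp(sigma**2) - 1),
    }


def lognormal_quantile(p, mu, sigma):
    """Quantile obtained by back-transforming the normal quantile."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))
```

Because the expected value contains the extra term `sigma**2 / 2` in the exponent, it exceeds the median whenever `sigma` is positive.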

The quantiles of the lognormal distribution are derived from the quantiles
of the normal distribution (Eq.

Density distributions of lognormal distributions (lines)
get closer to normal density
(shaded area) as multiplicative standard deviation

For example, the
97.5 % quantile of the standard normal distribution with

The product of several lognormal random variables is again lognormally distributed, because the sum of normally distributed random variables on a log scale is again normally distributed.
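For independent factors, the parameters of the product follow directly from adding means and variances on the log scale. The sketch below assumes independence, and the function name is hypothetical.

```python
import math


def product_parameters(mus, sigmas):
    """Log-scale parameters of a product of independent lognormal variables.

    On the log scale the product becomes a sum of independent normal
    variables, so locations add and variances add.
    """
    mu = sum(mus)
    sigma = math.sqrt(sum(s**2 for s in sigmas))
    return mu, sigma
```

For correlated factors, covariance terms on the log scale would have to be added to the variance.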

For the sum of several lognormal random variables, no closed-form
expression is known to date. However, the sum can be approximated by a lognormal distribution, and
the parameters of this distribution can be found by various methods
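One simple option is moment matching, often called the Fenton-Wilkinson approximation. The approximating lognormal distribution is chosen so that its expected value and variance equal those of the sum. The sketch below assumes independent terms and uses a hypothetical function name. It is one of several possible methods and not necessarily the one used in this study.

```python
import math


def sum_lognormal_moment_matching(mus, sigmas):
    """Approximate a sum of independent lognormal variables by a lognormal.

    Returns the log-scale parameters (mu, sigma) of the approximating
    distribution, matched to the mean and variance of the sum.
    """
    mean_sum = sum(math.exp(m + s**2 / 2) for m, s in zip(mus, sigmas))
    var_sum = sum(
        (math.exp(s**2) - 1) * math.exp(2 * m + s**2) for m, s in zip(mus, sigmas)
    )
    sigma2 = math.log(1 + var_sum / mean_sum**2)
    mu = math.log(mean_sum) - sigma2 / 2
    return mu, math.sqrt(sigma2)
```

For many terms of similar magnitude, the matched `sigma` is much smaller than that of the individual terms, which reflects the reduction of relative uncertainty with error propagation.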

There might be flagged terms that should contribute to the sum
but should not contribute to the reduction of relative uncertainty with error
propagation across many terms.
Examples are gap-filled values or observations where a proper estimate of the
multiplicative uncertainty is missing.
In this case,

The multiplicative standard
deviation,

The observations of

Such an LGM can be estimated using INLA

Instrumentation error is of magnitude

If the lognormally distributed variation is small compared to the normally
distributed one, it can be neglected. The model then simplifies to Eq. (B2).

If the normally distributed variation is small compared to the lognormally
distributed one, it can be neglected. The model then simplifies to Eq. (B3).
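The two limiting cases can be explored with a small simulation. The sketch assumes that the combined model multiplies the true flux by a lognormal factor (process variation) and then adds a normally distributed term (instrumentation error). This form, the function name, and all parameter values are assumptions for illustration, not a transcription of Eq. (B1).

```python
import math
import random


def simulate_observations(true_flux, sigma_log, sd_instr, n, seed=0):
    """Simulate fluxes with multiplicative lognormal and additive normal error.

    true_flux: true flux (positive)
    sigma_log: log-scale standard deviation of the process variation
    sd_instr:  standard deviation of the additive instrumentation error
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [
        true_flux * math.exp(rng.gauss(0, sigma_log)) + rng.gauss(0, sd_instr)
        for _ in range(n)
    ]


# High flux: the multiplicative component dominates and values stay positive.
high = simulate_observations(true_flux=5.0, sigma_log=0.2, sd_instr=0.1, n=200)
# Low flux: the additive component dominates and negative values occur.
low = simulate_observations(true_flux=0.01, sigma_log=0.2, sd_instr=0.1, n=200)
print(min(high), min(low))
```

Setting `sd_instr=0` or `sigma_log=0` reproduces the purely multiplicative and the purely additive limiting case, respectively.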

During periods with similar environmental conditions, we can model

We fitted model Eq. (B1) to the data of chamber 2
for a 5 d period in April using INLA

During shorter (

Compared to the single-chamber fit, the estimate of the normal error
decreased further (

This indicates that the assumption of negligible instrumentation error
compared to the lognormally distributed process variation
(

In addition to the model with both error terms, we fitted models
with only one of the error terms included and compared models
by the deviance information criterion (DIC)

As an outlook, we will study whether this approach can be extended to longer periods and adapted to more complex models with time-varying differences between chambers.

Autocorrelation between error terms is important for propagation of the uncertainty when aggregating over time (Sect.

The coefficients of the empirical autocorrelation function of the error terms,

Autocorrelation in error terms on the original scale was weaker than autocorrelation in error terms on the log-transformed scale (Fig.

Empirical correlation coefficients are stronger between residuals on a log-transformed scale (bottom).

Number of unflagged observations and number of effective records after accounting for autocorrelation in residuals.

The essential functions for handling lognormally distributed measurements
and their aggregation have
been implemented in the openly available R package

The data used for this study are accessible at

The supplement related to this article is available online at:

TW analyzed the data and took the lead in writing the manuscript. All authors contributed to the writing and discussion. KM and TSEM maintained the chambers. MM designed the Large-Scale Manipulation Experiment (MaNiP) and contributed greatly to scientific discussions.

The authors declare that they have no conflict of interest.

Tarek S. El-Madany, Mirco Migliavacca, Oscar Perez-Priego, and Kendalynn Morris thank the Alexander von Humboldt Stiftung for financial support of the MaNiP project. We want to thank Marco Pöhlmann, Olaf Kolle, Martin Hertel, Gerardo Marcos Moreno, Ramón López-Jimenez and Arnaud Carrara for helping us to maintain the automatic respiration chambers. Numerous comments from two anonymous referees improved the paper.

The article processing charges for this open-access publication were covered by the Max Planck Society.

This paper was edited by Salvatore Grimaldi and reviewed by two anonymous referees.