There are numerous networks and initiatives concerned with the non-satellite-observing segment of Earth observation. These are owned and operated by various entities and organisations, often with differing practices, norms, data policies, etc. The Horizon 2020 project GAIA–CLIM is working to improve our collective ability to use an appropriate subset of these observations to rigorously characterise satellite observations. The first fundamental question is which observations from the mosaic of non-satellite observational capabilities are appropriate for such an application. This requires an assessment of the relevant, quantifiable aspects of the available measurement series. While fundamentally poor or incorrect measurements can be identified relatively easily, it is metrologically impossible to be sure that a measurement series is “correct”. Certain assessable aspects of a measurement series can, however, build confidence in its scientific maturity and appropriateness for given applications: for example, that it is well documented, well understood, representative, updated, publicly available and supported by rich metadata. Entities such as the Global Climate Observing System have suggested a hierarchy of networks whereby different subsets of the observational capabilities are assigned to different layers based on such assessable aspects. Herein, we make a first attempt to formalise both such a system-of-systems networks concept and a means by which to assess, as objectively as possible, where in this framework different networks may reside. In this study, we concentrate on networks measuring primarily a subset of the atmospheric Essential Climate Variables of interest to GAIA–CLIM activities. We show assessment results from our application of the guidance and how we plan to use this in downstream example applications of the GAIA–CLIM project. However, the approach laid out should be applicable across a broad range of application areas. If broadly adopted, the system-of-systems approach will have potential benefits in guiding users to the most appropriate set of observations for their needs and in highlighting to network owners and operators areas for potential improvement.
Observing aspects of the Earth system in a variety of ways from satellite and non-satellite platforms is a necessary but often insufficient condition for full understanding. Observing in real-world conditions is a challenging proposition. The real world is not a laboratory where repeated measurements under identical conditions are possible (e.g. Boumans, 2015). The real environmental state is constantly evolving in both space and time. Real environmental exposure places practical limits on what can be measured, where it can be measured and to what degree of fundamental quality it can be measured. It is therefore inevitable that there will be a wide range of measurement systems and capabilities with which to meet distinct user needs and applications and that these may be further limited by a combination of technological, financial, geopolitical or logistical considerations. The challenge is how to make sense of such a mosaic of capabilities in order to properly inform data users of the most appropriate subset of measurements for their specific applications (e.g. Bojinski et al., 2014), which, in many cases, were not foreseen when the original measurements were undertaken. Here, we develop a system-of-systems framework to address this challenge and, as an illustrative case study, use the identification of suitable non-satellite atmospheric observational series that may be used to characterise satellite observations.
The Horizon 2020 project GAIA–CLIM aims to improve the usability of non-satellite measurements for characterising satellite measurements. A key step to achieving this is to identify, in as unambiguous a manner as possible, those non-satellite measurements that are suitable for such an application. These reference observations must be sufficiently well characterised (JGCM, 2008; Immler et al., 2010; WMO/BIPM, 2010) that, if a difference from the satellite data being compared is found after accounting for inevitable co-location mismatch effects, we can be confident that the difference arises from the satellite and not from the comparator.
Were an appropriate, internationally accepted method available for identifying suitable non-satellite data sources for such a purpose, then it would be used by the project. Surprisingly, to date there is no accepted set of criteria by which to systematically evaluate, using assessable metrics, suitability for given applications across observing platforms and across networks. Furthermore, while several international bodies refer to a system-of-systems observing architecture (e.g. GEO, 2016), there is still no formal definition of either the layers of such a system-of-systems or their defining and assessable characteristics. Different groups and domain-area experts have alighted on distinct conventions (GCOS, 2014), which makes it difficult for potential users to quickly and easily assess which of the large set of available measurements are most appropriate for their application.
Therefore, within GAIA–CLIM we have attempted to put forth an initial definition of formal layers of a system-of-systems approach and a set of assessable metrics with which to categorise non-satellite observations, building upon existing efforts. In this paper, we outline the approach adopted by GAIA–CLIM and the results of its application to a restricted set of observations, covering a subset of the Global Climate Observing System (GCOS) atmospheric Essential Climate Variables (ECVs; Bojinski et al., 2014). Besides explicit reference to a system-of-systems approach in the peer-reviewed literature by Seidel et al. (2009) and Bodeker et al. (2016), such a concept is also present in the grey literature, e.g. in several recent GCOS documents (GCOS, 2014, 2015) and a report by the US National Academy of Sciences (NAS, 2009). Although alluded to in these references, the defining characteristics of the layers are not clearly laid out. A formalised and systematically applied approach would help users and data providers to judge the usability of observational capabilities and hence to use the right measurements for a specific application.
The paper summarises key results from two GAIA–CLIM deliverables (GAIA–CLIM, 2015, 2016; further details are available therein) and is structured as follows. Section 2 provides the proposed system-of-systems designation and describes the characteristics of each layer. Section 3 provides an overview of the assessment process adopted by GAIA–CLIM, which builds upon the Climate Data Record (CDR)-based assessment proposed by CORE-CLIMAX (Schulz et al., 2017; which in turn builds upon Bates and Privette, 2012), modified to account for distinctions between measurements and derived data products. Section 4 describes the initial results of an application to a selected set of networks suitable for meeting GAIA–CLIM's needs. Section 5 discusses caveats and potential future developments, and Sect. 6 provides a summary.
The system-of-systems approach adopted is illustrated in Fig. 1. It recognises that different observations exist for distinct purposes and that there is an inevitable trade-off between spatio-temporal observational density and aspects of observing quality. The designation of any candidate measurement series to a given layer should be a function of demonstrable measurement qualities such as traceability, metadata, comparability, data completeness, documentation, record longevity, measurement programme stability and sustainability (Sect. 3). Before this, however, the defining characteristics of each layer in Fig. 1 need to be formally defined. These definitions build and expand upon GCOS (2014), with further details given in GAIA–CLIM (2015).
Proposed layers in a system-of-systems approach to be adopted within GAIA–CLIM arising from Seidel et al. (2009).
Reference-observing networks provide metrologically traceable observations, with quantified uncertainty, at a limited number of locations and/or for a limited number of observing platforms, for which traceability has been attained.

The measurements are traceable through an unbroken processing chain (in which the uncertainty arising at each step has been rigorously quantified) to SI units, to common reference points defined by the BIPM or to community-recognised standards (ideally recognised by the National Measurement Institute), using best practices documented in the accessible literature (Immler et al., 2010). Dirksen et al. (2014) provide an example of the steps required to deliver such a product. Uncertainties arising from each step in the processing chain are fully quantified and included in the resulting data; uncertainties are reported for each data point, and individual components of the uncertainty budget are available. Where uncertainties are correlated, this is appropriately handled. Full metadata concerning the measurements are captured and retained, along with the original raw data, to allow subsequent reprocessing of entire data streams as necessary. The measurement and its uncertainty are verified on a routine basis through complementary, redundant observations of the same measurand. The observing programme is actively managed, with a commitment to long-term operation. Change management is robust, including a sufficient programme of parallel and/or redundant measurements to fully understand any changes that do occur; unnecessary changes are minimised. Measurement technology innovation is pursued: new measurement capabilities, through new measurement techniques or innovations to existing techniques, are encouraged where they demonstrably improve the ability to characterise the measurand. These innovations must be managed in such a way that their impacts on the measurement series are understood before they are deployed.
Baseline-observing networks provide long-term records capable of characterising regional, hemispheric and global-scale features. They lack the absolute traceability of reference observations.

The baseline network is a globally or regionally representative set of observations capable of capturing, at a minimum, relevant large-scale changes and variability. As such, a baseline network may be considered a minimum, highest-priority subset of the comprehensive networks (Sect. 2.3) and should be actively curated and retained. The measurements are periodically assessed, either against other instruments measuring the same geophysical parameters at the same site, through comparisons to NWP/reanalyses or through intercomparison campaigns, to provide an understanding of the relative performance of the different techniques in use. Ideally, such intercomparisons should include reference-quality measurements. Representative uncertainties, based upon an understanding of instrument performance or the peer-reviewed literature, are available. Metadata about changes in observing practices and instrumentation are retained. There is a long-term commitment to the observations. Changes to the measurement programme are minimised and managed (by overlapping measurements or measurements with complementary instruments over the change), with efforts made to quantify the effects of changes in an appropriate manner.
Comprehensive-observing networks provide the high spatio-temporal density of information necessary for characterising local and regional features.

The comprehensive networks provide observations at the detailed space scales and timescales required to fully describe the nature, variability and change of a specific climate variable, if analysed appropriately. They include regional and national operational observing networks. Representative uncertainties, based upon, e.g. instrument manufacturer specifications and knowledge of operations, should be provided; in their absence, gross uncertainties based upon expert or operator judgement should be provided. Metadata should be retained. Although encouraged, long-term operation is not required.
The measurement system maturity matrix (MSMM) used herein, like its counterpart for CDRs developed under CORE-CLIMAX (Schulz et al., 2017), is a tool for assessing quantifiable facets of a measurement series or measurement network. It gauges the extent to which measurement best practices have been met. The assessment can be performed either on individual instruments/sites or for entire networks. It should be stressed that a given measurement's maturity is distinct from its applicability to a given problem, where additional concerns pertain, such as measurement location, frequency and scheduling. For example, a user interested in tropical processes cannot make use of a measurement in the polar regions and vice versa. Such aspects are user specific and cannot be captured within the matrices detailed herein.
There are six mandatory and one optional major categories in which assessments are made, which overlap with, but are not identical to, those used to assess CDRs under CORE-CLIMAX (Schulz et al., 2017). Where the categories overlap, in many cases the guidance differs substantially to reflect the distinction between the measurements and derived CDRs. The assessment categories are metadata; documentation; uncertainty characterisation; public access, feedback and update; usage; sustainability; and software (optional, completed only where appropriate).
Within each category are a number of subcategories. For each of these subcategories, the assessment assigns a score from 1 to 6 (sometimes 6 is not used and/or 1 and 2 share identical criteria), reflecting the maturity of that aspect of the measurement system. The maturity can be considered in three broad bands that give information on the scientific grade and sustainability of the measurements being assessed.
Maturity scores 1 and 2 establish a comprehensive measurement capability (CMC; comprehensive-network-type measurements): the instruments are placed in the field and recording data but may not be well curated or metrologically understood and calibrated. Maturity scores 3 and 4 establish a baseline measurement capability (BMC; baseline-network-type measurements): these measurements are better characterised and understood and are intended to be run for the long term; however, they lack strict traceability and comparability. Maturity scores 5 and 6 establish a reference measurement capability (RMC; reference-network-type measurements): these measurements are very well characterised, with strict traceability and comparability and robustly quantified uncertainties; they are actively managed and curated and envisaged as a sustained contribution.
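To make the score banding concrete, the following minimal sketch (in Python; this is not project code, and the function name is ours) maps a subcategory score onto the three capability levels described above:

```python
# Hypothetical illustration of the MSMM score banding described above.

def capability_level(score: int) -> str:
    """Map an MSMM subcategory score (1-6) to a broad capability level."""
    if score not in range(1, 7):
        raise ValueError("MSMM scores run from 1 to 6")
    if score <= 2:
        return "CMC"  # comprehensive measurement capability
    if score <= 4:
        return "BMC"  # baseline measurement capability
    return "RMC"      # reference measurement capability

print(capability_level(5))  # -> "RMC"
```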
Assessment results may require expert interpretation for each assessed measurement series, because the circumstances under which the measurements were taken may affect the maturity level that can reasonably be expected to be attained. All relevant subcategory scores should be considered. From the data provider's perspective, such an assessment may inform strategic developments and improvements to the measurement programme. From the perspective of data users, the assessment should provide an indication of applicability to their intended use.
When considering an assessment of a network in certain categories or subcategories, it is likely to be appropriate to perform the exercise on a per-asset (instrument or site) basis, rather than a network-wide basis. This is particularly the case for the sustainability category but may also be applicable elsewhere if there are intra-network heterogeneities in protocols pertaining to, for example, metadata, uncertainty quantification or documentation. In such cases and where practical, the assessment should be performed individually on each unique subset and stored in the assessment report metadata. Both the network-wide mean score (or a representative score of “core” sites) and the range of scores should then be reported in the summary. Such a refined assessment helps to ensure both appropriate network subselection for certain applications and a fair assessment that may help network operators and coordinators to identify and address network internal issues.
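As an illustration of this reporting convention, the sketch below (site names and scores are invented) derives both the network-wide mean and the range of per-site scores for a single subcategory:

```python
# Hypothetical per-asset aggregation for one MSMM subcategory:
# report both the network-wide mean score and the spread across sites.
from statistics import mean

site_scores = {"site_a": 5, "site_b": 4, "site_c": 6, "site_d": 3}

scores = list(site_scores.values())
summary = {
    "mean": round(mean(scores), 1),       # network-wide mean score
    "range": (min(scores), max(scores)),  # intra-network spread
}
print(summary)  # {'mean': 4.5, 'range': (3, 6)}
```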
An assessment should be conducted by an assigned leader, who organises the exercise, provides guidance to the participants and collects and analyses the results. Where a substantive assessment of the state of multiple networks, instruments or sites is being organised, it is recommended to create an additional supplement of specific assessment criteria details or “rules of the round”. This guidance should be retained alongside the completed assessments to permit full interpretation of the results.
Full guidance for assessment strands is given in GAIA–CLIM (2015), including detailed guidance notes to aid assessors. Here we reproduce (lightly edited for clarity) the guidance given for the first subcategory of metadata standards in full but thereafter provide solely a high-level overview of each category for brevity. All remaining subcategories contain similar tables and assessment guidance to that shown in the first subsection of Sect. 3.3.1. Readers wishing to perform an assessment should refer to the full in-depth guidance (GAIA–CLIM, 2015) and any subsequent update thereto.
Metadata are data about data, which should be standardised as completely as possible and adequately document how the measurement series was attained. This involves aspects such as instrumentation, siting and observing practices. The measurement system should use appropriate high-quality metadata standards, which permit interoperability of metadata. If an International Organization for Standardization (ISO) standard is defined, then the assessment in future would be against such a standard. However, at the present time no such universally agreed standard exists that pertains across all aspects of EO science. There are emerging efforts under WIGOS (WIGOS, 2015, 2017) to create universal metadata standards, and there are several de facto working standards such as CF-compliant file headers and formats. Unless and until an ISO standard is developed and applied, the assessors' judgement will be required as to the appropriateness of the standards being adhered to.
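By way of illustration, the sketch below shows the kind of CF-style file-level metadata that such de facto standards entail; the attribute names follow the CF conventions, while the file name, title and variable values are invented:

```python
# Hypothetical example of writing CF-compliant file-level metadata.
from netCDF4 import Dataset

with Dataset("example_site.nc", "w") as nc:
    # File-level (global) metadata
    nc.Conventions = "CF-1.8"
    nc.title = "Hypothetical column water vapour record"
    nc.institution = "Example network site"
    nc.history = "2017-01-01: converted from raw level-0 counts"

    nc.createDimension("time", None)
    time = nc.createVariable("time", "f8", ("time",))
    time.standard_name = "time"
    time.units = "seconds since 1970-01-01 00:00:00"

    # Measurement-specific metadata at the variable level
    wv = nc.createVariable("water_vapour", "f4", ("time",))
    wv.standard_name = "atmosphere_mass_content_of_water_vapor"
    wv.units = "kg m-2"
```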
The six maturity scores in the metadata subcategory standards.
Note that it is likely that this subcategory can only be fully assessed by the measurement initiator. An external assessment can be made by asking the data provider directly or, if the metadata and data are freely available from a portal, by inspection (such free availability would itself tend to indicate a mature measurement system). Indications of the standards used can also be found by looking at the data record documentation and/or at a sample data file. The assessment can be made as follows:
Scores 1 and 2 (no distinction is made in this case between these two levels): no standard is considered. Data are made available solely as they are, with at most the geographical measurement location, time of observation and instrument-type metadata provided, which enables use but prohibits measurement understanding.
Score 3: “standard identified/defined” means that the measurement originator has identified or defined the standard to be used but has not yet systematically applied it. The information about this can most often be found in format description documents or from statements on web pages.
Score 4: systematic application requires that the relevant metadata protocol identifier and details can be found in every file of the measurement product and in its descriptions.
Score 5: the measurement provider has implemented procedures to check the metadata contents. This could be ascertained by a check on consistency of metadata header information in individual data files.
Score 6: this score will be attained if, in addition to mandatory metadata, additional optional metadata are collected, retained and transmitted. This score may not apply to some data streams where all metadata are considered mandatory but may help to differentiate truly well-performing measurement series in other cases, where metadata are differentiated into mandatory and optional classes, such as under the WIGOS metadata standards (WIGOS, 2015, 2017).
Documentation is essential for the effective use and understanding of a measurement record. Although the category has three subcategories, it is possible that two or more of these may be covered by a single document.
This category assesses the practices used to characterise and represent uncertainty in a measurement series. Note that uncertainty nomenclature and practices must follow established definitions (JGCM, 2008) to attain a score of 5 or 6 in any of the subcategories.
This category relates to the archiving and accessibility of the measurement record, how feedback from user communities is gathered and whether this feedback is used to update the measurement record. It also concerns version control and the archival and retrieval of present and previous versions.
This category is related to the usage of measurement series in research applications and for decision support systems. Mature measurement series will have broad adoption and widespread and varied usage.
Sites of networks assessed for GAIA–CLIM. Each location denotes an observational asset capable of measuring one or more target ECVs at the surface, near surface or through much of the atmospheric column.
Example maturity matrix assessing the NDACC (Network for the Detection of Atmospheric Composition Change).
This category pertains to aspects of the sustainability, and hence suitability, of any given measurement programme for scientific, operational and societal applications. For a measurement programme to be used in critical applications, its long-term sustainability must be assured. Where an international measurement network is being assessed, the network will typically consist of individual measurement sites operated by distinct legal entities, with distinct funding mechanisms and in a variety of siting environments. In such cases, there are two options. One is to provide a typical score that is representative of the network as a whole but is then not indicative of the maturity of individual contributing sites. The alternative, preferred option is that this assessment be performed for each site, with the site-by-site scores retained as metadata associated with the assessment and the range of scores highlighted appropriately in the assessment summary by providing both a mean value and the range.
This major strand is optional and applies only to those measurements for which routine, automated and substantive processing occurs from the raw measured data to the provided geophysical parameters of the measurement series. Cases where this would be appropriate include measurement series for which the directly measured parameter is a digital count, a radiance, a photon count or some other indirect proxy for the reported measurand, and processing exists to convert from the measured quantity to the reported quantity. Conversely, where the measurement constitutes a direct proxy for the measurand, such as a platinum resistance thermometer or anemometer, and the conversion is facile, the software readiness category is not appropriate.
We identified a total of 54 plausible networks and two permanent aircraft infrastructures for EO characterisation in the context of GAIA–CLIM activities (Appendix A provides a full accounting of these). These networks were identified, based upon expert solicitation, as those most likely to constitute baseline or reference-quality measurement systems according to the criteria put forth in Sects. 2 and 3 and hence to be usable in downstream applications within the project. The assessment results will thus, a priori, be likely to sit at baseline or reference level relative to a holistic assessment of the entirety of non-satellite observational capabilities. Such a holistic assessment, while highly desirable, would constitute a far more substantive effort than was possible under GAIA–CLIM. We were able to complete, or to solicit from third-party contributors, assessments of 43 of these networks or of subnetworks belonging to the same research infrastructure (Appendix A). The assessed networks cover a broad range of geographical locations (Fig. 2). As expected, the density of measurement stations is lowest in the most sparsely populated and remote regions of the Earth.
Per the developed guidance (Sect. 3; GAIA–CLIM, 2015), a set of rules of the round was agreed at the outset. The assessment was performed on a network-by-network basis, given the available time and the project resource constraints. The maturity matrix collection was carried out by co-authors based upon their individual areas of expertise and their involvement in several international measurement programmes and networks. Significant effort has been made to fill in the matrices consistently across networks. In those cases where filling in the matrix was considered challenging by co-authors, an assessment aided by the assessed network PI or other core members was solicited (see acknowledgements). In such cases, the authors worked to fully support the network PIs to ensure consistent compilation.
An example of maturity matrix collection is provided in Fig. 3 for the Network for the Detection of Atmospheric Composition Change (NDACC; NDACC, 2015), as filled in by NDACC working group co-chairs guided by BIRA-IASB colleagues, who are active participants in managing this network. The scores reported in Fig. 3 show that NDACC can be considered, according to the MSMM, a reference network in subcategories such as data traceability, whereas for uncertainty quantification the network is currently assessed at baseline level. All the matrices, in the same form as shown in Fig. 3 for NDACC, are available on the GAIA–CLIM website (
A summary of all results attained is given in Fig. 4 (cf. Fig. 3). Here, a single collated and agreed set of assessed maturity scores is given per network, even where several contributions were solicited (see Sect. 4.2). Most networks completed assessments for, at a minimum, all the mandatory assessment criteria. Usage and software, given the agreed rules of the round, have the lowest level of completion. A synthesis of the results is given in Sect. 4.3, although readers interested in the applicability of a given network to their application area may find it more useful to consider Fig. 4 and the equivalent figures to Fig. 3 on the GAIA–CLIM project website.
Summary of assessment results for all the networks that were assessed by GAIA–CLIM as detailed in Appendix A. Note that colour assignments follow the inline key given in Fig. 3.
The main issue in using the assessment tools developed is the inevitable and irreducible level of subjectivity involved. Even though quantifiable metrics are used and backed up by guidance (the first subsection of Sect. 3.3.1 provides an example), interpretation will vary from assessor to assessor. The guidance cannot envisage all use cases, and there may be ambiguity as to the most appropriate categorisation because some, but not all, criteria for a range of possible scores for a subcategory may be met simultaneously. Assessor-to-assessor uncertainty has been evaluated through a redundancy exercise in which, for five networks, the matrix was compiled independently by at least three assessors: EARLINET (Pappalardo et al., 2015), GRUAN (Bodeker et al., 2016), TCCON (Toon et al., 2009), AERONET (Holben et al., 1998) and NDACC (Vigouroux et al., 2015; NDACC, 2015).
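The following sketch shows the form of this redundancy computation: for each subcategory, the spread of scores across the independent assessors is taken as a measure of assessor-to-assessor uncertainty (assessor labels and scores here are invented):

```python
# Hypothetical redundancy exercise: independent assessors score the
# same network; the per-subcategory spread gauges assessor uncertainty.
assessments = {
    "assessor_1": {"traceability": 5, "quantification": 4},
    "assessor_2": {"traceability": 5, "quantification": 3},
    "assessor_3": {"traceability": 6, "quantification": 4},
}

for subcat in next(iter(assessments.values())):
    scores = [by_subcat[subcat] for by_subcat in assessments.values()]
    print(subcat, "spread:", max(scores) - min(scores))
```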
The outcome of the exercise shows a minimum uncertainty in the attribution of the maturity matrix scores among the selected compilers of
Range of original assessments
We now go through each of the five major assessment strands requested for all networks to ascertain any common findings that may point to more systemic issues across many networks. These may in turn point to potential remedial actions that can be undertaken by PIs and/or funders.
Table 2 reports the frequency of occurrence of scores for the metadata category of the MSMM. Relevant international standards for metadata are assessed as having been adopted by most of the networks we have considered. Classification of file-level metadata also appears to be robust throughout most of the networks and, for most of them, includes complete location, file-level and measurement-specific metadata. Conversely, collection-level metadata for the majority of networks can still be improved. Such collection-level metadata serve to increase the discoverability and usability of whole series and would require relatively little work for networks to address.
Frequency of occurrence of the maturity matrix scores for the three subcategories (standards, collection level, file level) of the main category, metadata. Note that the file-level subcategory does not use score 6.
Table 3 reports the frequency of occurrence of scores for the documentation category of the MSMM. A high level of maturity is assessed for the provision of a formal description of measurement methodology. Most networks provide journal papers on measurement systems, with updates published in a timely fashion. Formal validation reports are available for most networks, via published reports or journal papers on product validation or on intercomparison with other instruments, leading to a prevalence of high scores in this category. Regarding formal guidance on performing measurements, some form of manufacturer-independent characterisation and validation documentation is provided by the majority of the networks. However, more of the networks attain baseline or comprehensive than reference scores overall in this subcategory, highlighting it as an area in which improvements could be made.
Frequency of occurrence of the maturity matrix scores for the three subcategories (formal description of measurement methodology, formal validation report, formal measurement series user guidance) of the main category, documentation.
Table 4 reports the frequency of occurrence of scores for the uncertainty category of the MSMM. Routine quality monitoring is performed at a high level by most of the networks, with a clear majority assessed as meeting the standards expected of reference networks. Unfortunately, other aspects of the uncertainty strand show a much more mixed message, highlighting limitations across many of the networks in robustly assessing and quantifying measurement uncertainty to modern metrological (measurement science) norms and expectations. Measurement traceability is assessed as reaching reference level for only about 50 % of the selected networks. Quantification of uncertainty is also of extremely mixed maturity among the different networks, and only a few of them can be ranked with a score corresponding to the level of a reference network. Intercomparison and cross validation, which ensure measurement comparability, are well-established mechanisms of uncertainty quantification and validation in less than half of the reviewed networks.
Frequency of occurrence of the maturity matrix scores for the four subcategories (traceability, comparability, quantification and routine quality monitoring) of the main category, uncertainty.
Table 5 reports the frequency of occurrence of scores for the “public access, feedback and update” category. In general, it was not always easy to find detailed information on data usage, which may lead to some heterogeneity in assessed scores that is not truly reflective of the underlying network maturity. Access to networks' public databases is good and, as such, most of the networks are assessed as being at a reference level. Updates to data records are mature for most of the networks, as are long-term data preservation aspects. Conversely, systematic collection of user feedback is based on a robust mechanism for only a few networks, and most of them are at or below a baseline level. Version control and the preservation of different data versions also vary hugely across the networks, with most of them assessed as being at baseline level. User feedback and preservation maturity could be increased by many networks at little to no cost and would represent adoption of best practices.
Frequency of occurrence of the maturity matrix scores for the five subcategories (public access/archive, user feedback mechanism, updates to records, version control and long-term data preservation) of the main category – public access, feedback and update.
Table 6 reports the frequency of occurrence of scores for the sustainability category of the MSMM. Most of the assessed networks attain high maturity scores across the board for this category. For most of the networks, long-term ownership and rights to the site are guaranteed and the site is representative. Most of the networks offer a robust scientific support framework provided by at least two experts, which includes active instrumentation research and development. Programmatic funding support for network activities is generally ensured and is not dependent upon a single investigator or funding line; only a few networks rely on an expectation of follow-on funding (in only one case with the project still pending). This refers to the network-wide assessment of the MSMM; consequently, networks with heterogeneous funding structures may have individual sites that still face sustainability issues.
Frequency of occurrence of the maturity matrix scores for the three subcategories (siting environment, scientific/expert support, programmatic support) of the main category, sustainability. Note that score 6 is not used for scientific/expert support.
Two main categories were deemed optional for this specific assessment round following the first few attempts at compilation: software (already optional) and usage (which, although mandatory per the Sect. 3 guidance, numerous compilers felt unable to fully complete). Most of the maturity matrix compilers reported being either unsure of the definitions in these two categories or unable to provide the requested information. In particular, the software category was not always able to represent the range of practices within the networks. However, it is worth noting that the usage category revealed that, for most of the networks, the societal and economic benefits of the provided data, and their influence on decision-makers (including policy), are still limited. The GAIA–CLIM activities, if successful, will increase usage for the specific case of satellite characterisation.
Based primarily upon uncertainty category scores, we have categorised the networks as falling into reference, baseline and comprehensive categories for the purposes of downstream use within GAIA–CLIM. The designation for any assessment strand is based upon the commensurate scores highlighted in Fig. 4. The selected networks are those classified as reference for the uncertainty category (score
The resulting network classifications can be used to map and visualise geographical measurement capabilities by ECV, vertical domain and measurement system maturity. As an example, in Fig. 6, water vapour networks classified as comprehensive, baseline and reference according to the MSMM for the uncertainty category are compared, and the “global” picture of all the networks measuring water vapour is also reported. In this realisation, networks measuring the vertical profile, the full column content or surface values have not been differentiated from one another. The figure highlights that most of the networks are classified as baseline in their capability to report the measurement uncertainty. Most of the water vapour measurements are collected in the Northern Hemisphere, and there is a clear lack of reference measurements in the Southern Hemisphere. Figure 7, which reports the same comparison for the documentation category of the MSMM, provides results consistent with Fig. 6. Similar maps are to be made available for all primary assessment strands and GAIA–CLIM ECVs, and an interactive mapping and visualisation tool is in an advanced stage of preparation.
Classification, based primarily upon uncertainty MSMM category scores, of the existing networks at the global scale providing in situ water vapour measurements. Panel
Similar to the CORE-CLIMAX experience with CDR producers, there was originally a degree of user scepticism around the potential value of the assessment activity. Several networks remarked upon completion that the exercise had been useful and had led to discussions around potential innovations or improvements, which could yield increased assessment scores in any future assessment but, more importantly, increase the accessibility, usability and robustness of their measurement systems. This benefit was felt most strongly for those networks with a central mission to provide the highest possible quality measurements, such as USCRN (Diamond et al., 2013; Leeper et al., 2015), TCCON, NDACC and EARLINET. For example, the results of the NDACC assessment were discussed in depth at their most recent annual meeting, and this led to several suggestions for improvements.
In performing an assessment at the level of the network, there were recognised limitations. In the results presented in Sect. 4, each maturity matrix refers to the “lowest common denominator” of the performance of a network's core stations. The network assessment might therefore not be representative of the status of the measurements performed at all the stations of the network. For networks which exhibit a substantive degree of heterogeneity in aspects of their measurement systems, this implies that a further subselection assessment will be required by users prior to any data analysis.
Furthermore, as detailed in Sect. 4.2, there exists an irreducible degree of ambiguity in the performed assessments arising from assessor-to-assessor differences in the interpretation of guidance and/or knowledge of particular facets of the network operation. To this end, great care has been taken by all the maturity matrix compilers to provide information as reliably as possible and, in a few cases, a plenary discussion with all the network representatives was carried out and was felt to be useful. Even in such cases, the number of assessors is limited relative to the sample size of opinions one would ideally seek to ensure robust inferences. This is to some extent unavoidable in that, for any network, there are only a handful of experts who comprehensively understand all necessary facets of the observational programme and its management. To guard against misuse, the individual network assessments presented online include a caveat to this effect.
Same as Fig. 6 but for the documentation category of the MSMM.
Two categories of the MSMM, software and usage, have not been considered robust enough at the current stage and were excluded from the final network assessment described in Sect. 4. They should not be adopted in future applications of the MSMM without further discussion and improvement of their usefulness and assessability.
After some debate amongst compilers, it was agreed that for each specific assessment the scores for each of the subcategories must be retained and made available. Reporting scores at the main-category level alone is not felt to be appropriate and does not show the real value of the MSMM approach. If a value representative of each main category of the maturity matrix is absolutely required, it was proposed that users take the minimum value of the related subcategories, to avoid conveying undue confidence and to encourage consideration of the assessment at the subcategory level.
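A minimal sketch of this aggregation rule (subcategory names and scores are illustrative only) might read:

```python
# Hypothetical main-category roll-up: the weakest subcategory bounds
# the category score, so maturity is never overstated.
uncertainty_subcategories = {
    "traceability": 4,
    "comparability": 5,
    "quantification": 3,
    "routine_quality_monitoring": 6,
}

category_score = min(uncertainty_subcategories.values())
print(category_score)  # -> 3
```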
Finally, on a practical note, people filling in a maturity matrix have provided their scores in several different ways owing to the lack of a common collection template. If the MSMM is to be adopted as a tool for the self- or external assessment of a network, a new template should be provided that is able to meet the expectations of most compilers. An interactive online maturity matrix collection tool, showing the criteria for each category when clicking on the scores, was considered, but its implementation was beyond the scope and resources of GAIA–CLIM. It would, however, be a helpful development in future if broader adoption were foreseen.
The MSMM approach has been used in the first instance solely for the internal purposes of GAIA–CLIM. However, there is also a broader need to articulate and adopt a system-of-systems approach, which this documentation may help to nurture (GCOS, 2014, 2015, 2016). There undoubtedly exists broad agreement on many aspects of what constitutes measurement best practice and what networks should strive for. Nevertheless, there are significant challenges to its broad adoption, which were recently highlighted by GCOS (GCOS, 2014) and are expanded upon here.
Action G13 of the GCOS Implementation Plan (GCOS, 2016) adopted by the UNFCCC, which alludes to a capabilities-based assessment of measurement assets, to which the present work may contribute.
Perhaps the largest challenge is that currently a broad range of non-satellite measurement networks have been called “reference”, “baseline” or “comprehensive”, which, when assessed against the criteria detailed in Sect. 3, would instead fall within a different category. The lack of clarity historically regarding a system-of-systems architecture, taken together with fractured observational governance and support structures, has led to a varied use and adoption of network nomenclatures and practices both across and within Earth observation science disciplines. This means that what different subcommunities concerned with environmental measurements refer to as reference, baseline or indeed comprehensive network measurements is not always the same. Often it is not even remotely similar.
If a system-of-systems approach is to be broadly adopted, significant further work is required to reconcile the disparate approaches to network designations and to manage the transition to a more transdisciplinary approach to network assignations. There are several risks and/or challenges in any such transition:
National or international funding support for a measurement programme may be tied to its present designation; there is a risk, in enforcing any change, that the funding support for the programme is endangered. An example is the ocean reference network, which is not a reference network in the sense advocated here but is rather closer to baseline capability. Nevertheless, this is still the best set of ocean observations available, and risking its loss would be a significant mistake. Users may use a measurement programme because of its current designation and may be confused if measurement programmes are reassigned or renamed without adequate consultation or justification. Finally, the observers undertaking the measurement programme may not fully understand the implications if updates to protocols and/or practices are required.
The other side of these concerns is that allowing the status quo to continue means that users referring to, for example, a reference network in the marine, atmospheric and composition communities (just by way of example) may be comparing measurement programmes that differ widely in their fundamental measurement characteristics and qualities and, therefore, in their suitability for a given application. The status quo places the responsibility for understanding the measurement systems and networks, on a system-by-system and even an ECV-by-ECV basis, firmly on the end user. Experience shows that end users are, understandably, unlikely to have either the time or the necessary in-depth knowledge and/or expertise to fully understand the distinctions that may exist between similarly named programmes, and they often incorrectly assume that such programmes are equivalent. This is a barrier to the effective usage of existing EO capabilities by scientists, policymakers and other end users and will continue to be so unless and until a more holistic approach, such as that suggested herein, is adopted.
Unfortunately, there is no obvious mechanism for driving the adoption of a consistent nomenclature. The World Meteorological Organization (WMO) and/or GCOS are the most obvious candidates in this context. However, many of the networks have limited or no involvement with WMO or with the National Meteorological Services. It is therefore unclear to what extent even WMO would be able to enforce such an approach.
It is clear that, alongside the adoption and designation of a system-of-systems framework, it is necessary to provide material to help users understand what the layers mean and to show real case examples of how they can be used. GAIA–CLIM will, through its work packages, provide case study examples in the domain area of satellite measurement characterisation. However, further examples in other domain and application areas are necessary, which will be beyond the remit of GAIA–CLIM. The MSMM will be repeated in the new Horizon 2020 INTAROS project in 2018–2019, where its use will expand to other domains, with an Arctic region focus. The new 2016 GCOS Implementation Plan (GCOS, 2016) has an action that alludes to the application of this or a modified version thereof to multiple domains (Fig. 8).
Even if the layer designations and criteria documented herein were adopted, there would remain the challenge of ensuring linkages between the different components of the global observing system to realise the benefits. This includes aspects such as infrastructure co-location, intercomparison campaigns, information sharing, training and development. Such interlinkages will become both more obvious and more realisable if a system-of-systems architecture approach and assessment is adopted. Some subsets of these aspects that touch upon satellite calibration/validation are covered within the regularly updated Gap Assessments and Impacts Document of GAIA–CLIM (
The assessment of measurement maturity can only ever partially inform the decision of which observations to utilise for which purpose. While measurement maturity assessments permit users to rule out certain observations as unsuitable, they cannot absolutely determine which of the remaining observations may be useful. In addition to measurement series maturity, users must consider aspects such as spatio-temporal availability, measured parameters, data formats/availability and data policy. These aspects are demonstrably use-case specific and hence ill-suited to inclusion in the assessment approach detailed herein. Rather, they must be considered in addition to the measurement maturity assessment results. A combination of the measurement maturity assessment and these additional aspects may serve to highlight critical gaps in capabilities. For example, Figs. 6 and 7 highlight a paucity of available reference-quality water vapour measurements outside North America and Europe, which may limit our ability to characterise water-vapour-sensitive satellite instruments.
We have provided a proposed definition of observing system layers in a system-of-systems context and a means by which to assess, in a quantifiable and objective manner, demonstrable aspects of a given measurement series that help to place it into the appropriate layer. The assessment closely mirrors, but is distinct from, existing efforts to assess the maturity of CDRs. In practice, the application to atmospheric ECVs will inform work within GAIA–CLIM on the creation of tools and products to be served via a virtual observatory facility of co-location match-ups between satellite measurements and selected non-satellite series assessed herein as sufficiently mature. The approach developed should be more broadly applicable to other domains and problems and, if broadly adopted, may have tangible benefits for data users and data providers alike. However, as this was a first attempt at such an exercise, there are undoubtedly potential improvements that could be made were it to be taken forward. We hope that this paper provides a basis for further discussions and refinements.
The assessment results are made available at
Table summarising pertinent details of the networks considered in Sect. 4, listing the 49 networks reviewed within GAIA–CLIM Task 1.2 for which complete discovery metadata have been collected. The first column reports the measurement domain, the second the network acronym, the third the network coverage and the fourth the number of measured ECVs (repeated for columns 5 to 8). Those for which maturity assessments were performed, and which are discussed from Sect. 4.1 onwards, are italicised.
PWT led the production of the assessment guidance and drafting of this paper. FM led the assessment process. FM, JS, TO, BI, ACM and GP contributed to the drafting of the guidance for assessment. KK and MdM provided a review of this guidance. MR, ET, AA, MB, MdM all contributed to the assessment of the guidance. PWT, RD and CV provided reviews and suggestions on the assessment. All authors contributed to the drafting of the paper.
The authors declare that they have no conflict of interest.
This work was supported by the European Union's Horizon 2020 research and innovation programme project GAIA–CLIM, grant no. 640276. It reflects only the authors' views and the agency is not responsible for any use that may be made of the information it contains. This work benefitted from the input of D. Tan (formerly of ECMWF, currently unaffiliated) during early drafting stages. This work has been based upon the substantial work undertaken by CORE-CLIMAX and a number of precursor studies assessing data set maturity. Without these preceding efforts this work would not have been possible. Arnoud Apituley, Greg Bodeker, Barry Goodison, Mark Bourassa, and Ge Peng provided feedback based upon early drafts of GAIA–CLIM (2015) that served to improve the guidance. We gratefully acknowledge all the PIs and data managers of the measurements networks who facilitated our work to deal with the maturity matrix assessment of each of these components of the global observing system. In non-alphabetical order: Atsushi Shimizu (NIES), Masatomo Fujiwara (Hokkaido University), Michael Palecki (NOAA), Simone Lolli (NASA-JCET), Thierry Leblanc (NASA-JPL), Wolfgang Steinbrecht (DWD), James Hannigan (NCAR), Thomas Blumenstock (KIT), Nik Kaempfer (UBern), Monica Campanelli (ISAC-CNR), Anne Thompson (NOAA), Victor Estelles (CSIC), Ruud Dirksen (DWD), Adolfo Comeron (Univ. Politecnica de Catalunya), Doug Sisterson (DOE-ARM, US), Thomas Eck (NASA-GSFC). We also acknowledge the requirements of the BING maps add-in; its use is in agreement with the terms and conditions and privacy statement. Finally, we thank two anonymous reviewers for their comments on the discussion draft. Edited by: Lev Eppelbaum Reviewed by: two anonymous referees