<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">GI</journal-id><journal-title-group>
    <journal-title>Geoscientific Instrumentation, Methods and Data Systems</journal-title>
    <abbrev-journal-title abbrev-type="publisher">GI</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Geosci. Instrum. Method. Data Syst.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">2193-0864</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/gi-15-165-2026</article-id><title-group><article-title>A database-driven research data framework for integrating and processing high-dimensional geoscientific data</article-title><alt-title>A database-driven research data framework for geoscientific data</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Handy</surname><given-names>Dennis</given-names></name>
          <email>dhandy1@uni-koeln.de</email>
        <ext-link>https://orcid.org/0000-0002-8057-720X</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>van der Meij</surname><given-names>W. Marijn</given-names></name>
          
        <ext-link>https://orcid.org/0000-0001-8724-5120</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Zickel</surname><given-names>Mirijam</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-4473-1759</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Reimann</surname><given-names>Tony</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-9253-4418</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Institute of Geography, University of Cologne, 50923 Cologne, Germany</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Dennis Handy (dhandy1@uni-koeln.de)</corresp></author-notes><pub-date><day>20</day><month>May</month><year>2026</year></pub-date>
      
      <volume>15</volume>
      <issue>1</issue>
      <fpage>165</fpage><lpage>181</lpage>
      <history>
        <date date-type="received"><day>30</day><month>September</month><year>2025</year></date>
           <date date-type="rev-request"><day>7</day><month>October</month><year>2025</year></date>
           <date date-type="rev-recd"><day>22</day><month>January</month><year>2026</year></date>
           <date date-type="accepted"><day>13</day><month>April</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Dennis Handy et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://gi.copernicus.org/articles/15/165/2026/gi-15-165-2026.html">This article is available from https://gi.copernicus.org/articles/15/165/2026/gi-15-165-2026.html</self-uri><self-uri xlink:href="https://gi.copernicus.org/articles/15/165/2026/gi-15-165-2026.pdf">The full text article is available as a PDF file from https://gi.copernicus.org/articles/15/165/2026/gi-15-165-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e104">This paper introduces a modular research data framework designed for geoscientific research across disciplinary boundaries. It is specifically designed to support small research projects, providing a bottom-up solution that empowers individual teams that need to adhere to strict data management requirements from funding bodies, but often lack the financial and human resources to do so. The framework supports the transformation of raw research data into scientific knowledge. It addresses critical challenges, such as the rapid increase in the volume, variety and complexity of geoscientific datasets, data heterogeneity, spatial complexity, and the need to comply with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. The framework uses a dual-component architecture. First, an Online Transaction Processing (OLTP) system features a user interface and a persistent relational database, ensuring accurate and consistent data storage when capturing and managing diverse geoscientific research data. Complementing this, an orchestration layer manages automated data pipelines to process the stored data and generate dynamic in-memory Online Analytical Processing (OLAP) databases that allow flexible, high-performance analysis. It is adaptable to evolving research requirements and supports various data types and methodological approaches, such as machine learning and deep learning, that place high demands on the data and their formats. A case study in Western Romania demonstrates the application of the data framework in an interdisciplinary geoarchaeological research project by processing and storing heterogeneous datasets, thereby reducing data management efforts, improving findability, replicability, and reproducibility, and streamlining the integration of high-dimensional data for small, interdisciplinary teams.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Universität zu Köln</funding-source>
<award-id>006</award-id>
</award-group>
<award-group id="gs2">
<funding-source>Deutsche Forschungsgemeinschaft</funding-source>
<award-id>436834905</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e116">Recent technological advancements have driven rapid growth in the volume, variety, and complexity of geoscience research data <xref ref-type="bibr" rid="bib1.bibx40" id="paren.1"/> that fuels new data-driven approaches to generate scientific knowledge <xref ref-type="bibr" rid="bib1.bibx11" id="paren.2"/> and requires a considerable investment in equipment and expertise, as it accumulates large and heterogeneous volumes of data stored in various formats and databases <xref ref-type="bibr" rid="bib1.bibx12" id="paren.3"/>. This trend coincides with the increasing, fundamental discussion about how scientific data should be published in order to ensure its reusability <xref ref-type="bibr" rid="bib1.bibx9 bib1.bibx28" id="paren.4"><named-content content-type="pre">e.g.</named-content></xref>. The FAIR principles, findability, accessibility, interoperability, and reusability, which emerged from the FORCE11 community, aim to enhance the reuse of research data and highlighted the importance of effective data management. The authors stress the need for a cultural shift in research data management but also specifically highlight the importance of computational capabilities in data-rich research environments <xref ref-type="bibr" rid="bib1.bibx42" id="paren.5"/>. The growing complexity of research data and the increasing publication requirements pose new challenges, particularly for smaller projects that lack their own research data infrastructure. This results in a growing demand for straightforward methods of storing and processing data.</p>
      <p id="d2e136">While the enormous growth of data volumes and the fulfilment of the Fair Principles are not exclusive to the geosciences, geoscientific data and geoscientific research data management have some unique characteristics and challenges that require the development of tailored methods for storage and processing. As Geosciences analyse complex, coupled processes across different spatial and temporal scales, the heterogeneous nature of its research objects often leads to a high-dimensional parameter space; a high number of potentially interdependent variables <xref ref-type="bibr" rid="bib1.bibx6" id="paren.6"/>. The dimensionality and heterogeneity of geoscientific data requires significant computing resources, and requires multiple processing steps due to its complexity <xref ref-type="bibr" rid="bib1.bibx24" id="paren.7"/>. <italic>Spatial Dependence</italic>, <italic>Spatial Heterogeneity</italic> and <italic>Scale Dependence</italic> further challenge storage and analysis of spatial data. Understanding these spatial properties is essential, as they necessitate adapted methods and workflows throughout the entire data lifecycle: Spatial dependence refers to the autocorrelation of spatial data, requiring specialised methods to analyse spatial relationships, whereas spatial heterogeneity refers to non-stationary spatial patterns influenced by spatial anisotropy or overlapping processes across different scales <xref ref-type="bibr" rid="bib1.bibx29" id="paren.8"/>. Interpreting and sharing spatial data relies on the traceability of the coordinate reference system (CRS) for its location component. Without being able to reproduce the CRS, the data might be plotted in an incorrect location, or the accuracy, precision, and scale are misjudged, leading to misleading results. For this reason, in addition to conventional metadata describing the data, spatial data must also include explicit locational information <xref ref-type="bibr" rid="bib1.bibx41" id="paren.9"/>. Spatial data thus requires reflective methodological approaches to ensure meaningful analysis, such as spatial data workflows, where the transformation of raw geospatial inputs into scientifically interpretable outputs depends on addressing the unique properties of spatial data. This calls for systems that account for spatial complexity at every stage of the workflow.</p>
      <p id="d2e161">These trends and challenges in geoscientific research data management underscore the urgent need for structured approaches to data management, particularly in small and interdisciplinary research projects that must comply with FAIR principles and funding requirements <xref ref-type="bibr" rid="bib1.bibx7" id="paren.10"><named-content content-type="pre">e.g.</named-content></xref>. However, project-based funding models pose a significant challenge to establishing sustainable data management structures. Many valuable domain-specific data systems originating from research projects struggle to survive due to limited ongoing funding <xref ref-type="bibr" rid="bib1.bibx19" id="paren.11"/>. Though specialised approaches for data management exist for particular geoscientific subfields, an interdisciplinary approach has been missing, integrating research data from various areas <xref ref-type="bibr" rid="bib1.bibx30" id="paren.12"/>, as required in integrative disciplines such as geomorphological and geochronological research. While research groups, projects, and laboratories may implement backup procedures or repositories to prevent the physical loss of data, the risk of losing information regarding its existence <xref ref-type="bibr" rid="bib1.bibx12 bib1.bibx28" id="paren.13"/>, its usability and discoverability remain significant as the data's publication not merely implies its accessibility. Metadata might be incomplete or missing completely, links might be broken, or the information needed to make the data usable is missing <xref ref-type="bibr" rid="bib1.bibx38" id="paren.14"/>. The inability to make research data FAIR is more than just an inconvenience, but has a quantifiable financial cost. Non-standardised research data results in billions of euros in expenses annually <xref ref-type="bibr" rid="bib1.bibx19 bib1.bibx8" id="paren.15"/>.</p>
      <p id="d2e185">Databases partially address the aforementioned challenges but focus primarily on static data storage rather than the researchers' entire workflows, the data's provenance <xref ref-type="bibr" rid="bib1.bibx26" id="paren.16"/> and how the data evolves through its lifecycle, from generation and transformation to storage, analysis and reuse, throughout a research project. Therefore, we propose broadening the perspective of geoscientific research data management by modelling a framework for geoscientific research data encompassing the entire data lifecycle. In contrast to established methods, rather than considering FAIR principles from the perspective of publication, we consider these principles proactively throughout the entire data lifecycle. We aim to: <list list-type="order"><list-item>
      <p id="d2e193">Address challenges arising from the rapid growth in the volume and complexity of data. To this, we provide a standardised approach to store scientific data throughout a research project's lifecycle by implementing data storage systems specifically designed for geoscientific data and providing online interfaces to access the data. By requiring comprehensive metadata, we are aiming to ensure the findability and reusability for future use.</p></list-item><list-item>
      <p id="d2e197">Address the challenges associated with processing increasingly intricate data workflows, we introduce the automated orchestration of data pipelines. These pipelines formalise recurring data workflows in code, thereby ensuring reproducibility and scalability while minimising errors.</p></list-item></list> This requires close consideration of the specific characteristics of geoscientific data, which pose pronounced challenges due to their spatial, temporal, and computational complexity, as well as the understanding of requirements specific to geoscientific research. To this end, we first identify the user requirements (Sect. <xref ref-type="sec" rid="Ch1.S2"/>). Then, we will describe how we used these requirements to design and implement the framework (Sect. <xref ref-type="sec" rid="Ch1.S3"/>). We test and demonstrate the practical application of our framework using a geoarchaeological project in western Romania (Sect. <xref ref-type="sec" rid="Ch1.S4"/>). Finally, we discuss how the framework addresses the outlined challenges and provide an outlook on future developments (Sect. <xref ref-type="sec" rid="Ch1.S5"/>).</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Challenges and requirements for FAIR geoscientific Data</title>
      <p id="d2e218">A software framework designed to support researchers should not only reflect theoretical concepts of scientific practices, such as the FAIR principles, but should also consider financial, technical, organisational and practical limitations that researchers face in their everyday work. Therefore, to identify the key challenges and requirements of research data management in the geosciences, we gathered insights on existing challenges, current best practices, and potential solutions through a literature review. Our review reveals that the identified challenges are not specific to the investigated environment but reflect systemic issues in geoscientific data management.</p>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Reusability, interoperability and preservation</title>
      <p id="d2e228">With geoscientific data being inherently diverse <xref ref-type="bibr" rid="bib1.bibx19 bib1.bibx30" id="paren.17"/>, the recent growth in data volume and complexity <xref ref-type="bibr" rid="bib1.bibx19 bib1.bibx40" id="paren.18"><named-content content-type="pre">e.g.</named-content></xref>, including large multi-dimensional and spatio-temporal datasets <xref ref-type="bibr" rid="bib1.bibx6 bib1.bibx24" id="paren.19"/>, poses significant challenges for research data management and computational methods <xref ref-type="bibr" rid="bib1.bibx24 bib1.bibx25" id="paren.20"/>. For example, they are often stored in incompatible or hard-to-access locations, such as personal computers or local databases. This phenomenon, called <italic>data silos</italic>, impedes the discovery and reusability of data <xref ref-type="bibr" rid="bib1.bibx19" id="paren.21"/>. This issue of isolated data applies even to formal database systems. The often isolated subject-specific storage makes cross-domain evaluation difficult, and the complexity of data models causes users to develop error-prone workarounds <xref ref-type="bibr" rid="bib1.bibx18" id="paren.22"/>. Geoscientific data is often stored in different, often isolated formats (GIS, databases, specialised software, proprietary formats), leading to redundancies, integration problems, and data loss. The heterogeneity of standards and tools makes it difficult to perform a holistic analysis, even though combining different data sources could provide valuable insights. A central, networked solution is still lacking. Once projects end or employees change jobs, valuable data is often lost. Thus, data storages become <italic>data cemeteries</italic> <xref ref-type="bibr" rid="bib1.bibx12" id="paren.23"/>.</p>
      <p id="d2e261">In contrast, the trend towards numerous general-purpose data repositories, while offering availability, can exacerbate discovery and reusability issues because they often don't integrate or harmonise deposited data <xref ref-type="bibr" rid="bib1.bibx42" id="paren.24"/>. A persistent challenge is the absence of common global standards for data sharing and metadata <xref ref-type="bibr" rid="bib1.bibx19 bib1.bibx30" id="paren.25"/>. Many datasets lack sufficient metadata, use inconsistent spatial reference fields <xref ref-type="bibr" rid="bib1.bibx19 bib1.bibx41" id="paren.26"/> and semantic differences between datasets create barriers to interoperability. This poses a significant challenge to the fulfilment of the FAIR principles <xref ref-type="bibr" rid="bib1.bibx22" id="paren.27"/>. The quality of data varies, and scientists expend effort to assess whether it is relevant, understandable and reliable <xref ref-type="bibr" rid="bib1.bibx9 bib1.bibx28" id="paren.28"/>. The lack of information about research methods, instrumentation and provenance (i.e. origin and processes) hinders data reuse <xref ref-type="bibr" rid="bib1.bibx28 bib1.bibx30" id="paren.29"/>. The lack of interdisciplinary standardisation further challenges true interoperability <xref ref-type="bibr" rid="bib1.bibx30" id="paren.30"><named-content content-type="pre">e.g.</named-content></xref>. For instance, while the United States Department of Agriculture (USDA) considers grain sizes from 0.002 to 0.05 mm as silt <xref ref-type="bibr" rid="bib1.bibx35" id="paren.31"/>, the World Reference Base for Soil Resources (WRB) draws the boundaries at  0.002 and 0.063 mm <xref ref-type="bibr" rid="bib1.bibx17" id="paren.32"/>.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Workflows complexity</title>
      <p id="d2e302">Complex data workflows have become standard, driven by a significant shift in the computational landscape due to the growing prominence of data-intensive sciences <xref ref-type="bibr" rid="bib1.bibx27" id="paren.33"/>. However, due to the application of recent computational methods, such as machine learning, data workflows are even increasing in volume, velocity, and complexity <xref ref-type="bibr" rid="bib1.bibx37" id="paren.34"/>. Even for domain experts, the transition from basic science to interpretation is complex. Throughout the workflow, strategic decisions include not only the definition of a model's structure and assumptions, but also the implementation of its code, the selection of parameter values, and the generation of outputs <xref ref-type="bibr" rid="bib1.bibx26" id="paren.35"/>.</p>
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Institutional barriers</title>
      <p id="d2e322">Scientists often rely on adapted tools and proprietary data formats, meaning that enforcing a single, uniform system for research data management will not work in a heterogeneous scientific environment <xref ref-type="bibr" rid="bib1.bibx10" id="paren.36"/>. Data sharing is often hindered by perceived burdens of documentation, lack of time, fear of data loss, privacy concerns, legal issues <xref ref-type="bibr" rid="bib1.bibx9 bib1.bibx38" id="paren.37"/>, and traditional academic incentives that focus primarily on publishing papers rather than properly curating and sharing datasets <xref ref-type="bibr" rid="bib1.bibx40" id="paren.38"/>. Furthermore, scientists sometimes abandon traditional, formally structured databases in favour of more ad hoc solutions, such as spreadsheets <xref ref-type="bibr" rid="bib1.bibx39" id="paren.39"/>.</p>
      <p id="d2e337">The analysis of actual scientific practice reveals that a large proportion of the identified challenges are already being discussed. However, a crucial challenge is that the implementation of a framework depends on the highly individual nature of science, including the specific combination of methods, available laboratory equipment and the state of the IT infrastructure. These challenges highlight the need for a modular, discipline-specific framework for small teams that balances standardisation for interoperability with flexibility for disciplinary requirements. Such a framework should support provenance tracking and metadata generation to minimise manual effort, and integrate FAIR principles into daily workflows without disrupting existing practices.</p>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Design and implementation</title>
      <p id="d2e349">Our system architecture reflects the entire life cycle of research data within a project, from initial acquisition during fieldwork and laboratory analyses to final evaluation. To this end, it prioritises continuous data processing by implementing an orchestration framework for data pipelines rather than focusing solely on database storage. These pipelines consider the ingestion of raw data, its processing, transformation, storage and – eventually – automated retrieval and analysis (Fig. <xref ref-type="fig" rid="F1"/>). The framework consists of two separate but linked modules: an online transaction processing (OLTP) module based on a database implementing a normalised, relational data model (Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>, Fig. <xref ref-type="fig" rid="F2"/>a), and derived online analytical processing (OLAP) databases implementing a denormalised data model called star schema (Sect. <xref ref-type="sec" rid="Ch1.S3.SS2"/>, Fig. <xref ref-type="fig" rid="F2"/>b). However, users are shielded from the underlying technical complexity and are provided with a convenient user interface they can use to conveniently create, retrieve, update and delete objects in the relational database. Following the initial data capture, an ETL (extract, transform, load) pipeline is triggered. This critical process involves extracting data from the relational database, transforming it into a suitable format for analysis and loading it into an OLAP database. This, in turn, forms the basis for analytical pipelines that result in either the user interface or the file system. It should be noted that the implementation of the framework should adhere to the specific conditions set by the university's IT infrastructure.</p>

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e364">The system architecture illustrates the interaction between user, data storage and data pipelines within the framework. Users initiate the process by uploading data via the user interface or the file system (yellow boxes). The data is processed, validated and linked in the online transaction processing database (OLTP, blue box). An extract, transform, load (ETL) pipeline extract data from the OLTP database, transform it and load it into an online analytical processing database (OLAP, red box). Post-ETL pipelines automatically generate analyses, data visualisations, or specific data products, which are then made available through the user interface (e.g., dashboards) or exported to the file system (e.g., CSV, PDF reports).</p></caption>
        <graphic xlink:href="https://gi.copernicus.org/articles/15/165/2026/gi-15-165-2026-f01.png"/>

      </fig>

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e375">Simplified, conceptual models of <bold>(a)</bold> a relational model and <bold>(b)</bold> a star schema in direct comparison. The normalised relational schema <bold>(a)</bold> organises data across logically linked tables to minimise redundancy and ensure data integrity. To clarify their function within this diagram, we use the illustrative terms <italic>Core entities</italic> for the central objects of research and <italic>Context entities</italic> for the descriptive metadata.  Although this structure makes analytical queries more complex, it is ideal for transactional operations and ensures consistency in operational workflows. By contrast, the central fact table in the star schema <bold>(b)</bold> stores quantitative measurements and is directly linked to dimension tables that contain descriptive information. This design enables efficient analytical queries by denormalising the data structure, improving performance for aggregations and slicing operations.</p></caption>
        <graphic xlink:href="https://gi.copernicus.org/articles/15/165/2026/gi-15-165-2026-f02.png"/>

      </fig>

<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Online transaction processing (OLTP) module</title>
      <p id="d2e411">The Online Transaction Processing (OLTP) module consists of two components: a relational database, serving as the central repository and single source of truth, and a user interface that acts as an intermediary between the database and users, abstracting the complexity of the data model and the technical implementation. The module was implemented with the Python framework Django and the relational database management system PostgreSQL. The user interface was implemented with <italic>Django Unfold</italic>. Our OLTP module is tailored to the needs of geomorphological, geoarchaeological and geochronological research, but can be easily adapted to other systems and requirements.</p>
<sec id="Ch1.S3.SS1.SSS1">
  <label>3.1.1</label><title>Database</title>
      <p id="d2e424">The core of the OLTP module is a normalised relational database that adheres to the fundamental principles of relational theory. This approach was chosen to ensure data integrity, minimise redundancy, and enable set-based queries throughout the entire lifecycle of geoscientific research data <xref ref-type="bibr" rid="bib1.bibx2 bib1.bibx18" id="paren.40"><named-content content-type="pre">e.g.,</named-content></xref>. The conceptual starting point is the model of a modular geoscience laboratory database developed by <xref ref-type="bibr" rid="bib1.bibx30" id="text.41"/>, but we go beyond their approach in two fundamental aspects. Firstly, we expand the data model's focus from a pure laboratory database to the entire life cycle of research data, which, in our domain, usually begins with field data collection. Secondly, and this is the core of our contribution, we adopt a stricter relational approach to data storage. While we also archive raw data and refer to it from the database, as recommended by Nordsiek and Halisch, we additionally store the processed and structured data directly in the database in accordance with relational theory. This enables complex queries on the actual measured values, not just on metadata that refers to files.</p>
      <p id="d2e435">This model is structured around the sample as a physical specimen. It serves as the central table, from which links are established to related entity types, such as locations, (stratigraphic) layers, various analyses, and the overall project context. This relational structure creates a network of context-related information, ensuring that no entity can exist in isolation. Every piece of data, from field observations to laboratory measurements, can be traced back to its origin and unambiguously linked to all other relevant information. If the sample has already been published, providing a reference to the publication or its <italic>International Generic Sample number</italic> (IGSN) <fn id="Ch1.Footn1"><p id="d2e441">formerly <italic>International Geo Sample Number</italic> <xref ref-type="bibr" rid="bib1.bibx20 bib1.bibx5" id="paren.42"/>.</p></fn> directly links the internal context to the public record.</p>
      <p id="d2e451">Central and recurring methods are directly mapped in the model by separate entity types, such as grain sizes. In addition, it implements a flexible design through a generic measurement table, allowing the addition of similar automated processing procedures for other analysis methods in the future without changing the core data model. To this end, the <italic>GenericMeasurement</italic> table references the <italic>Parameter</italic> (measured variable, designation, physical unit, and minimum and maximum limits) and the <italic>Method</italic> tables.</p>
      <p id="d2e463">The model distinguishes between internal and external data sources to ensure transparent data provenance and supports secure collaboration by using role-based access controls to manage pre-publication data. A detailed description of the data model, including all entities and analysis methods, is provided in the Appendix.</p>
</sec>
<sec id="Ch1.S3.SS1.SSS2">
  <label>3.1.2</label><title>Initial data processing</title>
      <p id="d2e474">Furthermore, the OLTP module anticipates the data pipelines (see Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>) by integrating initial data processing steps at data entry, where appropriate. As a proof of concept for this integrated approach, the system can currently automatically parse raw data from our laser diffractometer for particle size analysis. The user interface provides a means for uploading, after which the system extracts the critical data, reclassifies it into grain-size classes, and stores the results in respective database fields. When called up via the web interface, the system automatically generates a particle size distribution plot, providing users with direct visual feedback.</p>
</sec>
<sec id="Ch1.S3.SS1.SSS3">
  <label>3.1.3</label><title>User interface</title>
      <p id="d2e487">The web-based user interface (Fig. <xref ref-type="fig" rid="F3"/>) ensures data quality and integrity at the point of entry by translating the database's strict relational rules into an intuitive workflow, providing convenient support for managing geoscientific data. The interface enables data capture, validation, and exploration through standardised forms for sample and analysis data (e.g., grain size, luminescence dating) and file upload functionality for raw data (e.g., data tables from laboratory devices). Data entry forms follow international standards (e.g., World Reference Base for Soil Resources) and laboratory-specific templates (e.g., for luminescence dating parameters) and include dropdown menus for controlled vocabularies, mandatory fields and validation rules. Users can assign a sample to a location, link an analysis (e.g., luminescence age) to a sample and its laboratory dataset and document fieldwork contexts (e.g., campaign, stratigraphic layer). The technical connection between the entities is directly checked and established. Moreover, it supports the direct export of filtered datasets in standard file formats, such as CSV, JSON, or Excel tables.</p>

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e494">An example of the user interface implemented with Django Unfold that show <bold>(a)</bold> an overview of all available measurements from different projects that are stored in the database, grouped in sedimentological and geochronological measurements, <bold>(b)</bold> the filtering feature to extract specific data from the data base and <bold>(c)</bold> the resulting data from the query, in this case geochronological data from the Toboliu project.</p></caption>
            <graphic xlink:href="https://gi.copernicus.org/articles/15/165/2026/gi-15-165-2026-f03.png"/>

          </fig>

</sec>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Online analytical processing (OLAP)</title>
      <p id="d2e521">Complex queries, such as those involving joins across multiple tables, can degrade performance as datasets grow in size and structural complexity. To address this issue, the framework generates on-demand analytical databases that use a multidimensional star schema. This consolidates normalised tables into simpler structures with controlled redundancy to optimise query performance and data integration <xref ref-type="bibr" rid="bib1.bibx1 bib1.bibx18" id="paren.43"/>. As shown in Fig. <xref ref-type="fig" rid="F2"/>b, the central fact table of the star schema stores the core quantitative measurements of the research, which are contextualised by descriptive dimension tables (e.g., campaign, location, project), providing a spatial and conceptual framework. Users can navigate from aggregated data to detailed sample records, including geographic coordinates, time periods and measurement methods. This analytical layer supports filtering by criteria such as location, time or analytical method. For instance, a query could efficiently filter across multiple dimensions, such as location, time, or analytical method, to aggregate or extract specific measurements, such as all geochronological ages measured in the luminescence laboratory over the past decade. These dynamic databases are implemented using DuckDB, an open-source in-process database designed for analytical workloads <xref ref-type="bibr" rid="bib1.bibx32" id="paren.44"/>. It is closely integrated with Dagster for data orchestration (<xref ref-type="bibr" rid="bib1.bibx31" id="altparen.45"/>, see Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>).</p>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Data orchestration</title>
      <p id="d2e545">The increasing complexity of geoscientific data workflows, characterised by heterogeneous data sources, multiple processing steps, and interdisciplinary requirements (see Sect. <xref ref-type="sec" rid="Ch1.S2"/>), demands automated solutions for data orchestration. This concept encompasses the deployment, scheduling, and execution of a workflow's computational steps <xref ref-type="bibr" rid="bib1.bibx37" id="paren.46"/>. Or in technical terms, a data orchestrator enables the “construction, operation, and observation of [replicable] data pipelines” <xref ref-type="bibr" rid="bib1.bibx31" id="paren.47"/>. Data pipelines are automated, interconnected processes designed to manage dynamic data flows from source to destination. They extract, transform, validate and combine data, with each stage's output feeding the next. Pipelines can handle continuous, intermittent or batch data and facilitate real-time monitoring, error detection and correction. Their applications range from data storage to visualisation or machine learning (ML) models <xref ref-type="bibr" rid="bib1.bibx33" id="paren.48"/>. This enables the replicable automation of digital processing steps within complex scientific workflows.</p>
      <p id="d2e559">The framework implements a comprehensive and generally applicable architectural pattern by dividing data workflow automation into two levels. The OLTP module already contains basic transactional data pipelines for automating transactional operations, processing and transforming data during storage (e.g., via uploads through the user interface) and retrieval (e.g., for queries or exports). It is a self-contained component that, by design, is independent of the specific institutional IT infrastructure (see Sect. <xref ref-type="sec" rid="Ch1.S3.SS1.SSS2"/>). However, effective data orchestration must be understood in the context of the specific IT environment it operates within. Automating tasks relies on local infrastructure, including connecting to specific laboratory instruments, accessing locally stored files, or saving results in the university's backup systems. Therefore, the framework introduces a second layer specifically to manage more complex and computationally intensive tasks that are inherently dependent on that infrastructure. This layer is implemented using the open-source orchestrator Dagster <xref ref-type="bibr" rid="bib1.bibx31" id="paren.49"/>.</p>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e569">A simplified extract, transform, load (ETL) pipeline that extracts data from the relational database or external data, transforms it through aggregation, validation, filtering, and cleaning and eventually loads it into the denormalised OLAP database for further use.</p></caption>
          <graphic xlink:href="https://gi.copernicus.org/articles/15/165/2026/gi-15-165-2026-f04.png"/>

        </fig>

      <p id="d2e579">Although the primary workflow involves managing data flow from OLTP to OLAP, the orchestrator's capabilities extend beyond this one-way process. For instance, ingestion pipelines can monitor a file system, automatically process files and feed the data into the relational database (Fig. <xref ref-type="fig" rid="F1"/>). ETL (extract, transform, load) pipelines then extract data from the relational database (and other data sources, if applicable), transform it (e.g. aggregation and validation) and load it into an OLAP database (Fig. <xref ref-type="fig" rid="F4"/>). Analytical pipelines transform processed data from these into analysis-ready formats, such as feature-engineered tables (e.g., for machine learning), normalised matrices for statistical analysis (e.g., PCA), and curated datasets for visualisation (e.g., depth plots, grain size plots), or completely automate analysis.</p>
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Case Study: Geoarchaeological data from Toboliu, Romania</title>
      <p id="d2e595">The DFG-funded archaeological research project “Living Together or Apart?” investigates a Bronze Age Tell settlement near Toboliu village in western Romania, focusing on its chronological and spatial development, and social organisation <xref ref-type="bibr" rid="bib1.bibx13" id="paren.50"/>. In addition to geoarchaeological analyses of the Bronze Age site, the project focused on landscape evolution to trace back prehistoric human-landscape interaction in the study area. Situated in the eastern Carpathian Basin, the investigated area has a fully humid temperate climate (Cfb) but already in close spatial proximity to a fully humid snow climate with warm summers  (Dfb) <xref ref-type="bibr" rid="bib1.bibx21" id="paren.51"/>. The project investigated the complex stratigraphy of loess derivates and historical environmental conditions, including the presumed widespread presence of wetlands, using interdisciplinary methods. These methods comprised satellite imagery analysis and geoscientific drilling at 15 sites, which produced data from sedimentological, geochemical, palynological, and micromorphological analyses, as well as radiocarbon and luminescence dating.</p>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Research data management challenges in Toboliu</title>
      <p id="d2e612">Managing the heterogeneous and high-dimensional research data from the Toboliu project presents various challenges, related to data volume, complexity, heterogeneity, institutional barriers, while adhering to fulfil the research data management guidelines of the DFG.
          <def-list>
            <def-item><term>Volume and complexity</term><def>

      <p id="d2e621">The high volume and heterogeneity of geoscientific data generated during fieldwork and laboratory analysis exceed the capacity for effective analysis, especially given the limited number of personnel available. In the Toboliu project, as in many small projects, managing the entire data lifecycle, from collection and preprocessing to analysis and interpretation, across multiple subdisciplines, including sediment analysis, geochronology, and palynology, lies with a doctoral student. Given the limited personnel resources and the sheer quantity of data, it is impossible to process and analyse the complete dataset in detail. Instead, researchers prioritise subdatasets based on preliminary trends or representative drilling cores to focus their efforts efficiently. However, a number of challenges complicate the exploratory data analysis and pattern identification required for this.</p>
            </def></def-item>
            <def-item><term>Heterogeneity and fragmentation</term><def>

      <p id="d2e630">The interdisciplinary nature of the project results in fragmented data spread across numerous files and storage systems, making it prone to discrepancies. Raw data from instruments like a tacheometer for measuring locations might be converted into shapefiles for GIS, while the related documentation of a drilling location is documented in a field diary, a paper form or an Excel-sheet. Potential corrections need to be made in all files aross the different storage media. Discrepancies arise if values are not changed consistently across related files, which makes it more difficult to identify errors. This proved particularly problematic when pre-processing raw laboratory results, as it was necessary to reconcile the data with the logically related metadata.</p>
            </def></def-item>
            <def-item><term>Complexity of workflows</term><def>

      <p id="d2e639">The large amount of high-dimensional data, combined with a multi-method approach, means that the data must be constantly restructured and transformed to meet the specific requirements of the various methods. The introduction of new methods or the correction of values in the existing data set leads to an enormous amount of effort in re-executing entire workflows.</p>
            </def></def-item>
            <def-item><term>Long-term data storage</term><def>

      <p id="d2e648">The Toboliu dataset faces a critical long-term challenge: while data from selected master cores will be published to address the project's research questions, a substantial portion of the collected field and laboratory data helps to overview patterns in the landscape or stratigraphy but proved incidental to the immediate goals. Although these data are physically stored, they remain only partially processed, documented, and unpublished. This causes indirect data loss: The data exists but is neither findable nor usable because it lacks contextual metadata or is only partially processed.</p>
            </def></def-item>
          </def-list></p>
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Application of the framework</title>
      <p id="d2e661">The aforementioned challenges were identified in an early stage of the Toboliu project and were actively addressed throughout fieldwork, laboratory measurements and data analysis using our geoscientific data framework.</p>
<sec id="Ch1.S4.SS2.SSS1">
  <label>4.2.1</label><title>Applying the OLTP module</title>
      <p id="d2e671">The web-based user interface had three critical functions in the Toboliu project: (i) structured digitisation of field data to replace paper records with digital formats, (ii) automate validation and standardisation during data entry, and (iii) enforce consistency of data relationships: Field notes typically recorded relationships between samples and their contextual metadata (e.g. locations, stratigraphic layers) as free-text annotations (e.g. Location <inline-formula><mml:math id="M1" display="inline"><mml:mi>Y</mml:mi></mml:math></inline-formula>, Layer <inline-formula><mml:math id="M2" display="inline"><mml:mi>Z</mml:mi></mml:math></inline-formula>). However, inconsistent notation among project partners (e.g. different abbreviations for locations or layers) introduced the risk of ambiguous or conflicting identifiers. Upon database entry, these informal references were systematically converted into unambiguous, machine-readable relationships through the use of unique sample identifiers and automated validation, which ensured the existence of the referenced entities and consistency. This process eliminated ambiguities while preserving the original contextual information in a queryable, FAIR-compliant format.</p>
      <p id="d2e688">Referential integrity and machine-readable relationships are required to flexibly filter entities based on their direct and indirect relationships. For instance, all grain size measurements are uniquely assigned to a sample. As the sample is also assigned to a location, it is possible to filter the measurements by location, campaign or project (Fig. <xref ref-type="fig" rid="F5"/>). Since the interface returns key parameters such as the “sample concentration” of a grain size measurement, i.e. if the sample's concentration was within the target range when measured, it helps users navigate the constantly evolving data directly, assessing the progress of the analysis and monitoring the data quality.</p>

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e695">An excerpt from the database showing grain size measurements in the user interface. In addition to the location and project assignment, the color of the measured quantity (sample concentration) indicates wether the value was within the accepted range for this property. Referential integrity and machine-readable relationships allow users to filter entities flexibly based on their direct and indirect relationships. For example, an analysis can be filtered by the location, campaign or project of its sample.</p></caption>
            <graphic xlink:href="https://gi.copernicus.org/articles/15/165/2026/gi-15-165-2026-f05.png"/>

          </fig>

</sec>
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Applying OLAP</title>
      <p id="d2e714">An OLAP database, specifically its denormalised star schema, played a key role in further simplifying access to the vast and complex dataset. The schema enabled us to derive sub-datasets quickly and efficiently within a structure tailored to the requirements for a specific analysis. This allowed for significantly more flexibility in the project's exploratory phase, as manual data integration and transformation were almost entirely eliminated. While creating a view of the grain size measurements in the relational database still required several joins, the star schema in OLAP allowed the query to be reduced to a simple select statement. While not all workflows were fully automated (Sect. <xref ref-type="sec" rid="Ch1.S4.SS4"/>), this approach still accelerated critical analyses, enabling faster iteration and deeper insights.</p>
      <p id="d2e719">A key advantage was that derived data sets for further analysis were not generated ad hoc, but rather had a clearly defined and reproducible data structure. This meant that sub-records could be regenerated as required, even when the underlying data evolved due to new measurements or corrections. This decoupling of processes meant that the researcher could begin analyses, such as writing and testing code, while additional laboratory data was still being generated. Consequently, the project timeline was significantly shortened and iterative improvements became possible without delay.</p>
</sec>
<sec id="Ch1.S4.SS4">
  <label>4.4</label><title>Applying data pipelines</title>
      <p id="d2e730">Data pipelines were specifically designed and implemented to automate complex, recurring data workflows in code. This ensured reproducibility, minimised manual errors and was crucial given the project's specific challenges.
          <def-list>
            <def-item><term>Ingestion pipelines</term><def>

      <p id="d2e739">While the OLTP already automated the processing of laser diffraction raw data, custom pipelines extended the system's capabilities by directly ingesting raw measurement files from various laboratory devices, such as XRF spectrometers used for geochemical analyses, into the relational database. The focus of these pipelines was on capturing, parsing and storing raw data with minimal transformation, thereby ensuring full traceability and data integrity from the source. Supporting a variety of input formats (e.g. CSV and proprietary binary files) enabled the seamless integration of heterogeneous laboratory datasets.</p>
            </def></def-item>
            <def-item><term>ETL pipelines</term><def>

      <p id="d2e748">ETL pipelines extracted data from the relational database and applied transformations for data handling and validation, such as unit normalisation, outlier detection and referential integrity checks, before loading the processed data into an OLAP database. This step was crucial in preparing the datasets for complex analytical queries and ad hoc exploration by structuring the data into dimensions and facts optimised for OLAP operations.</p>
            </def></def-item>
            <def-item><term>Post-ETL pipelines</term><def>

      <p id="d2e757">Post-ETL pipelines processed the data from the OLAP database further to generate analysis-ready datasets, including feature-engineered tables and normalised matrices. These pipelines not only enabled the use of advanced analytical techniques but also automated full analytical workflows. Analyses such as texture classification, principal component analysis (PCA), cluster analysis, and interactive visualisations were automated directly as data pipelines within the data orchestration, decoupling data management from data analysis. For instance, K-means clustering was applied to geochemical fingerprints to identify distinct stratigraphic layers or flag anomalies in sample sequences. Through continuous data integration and iterative model retraining, these pipelines gradually improved the accuracy of their analyses, revealing emerging patterns, such as spatial correlations in sediment layer sequences and chronostratigraphic trends in geochemical signatures.</p>
            </def></def-item>
          </def-list>
          This dynamic process empowered data-driven research decisions from the project's earliest phases, such as targeted prioritisation of laboratory analyses, ensuring that insights became more precise and reliable as the dataset evolved.</p>
</sec>
</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Discussion</title>
<sec id="Ch1.S5.SS1">
  <label>5.1</label><title>Positioning the framework within the research data ecosystem</title>
      <p id="d2e780">Databases have been well-established technologies and applications in the geosciences for decades. They are used as laboratory information systems, repositories, and data catalogues. In contrast to geoscience laboratory information systems such as AusGeochem, StraboSpot, and Sparrow, repositories and data catalogues are mainly used for publishing or compiling published data <xref ref-type="bibr" rid="bib1.bibx19" id="paren.52"/>. However, scientific practice shows that, although scientists apply relational principles, for pragmatic reasons, they resort to ad hoc solutions that are not technically designed as databases. These primarily include spreadsheets or collections of text files <xref ref-type="bibr" rid="bib1.bibx39" id="paren.53"/>. Thus, existing technologies and actual scientific practice are not always capable of addressing contemporary challenges in research data management holistically, as they focus on isolated stages of the data lifecycle. This reality reveals a critical blind spot between large, formal infrastructures and the dynamic, iterative nature of local research data management.</p>
      <p id="d2e789">While large geoscientific projects often have designated database systems (e.g., <xref ref-type="bibr" rid="bib1.bibx43" id="altparen.54"/>), they primarily aim to archive and publish project-related data for long-term preservation. Though they might also provide an integrated database and infrastructure to facilitate and support research within these projects <xref ref-type="bibr" rid="bib1.bibx4" id="paren.55"/>, there remains a blind spot between large inter-organisational infrastructures and local research data management. Recent approaches, such as LinkAhead, adopt a more agile, holistic perspective on research data management that encompasses the entire research data lifecycle. It achieves this by employing a flexible data model that enables dynamic linking of all research entities (e.g., files, samples, processes) to track their provenance and relationships <xref ref-type="bibr" rid="bib1.bibx16" id="paren.56"/>. While this strategy provides flexibility for mapping the complex web of the research process, our framework deliberately adopts a different philosophy. It prioritises data integrity and analytical performance. Instead of a flexible, linkage-focused model, our framework is built on a strict relational OLTP database that enforces consistency at the point of entry via a predefined, yet extensible, schema. This initial investment in structure is a prerequisite for creating performant, on-demand analytical databases and ensures a consistent foundation for quantitative analysis with recent demanding computational methods, such as Machine Learning and Deep Learning. Our framework thus fits seamlessly into the existing research data management ecosystem and closes the gap between structured local archiving, high-performance and transparent analysis, and preparation for ingestion into large, established repositories or scientific workflow systems.</p>
</sec>
<sec id="Ch1.S5.SS2">
  <label>5.2</label><title>Addressing challenges in data management</title>
      <p id="d2e809">Through its holistic approach, which builds on the positioning outlined in the previous section, our framework addresses a wide range of contemporary key challenges in the management of research data, including data interoperability, reusability and preservation, data heterogeneity, and workflow complexity, as illustrated in the Toboliu project.</p>
<sec id="Ch1.S5.SS2.SSS1">
  <label>5.2.1</label><title>Interoperability and reusability</title>
      <p id="d2e819">Interoperability is “the ability of data or tools from non-cooperating resources to integrate or work together with minimal effort” <xref ref-type="bibr" rid="bib1.bibx42" id="paren.57"/>. The challenge involves integrating independently designed information spaces to enable consistent analysis, considering various levels of interoperability: Syntactic interoperability guarantees that systems can agree on shared syntactic formats for processing symbols <xref ref-type="bibr" rid="bib1.bibx14" id="paren.58"/>, while semantic interoperability refers to the consistent understanding of the information's context and meaning <xref ref-type="bibr" rid="bib1.bibx36" id="paren.59"/>. However, as described above, researchers often resort to local ad hoc solutions, such as individual spreadsheets and text files <xref ref-type="bibr" rid="bib1.bibx39" id="paren.60"/>. Furthermore, a significant limitation of established data management systems is that scientific data is stored in file systems that are separate from the metadata, which is stored in a database. This physical separation limits the ability to automatically access, archive, analyse and mine scientific data <xref ref-type="bibr" rid="bib1.bibx24" id="paren.61"/>. Accordingly, interoperability often fails due to challenges that precede semantic interoperability. Our proposed framework establishes the fundamental prerequisites for semantic interoperability by relying on mature, architecturally coherent approaches. By overcoming the structural separation of data and metadata from the ground up, a sustainable, future-proof foundation is created on which true semantic interoperability can be built.</p>
      <p id="d2e837">This leads to the framework's central objective: It enhances reusability by optimising the entire research data management process to ensure internal interoperability, thereby facilitating the combination of data and its analysis. It offers an exhaustive h context for each data point by capturing the entire research data lifecycle, from generation and transformation to storage, analysis, and reuse. A core strength lies in metadata validation, which minimises manual effort and is crucial for ensuring the long-term reusability of datasets. The FAIR principles advocate for machine-actionability and reusability for both humans and machines, relying on persistent identifiers and standardised metadata <xref ref-type="bibr" rid="bib1.bibx42" id="paren.62"/>. Our framework's use of machine-readable standards and detailed, automated metadata generation aligns with these foundational tenets. Generating metadata during the research process, rather than relying on post hoc documentation, marks a substantial step towards making reproducibility the norm rather than a difficult task. By automatically documenting detailed provenance through formalising data storage and workflows in code, our framework aligns with best practices, such as the FAIR Data Pipeline <xref ref-type="bibr" rid="bib1.bibx26" id="paren.63"/>. In our case study, this included creating pipelines adapted for specific laboratory devices and computational methods. While this investment improved all aspects of data handling, it also limited its ease of application in other environments. While this trade-off was justified by these enhancements, we recognise the need for a technical and semantic connection to external systems to further improve the reusability of the data we process and the replicability of the pipelines we apply. For publishing, the research context, including data and metadata, could be packaged into Research Object Crates (RO-Crates) to enable machine-readability <xref ref-type="bibr" rid="bib1.bibx34" id="paren.64"/>. Specific extensions such as <italic>Workflow Run RO-Crate</italic> record the provenance of computing workflows in detail <xref ref-type="bibr" rid="bib1.bibx23" id="paren.65"/>. This approach enables the creation and use of FAIR research archives <xref ref-type="bibr" rid="bib1.bibx34" id="paren.66"/>. While these robust technical frameworks handle the methods and structures of data exchange effectively, they can not address the underlying meaning. Consequently, significant semantic differences in research require community-led initiatives to develop and align semantic schemas <xref ref-type="bibr" rid="bib1.bibx22 bib1.bibx19" id="paren.67"/>.</p>
</sec>
<sec id="Ch1.S5.SS2.SSS2">
  <label>5.2.2</label><title>Workflow complexity</title>
      <p id="d2e870">Our framework addresses the complexity of geoscientific workflows by modelling the entire research data lifecycle, including data storage and data flow. At a technical level, data orchestration is implemented using the Dagster framework to automate Python pipelines that integrate data storage, processing and analysis. This approach was chosen to provide researchers with the performance and flexibility of a full programming language, a prerequisite for many machine learning algorithms and data-intensive tasks. To combine this flexibility and performance with standardised portability, a future extension could enable the export of these pipeline definitions to the Common Workflow Language (CWL). This would ensure that pipelines designed on the platform are portable and can also be executed in external, CWL-compatible environments (see <xref ref-type="bibr" rid="bib1.bibx3" id="altparen.68"/>).</p>
      <p id="d2e876">For instance, in the Toboliu project, the automated processing of raw data, such as particle size measurements, accelerated re-measurement of samples if the direct feedback on measurement quality indicated any issues. Integrating the database throughout the research process enabled us to create method-specific data views (e.g. grain size analysis, clustering and PCA), that are always based on the most up-to-date version of the data. Our framework addresses the limits of relational databases, as described by <xref ref-type="bibr" rid="bib1.bibx18" id="text.69"/>, by logically separating data management and data analysis through the introduction of denormalised OLAP databases. However, unlike their proposed solution of a data warehouse, we do not consider OLAP to be independent; rather, we consider it to be integrated with the OLTP through ETL pipelines.  Rather than generating static datasets for each analysis, we define the necessary data structures that draw directly from the most recent dataset with every execution. This approach not only allows researchers to focus on analysis rather than manual data integration but, as demonstrated in the Toboliu project, it enables true parallel work between the laboratory and data analysis, significantly accelerating the research process. The automated processing of the measurements enabled early insights in the data, which helped to make strategic decisions on measuring additional samples in an early stage. The ETL pipelines virtually eliminated the need for manual data transformation. The integration of all code-based analyses, such as grain size analysis, depth plots, principal components analysis and cluster analysis, into the post-ETL pipelines enabled code versioning and significantly simplified and accelerated the process.  The researcher could start data analysis while laboratory work was still in progress and test it on incomplete datasets.</p>
</sec>
<sec id="Ch1.S5.SS2.SSS3">
  <label>5.2.3</label><title>Data heterogeneity</title>
      <p id="d2e890">Challenges of data heterogeneity, as identified e.g. by <xref ref-type="bibr" rid="bib1.bibx30" id="text.70"/>, were also evident in the Toboliu project. For instance, Toboliu's heterogeneous, high-dimensional data from sedimentological, geochemical, and dating analyses were prone to inconsistencies due to their distribution across numerous files and irregular updates. Plotting grain size composition along a drill core depth required manual integration of tachymeter geodata, sample and stratigraphic unit information (field documentation), and derived measurement results. Our framework's unique focus on automating data linking and contextualisation significantly reduces the manual effort required by researchers, making it more practical for diverse datasets. Before applying our framework, corrections or remeasurements necessitated repeating the entire data integration and transformation process, increasing the risk of manual errors. By viewing databases as a system that accompanies the entire research process rather than as the final destination for finalised data, we have been able to maintain data integrity from the outset and identify errors immediately. In contrast to <xref ref-type="bibr" rid="bib1.bibx30" id="text.71"/>, our model explicitly considers fieldwork as the beginning of the research data lifecycle and a crucial link between all entities.</p>
</sec>
<sec id="Ch1.S5.SS2.SSS4">
  <label>5.2.4</label><title>Data preservation</title>
      <p id="d2e908">Beyond interoperability for publishing, the framework's structured approach provides a critical defence against the permanent loss of valuable research data. Ultimately, consistently structured recording and detailed documentation of the entire process are the most effective protection. For example, in Toboliu, significantly more data was collected and generated than was ultimately necessary to answer the research questions. Ultimately, only a subset of the 15 drill cores was relevant for the detailed analysis and description of landscape development. The remaining cores were found to be redundant for achieving the research project's objectives at this stage. Much of the data, which was collected at great expense and processed in the laboratory, has therefore not yet been published, even though it may become scientifically relevant in the future. Thanks to the structured storage and detailed documentation of the processing, the data is not only preserved and retrievable but also findable, accessible, interoperable, and reusable within the organisation. Thus, our framework prevents “data cemeteries”, as highlighted by <xref ref-type="bibr" rid="bib1.bibx12" id="text.72"/>, by standardising data management and storage throughout the entire research data lifecycle.</p>
</sec>
</sec>
<sec id="Ch1.S5.SS3">
  <label>5.3</label><title>Applicability of the framework: Implications for other projects and academic institutions</title>
      <p id="d2e923">Beyond its original scope, the framework has already demonstrated its modularity and scalability within our organisation. It is now used to manage and integrate datasets across multiple projects. It has also enabled the standardisation, long-term preservation and accessibility of legacy projects, ensuring compliance with FAIR principles and safeguarding valuable research assets. These capabilities facilitate interdisciplinary collaboration and support future usability of valuable research data.</p>
      <p id="d2e926">While designed and implemented as a generic approach in compliance with the FAIR principles, we adapted the implementation to the needs of our researchers, the technical requirements and limitations of our IT infrastructure and research environment, and the anticipated future uses of the framework, such as machine learning. Its successful application in Toboliu shows that modest investments in data infrastructure can significantly improve research efficiency. This highlights that exclusive investment only in software products is insufficient for complex modern research environments. Therefore, strategic investments in dedicated data engineers to orchestrate complex data flows are essential to unlock the full value of research data. Ultimately, we want to emphasise that our framework is intended to promote a shift in the technical and organisational implementation of research data management in the geosciences, rather than providing a fully functioning end-user application or data models.</p>
</sec>
</sec>
<sec id="Ch1.S6" sec-type="conclusions">
  <label>6</label><title>Conclusions</title>
      <p id="d2e938">This study presented a geoscientific research data framework that integrates data storage and orchestration to address challenges related to the increasing volume and complexity of geoscientific datasets. It focuses on data heterogeneity, spatial complexities, and adherence to FAIR principles (Findable, Accessible, Interoperable, Reusable), transforming raw data into scientific knowledge. Our framework demonstrates, by its successful application in the Toboliu project, that it <list list-type="bullet"><list-item>
      <p id="d2e943">enhances integration of high-dimensional datasets,</p></list-item><list-item>
      <p id="d2e947">streamlines data management,</p></list-item><list-item>
      <p id="d2e951">improves replicability and reproducibility,</p></list-item><list-item>
      <p id="d2e955">promotes FAIR principles, scalability, and transparency and</p></list-item><list-item>
      <p id="d2e959">benefits small research projects with limited resources by simplifying adherence to data management requirements.</p></list-item></list> Although sustained institutional investment in IT infrastructure and expertise is necessary for long-term scalability and sustainability, the framework's proven efficiency and adaptability provide academic institutions with a clear pathway to increasing scientific impact and expanding interdisciplinary collaboration.</p>
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <label>Appendix A</label><title>Data Model</title>
      <p id="d2e975">The following tables depict the concrete implementation of the entity types in the OLTP module at the time of this paper's publication, along with their attributes, which take up and expand on the conceptual framework proposed by <xref ref-type="bibr" rid="bib1.bibx30" id="text.73"/> as described in Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>.  Attributes that refer to other objects, such as foreign keys or one-to-one and many-to-many relationships, are marked in italics. Django's generic models and m:n-tables are not marked separately. For clarity, detailed descriptions of each attribute have been omitted. This also applies to inherited attributes. Both can be viewed in the repository's source code, which has been published separately and linked.</p>
      <p id="d2e983">The OLTP's Django project, and thus the data model, is divided into several modular applications: analysis, bibliography, field_data, and laboratory. Additional Django models are defined at the project level (Base).</p><table-wrap id="TA1"><label>Table A1</label><caption><p id="d2e990">Base.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="11.8cm"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Table</oasis:entry>
         <oasis:entry colname="col2" align="left">Attributes</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">BaseModel</oasis:entry>
         <oasis:entry colname="col2" align="left">created_at, <italic>created_by</italic>, <italic>updated_by</italic></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">ResearchGroup</oasis:entry>
         <oasis:entry colname="col2" align="left">label, <italic>head_of_group</italic>, <italic>auth_group</italic></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Researcher</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>user</italic>, academic_rank, position, orcid</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Project</oasis:entry>
         <oasis:entry colname="col2" align="left">title, subtitle, label, <italic>principal_investigator</italic>, <italic>associated_investigator</italic>, <italic>research_group</italic>, <italic>parent</italic>, start_date, deadline, description, status, public</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">ProjectUserObjectPermission</oasis:entry>
         <oasis:entry colname="col2" align="left">content_object</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d2e993">At the project level, there are five models that are not assigned to any of the four apps. Defining research groups, researchers and projects enables the basic structuring of the stored data and access control. The <italic>ProjectUserObjectPermission</italic> model enables access control to project-related data. The BaseModel defines the basic properties inherited by all other models: created_at, created_by and updated_by. This allows tracking the creation and modification of each object in the database.</p></table-wrap-foot></table-wrap>

<table-wrap id="TA2"><label>Table A2</label><caption><p id="d2e1097">Analysis.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="13cm"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Table</oasis:entry>
         <oasis:entry colname="col2" align="left">Attributes</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Algorithm</oasis:entry>
         <oasis:entry colname="col2" align="left">name, version, description, link, file, programming_language</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">RawMeasurement</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>project</italic>, <italic>sample</italic>, <italic>device</italic>, <italic>accessories</italic>, <italic>researcher</italic>, file, description</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">RawProcessing</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>raw_measurement</italic>, processed_file, processing_description, <italic>processed_by</italic>, processing_date, <italic>preparation_algorithm</italic>, <italic>evaluation_algorithm</italic>, <italic>publication</italic></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Counting</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>sample</italic>, <italic>raw_data</italic>, type</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Pollen</oasis:entry>
         <oasis:entry colname="col2" align="left">name, token, name_en, name_ger, name_nor</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">PollenCount</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>counting</italic>, <italic>pollen</italic>, number</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">LuminescenceDating</oasis:entry>
         <oasis:entry colname="col2" align="left">laboratory_id, <italic>sample</italic>, <italic>raw_data</italic>, sample_id_cll, mineral, dating_approach, luminescence_age, age_error, signal, protocol, palaeodose_value, palaeodose_error, dose_rate, dose_rate_error, published, year_of_publication, thesis, comments, grain_size_min, grain_size_max, aliquot_size, aliquot_number_used_for_palaeodose, od_percent, od_percent_error, od_gy, od_gy_error, age_model, beta_source_calibration, instrumental_beta_source_error, uncertainty_beta_source_calibration, fading_correction, g_value, g_value_error, lnat_lsat_ratio, dose_rate_measurement_technique, dose_rate_calculation_software, u_ppm, u_ppm_error, th_ppm, th_ppm_error, k_percent, k_percent_error, water_content_for_dating, water_content_for_dating_error, a_value, a_value_error, alpha_dose_rate, alpha_dose_rate_error, beta_dose_rate, beta_dose_rate_error, gamma_dose_rate, gamma_dose_rate_error, cosmic_dose_rate, cosmic_dose_rate_error</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">RadiocarbonDating</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>sample</italic>, <italic>raw_data</italic>, lab, lab_id, age, error</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Parameter</oasis:entry>
         <oasis:entry colname="col2" align="left">name, token, unit, minimal_limit, maximal_limit, classes</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">MeasurementSeries</oasis:entry>
         <oasis:entry colname="col2" align="left">datetime</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">GenericMeasurement</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>sample</italic>, <italic>raw_data</italic>, <italic>measurement_series</italic>, sample_weight, <italic>method</italic>, <italic>parameter</italic>, value, error</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">GrainSize</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>sample</italic>, <italic>raw_data</italic>, sample_weight, sample_concentration, method, classes, measured_data, clay, fine_silt, medium_silt, coarse_silt, fine_sand, medium_sand, coarse_sand, mean, mode, median, std, skew, kurtosis, fwmean, fwmedian, fwsd, fwskew, fwkurt</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">MicroXRFMeasurement</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>sample</italic>, measurement_date, <italic>method</italic>, notes</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">MicroXRFElementMap</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>measurement</italic>, element, raster_file, unit</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="TA3"><label>Table A3</label><caption><p id="d2e1331">Bibliography.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="11.6cm"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Table</oasis:entry>
         <oasis:entry colname="col2" align="left">Attributes</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Author</oasis:entry>
         <oasis:entry colname="col2" align="left">last_name, first_name, <italic>user</italic></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">ReferenceKeyword</oasis:entry>
         <oasis:entry colname="col2" align="left">keyword, keyword_ger</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Reference</oasis:entry>
         <oasis:entry colname="col2" align="left">title, year, published, <italic>parent_publication</italic>, <italic>lead_author</italic>, <italic>second_author</italic>, <italic>supervisor</italic>, abstract, journal, volume, number, pages, publisher, location_of_publication, type, <italic>project</italic>, doi, issn, isbn_print, isbn_online, how_to_cite, <italic>keywords</italic></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="TA4"><label>Table A4</label><caption><p id="d2e1409">Field Data.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="14cm"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Table</oasis:entry>
         <oasis:entry colname="col2" align="left">Attributes</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Country</oasis:entry>
         <oasis:entry colname="col2" align="left">name, iso_code, geometry</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Province</oasis:entry>
         <oasis:entry colname="col2" align="left">name, <italic>country</italic>, geometry</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Tag</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>content_type</italic>, word, slug, <italic>project</italic></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">SampleType</oasis:entry>
         <oasis:entry colname="col2" align="left">word, label</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">StudyArea</oasis:entry>
         <oasis:entry colname="col2" align="left">label, <italic>project</italic>, <italic>province</italic>, geometry, climate_koeppen, ecozone_schultz</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Site</oasis:entry>
         <oasis:entry colname="col2" align="left">label, <italic>study_area</italic>, <italic>tags</italic></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">campaign</oasis:entry>
         <oasis:entry colname="col2" align="left">label, <italic>project</italic>, date_start, date_end, <italic>destination_country</italic>, <italic>study_areas</italic>, season</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Transect</oasis:entry>
         <oasis:entry colname="col2" align="left">identifier, <italic>study_area</italic>, <italic>campaign</italic>, description, multiline</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">ExposureType</oasis:entry>
         <oasis:entry colname="col2" align="left">main_type, abbreviation, name_ger, name_en</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Location</oasis:entry>
         <oasis:entry colname="col2" align="left">data_source, <italic>campaign</italic>, identifier, <italic>project</italic>, <italic>reference</italic>, date_of_record, easting, northing, srid, location, altitude, <italic>study_site</italic>, <italic>transect</italic>, <italic>processor</italic>, exposure_type, line, sampling, gradient_upslope, gradient_downslope, slope_aspect, relief_description, current_weather_conditions, past_weather_conditions, tags</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Layer</oasis:entry>
         <oasis:entry colname="col2" align="left"><italic>location</italic>, identifier, token, description, depth_top, depth_bottom, structure, fine_soil_field, munsell_hue_value, munsell_hue, munsell_value, munsell_chroma, calcite, secondary_calcite, <italic>tags</italic></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Sample</oasis:entry>
         <oasis:entry colname="col2" align="left">identifier, igsn, <italic>project</italic>, date, <italic>location</italic>, <italic>processor</italic>, <italic>parent</italic>, description, material, <italic>layer</italic>, weight, depth_top, depth_bottom, <italic>type</italic>, <italic>tags</italic></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="TA5"><label>Table A5</label><caption><p id="d2e1627">Laboratory.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Table</oasis:entry>
         <oasis:entry colname="col2">Attributes</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Manufacturer</oasis:entry>
         <oasis:entry colname="col2">name, website</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Device</oasis:entry>
         <oasis:entry colname="col2">name, description, token, <italic>manufacturer</italic></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Accessory</oasis:entry>
         <oasis:entry colname="col2"><italic>device</italic>, name, description</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">AccessoryParamter</oasis:entry>
         <oasis:entry colname="col2">method, accessory, parameter_name, parameter_value, parameter_unit</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Method</oasis:entry>
         <oasis:entry colname="col2">name, description, token, <italic>device</italic>, category, laboratory, available</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">calibration</oasis:entry>
         <oasis:entry colname="col2"><italic>device</italic>, date, <italic>researcher</italic>, remarks</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Firmware</oasis:entry>
         <oasis:entry colname="col2"><italic>device</italic>, version, installation_date, changelog</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>


</app>
  </app-group><notes notes-type="codeavailability"><title>Code availability</title>

      <p id="d2e1738">The Python source code for the OLTP module is publicly available on GitHub (<uri>https://github.com/Cologne-Geomorphological-Software-Lab/CGDB</uri>, last access: 14 May 2026) under a permissive MIT license. A persistent, citable version of the repository corresponding to this publication has been archived on Zenodo: <ext-link xlink:href="https://doi.org/10.5281/zenodo.17869730" ext-link-type="DOI">10.5281/zenodo.17869730</ext-link> <xref ref-type="bibr" rid="bib1.bibx15" id="paren.74"/>.</p>
  </notes><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d2e1753">The data presented in the case study serve an illustrative purpose of the database. These data are part of ongoing research projects and will be published alongside their corresponding research articles. Until then, requests for data access can be directed to the corresponding author.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e1759">DH, MM, and TR designed the research objectives and outline of the project. TR and MM provided resources and supervised the project. DH and MM conceptualized and implemented the framework. DH and MZ gathered present challenges in geoscientific research data management and provided data for the Toboliu case study. DH prepared the paper draft  and all authors contributed to writing, reviewing, and editing the manuscript.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e1765">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e1771">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e1777">We thank Dr. Katja Sperveslage for organizational support, and Dr. Stephan Opitz, Marie Gröbner, Dr. Dominik Brill, Dr. Anja Zander, and Florian Steininger for technical assistance. We are grateful to Professor Christina Bogner, Nicodemus Nyamari, and Felix Reize for test datasets, Christina Stollenwerk and Julia Rauchalles for system testing, and the IT staff for technical support. Special thanks to external IT experts Mark Handy and Thomas Schmidt for their critical reviews.</p><p id="d2e1779">To ensure a high standard of communication, we used AI tools to improve the grammar, style, clarity and readability of this manuscript.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e1785">This project was funded by the Key Profile Area “Intelligent Methods for Earth System Sciences” at the University of Cologne (grant number 006). The case study in Toboliu was funded by the Deutsche Forschungsgemeinschaft under project number 436834905. This open-access publication  was funded by Universität zu Köln.</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e1796">This paper was edited by Lev Eppelbaum and reviewed by C. Kristina Rossavik and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Chaudhuri and Dayal(1997)</label><mixed-citation>Chaudhuri, S. and Dayal, U.: An Overview of Data Warehousing and OLAP Technology,  SIGMOD Rec., 26, 65–74, <ext-link xlink:href="https://doi.org/10.1145/248603.248616" ext-link-type="DOI">10.1145/248603.248616</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Codd(1970)</label><mixed-citation>Codd, E. F.: A Relational Model of Data for Large Shared Data Banks, Commun. ACM, 13, 377–387, <ext-link xlink:href="https://doi.org/10.1145/362384.362685" ext-link-type="DOI">10.1145/362384.362685</ext-link>, 1970.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Crusoe et al.(2022)</label><mixed-citation>Crusoe, M. R., Abeln, S., Iosup, A., Amstutz, P., Chilton, J., Tijanić, N., Ménager, H., Soiland-Reyes, S., Gavrilović, B., Goble, C., and Community, T. C.: Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language, Commun. ACM, 65, 54–63, <ext-link xlink:href="https://doi.org/10.1145/3486897" ext-link-type="DOI">10.1145/3486897</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Curdt et al.(2019)</label><mixed-citation>Curdt, C., Hoffmeister, D., Kramm, T., Lang, U., and Bareth, G.: Etablierung von Forschungsdatenmanagement-Services in geowissenschaftlichen Sonderforschungsbereichen am Beispiel des SFB/Transregio 32, SFB 1211 und SFB/ Transregio 228, <ext-link xlink:href="https://doi.org/10.17192/BFDM.2019.2.8103" ext-link-type="DOI">10.17192/BFDM.2019.2.8103</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>De Castro et al.(2023)</label><mixed-citation>De Castro, P., Herb, U., Rothfritz, L., and Schöpfel, J.: IGSN – Building and Expanding a Community-Driven PID System, Tech. rep., Zenodo, <ext-link xlink:href="https://doi.org/10.5281/zenodo.7330498" ext-link-type="DOI">10.5281/zenodo.7330498</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Degen et al.(2020)</label><mixed-citation>Degen, D., Veroy, K., and Wellmann, F.: Certified Reduced Basis Method in Geosciences: Addressing the Challenge of High-Dimensional Problems, Computat. Geosci., 24, 241–259, <ext-link xlink:href="https://doi.org/10.1007/s10596-019-09916-6" ext-link-type="DOI">10.1007/s10596-019-09916-6</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Deutsche Forschungsgemeinschaft(2022)</label><mixed-citation>Deutsche Forschungsgemeinschaft: Guidelines for Safeguarding Good Research Practice. Code of Conduct, Zenodo, <ext-link xlink:href="https://doi.org/10.5281/zenodo.6472827" ext-link-type="DOI">10.5281/zenodo.6472827</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>European Commission et al.(2018)</label><mixed-citation>European Commission, Directorate General for Research and Innovation, and PwC EU Services: Cost-Benefit Analysis for FAIR Research Data: Cost of  Not Having FAIR Research Data, Publications Office, LU, <ext-link xlink:href="https://doi.org/10.2777/02999" ext-link-type="DOI">10.2777/02999</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Faniel and Jacobsen(2010)</label><mixed-citation>Faniel, I. M. and Jacobsen, T. E.: Reusing Scientific Data: How Earthquake Engineering Researchers Assess the Reusability of Colleagues' Data, Comp. Supp. Coop. Wor., 19, 355–375, <ext-link xlink:href="https://doi.org/10.1007/s10606-010-9117-8" ext-link-type="DOI">10.1007/s10606-010-9117-8</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Fitschen et al.(2019)</label><mixed-citation>Fitschen, T., Schlemmer, A., Hornung, D., Tom Wörden, H., Parlitz, U., and Luther, S.: CaosDB – Research Data Management for Complex, Changing, and Automated Research Workflows, Data, 4, 83, <ext-link xlink:href="https://doi.org/10.3390/data4020083" ext-link-type="DOI">10.3390/data4020083</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Gahegan(2020)</label><mixed-citation>Gahegan, M.: Fourth Paradigm GIScience? Prospects for Automated Discovery and Explanation from Data, Int. J. Geogr. Inf. Sci., 34, 1–21, <ext-link xlink:href="https://doi.org/10.1080/13658816.2019.1652304" ext-link-type="DOI">10.1080/13658816.2019.1652304</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Gärtner et al.(2001)</label><mixed-citation>Gärtner, H., Bergmann, A., and Schmidt, J.: Object-Oriented Modeling of Data Sources as a Tool for the Integration of Heterogeneous Geoscientific Information, Comput. Geosci., 27, 975–985, <ext-link xlink:href="https://doi.org/10.1016/S0098-3004(00)00135-7" ext-link-type="DOI">10.1016/S0098-3004(00)00135-7</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Glaser et al.(2020)</label><mixed-citation>Glaser, B., Kienlin, T., Röpke, A., and Deutsche Forschungsgemeinschaft: Separiert Oder Integriert? Studien Zur Entwicklung, Organisation Und Sozialen Struktur Der Komplexen Bronzezeitlichen Tellsiedlung von Toboliu, Westrumänien. Teil 2: Naturwissenschaftliche Untersuchungen, <uri>https://gepris.dfg.de/gepris/projekt/436834905</uri> (last access: 14 May 2026), 2020.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Guizzardi(2020)</label><mixed-citation>Guizzardi, G.: Ontology, Ontologies and the “I” of FAIR, Data Intelligence, 2, 181–191, <ext-link xlink:href="https://doi.org/10.1162/dint_a_00040" ext-link-type="DOI">10.1162/dint_a_00040</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Handy and van der Meij(2026)</label><mixed-citation>Handy, D. and van der Meij, M.: Cologne-Geomorphological-Software-Lab/CGDB: Implemented Dagster Boilerplate (v1.1.0), Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.17869730" ext-link-type="DOI">10.5281/zenodo.17869730</ext-link>, 2026.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Hornung et al.(2024)</label><mixed-citation>Hornung, D., Spreckelsen, F., and Weiß, T.: Agile Research Data Management with Open Source: LinkAhead, ing.grid, 1, <ext-link xlink:href="https://doi.org/10.48694/INGGRID.3866" ext-link-type="DOI">10.48694/INGGRID.3866</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>IUSS Working Group (WRB)(2022)</label><mixed-citation> IUSS Working Group (WRB): World Reference Base for Soil Resources. International Soil Classification System for Naming Soils and Creating Legends for Soil Maps, 4th edn., International Union of Soil Sciences (IUSS), Vienna, Austria, ISBN 979-8-9862451-1-9, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Kingdon et al.(2016)</label><mixed-citation>Kingdon, A., Nayembil, M. L., Richardson, A. E., and Smith, A. G.: A Geodata Warehouse: Using Denormalisation Techniques as a Tool for Delivering Spatially Enabled Integrated Geological Information to Geologists, Comput. Geosci., 96, 87–97, <ext-link xlink:href="https://doi.org/10.1016/j.cageo.2016.07.016" ext-link-type="DOI">10.1016/j.cageo.2016.07.016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Klöcking et al.(2023)</label><mixed-citation>Klöcking, M., Wyborn, L., Lehnert, K. A., Ware, B., Prent, A. M., Profeta, L., Kohlmann, F., Noble, W., Bruno, I., Lambart, S., Ananuer, H., Barber, N. D., Becker, H., Brodbeck, M., Deng, H., Deng, K., Elger, K., De Souza Franco, G., Gao, Y., Ghasera, K. M., Hezel, D. C., Huang, J., Kerswell, B., Koch, H., Lanati, A. W., Ter Maat, G., Martínez-Villegas, N., Nana Yobo, L., Redaa, A., Schäfer, W., Swing, M. R., Taylor, R. J., Traun, M. K., Whelan, J., and Zhou, T.: Community Recommendations for Geochemical Data, Services and Analytical Capabilities in the 21st Century, Geochim. Cosmochim. Ac., 351, 192–205, <ext-link xlink:href="https://doi.org/10.1016/j.gca.2023.04.024" ext-link-type="DOI">10.1016/j.gca.2023.04.024</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Klump et al.(2021)</label><mixed-citation>Klump, J., Lehnert, K., Ulbricht, D., Devaraju, A., Elger, K., Fleischer, D., Ramdeen, S., and Wyborn, L.: Towards Globally Unique Identification of Physical Samples: Governance and Technical Implementation of the IGSN Global Sample Number, Data Science Journal, 20, 33, <ext-link xlink:href="https://doi.org/10.5334/dsj-2021-033" ext-link-type="DOI">10.5334/dsj-2021-033</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Kottek et al.(2006)</label><mixed-citation>Kottek, M., Grieser, J., Beck, C., Rudolf, B., and Rubel, F.: World Map of the Köppen-Geiger Climate Classification Updated, Meteorol. Z., 15, 259–263, <ext-link xlink:href="https://doi.org/10.1127/0941-2948/2006/0130" ext-link-type="DOI">10.1127/0941-2948/2006/0130</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Lannom et al.(2020)</label><mixed-citation>Lannom, L., Koureas, D., and Hardisty, A. R.: FAIR Data and Services in Biodiversity Science and Geoscience, Data Intelligence, 2, 122–130, <ext-link xlink:href="https://doi.org/10.1162/dint_a_00034" ext-link-type="DOI">10.1162/dint_a_00034</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Leo et al.(2024)</label><mixed-citation>Leo, S., Crusoe, M. R., Rodríguez-Navas, L., Sirvent, R., Kanitz, A., De Geest, P., Wittner, R., Pireddu, L., Garijo, D., Fernández, J. M., Colonnelli, I., Gallo, M., Ohta, T., Suetake, H., Capella-Gutierrez, S., De Wit, R., Kinoshita, B. P., and Soiland-Reyes, S.: Recording Provenance of Workflow Runs with RO-Crate, PLOS ONE, 19, e0309210, <ext-link xlink:href="https://doi.org/10.1371/journal.pone.0309210" ext-link-type="DOI">10.1371/journal.pone.0309210</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Li et al.(2015)</label><mixed-citation>Li, Z., Yang, C., Jin, B., Yu, M., Liu, K., Sun, M., and Zhan, M.: Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework, PLOS ONE, 10, e0116781, <ext-link xlink:href="https://doi.org/10.1371/journal.pone.0116781" ext-link-type="DOI">10.1371/journal.pone.0116781</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Liakos and Panagos(2022)</label><mixed-citation>Liakos, L. and Panagos, P.: Challenges in the Geo-Processing of Big Soil Spatial Data, Land, 11, 2287, <ext-link xlink:href="https://doi.org/10.3390/land11122287" ext-link-type="DOI">10.3390/land11122287</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Mitchell et al.(2022)</label><mixed-citation>Mitchell, S. N., Lahiff, A., Cummings, N., Hollocombe, J., Boskamp, B., Field, R., Reddyhoff, D., Zarebski, K., Wilson, A., Viola, B., Burke, M., Archibald, B., Bessell, P., Blackwell, R., Boden, L. A., Brett, A., Brett, S., Dundas, R., Enright, J., Gonzalez-Beltran, A. N., Harris, C., Hinder, I., David Hughes, C., Knight, M., Mano, V., McMonagle, C., Mellor, D., Mohr, S., Marion, G., Matthews, L., McKendrick, I. J., Mark Pooley, C., Porphyre, T., Reeves, A., Townsend, E., Turner, R., Walton, J., and Reeve, R.: FAIR Data Pipeline: Provenance-Driven Data Management for Traceable Scientific Workflows, Philos. T. Roy. Soc. A, 380, 20210300, <ext-link xlink:href="https://doi.org/10.1098/rsta.2021.0300" ext-link-type="DOI">10.1098/rsta.2021.0300</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Mork et al.(2015)</label><mixed-citation>Mork, R., Martin, P., and Zhao, Z.: Contemporary Challenges for Data-Intensive Scientific Workflow Management Systems, in: Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science, 1–11, ACM, Austin Texas, ISBN 978-1-4503-3989-6, <ext-link xlink:href="https://doi.org/10.1145/2822332.2822336" ext-link-type="DOI">10.1145/2822332.2822336</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Murillo(2019)</label><mixed-citation> Murillo, A. P.: Data Matters: How Earth and Environmental Scientists Determine Data Relevance and Reusability, Collection and Curation, 41, 77–86, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Nikparvar and Thill(2021)</label><mixed-citation>Nikparvar, B. and Thill, J.-C.: Machine Learning of Spatial Data, ISPRS Int. J. Geo-Inform., 10, 1–32, <ext-link xlink:href="https://doi.org/10.3390/ijgi10090600" ext-link-type="DOI">10.3390/ijgi10090600</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Nordsiek and Halisch(2024)</label><mixed-citation>Nordsiek, S. and Halisch, M.: Making geoscientific lab data FAIR: a conceptual model for a geophysical laboratory database, Geosci. Instrum. Method. Data Syst., 13, 63–73, <ext-link xlink:href="https://doi.org/10.5194/gi-13-63-2024" ext-link-type="DOI">10.5194/gi-13-63-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Picatto et al.(2024)</label><mixed-citation>Picatto, H., Heiler, G., and Klimek, P.: Cost-Effective Big Data Orchestration Using Dagster: A Multi-Platform Approach, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arxiv.2408.11635" ext-link-type="DOI">10.48550/arxiv.2408.11635</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Raasveldt and Mühleisen(2019)</label><mixed-citation>Raasveldt, M. and Mühleisen, H.: DuckDB: An Embeddable Analytical Database, in: Proceedings of the 2019 International Conference on Management of Data, 1981–1984, ACM, Amsterdam, the Netherlands, ISBN 978-1-4503-5643-5, <ext-link xlink:href="https://doi.org/10.1145/3299869.3320212" ext-link-type="DOI">10.1145/3299869.3320212</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Raj et al.(2020)</label><mixed-citation>Raj, A., Bosch, J., Olsson, H. H., and Wang, T. J.: Modelling Data Pipelines, in: 2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 13–20, IEEE, Portoroz, Slovenia, ISBN 978-1-7281-9532-2, <ext-link xlink:href="https://doi.org/10.1109/SEAA51224.2020.00014" ext-link-type="DOI">10.1109/SEAA51224.2020.00014</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Soiland-Reyes et al.(2022)</label><mixed-citation>Soiland-Reyes, S., Sefton, P., Crosas, M., Castro, L. J., Coppens, F., Fernández, J. M., Garijo, D., Grüning, B., La Rosa, M., Leo, S., Ó Carragáin, E., Portier, M., Trisovic, A., RO-Crate Community, Groth, P., and Goble, C.: Packaging Research Artefacts with RO-Crate, Data Science, 5, 97–138, <ext-link xlink:href="https://doi.org/10.3233/DS-210053" ext-link-type="DOI">10.3233/DS-210053</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Soil Science Division Staff(2017)</label><mixed-citation> Soil Science Division Staff: Soil Survey Manual, no. 18 in USDA Handbook, Government Printing Office, Washington, D.C., ISBN 9780160937439, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Stocker et al.(2018)</label><mixed-citation>Stocker, M., Paasonen, P., Fiebig, M., Zaidan, M. A., and Hardisty, A.: Curating Scientific Information in Knowledge Infrastructures, Data Science Journal, 17, 21, <ext-link xlink:href="https://doi.org/10.5334/dsj-2018-021" ext-link-type="DOI">10.5334/dsj-2018-021</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Suter et al.(2026)</label><mixed-citation>Suter, F., Coleman, T., Altintaş, İ., Badia, R. M., Balis, B., Chard, K., Colonnelli, I., Deelman, E., Di Tommaso, P., Fahringer, T., Goble, C., Jha, S., Katz, D. S., Köster, J., Leser, U., Mehta, K., Oliver, H., Peterson, J.-L., Pizzi, G., Pottier, L., Sirvent, R., Suchyta, E., Thain, D., Wilkinson, S. R., Wozniak, J. M., and Ferreira Da Silva, R.: A Terminology for Scientific Workflow Systems, Future Gener. Comp. Sy., 174, 107974, <ext-link xlink:href="https://doi.org/10.1016/j.future.2025.107974" ext-link-type="DOI">10.1016/j.future.2025.107974</ext-link>, 2026.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Tedersoo et al.(2021)</label><mixed-citation>Tedersoo, L., Küngas, R., Oras, E., Köster, K., Eenmaa, H., Leijen, Ä., Pedaste, M., Raju, M., Astapova, A., Lukner, H., Kogermann, K., and Sepp, T.: Data Sharing Practices and Data Availability upon Request Differ across Scientific Disciplines, Scientific Data, 8, 192, <ext-link xlink:href="https://doi.org/10.1038/s41597-021-00981-0" ext-link-type="DOI">10.1038/s41597-021-00981-0</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Thomer and Wickett(2020)</label><mixed-citation>Thomer, A. K. and Wickett, K. M.: Relational Data Paradigms: What Do We Learn by Taking the Materiality of Databases Seriously?, Big Data &amp; Society, 7, 205395172093483, <ext-link xlink:href="https://doi.org/10.1177/2053951720934838" ext-link-type="DOI">10.1177/2053951720934838</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Vance et al.(2024)</label><mixed-citation>Vance, T. C., Huang, T., and Butler, K. A.: Big Data in Earth Science: Emerging Practice and Promise, Science, 383, eadh9607, <ext-link xlink:href="https://doi.org/10.1126/science.adh9607" ext-link-type="DOI">10.1126/science.adh9607</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Van Den Brink et al.(2018)</label><mixed-citation>Van Den Brink, L., Barnaghi, P., Tandy, J., Atemezing, G., Atkinson, R., Cochrane, B., Fathy, Y., García Castro, R., Haller, A., Harth, A., Janowicz, K., Kolozali, Ş., Van Leeuwen, B., Lefrançois, M., Lieberman, J., Perego, A., Le-Phuoc, D., Roberts, B., Taylor, K., and Troncy, R.: Best Practices for Publishing, Retrieving, and Using Spatial Data on the Web, Semant. Web, 10, 95–114, <ext-link xlink:href="https://doi.org/10.3233/SW-180305" ext-link-type="DOI">10.3233/SW-180305</ext-link>, 2018. </mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Wilkinson et al.(2016)</label><mixed-citation>Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., Da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J., Groth, P., Goble, C., Grethe, J. S., Heringa, J., 'T Hoen, P. A., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., Van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., Van Der Lei, J., Van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., and Mons, B.: The FAIR Guiding Principles for Scientific Data Management and Stewardship, Scientific Data, 3, 160018, <ext-link xlink:href="https://doi.org/10.1038/sdata.2016.18" ext-link-type="DOI">10.1038/sdata.2016.18</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Willmes et al.(2014)</label><mixed-citation>Willmes, C., Kürner, D., and Bareth, G.: Building Research Data Management Infrastructure Using Open Source Software, T. GIS, 18, 496–509, <ext-link xlink:href="https://doi.org/10.1111/tgis.12060" ext-link-type="DOI">10.1111/tgis.12060</ext-link>, 2014.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>A database-driven research data framework for integrating and processing high-dimensional geoscientific data</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Chaudhuri and Dayal(1997)</label><mixed-citation>
      
Chaudhuri, S. and Dayal, U.: An Overview of Data Warehousing and OLAP Technology,  SIGMOD Rec., 26, 65–74, <a href="https://doi.org/10.1145/248603.248616" target="_blank">https://doi.org/10.1145/248603.248616</a>, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Codd(1970)</label><mixed-citation>
      
Codd, E. F.: A Relational Model of Data for Large Shared Data Banks, Commun. ACM, 13, 377–387, <a href="https://doi.org/10.1145/362384.362685" target="_blank">https://doi.org/10.1145/362384.362685</a>, 1970.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Crusoe et al.(2022)</label><mixed-citation>
      
Crusoe, M. R., Abeln, S., Iosup, A., Amstutz, P., Chilton, J., Tijanić, N., Ménager, H., Soiland-Reyes, S., Gavrilović, B., Goble, C., and Community, T. C.: Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language, Commun. ACM, 65, 54–63, <a href="https://doi.org/10.1145/3486897" target="_blank">https://doi.org/10.1145/3486897</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Curdt et al.(2019)</label><mixed-citation>
      
Curdt, C., Hoffmeister, D., Kramm, T., Lang, U., and Bareth, G.: Etablierung von Forschungsdatenmanagement-Services in geowissenschaftlichen Sonderforschungsbereichen am Beispiel des SFB/Transregio 32, SFB 1211 und SFB/ Transregio 228, <a href="https://doi.org/10.17192/BFDM.2019.2.8103" target="_blank">https://doi.org/10.17192/BFDM.2019.2.8103</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>De Castro et al.(2023)</label><mixed-citation>
      
De Castro, P., Herb, U., Rothfritz, L., and Schöpfel, J.: IGSN – Building and Expanding a Community-Driven PID System, Tech. rep., Zenodo,
<a href="https://doi.org/10.5281/zenodo.7330498" target="_blank">https://doi.org/10.5281/zenodo.7330498</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Degen et al.(2020)</label><mixed-citation>
      
Degen, D., Veroy, K., and Wellmann, F.: Certified Reduced Basis Method in Geosciences: Addressing the Challenge of High-Dimensional Problems, Computat. Geosci., 24, 241–259, <a href="https://doi.org/10.1007/s10596-019-09916-6" target="_blank">https://doi.org/10.1007/s10596-019-09916-6</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Deutsche
Forschungsgemeinschaft(2022)</label><mixed-citation>
      
Deutsche Forschungsgemeinschaft: Guidelines for Safeguarding Good Research
Practice. Code of Conduct, Zenodo, <a href="https://doi.org/10.5281/zenodo.6472827" target="_blank">https://doi.org/10.5281/zenodo.6472827</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>European Commission et al.(2018)</label><mixed-citation>
      
European Commission, Directorate General for Research and Innovation, and
PwC EU Services: Cost-Benefit Analysis for FAIR Research Data: Cost of  Not Having FAIR Research Data, Publications Office, LU, <a href="https://doi.org/10.2777/02999" target="_blank">https://doi.org/10.2777/02999</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Faniel and Jacobsen(2010)</label><mixed-citation>
      
Faniel, I. M. and Jacobsen, T. E.: Reusing Scientific Data: How Earthquake Engineering Researchers Assess the Reusability of Colleagues' Data, Comp. Supp. Coop. Wor., 19, 355–375, <a href="https://doi.org/10.1007/s10606-010-9117-8" target="_blank">https://doi.org/10.1007/s10606-010-9117-8</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Fitschen et al.(2019)</label><mixed-citation>
      
Fitschen, T., Schlemmer, A., Hornung, D., Tom Wörden, H., Parlitz, U., and Luther, S.: CaosDB – Research Data Management for Complex,
Changing, and Automated Research Workflows, Data, 4, 83,
<a href="https://doi.org/10.3390/data4020083" target="_blank">https://doi.org/10.3390/data4020083</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Gahegan(2020)</label><mixed-citation>
      
Gahegan, M.: Fourth Paradigm GIScience? Prospects for Automated Discovery and Explanation from Data, Int. J. Geogr. Inf. Sci., 34, 1–21, <a href="https://doi.org/10.1080/13658816.2019.1652304" target="_blank">https://doi.org/10.1080/13658816.2019.1652304</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Gärtner et al.(2001)</label><mixed-citation>
      
Gärtner, H., Bergmann, A., and Schmidt, J.: Object-Oriented Modeling of Data Sources as a Tool for the Integration of Heterogeneous Geoscientific Information, Comput. Geosci., 27, 975–985, <a href="https://doi.org/10.1016/S0098-3004(00)00135-7" target="_blank">https://doi.org/10.1016/S0098-3004(00)00135-7</a>, 2001.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Glaser et al.(2020)</label><mixed-citation>
      
Glaser, B., Kienlin, T., Röpke, A., and Deutsche Forschungsgemeinschaft: Separiert Oder Integriert? Studien Zur Entwicklung, Organisation Und Sozialen Struktur Der Komplexen Bronzezeitlichen Tellsiedlung von Toboliu, Westrumänien. Teil 2: Naturwissenschaftliche Untersuchungen, <a href="https://gepris.dfg.de/gepris/projekt/436834905" target="_blank"/> (last access: 14 May 2026), 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Guizzardi(2020)</label><mixed-citation>
      
Guizzardi, G.: Ontology, Ontologies and the “I” of FAIR, Data Intelligence, 2, 181–191, <a href="https://doi.org/10.1162/dint_a_00040" target="_blank">https://doi.org/10.1162/dint_a_00040</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Handy and van der Meij(2026)</label><mixed-citation>
      
Handy, D. and van der Meij, M.: Cologne-Geomorphological-Software-Lab/CGDB: Implemented Dagster Boilerplate (v1.1.0), Zenodo [code], <a href="https://doi.org/10.5281/zenodo.17869730" target="_blank">https://doi.org/10.5281/zenodo.17869730</a>, 2026.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Hornung et al.(2024)</label><mixed-citation>
      
Hornung, D., Spreckelsen, F., and Weiß, T.: Agile Research Data Management with Open Source: LinkAhead, ing.grid, 1, <a href="https://doi.org/10.48694/INGGRID.3866" target="_blank">https://doi.org/10.48694/INGGRID.3866</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>IUSS Working Group (WRB)(2022)</label><mixed-citation>
      
IUSS Working Group (WRB): World Reference Base for Soil Resources. International Soil Classification System for Naming Soils and Creating Legends for Soil Maps, 4th edn., International Union of Soil Sciences (IUSS), Vienna, Austria, ISBN 979-8-9862451-1-9, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Kingdon et al.(2016)</label><mixed-citation>
      
Kingdon, A., Nayembil, M. L., Richardson, A. E., and Smith, A. G.: A Geodata Warehouse: Using Denormalisation Techniques as a Tool for Delivering Spatially Enabled Integrated Geological Information to Geologists, Comput. Geosci., 96, 87–97, <a href="https://doi.org/10.1016/j.cageo.2016.07.016" target="_blank">https://doi.org/10.1016/j.cageo.2016.07.016</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Klöcking et al.(2023)</label><mixed-citation>
      
Klöcking, M., Wyborn, L., Lehnert, K. A., Ware, B., Prent, A. M., Profeta, L., Kohlmann, F., Noble, W., Bruno, I., Lambart, S., Ananuer, H., Barber, N. D., Becker, H., Brodbeck, M., Deng, H., Deng, K., Elger, K., De Souza Franco, G., Gao, Y., Ghasera, K. M., Hezel, D. C., Huang, J., Kerswell, B., Koch, H., Lanati, A. W., Ter Maat, G., Martínez-Villegas, N., Nana Yobo, L., Redaa, A., Schäfer, W., Swing, M. R., Taylor, R. J., Traun, M. K., Whelan, J., and Zhou, T.: Community Recommendations for Geochemical Data, Services and Analytical Capabilities in the 21st Century, Geochim. Cosmochim. Ac., 351, 192–205, <a href="https://doi.org/10.1016/j.gca.2023.04.024" target="_blank">https://doi.org/10.1016/j.gca.2023.04.024</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Klump et al.(2021)</label><mixed-citation>
      
Klump, J., Lehnert, K., Ulbricht, D., Devaraju, A., Elger, K., Fleischer, D., Ramdeen, S., and Wyborn, L.: Towards Globally Unique Identification of Physical Samples: Governance and Technical Implementation of the IGSN Global Sample Number, Data Science Journal, 20, 33,
<a href="https://doi.org/10.5334/dsj-2021-033" target="_blank">https://doi.org/10.5334/dsj-2021-033</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Kottek et al.(2006)</label><mixed-citation>
      
Kottek, M., Grieser, J., Beck, C., Rudolf, B., and Rubel, F.: World Map of the Köppen-Geiger Climate Classification Updated, Meteorol. Z., 15, 259–263, <a href="https://doi.org/10.1127/0941-2948/2006/0130" target="_blank">https://doi.org/10.1127/0941-2948/2006/0130</a>, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Lannom et al.(2020)</label><mixed-citation>
      
Lannom, L., Koureas, D., and Hardisty, A. R.: FAIR Data and Services in Biodiversity Science and Geoscience, Data Intelligence, 2, 122–130, <a href="https://doi.org/10.1162/dint_a_00034" target="_blank">https://doi.org/10.1162/dint_a_00034</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Leo et al.(2024)</label><mixed-citation>
      
Leo, S., Crusoe, M. R., Rodríguez-Navas, L., Sirvent, R., Kanitz, A., De Geest, P., Wittner, R., Pireddu, L., Garijo, D., Fernández, J. M., Colonnelli, I., Gallo, M., Ohta, T., Suetake, H., Capella-Gutierrez, S., De Wit, R., Kinoshita, B. P., and Soiland-Reyes, S.: Recording Provenance of Workflow Runs with RO-Crate, PLOS ONE, 19, e0309210,
<a href="https://doi.org/10.1371/journal.pone.0309210" target="_blank">https://doi.org/10.1371/journal.pone.0309210</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Li et al.(2015)</label><mixed-citation>
      
Li, Z., Yang, C., Jin, B., Yu, M., Liu, K., Sun, M., and Zhan, M.: Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework, PLOS ONE, 10, e0116781, <a href="https://doi.org/10.1371/journal.pone.0116781" target="_blank">https://doi.org/10.1371/journal.pone.0116781</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Liakos and Panagos(2022)</label><mixed-citation>
      
Liakos, L. and Panagos, P.: Challenges in the Geo-Processing of Big Soil Spatial Data, Land, 11, 2287, <a href="https://doi.org/10.3390/land11122287" target="_blank">https://doi.org/10.3390/land11122287</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Mitchell et al.(2022)</label><mixed-citation>
      
Mitchell, S. N., Lahiff, A., Cummings, N., Hollocombe, J., Boskamp, B., Field, R., Reddyhoff, D., Zarebski, K., Wilson, A., Viola, B., Burke, M., Archibald, B., Bessell, P., Blackwell, R., Boden, L. A., Brett, A., Brett, S., Dundas, R., Enright, J., Gonzalez-Beltran, A. N., Harris, C., Hinder, I., David Hughes, C., Knight, M., Mano, V., McMonagle, C., Mellor, D., Mohr, S., Marion, G., Matthews, L., McKendrick, I. J., Mark Pooley, C., Porphyre, T., Reeves, A., Townsend, E., Turner, R., Walton, J., and Reeve, R.: FAIR Data Pipeline: Provenance-Driven Data Management for Traceable Scientific Workflows, Philos. T. Roy. Soc. A, 380, 20210300, <a href="https://doi.org/10.1098/rsta.2021.0300" target="_blank">https://doi.org/10.1098/rsta.2021.0300</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Mork et al.(2015)</label><mixed-citation>
      
Mork, R., Martin, P., and Zhao, Z.: Contemporary Challenges for Data-Intensive Scientific Workflow Management Systems, in: Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science, 1–11, ACM, Austin Texas, ISBN 978-1-4503-3989-6,
<a href="https://doi.org/10.1145/2822332.2822336" target="_blank">https://doi.org/10.1145/2822332.2822336</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Murillo(2019)</label><mixed-citation>
      
Murillo, A. P.: Data Matters: How Earth and Environmental Scientists Determine Data Relevance and Reusability, Collection and Curation, 41, 77–86, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Nikparvar and Thill(2021)</label><mixed-citation>
      
Nikparvar, B. and Thill, J.-C.: Machine Learning of Spatial Data, ISPRS Int. J. Geo-Inform., 10, 1–32, <a href="https://doi.org/10.3390/ijgi10090600" target="_blank">https://doi.org/10.3390/ijgi10090600</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Nordsiek and Halisch(2024)</label><mixed-citation>
      
Nordsiek, S. and Halisch, M.: Making geoscientific lab data FAIR: a conceptual model for a geophysical laboratory database, Geosci. Instrum. Method. Data Syst., 13, 63–73, <a href="https://doi.org/10.5194/gi-13-63-2024" target="_blank">https://doi.org/10.5194/gi-13-63-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Picatto et al.(2024)</label><mixed-citation>
      
Picatto, H., Heiler, G., and Klimek, P.: Cost-Effective Big Data Orchestration Using Dagster: A Multi-Platform Approach, arXiv [preprint], <a href="https://doi.org/10.48550/arxiv.2408.11635" target="_blank">https://doi.org/10.48550/arxiv.2408.11635</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Raasveldt and Mühleisen(2019)</label><mixed-citation>
      
Raasveldt, M. and Mühleisen, H.: DuckDB: An Embeddable Analytical Database, in: Proceedings of the 2019 International Conference on Management of Data, 1981–1984, ACM, Amsterdam, the Netherlands, ISBN 978-1-4503-5643-5, <a href="https://doi.org/10.1145/3299869.3320212" target="_blank">https://doi.org/10.1145/3299869.3320212</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Raj et al.(2020)</label><mixed-citation>
      
Raj, A., Bosch, J., Olsson, H. H., and Wang, T. J.: Modelling Data Pipelines, in: 2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 13–20, IEEE, Portoroz, Slovenia, ISBN 978-1-7281-9532-2, <a href="https://doi.org/10.1109/SEAA51224.2020.00014" target="_blank">https://doi.org/10.1109/SEAA51224.2020.00014</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Soiland-Reyes et al.(2022)</label><mixed-citation>
      
Soiland-Reyes, S., Sefton, P., Crosas, M., Castro, L. J., Coppens, F., Fernández, J. M., Garijo, D., Grüning, B., La Rosa, M., Leo, S., Ó Carragáin, E., Portier, M., Trisovic, A., RO-Crate Community, Groth, P., and Goble, C.: Packaging Research Artefacts with RO-Crate, Data Science, 5, 97–138, <a href="https://doi.org/10.3233/DS-210053" target="_blank">https://doi.org/10.3233/DS-210053</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Soil Science Division Staff(2017)</label><mixed-citation>
      
Soil Science Division Staff: Soil Survey Manual, no. 18 in USDA Handbook, Government Printing Office, Washington, D.C., ISBN 9780160937439, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Stocker et al.(2018)</label><mixed-citation>
      
Stocker, M., Paasonen, P., Fiebig, M., Zaidan, M. A., and Hardisty, A.: Curating Scientific Information in Knowledge Infrastructures, Data Science Journal, 17, 21, <a href="https://doi.org/10.5334/dsj-2018-021" target="_blank">https://doi.org/10.5334/dsj-2018-021</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Suter et al.(2026)</label><mixed-citation>
      
Suter, F., Coleman, T., Altintaş, İ., Badia, R. M., Balis, B., Chard, K., Colonnelli, I., Deelman, E., Di Tommaso, P., Fahringer, T., Goble, C., Jha, S., Katz, D. S., Köster, J., Leser, U., Mehta, K., Oliver, H., Peterson, J.-L., Pizzi, G., Pottier, L., Sirvent, R., Suchyta, E., Thain, D., Wilkinson, S. R., Wozniak, J. M., and Ferreira Da Silva, R.: A Terminology for Scientific Workflow Systems, Future Gener. Comp. Sy., 174, 107974, <a href="https://doi.org/10.1016/j.future.2025.107974" target="_blank">https://doi.org/10.1016/j.future.2025.107974</a>, 2026.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Tedersoo et al.(2021)</label><mixed-citation>
      
Tedersoo, L., Küngas, R., Oras, E., Köster, K., Eenmaa, H., Leijen, Ä., Pedaste, M., Raju, M., Astapova, A., Lukner, H., Kogermann, K., and Sepp, T.: Data Sharing Practices and Data Availability upon Request Differ across Scientific Disciplines, Scientific Data, 8, 192, <a href="https://doi.org/10.1038/s41597-021-00981-0" target="_blank">https://doi.org/10.1038/s41597-021-00981-0</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Thomer and Wickett(2020)</label><mixed-citation>
      
Thomer, A. K. and Wickett, K. M.: Relational Data Paradigms: What Do We Learn by Taking the Materiality of Databases Seriously?, Big Data &amp; Society,
7, 205395172093483, <a href="https://doi.org/10.1177/2053951720934838" target="_blank">https://doi.org/10.1177/2053951720934838</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Vance et al.(2024)</label><mixed-citation>
      
Vance, T. C., Huang, T., and Butler, K. A.: Big Data in Earth Science: Emerging Practice and Promise, Science, 383, eadh9607, <a href="https://doi.org/10.1126/science.adh9607" target="_blank">https://doi.org/10.1126/science.adh9607</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Van Den Brink et al.(2018)</label><mixed-citation>
      
Van Den Brink, L., Barnaghi, P., Tandy, J., Atemezing, G., Atkinson, R., Cochrane, B., Fathy, Y., García Castro, R., Haller, A., Harth, A., Janowicz, K., Kolozali, Ş., Van Leeuwen, B., Lefrançois, M., Lieberman, J., Perego, A., Le-Phuoc, D., Roberts, B., Taylor, K., and Troncy, R.: Best Practices for Publishing, Retrieving, and Using Spatial Data on the Web, Semant. Web, 10, 95–114, <a href="https://doi.org/10.3233/SW-180305" target="_blank">https://doi.org/10.3233/SW-180305</a>, 2018.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Wilkinson et al.(2016)</label><mixed-citation>
      
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., Da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J., Groth, P., Goble, C., Grethe, J. S., Heringa, J., 'T Hoen, P. A., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., Van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., Van Der Lei, J., Van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., and Mons, B.: The FAIR Guiding Principles for Scientific Data Management and Stewardship, Scientific Data, 3, 160018, <a href="https://doi.org/10.1038/sdata.2016.18" target="_blank">https://doi.org/10.1038/sdata.2016.18</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Willmes et al.(2014)</label><mixed-citation>
      
Willmes, C., Kürner, D., and Bareth, G.: Building Research Data Management Infrastructure Using Open Source Software, T. GIS, 18, 496–509, <a href="https://doi.org/10.1111/tgis.12060" target="_blank">https://doi.org/10.1111/tgis.12060</a>, 2014.

    </mixed-citation></ref-html>--></article>
