The GPlates Geological Information Model and Markup Language

X. Qin, R. D. Müller, J. Cannon, T. C. W. Landgrebe, C. Heine, R. J. Watson, and M. Turner
EarthByte Group, School of Geosciences, University of Sydney, Sydney, NSW 2006, Australia
Geodynamics Team, Geological Survey of Norway, P.O. Box 6315, Sluppen, 7491 Trondheim, Norway
Seismological Laboratory, California Institute of Technology, Pasadena, California, USA


Introduction
Representing heterogeneous geospatial information in an Earth history context is a complex issue. At one extreme, if we attempt to express all spatio-temporal information uniformly in terms of a fixed set of attributes, we risk both missing important attributes that are relevant to only some types of information and constraining a user's view to those attributes. At the other extreme, if we accept any combination of attributes and attribute types, there is no guidance about which attributes and properties are expected or required by plate tectonic software; as a result, attributes will be chosen ad hoc from project to project, hindering attempts to share data between projects. We therefore created the GPlates Geological Information Model (GPGIM) to avoid these pitfalls. An information model is an abstract, formal representation of entities in a constrained domain context. It provides formalism to the description of the chosen domain of discourse without constraining how it is mapped to an actual software implementation. The advantage of using an information model is that it can provide a sharable, stable, and organised structure of information requirements for the domain context (Lee, 1999).
Traditional plate-tectonic data have consisted of vector geometries and attributes of primitive types such as strings, integers and floating-point numbers. These primitive data types are insufficient to represent the next-generation data calculated and processed by GPlates (Boyden et al., 2011), such as attribute values that vary as functions of geological time, geological time scales, structured edit-history metadata, dynamic topological plate boundaries (Gurnis et al., 2012), and plate tectonic flow lines. The GPGIM has been defined in a plate tectonic context and represents a unified framework in which relevant types of geological data are attached to a common plate tectonic reference frame. It lays a foundation for the development of plate tectonic spatio-temporal data analysis and modelling. The GPGIM specifies the geologic, geophysical and paleo-geographic entities that GPlates simulates, the conceptual building-blocks that GPlates defines, and the processing and simulation constraints to which GPlates adheres. The current GPGIM specification is available on the website of the EarthByte e-research project: http://www.earthbyte.org/Resources/GPGIM/public/.
The GPGIM is currently able to represent both legacy geological data and various next-generation data for state-of-the-art functionality being developed in GPlates, for example, the time-dependent geometries defined by geospatial topological networks (Gurnis et al., 2012). The GPlates Markup Language (GPML) is a Geography Markup Language (GML) application schema that implements the GPGIM with XML Schema in the data serialisation context. All data processed by GPlates can be saved as GPML files, which are essentially XML files that conform to the GPML schema (Portele, 2007). The GPML schema contains all the structural information about the data contained in the XML files, so other software systems can readily understand and process these files. Furthermore, GPML is based on GML, an ISO, AS/NZS and CEN standard. This standard-compliant nature makes GPML even more approachable. Reusing GML elements also reduces the effort of building XML parsers to retrieve information from the files.

Feature
A key concept in GPGIM is that of the feature: an entity of interest that might be labelled or referenced. The concept of the feature is borrowed from traditional geographic information systems (GIS) (Bonham-Carter, 1994), with the important distinction that a feature in GPGIM is a generalisation of the traditional GIS concept: while a GIS feature is an entity of interest that has a geographic location (and usually some sort of identity or label), in GPGIM every entity of interest, whether spatial or non-spatial, is modelled as a derivation of feature (Fig. 1). It is worth mentioning that the feature concept in GPGIM is similar to the feature definition in ISO 19101 (2002), in which a feature is defined as an abstraction of real-world phenomena. All data and information modelled in GPGIM take the form of feature instances, or attributes of feature instances.
In philosophy, identity has been defined as a relation each thing bears to itself and to no other thing (Zalta, 2003). In GPGIM, each feature instance bears an identity attribute that uniquely identifies the feature instance at any time and in any place.
The feature ID and revision ID in the base class reveal an important concept in GPGIM: the feature reference. To enable persistent inter-instance feature references, a universally unique identification of features is needed. Any feature instance can be identified explicitly by its feature ID and revision ID, no matter what form the feature instance takes. For example, the feature instance could be an object instance in computer memory, a serialised data file on a hard disk, or just a conceptual entity in a UML diagram (Larman, 2004). The feature ID remains the same during the whole life cycle of a feature. The revision ID changes each time a new revision of the feature is created. With both feature ID and revision ID, a specific revision of a feature instance can be referenced if desired. The feature reference is a prerequisite for the feature-oriented data structures that enable GPGIM to represent and process topological geometries and deforming meshes.
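The relationship between the two identifiers can be illustrated with a minimal Python sketch. It is not the GPlates implementation: the class name Feature, the revise method, the UUID-based ID format and the plate ID values are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field, replace


def _new_id(prefix: str) -> str:
    # Hypothetical ID format: a prefix plus a random UUID.
    return f"{prefix}-{uuid.uuid4()}"


@dataclass(frozen=True)
class Feature:
    feature_type: str
    properties: dict
    # The feature ID is assigned once and kept for the feature's whole life cycle.
    feature_id: str = field(default_factory=lambda: _new_id("feature"))
    # The revision ID identifies one particular revision of the feature.
    revision_id: str = field(default_factory=lambda: _new_id("revision"))

    def revise(self, **changes) -> "Feature":
        """Return a new revision: same feature ID, fresh revision ID."""
        new_props = {**self.properties, **changes}
        return replace(self, properties=new_props, revision_id=_new_id("revision"))


# Illustrative values only; the plate IDs are made up.
original = Feature("Isochron", {"reconstructionPlateId": 801})
revised = original.revise(reconstructionPlateId=802)
```

Here original and revised share a feature ID but carry different revision IDs, so a reference can target either the feature as a whole or one specific revision.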
In contrast to some geospatial data formats, in which data are classified primarily by geometry (as point-data, line-data, etc.), the type of a feature in GPGIM is independent of the geometries which the feature contains, determined instead by the geological, geophysical or geographic characteristics of the entity that the feature represents.

Reconstruction features
The reconstruction feature is an integral part of the plate-tectonic reconstruction process. It represents the model that determines the plate hierarchy and the history of plate-tectonic reconstructions by means of finite rotations. In this case the geographic location associated with the rotation data is a rotation pole (Greiner, 1999).
In classical plate tectonics, the tectonic plates are assumed to be internally rigid (Müller et al., 2008) and are in constant motion with respect to one another. The relative motion of two plates can be described by finite rotations. Euler's rotation theorem states (Euler and Sten, 2012): "In whatever way a sphere is turned about its centre, it is always possible to assign a diameter, whose direction in the translated state agrees with that of the initial state." The intersection of this axis, which passes through the centre of the model sphere, with the sphere's surface is the Euler pole. A rotation by a given angle around a fixed Euler pole is a finite rotation. A finite rotation that displaces a plate from its present-day position to its reconstructed position at some instant in the past is called a total reconstruction pole (Cox and Hart, 1986).
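As an illustration of a finite rotation, the following Python sketch rotates a point on the unit sphere about an Euler pole using Rodrigues' rotation formula. The function names are hypothetical, and the sign convention (a positive angle is counter-clockwise when looking down on the pole from outside the sphere) is an assumption of this sketch rather than something stated above.

```python
import math


def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def to_lat_lon(v):
    """Convert a unit vector back to (latitude, longitude) in degrees."""
    x, y, z = v
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return (lat, math.degrees(math.atan2(y, x)))


def rotate(point, pole, angle_deg):
    """Rotate point (lat, lon) about the Euler pole (lat, lon) by angle_deg."""
    p = to_xyz(*point)
    k = to_xyz(*pole)
    a = math.radians(angle_deg)
    dot = sum(ki * pi for ki, pi in zip(k, p))
    cross = (k[1] * p[2] - k[2] * p[1],
             k[2] * p[0] - k[0] * p[2],
             k[0] * p[1] - k[1] * p[0])
    # Rodrigues' formula: p cos(a) + (k x p) sin(a) + k (k . p)(1 - cos(a))
    r = tuple(p[i] * math.cos(a) + cross[i] * math.sin(a)
              + k[i] * dot * (1.0 - math.cos(a)) for i in range(3))
    return to_lat_lon(r)


# A point on the equator rotated by 90 degrees about the north pole.
lat, lon = rotate((0.0, 0.0), (90.0, 0.0), 90.0)
```

With these inputs the point moves along the equator to longitude 90 degrees, up to floating-point rounding.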
The TotalReconstructionSequence (Fig. 2), which derives from ReconstructionFeature, contains a sequence of total reconstruction poles that describe a plate's motion through time by sampling its relative displacement at key instants in the past. Each sample is represented by a TimeSample<FiniteRotation> instance (Fig. 2). The total reconstruction pole for any given time can be calculated by interpolation; Cox and Hart (1986) describe the mathematical rules. Using finite rotations is only one of the possible approaches to modelling plate-tectonic reconstructions. GPGIM is an open model; new derivations of ReconstructionFeature can be defined to introduce new reconstruction methods as needed.
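One way to interpolate between two sampled rotations is sketched below in Python. It represents each finite rotation as a unit quaternion and uses spherical linear interpolation (slerp); this is a technique substituted here for illustration and is not claimed to be the exact procedure of Cox and Hart (1986) or of GPlates. The function names and the sample values are hypothetical, and sample times are assumed to be strictly increasing.

```python
import math


def quat_from_pole(lat_deg, lon_deg, angle_deg):
    """Unit quaternion (w, x, y, z) for a rotation of angle_deg about a pole."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half)
    return (math.cos(half),
            s * math.cos(lat) * math.cos(lon),
            s * math.cos(lat) * math.sin(lon),
            s * math.sin(lat))


def slerp(q0, q1, f):
    """Spherical linear interpolation from q0 (f = 0) to q1 (f = 1)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to follow the shorter arc.
        q1, dot = tuple(-c for c in q1), -dot
    theta = math.acos(min(1.0, dot))
    if theta < 1e-12:
        return q0  # the two rotations are effectively identical
    w0 = math.sin((1.0 - f) * theta) / math.sin(theta)
    w1 = math.sin(f * theta) / math.sin(theta)
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))


def interpolate(samples, time_ma):
    """samples: list of (time_ma, quaternion) with strictly increasing times."""
    for (t0, q0), (t1, q1) in zip(samples, samples[1:]):
        if t0 <= time_ma <= t1:
            return slerp(q0, q1, (time_ma - t0) / (t1 - t0))
    raise ValueError("time outside the sampled range")


# Made-up samples: no rotation at 0 Ma, 40 degrees about the north pole at 10 Ma.
samples = [(0.0, quat_from_pole(90.0, 0.0, 0.0)),
           (10.0, quat_from_pole(90.0, 0.0, 40.0))]
q_mid = interpolate(samples, 5.0)
```

For these samples the result at 5 Ma corresponds to a 20 degree rotation about the same pole.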

Reconstructable feature
The most common features in GPGIM are reconstructable features. Reconstructable features can be reconstructed to different paleo-geographic positions at different geological times, according to the plate-motion model. A reconstructable feature possesses special properties that enable it to be associated with the motion of a tectonic plate: either a reconstruction plate ID or, for more complex forms of reconstruction, a reconstruction method. The geometries in a ReconstructableFeature are stored in present-day coordinates. The reconstructed geometries are calculated at runtime when they are needed. At present, more than 50 types of ReconstructableFeature are defined in GPGIM. There are four main subcategories of ReconstructableFeature (Fig. 3).

Tangible feature
The tangible feature is a type of feature that abstracts geographic entities. For example, the seamount feature, which is derived directly from TangibleFeature, represents a volcanic cone that rises from the seafloor but does not reach the sea surface (Wessel, 2009). The GPGIM contains a comprehensive collection of tangible features (Fig. 4). Each feature has been modelled carefully by specifying a unique property set, which represents the geographical characteristics of the feature.
Thus, there are as many different types of features in the GPGIM as types of entities represented, ensuring that the feature which models an entity has exactly the properties it needs, no more and no less.The naming of the feature types follows geological nomenclature, to maximise the likelihood that a geologist will understand what a feature represents, when presented with the feature type.For example, an ocean floor isochron is represented by the Isochron feature; a subduction zone by the Subduction Zone feature; and a dynamic topography field by the Dynamic Topography feature.

Artificial features
In contrast to TangibleFeature, the ArtificialFeature type deals with geometries that have been arbitrarily created, or possibly derived from a TangibleFeature that no longer exists.
For example, InferredPaleoBoundary is a plate boundary that is assumed to have existed, even though it has long since been subducted at a deep-sea trench and recycled into the convecting mantle.
There are four types of features in this category (Fig. 5):

Topological boundary guide
Unlike a TopologicalClosedPlateBoundary, which is defined exclusively via references to the geometries of ReconstructableFeatures, the TopologicalBoundaryGuide feature type contains user-defined geometry (lines or points) that defines the sections of a TopologicalPolygon.

Unclassified feature
Sometimes, data from other research domains or communities need to be imported into GPGIM to support collaboration. The UnclassifiedFeature type allows such foreign data to be handled gracefully in GPGIM, and therefore makes GPGIM more powerful and flexible.

Topological feature
A topological feature is a feature that lacks a reconstructable geometry of its own. Instead, a topological geometry can be calculated after each reconstruction by linking together the reconstructed geometries of other referenced feature instances. Topological features are some of the most exciting next-generation data in GPGIM. The most prominent topological feature in GPGIM at present is the TopologicalClosedPlateBoundary, which, as its name suggests, is a feature whose topological geometry represents a closed plate boundary. This closed plate boundary is a time-dependent, shape-changing polygon (also known as a Continuously-Closing Polygon, CCP) that encloses a region corresponding in some practical sense to a tectonic plate (Gurnis et al., 2012).
In the future, topological features will also be used to model plate deformations in GPGIM.

Instantaneous features
In contrast to the dynamic reconstructable features, instantaneous features are static snapshots of entities at an instant in geological time. These features correspond most closely to the static geospatial features of traditional GIS software.
Instantaneous features are intended primarily as an interchange format that enables reconstructed features to be exported to, or imported from, other geospatial information models that lack the ability to process plate-tectonic reconstructions or time-variance. An instantaneous feature preserves the reconstructed geometries and the current values of any time-dependent properties in a form that does not require any special processing by the recipient information model. A time-sequence of instantaneous features can be integrated into a single reconstructable feature with a continuous history. Every type of reconstructable feature in the GPGIM has an instantaneous analogue.

Feature property
Another important concept in the GPGIM is the feature property. It is inherited from GML, which implements the OGC Reference Model, which in turn is based on ISO 19101 (2002). Each concrete feature type is associated with a specific group of feature properties, which means that each feature instance must contain exactly the predefined set of properties for its type, except that optional properties may be omitted. By enforcing this, GPGIM can guarantee that the data loaded into the model are valid. Each feature property contains a property name and a time-dependent value (Fig. 6).

Property name
Instead of allowing arbitrary strings as property names, GPGIM defines a finite set of property names. Only these predefined names are considered valid. The property names also carry more information than just identification. For example, the name reconstructionPlateId indicates that the data contained in this property is a plate ID, which can be used in plate-tectonic reconstruction.
The property name implies a property type, which in turn affects the behaviour of the feature containing this property. For instance, to determine whether a feature is reconstructable, we search for either of the two properties reconstructionPlateId and reconstructionMethod. Any feature instance that has either of these properties can be assumed to be reconstructable.

Property value
Just as features have types, which determine the properties contained within a particular feature instance, so, too, do property values have types, which determine the range of values that may be contained within a given property. Being a GML application schema (Portele, 2007), the GPGIM incorporates the property-value types defined by GML, which in turn incorporates the XML primitive data types.
The XML primitive data types include, but are not limited to, xs:string, xs:boolean, xs:integer and xs:double. For example, a property value of type xs:integer contains only a single integer value, while a property value of type xs:string contains a string of text. On top of these primitive data types, GML defines additional property-value types as building-blocks for GIS data, such as geometries (gml:Point, gml:MultiPoint, gml:LineString and gml:Polygon) and temporal primitives (gml:TimeInstant and gml:TimePeriod) (Bray et al., 1997; Portele, 2007).
Finally, the GPGIM defines a number of GPlates-specific property-value types. We have classified these GPlates-specific types into four main categories, namely constant value, irregular sampling, piecewise aggregation and math function of time (Fig. 7).

- Time-dependent property values: It is a fundamental assumption built into the GPGIM that the measurable characteristics of geological entities can change over geological time. In a feature-oriented model, this is represented as variation in the feature property values over geological time. The TimeDependentPropertyValue is defined as the root of the inheritance hierarchy to represent this assumption. All property values in GPGIM are derivations of the time-dependent value.
- Constant value: The constant value type is the simplest form of time-dependent value. It represents a value that remains constant through geological time.
- Irregular sampling: The irregular sampling represents a sequence of time samples, each of which was sampled at some geological time instant. An irregular sampling may also possess an optional interpolation function to interpolate between the values of adjacent time samples.
- Piecewise aggregation: The piecewise aggregation provides a way to compose different property value types to maximise the flexibility of GPGIM. The piecewise aggregation property value contains a sequence of zero or more time windows, each of which is defined for a particular geological time interval and contains an instance of a time-dependent property value type.
- MathFunctionOfTime: The MathFunctionOfTime represents a variation algorithm that may be expressed mathematically as a function of geological time. The property value is calculated for a given time at runtime.
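The first three categories can be sketched in Python as classes that share a value_at(time) method. This is a simplified illustration under stated assumptions, not the GPML schema or the GPlates classes: the class and method names are hypothetical, values are plain numbers, the interpolation is linear, and a time window is written as (begin, end) in Ma with begin older than end.

```python
from bisect import bisect_right


class ConstantValue:
    """A value that does not change through geological time."""

    def __init__(self, value):
        self.value = value

    def value_at(self, time_ma):
        return self.value


class IrregularSampling:
    """Time samples (time_ma, value) with optional linear interpolation."""

    def __init__(self, samples, interpolate=True):
        self.samples = sorted(samples)
        self.interpolate = interpolate

    def value_at(self, time_ma):
        times = [t for t, _ in self.samples]
        if not times[0] <= time_ma <= times[-1]:
            raise ValueError("time outside the sampled range")
        i = bisect_right(times, time_ma) - 1
        t0, v0 = self.samples[i]
        if time_ma == t0 or not self.interpolate:
            # Exact hit, or no interpolation function: use the sample at t0.
            return v0
        t1, v1 = self.samples[i + 1]
        return v0 + (v1 - v0) * (time_ma - t0) / (t1 - t0)


class PiecewiseAggregation:
    """Time windows (begin_ma, end_ma, value), each holding another value type."""

    def __init__(self, windows):
        self.windows = windows

    def value_at(self, time_ma):
        for begin_ma, end_ma, value in self.windows:
            if end_ma <= time_ma <= begin_ma:
                return value.value_at(time_ma)
        raise ValueError("no time window covers this time")


# Made-up property: constant from 100 to 50 Ma, sampled from 50 Ma to present.
prop = PiecewiseAggregation([
    (100.0, 50.0, ConstantValue(2.0)),
    (50.0, 0.0, IrregularSampling([(0.0, 1.0), (50.0, 3.0)])),
])
```

Because every value type answers the same value_at query, a piecewise aggregation can nest any of the other types inside its windows, which is the composition described in the list above.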

The GPlates Markup Language (GPML)
With the rapid development of the Internet, collaboration and resource sharing play an increasingly important part in the success of academic research. How to exchange and share scientific data efficiently has become a key issue. GML was conceived to alleviate the disorder brought about by heterogeneous geospatial data (Portele, 2007). GML is a modelling language for geographic systems, as well as an encoding specification for the serialisation of GML data to XML documents. GML is intended to provide a standardised, interoperable base for defining application-specific geospatial languages. The GML specification is defined and maintained by the Open Geospatial Consortium (OGC®). GML was adopted as an ISO standard in 2007 (Portele, 2007). GML defines information model building-blocks including geometries, temporal primitives (time instants and time periods), units of measure, and coordinate reference systems (Lake, 2005). These building-blocks ease interoperability both by providing recognisable, well-defined data components and by standardising the XML representations of these components, which simplifies the parsing of foreign data.
The GPML is a GML application schema that is designed to describe and exchange plate-tectonic data within a community of interest. It enables collaboration, data sharing and grid computing in the plate-tectonic domain. GPML is an extension of GML; it adopts the feature-oriented modelling approach and builds upon the GML primitives to maximise interoperability with other GML-based application schemas, such as GeoSciML (Sen and Duffy, 2005). Another significant advantage of being based on GML is that data described by GPML can be transmitted across Web Feature Service (WFS) networks by default (Peng and Zhang, 2004). This makes data sharing and transmission easy and supports a wide range of collaboration and grid computing. The WFS specification (Vretanos, 2005) defines a web service interface to query, transmit and manipulate geospatial data across the Internet or other networks. GML is the default encoding for the transmission of geospatial data to and from WFS servers.

Feature and feature collection
Every concrete GPML feature is derived from the GPML AbstractFeature (Fig. 8), which in turn derives from the GML AbstractFeatureType. The substitutionGroup attribute in the XML element AbstractFeature indicates that all GPML abstract features are substitutable for the GML AbstractFeature, which is defined in the GML specification. The inheritance hierarchy and the substitutionGroup attribute are essential to serving GPML data through WFS networks.
The web feature service provides data by responding to GetFeature requests from clients. The root element of the response must be a wfs:FeatureCollection (Fig. 9) (Vretanos, 2005). The wfs:FeatureCollection derives from gml:AbstractFeatureCollection.
From the definitions of gml:AbstractFeatureCollection (Fig. 10) and gml:featureMember (Fig. 11) we can conclude that the GML abstract feature collection is, as the name suggests, essentially a collection of gml:AbstractFeature elements (Portele, 2007).
All GPML features are derived from the GML AbstractFeature and, therefore, any GPML feature instance is a valid featureMember in a FeatureCollection, which is prepared by a WFS server and sent back to the client as a response to a previous GetFeature request (Vretanos, 2005).
Figure 12 shows an example of a concrete GPML feature definition. The IsochronType is derived from TangibleFeatureType, which in turn is derived from gpml:AbstractFeature. The conjugates in IsochronType are feature references, which link to the pairs or twins of this isochron feature. The conjugatePlateIds are plate IDs, either general-purpose or specific, for the motion of a cluster of features, which have the pairs or twins of this isochron. The unclassifiedGeometry is an optional geometry whose purpose is unclear or not given. The centerLineOf property defines the actual geometry of this isochron.
More than 100 concrete feature types have been defined in GPML. A full definition of all GPML feature types can be found at http://www.earthbyte.org/Resources/GPGIM/public/gpml.xsd.

Property value types
The property value types are defined as a direct mapping from GPGIM into XML Schema, and all of them are derived from gpml:TimeDependentPropertyValueType. The XML schemas for the Constant (Fig. 13), Irregular Sampling (Fig. 14) and Piecewise Aggregation (Fig. 15) value types are straightforward. The fourth value type, MathFunctionOfTime, has yet to be implemented.

Geometry
The geometries in GPML reference GML geometries directly. By using building blocks from GML, GPML geometries can be parsed and processed easily by other GML applications. In GPML, geometries are no longer abstract geometrical shapes; they are part of the features' characteristics.

The centerLineOf geometry
The centerLineOf geometry represents the centreline of a geological feature. It is generally a gml:LineString, with the possibility of gml:MultiGeometry or gml:MultiCurve (Fig. 16).

The outlineOf geometry
The outlineOf geometry specifies the closed or partial outline of a geological feature. The possible geometry types are gml:MultiGeometry, gml:LineString and gml:Polygon (Fig. 17).

The errorBounds geometry
The errorBounds geometry defines the error boundary of a geological feature. It can be anything substitutable for gml:AbstractGeometry (Fig. 18).

The boundary geometry
The boundary geometry represents the polygon boundary of a geological feature (Fig. 19).

The position
The position is a point in latitude and longitude indicating the physical location of a geological feature (Fig. 20).

The unclassified geometry
The unclassifiedGeometry is a geometry that has been defined for a feature but whose purpose is unclear or not given. The unclassified geometry type can be used to represent geometries imported from other domains of discourse (Fig. 21).

Implementation in GPlates
GPlates is an open-source software application developed by an international collaboration led by the EarthByte group at the University of Sydney, which also includes the California Institute of Technology and the Geological Survey of Norway. It implements GPGIM and uses GPML to serialise and exchange data. Being equipped with GPGIM and GPML makes GPlates a state-of-the-art tool for scientific research and teaching, capable of calculating, interactively visualising and manipulating plate-tectonic reconstructions through geological time, and of deep-time data mining and analysis (Williams et al., 2012).

Reconstruction tree
In GPGIM, the ReconstructionFeature represents the concept of an entity that contains all information pertaining to the reconstruction process. In practice, the concept is embodied as a list of finite rotations in a rotation file. Figure 22 shows a snippet of a rotation file. Each line in a rotation file represents a finite rotation of a moving plate relative to a fixed plate. The first column specifies the moving plate ID. The second column is the geological time in millions of years (Ma). The following three columns define the latitude and longitude of the Euler pole and the rotation angle. The sixth column is the fixed plate ID, which is followed by comments. When a rotation file is loaded in GPlates, a set of instances of TotalReconstructionSequence, which is a derivation of ReconstructionFeature, is created. These instances are used to calculate the reconstruction tree at each given geological time.
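The column layout described above can be read with a short Python sketch. It assumes whitespace-separated columns and treats a leading "!" on the comment as optional; both are assumptions of this sketch, as are the names RotationRecord and parse_rotation_line and the values in the sample line, which are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RotationRecord:
    moving_plate_id: int
    time_ma: float
    pole_lat: float
    pole_lon: float
    angle: float
    fixed_plate_id: int
    comment: str


def parse_rotation_line(line: str) -> RotationRecord:
    """Parse one rotation line: six columns followed by an optional comment."""
    fields = line.split(None, 6)  # at most six splits, so the comment stays whole
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    comment = fields[6].strip() if len(fields) > 6 else ""
    return RotationRecord(
        moving_plate_id=int(fields[0]),
        time_ma=float(fields[1]),
        pole_lat=float(fields[2]),
        pole_lon=float(fields[3]),
        angle=float(fields[4]),
        fixed_plate_id=int(fields[5]),
        comment=comment.lstrip("!").strip(),  # assumed comment marker
    )


# Made-up line for illustration; not taken from a real rotation file.
record = parse_rotation_line("101 10.0 80.0 -30.0 5.5 714 !illustrative comment")
```

Records that share the same moving and fixed plate IDs could then be grouped and ordered by time to form one sequence of time samples.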
Figure 23 shows a small subset of the reconstruction tree created from the Global EarthByte GPlates Rotation file. The following steps show how to calculate the total reconstruction pole of a given plate at a given time by using the reconstruction tree:
1. Locate the plate ID in the reconstruction tree at the given time.
2. Calculate the rotation of the moving plate relative to its parent (the fixed plate).
3. Move upwards along the tree to calculate the total reconstruction pole of its parent.
4. Repeat steps 2 and 3 until the root of the reconstruction tree is reached.
5. Compose the rotation relative to the parent with the parent's total reconstruction pole.
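The steps above can be sketched in Python by walking from a plate up to the root of the tree and composing the relative rotations along the way. This is an illustration under stated assumptions rather than the GPlates code: rotations are represented as unit quaternions, the tree is a plain dictionary already evaluated at one reconstruction time, the root plate ID defaults to 0, and all names and values are hypothetical.

```python
import math

IDENTITY = (1.0, 0.0, 0.0, 0.0)


def quat(lat_deg, lon_deg, angle_deg):
    """Unit quaternion (w, x, y, z) for a rotation of angle_deg about a pole."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half)
    return (math.cos(half),
            s * math.cos(lat) * math.cos(lon),
            s * math.cos(lat) * math.sin(lon),
            s * math.sin(lat))


def compose(a, b):
    """Quaternion product a*b: rotation b is applied first, then rotation a."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def total_rotation(tree, plate_id, root_id=0):
    """tree maps a moving plate ID to (fixed plate ID, relative rotation)."""
    total = IDENTITY
    while plate_id != root_id:
        fixed_id, relative = tree[plate_id]
        # Rotations closer to the root are applied after those further down.
        total = compose(relative, total)
        plate_id = fixed_id
    return total


# Made-up tree: plate 2 moves relative to plate 1, which moves relative to root 0.
tree = {2: (1, quat(90.0, 0.0, 30.0)),
        1: (0, quat(90.0, 0.0, 20.0))}
total = total_rotation(tree, 2)
```

Both relative rotations in this example share the north pole as their axis, so the total is a 50 degree rotation about that pole; in general the order of composition matters because finite rotations do not commute.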
Figure 24 shows a screenshot of GPlates reconstruction tree dialog, which shows the finite rotations calculated from Global EarthByte GPlates Rotation file.

Reconstruction
Once the reconstruction tree has been built, the reconstruction process is straightforward. The component diagram (Fig. 25) illustrates the process.
1. The reconstruction engine gets reconstructable features from the feature store.
2. The reconstruction engine acquires the reconstruction tree at the given time from the reconstruction features.
3. The reconstruction engine rotates the geometries of each reconstructable feature according to the total reconstruction pole calculated from the reconstruction tree, based on the feature's plate ID.
4. The reconstruction engine returns the reconstructed features at the given time to the client.

Non-standard reconstruction methods
Most reconstructable features in GPlates are reconstructed by applying a rotation corresponding to the total reconstruction pole for that feature's reconstructionPlateId property (Sect. 2.1.2). GPlates also allows more complex reconstructions to be performed where, for example, the present-day geometry is reconstructed according to some more complex function of several plate IDs, times, or other user-provided information.
An example of such a feature is the gpml:flow line, which is used to represent a synthetic flow line. Synthetic flow lines represent the movement of material away from a spreading centre and can be used to test reconstruction models by comparison with fracture zones. Flow lines are generated using the half-stage rotations between adjacent plates at a series of user-provided reconstruction times. To create a flow line feature in GPlates, the user provides a seed point representing a location at a spreading centre. Rather than providing a single reconstructionPlateId, the user provides the plate IDs of the plates on either side of the spreading centre, and a series of times; a series of half-stage rotations is generated for adjacent times in this series.
At any given reconstruction time, the flow line is generated from the seed point by applying the series of half-stage rotations to produce a series of points. These points (nodes) represent the flow line and are visualised on the GPlates globe as a polyline connecting the flow line nodes.

Dynamically typed feature
In GPGIM, more than 100 feature types have been modelled. Having a large number of feature types makes GPGIM very comprehensive in terms of describing real-world geological entities, but it also imposes a large software development effort. Instead of making each feature type a C++ class derived from a suitable base class in a class hierarchy, a universal feature handle type has been introduced to handle all feature types, and everything inside a feature has been defined as a feature property. It is the properties contained in a feature, rather than the C++ class type, that determine the feature type. This approach essentially reflects a form of dynamic typing, although C++ is by nature a statically typed language.
The approach in which properties determine feature types is called duck typing, a style of dynamic typing, following James Whitcomb Riley, who said (Heim, 2007): "When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck." For example, rather than defining a ReconstructableFeature base class in C++ and deriving all concrete reconstructable feature types from this base class (which would result in a large number of C++ classes, since there are more than 50 reconstructable features defined in GPGIM), GPlates instead uses a universal FeatureHandle type and searches for two properties, reconstructionPlateId and reconstructionMethod, in the FeatureHandle object. Any feature instance that has either of these properties is deemed by GPlates to be a reconstructable feature.
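The property-based test can be illustrated with a few lines of Python. GPlates itself is written in C++, so this is only a sketch of the idea: the FeatureHandle class below is a hypothetical stand-in that stores properties in a dictionary, and the plate ID value is made up.

```python
RECONSTRUCTABLE_PROPERTIES = ("reconstructionPlateId", "reconstructionMethod")


class FeatureHandle:
    """Hypothetical stand-in for the universal feature handle."""

    def __init__(self, feature_type, properties):
        self.feature_type = feature_type
        self.properties = dict(properties)


def is_reconstructable(feature):
    """A feature is reconstructable if it carries either marker property."""
    return any(name in feature.properties for name in RECONSTRUCTABLE_PROPERTIES)


isochron = FeatureHandle("Isochron", {"reconstructionPlateId": 801})
sequence = FeatureHandle("TotalReconstructionSequence", {})
```

In this sketch the isochron counts as reconstructable and the rotation sequence does not, and no class hierarchy is consulted to decide either case.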
By using duck typing, the development cycle of GPlates has been shortened significantly, and GPlates has become more flexible when expanding GPGIM. Duck typing even makes it possible to change a feature's type over geological time by repopulating the feature's properties.
However, duck typing has one weak point: to some extent, it is error-prone. With a strongly and statically typed approach, the compiler can help developers with type-checking; hence, it is much harder to use an unexpected object type (Abadi et al., 1995). With duck typing, it is the developers' responsibility to ensure type correctness. This in turn requires GPlates developers to have a much wider understanding of the source code in order to avoid type misuse. To mitigate this, a FeatureType object has been added to the FeatureHandle class, so that in circumstances where the exact feature type is crucial, it can be asserted explicitly.

Plate-motion velocity and geographic coverage
Another interesting implementation in GPlates is that of mesh-node points and plate-motion velocities. The global mesh-node points are distributed evenly over the surface of the globe, and all tectonic plates float above this imaginary mesh. The mesh-node points act as tracers for the plate motion. At each reconstruction time, GPlates can calculate an instantaneous angular velocity for each mesh-node point. The velocities of mesh-node points inside a tectonic plate illustrate the plate motion explicitly. Such a plate-motion velocity field may be exported from GPlates to be used as a kinematic boundary condition for a geodynamic mantle-convection model, such as CitcomS (Tan et al., 2007). This enables GPlates to link plate kinematics to geodynamic models.
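A velocity at a mesh node can be estimated from a stage rotation with the relation v = ω × r, where ω is the angular velocity vector along the Euler pole axis and r is the position vector of the node. The Python sketch below is an approximation under stated assumptions, not the GPlates algorithm: the angular velocity is taken as the stage rotation angle divided by the time interval, the Earth is a sphere of mean radius 6371 km, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed spherical


def unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def velocity_km_per_myr(node, pole, angle_deg, dt_myr):
    """Velocity vector (x, y, z) in km/Myr at a mesh node (lat, lon).

    The plate is assumed to rotate by angle_deg about pole (lat, lon)
    over an interval of dt_myr million years.
    """
    omega = math.radians(angle_deg) / dt_myr  # angular speed in rad/Myr
    w = tuple(omega * c for c in unit_vector(*pole))
    r = tuple(EARTH_RADIUS_KM * c for c in unit_vector(*node))
    # Cross product w x r gives the linear velocity of the node.
    return (w[1] * r[2] - w[2] * r[1],
            w[2] * r[0] - w[0] * r[2],
            w[0] * r[1] - w[1] * r[0])


# A node on the equator, 90 degrees from a pole rotating at 1 degree per Myr.
v = velocity_km_per_myr((0.0, 0.0), (90.0, 0.0), 1.0, 1.0)
speed = math.sqrt(sum(c * c for c in v))
```

A node located at the Euler pole itself has zero velocity, and the speed grows with angular distance from the pole up to a maximum at 90 degrees, as in this example.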
The combination of mesh-node points and velocity data is essentially a coverage (ISO 19123, 2005). A coverage, as defined by OGC and ISO/TC 211, is a function describing the distribution of some set of properties over a spatio-temporal region. The three key aspects of a coverage are the domain set (a group of geometry or temporal objects), the range set (an array of arbitrary value objects), and the coverage function (a mapping of every object in the domain set to a value in the range set) (Portele, 2007). In our velocity case, the mesh-node points are the domain set of a plate-motion velocity field, the velocity data associated with the mesh-node points are the range set, and the mapping between mesh-node points and velocities is the coverage function.

Rasters
GPlates supports the reconstruction and visualisation of geo-referenced raster image data.Also supported is raster data analysis (in the form of co-registration with point/polyline/polygon seed geometries).Both these applications of raster data are accelerated by commonly available desktop graphics hardware.A raster is typically a discrete sampling of a spatial function or coverage.A present-day raster consists of a single raster image typically representing geospatial data observations on the Earth's present-day surface.GPlates can also create time-dependent rasters by importing a time sequence of raster image files and assigning geological ages to each image.A raster (present-day or time-dependent) is reconstructed by attaching it to a tectonic polygon dataset -raster pixels inside the polygons then rotate with the polygons.This differs from reconstruction of regular geometries (points/polylines/polygons) whereby geometry partitioning, and plate ID assignment, occurs as a onceonly pre-process.This enables a raster to remain as a single, un-partitioned GPML feature and to be reconstructable without requiring a reconstruction plate ID property (and is why raster (Fig. 26) inherits from TimeVariantFeatureType instead of ReconstructableFeatureType).
In the geospatial domain, many raster image formats (such as netCDF or GeoTIFF) contain not only multi-band raw image data but also information that can be used to correctly position and project a raster onto the Earth's surface, as well as general raster metadata. GPlates uses the Geospatial Data Abstraction Library (GDAL), an open-source library for raster geospatial data formats, both for accessing raster data and for raster coordinate transformations. GPlates extracts the raster data into two separate file types. The raw image data is cached in a GPlates-specific multi-resolution, tiled file format for fast streaming access during raster visualisation (one cache file per raster band). The remaining raster image file information is mirrored into a raster GPML file, enabling GPlates users to query a raster's coordinate reference system, geo-referencing and general raster metadata through the same query interface used for all existing GPlates feature types. The raster GPML file is created during the raster import process, whereby a link is established (and raster information imported) from the raster image file to the raster GPML file. For non-geospatial raster formats (which lack geo-referencing), an affine transform can be entered manually by the user during the import process (with the default coverage being global).
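To make the affine geo-referencing concrete, the following Python sketch maps pixel coordinates to longitude/latitude. It assumes the six-coefficient affine convention used by GDAL geotransforms; the function names are hypothetical, and the "global" default shown is only one reading of the default coverage mentioned above, not a description of the GPlates import code.

```python
from typing import Tuple

# Six affine coefficients in GDAL geotransform order:
# (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def global_geotransform(width: int, height: int) -> GeoTransform:
    """Assumed default: the image spans longitudes -180..180 and latitudes 90..-90."""
    return (-180.0, 360.0 / width, 0.0, 90.0, 0.0, -180.0 / height)


def pixel_to_geo(gt: GeoTransform, col: float, row: float) -> Tuple[float, float]:
    """Apply the affine transform to a (column, row) pixel position."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


# A 360 x 180 pixel image at one degree per pixel.
gt = global_geotransform(360, 180)
top_left = pixel_to_geo(gt, 0, 0)
centre = pixel_to_geo(gt, 180, 90)
```

With this convention the top-left corner of the top-left pixel is the origin, and a negative pixel height makes latitude decrease as the row index grows.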
Each raster GPML file contains a single GPML raster feature that uses a combination of GML and GPML properties and objects to encapsulate a time-dependent geospatial raster. A raster feature is conceptually a GML coverage feature, which is essentially a regular feature with gml:domainSet and gml:rangeSet properties that specify the positions and values of the coverage data. However, the schema content model of raster (see Fig. 26) derives indirectly from AbstractFeatureType instead of gml:AbstractCoverageType; it cannot derive from both since multiple inheritance is not supported in GML. There are two reasons for this choice. The first is that all GPML features need to inherit properties found in AbstractFeatureType, such as featureId. The second is that GPlates supports time-dependent rasters, and therefore the rangeSet property needs to support a time sequence of image files (which requires the GPML time-dependent property extension). In the future, however, the GPlates reconstruction engine could be used to serve reconstructed raster data via a Web Coverage Service (WCS). In this case raster will likely be required to derive indirectly from gml:AbstractCoverageType, and we would then create a new AbstractCoverageType object that derives from gml:AbstractCoverageType by restriction (to redefine the domainSet and rangeSet properties) and adds the necessary properties expected of a GPML feature, such as featureId. This is more of a schema change than an implementation change for GPlates.
The raster feature defines the time-dependent equivalents of gml:domainSet and gml:rangeSet (replacing the gml prefix with gpml) such that rangeSet is either a ConstantValue or a PiecewiseAggregation of gml:File objects. A similar pattern applies to domainSet with AbstractGeoReferenceType, although currently domainSet is always a ConstantValue, which means the same geo-referencing is used for all images in the time sequence (this might change in the future). AbstractGeoReferenceType is an abstract geo-referencing type with the derivations AffineGeoReferenceType and GCPGeoReferenceType (see Fig. 27), depending on whether the raster image file (referenced by gml:File) stores geo-referencing as an affine transform or as ground control points.
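The difference between a constant and a piecewise time-dependent rangeSet can be sketched in Python as follows. The TimeWindow class, the file_for_age function, the file names and the inclusive window bounds are all assumptions made for illustration; they do not reproduce the GPML schema or the GPlates implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TimeWindow:
    """Hypothetical time window of a piecewise aggregation (ages in Ma)."""
    begin_ma: float   # older bound
    end_ma: float     # younger bound
    file_name: str    # raster image file valid within the window


# A constant value is modelled as a single file name,
# a piecewise aggregation as a list of time windows.
RangeSet = Union[str, List[TimeWindow]]


def file_for_age(range_set: RangeSet, age_ma: float) -> Optional[str]:
    """Return the image file that applies at the given geological age."""
    if isinstance(range_set, str):
        return range_set  # constant: the same file at every age
    for window in range_set:
        # Assumption: both bounds are inclusive and the first match wins.
        if window.end_ma <= age_ma <= window.begin_ma:
            return window.file_name
    return None  # no image is defined for this age


# Made-up file names for a two-step time sequence.
sequence = [
    TimeWindow(begin_ma=10.0, end_ma=0.0, file_name="age_0.nc"),
    TimeWindow(begin_ma=20.0, end_ma=10.0, file_name="age_10.nc"),
]
```

Under these assumptions, file_for_age(sequence, 15.0) selects the second file, whereas a constant rangeSet returns the same file for any age.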
Each band in the raster is assigned a band name by the user during import into GPlates (stored in the bandNames property of the raster feature). Each raster has a group of metadata domains, both for the raster as a whole and for each raster band (for metadata that are specific to a particular band) (see Fig. 28).
Each metadata domain has a dictionary of name/value pairs containing the actual metadata. Each domain has a name except the default domain (see Fig. 29). For example, the NetCDF raster file format supports the SUBDATASETS domain, which lists the multi-dimensional data arrays (known as NetCDF variables) contained within the file. NetCDF has recently been formalised as an OGC standard (OGC, 2010).
Typically the coordinate reference system is specified using the srsName attribute of a GML geometry object, which references a publicly available dictionary of spatial reference systems. The same usually applies to implicit geo-referencing geometries (substitutable as AbstractGeoReferenceType), enabling the coordinate reference system of geo-referenced coordinates to be specified. However, in the case of the raster feature, GPlates obtains the coordinate reference system directly from the linked raster image file (many geospatial raster formats support this). The coordinate reference system is nevertheless mirrored in the raster feature so that GPlates users can query it. It is specified as the coordinateReferenceSystem property, whose type is any GML type derived from gml:AbstractCoordinateReferenceSystem (see Fig. 26). GPlates uses the GDAL library to convert the coordinate reference system from Well Known Text (WKT) to GML. Since this property is only queried by the user, a discrepancy in the conversion from WKT to GML (due to incomplete GML support) will not result in an incorrectly rendered raster in GPlates.

Spatio-temporal data analysis
The structured modelling of spatial data and associated time-varying metadata provided by GPML and the GPGIM is the foundation needed to develop formal data associations and statistical analysis/data-mining tools. This makes it possible to cope with the substantial complexity of simultaneously analysing several datasets in which both spatial geometries and metadata properties vary through time. The data modelling provides an abstraction of the underlying properties and processes, resulting in simple, intuitive and extensible interfaces. Two components have been developed to allow for the extraction of spatio-temporal associations, namely a co-registration tool and a data-mining infrastructure, as presented in Landgrebe and Müller (2011). The co-registration tool allows associations between spatial datasets to be defined, abstracted from the feature types involved. Associations include spatial relationships in which distances between features can vary through time, and direct property co-registrations between datasets to assess how a particular property varies as a function of time. The co-registration tool allows users to recursively define a set of associations, resulting in time series that encode specific dynamics of interest. A data-mining infrastructure has been interfaced with GPlates using the open-source, Python-based Orange tool (Demsar et al., 2004). Orange provides a visual user interface that allows users to define a specific data-mining workflow using a library of predefined components. Importantly, a plug-in facility has allowed components specific to GPlates to be developed, thus providing all the infrastructure necessary to interface the GIS environment with powerful quantitative data analysis.

Interoperability
According to the IEEE glossary, interoperability is the ability of two or more systems or components to exchange information and to use the information that has been exchanged (Geraci et al., 1991). As an open and flexible software application, GPlates is interoperable with the most widely used GIS systems. Currently, GPlates uses a property-mapping technique to achieve interoperability between the GPGIM and other information models. In the future, Python extensions will be introduced to handle data exchange between GPlates and a variety of other systems.
GPlates utilises the GDAL geospatial data abstraction library and is hence capable of reading and writing a number of different vector file formats. We address the interoperability between GPlates and two of them, ESRI shapefiles and the Generic Mapping Tools (GMT), in the sections below.

Rotation files
Rotation files are, together with geospatial data, one of the two core data sets that turn GPlates from a geospatial information system into a plate tectonic modelling tool. To model plate motions over geological time in GPlates, the user needs to load a rotation file (*.rot), which is used to reconstruct plates to their past positions. Devised in the late 1980s by the PLATES plate tectonic mapping project at the Institute for Geophysics at the University of Texas (Austin), these structured ASCII plain-text files describe a rotation hierarchy of the Earth's tectonic plates as a set of relative Euler (rotation) poles (compare section Reconstruction Tree for details). The PLATES rotation file format requires only a set of six parameters (moving plate ID, stage pole age, stage pole latitude, stage pole longitude, opening angle, and reference plate ID) and allows for a free-form comment (Fig. 22). Since the inception of the PLATES project, this file format has become the de facto standard used by nearly all available plate tectonic modelling software packages (such as the commercially available PaleoGIS, http://www.paleogis.com/DotNetNuke/). While such a simple structured text format was appropriate at the time, with a very small user base, demands for a more structured and verbose approach have since increased significantly. One of the major drawbacks of the legacy PLATES format is the unstructured comment field and the associated loss of important metadata. These metadata are needed to properly quantify and verify rotation pole sequences in global plate models through the inclusion of reference data for published rotation poles, statistical information about the accuracy of any given stage pole (Hellinger, 1981), the geological time scale used to relate geological time to absolute time, and other useful data such as modification authors.
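The six-parameter record plus comment can be read with a few lines of Python. This is a minimal sketch under stated assumptions: fields are taken to be whitespace-separated in the order listed above, the comment is assumed to start at a "!" character, and the sample line is made up. The class and function names are hypothetical and are not part of GPlates.

```python
from dataclasses import dataclass


@dataclass
class RotationRecord:
    """One line of a legacy rotation file, with illustrative field names."""
    moving_plate_id: int
    age_ma: float
    pole_latitude: float
    pole_longitude: float
    angle: float
    reference_plate_id: int
    comment: str


def parse_rotation_line(line: str) -> RotationRecord:
    """Parse six whitespace-separated parameters and a free-form comment."""
    # Assumption: everything after the first "!" is the comment.
    data, _, comment = line.partition("!")
    fields = data.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 parameters, found {len(fields)}")
    return RotationRecord(
        moving_plate_id=int(fields[0]),
        age_ma=float(fields[1]),
        pole_latitude=float(fields[2]),
        pole_longitude=float(fields[3]),
        angle=float(fields[4]),
        reference_plate_id=int(fields[5]),
        comment=comment.strip(),
    )


# Made-up example line, not taken from any published rotation model.
record = parse_rotation_line("101 10.0 80.0 20.0 5.0 000 !illustrative comment")
```

The sketch also shows the drawback discussed above: whatever metadata the comment carries arrives as a single unstructured string.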
To ensure backward compatibility with legacy PLATES applications, we have devised a new ASCII plain-text-based rotation file format (*.grot), which builds upon the existing PLATES format to describe rotation trees but contains a new set of structured attribute:value tags for consistent and persistent handling, and automated processing, of metadata. The new rotation file format follows ideas of the GMT OGR format (http://www.gdal.org/ogr/drvgmt.html and http://www.soest.hawaii.edu/gmt5/gmt/pdf/GMT Docs.pdf), the Dublin Core Metadata Initiative and the lightweight MultiMarkdown language (https://en.wikipedia.org/wiki/MultiMarkdown). A detailed specification of the new rotation file format is presented in Supplement 1.

Shapefile
The ESRI shapefile stores nontopological geometry and attribute information for the spatial features in a data set. Since its specification was released by ESRI, it has been widely accepted as a de facto standard format for geospatial vector data and hence provides interoperability between ESRI and other geospatial information systems. A shapefile consists of "a main file, an index file, and a dBASE ® table" (ESRI, 1998). The geospatial information is stored in the main file (*.shp), whereas feature attribute data are stored in the associated dBASE ® table (*.dbf), utilising the database management system (DBMS) for microcomputers (Bordwell, 1984). A one-to-one relationship is implicit between the attribute record and the associated shape record; however, the shapefile format does not have a formal information model associated with it, and shapefile attributes can be defined arbitrarily. As only recognised attributes can be accepted and processed by the GPGIM, we designed a property-mapping mechanism to import shapefile attributes into the GPGIM.
When loading a shapefile into GPlates, the user is presented with a wizard dialog that allows the user to specify how shapefile attributes will be mapped into the GPlates model (Fig. 30).
Once the mapping configuration has been specified by the user, an XML file (Fig. 31) is created to keep the settings in persistent storage so that the attributes can be mapped automatically the next time the shapefile is loaded.
Using a separate XML file to maintain the mapping configuration means that the shapefile is no longer self-contained. Furthermore, there is no specification or standard to tell shapefile users in a plate tectonic context which shapefile attributes are crucial and which are optional. Therefore, a standard naming convention has been developed that regulates shapefile attributes for use with plate tectonic applications. The standard will make it easier to exchange data in shapefile format between plate tectonic software applications, and it defines attribute names that map directly to the GPGIM (Fig. 32). It is outlined in detail in Supplement 2.
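The idea of a property mapping can be sketched in Python as a lookup from shapefile attribute names to model property names. The attribute names, property names and record values below are hypothetical placeholders, and the function is only an illustration of the technique, not the mapping that GPlates stores in its XML file.

```python
from typing import Any, Dict


def map_attributes(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename mapped shapefile attributes; attributes without a mapping are skipped."""
    return {
        mapping[name]: value
        for name, value in record.items()
        if name in mapping
    }


# Hypothetical mapping chosen by a user: shapefile column -> model property.
user_mapping = {
    "PLATE_ID": "reconstructionPlateId",
    "NAME": "name",
}

# Hypothetical attribute record read from a *.dbf table.
shape_record = {"PLATE_ID": 801, "NAME": "Example feature", "COLOUR": "red"}

mapped = map_attributes(shape_record, user_mapping)
```

In this sketch the unmapped COLOUR attribute is dropped, mirroring the statement that only recognised attributes are accepted by the information model.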

Generic Mapping Tools (GMT) interoperability
The open-source Generic Mapping Tools (Wessel and Smith, 1998), available from http://gmt.soest.hawaii.edu, are a set of command-line programs to process (geospatial) data and generate high-quality PostScript output for publications. Since its inception in the late 1980s, GMT has become the de facto standard tool in large parts of the global geophysics community for processing and disseminating data. GMT routines allow users to read and write ASCII-based text files and binary data, such as netCDF.
Through the GDAL library, GPlates is capable of reading and writing GMT-standard xy(z) data files for a variety of feature collections. GPlates can also write GMT-OGR standard files, which encapsulate attribute data consistently and allow interoperability between different formats through the OGR2OGR tool (http://www.gdal.org/ogr2ogr).
GeoSciML (Simons et al., 2006) is also a GML Application Schema. It is designed to transfer geologic data, with an emphasis on geologic maps. Since GeoSciML is associated with an information model, the structure of GeoSciML data is predefined. We use a group of XQuery expressions (Boag et al., 2010) to retrieve data of interest from GeoSciML data and import them into the GPGIM (Fig. 33).

Conclusions
The GPGIM, along with its implementation in GPML and the GPlates software application, differs fundamentally from previous plate-tectonic reconstruction information models and software applications in several respects:
- The use of a rigorous methodology to model geological, geophysical and paleo-geographic features in the GPGIM and the GPML application schema enables GPlates to serve as a component in a larger GML-based infrastructure.
- Sophisticated information model building-blocks (such as time-dependent property values and inter-feature references) enable the expression of advanced and abstract concepts.
- The paradigm shift to calculation templates for automated post-processing of reconstructions enables new data to be computed on the fly based upon plate motions.
Future GPGIM and GPML extensions will include:
- handling plate deformation;
- improving interoperability with GeoSciML documents;
- reconstructing GeoSciML features alongside GPGIM features;
- developing the ability to read and write GML-based data, participating in a data grid as a WFS client.
With a flexible, expressive, interoperable information model, GPlates and the GPGIM represent the next generation of plate-tectonic information modelling and software: a generation in which plate-tectonic data and applications are an integrated visualisation and processing component within a data grid and computational grid, and in which a plate-tectonic reconstruction is no longer an isolated result but a single stage in an adaptable workflow.

Fig. 4. The complete list of tangible features in GPGIM.

Fig. 23. An illustration of a reconstruction tree showing the relative rotation hierarchy of different tectonic plates with respect to each other at a time instant in the past (AFR: Africa; ANT: Antarctica; AUS: Australia; MAD: Madagascar; SAM: South America).

Fig. 31. The XML file keeping the mapping configuration.

Fig. 32. Basic shapefile standard attribute names, types and descriptions, which allow direct interoperability between GIS shapefiles and the GPGIM.

Fig. 33. Screenshot showing GPlates displaying geometries (green) defined in GeoSciML data, which are retrieved via a Web Feature Service (WFS) from a Geoscience Australia website. The service base URL is http://www-a.ga.gov.au/geows/geologicunits/onegaus 2 5m/wfs. The WFS request is shown in the dialog (left). All GPlates layers are listed in the dialog (right).