Reduction of uncertainties through Data Model Integration (DMI)
Techniques for data model integration (DMI) are increasingly used in many fields of science, finance, economics, etc. Everyday examples are the improvement of geophysical model descriptions (flows, water levels, waves), the improvement and optimization of daily weather forecasts, detection of errors in data series, on-line identification of stolen credit card use, and detection of malfunctioning components in manufacturing processes. The first common element is prior knowledge of the behaviour of a process, in the form of an explicit model description or a set of characteristic data. The second common element is a set of independent or new data. Neither the description of the behaviour nor the data are 100% certain; both have uncertainties associated with them. If one has information on the (statistical) nature of these uncertainties, mathematical techniques can be used to combine these two information sources and generate new or improved information. As the examples show, this may be an improved (less uncertain) model description, an improved forecast, or the detection of a significant deviation from established patterns (faulty component, credit card use, ...). In the first case, we often speak of model calibration and of calibration or parameter estimation techniques; in the other cases, we speak of (sequential) data assimilation and data assimilation techniques.
2. Definition of data model integration (DMI)
A practical definition of data model integration (DMI) is the following: “Data model integration is an automated structured combination of model and data by means of mathematical techniques to create a theoretically optimal combination of both by reducing the associated uncertainties in the one, the other, or in the information provided by the combination”. Since DMI techniques are being developed and used in many disciplines, other, similar definitions may be encountered.
3. Measures of agreement – Least squares norms
In geophysical science, DMI is commonly used for model improvement (“calibration”) and optimization of operational forecasts. Application of DMI essentially starts with choosing the quantities of interest that need to be combined and a quantitative measure that expresses the agreement between them. Instead of agreement, we often also use the words “difference”, “disagreement”, “mismatch”, “misfit” or “error”. Least squares criteria or norms are frequently used as such measures, since they are symmetric and have favourable properties from a theoretical point of view. The key advantage of quantitative measures is that they are compact, objective, reproducible and transferable, and that they are easy to use in automated evaluation procedures and software.
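As an illustration of such a measure, the following minimal Python sketch computes a sum of squared differences between model results and observations. Scaling each difference by the uncertainty of the corresponding observation is an assumption made here for illustration, as are the function name `weighted_least_squares` and the sample values; the text above does not prescribe a specific form of the norm.

```python
def weighted_least_squares(model_values, observed_values, sigmas):
    """Sum of squared model-observation differences, each scaled by the
    (assumed known) standard deviation of the observation."""
    if not (len(model_values) == len(observed_values) == len(sigmas)):
        raise ValueError("all input sequences must have the same length")
    return sum(
        ((m - o) / s) ** 2
        for m, o, s in zip(model_values, observed_values, sigmas)
    )


# Hypothetical sample values (e.g. water levels in metres).
model = [1.0, 2.0, 3.0]
observed = [1.1, 1.8, 3.3]
sigma = [0.1, 0.2, 0.3]

cost = weighted_least_squares(model, observed, sigma)
```

In this sample every difference equals exactly one standard deviation, so each of the three terms contributes about 1 to the total. Swapping the model and observation arguments gives the same value, which is the symmetry property mentioned above.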
4. Role of uncertainties in models and data
Models are never “true”. Even the very best models provide schematised representations of the real world. Examples are flow models, models for transport and spreading, models for wave propagation, rainfall runoff models and morphological models. They are limited to the representation of those real world phenomena that are of specific practical interest, characterised by the associated temporal and spatial scales of interest. In the derivation of these models all kinds of simplifications and approximations have been applied. These are often formulated as “errors” or “uncertainties” in the model. These uncertainties occur in (1) the model concept as such, (2) the various model parameters, (3) the driving forces, and (4) the modelling result. Moreover, a model uncertainty of a general nature is associated with (5) the representativity of model results for observed entities. Equally, field measurements or observations suffer from errors or uncertainties. These may be the result of (1) equipment accuracy, (2) instrument drift, (3) equipment fouling or malfunctioning, (4) temporal and spatial sampling frequency, (5) data processing and interpretation, (6) spatial and temporal representativeness, and others. As a result, mismatches between model results and observations are virtually unavoidable. Moreover, both sources of information involve errors in their estimates of the true state of the system. The errors in the model on the one hand, and in the measurements on the other, can be of very different type, origin, and magnitude.
5. Combination using DMI techniques reduces the uncertainty
Depending on the DMI algorithms that are used and on the correctness of their assumptions, a combination of data and model with known uncertainty (in a statistical sense) can lead to a (statistically) optimal estimate of the system’s state. Such optimal estimates are achieved when the weights in the combination of model outcomes and measurements are based on the uncertainties in both. This is illustrated by a simple example, in which [math]x_m[/math] is a model result for some system state variable at some spatial position and time, and the spread [math]\sigma_m[/math] is its uncertainty. Similarly, [math]x_o[/math] and [math]\sigma_o[/math] are the corresponding measured value and the uncertainty in the measurement. Assuming independent, unbiased errors, a (statistically) optimal combination of these two estimates leads to the estimate [math]\hat{x} = \frac{\sigma_o^2 x_m + \sigma_m^2 x_o}{\sigma_m^2 + \sigma_o^2}[/math], with a spread [math]\hat{\sigma}[/math] that satisfies [math]\frac{1}{\hat{\sigma}^2} = \frac{1}{\sigma_m^2} + \frac{1}{\sigma_o^2}[/math]. Clearly the uncertainty in the combined estimate is less than the uncertainty in either of the individual estimates. This example reflects the essence of DMI. In applications of structured DMI techniques to real life numerical models (dealing with many grid points and state variables, complex and non-linear dynamics, high model computation times, etc.) the above principle is ‘merely’ generalised in an appropriate way.
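The simple two-estimate example can be sketched in Python as follows. The sketch uses inverse-variance weighting, which is the statistically optimal linear combination under the assumption that the model and measurement errors are independent and unbiased. The function name `combine_estimates` and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def combine_estimates(x_model, sigma_model, x_obs, sigma_obs):
    """Inverse-variance weighted combination of a model result and a
    measurement, assuming independent, unbiased errors."""
    w_model = 1.0 / sigma_model ** 2  # weight = inverse of error variance
    w_obs = 1.0 / sigma_obs ** 2
    x_hat = (w_model * x_model + w_obs * x_obs) / (w_model + w_obs)
    # Inverse variances add, so the combined spread is smaller than either input.
    sigma_hat = math.sqrt(1.0 / (w_model + w_obs))
    return x_hat, sigma_hat


# Hypothetical example: modelled water level 1.20 m (spread 0.20 m),
# measured water level 1.00 m (spread 0.10 m).
x_hat, sigma_hat = combine_estimates(1.20, 0.20, 1.00, 0.10)
```

With these values the combined estimate is about 1.04 m, closer to the more accurate measurement, and its spread is about 0.089 m, smaller than both the model spread and the measurement spread.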