Contextual and Global Falsification of Scientific Models
An Integrated Theory of Epistemic Validity
Abstract
Classical Popperian falsification
theory operates with a binary image of scientific rationality
according to which a single contradictory observation is sufficient
to reject a theory (Popper [1934] 1959). Modern scientific practice
systematically contradicts this idealized view: central models such
as Newtonian mechanics, classical thermodynamics, nonrelativistic
quantum mechanics, or contemporary climate models are empirically
falsified in certain subdomains yet remain epistemically
indispensable.
This paper develops an integrative framework of
model validity that systematically unifies key insights from Popper,
Kuhn, Lakatos, da Costa/French, and contemporary model theory into a
single model of rational scientific practice. The theory
distinguishes global and contextual falsification, introduces the
concept of an epistemic enabling space E(t), and provides a formally
specified structure for domain operations. The epistemic enabling
space forms the central theoretical contribution of the paper, as it
makes the methodological, technical, and institutional conditions of
model choice explicit.
“Falsification” is not treated as a
purely logical truth criterion but as a rational–pragmatic
mechanism of model assessment that integrates approximate truth,
explanatory power, and model costs. This allows us to explain why
models remain stable despite partial falsifications and under which
conditions genuine model elimination occurs. In addition, a
decision-theoretic logic of epistemic optimality is formalized that
combines approximate truth within specific domains, explanatory
power, and model costs into a unified utility function. A case study
on climate models demonstrates how ensemble methods,
parameterizations, and Bayesian updating lead to systematic domain
refinement.
The proposed theory is primarily descriptive: it
reconstructs actual model practice in modern sciences and provides an
explicitly formulated and systematically integrated account of model
structures that classical Popperian views consider only partially.
This makes idealized, simulation-based, and domain-specific models
epistemically precise and scientifically tractable.
Contents
Abstract
Introduction
State of Research
2.1 Popper: Falsification as a Binary Ideal
2.2 Kuhn: Anomalies Without Elimination
2.3 Lakatos: Research Programmes
2.4 Model Theory
2.5 Approximate Truth
2.6 Research Gap
2.7 Contribution of This Paper
Basic Concepts
Criteria of Modern Model Assessment Within the Proposed Framework
Proposition: Two-Level Structure of Falsification
When Models Disappear
Why Models Persist
7.1 Robustness Through Model Families
7.2 Contexts of Use
Case Study: Climate Models
8.1 Climate Models as Ensemble Structures
8.2 Parameterization as an Epistemic Operation
8.3 Bayesian Updating and Adaptive Modelling
8.4 Domain Structure
8.5 Miniaturized Formal Illustration
8.6 Consequences for Epistemic Status
Approximate Truth
Why an Integrated Theory Has Been Missing
10.1 Historical Reasons
10.2 Logical Reasons
10.3 Institutional Reasons
Epistemic Enabling Space
11.1 Dynamics of the Epistemic Enabling Space
11.2 Structure of the Epistemic Enabling Space E(t)
Decision Logic of Model Choice
12.1 Structure of the Utility Function
12.2 Scaling and Empirical Grounding of the Utility Function
Relation to Scientific Realism
Conclusion
References
Popper’s falsification theory ([1934] 1959) shaped the public
and academic image of scientific rationality for decades. It
postulates that theories must be rejected as soon as one of their
predictions fails empirically. The practice of the natural and social
sciences, however, presents a different picture: many models are
idealized, only partially representative, empirically falsified in
subdomains, and yet epistemically stable. This paper develops an
integrated theory of scientific model validity that unifies central
insights from Popper, Kuhn, Lakatos, da Costa/French, and contemporary
model theory into a common structure of rational model assessment.
Examples include Newtonian mechanics, which fails at
relativistic velocities (Einstein 1905), classical thermodynamics,
which is microscopically imprecise, or climate models, which require
continuous recalibration (Oreskes et al. 1994). At the same time,
some models have disappeared entirely, such as the phlogiston theory
or the classical aether.
The core question is: under what
conditions does falsification lead to the elimination of a model, and
when does a model remain epistemically stable despite partial
falsification?
This paper develops an integrated theoretical
framework that explicitly formalizes this question within a unified
model. The approach distinguishes global and contextual
falsification, describes the epistemic enabling space E(t), and
integrates approximate truth as a graded, domain-specific assessment
measure. Furthermore, it shows that falsification is not a purely
logical truth operation but a rationally reconstructible decision
within a real epistemic enabling space. The theory is primarily
descriptive: it reconstructs actual model practice in modern sciences
and is illustrated using climate models.
The utility function
U(M, D) introduced below is to be understood primarily as
reconstructive: it describes how model choice in real scientific
practice occurs when approximate truth, explanatory power, and model
costs are weighed against each other. At the same time, the structure
has a normative reading, as it makes the conditions of rational model
assessment explicit. The theory thus serves both as a descriptive
reconstruction of model practice and as a framework of rational
epistemic optimality.
Popper ([1934] 1959) defines falsification as a binary mechanism: a single contradictory observation refutes a theory. This model presupposes global claims of validity, unambiguous theory–observation mappings, and non-idealized theories—conditions that are rarely met in modern sciences. Idealized, approximate, and simulation-based models, in particular, do not fit this binary structure. Although the tension between Popper’s ideal and actual model practice has been widely discussed in the literature (Hacking 1983; Cartwright 1983), it has never been translated into a systematic theory of contextual falsification.
Kuhn (1962) shows that scientific paradigms can survive anomalies. However, he does not provide a finely structured theory of model-specific falsification. The main gap is that Kuhn does not specify how anomalies distribute across subdomains or why they undermine some models but leave others essentially unaffected. While the literature emphasizes Kuhn’s historical and sociological perspective, a formally precise model theory of subdomain-structured falsification is missing (cf. Lakatos 1970; Weisberg 2013).
Lakatos (1970) explains stability through a “hard core”, but without determining when models actually disappear. It also remains unclear how competing programmes should be compared in detail, especially when both are partially but not globally falsified. This reveals that the programme structure of Lakatos does not provide a formal instrument for assessing domain-specific approximate truth.
Model-theoretic approaches (Cartwright 1983; Weisberg 2013; Morgan & Morrison 1999) emphasize the idealized and domain-specific nature of scientific models but do not provide a theory of how falsification operates within such structures or how models are systematically compared. This concerns especially the handling of idealized simulations, for which neither Popper nor classical model theory offers a suitable falsification criterion (Morgan & Morrison 1999).
Niiniluoto (1987, 1998) and Oddie (1986) offer graded concepts of truth but do not explain why some models persist despite partial falsifications, while others disappear entirely. What is missing in particular is the link between approximate truth, contexts of use, model costs, and institutional stabilization. Without this, it remains unclear how graded truthlikeness is supposed to operate within real model portfolios.
To the best of current knowledge, there is no systematically developed
and academically established integrated theory that simultaneously
explains:
• when falsification operates globally,
• when it remains contextual,
• how approximate truth functions within domains,
• how model costs and explanatory power are rationally weighted,
• how the epistemic enabling space E(t) determines the set of real model alternatives.
Individual
aspects of these issues have been treated separately in the
literature—for example in work on approximate truth, on research
programmes, or in model theory. What is largely missing is an
explicit, formally structured reconstruction that unifies these
dimensions within a single framework of model assessment. This is
particularly relevant for data-intensive, simulation-based sciences
in which contextual falsification and model portfolios have become
standard practice.
This paper makes four interconnected contributions to the philosophy of science and model theory. It presents an integrative framework that consolidates established insights and translates them into an explicitly formalized structure.
The concept of the epistemic enabling space E(t).
E(t) is explicated as a dynamic structure of
methodological, technical, and institutional conditions that
determines which models can be formulated, evaluated, and
stabilized. This models a dimension that classical Popperian
falsification either leaves implicit or does not consider at all.
The two-level structure of falsification.
The
paper draws a strict distinction between contextual and global
falsification. Global falsification is no longer understood as a
binary truth break, but as a condition in which a model is no longer
epistemically competitive in any domain D within the epistemic
enabling space E(t). This structure systematically reconstructs the
stability of partially falsified models.
The integrated utility function U(M, D, t).
With the utility function

U(M, D, t) = α · AT(M, D, t) + β · EK(M, D, t) − γ · C(M, D, t),

the paper proposes a framework that unifies approximate truth,
explanatory power, and model costs into a single assessment structure
explicitly anchored in E(t). The function is formulated to remain
compatible with established quality measures and selection criteria
(fit metrics, information criteria, complexity measures) in individual
disciplines.
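The weighing expressed by U(M, D, t) can be made concrete in a small computational sketch. The following Python snippet is illustrative only: the component scores, weights, and model names are hypothetical assumptions of this sketch, not values proposed in the paper.

```python
# Illustrative sketch of U(M, D, t) = α·AT + β·EK − γ·C,
# with all component scores assumed to lie in [0, 1].

def utility(at: float, ek: float, cost: float,
            alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Domain-specific utility U(M, D, t) for one model in one domain."""
    return alpha * at + beta * ek - gamma * cost

# Two hypothetical models evaluated in the same domain D:
u_m      = utility(at=0.9, ek=0.8, cost=0.3)   # established model M
u_m_star = utility(at=0.7, ek=0.8, cost=0.2)   # rival model M*

print(u_m > u_m_star)  # True: M remains epistemically optimal in D
```

Because α, β, and γ are free weighting parameters, the same component scores can yield different rankings under different disciplinary weightings; the sketch fixes one choice purely for illustration.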
Application to simulation-based model portfolios.
Using climate models as an example, the
paper shows how ensemble structures, parameterizations, and Bayesian
updating can be reconstructed within the proposed framework as cases
of contextual falsification and domain-specific approximate truth.
This aligns the theory directly with a central class of modern,
data- and simulation-intensive model practices.
Overall, the contribution of this paper lies in providing an explicitly formulated, formally structured, and empirically applicable framework that systematically integrates classical insights from falsification theory, model theory, and truthlikeness concepts, while offering a clearly codified structure for evaluating and understanding the dynamics of scientific models.
The following definitions clarify the terms introduced above.
Notation (overview)
D(M) – domain of applicability of a model
D₁, D₂ – subdomains of the domain of applicability
AT(M, D) – approximate truth of the model in domain D
EK(M, D) – explanatory power
C(M) – model costs
U(M, D) – domain-specific utility function
E(t) – epistemic enabling space
R(M, D₂) – restriction of the model to subdomain D₂
Definition 1: Model (M)
A model is an
idealized mathematical or conceptual structure for representing
defined aspects of a target system under specific conditions
(Cartwright 1999).
Definition 2: Domain of Applicability D(M)
The
set of all conditions under which a model yields reliable,
explanatorily adequate, or functionally optimal results.
Definition 3: Subdomains D₁, D₂
D(M) can
be decomposed into subdomains: D₁ ⊆ D(M), D₂ ⊆ D(M), with D₁
∪ D₂ = D(M). Such decompositions often result from scientific
revision.
For falsification, we additionally stipulate:
• D₁ denotes the subdomain in which M remains epistemically viable;
• D₂ denotes the subdomain in which M is no longer acceptable.
Definition 4: Epistemic
Optimality
A model is epistemically optimal in a domain
D if, given all alternatives available within the enabling space
E(t), it exhibits a favourable balance of approximate truth,
explanatory power, and model costs. Epistemic optimality is
operationalized by the domain-specific utility function U(M,
D).
Model costs C(M, D) comprise three dimensions:
(1) epistemic costs (e.g., error sensitivity, variance, uncertainty breadth of predictions, interpretability/transparency, quality of fit),
(2) technical costs (computational effort, data requirements, implementation complexity),
(3) institutional costs (degree of standardization, availability of infrastructure).
Some cost components are global (e.g., basic
implementation or infrastructure costs), others domain-specific
(e.g., data requirements or numerical stability in particular
domains). For the utility function, these are captured jointly in
C(M, D, t).
Definition 5: Contextual Falsification
An
observation O results in contextual falsification if it shows that
there exists a subdomain D₂ in which M is no longer epistemically
acceptable, while at least one subdomain D₁ remains in which M is
still optimal.
Definition 6: Global Falsification
A model M
is globally falsified at time t in scientific development if it is
epistemically optimal or competitive in no relevant domain D within
the epistemic enabling space E(t).
Epistemic optimality holds in
a domain D when U(M, D, t) is greater than or equal to the utility
value of all alternative models available in E(t).
Epistemic
competitiveness holds when U(M, D, t) is at most within a
context-dependent tolerance range ε below the utility value of the
best available alternative U(M*, D, t).
Formally, global falsification can be characterized as follows: for
all relevant domains D, there exists at least one alternative model
M* such that

U(M, D, t) + ε < U(M*, D, t).
The tolerance
parameter ε may be set in a discipline- and context-dependent manner
and reflects the fact that models with slightly lower U-values may
remain competitive in practice.
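The ε-condition of Definition 6 can be stated as a simple predicate. The sketch below is a hypothetical illustration: the domain labels and utility values are invented, and `globally_falsified` is not an operationalization proposed in the paper.

```python
# Sketch of the global-falsification condition: M is globally falsified
# iff, for every relevant domain D, some alternative M* satisfies
# U(M, D, t) + ε < U(M*, D, t).

def globally_falsified(u_m: dict, u_best_alt: dict, eps: float = 0.05) -> bool:
    """u_m maps each domain to U(M, D, t); u_best_alt maps each domain
    to the best utility achieved by any alternative available in E(t)."""
    return all(u_m[d] + eps < u_best_alt[d] for d in u_m)

# A partially falsified model: clearly beaten in D2, still best in D1.
u_m   = {"D1": 1.40, "D2": 0.70}
u_alt = {"D1": 1.30, "D2": 1.50}
print(globally_falsified(u_m, u_alt))  # False: only contextual falsification
```

The tolerance ε does real work here: a model that trails the best alternative by less than ε in some domain still counts as competitive, which mirrors the paper's notion of epistemic competitiveness.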
Definition 7: Epistemic Enabling Space E(t)
The
space of methodological, mathematical, institutional, and technical
conditions that determines:
• which models can be formulated,
• which data are available,
• which idealizations are permissible,
• which models can be stabilized,
• how model costs C(M) are structured in the first place.
E(t) is dynamic: technological innovations, data
availability, and institutional norms continuously reshape the space
of possible models.
Definition 8:
Approximate Truth
Approximate truth is the degree of a
model’s similarity to relevant features of a target system relative
to a domain D:
AT(M, D).
Formally, approximate truth can be expressed using a similarity metric:

AT(M, D) = Σᵢ wᵢ · sim(M, Sᵢ, D).
Here, approximate truth is not a truth
criterion in the Popperian sense but a graded similarity metric
within specific domains.
In practical model assessment,
approximate truth is always considered together with uncertainty
estimates of model predictions. High predictive uncertainty reduces
the effective epistemic value of a model even when mean fit is high,
and is therefore reflected both in AT(M, D) and in the epistemic cost
component of C(M).
Definition 9: Domain Operators
• Restriction: R(M, D₂) = M|₍D₂₎
• Domain difference: D(M) − D₂ = D₁
• Decomposition: Z(M) = {D₁, D₂, …}
These operators describe model-theoretic modifications of
the domain of applicability.
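Read set-theoretically, the three operators admit a minimal sketch. The domain labels below are hypothetical placeholders (loosely inspired by the Newtonian-mechanics example), not examples given in the paper:

```python
# Set-theoretic sketch of the domain operators. Domain labels are
# invented placeholders for conditions of application.

D_M = {"low_velocity", "high_velocity"}   # D(M): full domain of applicability
D2  = {"high_velocity"}                   # subdomain where M fails

# Domain difference: D(M) − D₂ = D₁, the surviving subdomain.
D1 = D_M - D2

# Decomposition: Z(M) = {D₁, D₂}.
Z = [D1, D2]

# Restriction R(M, D₂): the model evaluated only on conditions in D₂.
def restrict(model_eval, subdomain):
    return {cond: model_eval(cond) for cond in subdomain}

print(D1)  # {'low_velocity'}
```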
On Temporal Dynamics
Since the epistemic
enabling space E(t) changes over the course of scientific
development, the assessment values AT(M, D), EK(M, D), and the costs
C(M) are, in principle, time-dependent. This is denoted explicitly by
a time index t where the dynamics of E(t) are foregrounded:
U(M, D, t) = α · AT(M, D, t) + β · EK(M, D, t) − γ · C(M, D, t).
Where it would hinder readability, the index t is omitted;
it remains conceptually implicit throughout.
Modern scientific model assessment comprises a clearly identifiable set of epistemic, technical, and institutional criteria that jointly determine whether a model within a domain D is retained, recalibrated, or abandoned. The framework developed here makes these criteria explicit and systematically assigns them to the components of the utility function U(M, D, t) and the epistemic enabling space E(t). The following catalogue aims to cover, within the scope of this framework, the epistemically relevant assessment criteria as comprehensively as possible, without excluding the possibility that individual disciplines may develop additional, more fine-grained subcriteria.
(1) Approximation and structural similarity
This
includes domain-specific approximate truth AT(M, D), understood as
graded similarity between model structures and relevant system
states. It comprises quality of fit, structural similarity, and
stability of predictions across subdomains.
(2) Explanatory power (EK)
A model’s
explanatory power determines the breadth, depth, and counterfactual
sensitivity of its explanations. It covers the range of phenomena
accounted for as well as the model’s capacity to explain new or
derived facts.
(3) Model uncertainty
Model uncertainty
includes variance, sensitivity, and stability of predictions within a
domain. Large uncertainty intervals reduce a model’s effective
epistemic value, even when average fit is high. Uncertainty affects
both AT(M, D) and the epistemic component of C(M).
(4) Interpretability and transparency
Models
differ in structural accessibility, the intelligibility of their
mechanisms, testability, and the diagnosability of error sources. Low
interpretability increases epistemic costs and reduces explanatory
usefulness.
(5) Technical and computational costs
These
include computational effort, energy consumption, numerical
complexity, data requirements, and implementability. Technical costs
form a central component of C(M) and significantly influence the
rationality of model choice in data-intensive sciences.
(6) Institutional and
infrastructural conditions
Models are stabilized by
norms, software ecosystems, training structures, data standards, and
organizational practices. These factors reside within the epistemic
enabling space E(t) and explain why models often persist even when
they become suboptimal in particular domains.
(7) Availability of epistemic alternatives
The
set of alternative models available in the enabling space E(t)
determines whether contextual falsification leads to model revision
or global elimination. Models persist as long as they remain
epistemically optimal in at least one domain.
This catalogue clarifies which factors carry operational weight in the utility function U(M, D) and in the enabling space E(t). At the same time, it provides a comprehensive overview of the assessment dimensions that shape modern scientific model practice. The integrated framework thereby becomes both theoretically complete and directly empirically applicable.
In the terminology proposed here, “falsification” is no longer a binary truth decision about M as a whole, but a systematically reconstructible shift in the utility values U(M, D, t) across domains, which under specific conditions leads to the complete abandonment of M.
For any model M, the following holds:
As long as there exists a subdomain D₁ in which M is epistemically
optimal according to U(M, D₁), falsification leads to a restriction
of the domain of applicability:

D(M) → D(M)′ = D₁.
M is globally falsified if and only if there is no domain in which M is epistemically optimal or competitive according to U(M, D).
Approximate truth AT(M, D), explanatory power EK(M, D), and model
costs C(M) determine the assessment of epistemic optimality; E(t)
determines the set of admissible alternatives.
Formally, the transition from contextual to global falsification can
be expressed as the transition from

∃D₁: U(M, D₁, t) ≥ U(M*, D₁, t)

to

∀D: U(M, D, t) + ε < U(M*, D, t),

where ε denotes a context-dependent tolerance range of epistemic
competitiveness.
This two-level structure formalizes a distinction that is discussed in the literature but seldom explicitly modeled. It precisely reconstructs how models may remain viable in some subdomains while failing in others, thereby complementing earlier approaches by Popper and Lakatos with a clearly structured account of domain-specific model stability. The two-level structure of falsification explicitly captures the common scientific situation in which models are recalibrated within certain subdomains while retaining their overarching epistemic role.
A model disappears when:
(1) an alternative model is epistemically superior in all relevant domains (U(M*, D) > U(M, D) for all D),
(2) the model costs of M are higher, and
(3) E(t) no longer contains stabilized contexts of use for the model.
Formally, global falsification can therefore be defined as the
condition that for all relevant domains D:

U(M, D) < U(M*, D).
Examples include:
• aether → Maxwell (1865) and Einstein (1905),
• phlogiston → modern chemistry,
• epicycles → Kepler (1609, 1619) and Newton,
• four-humors theory → modern medicine.
These historical model shifts are often interpreted in philosophy of science as paradigm changes or research programme transformations (Kuhn 1962; Lakatos 1970), but the utility structure proposed here provides a more precise formal reconstruction.
Technological, mathematical, and institutional innovations reshape E(t) and accelerate global falsification. Thus E(t) determines not only which models disappear but also which ones can be considered realistic alternatives in the first place.
Many models do not exist as single structures but as entire model
families. Falsification therefore usually targets variants or
specific parameterizations rather than the entire class. Model
families possess structural redundancies that allow errors in some
submodels to be compensated without abandoning the overall approach.
This explains why falsification often results merely in a shift
within the family rather than the elimination of the class. This
observation aligns with model-theoretic literature, in which model
families are described as structured spaces of possible variants
(Weisberg 2013).
The domain structure introduced here refines
this insight by showing how stability and variation within a model
class depend on each other in a formally precise way.
Models persist because they are:
• didactically useful,
• computationally efficient,
• technically standardized.
Contextual falsification corrects domains but does not eliminate the
model.
The persistence of a model in D₁ despite
falsification in D₂ follows from the utility structure U(M, D₁)
and from the stabilizing institutional and technical contexts within
the epistemic enabling space E(t).
The existence of stable contexts of use is a function of E(t):
institutions, data formats, software libraries, and training
structures stabilize models independently of their global approximate
truth. This explains why models with low AT(M, D₂) but high AT(M,
D₁) remain rational to use.
This corresponds to scientific
practice in many disciplines, where models are treated as tool-like
building blocks whose validity is context-dependent rather than
globally assessed.
Climate models are not single models but complex ensembles
composed of various model variants and scenarios (Oreskes et al.
1994):
E = {M₁, M₂, …, Mₙ}.
This ensemble combines
different physical assumptions, parameterizations, and initial
conditions.
Falsification therefore rarely affects the ensemble as a whole but
typically targets specific components:
• ocean models,
• atmospheric parameterizations,
• cloud processes,
• biogeochemical modules.
The ensemble as a
whole remains epistemically stable even when individual components
are modified or replaced. This corresponds to established practice in
climate science, where ensembles explicitly function as mechanisms of
epistemic stability (see Oreskes et al. 1994; IPCC methodology). The
domain structure of the ensemble is therefore more stable than that
of individual models.
Many climate-relevant processes cannot be computed from
fundamental physical equations at full resolution. Parameterizations
function as epistemic bridges between physical theory and numerical
feasibility.
Falsification typically marks the limits of such parameterizations,
leading to local domain adjustments:

AT(M, D₂) ↓ → R(M, D₂).
Parameterizations therefore are not merely approximation techniques but operational domain definitions whose adjustment is a central mechanism of contextual falsification. The overarching modelling framework, however, remains intact. This is a paradigmatic case of contextual falsification.
Modern climate models integrate Bayesian updating to incorporate new
data systematically:

Posterior ∝ Likelihood × Prior.
A failed prediction reduces the likelihood of a submodel
and thus its domain-specific approximate truth AT(M, D₂). But the
ensemble absorbs this reduction by assigning higher likelihoods to
alternative parameterizations or submodels.
This shows that falsification acts not eliminatively but redistributively: it shifts epistemic weights within the ensemble. Formally, this corresponds to weight shifts in the posterior-based model portfolio, where submodels with higher likelihoods contribute more strongly to the ensemble outcome.
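The redistributive reading of Bayesian updating can be illustrated with normalized posterior weights over two rival submodels. The numbers are invented for illustration; real ensemble weighting schemes are far more elaborate.

```python
# Sketch of redistributive (rather than eliminative) falsification:
# posterior ∝ likelihood × prior, normalized over the submodels
# of an ensemble. All numbers are hypothetical.

def posterior_weights(priors, likelihoods):
    """Normalized posterior weights for a portfolio of submodels."""
    unnorm = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnorm)
    return [w / total for w in unnorm]

priors      = [0.5, 0.5]   # two rival parameterizations, equal priors
likelihoods = [0.2, 0.8]   # submodel 1 fails a prediction

print(posterior_weights(priors, likelihoods))  # [0.2, 0.8]
```

The failed prediction lowers the first submodel's weight without removing it from the portfolio, which is exactly the weight-shift (rather than elimination) described above.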
The functional components of climate models can be structured as
follows:
• D₁: global temperature trends,
• D₂: regional precipitation patterns,
• D₃: extreme events,
• D₄: short-term climate variability (ENSO, AMO).
This clear domain structure shows that falsification acts primarily in D₂–D₄, while D₁ functions as a global stability anchor of the ensemble model. D₁ has remained stable for decades, which is why the ensemble is globally epistemically optimal.
A simplified example illustrates this. Let two models M and M* be
available, with the following domain-specific values:

AT(M, D₁) = 0.9, EK(M, D₁) = 0.8
AT(M*, D₁) = 0.7, EK(M*, D₁) = 0.8
AT(M, D₂) = 0.4, EK(M, D₂) = 0.5
AT(M*, D₂) = 0.8, EK(M*, D₂) = 0.9

For moderate costs C(M) ≈ C(M*) we have:
• In D₁, M remains epistemically optimal.
• In D₂, M* becomes optimal.
• Globally, both models remain part of the ensemble.
This illustrates the principle:
Contextual falsification
reduces AT(M, D₂) without affecting AT(M, D₁) or global epistemic
optimality.
This structure is characteristic of many
simulation-based sciences, not only climate research.
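The miniature example above can be computed explicitly. The sketch assumes α = β = 1 and a common moderate cost c = 0.3 for both models; the specific cost value is an assumption of this sketch, since the paper only stipulates C(M) ≈ C(M*).

```python
# Explicit computation of the miniature example, with α = β = 1 and a
# shared cost c = 0.3 (assumed; equal costs cancel in each comparison).

def u(at, ek, c=0.3):
    return at + ek - c

scores = {
    ("M",  "D1"): u(0.9, 0.8), ("M*", "D1"): u(0.7, 0.8),
    ("M",  "D2"): u(0.4, 0.5), ("M*", "D2"): u(0.8, 0.9),
}

for domain in ("D1", "D2"):
    best = max(("M", "M*"), key=lambda m: scores[(m, domain)])
    print(domain, "->", best)
# D1 -> M
# D2 -> M*
```

Neither model wins everywhere, so neither is globally falsified: each retains a domain in which it is epistemically optimal, and both stay in the ensemble.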
Climate models are often judged using a Popperian schema: “A wrong
prediction shows that the model is false.” This is scientifically
incorrect (Oreskes et al. 1994). Correctly understood: falsification
acts domain-specifically and indicates which subcomponents require
further development.
This demonstrates that Popper’s eliminative logic is inadequate for simulation-based model architectures and must be replaced by a domain-specific utility structure.
The climate-model case study is representative of modern, data- and simulation-intensive model architectures. The structure of the analysis is general: the theory developed here applies without additional assumptions to classical physical models, economic models, epidemiological models, or AI model architectures, because it relies solely on the formal conditions of domain-specific approximate truth, model costs, and the decision logic U(M, D).
Approximate truth explains why models can generate scientific progress despite errors. The concept, shaped especially by Niiniluoto (1987, 1998) and Oddie (1986), has rarely been systematically linked to model use, domain structure, and model costs.
Formally, approximate truth can be expressed using a similarity
metric:

AT(M, D) = Σᵢ wᵢ · sim(M, Sᵢ, D),

where sim(M, Sᵢ, D) measures similarity between model predictions
and observed system states within domain D, and the wᵢ represent
relevance weights.
In this framework, approximate truth is neither a
Popperian truth criterion nor a simple fit metric but a
domain-specific measure of structural similarity that plays a central
role in model selection.
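One way to operationalize the similarity metric is sketched below, assuming a relative-error-based sim function. Both the sim definition and all numerical values are assumptions of this sketch, not definitions from the paper.

```python
# Sketch of AT(M, D) = Σᵢ wᵢ · sim(M, Sᵢ, D), with sim assumed to be
# 1 minus the relative prediction error, clipped at zero.

def approximate_truth(predictions, observations, weights):
    """Weighted similarity between model predictions and system states Sᵢ."""
    sims = [max(0.0, 1.0 - abs(p - o) / abs(o))
            for p, o in zip(predictions, observations)]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, sims)) / total

# Hypothetical domain D with three relevant system states:
at = approximate_truth(predictions=[10.0, 4.5, 0.9],
                       observations=[10.0, 5.0, 1.2],
                       weights=[0.5, 0.3, 0.2])
print(round(at, 3))  # ≈ 0.92 for these invented values
```

In practice, sim would typically be replaced by a discipline-specific fit metric (likelihoods, forecast errors, residual measures), as the next paragraphs note; the weights wᵢ encode which system states matter most in the domain.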
Key points:
• Approximate truth is always domain-specific.
• Falsification reduces AT(M, D₂) but leaves AT(M, D₁) unchanged.
• Global falsification cannot be defined by approximate truth alone.

Even if AT(M, D) approaches zero in many cases of global elimination,
the decisive factor is the utility structure:

U(M, D) < U(M*, D) for all relevant domains D.
By embedding approximate truth in the utility function U(M, D), the framework shows that truthlikeness does not operate in isolation but determines epistemic optimality only in interaction with explanatory power and model costs. Within domains, approximate truth provides a quantitative dimension of epistemic improvement.
For practical application, both the similarity function sim(M, Sᵢ,
D) and the weights wᵢ must be context-dependent and empirically
operationalizable.
In many scientific fields, sim(M, Sᵢ, D)
directly corresponds to established fit and error metrics, such as
likelihood functions, variance measures, forecast errors, residual
analyses, or similarity indices between model trajectories and
observed system states.
In more theoretical or structurally
oriented sciences, similarity metrics may also capture qualitative or
topological features, such as symmetry preservation, invariance
structures, or the reproduction of causal dependencies.
The weights wᵢ represent the relative relevance of different system properties within a domain. They may be empirically determined by disciplinary standards or derived from model-theoretic considerations that specify which features of a system are central for model quality.
Approximate truth thus provides an adaptable structure that captures both numerical accuracy and structural fit.
Before the digital age, many scientific models were formulated analytically, which made domain structures less visible. Only with numerical simulations and large-scale data models did contextual falsification become a central feature of scientific practice.
Popper’s approach was designed primarily as a truth criterion. Modern
models, however, are:
• approximative,
• domain-specific,
• dynamically recalibratable.
A binary notion of truth is unsuitable because approximate truth varies across domains. Classical logical structures cannot capture the operational complexity of modern modelling; they do not represent gradational approximation or domain-specific weighting of relevance.
Scientific practice is embedded in:
• computational capabilities,
• data norms,
• peer-review structures,
• funding regimes,
• technological infrastructures.
These factors shape the epistemic enabling space E(t) and determine which models can be formulated and stabilized. Popper’s theory does not consider this dimension. The proposed framework introduces E(t) precisely as the structuring force of institutional and technical conditions and thereby as a central epistemic variable.
The epistemic enabling space E(t) evolves with technological,
methodological, and institutional developments.
A
model may be epistemically optimal at one time and no longer optimal
later, even though its “truth” has not changed. This shows that
E(t) itself functions as a dynamic epistemic variable whose evolution
endogenously influences model optimality.
Successful models also reshape the enabling space E(t): they establish new data formats, computational infrastructures, institutional standards, and research practices. Thus, there is a reciprocal dynamic between modelling practice and E(t), jointly shaping both the set of available models and the structure of future model alternatives.
Key drivers of changes in E(t) include:
• growth of data and improved measurement systems,
• new mathematical and statistical methods,
• increasing computational power,
• institutional standardization processes,
• evolution of scientific discourse.
These drivers can be conceptualized as change operators ΔE(t) that systematically expand or restrict the epistemic opportunity space.
E(t) determines:
• the set of models that are actually available,
• the structure of their domains,
• their costs C(M),
• the available alternatives M*.
Thus, the epistemic opportunity space is the central framework parameter of model choice. In a historical perspective, E(t) explains why model transitions are often triggered by technological and institutional innovations that make new alternatives formulable for the first time.
For analytical precision, the epistemic opportunity space E(t) can be decomposed into three functional components that jointly determine which models can be formulated and stabilized:
Eₘ(t): Methodological and mathematical conditions. This includes available mathematical techniques, statistical methods, modelling strategies, and algorithmic tools. They determine which types of models can be formulated and which approximations are permissible.
Eₜ(t): Technical and data-related conditions. This dimension includes computational resources, data quality, software infrastructures, numerical tools, and simulation technologies. These factors determine which models can be implemented in practice and at what resolution, stability, or complexity.
Eᵢ(t): Institutional and organizational conditions. This includes scientific norms, peer-review processes, funding structures, training pathways, and established research practices. They shape which models become stabilized over time, which standards prevail, and which alternatives are considered acceptable.
E(t) emerges from the interaction of these three dimensions. Model availability and stabilization thus depend not only on approximate truth, explanatory power, or model costs but also on the methodological, technical, and institutional conditions that make formulation, implementation, and further development possible.
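The three-component decomposition of E(t) can be sketched as a small data structure. This is a minimal illustration only: the class name `EpistemicSpace`, the method `admits`, and the example sets are invented for this sketch and are not part of the paper's formalism.

```python
# Illustrative sketch of E(t) as the interaction of E_m(t), E_t(t), E_i(t).
# All names and example data are hypothetical.
from dataclasses import dataclass, field

@dataclass
class EpistemicSpace:
    """A toy representation of the epistemic opportunity space E(t)."""
    methodological: set = field(default_factory=set)  # E_m(t): techniques, permissible approximations
    technical: set = field(default_factory=set)       # E_t(t): compute, data, software
    institutional: set = field(default_factory=set)   # E_i(t): norms, funding, review practices

    def admits(self, req: dict) -> bool:
        """A model is formulable and stabilizable only if all three
        dimensions supply what it requires (subset checks)."""
        return (req.get("methods", set()) <= self.methodological
                and req.get("tech", set()) <= self.technical
                and req.get("norms", set()) <= self.institutional)

# Hypothetical example: a global circulation model before supercomputing exists.
E_1950 = EpistemicSpace({"analytic PDEs"}, {"hand computation"}, {"journal review"})
gcm = {"methods": {"analytic PDEs"}, "tech": {"supercomputing"}, "norms": {"journal review"}}
E_1950.admits(gcm)  # False: the technical dimension does not yet supply supercomputing
```

The subset checks make the paper's point operational: a change in any single component of E(t) can flip a model from unformulable to available.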
The choice between two models M and M* can be reconstructed using a time- and domain-specific utility function:
U(M, D, t) = α · AT(M, D, t) + β · EK(M, D, t) − γ · C(M, D, t).
This utility structure is compatible with established procedures of model selection. Information criteria such as AIC and BIC operationalize specific forms of C(M) and EK(M, D), while Bayes factors and Bayesian Model Averaging represent probabilistic variants of approximate truth and explanatory power.
The proposed framework extends these approaches by embedding them explicitly in the epistemic opportunity space E(t) and by systematically incorporating institutional, technical, and economic model costs. Model choice is thus reconstructed not merely statistically but epistemically, grounded in real scientific practice.
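The link to information criteria can be made concrete with the standard formulas AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. The numerical values below are invented for illustration; the reading of the penalty term as a simple complexity cost C(M) is an interpretive gloss in the spirit of the paper, not a claim from it.

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood

# Read through the paper's lens: the -2 ln L term tracks (lack of) fit within
# a domain D, while the penalty term acts as a simple complexity cost C(M).
simple = aic(log_likelihood=-120.0, k=3)    # 246.0
complex_ = aic(log_likelihood=-118.5, k=9)  # 255.0
simple < complex_  # True: the small fit gain does not offset the added cost
```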
A global assessment can be formulated using an aggregated utility function U(M) that weights U(M, D) across relevant domains.
Where:
• AT(M, D, t): domain-specific approximate truth,
• EK(M, D, t): explanatory power within domain D,
• C(M, D, t): model costs (data requirements, computational effort, institutional infrastructure),
• α, β, γ: context-dependent weighting factors that are institutionally and methodologically encoded in E(t).
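A minimal numerical sketch of this utility structure, assuming all components are already scaled to [0, 1]. The weights, domain labels, and the schematic Newtonian example values are illustrative placeholders, not estimates from the paper.

```python
def utility(at: float, ek: float, cost: float,
            alpha: float = 0.5, beta: float = 0.3, gamma: float = 0.2) -> float:
    """U(M, D, t) = alpha*AT + beta*EK - gamma*C, inputs scaled to [0, 1].
    The weights here are placeholders; the paper treats them as encoded in E(t)."""
    return alpha * at + beta * ek - gamma * cost

def aggregate_utility(per_domain: dict, domain_weights: dict) -> float:
    """Global assessment U(M): weighted sum of U(M, D) over relevant domains."""
    return sum(domain_weights[d] * per_domain[d] for d in per_domain)

# Newtonian mechanics, schematically: excellent in an everyday domain D1,
# contextually falsified (low AT) in a relativistic domain D2, cheap in both.
U_newton = {"D1": utility(0.95, 0.9, 0.1), "D2": utility(0.2, 0.4, 0.1)}
U_global = aggregate_utility(U_newton, {"D1": 0.8, "D2": 0.2})
```

The per-domain dictionary makes the paper's core distinction visible: a low U(M, D₂) coexists with a high U(M, D₁), so the aggregate remains substantial.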
This utility function unites classical ideas from philosophy of science (e.g., explanatory power in Hempel, truthlikeness in Niiniluoto) with modern decision-theoretic evaluation structures, making epistemic rationality quantitatively reconstructable.
Because many terms of the utility function can be approximated using empirical proxy metrics or established quantitative procedures, the proposed decision logic can often be applied to real model portfolios and empirically examined.
The function is an ideal-typical framework showing how widely used fit metrics, information criteria, complexity measures, and institutional constraints can be integrated in one evaluation structure.
The operationalization of individual components—especially explanatory power and institutional model costs—is context-dependent, but in modern scientific practice usually feasible.
Contextual falsification reduces AT(M, D₂), which lowers U(M, D₂), without affecting AT(M, D₁) or U(M, D₁). Thus, domain D becomes an explicit variable epistemic unit shaped by the opportunity space E(t). Falsification is therefore not a global elimination operator but a condition for model adjustment:
• falsification in D₂ → U(M, D₂) decreases,
• U(M, D₁) remains high,
• M continues to possess epistemic value in D₁.
Global falsification occurs exactly when, for all relevant domains D:
U(M, D) < U(M*, D).
Models disappear not because they are “false” but because their relative epistemic performance is lower in every domain.
The utility structure explains why model families remain robust: if U(M₁, D₂) drops, models M₂ or M₃ within the same family may still have higher U-values. Falsification becomes a mechanism of internal reallocation of epistemic weights, not Popperian elimination.
To make this utility structure not only theoretical but empirically usable, U(M, D) can be refined via a standardized scaling and operationalization scheme integrating the established quality measures and error metrics of each discipline.
The linear form is a pragmatic and compatible approximation capturing the primary evaluation dimensions: approximate truth, explanatory power, and model costs. But the structure is not limited to linearity: nonlinear or interaction-based evaluation forms may be appropriate, for example when explanatory power matters only above a minimum level of approximate truth, or when model costs involve domain-specific threshold effects.
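One such nonlinear variant, in which explanatory power counts only above a minimum level of approximate truth, might be sketched as follows. The threshold `at_min` and the weights are illustrative assumptions, not parameters fixed by the paper.

```python
def utility_threshold(at: float, ek: float, cost: float,
                      alpha: float = 0.5, beta: float = 0.3,
                      gamma: float = 0.2, at_min: float = 0.3) -> float:
    """Nonlinear variant of U(M, D): the EK term contributes only once
    approximate truth clears an illustrative minimum level at_min."""
    ek_effective = ek if at >= at_min else 0.0
    return alpha * at + beta * ek_effective - gamma * cost

utility_threshold(0.2, 0.9, 0.1)  # below at_min: only the AT and cost terms count
```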
Model costs C(M) may contain both global and domain-specific components. While base costs of implementation or infrastructure are global, computational and data-related costs vary between domains. In such cases C(M, D) can replace C(M) without changing the fundamental structure.
What is crucial is that the utility function, regardless of its form, reconstructs epistemic optimality by showing how approximate truth, explanatory power, and costs are weighed within the opportunity space E(t).
To ensure that the utility function U(M, D) is not only theoretically precise but directly usable, it can be operationalized using established evaluation methods of each discipline.
Calibration is relative: within a domain D, the model with the best empirical performance receives value 1, the minimally acceptable model receives 0. All other models are positioned between these values via linear or functional normalization. Thus, a continuous scale emerges without requiring a physical measurement medium.
Differences between models span the evaluation space that the utility function maps. No new data are required: the approach uses existing error metrics, quality measures, likelihood structures, and complexity indicators.
The epistemic opportunity space E(t) is thereby operationalized through available empirical information. Models are not evaluated in a vacuum but via the existing data and methodological standards of a discipline. The utility function U(M, D) becomes a standardized projection of this information, enabling systematic comparison and model selection across domains.
The utility function has both descriptive and weakly normative aspects:
• Descriptively, it captures how model choice actually works in data- and simulation-intensive sciences by balancing approximate truth, explanatory power, and costs.
• Normatively, it clarifies under which conditions such choices count as epistemically rational—without imposing external norms.
Science is not externally regulated; rather, the framework renders explicit the evaluative dimensions already operative in practice.
The integrated theory occupies a position between structural realism and instrumental model pluralism. In contemporary philosophy of science, this middle position has become increasingly relevant because many modern sciences can operate neither as fully realist nor as fully instrumentalist.
Models are partially true insofar as they approximate real structures (approximate truth as the degree of structural similarity). This aligns with structural realism, since truth is not attributed to entire models but to stable relational structures that persist within specific domains.
However, their stability also depends on practical factors:
• costs,
• institutional stabilization,
• availability of alternative models,
• evolution of the opportunity space E(t).
Thus, the framework extends structural realism by showing that approximate truth is not the sole determinant of rational scientific practice. At the same time, it extends instrumental model pluralism by formalizing when models remain epistemically optimal despite partial falsification.
Within current realism debates, the approach can be positioned precisely. It is close to structural realism (e.g., Worrall 1989; Psillos 1999) because approximate truth is interpreted as an approximation to stable relational structures. It takes a broader perspective than van Fraassen’s constructive empiricism (1980), since it shows that operative scientific rationality depends not only on empirical adequacy but also on institutional, technical, and explanatory factors represented in the opportunity space E(t).
In contrast to da Costa and French’s concept of partial truth (2003), the present framework embeds approximate truth directly into a utility function U(M, D) that also includes explanatory power and model costs. Scientific rationality thus becomes a decision problem situated within a dynamic opportunity space.
Because of this integrative structure, the approach positions itself between structural realism and instrumental pluralism while combining insights from both into a unified framework that directly reflects the empirical modelling practices of modern sciences.
This structure is especially relevant in data-intensive and simulation-based sciences such as climate research, economics, epidemiology, and AI modelling. The framework offers a precise way to analyse model stability and model replacement across these fields.
Falsification is not a binary elimination mechanism but a domain-specific tool for refining scientific models. The theory developed here unifies core insights from Popper, Kuhn, Lakatos, da Costa/French and contemporary model theory within a formally explicit, integrated framework.
By distinguishing between global and contextual falsification, it becomes clear that empirical discrepancies primarily lead to domain restriction rather than complete abandonment of a model.
Integrating approximate truth, explanatory power, and model costs into a domain-specific utility function shows that model stability results from relative epistemic optimality, not merely from truthlikeness. Models disappear only when they are no longer epistemically competitive in any domain and when the epistemic opportunity space E(t) no longer provides stabilized contexts of use.
The concept of the epistemic opportunity space explains why technological, institutional, and methodological changes can have deep effects on scientific model landscapes. Model choice is thereby reconstructed as a dynamic process in which scientific rationality is not expressed through a single logical operation but through a structured interplay of context, approximation, and institutional stabilization.
The proposed theory thus offers a coherent framework for analysing modern modelling practices, particularly in data-intensive and simulation-based sciences. It clarifies how models remain epistemically stable despite partial falsification and formulates the precise conditions under which global elimination is rational. Future work could empirically test the utility function, further formalize the opportunity space E(t), and apply the framework to additional scientific model architectures.
Cartwright, Nancy. 1983. How the Laws of Physics Lie. Oxford: Oxford University Press.
Cartwright, Nancy. 1999. The Dappled World: A Study of the Boundaries of Science. Cambridge: Cambridge University Press.
Da Costa, Newton C. A., and Steven French. 2003. Science and Partial Truth: A Unitary Approach to Models and Scientific Reasoning. Oxford: Oxford University Press.
Einstein, Albert. 1905. “On the Electrodynamics of Moving Bodies.” Annalen der Physik 17: 891–921.
Hacking, Ian. 1983. Representing and Intervening. Cambridge: Cambridge University Press.
Hempel, Carl G. 1965. Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. New York: Free Press.
IPCC. 2021. Climate Change 2021: The Physical Science Basis. Cambridge: Cambridge University Press.
Kepler, Johannes. 1609. Astronomia Nova. Heidelberg: Gotthard Vögelin.
Kepler, Johannes. 1619. Harmonices Mundi. Linz: Johann Planck.
Kuhn, Thomas S. 1962. The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
Lakatos, Imre. 1970. “Falsification and the Methodology of Scientific Research Programmes.” In Criticism and the Growth of Knowledge, edited by Imre Lakatos and Alan Musgrave, 91–196. Cambridge: Cambridge University Press.
Maxwell, James Clerk. 1865. “A Dynamical Theory of the Electromagnetic Field.” Philosophical Transactions of the Royal Society of London 155: 459–512.
Morgan, Mary S., and Margaret Morrison. 1999. Models as Mediators: Perspectives on Natural and Social Science. Cambridge: Cambridge University Press.
Niiniluoto, Ilkka. 1987. Truthlikeness. Dordrecht: Reidel.
Niiniluoto, Ilkka. 1998. “Approximation in Science.” Poznan Studies in the Philosophy of the Sciences and the Humanities 63: 97–109.
Oddie, Graham. 1986. Likeness to Truth. Dordrecht: Reidel.
Oreskes, Naomi, Kristin Shrader-Frechette, and Kenneth Belitz. 1994. “Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences.” Science 263 (5147): 641–646.
Popper, Karl R. [1934] 1959. The Logic of Scientific Discovery. London: Hutchinson.
Psillos, Stathis. 1999. Scientific Realism: How Science Tracks Truth. London: Routledge.
Putnam, Hilary. 1978. Meaning and the Moral Sciences. London: Routledge.
van Fraassen, Bas C. 1980. The Scientific Image. Oxford: Clarendon Press.
Weisberg, Michael. 2013. Simulation and Similarity: Using Models to Understand the World. Oxford: Oxford University Press.
Worrall, John. 1989. “Structural Realism: The Best of Both Worlds?” Dialectica 43 (1–2): 99–124.