Contextual and Global Falsification of Scientific Models
An Integrated Theory of Epistemic Validity





Abstract
Classical Popperian falsification theory operates with a binary image of scientific rationality according to which a single contradictory observation is sufficient to reject a theory (Popper [1934] 1959). Modern scientific practice systematically contradicts this idealized view: central models such as Newtonian mechanics, classical thermodynamics, nonrelativistic quantum mechanics, or contemporary climate models are empirically falsified in certain subdomains yet remain epistemically indispensable.
This paper develops an integrative framework of model validity that systematically unifies key insights from Popper, Kuhn, Lakatos, da Costa/French, and contemporary model theory into a single model of rational scientific practice. The theory distinguishes global and contextual falsification, introduces the concept of an epistemic enabling space E(t), and provides a formally specified structure for domain operations. The epistemic enabling space forms the central theoretical contribution of the paper, as it makes the methodological, technical, and institutional conditions of model choice explicit.
“Falsification” is not treated as a purely logical truth criterion but as a rational–pragmatic mechanism of model assessment that integrates approximate truth, explanatory power, and model costs. This allows us to explain why models remain stable despite partial falsifications and under which conditions genuine model elimination occurs. In addition, a decision-theoretic logic of epistemic optimality is formalized that combines approximate truth within specific domains, explanatory power, and model costs into a unified utility function. A case study on climate models demonstrates how ensemble methods, parameterizations, and Bayesian updating lead to systematic domain refinement.
The proposed theory is primarily descriptive: it reconstructs actual model practice in modern sciences and provides an explicitly formulated and systematically integrated account of model structures that classical Popperian views consider only partially. The epistemic status of idealized, simulation-based, and domain-specific models thereby becomes precise and scientifically tractable.





This work is registered with the U.S. Copyright Office.
Registration Number: pending. Year of Registration: 2025.

Scientific-Theoretical Paper, 28 November 2025
ORCID: https://orcid.org/0009-0004-0847-9164

DOI: 10.5281/zenodo.17714967
© 2025 Stefan Rapp
Creative Commons Attribution–NonCommercial–NoDerivatives 4.0 International

Table of Contents

Abstract

  1. Introduction

  2. State of Research
    2.1 Popper: Falsification as a Binary Ideal
    2.2 Kuhn: Anomalies Without Elimination
    2.3 Lakatos: Research Programmes
    2.4 Model Theory
    2.5 Approximate Truth
    2.6 Research Gap
    2.7 Contribution of This Paper

  3. Basic Concepts

  4. Criteria of Modern Model Assessment Within the Proposed Framework

  5. Proposition: Two-Level Structure of Falsification

  6. When Models Disappear

  7. Why Models Persist
    7.1 Robustness Through Model Families
    7.2 Contexts of Use

  8. Case Study: Climate Models
    8.1 Climate Models as Ensemble Structures
    8.2 Parameterization as an Epistemic Operation
    8.3 Bayesian Updating and Adaptive Modelling
    8.4 Domain Structure
    8.5 Miniaturized Formal Illustration
    8.6 Consequences for Epistemic Status

  9. Approximate Truth

  10. Why an Integrated Theory Has Been Missing
    10.1 Historical Reasons
    10.2 Logical Reasons
    10.3 Institutional Reasons

  11. Epistemic Enabling Space
    11.1 Dynamics of the Epistemic Enabling Space
    11.2 Structure of the Epistemic Enabling Space E(t)

  12. Decision Logic of Model Choice
    12.1 Structure of the Utility Function
    12.2 Scaling and Empirical Grounding of the Utility Function

  13. Relation to Scientific Realism

  14. Conclusion
    References

1. Introduction

Popper’s falsification theory ([1934] 1959) shaped the public and academic image of scientific rationality for decades. It postulates that theories must be rejected as soon as one of their predictions fails empirically. The practice of the natural and social sciences, however, presents a different picture: many models are idealized, only partially representative, empirically falsified in subdomains, and yet epistemically stable. This paper develops an integrated theory of scientific model validity that unifies central insights from Popper, Kuhn, Lakatos, da Costa/French, and contemporary model theory into a common structure of rational model assessment.
Examples include Newtonian mechanics, which fails at relativistic velocities (Einstein 1905), classical thermodynamics, which is microscopically imprecise, or climate models, which require continuous recalibration (Oreskes et al. 1994). At the same time, some models have disappeared entirely, such as the phlogiston theory or the classical aether.
The core question is: under what conditions does falsification lead to the elimination of a model, and when does a model remain epistemically stable despite partial falsification?
This paper develops an integrated theoretical framework that explicitly formalizes this question within a unified model. The approach distinguishes global and contextual falsification, describes the epistemic enabling space E(t), and integrates approximate truth as a graded, domain-specific assessment measure. Furthermore, it shows that falsification is not a purely logical truth operation but a rationally reconstructible decision within a real epistemic enabling space. The theory is primarily descriptive: it reconstructs actual model practice in modern sciences and is illustrated using climate models.
The utility function U(M, D) introduced below is to be understood primarily as reconstructive: it describes how model choice in real scientific practice occurs when approximate truth, explanatory power, and model costs are weighed against each other. At the same time, the structure has a normative reading, as it makes the conditions of rational model assessment explicit. The theory thus serves both as a descriptive reconstruction of model practice and as a framework of rational epistemic optimality.





2. State of Research

2.1 Popper: Falsification as a Binary Ideal

Popper ([1934] 1959) defines falsification as a binary mechanism: a single contradictory observation refutes a theory. This model presupposes global claims of validity, unambiguous theory–observation mappings, and non-idealized theories—conditions that are rarely met in modern sciences. Idealized, approximate, and simulation-based models, in particular, do not fit this binary structure. Although the tension between Popper’s ideal and actual model practice has been widely discussed in the literature (Hacking 1983; Cartwright 1983), it has never been translated into a systematic theory of contextual falsification.

2.2 Kuhn: Anomalies Without Elimination

Kuhn (1962) shows that scientific paradigms can survive anomalies. However, he does not provide a finely structured theory of model-specific falsification. The main gap is that Kuhn does not specify how anomalies distribute across subdomains or why they undermine some models but leave others essentially unaffected. While the literature emphasizes Kuhn’s historical and sociological perspective, a formally precise model theory of subdomain-structured falsification is missing (cf. Lakatos 1970; Weisberg 2013).

2.3 Lakatos: Research Programmes

Lakatos (1970) explains stability through a “hard core”, but without determining when models actually disappear. It also remains unclear how competing programmes should be compared in detail, especially when both are partially but not globally falsified. Lakatos’s programme structure thus provides no formal instrument for assessing domain-specific approximate truth.

2.4 Model Theory

Model-theoretic approaches (Cartwright 1983; Weisberg 2013; Morgan & Morrison 1999) emphasize the idealized and domain-specific nature of scientific models but do not provide a theory of how falsification operates within such structures or how models are systematically compared. This concerns especially the handling of idealized simulations, for which neither Popper nor classical model theory offers a suitable falsification criterion (Morgan & Morrison 1999).

2.5 Approximate Truth

Niiniluoto (1987, 1998) and Oddie (1986) offer graded concepts of truth but do not explain why some models persist despite partial falsifications, while others disappear entirely. What is missing in particular is the link between approximate truth, contexts of use, model costs, and institutional stabilization. Without this, it remains unclear how graded truthlikeness is supposed to operate within real model portfolios.



2.6 Research Gap

To the best of current knowledge, there is no systematically developed and academically established integrated theory that simultaneously explains:
• when falsification operates globally,
• when it remains contextual,
• how approximate truth functions within domains,
• how model costs and explanatory power are rationally weighted,
• how the epistemic enabling space E(t) determines the set of real model alternatives.
Individual aspects of these issues have been treated separately in the literature—for example in work on approximate truth, on research programmes, or in model theory. What is largely missing is an explicit, formally structured reconstruction that unifies these dimensions within a single framework of model assessment. This is particularly relevant for data-intensive, simulation-based sciences in which contextual falsification and model portfolios have become standard practice.

2.7 Contribution of This Paper

This paper makes four interconnected contributions to the philosophy of science and model theory. It presents an integrative framework that consolidates established insights and translates them into an explicitly formalized structure.

  1. The concept of the epistemic enabling space E(t).
    E(t) is explicated as a dynamic structure of methodological, technical, and institutional conditions that determines which models can be formulated, evaluated, and stabilized. This models a dimension that classical Popperian falsification either leaves implicit or does not consider at all.

  2. The two-level structure of falsification.
    The paper draws a strict distinction between contextual and global falsification. Global falsification is no longer understood as a binary truth break, but as a condition in which a model is no longer epistemically competitive in any domain D within the epistemic enabling space E(t). This structure systematically reconstructs the stability of partially falsified models.

  3. The integrated utility function U(M, D, t).
    With the utility function
    U(M, D, t) = α · AT(M, D, t) + β · EK(M, D, t) − γ · C(M, D, t),
    the paper proposes a framework that unifies approximate truth, explanatory power, and model costs into a single assessment structure explicitly anchored in E(t). The function is formulated to remain compatible with established quality measures and selection criteria (fit metrics, information criteria, complexity measures) in individual disciplines.

  4. Application to simulation-based model portfolios.
    Using climate models as an example, the paper shows how ensemble structures, parameterizations, and Bayesian updating can be reconstructed within the proposed framework as cases of contextual falsification and domain-specific approximate truth. This aligns the theory directly with a central class of modern, data- and simulation-intensive model practices.

Overall, the contribution of this paper lies in providing an explicitly formulated, formally structured, and empirically applicable framework that systematically integrates classical insights from falsification theory, model theory, and truthlikeness concepts, while offering a clearly codified structure for evaluating and understanding the dynamics of scientific models.





3. Basic Concepts

The following definitions clarify the terms introduced above.



Notation (overview)
D(M) – domain of applicability of a model
D₁, D₂ – subdomains of the domain of applicability
AT(M, D) – approximate truth of the model in domain D
EK(M, D) – explanatory power
C(M) – model costs
U(M, D) – domain-specific utility function
E(t) – epistemic enabling space
R(M, D₂) – restriction of the model to subdomain D₂



Definition 1: Model (M)
A model is an idealized mathematical or conceptual structure for representing defined aspects of a target system under specific conditions (Cartwright 1999).



Definition 2: Domain of Applicability D(M)
The set of all conditions under which a model yields reliable, explanatorily adequate, or functionally optimal results.



Definition 3: Subdomains D₁, D₂
D(M) can be decomposed into subdomains: D₁ ⊆ D(M), D₂ ⊆ D(M), with D₁ ∪ D₂ = D(M). Such decompositions often result from scientific revision.
For falsification, we additionally stipulate:
• D₁ denotes the subdomain in which M remains epistemically viable;
• D₂ denotes the subdomain in which M is no longer acceptable.





Definition 4: Epistemic Optimality
A model is epistemically optimal in a domain D if, given all alternatives available within the enabling space E(t), it exhibits a favourable balance of approximate truth, explanatory power, and model costs. Epistemic optimality is operationalized by the domain-specific utility function U(M, D).
Model costs C(M, D) comprise three dimensions:
(1) epistemic costs (e.g., error sensitivity, variance, uncertainty breadth of predictions, interpretability/transparency, quality of fit),
(2) technical costs (computational effort, data requirements, implementation complexity),
(3) institutional costs (degree of standardization, availability of infrastructure).
Some cost components are global (e.g., basic implementation or infrastructure costs), others domain-specific (e.g., data requirements or numerical stability in particular domains). For the utility function, these are captured jointly in C(M, D, t).



Definition 5: Contextual Falsification
An observation O results in contextual falsification if it shows that there exists a subdomain D₂ in which M is no longer epistemically acceptable, while at least one subdomain D₁ remains in which M is still optimal.



Definition 6: Global Falsification
A model M is globally falsified at time t in scientific development if it is epistemically optimal or competitive in no relevant domain D within the epistemic enabling space E(t).
Epistemic optimality holds in a domain D when U(M, D, t) is greater than or equal to the utility value of all alternative models available in E(t).
Epistemic competitiveness holds when U(M, D, t) is at most within a context-dependent tolerance range ε below the utility value of the best available alternative U(M*, D, t).
Formally, global falsification can be characterized as follows: for all relevant domains D, there exists at least one alternative model M* such that
U(M, D, t) + ε < U(M*, D, t).
The tolerance parameter ε may be set in a discipline- and context-dependent manner and reflects the fact that models with slightly lower U-values may remain competitive in practice.
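
The ε-condition of Definition 6 can be rendered operational in a few lines. The following minimal sketch (in Python) assumes that utility values U(M, D, t) have already been computed for a fixed t and stored in a lookup table; the table layout and function name are illustrative conventions, not part of the theory.

```python
# Minimal sketch of Definition 6 (illustrative data layout, fixed time t).
# U maps (model, domain) pairs to precomputed utility values U(M, D, t).

def globally_falsified(model, alternatives, domains, U, eps):
    """True iff in every relevant domain some alternative M* satisfies
    U(M, D, t) + eps < U(M*, D, t), i.e. M is nowhere competitive."""
    return all(
        any(U[(model, d)] + eps < U[(alt, d)] for alt in alternatives)
        for d in domains
    )
```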



Definition 7: Epistemic Enabling Space E(t)
The space of methodological, mathematical, institutional, and technical conditions that determines:
• which models can be formulated,
• which data are available,
• which idealizations are permissible,
• which models can be stabilized,
• how model costs C(M) are structured in the first place.
E(t) is dynamic: technological innovations, data availability, and institutional norms continuously reshape the space of possible models.



Definition 8: Approximate Truth
Approximate truth is the degree of a model’s similarity to relevant features of a target system relative to a domain D:
AT(M, D).
Formally, approximate truth can be expressed using a similarity metric:
AT(M, D) = Σᵢ wᵢ · sim(M, Sᵢ, D),
where the Sᵢ are relevant system states and the wᵢ are relevance weights (both are specified further in Section 9).
Here, approximate truth is not a truth criterion in the Popperian sense but a graded similarity metric within specific domains.
In practical model assessment, approximate truth is always considered together with uncertainty estimates of model predictions. High predictive uncertainty reduces the effective epistemic value of a model even when mean fit is high, and is therefore reflected both in AT(M, D) and in the epistemic cost component of C(M).



Definition 9: Domain Operators
• Restriction: R(M, D₂) = M|D₂
• Domain difference: D(M) − D₂ = D₁
• Decomposition: Z(M) = {D₁, D₂, …}
These operators describe model-theoretic modifications of the domain of applicability.
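
Read set-theoretically, the domain operators admit a very small illustration. The sketch below treats domains as finite sets of application conditions and a restriction as a (model, subdomain) pair; this is a deliberate simplification for illustration, not a full model-theoretic construction.

```python
# Illustrative reading of the domain operators (Definition 9).
D_M = {"low_velocity", "macroscopic", "weak_gravity", "relativistic"}
D2 = {"relativistic"}          # subdomain where the model fails

D1 = D_M - D2                  # domain difference: D(M) − D₂ = D₁

def restrict(model, subdomain):
    # Restriction R(M, D₂) = M|D₂, represented here as a simple pair
    return (model, frozenset(subdomain))

Z = [D1, D2]                   # decomposition Z(M) = {D₁, D₂}
assert D1 | D2 == D_M and D1.isdisjoint(D2)
```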



On Temporal Dynamics
Since the epistemic enabling space E(t) changes over the course of scientific development, the assessment values AT(M, D), EK(M, D), and the costs C(M) are, in principle, time-dependent. This is denoted explicitly by a time index t where the dynamics of E(t) are foregrounded:
U(M, D, t) = α · AT(M, D, t) + β · EK(M, D, t) − γ · C(M, D, t).
Where it would hinder readability, the index t is omitted; it remains conceptually implicit throughout.











4. Criteria of Modern Model Assessment Within the Proposed Framework

Modern scientific model assessment comprises a clearly identifiable set of epistemic, technical, and institutional criteria that jointly determine whether a model within a domain D is retained, recalibrated, or abandoned. The framework developed here makes these criteria explicit and systematically assigns them to the components of the utility function U(M, D, t) and the epistemic enabling space E(t). The following catalogue aims to cover, within the scope of this framework, the epistemically relevant assessment criteria as comprehensively as possible, without excluding the possibility that individual disciplines may develop additional, more fine-grained subcriteria.



(1) Approximation and structural similarity
This includes domain-specific approximate truth AT(M, D), understood as graded similarity between model structures and relevant system states. It comprises quality of fit, structural similarity, and stability of predictions across subdomains.



(2) Explanatory power (EK)
A model’s explanatory power determines the breadth, depth, and counterfactual sensitivity of its explanations. It covers the range of phenomena accounted for as well as the model’s capacity to explain new or derived facts.



(3) Model uncertainty
Model uncertainty includes variance, sensitivity, and stability of predictions within a domain. Large uncertainty intervals reduce a model’s effective epistemic value, even when average fit is high. Uncertainty affects both AT(M, D) and the epistemic component of C(M).



(4) Interpretability and transparency
Models differ in structural accessibility, the intelligibility of their mechanisms, testability, and the diagnosability of error sources. Low interpretability increases epistemic costs and reduces explanatory usefulness.



(5) Technical and computational costs
These include computational effort, energy consumption, numerical complexity, data requirements, and implementability. Technical costs form a central component of C(M) and significantly influence the rationality of model choice in data-intensive sciences.



(6) Institutional and infrastructural conditions
Models are stabilized by norms, software ecosystems, training structures, data standards, and organizational practices. These factors reside within the epistemic enabling space E(t) and explain why models often persist even when they become suboptimal in particular domains.



(7) Availability of epistemic alternatives
The set of alternative models available in the enabling space E(t) determines whether contextual falsification leads to model revision or global elimination. Models persist as long as they remain epistemically optimal in at least one domain.

This catalogue clarifies which factors carry operational weight in the utility function U(M, D) and in the enabling space E(t). At the same time, it provides a comprehensive overview of the assessment dimensions that shape modern scientific model practice. The integrated framework thereby becomes both theoretically comprehensive and directly applicable in empirical practice.





5. Proposition: Two-Level Structure of Falsification

In the terminology proposed here, “falsification” is no longer a binary truth decision about M as a whole, but a systematically reconstructible shift in the utility values U(M, D, t) across domains, which under specific conditions leads to the complete abandonment of M.

For any model M, the following holds:

  1. As long as there exists a subdomain D₁ in which M is epistemically optimal according to U(M, D₁), falsification leads to a restriction of the domain of applicability:
    D(M) → D(M)' = D₁.

  2. M is globally falsified if and only if there is no domain in which M is epistemically optimal or competitive according to U(M, D).

Approximate truth AT(M, D), explanatory power EK(M, D), and model costs C(M) determine the assessment of epistemic optimality; E(t) determines the set of admissible alternatives.
Formally, the transition from contextual to global falsification can be expressed as the transition
from
∃D₁: U(M, D₁, t) + ε ≥ U(M*, D₁, t)
to
∀D: U(M, D, t) + ε < U(M*, D, t),
where ε denotes the context-dependent tolerance range of epistemic competitiveness introduced in Definition 6; the two conditions are exact negations of each other.

This two-level structure formalizes a distinction that is discussed in the literature but seldom explicitly modeled. It precisely reconstructs how models may remain viable in some subdomains while failing in others, thereby complementing earlier approaches by Popper and Lakatos with a clearly structured account of domain-specific model stability. The two-level structure of falsification explicitly captures the common scientific situation in which models are recalibrated within certain subdomains while retaining their overarching epistemic role.
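
Procedurally, the two-level structure can be summarized as a single classification step. The sketch below, under the same assumptions as the formulas above (precomputed U-values, a per-domain best alternative, a context-dependent ε), returns either the restricted domain D₁ or a verdict of global falsification; all names are illustrative.

```python
# Sketch of the two-level structure of falsification (illustrative names).
# U maps (model, domain) -> utility; best_alt maps domain -> best alternative M*.

def assess(model, best_alt, domains, U, eps):
    # D₁: subdomains where M remains optimal or within the tolerance eps
    viable = [d for d in domains
              if U[(model, d)] + eps >= U[(best_alt[d], d)]]
    if viable:
        return ("contextual", viable)   # restrict D(M) to D₁ = viable
    return ("global", [])               # nowhere competitive: eliminate M
```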





6. When Models Disappear



A model disappears when:

  1. an alternative model is epistemically superior in all relevant domains (U(M*, D) > U(M, D) for all D),

  2. the model costs of M are higher than those of the available alternatives,

  3. E(t) no longer contains stabilized contexts of use for the model.

Formally, global falsification can therefore be defined as the condition that for all relevant domains D:
U(M, D) + ε < U(M*, D),
with ε as the tolerance range introduced in Definition 6.

Examples include:
• Aether → Maxwell (1865) and Einstein (1905),
• Phlogiston → modern chemistry,
• Epicycles → Kepler (1609, 1619) and Newton,
• Four-humors theory → modern medicine.

These historical model shifts are often interpreted in philosophy of science as paradigm changes or research programme transformations (Kuhn 1962; Lakatos 1970), but the utility structure proposed here provides a more precise formal reconstruction.

Technological, mathematical, and institutional innovations reshape E(t) and accelerate global falsification. Thus E(t) determines not only which models disappear but also which ones can be considered realistic alternatives in the first place.



7. Why Models Persist



7.1 Robustness Through Model Families

Many models do not exist as single structures but as entire model families. Falsification therefore usually targets variants or specific parameterizations rather than the entire class. Model families possess structural redundancies that allow errors in some submodels to be compensated without abandoning the overall approach. This explains why falsification often results merely in a shift within the family rather than the elimination of the class. This observation aligns with model-theoretic literature, in which model families are described as structured spaces of possible variants (Weisberg 2013).
The domain structure introduced here refines this insight by showing how stability and variation within a model class depend on each other in a formally precise way.

7.2 Contexts of Use

Models persist because they are:
• didactically useful,
• computationally efficient,
• technically standardized.

Contextual falsification corrects domains but does not eliminate the model.
The persistence of a model in D₁ despite falsification in D₂ follows from the utility structure U(M, D₁) and from the stabilizing institutional and technical contexts within the epistemic enabling space E(t).

The existence of stable contexts of use is a function of E(t): institutions, data formats, software libraries, and training structures stabilize models independently of their global approximate truth. This explains why models with low AT(M, D₂) but high AT(M, D₁) remain rational to use.
This corresponds to scientific practice in many disciplines, where models are treated as tool-like building blocks whose validity is context-dependent rather than globally assessed.





8. Case Study: Climate Models



8.1 Climate Models as Ensemble Structures

Climate models are not single models but complex ensembles composed of various model variants and scenarios (Oreskes et al. 1994):
E = {M₁, M₂, …, Mₙ},
where the ensemble symbol E is to be distinguished from the enabling space E(t).
This ensemble combines different physical assumptions, parameterizations, and initial conditions.
Falsification therefore rarely affects the ensemble as a whole but typically targets specific components:
• ocean models,
• atmospheric parameterizations,
• cloud processes,
• biogeochemical modules.
The ensemble as a whole remains epistemically stable even when individual components are modified or replaced. This corresponds to established practice in climate science, where ensembles explicitly function as mechanisms of epistemic stability (see Oreskes et al. 1994; IPCC methodology). The domain structure of the ensemble is therefore more stable than that of individual models.



8.2 Parameterization as an Epistemic Operation


Many climate-relevant processes cannot be computed from fundamental physical equations at full resolution. Parameterizations function as epistemic bridges between physical theory and numerical feasibility.
Falsification typically marks the limits of such parameterizations, leading to local domain adjustments:
AT(M, D₂) ↓ → R(M, D₂).

Parameterizations are therefore not merely approximation techniques but operational domain definitions, and their adjustment is a paradigmatic case of contextual falsification: individual domains are revised while the overarching modelling framework remains intact.



8.3 Bayesian Updating and Adaptive Modelling


Modern climate models integrate Bayesian updating to incorporate new data systematically:
Posterior ∝ Likelihood × Prior.
A failed prediction reduces the likelihood of a submodel and thus its domain-specific approximate truth AT(M, D₂). But the ensemble absorbs this reduction by assigning higher likelihoods to alternative parameterizations or submodels.

This shows that falsification acts not eliminatively but redistributively: it shifts epistemic weights within the ensemble. Formally, this corresponds to weight shifts in the posterior-based model portfolio, where submodels with higher likelihoods contribute more strongly to the ensemble outcome.
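
The redistributive reading corresponds to an ordinary Bayesian model-weight update. The toy computation below, with two hypothetical parameterizations and invented likelihoods, shows how a failed prediction shifts posterior weight within the ensemble without eliminating either submodel.

```python
# Toy Bayesian update of ensemble weights (all numbers hypothetical).
priors = {"param_A": 0.5, "param_B": 0.5}        # prior submodel weights
likelihoods = {"param_A": 0.1, "param_B": 0.6}   # A's recent prediction failed

evidence = sum(priors[m] * likelihoods[m] for m in priors)
posteriors = {m: priors[m] * likelihoods[m] / evidence for m in priors}
# posteriors ≈ {"param_A": 0.14, "param_B": 0.86}:
# epistemic weight is redistributed, not eliminated.
```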



8.4 Domain Structure

The functional components of climate models can be structured as follows:
• D₁: global temperature trends,
• D₂: regional precipitation patterns,
• D₃: extreme events,
• D₄: short-term climate variability (ENSO, AMO).

This clear domain structure shows that falsification acts primarily in D₂–D₄, while D₁ functions as a global stability anchor of the ensemble model. D₁ has remained stable for decades, which is why the ensemble is globally epistemically optimal.



8.5 Miniaturized Formal Illustration

A simplified example illustrates this:
Let two models M and M* be available.

AT(M, D₁) = 0.9, EK(M, D₁) = 0.8
AT(M*, D₁) = 0.7, EK(M*, D₁) = 0.8

AT(M, D₂) = 0.4, EK(M, D₂) = 0.5
AT(M*, D₂) = 0.8, EK(M*, D₂) = 0.9

For moderate costs C(M) ≈ C(M*) we have:
• In D₁, M remains epistemically optimal.
• In D₂, M* becomes optimal.
• Globally, both models remain part of the ensemble.

This illustrates the principle:
Contextual falsification reduces AT(M, D₂) without affecting AT(M, D₁) or global epistemic optimality.
This structure is characteristic of many simulation-based sciences, not only climate research.
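
Running these numbers through the utility function makes the verdicts explicit. The sketch below assumes equal weights α = β = γ = 1 and a common cost value C = 0.3 for both models; both assumptions are made only for this illustration.

```python
# U(M, D) = α·AT + β·EK − γ·C for the illustration above
# (assumed weights α = β = γ = 1 and common costs C = 0.3).
AT = {("M", "D1"): 0.9, ("M*", "D1"): 0.7, ("M", "D2"): 0.4, ("M*", "D2"): 0.8}
EK = {("M", "D1"): 0.8, ("M*", "D1"): 0.8, ("M", "D2"): 0.5, ("M*", "D2"): 0.9}
C = 0.3

for d in ("D1", "D2"):
    for m in ("M", "M*"):
        print(f"U({m}, {d}) = {AT[(m, d)] + EK[(m, d)] - C:.1f}")
# U(M, D1) = 1.4 > U(M*, D1) = 1.2  -> M stays optimal in D1
# U(M, D2) = 0.6 < U(M*, D2) = 1.4  -> M* takes over D2
```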



8.6 Consequences for Epistemic Status

Climate models are often judged using a Popperian schema:
“A wrong prediction shows that the model is false.”
This is scientifically incorrect (Oreskes et al. 1994). Correctly understood: falsification acts domain-specifically and indicates which subcomponents require further development.

This demonstrates that Popper’s eliminative logic is inadequate for simulation-based model architectures and must be replaced by a domain-specific utility structure.

The climate-model case study is representative of modern, data- and simulation-intensive model architectures. The structure of the analysis is general: the theory developed here applies without additional assumptions to classical physical models, economic models, epidemiological models, or AI model architectures, because it relies solely on the formal conditions of domain-specific approximate truth, model costs, and the decision logic U(M, D).





9. Approximate Truth

Approximate truth explains why models can generate scientific progress despite errors. The concept, shaped especially by Niiniluoto (1987, 1998) and Oddie (1986), has rarely been systematically linked to model use, domain structure, and model costs.

Formally, approximate truth can be expressed using a similarity metric:
AT(M, D) = Σᵢ wᵢ · sim(M, Sᵢ, D),
where sim(M, Sᵢ, D) measures similarity between model predictions and observed system states within domain D, and wᵢ represent relevance weights.
In this framework, approximate truth is neither a Popperian truth criterion nor a simple fit metric but a domain-specific measure of structural similarity that plays a central role in model selection.

Key points:

  1. Approximate truth is always domain-specific.

  2. Falsification reduces AT(M, D₂) but leaves AT(M, D₁) unchanged.

  3. Global falsification cannot be defined by approximate truth alone.
    Even if AT(M, D) approaches zero in many cases of global elimination, the decisive factor is the utility structure:
    U(M, D) + ε < U(M*, D) for all relevant domains D.

By embedding approximate truth in the utility function U(M, D), the framework shows that truthlikeness does not operate in isolation but determines epistemic optimality only in interaction with explanatory power and model costs. Within domains, approximate truth provides a quantitative dimension of epistemic improvement.

For practical application, both the similarity function sim(M, Sᵢ, D) and the weights wᵢ must be context-dependent and empirically operationalizable.
In many scientific fields, sim(M, Sᵢ, D) directly corresponds to established fit and error metrics, such as likelihood functions, variance measures, forecast errors, residual analyses, or similarity indices between model trajectories and observed system states.
In more theoretical or structurally oriented sciences, similarity metrics may also capture qualitative or topological features, such as symmetry preservation, invariance structures, or the reproduction of causal dependencies.

The weights wᵢ represent the relative relevance of different system properties within a domain. They may be empirically determined by disciplinary standards or derived from model-theoretic considerations that specify which features of a system are central for model quality.

Approximate truth thus provides an adaptable structure that captures both numerical accuracy and structural fit.
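
As one concrete instance of the schema AT(M, D) = Σᵢ wᵢ · sim(M, Sᵢ, D), the sketch below uses an error-based similarity (one minus a normalized absolute error) over a few observed system states. Both the similarity choice and the weights are illustrative assumptions, since the paper deliberately leaves them discipline-dependent.

```python
# Illustrative AT(M, D) with an error-based similarity (assumed metric).
observed = {"temp_trend": 1.0, "precip": 0.6, "extremes": 0.4}    # states Sᵢ
predicted = {"temp_trend": 0.95, "precip": 0.4, "extremes": 0.1}  # model output
weights = {"temp_trend": 0.5, "precip": 0.3, "extremes": 0.2}     # weights wᵢ

def sim(pred, obs, scale=1.0):
    # One minus normalized absolute error, clipped to [0, 1]
    return max(0.0, 1.0 - abs(pred - obs) / scale)

AT = sum(weights[i] * sim(predicted[i], observed[i]) for i in observed)
# AT = 0.5*0.95 + 0.3*0.8 + 0.2*0.7 = 0.855
```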



10. Why an Integrated Theory Has Been Missing



10.1 Historical Reasons

Before the digital age, many scientific models were formulated analytically, which made domain structures less visible. Only with numerical simulations and large-scale data models did contextual falsification become a central feature of scientific practice.



10.2 Logical Reasons

Popper’s approach was designed primarily as a truth criterion. Modern models, however, are:
• approximative,
• domain-specific,
• dynamically recalibratable.

A binary notion of truth is unsuitable because approximate truth varies across domains. Classical logical structures cannot capture the operational complexity of modern modelling; they do not represent gradational approximation or domain-specific weighting of relevance.



10.3 Institutional Reasons

Scientific practice is embedded in:
• computational capabilities,
• data norms,
• peer-review structures,
• funding regimes,
• technological infrastructures.

These factors shape the epistemic enabling space E(t) and determine which models can be formulated and stabilized. Popper’s theory does not consider this dimension. The proposed framework introduces E(t) precisely as the structuring force of institutional and technical conditions and thereby as a central epistemic variable.





11. Epistemic Enabling Space



11.1 Dynamics of the Epistemic Enabling Space


The epistemic enabling space E(t) evolves with technological, methodological, and institutional developments.
A model may be epistemically optimal at one time and no longer optimal later, even though its “truth” has not changed. This shows that E(t) itself functions as a dynamic epistemic variable whose evolution endogenously influences model optimality.

Successful models also reshape the enabling space E(t): they establish new data formats, computational infrastructures, institutional standards, and research practices. Thus, there is a reciprocal dynamic between modelling practice and E(t), jointly shaping both the set of available models and the structure of future model alternatives.

Key drivers of changes in E(t) include:

  1. growth of data and improved measurement systems,

  2. new mathematical and statistical methods,

  3. increasing computational power,

  4. institutional standardization processes,

  5. evolution of scientific discourse.

These drivers can be conceptualized as change operators ΔE(t) that systematically expand or restrict the epistemic enabling space.

E(t) determines:
• the set of models that are actually available,
• the structure of their domains,
• their costs C(M),
• the available alternatives M*.

Thus, the epistemic enabling space is the central framework parameter of model choice.
In a historical perspective, E(t) explains why model transitions are often triggered by technological and institutional innovations that make new alternatives formulable for the first time.





11.2 Structure of the Epistemic Enabling Space E(t)



For analytical precision, the epistemic enabling space E(t) can be decomposed into three functional components that jointly determine which models can be formulated and stabilized:

  1. Eₘ(t): Methodological and mathematical conditions.
    This includes available mathematical techniques, statistical methods, modelling strategies, and algorithmic tools.
    They determine which types of models can be formulated and which approximations are permissible.

  2. Eₜ(t): Technical and data-related conditions.
    This dimension includes computational resources, data quality, software infrastructures, numerical tools, and simulation technologies.
    These factors determine which models can be implemented in practice and at what resolution, stability, or complexity.

  3. Eᵢ(t): Institutional and organizational conditions.
    This includes scientific norms, peer-review processes, funding structures, training pathways, and established research practices.
    They shape which models become stabilized over time, which standards prevail, and which alternatives are considered acceptable.

E(t) emerges from the interaction of these three dimensions.
Model availability and stabilization thus depend not only on approximate truth, explanatory power, or model costs but also on the methodological, technical, and institutional conditions that make formulation, implementation, and further development possible.
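
For bookkeeping, the three components can be carried around as a simple record. The sketch below is nothing more than a data-structure rendering of the decomposition into Eₘ(t), Eₜ(t), and Eᵢ(t); the field contents are invented examples.

```python
from dataclasses import dataclass, field

@dataclass
class EnablingSpace:
    """Sketch of E(t) as a record of its three functional components."""
    methods: set = field(default_factory=set)         # Eₘ(t): techniques
    infrastructure: set = field(default_factory=set)  # Eₜ(t): compute, data
    institutions: set = field(default_factory=set)    # Eᵢ(t): norms, funding

E_now = EnablingSpace(
    methods={"bayesian_updating", "ensemble_methods"},
    infrastructure={"hpc_clusters", "reanalysis_data"},
    institutions={"peer_review", "ipcc_protocols"},
)
```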





12. Decision Logic of Model Choice



12.1 Structure of the Utility Function

The choice between two models M and M* can be reconstructed using a time- and domain-specific utility function:

U(M, D, t) = α · AT(M, D, t) + β · EK(M, D, t) − γ · C(M, D, t),

where:
• AT(M, D, t): domain-specific approximate truth,
• EK(M, D, t): explanatory power within domain D,
• C(M, D, t): model costs (data requirements, computational effort, institutional infrastructure),
• α, β, γ: context-dependent weighting factors that are institutionally and methodologically encoded in E(t).

This utility structure is compatible with established procedures of model selection.
Information criteria such as AIC and BIC operationalize specific combinations of fit and complexity costs, while Bayes factors and Bayesian model averaging provide probabilistic counterparts of approximate truth and explanatory power.

The proposed framework extends these approaches by embedding them explicitly in the epistemic enabling space E(t) and by systematically incorporating institutional, technical, and economic model costs. Model choice is thus reconstructed not merely statistically but epistemically, grounded in real scientific practice.

A global assessment can be formulated using an aggregated utility function U(M) that weights U(M, D) across relevant domains.

This utility function unites classical ideas from philosophy of science (e.g., explanatory power in Hempel, truthlikeness in Niiniluoto) with modern decision-theoretic evaluation structures, making epistemic rationality quantitatively reconstructable.

Because many terms of the utility function can be approximated using empirical proxy metrics or established quantitative procedures, the proposed decision logic can often be applied to real model portfolios and empirically examined.
The function is an ideal-typical framework showing how widely used fit metrics, information criteria, complexity measures, and institutional constraints can be integrated in one evaluation structure.

The operationalization of individual components—especially explanatory power and institutional model costs—is context-dependent, but in modern scientific practice usually feasible.

Contextual falsification reduces AT(M, D₂), which lowers U(M, D₂), without affecting AT(M, D₁) or U(M, D₁).
Thus, domain D becomes an explicit, variable epistemic unit shaped by the enabling space E(t).
Falsification is therefore not a global elimination operator but a condition for model adjustment:
• falsification in D₂ → U(M, D₂) decreases,
• U(M, D₁) remains high,
• M continues to possess epistemic value in D₁.

Global falsification occurs exactly when, for all relevant domains D:
U(M, D) + ε < U(M*, D).

Models disappear not because they are “false” but because their relative epistemic performance is lower in every domain.

The utility structure explains why model families remain robust:
If U(M₁, D₂) drops, models M₂ or M₃ within the same family may still have higher U-values. Falsification becomes a mechanism of internal reallocation of epistemic weights, not Popperian elimination.

To make this utility structure not only theoretical but empirically usable, U(M, D) can be refined via a standardized scaling and operationalization scheme integrating the established quality measures and error metrics of each discipline.

The linear form is a pragmatic and compatible approximation capturing the primary evaluation dimensions—approximate truth, explanatory power, and model costs.
But the structure is not limited to linearity: nonlinear or interaction-based evaluation forms may be appropriate, for example when explanatory power matters only above a minimum level of approximate truth, or when model costs involve domain-specific threshold effects.

Model costs C(M) may contain both global and domain-specific components.
While base costs of implementation or infrastructure are global, computational and data-related costs vary between domains.
In such cases C(M, D) can replace C(M) without changing the fundamental structure.

What is crucial is that the utility function, regardless of its form, reconstructs epistemic optimality by showing how approximate truth, explanatory power, and costs are weighed within the enabling space E(t).





12.2 Scaling and Empirical Grounding of the Utility Function

To ensure that the utility function U(M, D) is not only theoretically precise but directly usable, it can be operationalized using established evaluation methods of each discipline.

Calibration is relative:
Within a domain D, the model with the best empirical performance receives value 1, the minimally acceptable model receives 0. All other models are positioned between these values via linear or functional normalization.

Thus, a continuous scale emerges without requiring a physical measurement medium.
Differences between models span the evaluation space that the utility function maps.
No new data are required: the approach uses existing error metrics, quality measures, likelihood structures, and complexity indicators.

The epistemic enabling space E(t) is thereby operationalized through available empirical information.
Models are not evaluated in a vacuum but via the existing data and methodological standards of a discipline.
The utility function U(M, D) becomes a standardized projection of this information, enabling systematic comparison and model selection across domains.
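
The relative calibration described here amounts to a linear min–max normalization per domain. The sketch below maps hypothetical raw performance scores onto the [0, 1] scale, anchoring the best model at 1 and the minimally acceptable model at 0.

```python
# Relative calibration within one domain D (hypothetical raw scores).
raw = {"M1": 0.82, "M2": 0.74, "M3": 0.61}   # e.g., skill scores in D
acceptable_min = 0.55                        # minimally acceptable performance

best = max(raw.values())

def calibrate(score):
    # Linear normalization: best model -> 1, minimally acceptable -> 0
    return (score - acceptable_min) / (best - acceptable_min)

scaled = {m: round(calibrate(s), 2) for m, s in raw.items()}
# scaled == {"M1": 1.0, "M2": 0.7, "M3": 0.22}
```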

The utility function has both descriptive and weakly normative aspects:
• Descriptively, it captures how model choice actually works in data- and simulation-intensive sciences by balancing approximate truth, explanatory power, and costs.
• Normatively, it clarifies under which conditions such choices count as epistemically rational—without imposing external norms.

Science is not externally regulated; rather, the framework renders explicit the evaluative dimensions already operative in practice.



13. Relation to Scientific Realism

The integrated theory occupies a position between structural realism and instrumental model pluralism.
In contemporary philosophy of science, this middle position has become increasingly relevant because many modern sciences can neither operate as fully realist nor fully instrumentalist.

Models are partially true insofar as they approximate real structures (approximate truth as the degree of structural similarity).
This aligns with structural realism, since truth is not attributed to entire models but to stable relational structures that persist within specific domains.

However, their stability also depends on practical factors:
• costs,
• institutional stabilization,
• availability of alternative models,
• evolution of the enabling space E(t).

Thus, the framework extends structural realism by showing that approximate truth is not the sole determinant of rational scientific practice.
At the same time, it extends instrumental model pluralism by formalizing when models remain epistemically optimal despite partial falsification.

Within current realism debates, the approach can be positioned precisely.
It is close to structural realism (e.g., Worrall 1989; Psillos 1999) because approximate truth is interpreted as an approximation to stable relational structures.
It takes a broader perspective than van Fraassen’s constructive empiricism (1980), since it shows that operative scientific rationality depends not only on empirical adequacy but also on institutional, technical, and explanatory factors represented in the enabling space E(t).

In contrast to da Costa and French’s concept of partial truth (2003), the present framework embeds approximate truth directly into a utility function U(M, D) that also includes explanatory power and model costs.
Scientific rationality thus becomes a decision problem situated within a dynamic enabling space.

Because of this integrative structure, the approach positions itself between structural realism and instrumental pluralism while combining insights from both into a unified framework that directly reflects the empirical modelling practices of modern sciences.

This structure is especially relevant in data-intensive and simulation-based sciences such as climate research, economics, epidemiology, and AI modelling.
The framework offers a precise way to analyse model stability and model replacement across these fields.



14. Conclusion



Falsification is not a binary elimination mechanism but a domain-specific tool for refining scientific models.
The theory developed here unifies core insights from Popper, Kuhn, Lakatos, da Costa/French and contemporary model theory within a formally explicit, integrated framework.

By distinguishing between global and contextual falsification, it becomes clear that empirical discrepancies primarily lead to domain restriction rather than complete abandonment of a model.

Integrating approximate truth, explanatory power, and model costs into a domain-specific utility function shows that model stability results from relative epistemic optimality, not merely from truthlikeness.
Models disappear only when they are no longer epistemically competitive in any domain and when the epistemic enabling space E(t) no longer provides stabilized contexts of use.

The concept of the epistemic enabling space explains why technological, institutional, and methodological changes can have deep effects on scientific model landscapes.
Model choice is thereby reconstructed as a dynamic process in which scientific rationality is not expressed through a single logical operation but through a structured interplay of context, approximation, and institutional stabilization.

The proposed theory thus offers a coherent framework for analysing modern modelling practices, particularly in data-intensive and simulation-based sciences.
It clarifies how models remain epistemically stable despite partial falsification and formulates the precise conditions under which global elimination is rational.
Future work could empirically test the utility function, further formalize the enabling space E(t), and apply the framework to additional scientific model architectures.



References



Cartwright, Nancy. 1983. How the Laws of Physics Lie. Oxford: Oxford University Press.
Cartwright, Nancy. 1999. The Dappled World: A Study of the Boundaries of Science. Cambridge: Cambridge University Press.

Da Costa, Newton C. A., and Steven French. 2003. Science and Partial Truth: A Unitary Approach to Models and Scientific Reasoning. Oxford: Oxford University Press.

Einstein, Albert. 1905. “On the Electrodynamics of Moving Bodies.” Annalen der Physik 17: 891–921.

Hacking, Ian. 1983. Representing and Intervening. Cambridge: Cambridge University Press.

Hempel, Carl G. 1965. Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. New York: Free Press.

IPCC. 2021. Climate Change 2021: The Physical Science Basis. Cambridge: Cambridge University Press.

Kepler, Johannes. 1609. Astronomia Nova. Heidelberg: Gotthard Vögelin.
Kepler, Johannes. 1619. Harmonices Mundi. Linz: Johann Planck.

Kuhn, Thomas S. 1962. The Structure of Scientific Revolutions. Chicago: University of Chicago Press.

Lakatos, Imre. 1970. “Falsification and the Methodology of Scientific Research Programmes.” In Criticism and the Growth of Knowledge, edited by Imre Lakatos and Alan Musgrave, 91–196. Cambridge: Cambridge University Press.

Maxwell, James Clerk. 1865. “A Dynamical Theory of the Electromagnetic Field.” Philosophical Transactions of the Royal Society of London 155: 459–512.

Morgan, Mary S., and Margaret Morrison. 1999. Models as Mediators: Perspectives on Natural and Social Science. Cambridge: Cambridge University Press.

Niiniluoto, Ilkka. 1987. Truthlikeness. Dordrecht: Reidel.
Niiniluoto, Ilkka. 1998. “Approximation in Science.” Poznan Studies in the Philosophy of the Sciences and the Humanities 63: 97–109.

Oddie, Graham. 1986. Likeness to Truth. Dordrecht: Reidel.

Oreskes, Naomi, Kristin Shrader-Frechette, and Kenneth Belitz. 1994. “Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences.” Science 263 (5147): 641–646.

Popper, Karl R. [1934] 1959. The Logic of Scientific Discovery. London: Hutchinson.

Psillos, Stathis. 1999. Scientific Realism: How Science Tracks Truth. London: Routledge.

Putnam, Hilary. 1978. Meaning and the Moral Sciences. London: Routledge.

van Fraassen, Bas C. 1980. The Scientific Image. Oxford: Clarendon Press.

Weisberg, Michael. 2013. Simulation and Similarity: Using Models to Understand the World. Oxford: Oxford University Press.

Worrall, John. 1989. “Structural Realism: The Best of Both Worlds?” Dialectica 43 (1–2): 99–124.