PD Model Backtesting in the Spotlight: What the EBA's 2026 Paper Means for European Banks

For most of the past two decades, PD models have played a central role in the European banking sector, with little public backtesting evidence on how they actually perform. They drive credit risk measurement under the IRB approach, they translate directly into regulatory capital, and they shape pricing across entire portfolios. Yet the evidence on whether they work was available only inside internal validation reports, and while supervisory initiatives such as the ECB's Targeted Review of Internal Models (TRIM) probed them intensively, the findings stayed largely within confidential supervisory channels.
The new EBA Staff Paper "Systematic Backtesting of Probability of Default Models with Regulatory Data", published in April 2026 by Casellina, Chionsini, Kopp, and Riabi, takes a deliberate step to change that. It applies systematic PD model backtesting to an EU-wide regulatory dataset, proposes a sharper statistical method, and, for the first time at this scale, translates the findings into a number boards care about: the impact on Tier 1 capital. For risk and capital functions, that shift matters: there is now comparable, sector-wide evidence on whether PD models hold up, and a capital number is attached to the answer.
The implications of the paper reach beyond the modelling teams. They matter for risk committees, capital planning, supervisory dialogue, and the long-running debate over the comparability of internal models across European banks.
Executive Summary
- Between roughly 3% and 17% of SME corporate exposures appear miscalibrated, depending on which statistical method is applied.
- Dispersion across banks is wide: around 20% of banks show no miscalibrated exposures at all, while a small number have more than 50% of their exposures miscalibrated.
- A prudent recalibration of those models would reduce system-wide Tier 1 capital ratios by 4 to 10 basis points — material at firm level for some institutions, manageable in aggregate.
- The authors propose a corrected binomial test that accounts for both asset correlation and serial correlation, addressing a gap the existing PD model validation literature had largely neglected.
- The paper floats the idea of a public EBA backtesting dashboard for PD models — a quiet but consequential shift from private supervisory dialogue to transparent peer comparison.
Why PD Model Validation Became a Boardroom Topic
The trust placed in internal models — and the IRB model validation that is meant to safeguard it — has been under steady pressure for years. Output floors under the finalised Basel framework, IRB repair guidelines, prolonged model approval timelines, and the broader debate around regulatory simplification — from the Draghi report to the Letta paper — are all related to an uncomfortable question: can we trust banks' internal models, and how would anyone outside the bank actually know?
That question has historically been answered through bilateral supervisory dialogue. A bank ran its validation. The Joint Supervisory Team reviewed it. Findings flowed back through confidential channels. Outsiders — investors, peers — had little to go on beyond aggregate RWA variability studies.
The new EBA paper is the first sign that this arrangement is changing. By demonstrating that supervisory backtesting can be done systematically across EU IRB banks using data the EBA already collects, it opens the door to comparable, repeatable, sector-wide evidence on PD model performance. Whether that ends in a public dashboard or simply sharper supervision, the direction of travel is one banks should plan around.
Inside the EBA's Approach to PD Model Backtesting
The paper does three things, each worth understanding on its own terms.
First, the authors put the EBA's proprietary backtesting dataset to work. The dataset stitches together obligor counts, exposures at default, reported PDs, and observed default rates per rating grade, for every IRB bank reporting under the CRR. That in itself is a significant contribution to the field of credit risk model backtesting: most academic research has been hampered by a lack of data, and most studies by banking supervisors have focused only on single countries.
Second, they refine the workhorse statistical test used in PD model validation — the binomial test — to better capture actual default behaviour. We will return to that in a moment, because it is the central technical focus of the paper.
Third, and from a business standpoint most interestingly, they build the bridge from "rating grade X failed its test" all the way to "this is what it would cost in regulatory capital if you recalibrated it." That last step is what turns the paper from a methodology note into a piece of work risk executives should actually read.
The Corrected Binomial Test for PD Model Validation: Asset Correlation, Serial Correlation, and the Trouble with Independence Assumptions
The standard binomial test assumes that defaults occur independently of one another, as if each obligor were tossing its own private coin. Anyone who has lived through 2008, 2020, or even a regional recession knows that defaults do not happen that way. Defaults cluster across firms (asset correlation, driven by shared exposure to macroeconomic shocks) and across time (serial correlation, the simple observation that bad years tend to follow bad years).
When you ignore clustering, the test becomes oversensitive: it classifies well-functioning PD models as faulty. The existing literature has long offered corrections for asset correlation, drawing on Vasicek-style single-factor models. What the literature has largely lacked is a correction for serial correlation. The authors of the paper show with extensive Monte Carlo evidence that even the asset-correlation-only correction over-rejects sound models when defaults are persistent over time. Their proposed correction handles both effects at once and brings the false-positive rate back down to the target 5%.
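The difference between the independence assumption and an asset-correlation correction can be made concrete with a minimal sketch. It compares the plain binomial p-value with a p-value from the large-portfolio approximation of a Vasicek-style single-factor model. It is not the paper's fully corrected test, whose serial-correlation adjustment is not reproduced here. The function names, the asset correlation of 0.12, and the example rating grade are illustrative assumptions.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def binomial_p_value(defaults: int, obligors: int, pd_grade: float) -> float:
    """P(X >= defaults) for X ~ Binomial(obligors, pd_grade): independent defaults."""
    def log_pmf(k: int) -> float:
        # Log-space binomial probability, to stay stable for large grades.
        return (
            lgamma(obligors + 1) - lgamma(k + 1) - lgamma(obligors - k + 1)
            + k * log(pd_grade) + (obligors - k) * log(1 - pd_grade)
        )
    return min(1.0, sum(exp(log_pmf(k)) for k in range(defaults, obligors + 1)))


def one_factor_p_value(defaults: int, obligors: int, pd_grade: float, rho: float) -> float:
    """P(default rate >= observed) under a one-factor model with asset correlation rho.

    Uses the large-portfolio (Vasicek) approximation, so it ignores the
    idiosyncratic sampling noise that remains in a finite grade.
    """
    rate = defaults / obligors
    if rate <= 0.0:
        return 1.0
    if rate >= 1.0:
        return 0.0
    z = (sqrt(1 - rho) * STD_NORMAL.inv_cdf(rate) - STD_NORMAL.inv_cdf(pd_grade)) / sqrt(rho)
    return 1.0 - STD_NORMAL.cdf(z)


# Hypothetical rating grade: 500 obligors, reported PD of 2%, 18 observed defaults.
p_independent = binomial_p_value(18, 500, 0.02)
p_correlated = one_factor_p_value(18, 500, 0.02, rho=0.12)  # rho is an assumed value
print(f"binomial p-value:   {p_independent:.4f}")
print(f"one-factor p-value: {p_correlated:.4f}")
```

For this hypothetical grade, the plain binomial test rejects at the 5% level while the asset-correlation-adjusted version does not. That is the over-sensitivity described above, seen on a single rating grade.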
For senior risk managers, the practical takeaway is sobering. The validation test most banks rely on probably tells you that more of your PD models are broken than actually are. That has tangible consequences: unnecessary recalibrations, instability in model governance, repeated supervisory back-and-forth, and a quiet erosion of organisational confidence as teams chase statistical ghosts.
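A small simulation illustrates why the uncorrected test flags sound models. The sketch below draws one-year default counts for a correctly calibrated rating grade under a single-factor model and counts how often the plain binomial test rejects at a nominal 5% level. It covers asset correlation only. The paper's Monte Carlo design and its serial-correlation setup are not reproduced, and the grade size, PD, asset correlation, trial count, and seed are all assumed values.

```python
import random
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def binomial_critical_value(obligors: int, pd_grade: float, alpha: float = 0.05) -> int:
    """Smallest k with P(X >= k) <= alpha for X ~ Binomial(obligors, pd_grade)."""
    tail = 1.0  # P(X >= 0)
    for k in range(obligors + 1):
        if tail <= alpha:
            return k
        log_pmf = (
            lgamma(obligors + 1) - lgamma(k + 1) - lgamma(obligors - k + 1)
            + k * log(pd_grade) + (obligors - k) * log(1 - pd_grade)
        )
        tail -= exp(log_pmf)  # tail is now P(X >= k + 1)
    return obligors + 1


def rejection_rate(obligors: int, pd_grade: float, rho: float, trials: int, seed: int) -> float:
    """Share of simulated years in which the plain binomial test rejects a TRUE PD."""
    rng = random.Random(seed)
    k_crit = binomial_critical_value(obligors, pd_grade)
    threshold = STD_NORMAL.inv_cdf(pd_grade)
    rejections = 0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)  # systematic (macro) factor for the year
        # PD conditional on the systematic factor; equals pd_grade when rho is 0.
        p_cond = STD_NORMAL.cdf((threshold - sqrt(rho) * z) / sqrt(1 - rho))
        defaults = sum(rng.random() < p_cond for _ in range(obligors))
        rejections += defaults >= k_crit
    return rejections / trials


# Hypothetical grade: 500 obligors with a correctly specified PD of 2%.
rate_independent = rejection_rate(500, 0.02, rho=0.0, trials=2000, seed=1)
rate_correlated = rejection_rate(500, 0.02, rho=0.12, trials=2000, seed=1)
print(f"rejection rate, independent defaults: {rate_independent:.3f}")
print(f"rejection rate, rho = 0.12:           {rate_correlated:.3f}")
```

With independent defaults the rejection rate stays near or below the nominal level. With correlated defaults it rises well above it, even though the model is calibrated correctly by construction.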
What Miscalibration Costs: From PD Estimates to Tier 1 Capital
The authors do not stop at counting failed rating grades. They trace the capital impact step by step: failed rating grades are translated into recalibrated PDs, the recalibrated PDs are fed into the IRB risk-weight formula, and the resulting change in risk-weighted assets is aggregated across the entire EU banking system.
The headline numbers: between 3% and 17% of SME corporate exposures look miscalibrated, depending on whether you apply the uncorrected binomial test, the asset-correlation-corrected version, or the new fully corrected version. Recalibrating prudently would cost the system 4 to 10 basis points of Tier 1 capital.
A few observations are worth pulling out:
- The range itself is the message. The same portfolio can look 3% broken or 17% broken, not because the model changed but because the choice of statistical test materially changes the conclusion. Banks and supervisors that rely on the wrong correction will reach the wrong answer.
- A 4-to-10 basis point system-wide hit is absorbable. But system averages hide outliers. For a specific institution sitting at the wrong end of the dispersion, the local capital impact could be a multiple of that.
- The exercise covers SME corporate PD models specifically. The methodology is portable to other portfolios — retail mortgages, large corporates, specialised lending — effectively a template for systematic credit risk model backtesting, and there is every reason to expect the EBA to extend it.
Calibration choices move regulatory capital. This makes PD validation a capital allocation question for the board, not just a technical exercise for the model team.
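The chain from a recalibrated PD to a capital ratio can be sketched with the Basel IRB risk-weight function for corporate exposures, including the firm-size adjustment for SMEs. This is a simplified illustration and not the paper's procedure. It omits jurisdiction-specific elements such as scaling or SME supporting factors, and every figure in the example (capital, exposure, PDs, sales) is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def corporate_risk_weight(pd_grade, lgd=0.45, maturity=2.5, sales_eur_m=None):
    """Basel IRB risk weight for a corporate exposure, as a fraction of EAD.

    sales_eur_m switches on the SME firm-size adjustment (annual sales in
    EUR million, clamped to the 5-50 range); None means no adjustment.
    """
    w = (1 - exp(-50 * pd_grade)) / (1 - exp(-50))
    rho = 0.12 * w + 0.24 * (1 - w)  # regulatory asset correlation
    if sales_eur_m is not None:
        sales = min(max(sales_eur_m, 5.0), 50.0)
        rho -= 0.04 * (1 - (sales - 5.0) / 45.0)
    b = (0.11852 - 0.05478 * log(pd_grade)) ** 2  # maturity adjustment
    conditional_pd = STD_NORMAL.cdf(
        (STD_NORMAL.inv_cdf(pd_grade) + sqrt(rho) * STD_NORMAL.inv_cdf(0.999)) / sqrt(1 - rho)
    )
    k = lgd * (conditional_pd - pd_grade) * (1 + (maturity - 2.5) * b) / (1 - 1.5 * b)
    return 12.5 * k


def tier1_ratio_change_bp(tier1, other_rwa, ead, pd_old, pd_new, **kwargs):
    """Change in the Tier 1 ratio, in basis points, when one portfolio's PD moves."""
    rwa_old = other_rwa + ead * corporate_risk_weight(pd_old, **kwargs)
    rwa_new = other_rwa + ead * corporate_risk_weight(pd_new, **kwargs)
    return (tier1 / rwa_new - tier1 / rwa_old) * 10_000


# Hypothetical bank (EUR bn): Tier 1 capital 16, other RWA 92, SME portfolio EAD 10.
rw_old = corporate_risk_weight(0.020, sales_eur_m=20.0)
rw_new = corporate_risk_weight(0.025, sales_eur_m=20.0)  # prudent recalibration
change_bp = tier1_ratio_change_bp(16.0, 92.0, 10.0, 0.020, 0.025, sales_eur_m=20.0)
print(f"risk weight before: {rw_old:.1%}, after: {rw_new:.1%}")
print(f"Tier 1 ratio change: {change_bp:.1f} bp")
```

The same two functions can be run per rating grade and summed across a portfolio, which is the aggregation step behind a bank-level or system-wide figure.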
Dispersion Across Banks and the Path Toward Comparability of Internal Models
One of the more uncomfortable tables in the paper is the bank-by-bank breakdown of SME corporate PD model miscalibration. Around 20% of institutions in the sample show no PD model miscalibrations at all. A handful sit above 50%. The middle of the distribution is broad and unevenly populated.
That spread is exactly the kind of evidence the long-running debate about RWA variability has been missing. For years, regulators and academics have asked whether two banks looking at similar portfolios produce similar risk weights — and if not, why not. Studies have hinted at meaningful divergence, but few have been able to attribute it cleanly to model calibration versus portfolio composition versus methodological choice. The EBA's backtesting framework offers a direct lens onto one piece of that puzzle: are reported PDs holding up against observed default rates, and where are the gaps largest?
The proposed public dashboard would take this from supervisory tool to peer benchmark. Banks should assume that, even without a formal dashboard, this kind of comparison is already being run inside the EBA, the ECB, and national competent authorities.
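A bank can approximate this kind of comparison on its own data. The sketch below computes, for each bank, the exposure-weighted share of rating grades flagged by a backtest, which is the kind of figure behind the dispersion discussed above. The records, bank names, and flags are invented for illustration. In practice the flag would come from whichever statistical test is applied.

```python
from collections import defaultdict

# Hypothetical grade-level records: (bank, exposure in EUR million, flagged by the backtest?)
grade_records = [
    ("Bank A", 400.0, False),
    ("Bank A", 600.0, False),
    ("Bank B", 300.0, True),
    ("Bank B", 700.0, False),
    ("Bank C", 600.0, True),
    ("Bank C", 400.0, False),
]


def flagged_exposure_share(records):
    """Exposure-weighted share of flagged rating grades per bank."""
    total = defaultdict(float)
    flagged = defaultdict(float)
    for bank, exposure, is_flagged in records:
        total[bank] += exposure
        if is_flagged:
            flagged[bank] += exposure
    return {bank: flagged[bank] / total[bank] for bank in total}


shares = flagged_exposure_share(grade_records)
for bank, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{bank}: {share:.0%} of exposure in flagged grades")
```

Sorting banks by this share gives a simple view of where an institution sits relative to its peers.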
What This Means for Risk Executives Right Now
A few priorities follow from all of this, in roughly the order they should be picked up.
Replicate the test internally. The methodology in the paper is reproducible. Validation teams should apply the corrected binomial test — handling both asset correlation and serial correlation — to their own PD model portfolios, starting with SME corporates. The point is less to predict whether the supervisor will flag a given rating grade than to remove surprise from the conversation.
Map your position in the peer distribution. Even without the dashboard, banks have enough information from their own data, public disclosures, and EBA transparency exercises to form a well-founded picture of their current situation. Institutions with above-peer margins of conservatism should be ready to explain why. Institutions with below-peer margins should be ready to explain that too.
Review the role of margins of conservatism (MoC) and recalibration triggers. The paper notes a declining share of miscalibrated exposures over recent years, plausibly tied to higher MoCs and improved data quality following the IRB repair. Banks that have already added cushions are partly insulated; those that have not should examine whether their current triggers would catch the kind of mismatch the EBA is now flagging.
Translate the capital impact into board language. The capital impact is small in aggregate but real at the institution level. Risk, finance, and capital planning should be aligned on what a prudent recalibration of the most exposed PD models would mean for the CET1 ratio under both base and stress scenarios.
Prepare for transparency. The era in which internal model performance was largely a private conversation between a bank and its supervisor is drawing to a close. The banks that get ahead of this by stress-testing their own models against the EBA's methods, and being ready to explain their calibration choices, will be the ones that benefit most when supervisory and market attention turns to internal-model performance.
Frequently Asked Questions
How does the EBA backtest PD models across European banks?
The EBA uses a proprietary regulatory dataset built from COREP C 08.05 returns, which capture obligor counts, exposures, reported PDs, and observed default rates per rating grade for every IRB bank in the EU. The April 2026 Staff Paper applies a corrected binomial test to this data to compare predicted PDs against realised defaults, rating grade by rating grade, bank by bank.
What is the difference between asset correlation and serial correlation in PD model validation?
Asset correlation describes the tendency of defaults to cluster across firms in the same year because they share exposure to common macroeconomic factors. Serial correlation describes the tendency of defaults to cluster across time, so that bad years often follow bad years. The standard binomial test ignores both. Existing corrections typically address only the first; the EBA paper provides a correction that handles both at once.
What is the impact of PD model recalibration on Tier 1 capital?
Across EU IRB banks, the EBA paper estimates that prudent recalibration of miscalibrated SME corporate PD models would reduce system-wide Tier 1 capital ratios by approximately 4 to 10 basis points. The figure at individual bank level can be materially higher or lower, depending on portfolio composition and how conservative existing estimates already are.
Does the new methodology mean fewer PD models will fail validation?
In most cases, yes. The corrected binomial test is less prone to flagging well-calibrated models as faulty when defaults cluster across time. For banks, this should reduce the volume of unnecessary recalibrations and the supervisory back-and-forth that accompanies them.
What is the EBA backtesting dashboard for PD models, and is it coming?
The authors propose a public dashboard summarising PD model performance across EU banks using the EBA's regulatory dataset. It is not yet policy. But the methodology and data infrastructure are in place, and the broader supervisory direction in Europe is toward more transparency on internal model performance, not less.
How does this relate to IRB repair and the output floor?
The IRB repair programme tightened the rules around PD, LGD, and EAD estimation and has likely contributed to the declining share of miscalibrated exposures observed in recent years. The output floor caps the benefit institutions can derive from internal models versus the standardised approach. The EBA's backtesting work complements both: it provides empirical evidence on whether the post-repair calibrations are actually holding up against realised default experience.
What should banks do before any public disclosure of PD model performance becomes reality?
Run the EBA's methodology internally, identify rating grades and portfolios at risk of being flagged, review margins of conservatism, and prepare a clear narrative for the board, the supervisor, and — eventually — the market on how the bank's PD models perform relative to peers.