Computes a size-neutral, trend-aware stress score S(bank) ∈ [0,1] for the latest year using 2023–2025 data. The current position (2025) and the trajectory (slope across the three years) feed a single PCA, and the resulting score is ready to use as the initial shock vector in a contagion model.
A bank's raw financial numbers cannot be compared directly. Two banks might both have strong fundamentals, but because one is ten times larger, its absolute Gross NPA and Net Profit numbers will dwarf the smaller bank's — creating a false signal of stress.
Canara Bank's Gross NPA: ₹46,159 cr | IDFC First's Gross NPA: ₹3,884 cr
Using absolute values, Canara scores stress = 1.0 even though its Net NPA % (0.7%) is lower
than IDFC's (0.86%). Every metric must be a ratio — normalised by assets, deposits,
or advances — so large and small banks are on equal footing.
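The size-bias effect can be reproduced with the two banks above. This is a minimal sketch that assumes a simple min-max rescaling across the peer set (the actual pipeline scores via PCA); the function name `min_max` is illustrative.

```python
import numpy as np

banks = ["Canara Bank", "IDFC First"]
gross_npa_cr = np.array([46159.0, 3884.0])  # absolute Gross NPA, in ₹ crore
net_npa_pct = np.array([0.70, 0.86])        # Net NPA as % of net advances

def min_max(x):
    """Rescale values to [0, 1] across the peer set."""
    return (x - x.min()) / (x.max() - x.min())

abs_score = min_max(gross_npa_cr)    # driven by balance-sheet size
ratio_score = min_max(net_npa_pct)   # driven by asset quality

for name, a, r in zip(banks, abs_score, ratio_score):
    print(f"{name:12s} absolute={a:.2f} ratio={r:.2f}")
```

On absolute values Canara lands at 1.0 purely because of its size; on the ratio the ordering reverses and IDFC First is the more stressed of the two.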
A bank at NPA = 5% trending toward 8% is riskier than one stable at 5% — even if their 2025 snapshots look identical. v2 adds 8 trend-slope features (one per metric) computed from 2023→2024→2025 z-scores. PCA now sees both where the bank stands and where it is heading. Slopes are computed on the already-z-scored values so they capture relative deterioration vs. peers, not industry-wide macro moves.
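To illustrate why slopes are taken on z-scores rather than raw values, here is a small sketch. The NPA figures are hypothetical, and an ordinary least-squares slope over the three years is assumed as the slope definition. Two banks share the same 2025 value, but only one has been climbing towards it.

```python
import numpy as np

# Hypothetical Net NPA % for four banks (rows) in 2023, 2024, 2025 (columns).
npa = np.array([
    [5.0, 5.0, 5.0],   # bank 0: stable at 5%
    [2.0, 3.5, 5.0],   # bank 1: climbing to the same 2025 level
    [1.0, 1.0, 1.0],   # bank 2
    [3.0, 3.0, 3.0],   # bank 3
])

def yearly_z(x):
    """Z-score each year (column) across banks."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def ols_slope(z):
    """Least-squares slope of each row against the year index."""
    t = np.arange(z.shape[1]) - (z.shape[1] - 1) / 2  # centred years: -1, 0, 1
    return (z * t).sum(axis=1) / (t ** 2).sum()

slopes = ols_slope(yearly_z(npa))

# An industry-wide shift (every bank +1 point per year) leaves the slopes unchanged.
macro = npa + np.array([0.0, 1.0, 2.0])
slopes_macro = ols_slope(yearly_z(macro))

print(np.round(slopes, 3))
print(np.allclose(slopes, slopes_macro))
```

Bank 1 receives a larger slope than bank 0 even though their 2025 snapshots match, and adding the same macro drift to every bank does not change any slope, because the per-year z-score removes it.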
… performance_metrics.

Each document contains yearly sub-objects (2023, 2024, 2025). Flatten into a single DataFrame with one row per (bank, year) pair: up to three rows per bank, versus two in the old 2-year version.

… stressed_z = z × (−1 × direction). This also makes trend slopes directionally consistent: positive slope = worsening.

… trend_NPA, trend_CAR, …). Slope > 0 means the metric is worsening relative to peers year-on-year. Banks with only one year of data receive slope = 0. These slopes are then standardised (z-scored) before entering PCA so their scale matches the snapshot features.

… [trend] prefix on trend features so you can see which dimensions drive the score.

… fundamental_stress_scores.csv. The column fundamental_stress_normalized is S(bank), your initial shock vector for contagion. One row per bank, no per-year duplication.
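The steps above can be sketched end to end on toy data. Everything here is an assumption made for illustration: the document shape, the short field names `npa` and `car`, the convention that `direction = +1` means "higher is healthier", the use of the first principal component (computed with a plain SVD rather than a PCA library) and the min-max rescaling to [0,1]. The real pipeline may differ on any of these.

```python
import numpy as np
import pandas as pd

# Hypothetical documents with yearly sub-objects; field names are illustrative.
docs = [
    {"bank": "A", "2023": {"npa": 2.0, "car": 15.0}, "2024": {"npa": 3.0, "car": 14.0}, "2025": {"npa": 4.0, "car": 13.0}},
    {"bank": "B", "2023": {"npa": 4.0, "car": 13.0}, "2024": {"npa": 3.5, "car": 13.5}, "2025": {"npa": 2.0, "car": 15.0}},
    {"bank": "C", "2023": {"npa": 3.0, "car": 14.0}, "2024": {"npa": 3.0, "car": 14.0}, "2025": {"npa": 3.0, "car": 14.0}},
    {"bank": "D", "2025": {"npa": 5.0, "car": 12.0}},  # only one year of data
]
DIRECTION = {"npa": -1, "car": +1}  # assumed convention: +1 = higher is healthier

# Flatten to one row per (bank, year).
rows = [{"bank": d["bank"], "year": int(y), **vals}
        for d in docs for y, vals in d.items() if y != "bank"]
df = pd.DataFrame(rows)

# Z-score within each year, then flip so positive always means more stress.
for m, direction in DIRECTION.items():
    z = df.groupby("year")[m].transform(lambda s: (s - s.mean()) / s.std(ddof=0))
    df[f"z_{m}"] = z * (-1 * direction)

def slope(group, col):
    """OLS slope of col against year; 0.0 when only one year is available."""
    g = group.dropna(subset=[col])
    if len(g) < 2:
        return 0.0
    t = g["year"] - g["year"].mean()
    return float((t * g[col]).sum() / (t ** 2).sum())

def zscore(s):
    """Standardise a Series to mean 0 and unit (population) variance."""
    return (s - s.mean()) / s.std(ddof=0)

# Snapshot features (2025) plus standardised trend features, one row per bank.
feats = df[df["year"] == 2025].set_index("bank")[[f"z_{m}" for m in DIRECTION]].copy()
raw_slopes = {}
for m in DIRECTION:
    raw_slopes[m] = pd.Series({b: slope(g, f"z_{m}") for b, g in df.groupby("bank")})
    feats[f"trend_{m}"] = zscore(raw_slopes[m])

# First principal component via SVD of the centred feature matrix.
X = feats.to_numpy() - feats.to_numpy().mean(axis=0)
_, _, vt = np.linalg.svd(X, full_matrices=False)
pc1 = vt[0] if vt[0].sum() >= 0 else -vt[0]  # assumed orientation: higher = more stress
raw = X @ pc1
out = pd.DataFrame(
    {"fundamental_stress_normalized": (raw - raw.min()) / (raw.max() - raw.min())},
    index=feats.index,
)

# Loadings, with trend features tagged so they are easy to spot.
for name, w in zip(feats.columns, pc1):
    tag = "[trend] " if name.startswith("trend_") else ""
    print(f"{tag}{name}: {w:+.3f}")
print(out.sort_values("fundamental_stress_normalized", ascending=False))
```

Calling `out.to_csv("fundamental_stress_scores.csv")` would then write one row per bank. Bank D, with a single year of data, gets a raw slope of 0 before standardisation, as described above.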
All metrics are percentages or ratios normalised by a size denominator (assets, deposits, advances). Absolute values like Gross NPA and Net Profit were deliberately excluded to prevent size bias. Each metric also generates a corresponding trend feature in v2.
The pipeline reports missing percentages before imputation. Understanding these is critical — high missingness means the imputed median is doing the work, not real data.
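A missingness report followed by median imputation might look like the sketch below. The data is hypothetical, and the imputation is assumed to use the per-year (cross-sectional) median of each metric.

```python
import numpy as np
import pandas as pd

METRICS = ["Return on Assets (%)", "Provision Coverage Ratio (%)"]

# Hypothetical 2025 cross-section of five banks.
df = pd.DataFrame({
    "year": [2025] * 5,
    "Return on Assets (%)": [1.1, 0.9, np.nan, 1.3, 0.7],
    "Provision Coverage Ratio (%)": [np.nan, 70.0, np.nan, np.nan, np.nan],
})

# Report missingness before anything is filled in.
missing_pct = df[METRICS].isna().mean().mul(100).round(1)
print(missing_pct)

# Cross-sectional imputation: fill each gap with that year's median for the metric.
imputed = df.copy()
imputed[METRICS] = df.groupby("year")[METRICS].transform(lambda s: s.fillna(s.median()))

# How much variation is left in each metric after imputation?
print(imputed[METRICS].std(ddof=0))
```

In this toy case the sparsely reported metric ends up with zero spread after imputation: every bank carries the same value, so the column gives PCA nothing to separate banks on.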
| Metric | Missing % | Verdict |
|---|---|---|
| Net NPA as % to Net Advances | | ACCEPTABLE |
| Capital Adequacy Ratio (Basel-III) | | ACCEPTABLE |
| Provision Coverage Ratio (%) | 77.3% | PROBLEM |
| Credit Deposit Ratio | | MODERATE |
| Investment Deposit Ratio | | ACCEPTABLE |
| Return on Assets (%) | | ACCEPTABLE |
| Spread as % of Total Assets | | ACCEPTABLE |
| Operating Expenses as % to Total Expenses | | ACCEPTABLE |
77.3% of Provision Coverage Ratio values are filled with the cross-sectional median — meaning
only ~23% of banks actually reported this metric. The imputed values are identical for all
non-reporters, contributing near-zero unique signal to PCA. Consider either sourcing this data
from a more complete dataset, or removing PCR from METRICS until coverage improves.
The 11–14% missing on other metrics is expected: foreign/small banks and cooperative banks often
skip IBA sub-metrics.
Example of what the ranked output looks like (one row per bank, no per-year duplication). Higher bar = more stressed relative to the 2025 peer universe, incorporating 2023–2025 trajectory.