# AI in Credit Risk
Summary: AI plays very different roles across the credit risk workflow — LLMs don’t improve default prediction on tabular data, but they excel as reasoning partners for model design, and agents can compress end-to-end model development to hours.
Sources: raw/articles/subbu-venkataramanan-2026-05-12.md, raw/articles/simon-taylor-2026-04-26.md, raw/call-notes/shrikant-2026-05-11.md, raw/call-notes/carlos-2026-05-10.md, raw/call-notes/jie-2026-05-16.md
Last updated: 2026-05-17
## The Three-Part Answer
Per Subbu Venkataramanan (DiscreteKernel Systems):
| Question | Answer | Why |
|---|---|---|
| Can AI replace credit scoring models? | No | Predicting future events on tabular data — LLMs overfit; classical ML wins |
| Can AI help design credit models? | Yes | LLMs as expert reasoning partners synthesize domain knowledge instantly |
| Can AI agents execute model development? | Yes | End-to-end workflow compression to ~1 hour with grounded agents |
## Where AI Does NOT Help: Default Prediction on Tabular Data
A retail credit risk dataset is highly structured — bureau attributes, application fields, alternative data (e.g., LexisNexis), and month-wise payment behavior used to construct a binary “BAD” (default) flag.
Best-performing techniques for this task remain:
- Logistic Regression (parametric, fully explainable)
- GAMs — Generalized Additive Models (semi-parametric)
- Random Forests
- Gradient Boosting — XGBoost, LightGBM
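As a minimal illustration of why a parametric model stays fully explainable, here is a from-scratch logistic regression on an invented toy BAD-flag dataset (feature names and figures are illustrative; in practice you would fit with a library such as scikit-learn or statsmodels):

```python
import math

def sigmoid(z):
    # Clamp to avoid overflow in exp() for extreme scores.
    return 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent for logistic regression.
    Returns coefficients and intercept -- every weight is directly
    inspectable, which is what adverse-action reasoning requires."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

# Hypothetical applicants: [revolving utilization, delinquencies in 12m] -> BAD
X = [[0.10, 0], [0.20, 0], [0.30, 0], [0.70, 2], [0.80, 2], [0.90, 3]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
p_high_risk = sigmoid(w[0] * 0.85 + w[1] * 3 + b)
p_low_risk = sigmoid(w[0] * 0.15 + w[1] * 0 + b)
```

Every coefficient maps to a named attribute, so the score decomposes into per-feature contributions — exactly the property a black-box LLM cannot offer a regulator.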
Why LLMs lose here: credit risk modeling is a future-prediction problem. LLMs are prone to overfitting on structured tabular data and lack the statistical rigor these methods provide.
Regulatory constraint: Adverse action reasoning, full explainability, and repeatability are required by compliance. Complexity is an enemy, not an asset.
Decisioning rules use simple decision trees (CART or CHAID) — still inherently an interactive process between human judgment and algorithm, with visual inspection and expert overrides. (Source: jie-2026-05-16 — Capital One’s decisioning team faces interpretability requirements under fair lending laws.)
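As a sketch of why these rules stay reviewable, here is the single-split search at the heart of CART — exhaustive weighted-Gini minimization over feature/threshold pairs — on invented applicant data (a production team would use a library tree and a visual export for expert inspection and overrides):

```python
def gini(labels):
    """Gini impurity of a binary label set: 2 * p * (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(X, y):
    """The core CART step: exhaustively search every feature/threshold
    pair and keep the one minimizing weighted Gini impurity. CART
    applies this recursively; CHAID uses chi-square tests instead."""
    best_j, best_t, best_score = None, None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_j, best_t, best_score = j, t, score
    return best_j, best_t, best_score

# Hypothetical applicants: [FICO, revolving utilization] -> BAD flag.
X = [[560, 0.9], [580, 0.2], [700, 0.8], [720, 0.3]]
y = [1, 1, 0, 0]
j, t, score = best_split(X, y)
print(f"rule: feature {j} <= {t}  (weighted Gini {score:.2f})")
```

The output is a single human-readable rule, which is why a reviewer can visually inspect the tree and override a threshold that conflicts with domain judgment.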
## Where AI DOES Help: Design and Strategy
LLMs function as expert thought partners in the design phase. Subbu’s example:
Three focused questions to ChatGPT about a Student Loan Credit Risk Score produced a fully articulated data procurement strategy and model development framework — output that would normally take hours of internal discussion.
The LLM didn’t build the model. It synthesized domain knowledge into an actionable blueprint.
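A sketch of what “three focused questions” can look like in code, assuming the OpenAI Python SDK. The questions, system prompt, and model name below are illustrative stand-ins, not the ones from the source:

```python
# Illustrative design-phase questions (not the exact ones Subbu asked).
QUESTIONS = [
    "What data sources would you procure to build a Student Loan Credit Risk Score?",
    "How would you define the target variable and performance window?",
    "Outline a model development framework, from segmentation through validation.",
]

def design_messages(questions):
    """Package focused design questions as a chat payload. The system
    prompt frames the LLM as a domain expert -- a reasoning partner --
    not as a model-fitting engine."""
    messages = [{"role": "system",
                 "content": "You are a senior credit risk model developer."}]
    messages += [{"role": "user", "content": q} for q in questions]
    return messages

# Calling the model (requires an API key; shown for shape only):
# from openai import OpenAI
# blueprint = OpenAI().chat.completions.create(
#     model="gpt-4o", messages=design_messages(QUESTIONS))
```

The value is in the framing: narrow, expert-level questions yield a blueprint, while a vague “build me a credit model” prompt does not.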
Contrast with foundation models: Revolut’s PRAGMA achieved +130% PR-AUC uplift in credit scoring vs. production ML models — but this was a behavioral foundation model trained on 24B banking events, not a general-purpose LLM. The distinction matters. See pragma-revolut.
## Where AI DOES Help: Execution Agents
Formula:
Reasoning Machine + Internal Best Practice Document + Modeling Platform + Agent = True Improvement in Total Factor Productivity
Sequencing:
- LLM as reasoning machine for design and data procurement strategy
- Capture institutional knowledge in an internal document (conversational format works)
- Agent executes: EDA → Feature Engineering → Target Variable → Segmentation → Modeling → Pitch Deck → SR 11-7 documentation
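The sequencing above can be sketched as a staged pipeline in which every agent call receives the best-practice document plus all prior artifacts as grounding context. Stage names follow the source; the orchestration skeleton itself is a hypothetical minimal sketch, not an actual platform API:

```python
STAGES = ["EDA", "Feature Engineering", "Target Variable", "Segmentation",
          "Modeling", "Pitch Deck", "SR 11-7 Documentation"]

def run_pipeline(run_stage, best_practices):
    """Run each stage in order. Threading prior artifacts into every
    call is what grounds the agent in platform outputs and shrinks its
    hallucination surface area."""
    artifacts = {"best_practices": best_practices}
    for stage in STAGES:
        artifacts[stage] = run_stage(stage, dict(artifacts))
    return artifacts

# Stub executor standing in for an agent bound to a modeling platform.
result = run_pipeline(
    run_stage=lambda stage, ctx: f"{stage} done with {len(ctx)} inputs",
    best_practices="internal conversational best-practice doc",
)
```

The design choice worth noting: later stages see strictly more context, so the SR 11-7 documentation step is generated from actual pipeline artifacts rather than from free-text recall.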
Result: ~1 hour end-to-end, with agents grounded in a modeling platform (e.g., Model Dragon) to eliminate hallucination surface area.
## Reject Inferencing
Whether to perform reject inference — and which methodology — will always remain a judgment-driven decision (influenced by budget and operational constraints). AI-enabled workflows support that judgment by providing transparent reasoning and rapidly generating results across methodological choices.
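One common methodology — fuzzy augmentation — illustrates why the choice stays judgment-driven: the analyst decides whether inferred outcomes deserve these weights at all. A minimal sketch; the PD model below is a stand-in for a scorecard trained on accepted applicants only, and the field names are invented:

```python
def fuzzy_augment(rejects, pd_model):
    """Fuzzy augmentation: each rejected applicant enters the retraining
    set twice -- once as bad, once as good -- weighted by the
    accepted-only model's estimated probability of default (PD)."""
    augmented = []
    for x in rejects:
        p_bad = pd_model(x)
        augmented.append((x, 1, p_bad))        # inferred bad, weight = PD
        augmented.append((x, 0, 1.0 - p_bad))  # inferred good, weight = 1 - PD
    return augmented

# Stand-in PD model: score rejects by utilization alone (illustrative).
rejects = [{"utilization": 0.95}, {"utilization": 0.40}]
rows = fuzzy_augment(rejects, pd_model=lambda x: x["utilization"])
```

An AI-enabled workflow can generate and compare this against alternatives (parcelling, simple augmentation, ignoring rejects) in minutes, but picking among them remains the analyst's call.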
## Key Tension: Explainability vs. Performance
The biggest constraint in production credit AI is regulatory: interpretability is required for any model influencing a credit decision. This creates a ceiling on how complex models can be — and means foundation models face adoption hurdles in credit decisioning even if they perform better. Fair lending laws such as the Equal Credit Opportunity Act (ECOA) apply.
Revolut’s PRAGMA result (+130% in credit scoring) is notable precisely because it cuts through this. Whether regulators will accept transformer-based credit models at scale is an open question. Subbu believes a shared responsibility framework for AI (analogous to cloud) is needed — and expects V1 to emerge at industry forums.