AI in Credit Risk

Summary: AI plays very different roles across the credit risk workflow — LLMs don’t improve default prediction on tabular data, but they excel as reasoning partners for model design, and agents can compress end-to-end model development to hours.

Sources: raw/articles/subbu-venkataramanan-2026-05-12.md, raw/articles/simon-taylor-2026-04-26.md, raw/call-notes/shrikant-2026-05-11.md, raw/call-notes/carlos-2026-05-10.md, raw/call-notes/jie-2026-05-16.md

Last updated: 2026-05-17


The Three-Part Answer

Per Subbu Venkataramanan (DiscreteKernel Systems):

| Question | Answer | Why |
|----------|--------|-----|
| Can AI replace credit scoring models? | No | Predicting future events on tabular data: LLMs overfit; classical ML wins |
| Can AI help design credit models? | Yes | LLMs as expert reasoning partners synthesize domain knowledge instantly |
| Can AI agents execute model development? | Yes | End-to-end workflow compression to ~1 hour with grounded agents |

Where AI Does NOT Help: Default Prediction on Tabular Data

A retail credit risk dataset is highly structured: bureau attributes, application fields, alternative data (e.g., LexisNexis), and month-by-month payment behavior used to construct a binary “BAD” target flag.

Best-performing techniques for this task remain:

  • Logistic Regression (parametric, fully explainable)
  • GAMs — Generalized Additive Models (semi-parametric)
  • Random Forests
  • Gradient Boosting — XGBoost, LightGBM

Why LLMs lose here: credit risk modeling is a future-prediction problem. LLMs are prone to overfitting on structured tabular data and lack the statistical rigor these methods provide.
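A minimal sketch of that classical baseline stack on a synthetic tabular dataset (the data, feature counts, and class balance here are illustrative stand-ins, not the dataset described above):

```python
# Hedged sketch: classical baselines on a synthetic tabular "BAD"-flag dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for bureau attributes, application fields, and
# payment-behavior features, with a ~10% BAD rate.
X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

The logistic regression pipeline stays fully explainable (standardized coefficients map directly to adverse action reasoning), while gradient boosting trades some of that transparency for lift.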

Regulatory constraint: Adverse action reasoning, full explainability, and repeatability are required by compliance. Complexity is an enemy, not an asset.

Decisioning rules use simple decision trees (CART or CHAID). The process remains inherently interactive between human judgment and algorithm, with visual inspection and expert overrides. (Source: jie-2026-05-16: Capital One’s decisioning team faces interpretability requirements under fair-lending law.)
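The interactive inspection loop can be sketched with a shallow CART-style tree; the feature names, depth cap, and data below are hypothetical illustrations:

```python
# Hedged sketch: a shallow CART-style tree for decisioning rules, kept small
# enough for visual inspection and expert override.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           random_state=1)
# Hypothetical attribute names for readability of the printed rules.
feature_names = ["bureau_score", "utilization", "inquiries_6m",
                 "months_on_book", "dti"]

tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=100, random_state=1)
tree.fit(X, y)

# Every split is human-readable; an analyst can inspect and override each rule.
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

Capping depth and leaf size keeps every rule auditable, which is what the compliance requirements above demand of decisioning logic.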


Where AI DOES Help: Design and Strategy

LLMs function as expert thought partners in the design phase. Subbu’s example:

Three focused questions to ChatGPT about a Student Loan Credit Risk Score produced a fully articulated data procurement strategy and model development framework — output that would normally take hours of internal discussion.

The LLM didn’t build the model. It synthesized domain knowledge into an actionable blueprint.

Contrast with foundation models: Revolut’s PRAGMA achieved +130% PR-AUC uplift in credit scoring vs. production ML models — but this was a behavioral foundation model trained on 24B banking events, not a general-purpose LLM. The distinction matters. See pragma-revolut.
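For reference, a PR-AUC uplift like the one reported is computed as a relative change in average precision; the labels and scores below are synthetic, not PRAGMA's:

```python
# Hedged sketch: how a PR-AUC uplift percentage is computed. Synthetic data only.
from sklearn.metrics import average_precision_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # rare "BAD" outcomes
baseline = [0.2, 0.1, 0.4, 0.3, 0.5, 0.2, 0.6, 0.1, 0.4, 0.3]   # incumbent model
candidate = [0.1, 0.1, 0.2, 0.2, 0.3, 0.2, 0.3, 0.1, 0.8, 0.7]  # challenger

ap_base = average_precision_score(y_true, baseline)
ap_cand = average_precision_score(y_true, candidate)
uplift_pct = 100 * (ap_cand - ap_base) / ap_base
print(f"PR-AUC: {ap_base:.3f} -> {ap_cand:.3f} ({uplift_pct:+.0f}%)")
```

PR-AUC is the right lens for imbalanced default prediction, where ROC-AUC can look flattering even when precision on the rare BAD class is poor.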


Where AI DOES Help: Execution Agents

Formula:

Reasoning Machine + Internal Best Practice Document + Modeling Platform + Agent = True Improvement in Total Factor Productivity

Sequencing:

  1. LLM as reasoning machine for design and data procurement strategy
  2. Capture institutional knowledge in an internal document (conversational format works)
  3. Agent executes: EDA → Feature Engineering → Target Variable → Segmentation → Modeling → Pitch Deck → SR 11-7 documentation
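The sequencing above can be sketched as a plain orchestration loop. The stage names come from the source; the agent, its grounding platform, and every function body here are stubs, not a real implementation:

```python
# Hedged sketch: the agent's execution sequence as a sequential pipeline.
# A real agent would call the modeling platform (e.g. Model Dragon) at each
# stage, constrained to the internal best-practice document.
def run_stage(name: str, grounded_context: dict) -> dict:
    # Placeholder for a grounded agent call; returns a stub artifact.
    return {"stage": name, "artifact": f"{name} output (stub)"}

PIPELINE = [
    "EDA",
    "Feature Engineering",
    "Target Variable",
    "Segmentation",
    "Modeling",
    "Pitch Deck",
    "SR 11-7 Documentation",
]

def execute_pipeline(best_practice_doc: str) -> list:
    context = {"best_practices": best_practice_doc}
    artifacts = []
    for stage in PIPELINE:
        result = run_stage(stage, context)
        context[stage] = result["artifact"]  # each stage grounds the next
        artifacts.append(result)
    return artifacts

artifacts = execute_pipeline("internal modeling best-practice document")
print([a["stage"] for a in artifacts])
```

The design point is that each stage's output joins the shared context, so later stages (e.g., SR 11-7 documentation) are grounded in earlier artifacts rather than free-form generation.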

Result: ~1 hour end-to-end, with agents grounded in a modeling platform (e.g., Model Dragon) to eliminate hallucination surface area.


Reject Inference

Whether to perform reject inference, and with which methodology, remains a judgment-driven decision shaped by budget and operational constraints. AI-enabled workflows support that judgment by providing transparent reasoning and rapidly generating results across methodological choices.
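One such methodological choice, a simple inferred-label augmentation, can be sketched as follows; the data, cutoff, and reject-population shift are illustrative assumptions:

```python
# Hedged sketch of one reject-inference methodology: score rejected applicants
# with the accepts-only model, assign inferred BAD labels above a cutoff, and
# refit on the combined population. Cutoff choice stays with the analyst.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_accept = rng.normal(size=(1000, 4))
y_accept = (X_accept[:, 0] + rng.normal(scale=0.5, size=1000) > 0.8).astype(int)
X_reject = rng.normal(loc=0.5, size=(300, 4))  # assume rejects skew riskier

accepts_model = LogisticRegression().fit(X_accept, y_accept)
p_bad = accepts_model.predict_proba(X_reject)[:, 1]
y_inferred = (p_bad > 0.5).astype(int)  # judgment call: the cutoff is a choice

X_all = np.vstack([X_accept, X_reject])
y_all = np.concatenate([y_accept, y_inferred])
final_model = LogisticRegression().fit(X_all, y_all)
print(f"combined population: {len(y_all)} (1000 accepts + 300 rejects)")
```

An AI-enabled workflow can generate variants of this (different cutoffs, parceling, or fuzzy augmentation) quickly, which is what makes the human methodology decision cheaper to explore.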


Key Tension: Explainability vs. Performance

The biggest constraint in production credit AI is regulatory: interpretability is required for any model influencing a credit decision. This creates a ceiling on how complex models can be, and means foundation models face adoption hurdles in credit decisioning even if they perform better. Fair-lending law and similar regulations apply.

Revolut’s PRAGMA result (+130% PR-AUC in credit scoring) is notable precisely because it cuts through this ceiling. Whether regulators will accept transformer-based credit models at scale is an open question. Subbu believes a shared-responsibility framework for AI (analogous to the cloud shared-responsibility model) is needed, and expects a V1 to emerge at industry forums.