In Lending as in Baseball, Moneyball Is No Longer Enough

by Jim Bander 5 min read October 26, 2018

In 2011, data scientists and credit risk managers finally found an appropriate analogy to explain what we do for a living. “You know Moneyball? What Paul DePodesta and Billy Beane did for the Oakland A’s, I do for XYZ Bank.” You probably remember the story: Oakland had to squeeze the most value out of its limited budget for hiring free agents, so it used analytics — the new baseball “sabermetrics” created by Bill James — to make data-driven decisions that were counterintuitive to the experienced scouts. Michael Lewis told the story in a book that was an incredible bestseller and led to a hit movie. The year after the movie was made, Harvard Business Review declared that data science was “the sexiest job of the 21st century.”

Coincidence?

The importance of data

Moneyball emphasized the recognition, through sabermetrics, that certain players’ abilities had been undervalued. In his bestseller Big Data Baseball: Math, Miracles, and the End of a 20-Year Losing Streak, Travis Sawchik notes that the analysis would not have been possible without the data. Early visionaries, including John Dewan, began collecting baseball data at games all over the country in a volunteer program called Project Scoresheet. Eventually they were collecting a million data points per season. In a similar fashion, credit data pioneers, such as TRW’s Simon Ramo, began systematically compiling basic credit information into credit files in the 1960s.

Recognizing that data quality is the key to insights and decision-making, and responding to the demand for objective data, Dewan formed two companies: Sports Team Analysis and Tracking Systems (STATS) and Baseball Info Solutions (BIS). It seems quaint now, but those companies collected and cleaned data using a small army of video scouts with stopwatches. Now data is collected in real time by systems such as PITCHf/x and the radar tracking system Statcast, providing insights that were never possible before. It’s hard to find a news article about Game 1 of this year’s World Series that doesn’t discuss the launch angle or exit velocity of Eduardo Núñez’s home run, but just a couple of years ago, neither statistic was even measured. Teams use proprietary biometric data to keep players healthy for games. Even neurological monitoring promises to provide new insights and may lead to changes in the game.

Similarly, lenders are finding that so-called “nontraditional data” can open up credit to consumers who might have been unable to borrow money in the past. This includes nontraditional Fair Credit Reporting Act (FCRA)–compliant data on recurring payments such as rent and utilities, checking and savings transactions, and payments on alternative credit products such as payday and short-term loans. Newer fintech lenders are innovating constantly, using permissioned, behavioral and social data to make it easier for their customers to open accounts and borrow money. Some modern banks, likewise, use techniques that go far beyond passwords and even multifactor authentication to verify their customers’ identities online. For example, identifying consumers through their mobile devices can greatly improve the user experience. Some lenders are even using behavioral biometrics to improve their online and mobile customer service practices.

Continuously improving analytics

Bill James and his colleagues developed a statistic called wins above replacement (WAR) that summarized the value of a player as a single number. WAR was never intended to be a perfect summary of a player’s value, but it’s very convenient to have a single number to rank players.

Using the same mindset, early credit risk managers developed credit scores that summarized applicants’ risk based on their credit history at a single point in time. Just as WAR is only one measure of a player’s abilities, good credit managers understand that a traditional credit score is an imperfect summary of a borrower’s credit history. Newer scores, such as VantageScore® credit scores, are based on a broader view of applicants’ credit history, including credit attributes that reflect how their financial situation has changed over time. More sophisticated financial institutions, though, don’t rely on a single score. They use a variety of data attributes and scores in their lending strategies.
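
As a rough illustration of a strategy that looks beyond a single score, the following Python sketch combines a point-in-time score with two trended attributes. Every name and threshold here (the Applicant fields, the 660 cutoff, the 580 floor, the 25% balance-growth flag) is hypothetical and chosen only for illustration; it does not describe any lender's actual strategy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Applicant:
    # All fields are hypothetical stand-ins for real credit attributes.
    credit_score: int                        # point-in-time score
    balance_change_12m: float                # +0.30 means balances grew 30% in 12 months
    months_since_delinquency: Optional[int]  # None if no delinquency on file


def decide(applicant: Applicant,
           score_cutoff: int = 660,
           score_floor: int = 580) -> str:
    """Return 'approve', 'review', or 'decline' from a score plus trended attributes."""
    # Below the floor, no attribute can rescue the application.
    if applicant.credit_score < score_floor:
        return "decline"

    recent_delinquency = (applicant.months_since_delinquency is not None
                          and applicant.months_since_delinquency < 12)
    rising_balances = applicant.balance_change_12m > 0.25
    paying_down = applicant.balance_change_12m <= -0.10

    if applicant.credit_score >= score_cutoff:
        # A good score with a worrying trend goes to manual review.
        if recent_delinquency or rising_balances:
            return "review"
        return "approve"

    # A near-miss score with an improving trend earns a second look.
    if paying_down and not recent_delinquency:
        return "review"
    return "decline"


print(decide(Applicant(700, 0.05, None)))
print(decide(Applicant(700, 0.40, None)))
print(decide(Applicant(640, -0.20, None)))
```

The three calls print approve, review and review: the second applicant has a good score but fast-growing balances, and the third misses the cutoff but is steadily paying down debt. A score-only rule would have approved the second and declined the third.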

Just a few years ago, simply using data to choose players was a novel idea. Now new measures such as defense-independent pitching statistics drive changes on the field.

Sabermetrics, once defined as the application of statistical analysis to evaluate and compare the performance of individual players, has evolved to be much more comprehensive. It now encompasses the statistical study of nearly all in-game baseball activities.

A wide variety of data-driven decisions

Teams began using sabermetrics to recruit players in the 1980s. Today it’s used on the field as well as in the back office. Big Data Baseball gives the example of the “Ted Williams shift,” a defensive technique that was seldom used between 1950 and 2010. In the world after Moneyball, it has become ubiquitous. Likewise, pitchers alter their arm positions and velocity based on data, not only to throw more strikes but also to prevent injuries.

Similarly, when credit scores were first introduced, they were used only in originations. Lenders established a credit score cutoff that was appropriate for their risk appetite and used it for approving and declining applications. Now lenders are using Experian’s advanced analytics in a variety of ways that the credit scoring pioneers might never have imagined:

  • Improving the account opening experience — for example, by reducing friction online
  • Detecting identity theft and synthetic identities
  • Anticipating bust-out activity and other first-party fraud
  • Issuing the right offer to each prescreened customer
  • Optimizing interest rates
  • Reviewing and adjusting credit lines
  • Optimizing collections
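
The single score cutoff used in early originations, described before the list above, can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical history of (score, went_bad) pairs and a risk appetite expressed as the maximum tolerated bad rate among approved accounts. The function names and sample data are invented for this example.

```python
def choose_cutoff(history, max_bad_rate):
    """Return the lowest score cutoff whose approved population stays within
    the tolerated bad rate, or None if no cutoff qualifies.

    history: list of (score, went_bad) pairs from past accounts (hypothetical).
    max_bad_rate: risk appetite, e.g. 0.20 for at most 20% bad accounts.
    """
    # Try each observed score as a candidate cutoff, lowest first, so the
    # first one that qualifies approves as many applicants as possible.
    for cutoff in sorted({score for score, _ in history}):
        outcomes = [went_bad for score, went_bad in history if score >= cutoff]
        bad_rate = sum(outcomes) / len(outcomes)
        if bad_rate <= max_bad_rate:
            return cutoff
    return None


def decide(score, cutoff):
    """Approve at or above the cutoff, decline below it."""
    return "approve" if score >= cutoff else "decline"


# Invented sample data: lower scores go bad more often.
history = [
    (580, True), (600, True), (620, False), (640, True),
    (660, False), (680, False), (700, False), (720, False),
]

cutoff = choose_cutoff(history, max_bad_rate=0.20)
print(cutoff, decide(650, cutoff), decide(610, cutoff))
```

With this sample data and a 20% tolerance the cutoff is 620, so a 650 applicant is approved and a 610 applicant is declined. Tightening the tolerance to 10% raises the cutoff to 660, which shows how the cutoff follows from the lender's risk appetite.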

Analytics is no substitute for wisdom

Data scientists like those at Experian remind me that in banking, as in baseball, predictive analytics is never perfect. What keeps finance so interesting is the inherent unpredictability of the economy and human behavior. Likewise, the play on the field determines who wins each ball game: anything can happen. Rob Neyer’s book Power Ball: Anatomy of a Modern Baseball Game quotes the Houston Astros director of decision sciences: “Sometimes it’s just about reminding yourself that you’re not so smart.”
