Technology has dramatically transformed the financial services landscape, fostering innovation and enhancing operational efficiency. In an interview at this year’s Money20/20 conference, Scott Brown, Group President of Financial and Marketing Services for Experian, sat down with Fintech Futures’ North America Correspondent Heather Sugg to share how Experian is leveraging data, analytics, and artificial intelligence (AI) to modernize the financial services industry. During the discussion, Scott highlighted the recent launch of Experian Assistant — our newest generative AI tool designed to accelerate the modeling lifecycle, resulting in greater productivity, improved data visibility, and reduced delays and expenses. While Experian Assistant is a business-to-business solution built alongside our clients, Scott also noted its broader impact — helping increase credit access for underserved consumers. “At Experian, we’re really focused on addressing the underserved community who doesn’t have access to credit,” said Scott. “And we think that this tool helps lenders reach those customers in an easier way.” Learn more about Experian Assistant and watch our tech showcase to see the solution in action.
Developing machine learning (ML) credit risk models can be more challenging than traditional credit risk modeling approaches. But once deployed, ML models can increase automation and expand a lender’s credit universe. For example, by using ML-driven credit risk models and combining traditional credit data with transactional bank data, a type of alternative credit data*, some lenders see a Gini uplift of 60 to 70 percent compared to a traditional credit risk model.1 New approaches to model operations are also helping lenders accelerate their machine learning model development processes and go from collecting data to deploying a new model in days instead of months.

READ MORE: Getting AI-driven decisioning right in financial services

What is machine learning model development?

Machine learning model development is what happens before the model gets deployed. It's often broken down into several steps.

Define the problem: If you’re building an ML credit risk model, the problem you may be trying to solve is anticipating defaults, improving affordability for borrowers or expanding your lending universe by scoring more thin-file and previously unscorable consumers.

Gather, clean and stage data: Identify helpful data sources, such as internal, credit bureau and alternative credit data. The data will then need to be consolidated, structured, labeled and categorized. Machine learning can be useful here as well, as ML models can be trained to label and categorize raw data.

Feature engineering: The data is then analyzed to identify the individual variables and clusters of variables that may offer the most lift. Features that may directly or unintentionally create bias should be removed or limited.

Create the model: Deciding which algorithms and techniques to use when developing a model can be part art and part science.
Because lenders need to be able to explain the decisions they make to consumers and regulators, many lenders build model explainability into new ML-driven credit risk models.

Validate and deploy: New models are validated and rigorously tested, often as challengers to the existing champion model. If the new model can consistently outperform, it may move on to production. The work doesn’t stop once a model is live — it needs to be continuously monitored for drift, and potentially recalibrated or replaced with a new model. About 10 percent of lenders use tools that automatically alert them when their models start to drift, while around half make a point of checking deployed models for drift every month or quarter.3

READ MORE: Journey of an ML Model

What is model deployment?

Model deployment is one of the final steps in the model lifecycle — it’s when you move the model from development and validation into live production. New models can be deployed in various ways, including via API integration and cloud service deployment using public, private or hybrid architecture. However, integrating a new model with existing systems can be challenging. About a third (33 percent) of consumer lending organizations surveyed in 2023 said model deployment-related activities took them one to two months; slightly fewer (29 percent) said three to six months. Overall, the entire development-to-deployment process often takes up to 15 months — and 55 percent of lenders report building models that never get deployed.2

READ MORE: Accelerating the Model Development and Deployment Lifecycle

Benefits of deploying machine learning credit risk models

Developing, deploying, monitoring and recalibrating ML models can be difficult and costly. But financial institutions have a lot to gain from embracing the future of underwriting.
Improve credit risk assessment: ML-driven models can incorporate more data sources and more precisely assess credit risk to help lenders price credit offers and decrease charge-offs.

Expand automation: More precise scoring can also increase automation by reducing how many applications need to go to manual review.

Increase financial inclusion: ML models may be able to evaluate consumers who don’t have recent credit information or thick enough credit files to be scorable by traditional models.

In short, ML models can help lenders make better loan offers to more people while taking on less risk and using fewer internal resources to review applications.

CASE STUDY: Atlas Credit, a small-dollar lender, partnered with Experian® to develop a fully explainable machine learning credit risk model that incorporated internal data, trended data, alternative financial services data and Experian’s attributes. Atlas Credit can use the new model to make instant decisions and is expected to double its approvals while decreasing losses by up to 20 percent.

How we can help

Experian offers many machine learning solutions for different industries and use cases via the Experian Ascend Technology Platform™. For example, with Ascend ML Builder™, lenders can access an on-demand development environment that can increase model velocity — the time it takes to complete a new model’s lifecycle. You can configure Ascend ML Builder based on the compute you allocate and your use cases, and the included code templates (called Accelerators) can help with data wrangling, analysis and modeling. There’s also Ascend Ops™, a cloud-based model operations solution. You can use Ascend Ops to register, test and deploy custom features and models. Automated model monitoring and management can also help you track feature and model data drift and model performance to improve models in production.
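The data-drift tracking mentioned above is often implemented with a population stability index (PSI) check on the score distribution. The sketch below is a generic illustration in Python, not Experian's implementation; the bin count, the 0.25 alert threshold, and the simulated score distributions are all illustrative assumptions:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population stability index between a baseline score
    distribution (expected) and a recent one (actual)."""
    # Bin edges taken from the baseline distribution's quantiles
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    e_pct = np.bincount(np.searchsorted(edges, expected), minlength=bins) / len(expected)
    a_pct = np.bincount(np.searchsorted(edges, actual), minlength=bins) / len(actual)
    # Guard against empty bins before taking logs
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(680, 60, 10_000)  # scores at deployment time (simulated)
recent = rng.normal(665, 60, 10_000)    # scores observed this month (simulated)
drift = psi(baseline, recent)
# A PSI above roughly 0.25 is commonly treated as significant drift
print(f"PSI = {drift:.3f}", "ALERT" if drift > 0.25 else "ok")
```

A monitoring job could run a check like this monthly or quarterly per the cadence described above, alerting automatically when the threshold is crossed.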
Learn more about our machine learning and model deployment solutions.

*When we refer to “Alternative Credit Data,” this refers to the use of alternative data and its appropriate use in consumer credit lending decisions, as regulated by the Fair Credit Reporting Act. Hence, the term “Expanded FCRA Data” may also apply and can be used interchangeably.

1. Experian (2023). Raising the AI Bar
2. Experian (2023). Accelerating Model Velocity in Financial Institutions
3. Ibid.
Today's lenders use expanded data sources and advanced analytics to predict credit risk more accurately and optimize their lending and operations. The result may be a win-win for lenders and customers.

What is credit risk?

Credit risk is the possibility that a borrower will not repay a debt as agreed. Credit risk management encompasses the policies, tools and systems that lenders use to understand this risk. These can be important throughout the customer lifecycle, from marketing and sending preapproved offers to underwriting and portfolio management. Poor risk management can lead to unnecessary losses and missed opportunities, especially because risk departments need to manage risk with their organization's budgetary, technical and regulatory constraints in mind.

How is it assessed?

Credit risk is often assessed with credit risk analytics — statistical modeling that predicts the risk involved with credit lending. Lenders may create and use credit risk models to help drive decisions. Additionally (or alternatively), they rely on generic or custom credit risk scores:

Generic scores: Analytics companies create predictive models that rank order consumers based on the likelihood that a person will fall 90 or more days past due on any credit obligation in the next 24 months. Lenders can purchase these risk scores to help them evaluate risk.

Custom scores: Custom credit risk modeling solutions help organizations tailor risk scores for particular products, markets, and customers. Custom scores can incorporate generic risk scores, traditional credit data, alternative credit data* (or expanded FCRA-regulated data), and a lender's proprietary data to increase their effectiveness.

About 41 percent of consumer lending organizations use a model-first approach, and 55 percent use a score-first approach to credit decisioning.1 However, these aren't entirely exclusive groupings.
For example, a credit score may be an input in a lender's credit risk model — almost every lender (99 percent) that uses credit risk models for decisioning also uses credit scores.2 Similarly, lenders that primarily rely on credit scores may also have business policies that affect their decisions.

What are the current challenges?

Risk departments and teams are facing several overarching challenges today:

Staying flexible: Volatile market conditions and changing consumer preferences can lead to unexpected shifts in risk. Organizations need to actively monitor customer accounts and larger economic trends to understand when, if, and how they should adjust their risk policies.

Digesting an overwhelming amount of data: More data can be beneficial, but only if it offers real insights and the organization has the resources to understand and use it efficiently. Artificial intelligence (AI) and machine learning (ML) are often important for turning raw data into actionable insights.

Retaining IT talent: Many organizations are trying to figure out how to use vast amounts of data and AI/ML effectively. However, 82 percent of lenders have trouble hiring and retaining data scientists and analysts.3

Separating fraud and credit losses: Understanding a portfolio's credit losses can be important for improving credit risk models and performance. But some organizations struggle to properly distinguish between the two, particularly when synthetic identity fraud is involved.

Best practices for credit risk management

Leading financial institutions have moved on from legacy systems and outdated risk models or scores. And they're looking at the current challenges as an opportunity to pull away from the competition. Here's how they're doing it:

Using additional data to gain a holistic picture: Lenders have an opportunity to access more data sources, including credit data from alternative financial services and consumer-permissioned data.
When combined with traditional credit data, credit scores, and internal data, the outcome can be a more complete picture of a consumer's credit risk.

Implementing AI/ML-driven models: Lenders can leverage AI/ML to analyze large amounts of data to improve organizational efficiency and credit risk assessments. About 16 percent of consumer lending organizations expect to solely use ML algorithms for credit decisioning, while two-thirds expect to use both traditional and ML models going forward.4

Increasing model velocity: On average, it takes about 15 months to go from model development to deployment. But some organizations can do it in less than six.5 Increasing model velocity can help organizations quickly respond to changing consumer and economic conditions. Even if rapid model creation and deployment isn't an option, monitoring model health and recalibrating for drift is important. Nearly half (49 percent) of lenders check for model drift monthly or quarterly, and one in ten gets automated alerts when their models start to drift.6

WATCH: Accelerating Model Velocity in Financial Institutions

Improving automation and customer experience

Lenders are using AI to automate their application, underwriting, and approval processes. Often, automation and ML-driven risk models go hand-in-hand. Lenders can use the models to measure the credit risk of consumers who don't qualify for traditional credit scores and automation to expedite the review process, leading to an improved customer experience.

Learn more by exploring Experian's credit risk solutions.

* When we refer to “Alternative Credit Data," this refers to the use of alternative data and its appropriate use in consumer credit lending decisions as regulated by the Fair Credit Reporting Act (FCRA). Hence, the term “Expanded FCRA Data" may also apply in this instance and both can be used interchangeably.

1-6. Experian (2023). Accelerating Model Velocity in Financial Institutions
Data-driven machine learning model development is a critical strategy for financial institutions to stay ahead of their competition, and according to IDC, remains a strategic priority for technology buyers. Improved operational efficiency, increased innovation, enhanced customer experiences and employee productivity are among the primary business objectives for organizations that choose to invest in artificial intelligence (AI) and machine learning (ML), according to IDC’s 2022 CEO survey. While models have been around for some time, the number of models and the scale at which they are used have grown rapidly in recent years. Models are also now appearing in more regulated aspects of the business, which demand increased scrutiny and transparency. Implementing an effective model development process is key to achieving business goals and complying with regulatory requirements. While ModelOps, the governance and life cycle management of a wide range of operationalized AI models, is becoming more popular, most organizations are still at relatively low levels of maturity. It's important for key stakeholders to implement best practices and accelerate the model development and deployment lifecycle.

Read the IDC Spotlight

Challenges impeding machine learning model development

Model development involves many processes, from data wrangling and analysis to building a deployment-ready model, all of which need to be executed in a timely manner to ensure proper outcomes. However, it is challenging to manage all these processes in today’s complex environment. Modeling challenges include:

Infrastructure: Necessary factors like storage and compute resources incur significant costs, which can keep organizations from evolving their machine learning capabilities.

Organizational: Implementing machine learning applications requires talent, like data scientists and data and machine learning engineers.
Operational: Piecemeal approaches to ML tools and technologies can be cumbersome, especially on top of data being housed in different places across an organization, which can make pulling everything together challenging.

Opportunities for improvement are many

While there are many places where individuals can focus on improving model development and deployment, there are a few key areas where we see some of the most time-consuming hang-ups.

Data wrangling and preparation

Respondents to IDC's 2022 AI StrategiesView Survey indicated that they spend nearly 22% of their time collecting and preparing data. Pinpointing the right data for the right purpose can be a big challenge. It is important for organizations to understand the entire data universe and effectively link external data sources with their own first-party data. This way, stakeholders can have enough data that they trust to effectively train and build models.

Model building

While many tools have been developed in recent years to accelerate the actual building of models, the volume of models that need to be built can be difficult to manage given the many conflicting priorities for data teams within institutions. Where possible, it is important for organizations to use templates or sophisticated platforms to reduce the time it takes to build a model and to repurpose elements that are already working for other models within the business.

Improving Model Velocity

Experian’s Ascend ML Builder™ is an on-demand advanced model development environment optimized to support a specific project. Features include a dedicated environment, innovative compute optimization, and pre-built code called ‘Accelerators’ that simplify, guide, and speed data wrangling, common analyses and advanced modeling methods, with the ability to add integrated deployment. To learn more about Experian’s Ascend ML Builder, click here.
To read the full Technology Spotlight, download “Accelerating Model Velocity with a Flexible Machine Learning Model Development Environment for Financial Institutions” here.

*This article includes content created by an AI language model and is intended to provide general information.
Changes in your portfolio are a constant. To accelerate growth while proactively identifying risk, you’ll need a well-informed portfolio risk management strategy.

What is portfolio risk management?

Portfolio risk management is the process of identifying, assessing, and mitigating risks within a portfolio. It involves implementing strategies that allow lenders to make more informed decisions, such as whether to offer additional credit products to customers or identify credit problems before they impact their bottom line.

Leveraging the right portfolio risk management solution

Traditional approaches to portfolio risk management may lack a comprehensive view of customers. To effectively mitigate risk and maximize revenue within your portfolio, you’ll need a portfolio risk management tool that uses expanded customer data, advanced analytics, and modeling.

Expanded data. Differentiated data sources include marketing data, traditional credit and trended data, alternative financial services data, and more. With robust consumer data fueling your portfolio risk management solution, you can gain valuable insights into your customers and make smarter decisions.

Advanced analytics. Advanced analytics can analyze large volumes of data to unlock greater insights, resulting in increased predictiveness and operational efficiency.

Model development. Portfolio risk modeling methodologies forecast future customer behavior, enabling you to better predict risk and gain greater precision in your decisions.

Benefits of portfolio risk management

Managing portfolio risk is crucial for any organization. With an advanced portfolio risk management solution, you can:

Minimize losses. By monitoring accounts for negative performance, you can identify risks before they occur, resulting in minimized losses.

Identify growth opportunities. With comprehensive consumer data, you can connect with customers who have untapped potential to drive cross-sell and upsell opportunities.
Enhance collection efforts. For debt portfolios, having the right portfolio risk management tool can help you quickly and accurately evaluate collections recovery.

Maximize your portfolio potential

Experian offers portfolio risk analytics and portfolio risk management tools that can help you mitigate risk and maximize revenue within your portfolio. Get started today.
Financial institutions have long been on the cutting edge of technology trends, and that continues to be true as we look at artificial intelligence and machine learning. Large analytics teams are using models to solve for lending decisions, account management, investments, and more. However, unlike other industries taking advantage of modeling, financial institutions have the added complexity of regulation and transparency requirements to ensure fairness and explainability. That means institutions need highly sophisticated model operations and a highly skilled workforce to ensure that decisions are accurate and accountability is maintained. New research from Experian shows that while financial institutions plan to use or are using models for a wide range of use cases, there is a range of ModelOps maturity across the industry. Just under half of financial institutions are in the early stages of model building, where projects are more ad hoc and experimental. Only a quarter of institutions appear to be more mature, with well-defined processes and models that can be developed in a reliable timeframe. With more than two-thirds of lenders saying that ModelOps will play a key role in shaping the industry over the next five years, the race to maturity is critical. One of the biggest challenges we see in the space is that it takes too long for models to make it into production. Financial institutions estimate that the end-to-end process for creating a new model for credit decisioning takes an average of 15 months. Organizations need to accelerate model velocity, meaning the time it takes to get a model into production and generating value, to take advantage of this powerful technology. Gaps in technology, talent, and timely data continue to drag down operational speed and the tracking of models once they are in production.
For more information on Experian’s recent study, download the new report ‘Accelerating Model Velocity in Financial Institutions’. We are also hosting an upcoming webinar with tips on how to tackle some of the biggest model development and deployment challenges. You can register for the webinar here.
Intuitively we all know that people with higher credit risk scores tend to get more favorable loan terms. Since a higher credit risk score corresponds to a lower chance of delinquency, a lender can grant a higher credit line, a more favorable APR or a mix of those and other loan terms. Some people might wonder if there is a way to quantify the relationship between a credit risk score and the loan terms in a more mathematically rigorous way. For example, what is an appropriate credit limit for a given score band? Early in my career I worked a lot with mathematical optimization. This optimization used a software product called Marketswitch (later purchased by Experian). One caveat of optimization is that in order to choose an optimal decision you must first simulate all possible decisions. Basically, one decision cannot be deemed better than another if the consequences of those decisions are unknown. So how does this relate to credit risk scores? Credit scores are designed to give lenders an overall view of a borrower’s creditworthiness. For example, a generic risk score might be calibrated to perform across personal loans, credit cards, auto loans, real estate, etc. Per lending category, the developer of the credit risk score will provide an “odds chart” — how many good outcomes you can expect per bad outcome. Here is an odds chart for VantageScore® 3 (overall, demi-decile):

Score Range   How Many Goods for 1 Bad
823-850       932.3
815-823       609.0
808-815       487.6
799-808       386.1
789-799       272.5
777-789       228.1
763-777       156.1
750-763       115.6
737-750        85.5
723-737        60.3
709-723        45.1
693-709        33.0
678-693        24.3
662-678        18.3
648-662        14.1
631-648        10.8
608-631         7.9
581-608         5.5
542-581         3.5
300-542         1.5

Per the above chart, there will be 932.3 good accounts for every one “bad” (delinquent) account in the score range of 823-850. Now, it’s a simple calculation to turn that into a bad rate (i.e., what percentage of accounts in this band will go bad).
So, if there are 932.3 good accounts for every one bad account, we have (1 expected bad)/(1 expected bad + 932.3 expected goods) = 1/(1 + 932.3) = 0.1071%. So, in the credit risk band of 823-850 an account has a 0.1071% chance of going bad. It’s very simple to apply the same formula to the other risk bands, as seen in the table below.

Score Range   How Many Goods for 1 Bad   Bad Rate
823-850       932.3                       0.1071%
815-823       609.0                       0.1639%
808-815       487.6                       0.2047%
799-808       386.1                       0.2583%
789-799       272.5                       0.3656%
777-789       228.1                       0.4365%
763-777       156.1                       0.6365%
750-763       115.6                       0.8576%
737-750        85.5                       1.1561%
723-737        60.3                       1.6313%
709-723        45.1                       2.1692%
693-709        33.0                       2.9412%
678-693        24.3                       3.9526%
662-678        18.3                       5.1813%
648-662        14.1                       6.6225%
631-648        10.8                       8.4746%
608-631         7.9                      11.2360%
581-608         5.5                      15.3846%
542-581         3.5                      22.2222%
300-542         1.5                      40.0000%

Now that we have a bad rate per risk score band, we can define dollars at risk per band as: bad rate × loan amount = dollars at risk. For example, if the loan amount in the 823-850 band is set at $10,000, you would have 0.1071% × $10,000 = $10.71 at risk from a probability standpoint. So, to have constant dollars at risk, set credit limits per band so that in all cases there is $10.71 at risk, as indicated below.
Score Range   How Many Goods for 1 Bad   Bad Rate    Loan Amount   $ at Risk
823-850       932.3                       0.1071%    $10,000.00     $10.71
815-823       609.0                       0.1639%    $ 6,535.95     $10.71
808-815       487.6                       0.2047%    $ 5,235.19     $10.71
799-808       386.1                       0.2583%    $ 4,147.65     $10.71
789-799       272.5                       0.3656%    $ 2,930.46     $10.71
777-789       228.1                       0.4365%    $ 2,454.73     $10.71
763-777       156.1                       0.6365%    $ 1,683.27     $10.71
750-763       115.6                       0.8576%    $ 1,249.33     $10.71
737-750        85.5                       1.1561%    $   926.82     $10.71
723-737        60.3                       1.6313%    $   656.81     $10.71
709-723        45.1                       2.1692%    $   493.95     $10.71
693-709        33.0                       2.9412%    $   364.30     $10.71
678-693        24.3                       3.9526%    $   271.08     $10.71
662-678        18.3                       5.1813%    $   206.79     $10.71
648-662        14.1                       6.6225%    $   161.79     $10.71
631-648        10.8                       8.4746%    $   126.43     $10.71
608-631         7.9                      11.2360%    $    95.36     $10.71
581-608         5.5                      15.3846%    $    69.65     $10.71
542-581         3.5                      22.2222%    $    48.22     $10.71
300-542         1.5                      40.0000%    $    26.79     $10.71

In this manner, credit limits are set per band so that dollars at risk are constant across bands. In practice it’s unlikely that a lender will grant $1,683.27 for the 763-777 credit score band, but this exercise illustrates how the numbers are generated. More likely, a lender will use steps of $100 or something similar to make the credit limits seem more logical to borrowers. What I like about this constant-dollars-at-risk approach is that we aren’t really favoring any particular credit score band. Credit limits are simply set in a manner that keeps dollars at risk consistent across bands. One final thought: actual observations of delinquencies (not just those predicted by the score’s odds table) could be gathered and used to generate a new odds table per score band. From there, a new delinquency rate could be generated based on actuals. If this is done, though, the sample must cover a long enough period and be comprehensive enough to include both good and bad observations so that the delinquency calculation is robust, as small changes in observations can affect the final results.
Since the real world does not always meet our expectations, it might also be necessary to “smooth” the odds chart so that it looks appropriate.
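The arithmetic above is easy to script. Here is a short Python sketch; only a few of the article's score bands are reproduced, and the $10,000 top-band limit is the same anchor used in the tables:

```python
# Goods-per-bad odds for a subset of the score bands shown above
odds = {"823-850": 932.3, "815-823": 609.0, "763-777": 156.1, "300-542": 1.5}

# Bad rate per band: 1 / (1 + goods-per-bad), e.g. 1 / 933.3 = 0.1071%
bad_rate = {band: 1 / (1 + g) for band, g in odds.items()}

# Anchor the top band at a $10,000 limit, then hold dollars at risk constant
target_risk = bad_rate["823-850"] * 10_000  # about $10.71
limits = {band: target_risk / r for band, r in bad_rate.items()}

for band in odds:
    print(f"{band}: bad rate {bad_rate[band]:.4%}, limit ${limits[band]:,.2f}")
```

The computed limit for the 763-777 band comes out near the $1,683.27 shown in the table; any small differences trace back to rounding in the published bad rates.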
Your model is only as good as your data, right? Actually, there are many considerations in developing a sound model, one of which is data. Yet if your data is bad or dirty or doesn’t represent the full population, can it be used? This is where sampling can help. When done right, sampling can lower your cost to obtain data needed for model development. When done well, sampling can turn a tainted and underrepresented data set into a sound and viable model development sample. First, define the population to which the model will be applied once it’s finalized and implemented. Determine what data is available and what population segments must be represented within the sampled data. The more variability in internal factors — such as changes in marketing campaigns, risk strategies and product launches — and external factors — such as economic conditions or competitor presence in the marketplace — the larger the sample size needed. A model developer often will need to sample over time to incorporate seasonal fluctuations in the development sample. The most robust samples are pulled from data that best represents the full population to which the model will be applied. It’s important to ensure your data sample includes customers or prospects declined by the prior model and strategy, as well as approved but nonactivated accounts. This ensures full representation of the population to which your model will be applied. Also, consider the number of predictors or independent variables that will be evaluated during model development, and increase your sample size accordingly. When it comes to spotting dirty or unacceptable data, the golden rule is know your data and know your target population. Spend time evaluating your intended population and group profiles across several important business metrics. Don’t underestimate the time needed to complete a thorough evaluation. Next, select the data from the population to aptly represent the population within the sampled data. 
Determine the best sampling methodology that will support the model development and business objectives. Sampling generates a smaller data set for use in model development, allowing the developer to build models more quickly. Reducing the data set’s size decreases the time needed for model computation and saves storage space without losing predictive performance. Once the data is selected, weights are applied so that each record appropriately represents the full population to which the model will be applied. Several traditional techniques can be used to sample data:

Simple random sampling — Each record is chosen by chance, and each record in the population has an equal chance of being selected.

Random sampling with replacement — Each record chosen by chance is returned to the population, so it remains available for subsequent selections.

Random sampling without replacement — Each record chosen by chance is removed from subsequent selections.

Cluster sampling — Records from the population are sampled in groups, such as region, over different time periods.

Stratified random sampling — This technique allows you to sample different segments of the population at different proportions. In some situations, stratified random sampling is helpful in selecting segments of the population that aren’t as prevalent as other segments but are equally vital within the model development sample.

Learn more about how Experian Decision Analytics can help you with your custom model development needs.
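As a concrete illustration, stratified random sampling with post-sampling weights might look like the following Python sketch. The segment names, sampling fractions, and population mix are hypothetical, chosen only to show the mechanics:

```python
import pandas as pd

# Hypothetical applicant population: 9,500 thick-file and 500 thin-file records
population = pd.DataFrame({
    "segment": ["thick"] * 9500 + ["thin"] * 500,
    "score": range(10_000),
})

# Stratified random sampling: take 10% of the common segment but keep
# every record of the rare-but-vital segment
fractions = {"thick": 0.10, "thin": 1.00}
parts = [group.sample(frac=fractions[name], random_state=42)
         for name, group in population.groupby("segment")]
sample = pd.concat(parts)

# Weight each sampled record so the sample still represents the population
sample["weight"] = sample["segment"].map(lambda s: 1 / fractions[s])
print(sample["segment"].value_counts().to_dict())  # {'thick': 950, 'thin': 500}
```

The weights matter: each thick-file record stands in for ten population records, so weighted totals (950 × 10 + 500 × 1 = 10,000) reproduce the full population size.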
As I mentioned in my previous blog, model validation is an essential step in evaluating a recently developed predictive model’s performance before finalizing and proceeding with implementation. An in-time validation sample is created by setting aside a portion of the total model development sample so the predictive accuracy can be measured on a data sample not used to develop the model. However, if few records in the target performance group are available, splitting the total model development sample into development and in-time validation samples will leave too few records in the target group for use during model development. An alternative approach to generating a validation sample is to use a resampling technique. There are many different types and variations of resampling methods. This blog will address a few common techniques.

Jackknife technique — An iterative process whereby an observation is removed from each subsequent sample generation. So if there are N observations in the data, jackknifing calculates the model estimates on N different samples, each having N - 1 observations. The model then is applied to each sample, and an average of the model predictions across all samples is derived to generate an overall measure of model performance and prediction accuracy. The jackknife technique can be broadened to remove a group of observations from each subsequent sample generation while giving each observation in the data set equal opportunity for inclusion and exclusion.

K-fold cross-validation — Generates multiple validation data sets from the holdout sample created for the model validation exercise, i.e., the holdout data is split into K subsets. The model then is applied to the K validation subsets, with each subset held out during the iterative process as the validation set while the model scores the remaining K - 1 subsets.
Again, an average of the predictions across the multiple validation samples is used to create an overall measure of model performance and prediction accuracy.

Bootstrap technique — Generates subsets from the full model development data sample, with replacement, producing multiple samples generally of equal size. Thus, with a total sample size of N, this technique generates random samples of N observations each, such that a single observation can be present in multiple subsets while another observation may not be present in any of the generated subsets. The generated samples are combined into a simulated larger data sample that can then be split into a development and an in-time, or holdout, validation sample.

Before selecting a resampling technique, it's important to check and verify the data assumptions for each technique against the data sample selected for your model development, as some resampling techniques are more sensitive than others to violations of data assumptions.

Learn more about how Experian Decision Analytics can help you with your custom model development.
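A minimal sketch of the bootstrap step, using invented data and a hypothetical 70/30 split of the simulated larger sample:

```python
import random

def bootstrap_samples(data, n_samples, seed=0):
    """Draw bootstrap resamples: each resample has the same size as the
    original data and is drawn with replacement, so one observation may
    appear several times while another may not appear at all."""
    rng = random.Random(seed)
    n = len(data)
    return [[rng.choice(data) for _ in range(n)] for _ in range(n_samples)]

data = list(range(20))
samples = bootstrap_samples(data, n_samples=200)

# Combine the resamples into a simulated larger data sample, then split
# it into a development portion and an in-time (holdout) validation portion
pooled = [x for s in samples for x in s]
cut = int(0.7 * len(pooled))
development, validation = pooled[:cut], pooled[cut:]
```

The pooled sample is much larger than the original 20 observations, which is what makes a development/validation split feasible even when the original target group is small.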
An introduction to the different types of validation samples

Model validation is an essential step in evaluating and verifying a model's performance during development before finalizing the design and proceeding with implementation. More specifically, during a predictive model's development, the objective of a model validation is to measure the model's accuracy in predicting the expected outcome. For a credit risk model, this may be predicting the likelihood of good or bad payment behavior, depending on the predefined outcome.

Two general types of data samples can be used to complete a model validation. The first is known as the in-time, or holdout, validation sample, and the second is known as the out-of-time validation sample. So, what's the difference between an in-time and an out-of-time validation sample?

An in-time validation sample sets aside part of the total sample made available for the model development. Random partitioning of the total sample is completed upfront, generally separating the data into a portion used for development and the remaining portion used for validation. For instance, the data may be randomly split, with 70 percent used for development and the other 30 percent used for validation. Other common data subset schemes include an 80/20, a 60/40 or even a 50/50 partitioning of the data, depending on the quantity of records available within each segment of your performance definition.

Before selecting a data subset scheme to be used for model development, you should evaluate the number of records available in your target performance group, such as the number of bad accounts. If you have too few records in your target performance group, a 50/50 split can leave you with insufficient performance data for use during model development. A separate blog post will present a few common options for creating alternative validation samples through a technique known as resampling.
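The upfront random partitioning described above amounts to a few lines of code. A minimal sketch, assuming a 70/30 split and using a list of account records as stand-in data:

```python
import random

def partition(total_sample, dev_fraction=0.70, seed=7):
    """Randomly partition the total model development sample into a
    development portion and an in-time (holdout) validation portion."""
    rng = random.Random(seed)
    shuffled = total_sample[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]

accounts = list(range(10_000))
development, validation = partition(accounts)  # 7,000 / 3,000 split
```

An 80/20, 60/40 or 50/50 scheme is just a different `dev_fraction`; the key check before choosing one is whether enough target-group records (e.g., bad accounts) remain on each side of the cut.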
Once the data has been partitioned, the model is created using the development sample. The model is then applied to the holdout validation sample to determine the model's predictive accuracy on data that wasn't used to develop the model. The model's predictive strength and accuracy can be measured in various ways by comparing the known, predefined performance outcome to the model's predicted performance outcome.

The out-of-time validation sample contains data from an entirely different time period or customer campaign than what was used for model development. Validating model performance on a different time period is beneficial for further evaluating the model's robustness. Selecting a data sample from a more recent time period with a fully mature set of performance data allows the modeler to evaluate model performance on a data set that may more closely align with the current environment in which the model will be used. In this case, the more recent time period can be used to establish expectations and set baseline parameters for model performance, such as population stability indices and performance monitoring.

Learn more about how Experian Decision Analytics can help you with your custom model development needs.
Marketers are keenly aware of how important it is to “Know thy customer.” Yet customer knowledge isn’t restricted to the marketing-savvy. It’s also essential to credit risk managers and model developers. Identifying and separating customers into distinct groups based on various types of behavior is foundational to building effective custom models. This integral part of custom model development is known as segmentation analysis.

Segmentation is the process of dividing customers or prospects into groupings based on similar behaviors, such as length of time as a customer or payment patterns like credit card revolvers versus transactors. The more similar, or homogeneous, the customer grouping, the less variation across the customer segments is included in each segment’s custom model development.

So how many scorecards are needed to aptly score and mitigate credit risk? There are several general principles we’ve learned over the course of developing hundreds of models that help determine whether multiple scorecards are warranted and, if so, how many.

A robust segmentation analysis contains two components. The first is the generation of potential segments, and the second is the evaluation of those segments. Here I’ll discuss the generation of potential segments within a segmentation scheme. A second blog post will continue with a discussion of the evaluation of those segments.

When generating a customer segmentation scheme, several approaches are worth considering: heuristic, empirical and combined. A heuristic approach considers business learnings obtained through trial and error or experimental design. Portfolio managers will have insight into how segments of their portfolio behave differently that can, and often should, be included within a segmentation analysis. An empirical approach is data-driven and involves the use of quantitative techniques to evaluate potential customer segmentation splits.
During this approach, statistical analysis is performed to identify distinct forms of behavior across the customer population. Different interactive behavior across segments of the overall population will correspond to different predictive patterns for the candidate predictor variables, signifying that separate segment scorecards will be beneficial. Finally, a combined approach considers both the business needs and the data-driven results.

Once the set of potential customer segments has been identified, the next step in a segmentation analysis is the evaluation of those segments. Stay tuned as we look further into this topic.

Learn more about how Experian Decision Analytics can help you with your segmentation or custom model development needs.
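One simple empirical check along these lines is to compare how a predictor relates to the outcome within each candidate segment. The sketch below uses invented data and field names (`type`, `util`, `bad`); a markedly different bad-rate pattern across segments is the kind of signal that suggests separate segment scorecards.

```python
from collections import defaultdict

def bad_rate_by_band(records, segment_key, band_key):
    """Compute the bad rate per (segment, predictor band) cell so the
    predictor's pattern can be compared across candidate segments."""
    stats = defaultdict(lambda: [0, 0])  # (segment, band) -> [bads, total]
    for rec in records:
        cell = stats[(segment_key(rec), band_key(rec))]
        cell[0] += rec["bad"]
        cell[1] += 1
    return {key: bads / total for key, (bads, total) in stats.items()}

# Hypothetical data: high utilization is risky for revolvers
# but carries no signal for transactors
records = [
    {"type": "revolver",   "util": "high", "bad": 1},
    {"type": "revolver",   "util": "high", "bad": 1},
    {"type": "revolver",   "util": "low",  "bad": 0},
    {"type": "transactor", "util": "high", "bad": 0},
    {"type": "transactor", "util": "low",  "bad": 0},
    {"type": "transactor", "util": "low",  "bad": 1},
]
rates = bad_rate_by_band(records, lambda r: r["type"], lambda r: r["util"])
```

In this toy data the utilization band separates bads from goods for revolvers but not for transactors, illustrating the kind of divergent predictive pattern the empirical approach looks for.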