If you’re a credit risk manager or a data scientist responsible for modeling consumer credit risk at a lender, a fintech, a telecommunications company or even a utility company, you’re certainly exploring how machine learning (ML) can make you even more successful with predictive analytics. You know your competition is looking beyond the algorithms that have long been used to predict consumer payment behavior: algorithms with names like regression, decision trees and cluster analysis. Perhaps you’re experimenting with, or even building, a few models with artificial intelligence (AI) algorithms that may be less familiar to your business: neural networks, support vector machines, gradient boosting machines or random forests. One recent survey found that 25 percent of financial services companies are ahead of the industry; they’re already implementing or scaling up adoption of advanced analytics and ML.
My alma mater, the Virginia Cavaliers, recently won the 2019 NCAA national championship in nail-biting overtime. With the utmost respect to Coach Tony Bennett, this victory got me thinking more about John Wooden, perhaps the greatest college coach ever. In his book Coach Wooden and Me, Kareem Abdul-Jabbar recalled starting at UCLA in 1965 with what was probably the greatest freshman team in the history of basketball. What was their new coach’s secret as he transformed UCLA into the best college basketball program in the country? I can only imagine their surprise at the first practice when the coach told them, “Today we are going to learn how to put on our sneakers and socks correctly. … Wrinkles cause blisters. Blisters force players to sit on the sideline. And players sitting on the sideline lose games.”
What’s that got to do with machine learning? Simply put, the financial services companies ready to move beyond the exploration stage with AI are those that have mastered the tasks that come before and after modeling with the new algorithms. Any ML library — whether it’s TensorFlow, PyTorch, extreme gradient boosting or your company’s in-house library — simply enables a computer to spot patterns in training data that can be generalized for new customers. To win in the ML game, the team and the process are more important than the algorithm. If you’ve assembled the wrong stakeholders, if your project is poorly defined or if you’ve got the wrong training data, you may as well be sitting on the sideline.
Consider these important best practices before modeling:
- Careful project planning is a prerequisite — Assemble all the key project stakeholders, and insist they reach a consensus on specific and measurable project objectives. When during the customer life cycle will the model be used? A wealth of new data sources is available. Which data sources and attributes are appropriate candidates for use in the modeling project? Does the final model need to be explainable, or is a black box good enough? If the model will be used to make real-time decisions, what data will be available at runtime? Good ML consultants (like those at Experian) use their experience to help their clients carefully define the model development parameters.
- Data collection and data preparation are incredibly important — Explore the data to determine not only how important and appropriate each candidate attribute is for your project, but also how you’ll handle missing or corrupt data during training and implementation. Carefully select the training and validation data samples and the performance definition. Any biases in the training data will be reflected in the patterns the algorithm learns and therefore in your future business decisions. When ML is used to build a credit scoring model for loan originations, a common source of bias is the difference between the application population and the population of booked accounts. ML experts from outside the credit risk industry may need to work with specialists to appreciate the variety of reject inference techniques available.
- Segmentation analysis — In most cases, more than one ML model needs to be built, because different segments of your population perform differently. The segmentation needs to be done in a way that makes sense — both statistically and from a business perspective. Intriguingly, some credit modeling experts have had success using an AI library to inform segmentation and then a more tried-and-true method, such as regression, to develop the actual models.
- Model building — With a good plan and well-designed data sets, the modeling project has a very good chance of succeeding. But no automated tool can make the tough judgment calls that determine whether the model is suitable for use in your business — such as trade-offs between the ML model’s accuracy and its simplicity and transparency. Engaged leadership is important.
- Model validation — Your project team should be sure the analysts and consultants appreciate and mitigate the risk of overfitting the model parameters to the training data set. Validate that any ML model is stable. Test it on samples from a different group of customers, preferably from a time period other than the one the training sample was drawn from.
- Documentation — AI models can have important impacts on people’s lives. In our industry, they determine whether someone gets a loan, a credit line increase or an unpleasant loss mitigation experience. Good model governance practice insists that a lender won’t make decisions based on an unexplained black box. In a globally transparent model, good documentation thoroughly explains the data sources and attributes and how the model considers those inputs. With a locally transparent model, you can further explain how a decision is reached for any specific individual — for example, by providing FCRA-compliant adverse action reasons.
- Model implementation — Plan ahead. How will your ML model be put into production? Will it be recoded into a new computer language, or can it be imported into one of your systems using a format such as the Predictive Model Markup Language (PMML)? How will you test that it works as designed?
- Post-implementation — Just as with an old-fashioned regression model, it’s important to monitor both the usage and the performance of the ML model. Your governance team should check periodically that the model is being used as it was intended. Audit the model periodically to know whether changing internal and external factors — which might range from a change in data definition to a new customer population to a shift in the economic environment — might impact the model’s strength and predictive power.
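To make a few of the practices above concrete, here is one way the reject inference mentioned under data preparation can work. This is a minimal sketch of the "parceling" technique in plain Python: rejected applicants are binned by score, then assigned inferred good/bad labels at each bin's observed bad rate, inflated by a conservative multiplier since rejects typically perform worse than booked accounts with the same score. The function name, bin count and multiplier are illustrative, not a prescribed standard.

```python
import random

def parcel_rejects(accepted, rejects, n_bins=5, bad_rate_multiplier=1.5, seed=0):
    """Assign inferred good/bad labels to rejected applicants (parceling).

    accepted: list of (score, is_bad) pairs for booked accounts
    rejects:  list of scores for rejected applicants
    Rejects in each score bin are labeled bad at that bin's observed
    bad rate times a conservative multiplier (illustrative value).
    """
    rng = random.Random(seed)
    scores = [s for s, _ in accepted]
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0

    def bin_of(score):
        # Clamp so out-of-range reject scores fall in an edge bin
        return max(0, min(int((score - lo) / width), n_bins - 1))

    # Observed bad rate per score bin among booked accounts
    totals = [0] * n_bins
    bads = [0] * n_bins
    for s, is_bad in accepted:
        b = bin_of(s)
        totals[b] += 1
        bads[b] += is_bad

    inferred = []
    for s in rejects:
        b = bin_of(s)
        rate = bads[b] / totals[b] if totals[b] else 0.5
        rate = min(rate * bad_rate_multiplier, 1.0)
        inferred.append((s, 1 if rng.random() < rate else 0))
    return inferred
```

The inferred labels are then pooled with the booked accounts so the model learns from something closer to the full application population, not just the accepts.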
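The validation step above often relies on the Kolmogorov–Smirnov (KS) statistic, a standard scorecard strength measure in credit risk. A large drop in KS on an out-of-time sample versus the training sample is a classic symptom of overfitting. A simple sketch, with an intentionally brute-force cumulative-distribution comparison:

```python
def ks_statistic(scores_goods, scores_bads):
    """Maximum separation between the cumulative score distributions
    of good and bad accounts. Ranges from 0 (no separation) to 1
    (perfect separation)."""
    goods = sorted(scores_goods)
    bads = sorted(scores_bads)
    ks = 0.0
    for t in sorted(set(goods + bads)):
        cdf_goods = sum(1 for s in goods if s <= t) / len(goods)
        cdf_bads = sum(1 for s in bads if s <= t) / len(bads)
        ks = max(ks, abs(cdf_goods - cdf_bads))
    return ks
```

Comparing this number on the training sample, an in-time holdout and an out-of-time sample gives the stability evidence a validation report needs.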
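For the local transparency described under documentation, one long-standing approach for points-based scorecards is "points below maximum": for each characteristic, compute how far the applicant fell short of the best attainable points, and report the largest shortfalls as the adverse action reasons. The attribute names below are made up for illustration, and a production system would also map each reason to FCRA-compliant wording.

```python
def adverse_action_reasons(applicant_points, max_points, top_n=4):
    """Rank adverse action reasons for a points-based scorecard.

    For each characteristic, the shortfall is the maximum points
    available minus the points this applicant earned; the largest
    shortfalls are the inputs that pulled the score down the most.
    """
    shortfall = {
        name: max_points[name] - pts
        for name, pts in applicant_points.items()
    }
    ranked = sorted(shortfall.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, gap in ranked[:top_n] if gap > 0]
```

For a globally transparent ML model, the same idea generalizes: attribute-level contribution methods play the role the point shortfalls play here.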
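Finally, the post-implementation audit for a shifting customer population is commonly quantified with the population stability index (PSI), which compares the score distribution from model development against recent production volume. A minimal sketch; the thresholds in the docstring are industry conventions, not formal standards:

```python
import math

def population_stability_index(expected_pct, actual_pct):
    """PSI between the development-sample score distribution and a
    recent production distribution, each given as per-bin fractions
    summing to 1. Common rule of thumb: below 0.10 is stable,
    0.10 to 0.25 warrants watching, above 0.25 warrants investigation.
    """
    psi = 0.0
    for e, a in zip(expected_pct, actual_pct):
        e = max(e, 1e-6)  # guard against empty bins
        a = max(a, 1e-6)
        psi += (a - e) * math.log(a / e)
    return psi
```

The same calculation applied to each input attribute (rather than the score) helps pinpoint which data definition or population change is driving a drift.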
Coach Wooden used to say, “It isn’t what you do. It’s how you do it.” Just like his players, the most successful ML practitioners understand that a process based on best practices is as important as the “game” itself.