Beyond Basic Data Sampling for Model Development

by Guest Contributor, November 7, 2018

Your model is only as good as your data, right? Actually, there are many considerations in developing a sound model, one of which is data. Yet if your data is bad or dirty or doesn’t represent the full population, can it be used? This is where sampling can help. Done right, sampling can lower the cost of obtaining the data needed for model development, and it can turn a tainted or underrepresentative data set into a sound and viable model development sample.

First, define the population to which the model will be applied once it’s finalized and implemented. Determine what data is available and what population segments must be represented within the sampled data. The more variability in internal factors — such as changes in marketing campaigns, risk strategies and product launches — and external factors — such as economic conditions or competitor presence in the marketplace — the larger the sample size needed. A model developer often will need to sample over time to incorporate seasonal fluctuations in the development sample.

The most robust samples are pulled from data that best represents the full population to which the model will be applied. To achieve that, make sure your data sample includes customers or prospects declined by the prior model and strategy, as well as approved but nonactivated accounts. Also consider the number of predictors, or independent variables, that will be evaluated during model development, and increase your sample size accordingly.

When it comes to spotting dirty or unacceptable data, the golden rule is to know your data and know your target population. Spend time evaluating your intended population, and profile its groups across several important business metrics. Don’t underestimate the time needed to complete a thorough evaluation.
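One way to start this kind of evaluation is to profile each group on a few metrics and look for segments that are missing, tiny, or unlike the rest. The Python sketch below is only an illustration: the field names (`segment`, `balance`, `utilization`), the segment labels, and the values are hypothetical, and a real review would cover many more metrics.

```python
from collections import defaultdict
from statistics import mean


def profile_by_segment(records, segment_key, metric_keys):
    """Return count, population share, and mean of each metric per segment."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[segment_key]].append(rec)

    total = len(records)
    profile = {}
    for segment, members in groups.items():
        row = {"count": len(members), "share": len(members) / total}
        for metric in metric_keys:
            row[f"mean_{metric}"] = mean(r[metric] for r in members)
        profile[segment] = row
    return profile


# Hypothetical records, for illustration only.
records = [
    {"segment": "approved", "balance": 1200, "utilization": 0.30},
    {"segment": "approved", "balance": 800, "utilization": 0.50},
    {"segment": "declined", "balance": 2500, "utilization": 0.90},
    {"segment": "declined", "balance": 3500, "utilization": 0.70},
]

profile = profile_by_segment(records, "segment", ["balance", "utilization"])
for segment, row in profile.items():
    print(segment, row)
```

Running the same profile on the candidate sample and on the full population makes gaps in representation easier to see.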

Next, select data from the population so that the sample aptly represents it. Determine the sampling methodology that will best support the model development and business objectives. Sampling generates a smaller data set for use in model development, allowing the developer to build models more quickly. Reducing the data set’s size decreases the time needed for model computation and saves storage space, typically without a material loss in predictive performance as long as the sample remains representative.

Once the data is selected, weights are applied so that each record appropriately represents the full population to which the model will be applied. Several traditional techniques can be used to sample data:

  • Simple random sampling: Each record is chosen by chance, and each record in the population has an equal chance of being selected.
  • Random sampling with replacement: Each record chosen by chance is returned to the pool, so it can be selected again in subsequent draws.
  • Random sampling without replacement: Each record chosen by chance is removed from subsequent selections.
  • Cluster sampling: Records from the population are sampled in groups, such as by region or over different time periods.
  • Stratified random sampling: This technique allows you to sample different segments of the population at different proportions. In some situations, stratified random sampling is helpful in selecting segments of the population that aren’t as prevalent as other segments but are equally vital within the model development sample.
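
As a rough illustration of several of the techniques listed above, the following Python sketch uses only the standard library. The record layout, the `segment` field, the segment names, and the per-segment sample sizes are all hypothetical. It shows random sampling with and without replacement, then a stratified sample in which each record carries a weight. The weight here is one common choice, the segment's population count divided by its sample count, so the weighted sample adds back up to the full population.

```python
import random
from collections import defaultdict


def sample_without_replacement(records, n, seed=0):
    """Simple random sample: a record can be selected at most once."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def sample_with_replacement(records, n, seed=0):
    """Each draw is made from the full pool, so records can repeat."""
    rng = random.Random(seed)
    return rng.choices(records, k=n)


def stratified_sample(records, key, sizes, seed=0):
    """Draw sizes[segment] records from each segment and attach a weight.

    weight = segment population count / segment sample count
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)

    sample = []
    for segment, members in strata.items():
        n = sizes[segment]
        weight = len(members) / n
        for rec in rng.sample(members, n):
            sample.append({**rec, "weight": weight})
    return sample


# Hypothetical population: 900 approved and 100 declined applicants.
population = (
    [{"id": i, "segment": "approved"} for i in range(900)]
    + [{"id": i, "segment": "declined"} for i in range(900, 1000)]
)

without_repl = sample_without_replacement(population, 100)
with_repl = sample_with_replacement(population, 100)

# Sample 10% of approved records but 50% of the rarer declined records.
stratified = stratified_sample(
    population, "segment", {"approved": 90, "declined": 50}
)
weighted_total = sum(rec["weight"] for rec in stratified)
print(len(stratified), weighted_total)
```

Oversampling the smaller segment gives the model more declined records to learn from, while the weights keep the sample's overall mix aligned with the population.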

Learn more about how Experian Decision Analytics can help you with your custom model development needs.

