Measuring data performance

Meat and potatoes
Data are the meat and potatoes of fraud detection.  You can have the brightest and most capable statistical modeling team in the world.  But if they have crappy data, they will build crappy models.  Fraud prevention models, predictive scores, and decisioning strategies in general are only as good as the data upon which they are built.

How do you measure data performance?
If a key part of my fraud risk strategy deals with the ability to match a name with an address, for example, then I am going to be interested in overall coverage and match rate statistics.  I will want to know basic metrics like how many records I have in my database with name and address populated.  And how many addresses do I typically have for consumers?  Just one, or many?  I will want to know how often, on average, we are able to match a name with an address.  It doesn’t do much good to tell you your name and address don’t match when, in reality, they do.
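
To make those metrics concrete, here is a minimal sketch of how coverage, addresses per consumer, and match rate might be computed. Everything in it is an assumption for illustration: the sample records, the function names, and especially the exact, case-insensitive comparison, which is much stricter than the fuzzy matching a real name and address tool would likely use.

```python
from collections import defaultdict

# Hypothetical sample records: (consumer_id, name, address). None marks a missing field.
records = [
    ("c1", "Ann Lee", "12 Oak St"),
    ("c1", "Ann Lee", "98 Elm Ave"),
    ("c2", "Bob Ray", None),
    ("c3", None, "5 Pine Rd"),
    ("c4", "Cy Dunn", "7 Birch Ln"),
]


def coverage(records):
    """Share of records with both name and address populated."""
    populated = sum(1 for _, name, addr in records if name and addr)
    return populated / len(records)


def addresses_per_consumer(records):
    """Average number of distinct addresses on file per consumer."""
    by_consumer = defaultdict(set)
    for consumer_id, _, addr in records:
        addresses = by_consumer[consumer_id]  # also registers consumers with no address
        if addr:
            addresses.add(addr)
    return sum(len(a) for a in by_consumer.values()) / len(by_consumer)


def match_rate(inquiries, records):
    """Share of (name, address) inquiries that match a record on file.

    Assumption: a match is an exact, case-insensitive comparison.
    """
    on_file = {(n.lower(), a.lower()) for _, n, a in records if n and a}
    hits = sum(1 for n, a in inquiries if (n.lower(), a.lower()) in on_file)
    return hits / len(inquiries)


# Hypothetical incoming inquiries to check against the file.
inquiries = [
    ("Ann Lee", "12 Oak St"),
    ("Bob Ray", "1 Main St"),
    ("cy dunn", "7 birch ln"),
    ("Ann Lee", "98 Elm Ave"),
]

print(coverage(records))                # 0.6
print(addresses_per_consumer(records))  # 1.0
print(match_rate(inquiries, records))   # 0.75
```

Note that Bob Ray counts against the match rate here only because his address is missing from the file, which is exactly the kind of false mismatch that weak coverage produces.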

With any fraud product, I will definitely want to know how often we can locate the consumer in the first place.  If you send me a name, address, and Social Security number, what is the likelihood that I will be able to find that particular consumer in my database?  This process of finding a consumer based on certain input data (such as name and address) is called pinning.  If you have incomplete or stale data, your pin rate will undoubtedly suffer.  And my fraud tool isn’t much good if I don’t recognize many of the people you are sending me.
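
As a rough illustration of pin rate, the sketch below tries to pin each inquiry to a consumer and reports the share that succeed. The database layout, the `pin` and `pin_rate` names, the placeholder SSNs, and the matching rule (exact SSN first, then exact name plus address) are all assumptions of mine. Real pinning logic is usually far more forgiving of typos and formatting differences.

```python
# Hypothetical consumer file keyed by an internal ID. The SSNs are made-up placeholders.
database = {
    "c1": {"name": "Ann Lee", "address": "12 Oak St", "ssn": "000-00-0001"},
    "c2": {"name": "Bob Ray", "address": "40 Main St", "ssn": "000-00-0002"},
}


def normalize(value):
    """Lowercase and trim a field; treat a missing field as an empty string."""
    return (value or "").strip().lower()


def pin(inquiry, database):
    """Return the ID of the consumer the inquiry pins to, or None.

    Assumption: a pin is an exact SSN match or, failing that, an exact
    match on both name and address.
    """
    ssn = normalize(inquiry.get("ssn"))
    name = normalize(inquiry.get("name"))
    address = normalize(inquiry.get("address"))

    # First pass: SSN is the strongest identifier in this sketch.
    for consumer_id, rec in database.items():
        if ssn and ssn == normalize(rec.get("ssn")):
            return consumer_id

    # Second pass: fall back to name plus address.
    for consumer_id, rec in database.items():
        if (name and address
                and name == normalize(rec.get("name"))
                and address == normalize(rec.get("address"))):
            return consumer_id
    return None


def pin_rate(inquiries, database):
    """Share of inquiries that pin to some consumer on file."""
    pinned = sum(1 for inq in inquiries if pin(inq, database) is not None)
    return pinned / len(inquiries)


inquiries = [
    {"name": "Ann Lee", "address": "12 Oak St", "ssn": "000-00-0001"},  # pins by SSN
    {"name": "Bob Ray", "address": "40 Main St"},                       # pins by name + address
    {"name": "Dee Fox", "address": "9 Hill Rd", "ssn": "000-00-0009"},  # not on file
    {"name": "Ann Lee", "address": "3 Old Rd"},                         # address on file differs
]

print(pin_rate(inquiries, database))  # 0.5
```

The last inquiry shows how incomplete or out-of-date data drags the pin rate down: Ann Lee is on file, but without an SSN and with an address the file does not know, she cannot be found.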

Data need to be fresh.  Old and out-of-date information will hurt your strategies, often punishing good consumers.  Let’s say I moved one year ago, but your address data are two years old.  What are the chances that you are going to be able to match my name and address?  Stale data are yucky.
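
One simple way to keep an eye on freshness is to track how much of the file has not been updated recently. The sketch below assumes each address carries a "last updated" date. The one-year threshold, the sample dates, and the function names are hypothetical choices for illustration, not a standard.

```python
from datetime import date

# Hypothetical "last updated" dates for the address on file, per consumer.
address_last_updated = {
    "c1": date(2023, 1, 15),
    "c2": date(2021, 6, 1),
    "c3": date(2020, 3, 10),
    "c4": date(2023, 5, 20),
}


def stale_share(last_updated, as_of, max_age_days=365):
    """Share of addresses last updated more than max_age_days before as_of."""
    stale = sum(
        1 for updated in last_updated.values()
        if (as_of - updated).days > max_age_days
    )
    return stale / len(last_updated)


def address_predates_move(updated, moved):
    """True if the address on file was last refreshed before the consumer moved."""
    return updated < moved


as_of = date(2023, 6, 30)
print(stale_share(address_last_updated, as_of))  # 0.5

# The example from the text: moved one year ago, address data two years old.
print(address_predates_move(date(2021, 6, 30), date(2022, 6, 30)))  # True
```

Passing `as_of` in explicitly, rather than reading today's date inside the function, keeps the metric reproducible when you rerun it on an old snapshot.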

Quality Data = WIN
It is all too easy to focus on the sexier aspects of fraud detection (such as predictive scoring, out-of-wallet questions, red flag rules, etc.) while ignoring the foundation upon which all of these strategies are built.