Every week, we talk about important data and analytics topics with data science leaders from around the world on Facebook Live. You can subscribe to the DataTalk podcast on Google Play, Stitcher, SoundCloud and Spotify.
This data science video series is part of Experian’s effort to help people understand how data-powered decisions can help organizations develop innovative solutions to drive more business.
To keep up with upcoming events, join our Data Science Community on Facebook or check out the archive of recent data science live videos. To suggest future data science topics or guests, please contact Mike Delgado.
Here’s a full transcript:
Mike Delgado: Hello and welcome to Experian’s weekly Data Talk, a show where we’re featuring data science leaders from around the world. Today we’re chatting with Matthew Dubins, the founder of Donor Science Consulting, and I’m super excited to have him. Friends, he is doing work in the non-profit space, and it’s always an honor to talk to people who are doing data science for good, helping organizations get the money that they need and doing analytics in the background. Matthew, it’s an honor to have you in our chat today.
Matthew Dubins: Thanks Mike, it’s an honor to be here.
Mike Delgado: So Matthew, I always like to ask our guests a little bit about their journey. We have a lot of people in our data science community who are always curious about what led you to start working in data science.
Matthew Dubins: Sure, happy to talk about that. Really, all of this started during my undergraduate psychology degree, in my Intro to Statistics class. When I was in that stats class I really loved the way it was able to transport me to so many different subject matter areas, such that I could use statistical analysis to learn about so many different things. It was a really interesting and freeing feeling.
You know, not only did I find that it enabled me to learn about different subject areas, but I loved the way it enabled me to create new understandings of those subject areas. Along the way I also loved how helpful data visualization was in the pursuit of creating and conveying those new understandings. Later, in the workplace, I found that I really enjoyed the way data analysis and data visualization enabled me to share those understandings across the organization. Data analysis wasn’t spoken of as data science at that point, but the term grew on me. Data analysis really became a driver of change, and I loved that.
Another thing to mention about my journey to data science is that I really fell in love with the way statistical programming, such as what you get in SAS or R or Python, enabled me to carry out analyses in a scripted and repeatable way. I also loved the way it enabled me to track each and every logical step taken along the way to a particular analytical result.
Let me tell you, using point-and-click interfaces such as SPSS just led to forgetting what I did along the way. If someone was interested in how I got to a particular analytical result, it would have been a lot more difficult to describe my process after using a point-and-click interface than after using a scripted interface like SAS or R. Once I realized the power of statistical programming, I never looked back.
Mike Delgado: You mentioned how taking that statistics class was really the launch pad where you fell in love with data and analytics, and along the way you ended up picking up these different languages. Can you talk a little bit about that?
Matthew Dubins: Sure, sure. I think the most amusing part of picking up languages is that, first and foremost, in that stats class, when they told me we were going to learn to use SAS for statistical programming, I got so afraid. I got so afraid. I just thought, oh my god, this sounds overly complicated, I don’t know what I’m gonna do, is there any way I can escape?
Mike Delgado: “Can I take some extra credit? What else can I do to avoid this?”
Matthew Dubins: Yeah, exactly, exactly. But eventually I bit the bullet and started learning SAS. I found that there was something very logical and even supportive about statistical programming: you could be very thoughtful about the analysis you were doing, plotting it out in code, making sure what you were doing was logical, even getting error messages along the way. More and more, the practice of statistical programming grew on me.
Now of course, since we’re talking about SAS, I have nothing intrinsically against it apart from the fact that it’s expensive. While SAS was lovely for doing data analysis on the one hand, on the other hand it was not portable software to bring with me. It wasn’t a portable skill, because after my undergrad class where we learned SAS, nobody was willing to foot the bill for it.
That’s why it was so important that, after my undergraduate career was done, one of my main mentors in the world of data analysis, Michael Friendly of York University, introduced me to R. By this point, of course, I wasn’t new to statistical programming anymore.
I’d already gotten my feet wet, so being introduced to R, which was free and where the boundaries of what you could do with it were not easy to see, really opened up doors for me. It was such a flexible paradigm, with so many different packages for doing so many different kinds of analysis and, later, data visualization. It was [inaudible 00:08:26].
After my undergraduate career was done, R enabled me to carry something so useful, so effective, and so powerful with me wherever I went, and that was just enormous. I kept learning about R and kept growing in R, and eventually I discovered ggplot2, which was just fantastic and awesome. Really, the best development in my R learning in recent years was learning Shiny; learning how to make web apps with R was just awesome. I really can’t stress enough how great R has been to me, and how great the packages and tools made available to people can be, like RStudio Server and, as I already mentioned, Shiny. It’s just great, it is just great.
Mike Delgado: That’s awesome. Before we jump into all the work you’re doing with non-profits, I wanted to ask one question, because you talked about the fear you had originally when picking up SAS. Can you encourage listeners here in our data science community who are on their way, maybe taking their first coding class, maybe in R, maybe in Python, but sensing that fear? What would you say to them?
Matthew Dubins: What would I say to them? I would say: you were probably afraid of jumping into the deep end of the pool at first, and eventually someone convinced you that it was worthwhile enough, and you had to face your fear and dive in. You know what? In terms of diving in, yes, there’s a lot to learn, but in both the R and Python worlds there are very supportive communities built around data analytics and data science. Number one, you don’t have to be alone, and number two, on the other side of this process is a lot of power to do good.
Mike Delgado: I like that, I like that. Which leads us into your move from academia into what you’re doing now: you’re the founder of Donor Science Consulting.
Matthew Dubins: Yes.
Mike Delgado: Can you talk a little bit about what led you to start up a company focused on helping non-profits?
Matthew Dubins: You know, the whole non-profit [inaudible 00:11:56] history really comes down to the luck of the draw, because my first job after grad school was at a non-profit. I wasn’t even looking specifically for work in non-profits, but that was the first organization that employed me, and over the years I picked up enough skills and experience that I eventually started to think about the idea for my current business. I just loved the idea that I could take what I know and use it to do good, not only in Canada but in the US as well. Because of the age we live in, so much work can be done remotely; it’s almost ridiculous how much work can be done remotely.
Mike Delgado: When you ended up leaving that company and starting your own, can you share a couple of data science projects you’ve worked on with non-profits that you’re really proud of?
Matthew Dubins: Sure, I’d be happy to. One project that I really enjoyed and was really proud of wasn’t even a highly complicated, rocket-science-y project: it was mapping hearing loss prevalence in Newfoundland and Labrador on behalf of the Canadian Hard of Hearing Association.
What I did was use a Canadian hearing loss prevalence study that showed prevalence of hearing loss by age and sex. I used that study to estimate hearing loss prevalence from age and sex demographic data for areas across Newfoundland and Labrador. The census areas are called dissemination areas. You have something like it in the US as well; I’m blanking on the specific term, but it’s a little bit bigger than a zip code, a census tabulation area, I think. We’ve got the same thing here, and it’s called dissemination areas.
Anyway, I used demographic data by dissemination area to estimate hearing loss prevalence across Newfoundland and Labrador, and then I visualized it on a map of the province. That map ended up becoming part and parcel of CHHA’s way of targeting the services they provide in their lovely province. So that was one project. I really enjoyed myself with that project, and they got a lot of value out of it as well.
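The estimation step described here, applying age- and sex-specific prevalence rates from a national study to each small area’s population counts, can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not Matthew’s actual code (which he doesn’t show): the function name, the input layout, and the toy rates and counts are all hypothetical, and the mapping step (joining the results to dissemination-area boundaries) is left out.

```python
from collections import defaultdict


def estimate_prevalence(rates, population):
    """Estimate expected hearing-loss cases and prevalence per area.

    rates:      dict mapping (age_group, sex) -> prevalence as a proportion (0..1),
                e.g. taken from a national prevalence study (hypothetical layout).
    population: iterable of (area_id, age_group, sex, count) census rows.
    Returns:    dict area_id -> (expected_cases, total_population, prevalence).
    """
    cases = defaultdict(float)
    totals = defaultdict(int)
    for area_id, age_group, sex, count in population:
        # Expected cases in this cell = group-specific rate x people in the cell.
        cases[area_id] += rates[(age_group, sex)] * count
        totals[area_id] += count
    return {
        area_id: (cases[area_id], totals[area_id],
                  cases[area_id] / totals[area_id] if totals[area_id] else 0.0)
        for area_id in totals
    }


# Toy numbers for illustration only; they are not from the CHHA project.
rates = {("20-39", "F"): 0.10, ("20-39", "M"): 0.20,
         ("60+", "F"): 0.40, ("60+", "M"): 0.50}
population = [("DA1", "20-39", "F", 100), ("DA1", "60+", "M", 50),
              ("DA2", "20-39", "M", 200)]
estimates = estimate_prevalence(rates, population)
```

Each area’s prevalence comes out as a population-weighted average of the study’s rates, so differences between areas reflect their age and sex mix; that per-area value is the kind of number a thematic map would color.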
Moving along to another project: this was creating a couple of major giving predictive models for a faith-based social services agency in Alberta called the Mustard Seed. For those who don’t know, the idea with a major giving predictive model is to predict the likelihood that any one donor will be ripe for solicitation, or ripe for being asked to give, frankly, a major donation, which could mean a donation of 5,000 or more, 10,000 or more, and so forth. In this case it was giving at the level of 10,000 and higher over a period of five years.
Now, the biggest challenge with this particular project was communicating a complex process to folks who, frankly, don’t have the time or expertise to understand a lot of complexity. The most important factor in overcoming this challenge (and I’m actually still involved in this process) has been helping the client understand the simplest elements involved in acting upon the analytical results. As a data guy, it’s so easy for me to talk at length about nitty-gritty detail, oh my god. All too easy, and I love to do it, so I have to catch myself.
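Before a major giving model can be trained, each donor needs a yes/no target. The sketch below labels a donor as a major giver when their total giving inside a five-year window reaches 10,000, matching the definition given here; the function name, the gift tuple layout, and the window dates in the example are assumptions, and the model itself (which the transcript doesn’t specify) is not shown.

```python
from collections import defaultdict
from datetime import date


def major_giver_labels(gifts, window_start, window_end, threshold=10_000):
    """Return donor_id -> 1 if total giving in [window_start, window_end] >= threshold, else 0.

    gifts: iterable of (donor_id, gift_date, amount) tuples (hypothetical layout).
    Donors with no gifts inside the window are still returned, labeled 0.
    """
    totals = defaultdict(float)
    for donor_id, gift_date, amount in gifts:
        totals[donor_id] += 0.0  # register the donor even if this gift is outside the window
        if window_start <= gift_date <= window_end:
            totals[donor_id] += amount
    return {donor_id: int(total >= threshold) for donor_id, total in totals.items()}


gifts = [
    ("A", date(2015, 1, 1), 6_000),
    ("A", date(2018, 6, 1), 5_000),   # A totals 11,000 inside the window
    ("B", date(2019, 1, 1), 9_999),   # outside the window
    ("C", date(2010, 1, 1), 50_000),  # large, but too old to count
]
labels = major_giver_labels(gifts, date(2014, 1, 1), date(2018, 12, 31))
```

One design choice worth keeping in mind: the predictors for such a model would normally be built only from information available before the window, so the model isn’t learning from the very giving it is meant to predict.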
Mike Delgado: Because you love the data and you just want to talk about it.
Matthew Dubins: I do, I do. But-
Mike Delgado: And the client’s like, “Well just tell me what you’re gonna do.”
Matthew Dubins: Yeah, basically. What they really need is to be told what the next steps are. So that was the second project I’ll talk about. The third project I’m happy to talk about was an in-depth descriptive analytics and data visualization project on the Giving Tuesday movement for the 92nd Street Y.
This was a super, super cool opportunity, and I really enjoyed my time on this project. The objective was to assess the effects of the Giving Tuesday movement on the individuals and charities participating in it. Do they see a benefit over and above what might have been typical in terms of donation behavior?
Now, I have to say the biggest challenge in this project was obtaining the most reliable data, and also gaining a good enough understanding of these data assets that the results of the analysis were both true and accurate. Because, as they say, garbage in, garbage out.
Thankfully we were able to arrive at reliable enough data assets, and a good enough understanding of those data assets, to produce a reliable enough and true enough analytical report. Getting there required, frankly, a lot of phone meetings and a lot of back-and-forth emails. Oh boy. Hopefully that’s a good enough answer to your question.
Mike Delgado: Yeah, that’s fascinating. Those are three really distinct projects with very different problems involved. Tell me about when you first meet with these organizations. Obviously there are going to be people, maybe in leadership, who don’t have the data science background you have. They have a challenge and they’re presenting it to you, and like you were saying, you’re thinking about all the complexity: the data you need to collect, the analytics, the programs you need to work with. But you were just mentioning how you need to solidify things for the client and tell them, “Okay, here are the next steps, here’s what I need from you in order to do this right.” How are you talking with these leaders and explaining it in a way that will be understood in a business context?
Matthew Dubins: In the initial stages you’re talking about?
Mike Delgado: Yeah.
Matthew Dubins: You know, in the initial stages I tend not to have as much of a problem, because I’m usually able to convey the requirements of the project in a very concrete and helpful way. At that stage it’s usually: okay, I’m going to need database dumps from such-and-such a table and such-and-such another table, I’m going to need these particular fields of information from those tables, I’m going to need records that pertain to such-and-such a period or such-and-such a group, and so forth.
These particular requirements are often easy for me to boil down into very concrete, understandable details. For me the difficulty might be talking about the bumps along the way in the analysis: the exceptions I had to make, the various filtering criteria I had to apply, possibly the algorithms I had to use, and any quirky results that came up along the way. Even after I’ve given them very concrete and simple requirements, they might end up sending me data that is garbage-y and really needs to be reworked in some way. So the initial stages aren’t overly complicated in my experience; it’s everything that comes after that ends up presenting challenges that have to be overcome in some shape or form.
Mike Delgado: So I want to dig a little bit deeper into some of these challenges. You mentioned some of the data requests you’re making; a lot of them are probably very structured data. Can you talk about some challenges where you’ve gotten unstructured data you’ve had to work with, or noisy data that wasn’t helpful and that you had to wrangle to make useful?
Matthew Dubins: In terms of the analyses I’ve done, it’s all been from structured data sets, so I wouldn’t actually be able to speak to analysis of unstructured data within the confines of my business. In terms of noisy data, we could talk about the one or two noisy data assets I was trying to make use of for the Giving Tuesday project. In that case, the important thing in making sure the data I was using wasn’t noisy was to do a lot of very basic aggregate data summaries: number of donations by day, number of donations by organization, number of donations on weekdays versus weekends, number of charities per donor, and so on and so forth.
Yeah, I mean, when you’re dealing with humongous data sets and you don’t realize that the data set you’re analyzing is garbage, and then much later in the analysis you finally realize that the data set you’ve been dealing with all along is garbage, it’s one of those moments where you raise your fist in the air and say, “No!”
Mike Delgado: All that work.
Matthew Dubins: All that work for nothing. So it becomes very important to use basic descriptive statistics to really comb through the data set you’re working with, to boost your own confidence that you’re not wasting your time. That, to me, should be a lesson to anyone in the data science world: investigate your data with the most basic descriptive analytics possible, just to make sure that it’s clean.
Mike Delgado: For a lot of the charities and non-profits you’re working with, are most of the projects about helping to get more donations? Can you talk about the different sorts of requests you’re getting?
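The basic checks listed for the Giving Tuesday data (donations by day, by organization, weekday versus weekend, charities per donor) map directly onto simple group-by counts. A stdlib-only sketch, assuming a donation file of (donor, organization, date) rows; the names are hypothetical, and deciding what counts as a suspicious spike or gap is left to the analyst.

```python
from collections import Counter, defaultdict
from datetime import date


def sanity_summaries(donations):
    """Basic aggregate checks on a donation file before deeper analysis.

    donations: iterable of (donor_id, org_id, donation_date) tuples (hypothetical layout).
    """
    by_day = Counter()
    by_org = Counter()
    by_day_type = Counter()
    orgs_per_donor = defaultdict(set)
    for donor_id, org_id, donation_date in donations:
        by_day[donation_date] += 1
        by_org[org_id] += 1
        # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
        by_day_type["weekend" if donation_date.weekday() >= 5 else "weekday"] += 1
        orgs_per_donor[donor_id].add(org_id)
    return {
        "donations_by_day": by_day,
        "donations_by_org": by_org,
        "weekday_vs_weekend": by_day_type,
        # Distribution: how many donors gave to 1 charity, to 2 charities, ...
        "charities_per_donor": Counter(len(orgs) for orgs in orgs_per_donor.values()),
    }


summary = sanity_summaries([
    ("d1", "orgA", date(2024, 1, 2)),  # Tuesday
    ("d1", "orgB", date(2024, 1, 2)),
    ("d2", "orgA", date(2024, 1, 6)),  # Saturday
])
```

A day with zero donations in the middle of a campaign, or one organization with an implausible share of the records, is the kind of oddity these small tables surface long before a bigger analysis would.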
Matthew Dubins: Oh boy so it’s definitely not a cookie cutter situation that’s for sure. For that reason I offer a variety of different possible services. I mean one possible service is just address correction to be honest.
Mike Delgado: Okay yeah, make sure they’re reaching the donors.
Matthew Dubins: Yeah, yeah. If you’re mailing packages out to invalid addresses, you are wasting your money. I’ve partnered with another company that does really quick and easy address correction; they’ve been in the business for about 30 years. When a charity needs to clean its file, they send it to me, I drop it on my partner company’s servers, their automated process goes to town on the file, and then they return it to me and I’m able to give the charity a cleaned address file. So that’s one thing.
Another thing is database segmentation.
Viewers of this episode may or may not be familiar with RFM segmentation, which stands for recency, frequency, and monetary value. It’s all about segmenting a database according to the transactional behaviors of the individuals in that database. This isn’t specific to the non-profit world; in fact, I’m pretty sure it started in the corporate world and was ported to the non-profit world. In this case, instead of segmenting customers, you’re segmenting donors according to their donation behaviors. Very simple, right?
There are non-profits who aren’t even doing this, and for them, gaining an understanding of the activity levels in their database could very well be a quantum leap. They know who to focus on; they have an idea of what people are doing, which is a very big step if you don’t; and they have some clue as to what kind of language to use when communicating with these people. After all, if you’ve got large segments of donors who haven’t given in two years, sending them a letter package saying, “Thank you for your continued support,” is kind of silly.
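RFM segmentation as described comes down to three numbers per donor. A minimal sketch, assuming a gift history of (donor, date, amount) rows: it computes days since the last gift, gift count, and total given, turns each into a rank-based score, and adds a recency band. The tercile-style scoring, the 365/730-day cutoffs, and all names are illustrative assumptions; organizations (and Matthew’s own app) may define segments quite differently.

```python
from collections import defaultdict
from datetime import date


def rfm_table(gifts, as_of):
    """Raw recency/frequency/monetary values per donor.

    gifts: iterable of (donor_id, gift_date, amount) tuples (hypothetical layout).
    Gifts dated after as_of are ignored.
    """
    last_gift = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for donor_id, gift_date, amount in gifts:
        if gift_date > as_of:
            continue
        last_gift[donor_id] = max(last_gift.get(donor_id, gift_date), gift_date)
        frequency[donor_id] += 1
        monetary[donor_id] += amount
    return {
        donor_id: {"recency_days": (as_of - last).days,
                   "frequency": frequency[donor_id],
                   "monetary": monetary[donor_id]}
        for donor_id, last in last_gift.items()
    }


def rank_scores(values, n_groups=3):
    """Score 1..n_groups by rank; higher value -> higher score (ties broken by input order)."""
    ordered = sorted(values, key=values.get)
    k = len(ordered)
    return {donor_id: 1 + (i * n_groups) // k for i, donor_id in enumerate(ordered)}


def recency_band(days):
    """Hypothetical cutoffs; each organization would define its own."""
    if days <= 365:
        return "active"
    if days <= 730:
        return "lapsed"
    return "long lapsed"


def rfm_segments(gifts, as_of, n_groups=3):
    table = rfm_table(gifts, as_of)
    # Fewer days since the last gift is better, so rank on the negated value.
    r = rank_scores({d: -v["recency_days"] for d, v in table.items()}, n_groups)
    f = rank_scores({d: v["frequency"] for d, v in table.items()}, n_groups)
    m = rank_scores({d: v["monetary"] for d, v in table.items()}, n_groups)
    return {
        d: {**v, "R": r[d], "F": f[d], "M": m[d],
            "recency_band": recency_band(v["recency_days"])}
        for d, v in table.items()
    }


segments = rfm_segments(
    [("A", date(2023, 12, 1), 100), ("A", date(2023, 6, 1), 50),
     ("B", date(2022, 6, 1), 500), ("C", date(2020, 1, 1), 20)],
    as_of=date(2024, 1, 1),
)
```

The recency band is the piece that would keep a charity from thanking long-lapsed donors for their “continued support”; the R, F, and M scores then rank donors within each band.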
Mike Delgado: Yeah I get those emails and those letters, I know what you mean. They’re obviously not using your services.
Matthew Dubins: Yeah, exactly, exactly. What I have going for me is a web app I created with R and Shiny to do automated RFM segmentation. All they have to do is submit a gift history file, or donation history, to the web app; the app goes to town on the file and then labels each donor according to their donation behavior on the recency, frequency, and monetary dimensions.
It will even label donors according to how many years they’ve been on file, which in and of itself can be very helpful. It gives a summary of those labels, just very simple summary statistics in an interactive, filterable, searchable table, and then lets them download a file with the donor IDs and the RFM characteristics applicable to each donor ID. Then they can use that in their direct marketing efforts. That’s another thing I do.
Yet another thing I do: believe it or not, I actually have an automated predictive modeling app.
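Two of the extras mentioned here, a years-on-file label and a downloadable donor-level file alongside summary counts, are straightforward to sketch. This assumes years on file is counted from the first gift date (the app’s exact definition isn’t given), and the function names and CSV columns are hypothetical; the interactive, filterable table itself (a Shiny feature in Matthew’s case) isn’t reproduced.

```python
import csv
import io
from collections import Counter
from datetime import date


def years_on_file(first_gift_date, as_of):
    """Whole years since the donor's first gift (anniversary-based)."""
    years = as_of.year - first_gift_date.year
    if (as_of.month, as_of.day) < (first_gift_date.month, first_gift_date.day):
        years -= 1  # this year's anniversary of the first gift hasn't arrived yet
    return years


def label_summary(labels):
    """Count of donors per label: a simple summary table of the segments."""
    return Counter(labels.values())


def export_labels_csv(rows, fieldnames):
    """Serialize donor-level rows (donor ID plus labels) to CSV text for download."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


as_of = date(2024, 6, 30)
first_gifts = {"A": date(2004, 6, 15), "B": date(2023, 7, 1)}
rows = [{"donor_id": d, "years_on_file": years_on_file(g, as_of)}
        for d, g in first_gifts.items()]
csv_text = export_labels_csv(rows, ["donor_id", "years_on_file"])
```

Keeping the export to donor IDs plus labels, as described, means the file can be matched back to the charity’s own database for direct marketing without moving any contact details around.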
Mike Delgado: Really?
Matthew Dubins: Yep.
Mike Delgado: Oh wow, so tell me about that.
Matthew Dubins: This was also built in R and Shiny. The idea is that the participating charity or non-profit-facing agency submits a gift history file to it containing donor ID, gift date, gift amount, appeal code or campaign code, first gift date, appeal type or campaign type, and fund type or fund allocation.
Using all of those columns of information for each gift transaction, the app then creates three predictive models that predict each donor’s likelihood of giving again next year, of upgrading their giving level next year, and of, quote unquote, reactivating, which just means giving again in the following year after one or more years of not having given.
Then, using the scores that come out of this process, the app creates very simple, plain-English recommendations as to what the charity should do with each donor in the following calendar or fiscal year. The recommendations can be “renew, same ask,” “renew, upgrade ask,” “reactivate, same ask,” or, my favorite, “solicit with extreme caution.”
Mike Delgado: Nice. Matthew, tell me about some of the data or signals that help categorize these different people, especially the people who are more likely to give more. What in the data helps to predict that?
Matthew Dubins: It could be something like how recently the person gave, or how loyal that person’s pattern of giving was; that’s a very, very important one. It could be the person’s favorite charitable appeal to give to, which obviously varies from one organization to the next. It could be how long the person has been on file, so if you’ve got an active donor who’s been on file for 20 years, that person might be highly, highly likely to upgrade their giving. It could be their favorite appeal type or their favorite fund to allocate their donations to.
All of these represent variables that differ from one organization to the next. The beauty of the app is that it figures out the most relevant factors for each charity in predicting which donors fall into each of these groups. It’s not cookie cutter; rather, it uses machine learning to arrive at these recommendations.
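Each of the three models needs a historical outcome to learn from. One plausible way to derive those labels from a gift history is sketched below, assuming calendar years and defining “upgrade” as a higher total next year than this year; both are assumptions, since the app could use fiscal years or different definitions, and all names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def build_targets(gifts, base_year):
    """Derive next-year outcome labels for each donor relative to base_year.

    gifts: iterable of (donor_id, gift_date, amount) tuples (hypothetical layout).
    Active donors (gave in base_year) get "renew" and "upgrade" labels;
    lapsed donors (gave before base_year but not in it) get a "reactivate" label.
    Donors whose first gift comes after base_year are skipped.
    """
    yearly = defaultdict(lambda: defaultdict(float))
    for donor_id, gift_date, amount in gifts:
        yearly[donor_id][gift_date.year] += amount

    next_year = base_year + 1
    targets = {}
    for donor_id, by_year in yearly.items():
        base_total = by_year.get(base_year, 0.0)
        next_total = by_year.get(next_year, 0.0)
        gave_before = any(year < base_year and total > 0 for year, total in by_year.items())
        if base_total > 0:
            targets[donor_id] = {
                "group": "active",
                "renew": int(next_total > 0),
                # Hypothetical definition: "upgrade" = gave more next year than this year.
                "upgrade": int(next_total > base_total),
            }
        elif gave_before:
            targets[donor_id] = {"group": "lapsed", "reactivate": int(next_total > 0)}
    return targets


targets = build_targets(
    [("A", date(2022, 3, 1), 100), ("A", date(2023, 3, 1), 150),
     ("B", date(2022, 5, 1), 100),
     ("C", date(2020, 5, 1), 50), ("C", date(2023, 5, 1), 30),
     ("D", date(2023, 1, 1), 25)],
    base_year=2022,
)
```

A common pattern with labels like these is to train on a past base year, where next-year outcomes are already known, and then score current donors to predict the year ahead.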
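The last step, from model scores to the four plain-English recommendations, could be as simple as a few ordered threshold checks. In this sketch only the recommendation labels come from the transcript; the single threshold, the order of the checks, and the function signature are assumptions.

```python
def recommend(group, scores, threshold=0.5):
    """Turn model scores (probabilities 0..1) into a plain-English action.

    group:  "active" (gave last year) or "lapsed" (gave before, but not last year).
    scores: dict with any of "renew", "upgrade", "reactivate".
    The 0.5 threshold and the order of the checks are illustrative assumptions.
    """
    if group == "active":
        if scores.get("upgrade", 0.0) >= threshold:
            return "renew, upgrade ask"
        if scores.get("renew", 0.0) >= threshold:
            return "renew, same ask"
    elif group == "lapsed":
        if scores.get("reactivate", 0.0) >= threshold:
            return "reactivate, same ask"
    # Anything that clears no threshold gets the cautious recommendation.
    return "solicit with extreme caution"


recommend("active", {"renew": 0.8, "upgrade": 0.6})  # "renew, upgrade ask"
recommend("lapsed", {"reactivate": 0.2})             # "solicit with extreme caution"
```

Checking the upgrade score before the renewal score means a donor who qualifies for both gets the more ambitious ask; a real tool might instead use separate, tuned thresholds for each model.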
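The signals named here (recency, loyalty, time on file, favorite appeal type) are commonly turned into per-donor features before a model is fitted. The sketch below computes them from a gift history; the feature definitions (for example, loyalty as the share of the last five calendar years with at least one gift) and names are assumptions, and the machine learning step itself, which the transcript doesn’t specify, is left out. A model such as logistic regression fitted separately for each charity could then weigh these features, which is one way to let each organization’s own data decide which factors matter most.

```python
from collections import Counter, defaultdict
from datetime import date


def donor_features(gifts, as_of, loyalty_years=5):
    """Per-donor features from a gift history.

    gifts: iterable of (donor_id, gift_date, amount, appeal_type) tuples (hypothetical layout).
    Gifts after as_of are ignored so features only use information available at that date.
    """
    by_donor = defaultdict(list)
    for donor_id, gift_date, amount, appeal_type in gifts:
        if gift_date <= as_of:
            by_donor[donor_id].append((gift_date, amount, appeal_type))

    window = range(as_of.year - loyalty_years + 1, as_of.year + 1)
    features = {}
    for donor_id, rows in by_donor.items():
        dates = [gift_date for gift_date, _, _ in rows]
        years_given = {d.year for d in dates}
        appeal_totals = Counter()
        for _, amount, appeal_type in rows:
            appeal_totals[appeal_type] += amount
        features[donor_id] = {
            "days_since_last_gift": (as_of - max(dates)).days,
            # Share of the last `loyalty_years` calendar years with at least one gift.
            "loyalty": sum(year in years_given for year in window) / loyalty_years,
            # Calendar-year difference: a simplification of "time on file".
            "years_on_file": as_of.year - min(dates).year,
            # Appeal type that received the most money from this donor.
            "favorite_appeal_type": appeal_totals.most_common(1)[0][0],
        }
    return features


features = donor_features(
    [("A", date(2020, 3, 1), 50, "mail"),
     ("A", date(2022, 11, 28), 100, "online"),
     ("A", date(2023, 12, 1), 25, "mail")],
    as_of=date(2023, 12, 31),
)
```

Because favorite appeal type is categorical and its values differ between charities, it would typically be one-hot encoded per organization before fitting, which fits the point that these variables change from one organization to the next.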
Mike Delgado: Matthew where can people go to get these apps?
Matthew Dubins: Well, they can contact me. My website is www.donorscience.ca, or they could email me or contact me over LinkedIn or what have you. The apps represent another service product of mine. They’re not open to the public; obviously it’s the way I support myself and, more importantly, my family. But I’m happy to grant access to anyone who wants to use them, frankly, and we can talk more.
Mike Delgado: Yeah that sounds great. Last question Matthew would be aside from all the data and analytics that you’re working on can you talk a little bit about any sort of data visualization tools that you’re using to help tell the data story?
Matthew Dubins: In terms of data visualization tools, what I really love to use is R Markdown. We’re talking about reproducible, HTML-based reports where I can interleave tables and graphs with my own commentary on what I believe the analysis means and how to use it. In terms of specific data visualization tools, I love using ggplot2, I love using Plotly, and I make liberal use of bar graphs, to be honest. Sometimes I use line graphs, but to me bar graphs are key. They might be boring to some people, but I’m sorry, you will not catch me ever using pie graphs. You just won’t.
Mike Delgado: There’s a history of not so much love for the pie chart on this Data Talk show.
Matthew Dubins: I’m glad.
Mike Delgado: That’s so funny.
Matthew Dubins: But yeah, bar graphs, and sometimes scatter plots as well. I do like scatter plots, though I don’t often find the opportunity to use them. I also think that, whenever I have the opportunity, thematic mapping is really fun and helpful: where you’ve got dots on a map, or specific areas of a map colored according to some kind of numeric variable. Those are fun too.
Mike Delgado: I know that we’ve got to get going but Matthew can you share with everyone again where they can learn more about you, your services, your apps, everything else?
Matthew Dubins: Sure, they are welcome to visit my website www.donorscience.ca.
Mike Delgado: Wonderful. For those listening to the podcast, make sure you check that out. If you’re interested in connecting with Matthew on LinkedIn or following him, we have links to donorscience.ca as well as to his social profiles on our Experian blog. That URL is ex.pn/datatalk49, and we’ll have a full transcript of today’s video there, along with the podcast. I’m also very excited to let you know that our podcast is now available on Google Play, Stitcher, Spotify, and pretty soon iTunes. As you know, we do this Data Talk every single week, featuring different data science leaders from around the world.
Matthew Dubins, it’s been awesome to chat with you. I’m grateful for all the work you’re doing to help non-profits. If you are a data scientist looking to get involved in helping non-profits definitely reach out to Matthew, he can help steer you in the right direction. Matthew thank you again for your time today.
Matthew Dubins: Oh, you’re so welcome, and thank you, Mike.
Mike Delgado: You’re welcome, take care.
Matthew Dubins: You too.
Besides his wife and their two young daughters, nothing brings Matthew Dubins greater joy in life than making connections and helping charitable organizations realize their full potential. It’s been both his life’s work and his passion. That’s the inspiration he brings to work for Donor Science Consulting, and yet his path to the present has been anything but conventional.
At York University, on his way to his two degrees (a Bachelor’s in Psychology and Sociology and a Master’s in Experimental Psychology), Matthew developed an interest in what motivates humans. Combining that interest with scientific thinking and methodology, he discovered both a passion and a mastery for data mining, which led him to work with the Canadian Breast Cancer Foundation, a place he credits with introducing him to the interesting and instructive world of non-profit fundraising.
From there, Matthew went on to KCI as a consultant. The years he spent with KCI provided him with an invaluable foundation for where he is today. There he learned about meeting clients’ needs in terms of data discoveries and putting those discoveries into action.
More recently, Matthew has worked with Blakely and Cornerstone, honing his understanding and methods of employing data for the benefit of non-profits. Ready to spread his wings, Matthew proudly brings his experience and knowledge to work for Donor Science Consulting.