Computer Vision: Teaching Computers to See w/ Dr. Ryan Compton at Clarifai @rycpt (Episode 8) #DataTalk


Every week, we talk about important data and analytics topics with data science leaders from around the world on Facebook Live. You can subscribe to the DataTalk podcast on iTunes, Google Play, Stitcher, SoundCloud and Spotify.

This data science video and podcast series is part of Experian’s effort to help people understand how data-powered decisions can help organizations develop innovative solutions and drive more business.

To keep up with upcoming events, join our Data Science Community on Facebook or check out the archive of recent data science videos. To suggest future data science topics or guests, please contact Mike Delgado.

In this #DataTalk, we had a chance to talk with Dr. Ryan Compton about deep learning and ways to teach computers to see.

Here is a complete transcript:

Mike Delgado: Hello and welcome to Experian weekly data talk, a show featuring some of the smartest people working in data science. Today, we’re talking about the importance of computer vision, and ways to teach computers to see. We’re very excited to feature Dr. Ryan Compton who is the head of applied machine learning at Clarifai. Clarifai is a company that helps businesses get more value out of their images and video. To learn more about them, check out the URL on the screen. Clarifai has clients like Unilever, West Elm, BuzzFeed, Photobucket, etc. Make sure you check them out.

Ryan earned his PhD in applied mathematics from the University of California, Los Angeles. Ryan, it’s an honor to have you in our chat today.

Ryan Compton: Hey, thanks for having me.

Mike Delgado: I thought it would be great if you could share a little bit about yourself. What led you down the path to start to work with computers and helping with machine learning, to teach computers to see, and also about your work at Clarifai.

Ryan Compton: Yeah. My background, as you said I did a math PhD years ago. When I was working on that, after school I had to get some type of job, and there was this event around 2012, 2013, where I think society had renamed applied mathematics and statistics as data science.

I started looking for work in data science, and I found some really cool opportunities around optimization around image processing. Simultaneously, neural networks took over. A lot of the work I did with optimization, a lot of the image processing I had been doing, fit really well with the kind of stuff that people have to solve to make a neural network work really well.

My job at Clarifai right now is to improve our neural networks as much as we can by improving the data that they use to train. To keep us competitive, it’s really important to make sure that we train on data sets that are not just the public ones everybody’s got available. That’s what I think about now and it’s actually a lot of fun. More often than not, the technology in today’s world is so good that it surprises me, and I do things which I never even thought were possible in the past. It’s a great experience working in such a high tech field.

Mike Delgado: That’s awesome. I was watching a bunch of your lectures on YouTube and I loved hearing your story on how you trained a model to help you monitor your baby in the crib. Could you share a little bit about that story? Because I thought it was a beautiful example of how computer vision technology has helped you as a dad.

Ryan Compton: Yeah, yeah. Actually, about a year ago, there was a period of a month or so where my baby was staying with his grandparents, my in-laws, in Los Angeles, and I live in New York. He was visiting them, and I missed him, so I wanted to watch him in the crib and see when he wakes up and see what he’s doing, and when he goes to sleep. I bought a Nest Cam, and we installed the Nest Cam at the house, and the Nest Cam was doing an all right job. I could log into the internet and I could see what he’s up to and he could talk and I would hear, and it was okay.

But the problem was that most of the time the crib was empty and it’s not that interesting. It’s just like an empty crib, nothing’s going on, and I just didn’t want to look at that. I wanted to get some kind of notification when he’s in there, when he’s doing something.

Nest has this setting where you can tell it motion was detected, and when we see motion detected it’ll send you a notification on your phone, look at the camera, something’s happening, so I turned that on, and I would get a ton of results. I would get motion is detected in the bedroom, look at these candles. I would see motion is detected and then I would wonder, okay, what’s going on in the baby’s room, but what do I want to see? Then, I don’t know if I can show you on my screen, but there’s like a door behind the crib, and I would get in the words and then it’s my father-in-law walking around in a towel. Then it was like, oh, I get notifications all the time every day. It’s just way too sensitive and it’s not the kind of thing that is really pushing the limits of technology.

What I did was, I figured okay, computer vision can probably solve this. I took a Raspberry Pi instead of a Nest Cam and I hooked it up and I was sending every few seconds a photograph to a neural net that I built using my company’s platform. What it would do is when the baby is sitting up or sleeping or standing or out of the crib, it would be able to tell me what he’s up to. Then I could actually see, all right, the baby went to sleep, or I could see oh, the baby just stood up, and then I could turn on the camera and look at him. It was a lot more fun than the other solution I had.

I thought that was pretty cool. I think that putting machine learning in baby trends is pretty great. I’m still learning to hear little, process all kind of diseases, but I think he’s okay with it now.
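
Editor's sketch: The notification logic Ryan describes (classify a frame every few seconds, alert only when the baby's state changes) might look roughly like the Python below. This is an illustration, not his code: the label names, the `state_changes` helper and the `min_repeats` debounce are all assumptions, and the per-frame labels are taken as given rather than produced by any real classifier API.

```python
from typing import Iterable, Iterator, Optional


def state_changes(predictions: Iterable[str], min_repeats: int = 2) -> Iterator[str]:
    """Yield a label only when it becomes the new stable state.

    A label counts as stable once it has been seen `min_repeats` times in a
    row, which filters out one-off misclassifications (an assumed debounce).
    """
    stable: Optional[str] = None      # last state a notification was sent for
    candidate: Optional[str] = None   # label currently building a streak
    streak = 0
    for label in predictions:
        if label == candidate:
            streak += 1
        else:
            candidate, streak = label, 1
        if streak >= min_repeats and candidate != stable:
            stable = candidate
            yield stable


# Hypothetical per-frame labels, one every few seconds.
frames = [
    "empty", "empty", "sleeping", "sleeping", "sleeping",
    "standing", "sleeping", "sleeping", "standing", "standing",
]
notifications = list(state_changes(frames))
print(notifications)
```

Compared with raw motion detection, the stream of alerts here is driven by what the frame shows rather than by any pixel change, so a single stray "standing" frame in the middle of a nap produces no notification.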

Mike Delgado: Yeah, that’s awesome. I love that, how you use computer vision just to help watch your baby, I think that’s so cool. You know, we hear stories all the time about computer vision being used in cars, and we’re seeing leaps and bounds in what’s happening with automated vehicles and how vehicles can see a rock in the road versus a paper bag in the road. I’m curious about, what are some of your favorite use cases for computer vision from what you’ve seen, especially the work that you’ve done at Clarifai?

Ryan Compton: Yeah. The really interesting stuff that we do here, I would say, is very basic content moderation. There’s a ton of problems when people are able to sell things, like on Craigslist or any other site where there’s marketplaces, people always try to sell guns and drugs and they put up spam and porn and they always do it and it’s really easy to just immediately filter that with a computer, versus having some humans do that all day long. There’s tons of places where right away you see a job which is not that fun and right away automate it. I think that’s really interesting. For that stuff, it’s very fun.

Then for other kind of problems that we work on, I really actually think some of the stuff we do, like the baby cam, where we have a device that is sort of a security camera, or monitoring some kind of zone, becomes a lot more perceptive when you’re using actual computer vision rather than just simple image processing to get a look at something as well. We’ve worked with people who have security cameras around buses or security cameras in other settings, and rather than just tons and tons of empty blank video we see a lot of really exciting events and only that, which is great.

Mike Delgado: Awesome. Also, I was watching one of your lectures and you were, I think this was more recent, you were talking about how you were helping systems identify not safe for work imagery, and obviously that’s something that’s really important for businesses that are trying to make sure that the user generated content on their blogs or websites or whatever is being managed properly. I’m wondering if you can kind of share a little bit about that tool and how you guys developed that.

Ryan Compton: Yeah, yeah, for sure. That’s a really interesting project because there’s a lot of computer vision that you can do sort of out of the box by following an academic paper and downloading an academic data set, and then you’re ready to go. For moderating not safe for work photos, there’s not really an academic data set that you can just download and then set it up and send it out to customers. The first step in actually building that product was getting huge amounts of data from a not safe for work category, and then getting a lot of data from the safe for work category. Then you just take the not safe for work data and the safe for work data, combine it together, and fit it with our neural network. What you’ll get is a function that will predict which category you’re in.
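
Editor's sketch: The "combine the two collections, fit, get back a prediction function" step can be illustrated with a much simpler model than a neural network. The Python below swaps in a nearest-centroid classifier over made-up 2-D feature vectors; the function names, labels and data are hypothetical and only meant to show the shape of the workflow.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit(nsfw: List[Vector], sfw: List[Vector]) -> Callable[[Vector], str]:
    """Combine the two collections and return a function that predicts a category."""
    # Merge both collections into one labeled training set.
    labeled = [(x, "nsfw") for x in nsfw] + [(x, "sfw") for x in sfw]
    centroids: Dict[str, List[float]] = {
        label: centroid([x for x, y in labeled if y == label])
        for label in ("nsfw", "sfw")
    }

    def predict(x: Vector) -> str:
        # Pick the category whose training centroid is closest to x.
        return min(centroids, key=lambda label: squared_distance(x, centroids[label]))

    return predict


# Hypothetical feature vectors standing in for images.
nsfw_examples = [(0.9, 0.8), (0.8, 0.7), (0.7, 0.9)]
sfw_examples = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.2)]
predict = fit(nsfw_examples, sfw_examples)

print(predict((0.85, 0.75)))
print(predict((0.15, 0.2)))
```

Note that `predict` can only ever answer with one of the two categories it was fit on, so an input unlike anything in either collection is still forced into one of them. Whatever was collected defines what the function can say.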

Now, the big issue is that the data that you get is going to be biased towards the type of data, the predictions that you make are going to have a bias towards the data that you collect. When I went out to build the system, I thought I found what was not safe for work because in my mind I was thinking, this stuff is not safe for work. Then I thought, here’s other stuff that’s safe for work.

Then I built this thing, it worked really well, I sent it to customers and they come back and they say actually, no no no, our editorial policy is completely different. Here’s this picture. It’s weird stuff, especially with this model. They have these illustrations and all kinds of stuff which I had not really thought about before. It’s completely different than the data that I collected, and then I have to iteratively go back and forth and say okay, why did my network say this about this image? How do I debug it? Because the type of things that are in the wild are never the things that you collect when you build your network.

For this problem it was really obvious that was the case. Every time that we interact with someone we always have to do some kind of customization to really make sure that the predictions we make will align with their editorial policy.

With that I think it’s a very subjective problem. With something like, is this a picture of a car or is this a picture of a cat? Some things are cars, some things are cats. It’s usually pretty clear, but when it’s something contextual, it’s always a little bit harder and I think that problem is a lot of fun for that reason.

Mike Delgado: Yeah, it’s so funny that you mention about, as you’re going through data sets to train the machine on what to look for, there’s all these things, like you mentioned, in the wild, that you never expect. Like you said, maybe graphics or things that you had not planned for. It just shows to the point that automated object detection is very difficult for computers, especially because when it’s in the wild you don’t know what people are going to be submitting, what types of images, media, etc.

Can you talk a little bit about how data scientists are training systems to recognize patterns in pixels, to identify these unwanted images, whether there’s nudity in comments, like the stuff you guys did for Disqus. I’m just kind of curious the steps you guys took to do that, because it’s really amazing and remarkable the things that you guys are doing at Clarifai.

Ryan Compton: Yeah. The way that we handle stuff in the wild that we haven’t seen before, or even the way somebody handles stuff in general, there’s this framework of machine learning called transfer learning, where we re-use a lot of an existing network and then you just adapt the last part of it to fit the taxonomy of the data set that you’re interested in. When we’re working with somebody who’s got wild images totally different than what we trained on, we actually have a product right now as well as a team of people who curate data that will help kind of get a data set good to do what we call custom training and make a classifier that works specifically on that data set. When somebody comes in and all they have is illustrations, we can customize our model to work best on that data.

When we worked on that Disqus hackathon, I wasn’t involved with that. I think they used out of the box our solution for not safe for work, which works pretty well. This, the wild problem, I don’t think they get it as severely as some of the other people we run into, but there’s definitely situations where somebody will have like, a dating website, and everybody is one ethnicity and because everybody in that group, just the underlying distribution of images there, is kind of different from all of the training images that we looked at, we sort of have to customize for that.

The way we actually do it is, as the customers send in those data we have a team of people that curate the data and then will adjust the last part of the neural network to work on that data. We use transfer learning, we call it custom training, we have a fantastic user interface that people can click on things and search, it works really well. I’m very impressed with actually some of the results we get with very minimal training time.
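
Editor's sketch: Transfer learning as described above (keep the existing network, adjust only the last part on the customer's data) can be shown in miniature. In the Python below, a tiny fixed ReLU layer stands in for the pretrained base and a logistic-regression head is the only part that gets trained. The weights, data and function names are all hypothetical; this is not Clarifai's custom training implementation.

```python
import math
from typing import List, Sequence, Tuple

# Frozen "base": stands in for the pretrained layers. Never updated below.
BASE_WEIGHTS = ((1.0, -1.0), (-1.0, 1.0), (0.5, 0.5))


def base_features(x: Sequence[float]) -> List[float]:
    """Frozen feature extractor: ReLU(W x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BASE_WEIGHTS]


def sigmoid(z: float) -> float:
    # Written to stay numerically stable for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(examples, labels, lr: float = 0.5, epochs: int = 300) -> Tuple[List[float], float]:
    """Fit only the last layer (logistic regression) on top of the frozen base."""
    feats = [base_features(x) for x in examples]  # base is frozen, so compute once
    w = [0.0] * len(feats[0])
    b = 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(wj * fj for wj, fj in zip(w, f)) + b) - y
            grad_w = [g + err * fj for g, fj in zip(grad_w, f)]
            grad_b += err
        # Gradient step on the head parameters only.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(head: Tuple[List[float], float], x: Sequence[float]) -> float:
    """Probability of label 1 for input x, using frozen base + trained head."""
    w, b = head
    return sigmoid(sum(wj * fj for wj, fj in zip(w, base_features(x))) + b)


# Hypothetical customer data: label 1 when the first coordinate dominates.
examples = [(1.0, 0.2), (0.9, 0.1), (0.8, 0.3), (0.2, 1.0), (0.1, 0.8), (0.3, 0.9)]
labels = [1, 1, 1, 0, 0, 0]
head = train_head(examples, labels)

print(predict(head, (0.9, 0.2)) > 0.5)
print(predict(head, (0.2, 0.9)) < 0.5)
```

Because the base never changes, its features are computed once and only a handful of head parameters are fit, which is one reason this style of training can get by with far fewer examples than training a whole network.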

Mike Delgado: Ryan, I’m curious about the data set sizes that you need to help with training these models. How big are these media data sets?

Ryan Compton: Yeah, to train a whole neural network from scratch, if you don’t have a preexisting network, you can get pretty good results whenever you have more data. I think if I have less than a million images, I get a little bit nervous that it’s going to potentially not work very well. There’s a useful paper out of Google, I think it’s called “The Unreasonable Effectiveness of Data Set Sizes,” or something like this, and they had 300 million Google images. They trained the network on them, and they see it just keeps getting better and better and better.

This is really interesting because a lot of machine learning, if you go back in 10, 20 years, people talk a lot about model capacity, they talk about how adding more and more data, you don’t really necessarily get better results when you hit model capacity. There’s an interesting paper by Corinna Cortes about this where they investigate how much they ultimately benefit later on, but with neural nets, you just always get better. It seems like nobody ever hits their full capacity no matter how much you throw at it.

That being said, if you’re doing transfer learning and you can really use an existing network for your base, because the base is just going to find things like edges and corners and circles and you always have to be doing that stuff, you can get better with much less. Our custom training platform, it will work very well with a few dozen images, and that’s what you have to do with Google byte. You can do a lot of tricks, so you use less video, but if you’re going to train a whole neural network, probably like a million images.

Mike Delgado: I was reading through a lot of the use cases that Clarifai has done to help different industries, and one of my favorites was the one where Clarifai has helped doctors diagnose patients with visual recognition. I think that’s a beautiful example of how computer vision is helping people who are maybe suffering from an illness, maybe blindness. I’m kind of wondering if you can talk a little bit about that particular use case, because I just love when data is being used for good in that way.

Ryan Compton: Yeah. The use case right there, I think the one we talk about the most which is probably the most interesting is with a company called i-Nside. What we have is a special camera, and this camera attaches onto an iPhone, and our network runs on the phone. There’s no connection to the internet, because they take this camera, they put it in people’s ears, and it’s used actually in sub-Saharan Africa right now, where they go into their ear, they take a photo inside of the ear, and then if it has this disease, I don’t know what the disease is but if there is a disease there it’s pretty obvious, that the rim is very gunky, or the ear is like okay. They can actually do this diagnosis without ever talking to a service on the internet, without ever actually even having to bring a doctor to look, you just have somebody go, take a camera, put the camera up to somebody’s ear, it takes the photo and prints the diagnosis. We’ve been working with them for a number of years now and it’s been great. It works really well.

There’s a few other places where we’re looking into not so much medical diagnostics, more like, kind of organizing archives of images for people. To actually get something like an MRI predicting is really specialized and it’s not a space that we’re in, but definitely organizing archives of data is something that we have done, and I’ve helped staff with training. With some of these practices we do dental x-rays, they can only make x-rays of this part, or if somebody’s interested in shoulder MRI versus head MRI, if there’s a large archive of images we can really easily get some of that.

Mike Delgado: I mean, I just love those examples of computer vision helping people who are disabled. I just have a lot of hope for the future, especially with the work you guys are doing, I think it’s beautiful.

Ryan Compton: Yeah. There’s a lot of really cool stuff to do, especially with finding objects and helping people who can’t see well. I think there’s a huge potential there. Self-driving cars, all kinds of stuff, is going to make life a lot better for a lot of people.

Mike Delgado: Ryan, do you have any concerns about computer vision being used for wrong purposes and any suggestions on how data scientists can help combat wrong usage of computer vision?

Ryan Compton: There’s definitely concerns about it. I think the way that, the thing that makes machine learning so exciting to me is if you’re a machine learning engineer and you make a decision as to what data you use to train a model or what type of, where you’re going to deploy a particular use case, you now get to use GPUs and all of this fantastic infrastructure to amplify your decision thousands of times over.

If I decide that something should be predicted a certain way, it’s not like, I can only say that about 20 things, I can actually now declare that and make that call 10 million times. There’s definitely concerns because we can do all kinds of sneaky stuff, or support all kinds of maybe unsavory people much more easily because your decisions can be so greatly amplified.

That being said, there’s a lot of really cool things that people are still doing. Data science and things like computer security are huge. There’s a lot of investment right now from DARPA and various agencies, not necessarily computer vision, but understanding attack vectors, how they’re coming into an organization and just automatically finding them and fighting them and stopping them. If you ever talk to somebody who works with computer security, there’s so many attack vectors everywhere that if you can’t quickly automatically identify them, you can have a serious, you can have problems. Pretty much every type of attack, statistically, once you can start to see it you can probably put together some sort of fix, but identifying attacks is a real way that data scientists can help influence.

Mike Delgado: Ryan, when a business is looking to choose a visual recognition API, what advice would you give that team or those senior leaders in that decision process, and also what would be some warning signs that they should look out for before selecting a company or API?

Ryan Compton: You have to make sure that what you’re going to work with can adapt. One of the hardest things, I think, about throwing AI into a system is that if the AI that you’re using isn’t going to change and things change for you down the line, everybody is going to change down the line somehow, you’re going to need to change the AI if you change your taxonomy or if you change what you’re doing.

Of all the different vendors out there I think one of the most important things you look for is somebody who’s going to very easily change the predictions that they’re making to adapt to new things that happen to you. If there’s a warning sign, I think you’ll see pretty early on when you’re talking with them that what they’re doing is completely set in stone, that can be pretty obvious and some people just will not interact with you at all.

If you use a model from Google, or Microsoft, or Alibaba, or something like this, I think Microsoft has custom training now, but definitely a lot of people have models that are up there and they’re just completely fixed, or they need to be dynamically changed and that’s how they get control. It can really be a drag if it turns out that the kind of stuff that you need to recognize down the line doesn’t really mesh with what they’re capable of.

Mike Delgado: Ryan, we are coming close on the hour. I wanted to ask you just one more question and that is around computer vision and the work that you guys are doing. What really excites you about the future of computer vision and teaching computers to see better like us?

Ryan Compton: Yeah. I think one of the things that’s really exciting right now in the field is there’s this interesting kind of border between what’s easy for a human and what’s easy for a computer. A lot of stuff, especially in content moderation, is extremely simple for a human, because you’re going to look at a photo and you’re going to have lots and lots of context around that photo. You’re going to see that this is an upsetting photograph because I know this political situation, I know what’s happening here, and because of your culture, because you understand things like gravity, there’s a ton of things that happen that make a lot of photos really interesting.

The kind of things that win photography contests have all of the context which makes them really work. Computers have no idea about that stuff, so if you’re trying to use computer vision to understand photos which are interesting for reasons beyond just the pixels, you’re kind of stuck.

There are a few people who are kind of combining the images with some text, trying to look at understanding photos beyond just the pixels, understanding the additional context that goes into a photo. That’s really exciting, because right now that’s something where humans definitely win. We know so much more about photographs when the photographs are interesting. When the photograph is just a cat, I mean, cats are kind of interesting but the computer knows it’s a cat. I think that this extra level of understanding context is a real exciting thing which over the next several years I think is going to make a lot of progress.

Mike Delgado: Awesome. Well, Ryan, it’s been an honor talking with you. We’re really grateful that you have shared your insights with us on computer vision today in our data talk. Where can everyone learn more about you and also your work at Clarifai?

Ryan Compton: Me personally, I have a blog,, that’s like, everything about me that is public is there. I guess you can find out my work at Clarifai, we do have a website,, there is a blog there which I post on sometimes. Both of those websites will get you everything that you need to know about me.

Mike Delgado: Wonderful. If you happen to be watching this as a later broadcast on YouTube, we’ll have links to the places where you can find Ryan online, on Twitter, LinkedIn, and also his blog. Also, if you’re interested in learning more about upcoming data talks, you can always go to and you can see past data science videos as well as upcoming ones. I want to thank Dr. Compton for his time. Ryan, thank you so much. It’s been a pleasure chatting with you and I hope you have a great rest of the week.

Ryan Compton: Thank you. It was great.

Mike Delgado: Great, thank you. For everyone who’s watching, thank you for watching today’s chat, thank you for the hearts and the likes and the comments. We’ll see you all next week. Take care.

About Dr. Ryan Compton

Dr. Ryan Compton is the Head of Applied Machine Learning at Clarifai and previously served on the research staff at Howard Hughes Laboratories.

In 2012, Ryan completed a PhD in Applied Mathematics — which involved studying sparsity promoting optimization in quantum mechanical signal processing.

Make sure to follow Dr. Compton on LinkedIn, Twitter and GitHub.

Check out our other upcoming live video big data discussions.