Exciting Applications of Computer Vision Across Industries w/ Matt Zeiler @Clarifai (Episode 38) #DataTalk

Listen to the podcast:

Every week, we talk about important data and analytics topics with data science leaders from around the world on Facebook Live. You can subscribe to the DataTalk podcast on iTunes, Google Play, Stitcher, SoundCloud and Spotify.

This data science video series is part of Experian’s effort to help people understand how data-powered decisions can help organizations develop innovative solutions.

In this week’s DataTalk, we chat with Matthew Zeiler, Founder and CEO of Clarifai, an artificial intelligence company that excels in visual recognition, solving real-world problems for businesses.

Here’s a full transcript:

Mike Delgado: Hello, friends. Welcome to our weekly #DataTalk, a show where we talk with data science leaders from around the world. Today, we’re talking about exciting applications of computer vision across various industries. We’re super excited to have Matt Zeiler, who is the Founder and CEO of Clarifai, which is an artificial intelligence company that excels in visual recognition and solving real-world problems for business. Matt, thank you so much for being our guest today.

Matt Zeiler: Thanks for having me.

Mike Delgado: Can you share with our data science community how you started, where your love of data science came from and your academic journey?

Matt Zeiler: I grew up in Canada in a small farm town. I always got an appreciation for building things, working with my dad, building the garage, and pouring concrete and stuff like that. That’s what triggered me to take engineering in undergrad at the University of Toronto. They have this program called Engineering Science where the first two years you take every discipline of engineering and you get to choose which option you want to specialize in.

When I was deciding between options, I went for advice to the resident adviser on the floor. And he happened to be one of Geoff Hinton’s Ph.D. students. Geoff is super famous in the world of AI. A lot of people call him the Godfather of AI, and he was a U of T professor at the time. And his student showed me this video, a flame flickering, and he said it was completely generated by a neural network.

That’s a type of artificial intelligence. And I was blown away, because I knew how to program, but there was no way I could write loops or functions or anything like that to generate a realistic video. I had to learn more. And that took me into the computer option. I ended up taking Geoff’s intro course in third year and then did my undergrad thesis with Geoff, which was quite the honor to get to work with him. And since I loved building as a kid, I knew I wanted to start a company someday, but even now that I had a taste of machine learning, I didn’t know enough to build my company around it.

That’s what inspired me to get a Ph.D. I came to NYU to focus on that. And I got to work with a couple more experts in the field, like Yann LeCun and Rob Fergus. Very fortunate to learn from some of the experts before starting Clarifai four years later. The time frames here are 2005 to 2009 was undergrad. 2009 to 2013 was my Ph.D. So before this huge explosion of AI today, I got a taste of it very early on.

Mike Delgado: That is so cool, and I love hearing your background with farming, making things. Were you always a builder? Like tinkering with things?

Matt Zeiler: Yeah, and my dad’s not a farmer. My mom’s not a farmer. My dad’s a doctor; my mom’s a nurse. My brother is now a doctor. So my whole family was into medical. I did premed, but what I didn’t like about some of the stuff that you have to learn around bio and chemistry was it was a lot of memorization. Whereas the physics and calculus was a lot of tools that you could use to build. And that’s what triggered me to shift into engineering.

Mike Delgado: That’s interesting. So, you said you were very fortunate to be under amazing mentors in AI and machine learning who taught you a lot. What was it that drew you to the Ph.D. program?

Matt Zeiler: Yeah, and I skipped directly from bachelor’s to Ph.D., so I didn’t have an intermediate master’s along the way. That was enticing. I wanted to do the deep dive into research and what you get out of the Ph.D. And at NYU in particular it worked out very well, because you didn’t have to do, as a student, a full course load every year.

All you had to do was four courses for your entire Ph.D. I got those done in the first year, which allowed me … Even starting the first year on research, and the most important part about getting a Ph.D. is that focus on research. Because if you think about it, you’re expected to be innovative. And when you’re innovating, that by definition means that the stuff you would learn out of a textbook is already obsolete. The new stuff isn’t written on paper yet, so having that opportunity to do the research was really valuable in the Ph.D.

Mike Delgado: I’m not sure about the Ph.D. program you were in, but did you have to write a thesis? And what was that on?

Matt Zeiler: Yeah, I had to write a thesis. And it was on hierarchical models for understanding images. It was all focused on image understanding. We called it that kind of abstract term because it covers two branches. One is what people call supervised learning, which is great because your images have already been labeled by people as containing a dog, cat, tree or whatever it might be.

And then another branch, which is unsupervised learning, which is where you’re trying to learn directly from just the pixels, and there’s no labels provided by people.

And that is a much more difficult problem, because there’s no context. It has to learn it all on its own. And so I spent the first three years focused on that hard problem. And as soon as I started working on the more constrained kind of supervised learning, that’s when I got really good results really quickly, throughout my third year. And that became the basis for Clarifai.
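To make the distinction between the two branches concrete, here is a minimal sketch in Python. It is not the method from the thesis or anything Clarifai ships: it uses a nearest-centroid classifier as a stand-in for supervised learning and plain k-means as a stand-in for unsupervised learning, and the 2-D points are made-up "image features" chosen only for illustration.

```python
import math
import random


def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def fit_supervised(points, labels):
    """Supervised: human-provided labels say which points belong together."""
    names = sorted(set(labels))
    centers = [
        centroid([p for p, label in zip(points, labels) if label == name])
        for name in names
    ]
    return names, centers


def fit_unsupervised(points, k, iterations=10, seed=0):
    """Unsupervised (k-means): group points using only their coordinates."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the previous center if a group ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Hypothetical features: two well-separated groups of points.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

names, centers = fit_supervised(points, labels)
prediction = names[nearest((4.8, 5.0), centers)]  # a named class

clusters = fit_unsupervised(points, k=2)
cluster_id = nearest((4.8, 5.0), clusters)  # only an anonymous group index
```

The supervised model can answer with a name because people supplied the names. The unsupervised one can only say which group a new point falls into, which is part of why it is the harder problem.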

Mike Delgado: That’s awesome. What types of imagery or video were you analyzing back in grad school?

Matt Zeiler: There’s a few standard benchmarks. One of them is MNIST, which is really famous, very old, from the ’80s or ’90s, I believe. And it’s handwritten digits, so the black background and the digits are in white. So it’s kind of binary, and there’s an extension of that to color, regular images called CIFAR-10. Both of those data sets are very small images, which means that you can train smaller models and do experiments much faster. That’s really helpful when you have a new moonshot idea and you have no idea if it’s going to work. You can get feedback on that very quickly.

Then some of the more serious data sets were Caltech 101, which are regular-sized images that you can see, as a human, all the details in. And then the most important data set was ImageNet. And that revolutionized the whole field of AI really, because it was much, much bigger. We’re talking about 1.2 million images in ImageNet versus 3,000 images in Caltech 101. So orders of magnitude bigger, and that pushed people to do a couple things. One was train much bigger models to be able to learn from all that extra information and to optimize the frameworks we were using to be able to train much bigger models on much more data. And that’s where things like graphics cards became the standard.

There was no way you could do that type of heavy lifting with regular processors that we have in our computers. We needed to use the equipment that’s normally meant for video games to start tackling these bigger machine learning problems.

Mike Delgado: Today, we’re talking about how there are so many different applications for computer vision across different industries. And I think it’s interesting that your dad and your brother are both doctors in the medical field. And I’ve been reading articles about how machine learning and computer vision are drastically helping doctors find things like cancer within MRI scans faster. Can you talk about what … because it’s just interesting to me that your dad and your brother are both in the medical field. You had a premed interest originally. I’m curious about your view of how computer vision’s being used in the medical field today, even though that may not be your specialty right now.

Matt Zeiler: We do have a customer in the medical space, and they’re doing what I think is really important in medicine right now, which is innovating on the hardware side of things. And they made a simple piece of hardware that goes on the back of your cellphone. It’s a little lens that allows doctors to see inside your ear.

That replaces a piece of hardware that’s a filing cabinet in size with your own cellphone. And it includes an app that lets doctors diagnose a disease in a nice touch interface. And what that has done is collect a bunch of data from doctors, from the experts, that Clarifai can use to train the model. And now that the model’s trained, they are automating that process. No longer does that doctor have to see every single patient. They can scale out that effort. They’re seeing incredible results on it. The most exciting thing is that with Clarifai’s SDK, now we have APIs in the cloud that we host and run for you, but we also have an SDK that lets you run disconnected from the internet. Now they can take [inaudible 00:10:07] part of the world and provide medical care, which is super exciting.

To me, that’s more exciting than understanding MRIs and CAT scans, because those are million dollar machines that are the size of a room. Which are great, and I think there’s going to be lots of advances there, but they’re not the scale of the applications that we love here at Clarifai.

Mike Delgado: That’s so cool. Any example of the product that you worked on to help doctors with looking into the ear and finding problems? You had all this training. How much training data did you need before the model was working for you?

Matt Zeiler: In that particular case, there were 10 different diseases we were trying to distinguish between. And the total set of images that were labeled was about 75,000.

Mike Delgado: Wow.

Matt Zeiler: And when you think of having the device in a doctor’s hand, many doctors’ hands, their normal day-to-day would collect 75,000 images pretty quickly.

Mike Delgado: That’s amazing. So tell me about your company, Clarifai. When did you start dreaming up this company? And can you talk about this journey to starting Clarifai and then the work you are doing now and the companies you’re helping?

Matt Zeiler: As I was doing my Ph.D. and started working on these classification problems, at the start of 2013, I started having some really good results. And the way I knew they were accurate was the measure that we normally track as we’re doing experiments. But I built a simple demo where I could upload my own personal images, or images from the web, feed them into the model and look at the results back, and they were useful. And this was the first time I saw computer vision be meaningful and useful. That was in the spring of 2013.

I went back to Google for a second internship, with Google Brain. I had a great learning experience, because I was on their machine learning team, led by Jeff Dean, and I happened to be an intern directly under Jeff. That was a great learning experience because he’s famous for scaling up a lot of the core infrastructure at Google, and he’s now the head of Google AI. I learned a lot from him, but I ended up leaving that internship two weeks early to come back to New York and start working on Clarifai.

I realized that the tech I had working at NYU was better than what Google had internally. And Google at the time was already ahead of everybody else in terms of AI, so I knew there was an opportunity to get this technology out there directly to developers and enterprises to let them have access to this cutting-edge technology. And not just let [inaudible 00:13:09]. That was what inspired Clarifai.

I started working on it in August and ended up getting a lot of job offers from Google, Microsoft, Facebook and Apple, because they knew I was graduating soon. And it was a tough decision, because, you know, those companies are great companies, great culture, and that’s risk-free with these offers that I had versus my childhood dream and technology I knew was going to be better than what they had. I followed my gut and officially incorporated in November 2013. Finished my Ph.D. at the same time; got that done. Then this annual competition around the ImageNet data set is held toward the end of each year, and I submitted the results of the models I was training to ImageNet 2013. And about three weeks after incorporating, the results came out, and Clarifai won the top five places. It was a great way to kick off with the world’s recognition. And that triggered inbound interest from customers, investors, and that’s what really got Clarifai off the ground.

Mike Delgado: That is so cool. Tell me about the different types of companies, industries, you’ve been working with to help them with computer vision.

Matt Zeiler: Intentionally, we’re building a platform. We want everybody to get access to this technology in its rawest form. And that’s why we started with an API. Over the years we have evolved to also offer SDKs that people can embed directly into their mobile applications and even into on-premise servers now. And that lets us run the technology, but it doesn’t really cover two important things. What verticals do we target? And what use cases are people going to find useful from this technology? We’ve learned those over the years. We’ve been very fortunate to build a platform first, and because of early press and interest from key investors, we’ve had a lot of inbound interest from customers and educated ourselves on what they would find useful from our technology.

One example of those is organization. Companies like Trivago, for example, you see their hotel listing ads on TV all the time. They claim to have over 10 million hotels across the globe listed with Trivago. And each of those has many different images. And so if you’re planning your next vacation and you go with Trivago, maybe you want to find the hotel that has the best pools. Now that Clarifai has processed those images, we’ve organized them by pool and ocean view or bedroom shots so that when you’re planning that vacation, you can see those things and compare things much quicker. That was one organization use case.

On the visual search side of things, this is a new experience where people can upload an image and find visually similar images. And we just launched with homes.com in the real-estate space to offer visual search. You can upload a picture of a house you like and then find ones that are listed for sale that look similar. And then an extension of visual search is not just looking at that one image to find similar images. It’s to look at a group of images, or a history of images, to learn what the user cares about, their preferences in the world, and recommend things that they might be …

So that’s what West Elm is doing for furniture shopping. And they did this cool integration. You can Google and find it. It’s called Pinterest Style Finder by West Elm. They let you take a Pinterest URL, a board that captures your interests, and you drop it into the West Elm style finder, and it takes all of those images and processes them by Clarifai. And you select what room in the home you’re shopping for, like the living room, for example. And then you hit “Submit” and it comes back with products out of West Elm’s catalog that you might be interested in buying. Lots of fun stuff like that people are doing across the board and in different verticals.
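The visual search and history-based recommendation ideas described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Clarifai’s or West Elm’s implementation: it assumes each image has already been turned into a numeric embedding by some model, and the catalog names, vectors, and the `search` and `taste_vector` helpers are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query, catalog, top_k=2):
    """Visual search: rank catalog items by similarity to one query embedding."""
    ranked = sorted(catalog, key=lambda name: cosine(query, catalog[name]), reverse=True)
    return ranked[:top_k]


def taste_vector(history):
    """Summarize a group of images (e.g. a board) as the mean of their embeddings."""
    n = len(history)
    return [sum(vec[i] for vec in history) / n for i in range(len(history[0]))]


# Hypothetical 3-D embeddings; a real model would produce far more dimensions.
catalog = {
    "mid-century sofa": [0.9, 0.1, 0.0],
    "mid-century armchair": [0.8, 0.2, 0.1],
    "rustic farmhouse table": [0.1, 0.9, 0.2],
    "industrial floor lamp": [0.0, 0.2, 0.9],
}

# One uploaded photo: find the items that look most like it.
query = [0.85, 0.15, 0.05]
similar = search(query, catalog)

# A whole board of saved images: recommend from the averaged preference.
board = [[0.2, 0.8, 0.1], [0.0, 0.9, 0.3], [0.1, 0.7, 0.2]]
recommended = search(taste_vector(board), catalog, top_k=1)
```

Averaging is the simplest possible way to turn a history into a single preference. It is used here only to show how search over one image extends to search over many.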

That’s why when we think about who we work with, we think about the use case rather than the vertical. Because that use case is usually present across all different verticals or many different verticals at least.

Mike Delgado: That sounds so cool. I’ve definitely gotta check out the West Elm Pinterest visual search recommendation engine. I didn’t know that it even existed. I wanna make sure to put it up on our blog.

Matt Zeiler: Awesome.

Mike Delgado: A lot of times when we talk about computer vision, the popular stuff that’s on the news is about how computer vision is being used in cars. They’re saying in the next 15 or 20 years, depending on how regulation goes, and as autonomous cars get better with computer vision, it seems like that’s going to be the future. What are your thoughts about that, and are you diving into computer vision with vehicles?

Matt Zeiler: Unfortunately, we’re not. We’ve decided at a company level to not focus on that. We think there’s a huge potential there. Don’t get me wrong. I think it’s one of the concrete and powerful use cases of AI, and I think it’s going to change the way we do transportation. But I do think there’s a lot of competition in that space, more than any, probably because the outcome is very ex … I think there is so much competition that every car maker is going to have their own AI team trying to do this. There’s these startups that are all trying to even build cars, some of them, or offer some slice of that technology. And all of the tech giants are also rumored to be building cars, including Apple, for example. There’s a lot of rumors about them getting into the automobile space. I think it’s super crowded, but we’re interested to take some of the early rides at least.

Mike Delgado: What excites you about the future of computer vision and the work that you’re doing right now?

Matt Zeiler: Just seeing what people can do on our platform is already exciting. And that’s why I love having the platform of raw APIs and making it super, super simple for people to build things. And we talk about APIs a lot, but we have user interfaces around them so you can log in to your web browser and even if you’re not a developer, know nothing about machine learning, you can upload images and teach it how to recognize things that you care about. Some of the examples are things that we as a company would never dream of.

One that I love talking about is at a hackathon, and we go to a lot of hackathons to drum up these types of ideas. At a hackathon, in 24 hours a team took pictures from social media that cover shots related to baseball in some way, and then they trained a classifier on our platform to recognize whether a ball was being caught. And then based on all the pictures where balls were being caught, they looked at the GPS coordinates of those images, and then they created a heat map over the stadium to show you where the best place to sit is if you want to catch a baseball.
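A rough sketch of how such a heat map could be built is below. It is not the hackathon team’s code. It assumes the classifier output is already available as a `ball_caught` flag per photo, and that GPS coordinates have been projected to meters from a reference point in the stadium; both the field names and the sample data are hypothetical.

```python
from collections import Counter


def heat_map(positions, cell_size):
    """Count how many positions fall into each square grid cell."""
    counts = Counter()
    for x, y in positions:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Hypothetical photos: position in meters plus the classifier's decision.
photos = [
    {"xy": (82.0, 14.5), "ball_caught": True},
    {"xy": (85.5, 11.0), "ball_caught": True},
    {"xy": (88.0, 19.9), "ball_caught": True},
    {"xy": (40.0, 60.0), "ball_caught": True},
    {"xy": (83.0, 12.0), "ball_caught": False},
    {"xy": (10.0, 95.0), "ball_caught": False},
]

# Keep only the photos where a catch was detected, then bin them.
catches = [photo["xy"] for photo in photos if photo["ball_caught"]]
counts = heat_map(catches, cell_size=10.0)

# The densest cell is the "best place to sit" in this toy data.
best_cell, best_count = counts.most_common(1)[0]
```

Each grid cell covers a 10 m by 10 m patch here. A real version would need enough photos per cell for the counts to mean anything.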

Mike Delgado: That is so cool.

Matt Zeiler: It is something that is very practical and we never would’ve dreamed up. And that’s why we love getting our technology out there, to people. I think the next future iteration of that is to start learning from more than just pixels, which has been our focus from the start. Largely because that’s what I focused on in my Ph.D. But we see a big opportunity to start broadening understanding to text, audio, and start fusing these understandings together. And we have some early research results that indicate that that’s going to be a really successful approach.

You can understand different types of things relating text and imagery together, for example. And a common use case I’d like to talk about with that is retail. Every one of us has shopped online, and we do the usual type into the search bar to find products. But when you go into those products, you don’t just look at the picture and then hit buy. You read the title. You read the description, user reviews; you look at the price, of course. A lot of pieces of data that Clarifai isn’t looking at today, that we’re excited to start learning about in the future to provide a much more holistic understanding of the world. That’s where the future of computer vision and AI, more broadly, is going to go.

Mike Delgado: That’s really cool. Can you give me an example? Like, I’m on Amazon, I’m searching for something, and right now you’re saying that the computer vision technology you’re using is analyzing pixels of the imagery to determine what it is. But now you’re wanting to bring in other data sources, like texts, maybe audio, to help the computer learn more about that image. So, in a shopping experience, how could bringing all these things together help the searcher or the shopper?

Matt Zeiler: For example, if we wanted to recommend you products, we wouldn’t be able to tell the difference if we just looked at the picture between one size versus the other. A small, medium or large T-shirt or something. There’s no way you can tell from the picture because it’s probably the same picture used across all these products. But it’s all obviously in the metadata attached to that product; it very explicitly says the size. So, very simple examples like that. We can learn from a user’s purchases, what size T-shirt. That and your style, and we combine multiple pieces of data to do that.
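The T-shirt example can be sketched as follows. This is only an illustration of combining two data sources, not a description of Clarifai’s research: size comes from product metadata, style comes from an image embedding, and every SKU, vector, and helper name below is made up for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def usual_size(purchases):
    """Metadata signal: the size the user buys most often."""
    return Counter(p["size"] for p in purchases).most_common(1)[0][0]


def style_vector(purchases):
    """Image signal: the mean embedding of the products the user bought."""
    vectors = [p["image_vec"] for p in purchases]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def recommend(purchases, catalog):
    """Filter by size (metadata), then rank by visual style (pixels)."""
    size = usual_size(purchases)
    style = style_vector(purchases)
    candidates = [item for item in catalog if item["size"] == size]
    return sorted(candidates, key=lambda item: cosine(style, item["image_vec"]), reverse=True)


# The striped tee uses the same photo for every size, so pixels alone cannot
# tell S, M and L apart; only the metadata can.
catalog = [
    {"sku": "striped-S", "size": "S", "image_vec": [0.9, 0.1]},
    {"sku": "striped-M", "size": "M", "image_vec": [0.9, 0.1]},
    {"sku": "striped-L", "size": "L", "image_vec": [0.9, 0.1]},
    {"sku": "plain-M", "size": "M", "image_vec": [0.1, 0.9]},
]
purchases = [
    {"size": "M", "image_vec": [0.9, 0.2]},
    {"size": "M", "image_vec": [0.8, 0.1]},
    {"size": "L", "image_vec": [0.7, 0.3]},
]

results = recommend(purchases, catalog)
```

Using the metadata as a hard filter is a design choice made for clarity. A learned model could instead weigh the two signals jointly.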

Mike Delgado: Very cool. Matt Zeiler, when you’re looking to hire people on your team, what skills are you looking for? Or what personality types are important for people that are looking to join Clarifai?

Matt Zeiler: We’ve got lots of different roles, but I think most relevant here is the data science and machine learning type of backgrounds. We have two teams at Clarifai. One is called the research team; one is called the applied machine learning team.

Those are the two machine learning teams. And research focuses on the problems we don’t have solutions for today, and that’s things like understanding text. That’s a great example of a research problem. Whereas applied machine learning takes those solutions out of research and makes them production quality. That involves collecting more data, tweaking final parameters and then deploying them to production so customers can access them. And having that split is good because the research team is typically those with backgrounds in machine learning, and they typically have Ph.D.s. And we’ve had a lot of success with master’s and Ph.D.s with math and biology or physics types of backgrounds that are excited to learn more about neural networks. We do that in a more applied fashion. That’s the two backgrounds we look for.

In terms of looking at résumés, more broadly, one thing I like to see is stuff outside of academia or work that you weren’t forced to do but you’re good at. Because that shows that you have passion in something. And that’s one of the key attributes of any successful worker here at Clarifai. They’re really passionate about what they’re doing, and they have that inner fire and drive. That’s something that we look for in their backgrounds. As a more practical piece of advice, if you’re going through school still, I think getting internships really changed my career too.

Learning from Google engineers, and the whole process they have for engineering, made me come back after the first internship and completely rewrite my research code base. I just became a much … that made me more capable. There was no way I’d be able to do as much at Clarifai if I hadn’t done those internships.

Mike Delgado: This brings us back to the beginning, and I found it so inspiring that you had this dream to start Clarifai. You could’ve gone a very comfortable and fun learning route of going to, like you said, getting offers from Google, Apple, Microsoft, which could offer you great pay. You’d be at a great company, great culture; you’d be learning from other people. But in the back of your mind, you had this drive to start your own company, and maybe there’s someone listening to today’s show who has that feeling. What would you say to those people who are listening in who are like, “I have these offers to join these different companies, but I want to start something new”? Do you have any advice or any encouragement for them on learning from the path you took?

Matt Zeiler: A couple things. Find a mentor if you’re starting your own company, because there’s a lot of things you just don’t know. I knew nothing but research at the start. And that helps you in no way starting a company. So find a mentor who’s done that before. And the second piece of advice is make sure you actually want to start a company, that you’re not just excited about the technology. Because as it grows, you’re not going to be writing code every day; you’re not going to be developing the technology. You’re going to be leading the team and inspiring them and motivating them to innovate and do more of that stuff that you originally did. Make sure you like the idea of having a company and having that type of goal long term.

Mike Delgado: What skills are important for somebody to be a good leader of a science company? Someone who’s able to motivate but also keep a highly productive culture?

Matt Zeiler: It’s a good question. I think any entrepreneur needs to be super focused and organized. Because, for example, there’s so much communication and information happening every single day that if you don’t write it down immediately and have it put in a way that you’ll be able to find that information later, it’ll be gone. You know, no matter how smart you are, it’ll be gone from your head 5 minutes later. That’s key; just staying organized is key. And then I think that a lot of the times leadership is best done by leading by example.

In the early days, you’ll have an opportunity to do a lot of different functions at the company, from doing the first sales calls to writing the first marketing website, to every division of the company, because there’s nobody else but you. You’ll have to do everything. Taking that on early and making sure that it’s really high-quality sets the bar a little higher for everybody and motivates them to do things in the future.

Mike Delgado: Nice. Matt, thank you so much for being our guest on #DataTalk. Learned a ton about Clarifai. For businesses that wanna partner up or learn more about Clarifai, where should they go?

Matt Zeiler: Clarifai.com. It has everything, including our blog, which always has new information.

Mike Delgado: For those listening to the podcast, it’s spelled C-L-A-R-I-F-A-I D-O-T C-O-M. Check out clarifai.com. For those that want to learn more about you and your work, where should they go, Matt?

Matt Zeiler: I do have a personal site as well: matthewzeiler.com. Although, most of my career now is Clarifai, so there are many more updates there.

Mike Delgado: Nice. Well, thank you so much for being part of #DataTalk. Thank you for sharing your story, your advice for data scientists and also for those that want to start up their own company. It’s been an honor having you as our guest. And looking forward to maybe chatting in the future.

Matt Zeiler: Yeah. Thanks so much for having me.

Mike Delgado: Thanks, Matt Zeiler. Have a good day.

About Matthew Zeiler

Matthew Zeiler, Founder and CEO of Clarifai, is a machine learning Ph.D. and thought leader pioneering the field of applied artificial intelligence (AI). Matt’s groundbreaking research in computer vision alongside renowned machine learning experts Geoff Hinton and Yann LeCun has propelled the image recognition industry from theory to real-world application. Since starting Clarifai in 2013, Matt has evolved his award-winning research into developer-friendly products that allow enterprises to quickly and seamlessly integrate AI into their workflows and customer experiences. Today, Clarifai is the leading independent AI company and “widely seen as one of the most promising [startups] in the crowded, buzzy field of machine learning.”

Founded in 2013, Clarifai has been a market leader since winning the top five places in image classification at the ImageNet 2013 competition. Clarifai’s powerful image and video recognition solutions are built on the most advanced machine learning platform, and made easily accessible via API, device SDK, and on-premise, empowering businesses all over the world to build a new generation of intelligent applications. Customers include OpenTable, trivago, Vevo, West Elm, Homes.com, and more.

Check out our upcoming data science live video chats.

To keep up with upcoming events, join our Data Science Community on Facebook. To suggest future data science topics or guests, please contact Mike Delgado.
