Join our Data Science Community on Facebook. Every week, we talk about important data and analytics topics with top data scientists on Facebook Live.
In this week’s #DataTalk, we talked about autonomous vehicles and computer vision with Dr. Inmar Givoni, who is the Autonomy Engineering Manager at Uber Advanced Technology Group. You can learn more about her at inmarg.net.
Here’s a full transcript:
Michael: Hello, and welcome to Experian’s weekly #DataTalk, where we talk to data science leaders from all around the world. Today’s topic is Teaching Cars to See: The Future of Autonomous Vehicles and Computer Vision, and we’re super excited and honored to chat with Dr. Inmar Givoni.
She got her Ph.D. in computer science from the University of Toronto, specializing in machine learning. She was also a visiting scholar at the University of Cambridge and did some graduate work at Microsoft Research using machine learning for Bing.com, which we can get to. And right now, she serves as the Autonomy Engineering Manager at Uber Advanced Technology Group. So, no better person to talk to about autonomous vehicles than somebody who’s doing it right now.
Inmar, thank you so much for being our guest today.
Inmar Givoni: Thank you for inviting me, Michael. It’s great to be here.
Michael: And for those who are listening to the podcast, if you’d like to see a full transcription or watch the video, the short URL is just ex.pn/computer vision, and that’s also the place where you can find Inmar’s LinkedIn profile and her website. Inmar, what led you down the pathway to work in data science?
Inmar Givoni: Since about when I was in high school, I decided that I wanted to be a neuroscientist.
Michael: Oh, wow.
Inmar Givoni: I went to university with that passion, and as an undergrad I did a lot of related course work and talked to researchers in neuroscience and did summer research projects in many aspects of neuroscience. And my conclusion was that we don’t have a theoretical framework to understand it and I’m not sure how to fit in. But at the same time, I also took a machine learning course and I loved it. So as someone who at that point had several years of software development experience and really liked algorithms, it was a totally different way of thinking about how to solve problems, and it was a fascinating change of perspective for me.
It also made use of a lot of the methods I had studied but never connected to anything specific. So probability theory, calculus, linear algebra, all of that had to be used for the algorithms. I decided that instead of working on understanding the human brain, I could work on building AI and building machines that have intelligence.
I moved to Toronto from Israel, and I attended school at the University of Toronto, specializing in machine learning, mostly unsupervised methods and core machine learning algorithms.
And as you mentioned, I did a couple of internships in research labs, and that allowed me to see what kind of impact machine learning can have on real-world problems and how nice it is that your work and code actually get shipped and deployed and used by millions of people.
I decided to go into the industry based on that, because I wanted to work somewhere I could see my work getting into the real world. And then I worked as a research scientist on various applications of machine learning, so e-commerce, e-reading and so on. And at some point, I went into leading and building things with a lot of focus on the aspect of taking machine learning from research into production. I worked on robotics systems, since I always liked real-world tangible systems and robots, and now self-driving, which is for me a type of robot.
Michael: That is so cool. Yeah, one of our data labs in São Paulo, Brazil, had a hackathon. They invited all these kids from the local neighborhoods to work with some of the data scientists at our offices, and they built a tiny autonomous vehicle. They had little sensors all the way around it, and that thing would just go around the office. And it’s cool to see moving from theory and an idea to actually creating something tangible.
So early on you decided that as soon as you finished your Ph.D. program you wanted to move away from academia and start to do and create things that were practical and tangible.
Inmar Givoni: Absolutely.
Michael: Tell me about the work that you’re doing now, at Uber, using machine learning to help cars see.
Inmar Givoni: I have a job that I love right now. I work at this office in Toronto, which is an R&D office. There is a strong team of researchers here, which is led by Raquel Urtasun, who is also a university professor. And she has many years of experience in computer vision, machine learning, deep learning, and in the last decade she’s focused a lot on self-driving.
The team of researchers develops novel, state-of-the-art, deep learning–based algorithms for solving various challenges in self-driving. I lead the team that focuses on the engineering side of things. These are software engineers with a lot of depth and understanding of machine learning. Together, we plan on taking these prototypes and novel algorithms into full production. We’re getting them to run, honestly.
Michael: I’m curious about how you explain your work to your family, or to people like myself who are outside the field.
Inmar Givoni: One of the nice things about working with tangible real-world products is you can explain it less from the perspective of the technology and a lot more from the perspective of what it is actually doing. It’s kind of easy to explain self-driving cars. And then I’ll start to explain the main way we think about the pipeline in self-driving.
A lot of times people talk about perception, prediction, and motion planning. Perception is basically … The car needs to understand all of the things around it and in particular the moving cars, the moving pedestrians, the other types of vehicles, like bicycles and motorcycles. We often call them actors, the technical term. It needs to understand all the actors in the scene. This is perception or detection.
Once that is out of the way, it needs to understand or predict where each one of these actors is going to move in the next few seconds or milliseconds. That’s the task of prediction, or predicting where everything is going to be in the next little while.
Once you’ve got this, you need to motion plan. You need to decide where you want to go. You have some destination that you need to get to, but how do you get to that destination keeping in mind all of these different actors and where they are going to go so that it’s a safe trajectory. I usually talk about that, and I think it makes a lot of sense to people.
Michael: With computer vision, there’s tons of unstructured data. And I’m curious about all of the sensor data that is coming in. Can you talk a little bit about how much data is coming into the car to help it make decisions?
Inmar Givoni: The vehicle is equipped with multiple sensors. And for us, it’s not just the cars. We also have trucks that we’re working on. So, it has a whole bunch of sensors. Of course, many cameras that are pointing in different directions. It also has radar sensors. And it has a lidar, so that’s typically this big thing that you see on top of the car rotating around. And this is a laser-based technology that is used to map how far things are from the car.
So basically, these beams get sent and then they reflect off of or bounce back from surfaces when they hit them. And then based on computing that interval you can see how far things are. All of this is information that is very useful for perception, for understanding the world around you. And you can create algorithms that work on the sensors individually; you can create algorithms that try to combine them in some good way.
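[Editor’s sketch: The time-of-flight idea described above can be illustrated with a few lines of Python. This is a minimal sketch, not a description of any particular sensor or of Uber’s system. The function name and the 200-nanosecond example value are assumptions chosen for illustration. The only physics used is that the pulse travels to the surface and back at the speed of light, so the one-way distance is half the round-trip path.]

```python
# Speed of light in a vacuum, in meters per second.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_seconds: float) -> float:
    """Return the distance (meters) to the surface that reflected a pulse.

    round_trip_seconds is the measured interval between sending the beam
    and receiving its reflection (hypothetical input for this sketch).
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse covers the distance twice (out and back), so halve the path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


if __name__ == "__main__":
    # A reflection that arrives 200 nanoseconds after the pulse was sent.
    print(f"{range_from_round_trip(200e-9):.2f} m")
```

With these numbers, a 200-nanosecond round trip corresponds to a surface roughly 30 meters away.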
In terms of the structure side of things, let’s think about structure in the context of, let’s say, images. Images are very structured. It’s a grid. And that makes it very easy for things like neural networks, especially for convolutional neural networks, because you have some filters that you move around the image and try to detect certain things. That’s at a very high level what a CNN does.
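[Editor’s sketch: To make the idea of moving a filter around an image grid concrete, here is a minimal pure-Python sketch. It is an illustration only. The function name, the tiny image, and the hand-written edge filter are assumptions. In a real CNN the filter values are learned rather than written by hand, and the sliding operation shown here is what deep learning libraries usually call convolution, although strictly it is cross-correlation.]

```python
def slide_filter(image, kernel):
    """Slide a small filter over a 2D grid and return the response map.

    image and kernel are lists of rows. No padding is applied, so the
    output is smaller than the input.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch currently under the filter.
            total = 0.0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


if __name__ == "__main__":
    # A dark region (0) on the left and a bright region (1) on the right.
    image = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    # A filter that responds where brightness increases from left to right.
    vertical_edge = [[-1, 1]]
    for row in slide_filter(image, vertical_edge):
        print(row)
```

The response is nonzero only in the column where the dark region meets the bright one, which is the sense in which a filter "detects" something at a location in the grid.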
What isn’t structured is the lidar information, because it’s basically a bunch of points in space which represent where things are. And depending on where you are, you have a different number of points, different numbers of these things bouncing back at you. That’s where we have unstructured data, and there are many different approaches in the literature and the research to make that type of data more amenable for a CNN-type network to work with.
One thing you can do is divide up the space into these boxes. It’s like three-dimensional pixels, and they’re called voxels, because it’s a volume pixel. And then you can count how many points are in each one of these boxes, and that becomes your input, and now again you have some kind of structured input. You can also project the data from a three-dimensional representation to a two-dimensional representation. Imagine looking at all of these points from above and just flattening them all out. Then you see something from what’s called a bird’s-eye view.
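[Editor’s sketch: The two ideas above, counting points per voxel and flattening to a bird’s-eye view, can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions. The function names, the one-meter voxel size, and the five sample points are hypothetical, and production systems typically use fixed-size dense arrays and richer per-voxel features rather than a dictionary of counts.]

```python
import math
from collections import Counter


def voxelize(points, voxel_size):
    """Count how many (x, y, z) points fall into each cubic voxel.

    Returns a Counter keyed by integer voxel index (ix, iy, iz).
    """
    counts = Counter()
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct voxel.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        counts[key] += 1
    return counts


def birds_eye_view(voxel_counts):
    """Flatten voxel counts along the vertical axis into a 2D grid."""
    bev = Counter()
    for (ix, iy, _iz), n in voxel_counts.items():
        # Dropping the height index is the "look from above" projection.
        bev[(ix, iy)] += n
    return bev


if __name__ == "__main__":
    # Hypothetical lidar returns, in meters relative to the sensor.
    points = [
        (0.2, 0.3, 0.1),
        (0.4, 0.1, 0.9),
        (0.6, 0.2, 1.5),
        (1.2, 0.1, 0.3),
        (-0.5, 0.5, 0.2),
    ]
    voxels = voxelize(points, voxel_size=1.0)
    print(dict(voxels))
    print(dict(birds_eye_view(voxels)))
```

However many points come back, the output always lives on a regular grid of cells, which is what makes it usable as input to a CNN-type network.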
There are various approaches to making sure that the data is structured. Here, structured is in the context of using a neural network.
Michael: I can’t imagine how much data is being pulled in every minute. Like you’re saying, there are people walking around. You have motorcycles, you have bicyclists, you have cars, you have street signs, you have lights, etc. And the car has to be able to collect all this data and make sense of it to be able to know where to go so that it will avoid an accident and get people to where they need to go in a safe way.
I was on a Reddit forum, and they were talking about autonomous driving. And one of the Redditors was asking how an autonomous vehicle handles or deals with street vandalism (e.g. if somebody creates a fake sign). Like putting a stop sign in a place that it doesn’t belong. And as a human, we would recognize that’s vandalism. And ultimately we want a car to have that human understanding.
Inmar Givoni: I have a bunch of thoughts about that. One is just from the technical perspective. I talked about this perception, prediction, and motion planning pipeline. One thing I didn’t mention is that pretty much all self-driving right now relies very heavily on maps. And these are not the Google-style maps. These are very information-heavy, dense maps of the world that try to capture as much knowledge as possible as prior knowledge for the algorithms. And one of the things you can capture is where all of the traffic lights and traffic signs are and where all of the things that I expect to be there are.
One thing that you could do is say, “There is something here that wasn’t here before, and that’s a surprise.” And then you need to decide how you want to reason about it, what you want to do about it. But in terms of detecting it, you are able to say, “Something’s different.” There is another thing which is not a technical thing. It’s just something that I feel pretty strongly about, which is it’s not a game. If people are going to go out there and try and fool the cars or drivers in a certain way or try to mess up the signs, it’s actually from my perspective a criminal offense, because you’re risking people’s lives. Right?
Inmar Givoni: So regulation is not quite there yet. But it’s a form of hacking, and it’s a form of hacking that has very serious consequences. As society and in terms of government and so on, we need to start thinking about that and making sure that this is something that has legal and criminal consequences.
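[Editor’s sketch: The map comparison described earlier in this answer, noticing that "there is something here that wasn’t here before," can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration: the function name, the (kind, x, y) record layout, the two-meter matching tolerance, and the sample coordinates. The sketch only flags a surprise. How the vehicle should then reason about it is a separate question that the interview leaves open.]

```python
import math


def find_unexpected(detections, mapped_objects, tolerance_m=2.0):
    """Return detections that have no matching entry in the prior map.

    Both inputs are lists of (kind, x, y) tuples. A detection matches a
    mapped object when the kind is the same and the positions are within
    tolerance_m meters of each other.
    """
    unexpected = []
    for kind, x, y in detections:
        matched = any(
            kind == map_kind and math.hypot(x - map_x, y - map_y) <= tolerance_m
            for map_kind, map_x, map_y in mapped_objects
        )
        if not matched:
            unexpected.append((kind, x, y))
    return unexpected


if __name__ == "__main__":
    # Hypothetical prior map: what is expected to be at this location.
    mapped = [("stop_sign", 10.0, 5.0), ("traffic_light", 50.0, 0.0)]
    # Hypothetical perception output for the current drive.
    seen = [("stop_sign", 10.4, 5.3), ("stop_sign", 30.0, 2.0)]
    print(find_unexpected(seen, mapped))
```

Here the first stop sign sits half a meter from its mapped position and is accepted, while the second has no counterpart in the map and is reported as a surprise.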
Michael: No doubt. We just got a question from Alex, who is asking if all cars are autonomous, theoretically there should be fewer accidents. Is that correct?
Inmar Givoni: From my perspective, my speculation is that this is correct. We’re in this funny situation where in the far future everything will probably be self-driving, and it will be less difficult technically than where we need to first get to, which is both cars and people driving on the road.
Michael: I’m excited about that future, where you can just get in the car and be able to get work done or watch a movie and … Because right now we have so many distractions, and there’s that big problem of people who are on their phones when they’re driving.
Inmar Givoni: Yeah, absolutely. A car doesn’t get distracted, it doesn’t get drunk, it doesn’t get tired. Right now there are about 1.2 million deaths per year from accidents. Not to mention pollution, the congestion of cities, parking spaces. Most cars spend about 95 percent of the time not being used. So they need to sit somewhere and wait. And as you mentioned, most people I think would enjoy doing something else while they’re driving. But some people cannot drive: the elderly, people with disabilities and so on. And one more thing that I specifically care about: I love animals. And there’s a lot of roadkill. I just looked it up online because I didn’t know myself, and there is an estimate of about 1 million animals killed on U.S. roads per day.
Michael: Oh, my. Are you serious?
Inmar Givoni: That’s what Wikipedia says. So we can —
Inmar Givoni: Right? But I don’t think it’s far-fetched. And I think it’s one thing that we would also be able to avoid.
Michael: No doubt. I didn’t even think about that. I was thinking about people, and vehicles, and traffic signs, but I wasn’t thinking about animals on the road. Sometimes people get in accidents over that and kill the animal, right?
Inmar Givoni: Part of the human death rate is because of accidents with large animals, absolutely.
Michael: An ideal situation would be the autonomous vehicle would see that animal in its path and be able to predict where it’s going to be so it could avoid it, right?
Inmar Givoni: Or slow down. I think a lot of it happens at night, where you can’t see the road up ahead. Things like lidar technology are very good for being able to see in certain conditions. Or high-resolution cameras that will still pick up some things even if with our eyes we don’t actually see them.
Michael: I know that everyone working on developing autonomous vehicles is dealing with the trolley problem.
Inmar Givoni: I get asked that question a lot, so it made me sit down and think about it. And here’s my answer. I don’t think the issue here is the technology. I think the issue here is again going back to what is right and what is the policy. More than pointing to a technological problem, it’s pointing to an ethical problem, which has always been around.
This was studied in law classes, well before autonomous vehicles, just as an ethical problem. And we don’t have, as a society, an answer for what is the right thing to do when humans are involved. So if a human makes the decision and sacrifices some lives in exchange for others, we don’t have a mechanism of saying they made the wrong choice or they made the right choice. More than anything, this is revealing that we need to decide how we want to address this and what we think is the right decision. And then the engineering side of things is implementing it. But I think rather than leaving this decision to the engineers, it should be made as an ethical decision with experts in various domains.
Michael: I’ve seen some things online about a data ethics oath that data scientists would take, that whenever they’re working with data, they’re basically going to do their best to protect people’s personal data, handle data properly. Just like doctors take an oath to save lives.
Inmar Givoni: I haven’t heard about it, but I think it’s a great idea. I think we already do that because this is our value system, perspective, and where we want to be. Because we’re working. It’s both a great and exciting challenge, but also a great responsibility. I think everyone here is very aware of the fact that we need to keep our best minds and try to solve the problem so that it’s the safest and the best and take into consideration everything and not just the technical side of things.
Michael: When an autonomous vehicle is going down the road, can you explain what it sees?
Inmar Givoni: The first thing it sees is just a raw stream of sensory signal information. Every camera gives it an image which has pixels and their values. And every lidar point is basically a point in space. There is an XYZ value to the point for how far it is. That’s the raw input. This is the same as with humans. This is what gets into our retina. And then there is processing to understand this sensor information. One thing you can do, which is pretty typical, is run an algorithm which is a type of detector. It will output these bounding boxes, or rectangular boxes around anything that is of interest. It will detect pedestrians, cyclists, cars, trucks and so on. And now we have the information of where in the range of the car these things are. And you can do the same thing for the lidar as well. You can detect these different types of things.
Another thing you can do is what’s referred to as segmentation. You can say, “Let me step back.” In the previous approach that I described, you put a rectangle around anything that’s of interest, but there’s a lot of stuff you don’t put rectangles around. And there’s the question of, maybe there’s something interesting there, right? So you can say you want to be able to explain everything that you can see. You can have an algorithm that segments the image and says every pixel needs to be associated with something. This will be a tree, this will be the road, this will be the cars, pedestrians and so on.
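[Editor’s sketch: The requirement that "every pixel needs to be associated with something" can be illustrated with the final step of a typical segmentation model, where each pixel gets the class with the highest score. This is a minimal sketch under stated assumptions. The class list, the function name, and the hand-written score values are hypothetical. In practice the per-pixel scores would come from a trained neural network, which is not shown here.]

```python
# Hypothetical label set for this illustration.
CLASSES = ["road", "car", "pedestrian", "tree"]


def segment(score_maps):
    """Assign each pixel the class with the highest score.

    score_maps[r][c] is a list with one score per entry in CLASSES.
    Returns a grid of class names with the same height and width.
    """
    labels = []
    for row in score_maps:
        label_row = []
        for scores in row:
            # Index of the largest score for this pixel.
            best = max(range(len(scores)), key=scores.__getitem__)
            label_row.append(CLASSES[best])
        labels.append(label_row)
    return labels


if __name__ == "__main__":
    # A 2x2 "image" with made-up per-class scores for each pixel.
    scores = [
        [[0.90, 0.05, 0.03, 0.02], [0.10, 0.80, 0.05, 0.05]],
        [[0.70, 0.10, 0.10, 0.10], [0.10, 0.10, 0.20, 0.60]],
    ]
    for row in segment(scores):
        print(row)
```

Unlike a bounding-box detector, which can leave most of the image unexplained, this output has exactly one label for every pixel.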
So all of this information is the first level of seeing. And then when we talk about prediction we can say, “For every frame that comes in, I’m going to put these bounding boxes.” Then I’m going to look at the last frame, and I’m going to say, “It seems like this thing is moving at a particular speed.” Now I can say, “This box here is probably something that I know now is a motorcycle, and I predict that it will continue to move at this particular speed for the next 200 milliseconds,” or something like that. So now I have a notion of tracking of these different elements that are moving and where they are going. And then, as I mentioned before, the motion planning. So now I can say, “If I keep in my lane at this particular speed, and the car in front of me seems to be slowing down, that’s not good. So I actually need to slow down.”
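[Editor’s sketch: The reasoning in this answer, estimating speed from two frames, predicting a short time ahead, and then deciding whether to slow down, can be sketched as a constant-velocity model in Python. This is a deliberately simple illustration, not the approach used in any production system. The function names, the 0.1-second frame interval, the five-meter minimum gap, and all example numbers are assumptions.]

```python
def estimate_velocity(prev_center, curr_center, dt_s):
    """Velocity (m/s) of a tracked box center between two frames."""
    return (
        (curr_center[0] - prev_center[0]) / dt_s,
        (curr_center[1] - prev_center[1]) / dt_s,
    )


def predict_position(center, velocity, horizon_s):
    """Where the actor will be after horizon_s seconds at constant velocity."""
    return (
        center[0] + velocity[0] * horizon_s,
        center[1] + velocity[1] * horizon_s,
    )


def should_slow_down(ego_speed, lead_speed, gap_m, horizon_s, min_gap_m=5.0):
    """True if keeping the current speed would shrink the gap too much.

    Speeds are in m/s along the lane. min_gap_m is a hypothetical safety
    margin chosen only for this example.
    """
    predicted_gap = gap_m + (lead_speed - ego_speed) * horizon_s
    return predicted_gap < min_gap_m


if __name__ == "__main__":
    # A motorcycle's box center moved 1 m along x between frames 0.1 s apart.
    velocity = estimate_velocity((10.0, 0.0), (11.0, 0.0), dt_s=0.1)
    print(velocity)
    # Predict 200 milliseconds ahead, as in the example in the answer.
    print(predict_position((11.0, 0.0), velocity, horizon_s=0.2))
    # We drive at 15 m/s, the car 12 m ahead has slowed to 10 m/s.
    print(should_slow_down(ego_speed=15.0, lead_speed=10.0, gap_m=12.0, horizon_s=2.0))
```

In the last call the gap would shrink from 12 meters to 2 meters within two seconds, so the sketch reports that the vehicle needs to slow down.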
Michael: That is fascinating. Have you ever been in an autonomous vehicle? Because I’ve never been in one.
Inmar Givoni: Oh, absolutely! We have a fleet of vehicles, and part of the process is for us to sit in the cars and get a sense of what it’s like. It’s a lot of fun. You feel a little bit scared at first, because you’re not used to it. And, is it really going to work? And after a while, you forget that you are in a self-driving car, and you start —
Michael: I would feel scared about letting the car drive me. But I guess there comes a point where you eventually get over that fear as you trust the car.
Inmar Givoni: Yeah, for me it was very fast. The first time I went in, I was looking over the shoulder. There is someone who sits behind the wheel and is ready to take control. So I was looking over her shoulders. But then within 10 minutes, it gets boring looking over someone’s shoulder. And you just start doing other things. But it’s very exciting to see personally.
Michael: When do you think the majority of cars will be self-driving?
Inmar Givoni: That’s a million dollar question, or a billion dollar question.
Michael: I’ve seen some research, I think from Gartner, that was suggesting that it could be like 20 years from now where a majority of cars will be self-driving. But that’s kind of based on regulations and laws. There are all these other factors that are involved.
Inmar Givoni: It’s hard to predict. And it also depends on how fully autonomous you want them to be. I think it will be really hard to have 100 percent, where in any situation, any weather, any terrain, the car can take care of it. Whereas if you restrict yourself to where we’re only operating in cities and so on, then the horizon is shorter. It’s just very hard to know, and I think the most important thing is to make sure the technology is safe. And so we keep working on it until it’s safe enough.
Michael: We’re coming to the end of the show, and I want to remind everyone that if you want to read the transcription, get links to Inmar’s LinkedIn profile, also her website, the short URL is just ex.pn/computer vision. Before we go, we always like to ask our guests a couple questions. And the first one, Inmar, is what is your favorite programming language?
Inmar Givoni: The most convenient one is Python. Just because it’s very easy to pick up. It’s very forgiving, and there’s a lot of packages and available code and examples for doing data science, machine learning and so on. But my favorite one is a programming language called Stanza. And the reason it is my favorite programming language —
Michael: I’ve never heard of that!
Inmar Givoni: It’s a new programming language that came out of the University of California, Berkeley. And the reason it’s my favorite is because it was written by a very good friend of mine. So you might want to give it a try. It’s similar to Python. The website is lbstanza.org.
Michael: OK, cool. If you can, email it to me, because I want to get it up on the blog so people can check that out. That’s awesome.
The last one is what is your advice for data scientists who are looking to get involved in machine learning, especially with autonomous cars or computer vision?
Inmar Givoni: There are a couple of good online courses that you can take. A lot of the massive open online course companies have an introduction to machine learning, an introduction to deep learning, an introduction to self-driving. And I see a lot of people doing that and getting their hands dirty with both the data and the algorithms. It’s good to give yourself a tangible project that you’re interested in working through, and that really helps you decide on a task within this big world that you’re trying to get to.
So you say, “I know enough from maybe a beginner-level course to know what can be done with the technology, and I’m going to pick this project that I am specifically interested in, and I’m just going to try and put together all the different parts that are needed, and that will guide what I’m learning, what things I’m reading, what kind of algorithms I’m trying to implement or find existing code for.”
Michael: Awesome. Great advice. And for people who want to connect with you, maybe ask you questions as follow-ups, where can they reach you?
Inmar Givoni: You’re welcome to start at my website. It’s just InmarG.net, and from there you can find my email address as well.
Michael: Wonderful. I want to thank you so much, Inmar, for your time. It was fascinating talking to you, learning about your work. I learned a ton, and I’m just fascinated by the work that you’re doing, because you’re on the leading edge of where we’re headed as a society with self-driving cars. I can’t wait to have one. I think it would be great to be able to do other things in the car. And also reduce traffic.
Inmar Givoni: Absolutely. Thanks for having me. I’m also really excited to be working on it. I was talking to my grandfather this morning, and he is like, “When are you giving me a self-driving car?” Because he can’t drive any more, but he’s very interested.
Michael: Yeah, no doubt. Thank you so much for being our guest today. For those who want to know more about this #DataTalk series, you can always go to ex.pn/datatalk. That has a list of all of our shows, our podcasts and video series. And just so you know, next week we’re talking to Dr. Hong, who is the head of data science over at Etsy. We’ll be talking about machine learning and e-commerce. That’s next week. Inmar, thank you again. And I hope you have a wonderful weekend!
Inmar Givoni: Thank you. You too. Bye-bye!
Michael: Thank you. Take care!
Inmar Givoni is an Autonomy Engineering Manager at Uber Advanced Technology Group, Toronto, where her team’s mission is to bring cutting-edge deep-learning models for self-driving vehicles from research into production. Prior to that she was the Director of Machine Learning at Kindred, where her team developed algorithms for machine intelligence, at the intersection of robotics and AI. She was the VP of Big Data at Kobo, where she led her team in applying machine learning and big data techniques to drive e-commerce, customer satisfaction, CRM, and personalization in the e-pubs and e-readers business. She first joined Kobo in 2013 as a senior research scientist working on content analysis, website optimization, and reading modeling among other things. Prior to that, Inmar was a member of technical staff at Altera (now Intel) where she worked on optimization algorithms for cutting-edge programmable logic devices.
Inmar received her PhD (Computer Science) in 2011 from the University of Toronto, specializing in machine learning, and was a visiting scholar at the University of Cambridge. During her graduate studies, she worked at Microsoft Research, applying machine learning approaches for e-commerce optimization for Bing, and for pose-estimation in the Kinect gaming system. She holds a BSc in computer science and computational biology from the Hebrew University in Jerusalem. She is an inventor of several patents and has authored numerous top-tier academic publications in the areas of machine learning, computer vision, and computational biology. She is a regular speaker at big data, analytics, and machine learning events, and is particularly interested in outreach activities for young women, encouraging them to choose technical career paths. For her volunteering efforts she has received the Arbor Award from UofT.
Check out our upcoming data science live video chats.