Listen to the podcast:
Every week, we talk about important data and analytics topics with data science leaders from around the world on Facebook Live. You can subscribe to the DataTalk podcast on iTunes, Google Play, Stitcher, SoundCloud and Spotify.
In this week’s #DataTalk, we talked with Favio Vázquez, Principal Data Scientist at OXXO, about ways to create intelligence with data science. Also check out his article, Creating Intelligence with Data Science.
This data science video and podcast series is part of Experian’s effort to help people understand how data-powered decisions can help organizations develop innovative solutions and drive more business. To suggest future data science topics or guests, please contact Mike Delgado.
Here’s a complete transcript:
Mike Delgado: Welcome to our weekly DataTalk, a show where we talk to data science leaders from around the world. Today is a very special episode. We’re talking to Favio Vazquez. He is the principal director of data science over at OXXO. He has his Master of Science degree in physics, cosmology and data science for cosmology, and he also has his bachelor’s in science and computational engineering.
Favio, you have such a fascinating background in academia, especially working in cosmology, and then you made the shift to work in data science, outside of academia. Can you share a little bit about your journey, like leaving academia and then moving into data science?
Favio Vazquez: Of course. First, hi, everyone in the world who’s going to see this, and who’s seeing this live. I actually wrote up an article about this, like three months ago, because a lot of people have this question for themselves. Like “How can I shift into data science from X?” A lot of people are actually coming from academia. I was interested in very specific parts of cosmology related to computation. This happened in 2008 or 2009. I started learning programming languages, like C++ and FORTRAN, and all these things to do science in a good way, because these are languages you learn when you do science. Then I discovered statistics. This is a very important thing for cosmology, too.
One thing led to another, and I discovered Coursera and edX and all these web platforms. I was fascinated with the different topics you can cover, or learn, in these platforms. I think the path from statistics to data science was not that weird, for me, in those platforms. What I love about data science is that it’s a very wide topic.
And this is what I liked about cosmology before, because cosmology is about everything in the universe. Data science is about everything in the real world, because we have marketing and retail and health care and all these different things. I realized that my years learning calculus and statistics at university were very applicable to data science. What I needed to learn were new languages, like Python and R. When I discovered R, I was like, “Wow, this is an amazing language. No FORTRAN anymore.”
So I think my passion for computation and programming took me to data science.
Mike Delgado: When you were in school, you were highly invested in cosmology, studying the universe. Tell us a little bit about that, because obviously that gave you a solid background to move into data science.
Favio Vazquez: When I was 18 or 19, I was really interested in the universe, like astronomy, because that was the thing that I knew. And astronomy is not the same thing as cosmology. Astronomy is looking at the universe and the sky and trying to find patterns. Trying to understand the information about galaxies, the dynamics of the planets. Cosmology is a deeper level. It’s trying to understand the theoretical background of the universe: how it was created, the fundamental equations for describing gravity, and how a system can evolve over the years. Thermodynamics, hydrodynamics. It was very interesting to just read about this in a popular-science, or easy, way.
And then when I was like 21, I really started to understand general relativity, tensors and differential geometry, and that was like, “This is what I want to do in my life, because it’s so hard but so interesting at the same time.” I think my main goal was trying to understand what time is. I think this is a very complicated question. And the origin of the universe. This was my main goal, about four or five years ago. Then it took a big shift to data science, but I still read cosmology. I still read papers on general relativity and all these different things. My master’s, which I finished last year, was on cosmology.
Mike Delgado: That is so cool. So tell me about your transition, because obviously you could have gone on to get your Ph.D., or maybe you will at some point. But when you decided to leave academia for a while and go over to OXXO — Tell us a little about that transition. Or actually, before OXXO. Tell us about your move over to work in the public sector.
Favio Vazquez: I really got interested and started learning Spark. Spark has been a really important part of my learning data science. And I actually started contributing in the community. I had some commits approved in the Spark source code, and that was like a goal. “Wow, part of the code I use for Spark is mine!” So that’s very interesting.
And then, when I came here to Mexico, I really thought about what to do after finishing. Like six months before finishing my master’s, I asked myself: “Should I do a Ph.D. right now? What should I do?” The thing is, I’m really comfortable with technology right now. Like today, I already know the languages that people are using, the frameworks, what’s state-of-the-art for machine learning and stuff.
And when I thought about doing a Ph.D. in physics, because that was what I was going to do at some point, I thought, this is four or five years of my life. When I finish that, I may be lost in the space of technology, and I’m trying to make an impact in the world. Trying to make it better in some way or another, and it was easier to do that with data science right now. With physics it would take like 40 years. Because there is a very different spectrum for doing things for the world.
But with data science, you are with companies, and that’s it. In four months, you have a model that is deployed into production, and you’re changing lives right away. I think that was the thing that finally convinced me to do data science right now and maybe see what that brings me in the near future.
Mike Delgado: That’s so cool, hearing your story in academia and then moving into the workforce and your goal to help improve our world with data science. Now today, we’re talking about creating intelligence with data science. Can you talk a little bit about what you mean by that? Because when you suggested that topic, I was like, “I’d love to hear your perspective.”
Favio Vazquez: I’ve been working for companies for like four years now. What I’ve seen is we have parts that we want that aren’t exactly joined together. And we have big data, we have AI, we have data science, we have business intelligence people, and we have data analysis. This is a broad spectrum of people doing things that, of course, they have a goal.
It’s trying to improve the company, and its sales, or whatever. But then I started thinking: What is intelligence, actually, and why are we calling our field data science? Is it a science? And then I got started with artificial intelligence. Why is it artificial? All these different things came to mind, and I started doing some research.
Last year, I came to a conclusion. The only way we can achieve artificial general intelligence (and by this I mean trying to create machines, software and hardware, and all the different tools for technology that mimic humans in the way we think, we reason, we see, we understand things) is by joining together big data, AI and data science.
We need big data, because it isn’t only about a large amount of data. That’s a very tricky concept for people, because when they hear big data, they think, “Yeah, we have a petabyte of data,” and that’s it. It’s not that. Big data is about the technology we use to analyze the data. It’s about how we store it, and the methods and the theory behind all the different frameworks we use to analyze this data. I think that in the near future people will not say big data.
They will just say data, because how much is big? This concept of “big” is evolving every year. Years ago one gigabyte was a lot; right now it’s nothing. I think it’s closer to that. Big data, to me, is one of the catalysts to get to AGI (artificial general intelligence), because with more data, better software and better tools, we get better models. And when I say model here, I need to talk about AI.
I talk about AI in a way that lets you think of it as software or machines that can think or do things like a human. But in here, you need to think about what we actually do. I’m going to complete this when I talk about data science, but what we actually do as humans is create models. And when I say a model here, it’s an abstraction of reality. This is what we do. Because we can’t study reality. And we can’t study the truth of the world. It’s not science. Science is not looking for the truth. We’re looking for knowledge. This is very important to understand, because we are not looking for the final truth of the world.
This is what our model is doing, trying to track a lot of different complex information around us. With assumptions (this is very important), we make assumptions, and then we can create a simplified version that we can understand and act on. AI is doing that for us. People think AI is machine learning, or deep learning, right now. It’s not the same. That’s something people must understand right now.
Deep learning and machine learning are just one of the steps we’re taking into AI. AI is more broad. When you talk about deep learning, it’s like the state-of-the-art of AI right now. This is the only way that we are seeing that we can advance the field, but we don’t know what the end is. We don’t know what’s going to be the end singularity for creating AI.
The last part of this formula, big data plus AI plus data science equals AGI, is actually data science. Why data science? Because this is the science behind everything we need to achieve here, to construct this AGI. Big data is the catalyst.
In AI, we create models. In data science, we can actually create a methodology and a system to systematically transform this model, this data, this information into actionable things, like decisions or software or maybe a system. Data science here acts like the science behind the intelligence we want to create.
For people asking about what intelligence is, I really recommend that they take a look at the content from Lex Fridman. He’s a lecturer at MIT, and he defines it in a very simple line. He says, “Intelligence is the ability to accomplish complex goals.” And I have a post about it; maybe I’ll share it at the end of this, or maybe you can look for it afterward. What I’m saying in the post is, what is complex? Complex is just a mix of parts that together form a thing that is harder to understand than one of its small parts.
We as humans have models to understand these complex things. Understanding is just the way to transform this complex information into simple information that we can act on. This is our workflow as humans, and we are trying to mimic this with AI. But I think AI alone cannot be the solution. We need big data and data science. This is why the topic is creating intelligence with data science, because it’s the final step with big data that will transform this area into a real thing. Into real models that can actually mimic the way a human brain works.
Mike Delgado: For those who are listening to the podcast, Favio has written extensively on this topic. He just mentioned some of his articles, and I’m going to put links on the Experian blog. If you’d like to see those links, the short URL is ex.pn/datatalk46. It will have links to Favio’s profile on LinkedIn and other places where you can see his work.
Favio, you talked a little bit about the different assumptions that can sometimes lead to problems, biases. How do you deal with that when you’re working on projects?
Favio Vazquez: So, when I mention assumptions, they are not specifically meant to be bias. But that can happen. You can have bias as a human. We have things in our head before we see something. Our culture defines that. Right now we’re very far away from having machines that can have these kinds of things.
What we have right now — when you’re creating a model, or machine learning or deep learning, the assumptions you make are from the data you have. The data you have will tell you about the assumptions. Like, I’m going to assume that these two variables are linearly correlated. Or, I’m going to assume that there is missing data here, and I can impute it with the means of all the other variables because the population won’t change that much. Or it can come from the business.
This is important to talk about, because a lot of people who I have met think that right now it’s more important to have experts in the area, and to have an “expert” mind behind every model, because they will understand the business extremely well. They can do inputs for the model. That’s creating a bias in the model. This is what we don’t want to do in machine learning or deep learning. We want the machine to read and understand the features that describe the data.
Right now, that’s very easy with deep learning. It was kind of hard to do with machine learning, because we need to feature transform everything. But right now, with deep learning, it’s easier to grasp the features that define, or the most important parts of, some data without you actually understanding which part did those parts. We need to make assumptions to make a model, because this is the only way we can grasp reality. We have no other way. First, this assumption needs to be consistent with the business. This means that if you have an age of a person, it cannot be minus 40. So, you cannot say, “I’m going to assume that this person is minus 40.”
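Editor's note: the two kinds of assumptions described above (filling in missing values with a mean, and keeping every assumption consistent with the business) can be sketched in a few lines of Python. This is only an illustration, not Favio's own method. The sample ages, the 0 to 120 valid range and the function name `impute_ages` are hypothetical, and "imputing with the mean" is read here as filling gaps with the mean of the valid observed values of the same variable.

```python
from statistics import mean

# Hypothetical valid range for a person's age (an assumption for illustration).
MIN_AGE, MAX_AGE = 0, 120


def is_valid_age(age):
    """An age is usable only if it is present and inside the plausible range."""
    return age is not None and MIN_AGE <= age <= MAX_AGE


def impute_ages(ages):
    """Replace missing (None) or impossible ages with the mean of the valid ones."""
    valid = [a for a in ages if is_valid_age(a)]
    if not valid:
        raise ValueError("no valid ages to compute a mean from")
    fill_value = mean(valid)
    return [a if is_valid_age(a) else fill_value for a in ages]


# Made-up sample: one missing value and one impossible value (minus 40).
ages = [25, None, 40, -40, 55]
cleaned = impute_ages(ages)
print(cleaned)
```

The range check encodes the business rule (an age of minus 40 is never accepted), while the mean fill encodes the statistical assumption that the population does not change much.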
Mike Delgado: Yeah.
Favio Vazquez: Secondly, you need to have objective assumptions. There must be an intuition in the model, but the intuition needs to die when the model performs better than the baseline. Baselines here are intuition. I’m going to think that I’m going to put this store here, because I have a very similar understanding of a different neighborhood that works something like this, because I heard of it from a neighbor.
And I’m going to put it here because of that. That kind of assumption is not verified with data. Intuition must be verified by your data. Only intuition will take you to business intelligence, in the old way of doing things. With machine learning and deep learning now, if we want to build AI, we need to get past that. We need to take steps above that and try to create objective assumptions.
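Editor's note: one way to read "the intuition needs to die when the model performs better than the baseline" is as a comparison of errors on data the model has not seen. The sketch below assumes made-up numbers, a constant gut-feeling guess as the baseline, and a simple least-squares line as the "model"; all names and values are hypothetical and chosen only to show the comparison.

```python
from statistics import mean


def mean_abs_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# Made-up data: neighborhood foot traffic vs. store sales (arbitrary units).
train_x, train_y = [1, 2, 3, 4], [12, 19, 31, 42]
test_x, test_y = [5, 6], [50, 61]

# Baseline: the intuition, expressed as one constant guess for every store.
intuition_guess = 30
baseline_pred = [intuition_guess for _ in test_x]

# Model: a line fitted on the training data only.
slope, intercept = fit_line(train_x, train_y)
model_pred = [slope * x + intercept for x in test_x]

baseline_err = mean_abs_error(test_y, baseline_pred)
model_err = mean_abs_error(test_y, model_pred)

# The intuition survives only if the model fails to beat it on held-out data.
keep_intuition = baseline_err <= model_err
print(f"baseline error={baseline_err:.2f}, model error={model_err:.2f}")
print("keep intuition:", keep_intuition)
```

Evaluating both on the held-out stores, rather than on the training data, is what makes the comparison an objective check on the intuition.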
Mike Delgado: I am curious: What are some of your favorite examples of AI in use today?
Favio Vazquez: There are a lot of things that impress me in the AI world. One of the most important ones for me is healthcare. I wrote a blog post about a month ago about detecting breast cancer with deep learning. These are some of the things that really matter to me. Trying to make things accessible to people, because right now, we think that these things can only be done with extremely big and expensive machines, in only two or three countries in the world. If you have this problem, you need to go to maybe the United States and spend $40,000 to get a detection. Right now, what I think AI can do for healthcare is give the power to the people.
Of course, you won’t be your own doctor — that’s not what I’m saying here — but being able to do very good research for your problem with some understanding.
What do I mean by this? When I created that blog, I used Deep Cognition. Deep Cognition is a platform for doing deep learning in a very easy way without coding. You don’t need to know code. And people can actually do this kind of thing. I know this is for other things right now, and of course, this is better than just searching for your symptoms on Google and concluding that you are going to die tomorrow. There are companies charging thousands of dollars to do this kind of thing right now.
Mike Delgado: Yes.
Favio Vazquez: I’m not saying it’s not OK to make profit in innovation. That’s great. But I think AI is getting a lot closer to a democratization of the way we can access the data and understand it. With AI in healthcare, I think it will be easier for smaller hospitals in rural areas to have access to technology that was very expensive before, without buying a really expensive machine.
They can just have a person who can do deep learning. They have the data there, and the tissue data, and you can just predict a lot of different types of cancers right now with deep learning, and diagnose these people, and make it easier for people to get an appointment with a doctor. I think that’s very important. And that will take me to the second thing that I think is very important for application, which is creativity.
When I say creativity — Right now, we’re seeing a change in the software that can write a novel, write poetry, create a song, edit photos or fake a video of someone speaking. If you search for deep fakes, you’ll see that. I think this is important, because intelligence is not only understanding. It’s about creativity, too. There’s no way that we can say a machine has achieved AGI if it’s not creative, if it cannot create something very interesting from nothing. We as humans have the ability to see something mundane and transform it into art. This creativity part of AI is giving a lot of researchers tools to think about what can be done with AI, in ways we didn’t think before.
The last thing I relate creativity to is home and lifestyle. When I say lifestyle, I mean the way we live our lives right now. I think AI will improve, and I say this because a lot of people are very scared of maybe Terminator or these kinds of things happening to us. But I don’t think this is the way things are going to happen. We are now using AI all the time. We use it to recommend music, movies, stuff to buy. Maybe in the near future it will help you cook your meals. Right now it can optimize the way you do your exercise or you actually take care of yourself. I think these three parts (and there are a lot of different applications) are really impressive, and I think they will change the way we live our lives.
Mike Delgado: That’s awesome. You mentioned creativity. I’ve never heard anybody talk about machines being creative. Are there any examples that impress you about an AI developing something creative out of nothing?
Favio Vazquez: Yeah, “out of nothing” is weird. Maybe we shouldn’t be talking about out of nothing, because we as humans cannot create things out of nothing either.
But from very little. What is interesting is that you can search for GANs, generative adversarial networks. This is something created by Ian Goodfellow not that long ago. These kinds of networks for deep learning can create an image from text. They can see a video happening and transcribe what is going on live. Like a person just entered the scene, and he just sat, and now he is throwing a ball. That’s great. And music creation. I don’t remember the name of the paper, but something like Bach, Bach something.
It is an AI using recurrent networks to create music. We have style transfer for photos. It’s very interesting to see the mix of AI and art, because one of the things that defines us as humans is that we don’t only live from eating and drinking water. We need more things. We need love. We need art. We need music. We need our culture to just define us as humans. Because if we don’t have that, we’re like animals. I think it’s very interesting to see what will happen in the next years for art and AI.
Mike Delgado: That’s so awesome. Favio, I want to thank you for being our guest, talking about data science and creating intelligence. This has been a fascinating discussion. Where can others learn about you? And also, can you share your idea about doing some episodes in Spanish to help our Spanish communities with data science? I think that’s an awesome initiative.
Favio Vazquez: I’m very active on LinkedIn. If you just follow me there, you’ll see that I post like four things a day. Normally I’m a very active writer. I write for some publications like [inaudible 00:29:50] and Towards Data Science and DZone. You can find me there, too. The first step to getting to know what I’m doing is LinkedIn. I’m launching a publication on Medium called Ciencia y Datos. It’s “science and data” in Spanish. If you are a Spanish speaker, or you want your audience to know you in Spanish, too, you can just go there and submit your article. In the future, we’re going to be doing these kinds of talks or podcasts and more things in Spanish. So, you’re invited.
Mike Delgado: That’s awesome. For those who want to connect with Favio, I highly suggest that you follow him on LinkedIn. He is doing fascinating work and sharing lots of awesome content. You can either look up Favio Vazquez, or the direct link on our Experian blog is ex.pn/datatalk46.
We’ll have a video posted as well as the podcast and full transcription of today’s episode. Make sure to follow Favio. He’s doing amazing work. I’m really excited about the work he’ll be doing to help our Spanish communities, because we need more content out in Spanish. Favio, thank you for everything, for sharing your heart today about data science. Hopefully we can have you back on the show real soon.
Favio Vazquez: Thank you for the invitation, and I hope everyone liked it. See you soon, guys.
Check out our upcoming data science live video chats.