How to Become a Data Scientist w/ Beau Walker @BeaujWalker #DataScience (Episode 9) #DataTalk

Every week, we talk about important data and analytics topics with data science leaders from around the world on Facebook Live. You can subscribe to the DataTalk podcast on iTunes, Google Play, Stitcher, SoundCloud and Spotify.

In this #DataTalk, we talked with Beau Walker about ways to get started in the data science industry.

This data science video and podcast series is part of Experian’s effort to help people understand how data-powered decisions can help organizations develop innovative solutions and drive more business. To suggest future data science topics or guests, please contact Mike Delgado.

Here is a full transcript of the interview:

Mike: Hello, and welcome to Experian’s Data Talk, a weekly show talking with some of the smartest people working in data science today. I’m very, very excited because we’re talking to Beau Walker. He’s a data scientist with a JD in intellectual property law. He has a master’s degree in ecology and evolutionary biology, and he also did undergraduate work in biology over at Brigham Young University. Beau, thank you so much for being our guest today.

Beau: Great to be here. Thank you.

Mike: You first caught my attention because of your outstanding work helping other data scientists. You’re a mentor. You work with a lot of different organizations. You do data science boot camps at UCI. And then I started seeing that you’re doing live Q and A sessions on Instagram live. I said, “I’ve got to meet this guy.” You’re just doing tremendous work for the data science community. I love it.

Beau: Thank you. One thing that’s really neat about the data science community is it’s very open, a lot of very helpful people. And there’s a great group of people on LinkedIn especially who are helpful and very receptive to helping people get started in it. And a lot of us who are data scientists have a background that isn’t specifically data science. So I especially love to help people transition into the field.

Mike: Yeah. It’s interesting because your background is in the sciences, and then you move over to law and get your Juris Doctor in intellectual property law. And you would think your next step would be to become a full-time lawyer, and you took a different route. Can you tell us about your journey?

Beau: Absolutely. Whenever a recruiter has looked at my profile, they get confused about what this path means. But to give some context, kind of a two-minute overview, my dad’s a marketing guy and an entrepreneur. Some family businesses are the family restaurant. Ours was marketing. I worked in a bunch of different startups or helped him. I’ve done all aspects of marketing. He had a couple of patents, and I grew up in that environment.

And when I got to school, my interests were in science. Originally started premed, studied biology and then got involved in some great research labs doing scientific research and analysis. And that’s where I picked up my core data science skills. My master’s thesis was developing computer vision methods for measuring erosion in the desert in Utah. So I had to learn Python, MATLAB, R, advanced statistics and machine learning methods just to solve this problem.

But at the same time, I was contemplating whether I should continue in academia. Having grown up with an entrepreneur, I had this love of business. And in academia, the impact of your work sometimes takes years, depending on what you’re doing. But in business the results can be … You can have instantaneous impact, or a lot more impact. So I took a job as a data scientist for a marketing consulting firm. And this was back when the term data scientist was relatively new, back in 2011, 2010. And I was using all the analytical and programming skills I used in my master’s in a business context, and that was really exciting.

But, at the same time, I wasn’t sure about what data science was as a profession. I was contemplating. I love business. I love science. Patent law seemed like a great combination of that. I’ve always been interested in inventions and creating new things, so I went to law school, and it took me about a month in law school to realize how much I missed data and programming. So I immediately started taking freelance data clients.

At the same time, I was committed to law school. I got a job at a law firm drafting patents in biotech, data science and other industries. And a couple of things happened to me while I was working there for a couple years. One is that when inventors would come in with an exciting new thing, I was jealous that I was just writing about it and not helping them build it or coming up with it myself. The other thing is, I’d be sitting in contracts class and instead of taking notes I’d be programming.

I did well enough in law school. I didn’t fail law school or anything, but I just started to realize this is where my interest and my love are. The things I read for fun are not legal blogs; it’s all the data science blogs. I love hearing about new machine learning methods, and the stuff I would do in my free time would be data science projects. So when I had the opportunity, I made the jump back to data science full-time and haven’t looked back, and I’ve loved it. That’s my story.

Mike: That’s awesome. Well, it’s not only that you love data science and are passionate about it, but you’re also helping others and mentoring others through the different organizations that you work with, through the boot camps you hold at UCI. I’m curious about what drives you to be a mentor to help upcoming data scientists.

Beau: One is just my own kind of weird background getting into it. I see other people who maybe don’t have a master’s or Ph.D. in statistics or math, but they’re really interested in data and interested in the field. And I happen to believe that diverse backgrounds are a huge benefit to data science, that so much of data science revolves around the application of your analysis and math in a business context or in the context of the field. There’s so much benefit from having a diverse background. Part of that is just me wanting to help people who are like me or who come from a different background.

The other is I’ve always loved teaching. I really enjoy that. I think that’s why I initially considered academia. It’s really rewarding. I learn a ton from helping other people. And I feel like I, in turn, have been helped by a lot of mentors and other people in my career. I love to pay it forward.

Mike: I love that about you, Beau, and it’s so cool you’re doing that. You’re living that out, paying it forward, and it’s just beautiful to see. Two of the biggest questions we have in our data science community on Facebook are, “How do I get started? How do I become a data scientist?” And this is one of the things you deal with on Instagram Live. You’re answering questions. And I’m curious what sorts of road maps you provide people on how to begin that process.

Beau: I think the very first thing you need is curiosity. The best data scientists I know are driven by curiosity. They love to solve problems. They’re curious about the world, and they may want to build things for people. So, that’s the first prerequisite, and I think most people thinking about getting into data science have that, or else they wouldn’t be thinking about it. You shouldn’t jump into it because you think there’s great salaries, which a lot of times there are. You should jump into it because you’re passionate — because it can take a lot of work.

The next thing you need is an understanding of programming and statistics. Those are two very important parts of what a data scientist does, of really unlocking the value of an organization’s data in a way where they’re producing insights or they’re producing some kind of predictive model, artificial intelligence, or something the business can use to actually function better. There are a lot of ways to get that knowledge. There are so many options, from boot camps like I teach to online courses to free courses. Very recently, a lot of universities have started offering data science tracks, either as part of undergrad or as a master’s degree in data science.

People always ask me, “What course should I take?” Or they’ll send me a link to whatever online course and say, “Is this a good one? Should I take it?” What I say is what really makes a course good is that you can take something from it and actually apply it to something. The best way that I recommend figuring out what skills to learn is to figure out what you’re interested in or the domain you’re interested in. So, say you’re interested in self-driving cars. You really want to help build that. Figure out what you need to know to be doing that. And then base your courses off that. If you have a problem-solving approach to learning instead of the other way around, it makes it a lot easier.

Part of that helps, because data science in any organization can be drastically different. What a data scientist does in the financial industry may be way different than what I’ve done in biotech and healthcare. And the underlying principles may be the same, but in terms of the tools we use, and even the methods, those can change. So it really helps to have an idea of where your interests are and what your passion is.

Mike: Beau, what type of time investment would you say someone needs to have to really develop their skills to become a data scientist? Because I’ve looked at different online programs, and I’ve seen the nine-month course here and there. Of course, you can go the academic route and maybe pursue degrees in statistics and take all the math classes you need. But what type of time investment would you say someone who’s really passionate, who really wants to pursue machine learning, what would they need to plan out as far as their schedule?

Beau: I think it depends on the route you feel would work best for you to learn the material you need. So much … Yes, so that is a really good question. There’s so much information online, just freely available, that can help. But if you don’t come from the background, it can be really hard to pick that up on your own. Boot camps can be great for giving you a general overview of things, and those usually range from three months to nine months, like you said. That can be a significant time commitment. But if you’re already a programmer, and those things come really naturally to you, you could probably come up to speed on some things a lot quicker. So it really depends.

Given the complexity of learning programming languages, it’s just like learning any other language. It takes time. It takes a lot of practice, a lot of coding. So it’s gonna be a time investment. How much really depends. I would say for me, it’s taken seven years and counting. So that’s one thing I love about the profession: it’s constant. I’m always learning. I always have to learn. And I still consider myself an aspiring data scientist. One of my friends, Eric Weber, who’s a data scientist at LinkedIn, had a post about this that said all data scientists are aspiring data scientists. It’s an idea that the field, the way that it’s moving, moving so quickly, we all have to be always learning.

Mike: Yeah. I like that. And I think that’s for anybody who’s in a serious profession. We’re always gonna be students of that profession, whether it’s marketing, whatever. So there are lots of different roles within data science. There’s data analyst, people who do data mining. Can you talk about, broadly, when someone says they’re a data scientist or they work in data, what types of roles are there? And maybe what types of skill sets would be appropriate for that type of role?

Beau: Okay. This will vary by organization, by industry. Data science and the whole field are new enough terminology that it’s hard to pin down exactly what people do. You may be put in a data analyst role that has the title data scientist and not be doing data science. But people disagree about what exactly that means.

The way I would answer this is to walk you through what a typical data pipeline looks like. At a very high level, you start with data. You might have to go out and get that data from somewhere. You have to do data collection. The data needs to be prepared in a way that you can analyze it. You need to perform an analysis on it. It may be a simple analysis of generally what’s going on. It may be using advanced statistics, and it may be trying to predict something in the future, which would be more advanced, like machine learning.

From that point, typically, you either prepare a report or visualization or something based on what you discovered with the data, with the intent the organization can act on it. Or you put your model into production and work with engineers to actually build it as some sort of product.

For example, if you’re in the credit risk industry, data scientists would build a predictive model to say, “Should we loan money to this person or not?” And once they build that model, they build it into their system so that decision is made automatically. That’s the general process of taking data — whatever form it is — turning it into something where you can actually get insights from it and doing something with it.

So data wrangling, data preparation, data storage and getting data ready for analysis tend to be what data engineers focus on. A lot of data scientists, myself included, depending on the project, have to spend a lot of time in that area. And in my experience, data analysts go through that, but they’re more focused on basic insights and not necessarily on the deeper kind of machine learning methods. Data science as a rule is considered to be focused more on prediction and on the future than on just the retrospective.

Again, these definitions are all very fuzzy. There’s a lot of debate on what it actually means, and it can vary by organization. But hopefully that gives a basic overview of what a typical data pipeline is like at most organizations.
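
Editor’s sketch: the pipeline Beau describes (collect, prepare, analyze, predict, put into production) can be illustrated with a few lines of Python. Everything here is an assumption for illustration only: the records are made up, the column names (`income`, `debt`, `defaulted`) are hypothetical, and the "model" is a toy debt-to-income cutoff rather than anything a real lender would use.

```python
from statistics import mean

# Hypothetical, made-up records standing in for the data-collection step.
RAW_RECORDS = [
    {"income": "52000", "debt": "8000", "defaulted": "0"},
    {"income": "31000", "debt": "21000", "defaulted": "1"},
    {"income": "", "debt": "5000", "defaulted": "0"},  # missing income
    {"income": "78000", "debt": "12000", "defaulted": "0"},
    {"income": "26000", "debt": "19000", "defaulted": "1"},
    {"income": "45000", "debt": "30000", "defaulted": "1"},
]


def prepare(records):
    """Data preparation: drop incomplete rows and convert strings to numbers."""
    clean = []
    for row in records:
        if not all(row.values()):  # an empty string marks a missing field
            continue
        ratio = float(row["debt"]) / float(row["income"])
        clean.append({"ratio": ratio, "defaulted": int(row["defaulted"])})
    return clean


def analyze(rows):
    """Retrospective insight: average debt-to-income ratio per outcome."""
    return {
        outcome: mean(r["ratio"] for r in rows if r["defaulted"] == outcome)
        for outcome in (0, 1)
    }


def fit_cutoff(rows):
    """Toy predictive model: the ratio cutoff that misclassifies the fewest rows."""
    def errors(cutoff):
        return sum((r["ratio"] >= cutoff) != bool(r["defaulted"]) for r in rows)
    return min(sorted(r["ratio"] for r in rows), key=errors)


def approve_loan(income, debt, cutoff):
    """Production step: the automated decision built on the fitted cutoff."""
    return debt / income < cutoff


rows = prepare(RAW_RECORDS)
summary = analyze(rows)
cutoff = fit_cutoff(rows)
```

The split into `prepare`, `analyze` and `fit_cutoff` loosely mirrors the division of labor mentioned in the interview: preparation is where data engineers tend to focus, the summary is the analyst-style report, and the fitted cutoff wired into `approve_loan` stands in for a model put into production.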

Mike: Yeah, I like the way you fleshed that out, because the term data scientist can be so broad. There are so many roles at play. And certainly in the roles you’re playing, oftentimes you’re shifting back and forth between being an analyst, doing data preparation, and then you’re also doing prediction. So you’re doing all of it, but certainly there are specific roles that are doing just one or two things.

Beau: Yep.

Mike: If you were hiring a data scientist, what are some things you look for as far as personality traits or strengths you’d want them to have?

Beau: This is a great question. A couple months ago I hired a data engineer who I think has some of these characteristics. The biggest thing is that curiosity. I think technical skills are important, but what I’m looking for more, instead of saying, “I have this certification in Python,” I want to see that they’ve done something with it. So portfolio is huge. But even more important than that, knowing that they, if for some reason I need them to learn a new programming language or new method, they can do that. They can go out, figure it out and learn what needs to be done to solve a problem. For me, that’s the most important skill set.

Data science always throws you curve balls, so you need the ability to recognize this is a problem that needs to be solved, and the ability to go out and say, “This is what I need to learn to figure out the best way to solve this problem.” And the way that things are evolving, the best way today may be different than the best way a year from now. So those are the most general characteristics of what I’m looking for: someone who has that ability to go out and learn what they need to solve a problem. Built into that are programming skills and familiarity with dealing with data and other things like that.

Mike: Yeah, and that goes back to what you were saying earlier about being a student. That’s gonna be important, because things are always changing. Methods are always changing, and you have to be willing to adapt and maybe learn new languages, learn new ways of doing some things.

Beau: Yeah. I was talking with one of my friends, Ben Taylor, who’s a deep learning expert in the field, a couple months ago, and we were talking about this issue. He said he’s learned he’s way better at Googling than most people. I think that is absolutely an important characteristic for a great data scientist. I am really good at Googling. I’m really good at searching Stack Overflow, figuring out if someone has solved my problem before, maybe in a different domain, and getting the answer that I need. I love that answer, but it’s absolutely true. If you don’t know how to frame a question and how to find an answer, then you’ll have a hard time in data science, because there are so many unanswered things. But that’s also part of what makes it exciting.

Mike: Yeah. I love that. Become a good Googler.

Beau: Yep.

Mike: So, Beau, for somebody like me who’s not in the data science field, I’ve taken one computer science class in college, and I barely passed. My professor was gracious with me. But that one class taught me that I have now the utmost respect for any data scientist. I just took the visual basic C++ class back in the day. But I was like, “Oh man. People who work with data, hats off to them.”

Beau: Well, C++ is a hard language, so that could be part of it.

Mike: And we had to handwrite out the code on the final exam. And I was like, “Ugh. I can’t even test this.” But anyway, what advice do you have for those like myself who aren’t in data science roles? With this new age of AI, more and more companies are leveraging data. Certainly I’m thinking in the future we’re gonna be working with some sort of voice assistant or chatbot to help us. What types of things do you think we should be doing now to prepare ourselves, even though I’m not a data scientist, but things just to maybe learn to prepare myself to do well in the future?

Beau: As a consumer of these AI products or as someone who wants to be a part of building them?

Mike: Actually, as somebody who’s consuming, who’s using them for work.

Beau: Okay. I think one important thing is the ability to cut through the hype that surrounds these AI things. Computers are actually pretty dumb. This is an old programming thing. They only do exactly what you tell them. To an extent, most AI, especially right now, is limited to the data you feed it. So I think that kind of understanding of what the limitations of AI are is important.

That’s the reason self-driving car companies have really complex simulations to simulate driving, and they have closed … I think Waymo has its own closed driving course. They’re basically trying to generate a whole bunch of data to feed their algorithms, because the accuracy and the performance of their algorithms are entirely based on the amount of data they have. And successful AI is one that can do well when it sees something that it hasn’t seen before. And there are still a lot of limitations with that.

So I think the first thing would be to take the hype that you hear in the media with a grain of salt and realize there are limitations. And most AI’s gonna be really good at potentially one or two things specifically, but we’re still a little bit away from general AI.

So I think that’s the biggest thing. But on the flip side, I think we can recognize there’s a lot that AI can do a lot better than us. There are a lot of things that can be automated that computers are just better suited to do than humans. I believe that the automation of a lot of things is actually gonna be a really positive thing. Sure, there may be some jobs that are impacted, but I think by and large it’s going to be a really positive thing. It’s just increasing the effectiveness of the tools we have. Throughout human history, every time we’ve gotten new tools, it’s ultimately been a better thing.

Mike: Okay, good. We got a really good question from a listener named Dea. You mentioned having a solid portfolio with interviewing. Can you talk about what that portfolio might contain or should contain?

Beau: Yeah, that’s a really good question. I actually posted about this recently on LinkedIn, so it’s something that’s fresh on my mind. A really great place to house your portfolio is GitHub. GitHub is a code repository. You can put your code up there so other people can see it. You can have a markdown document, which is a little blog post describing what you did. Put any files up there. That’s a really good place to put the projects you’ve worked on.

In terms of what to put in your portfolio, I think examples of projects that show that you can take data and turn it into something useful. That’s a great thing to do. I always recommend that people who want to build a data science portfolio spend time analyzing real-world data sets, not just the ones you find on Kaggle or other places like that.

Kaggle, for those who may not know, is a website where data scientists can practice machine learning and compete against each other. And companies can actually set up a competition and have data scientists compete on the company’s data set to produce the best model. Kaggle’s a great place to practice and learn about machine learning, but a lot of times the data sets are a lot cleaner than you would actually see in real life. So I always recommend data scientists to get experience outside of Kaggle.

Two examples of things in my portfolio that really helped me make the transition back into data science from law were, one, a couple years ago I built a bot that would comb my wife’s favorite fashion sites, find the best deals, and then post them to her website or her blog.

Mike: That’s awesome.

Beau: And I’d use a machine learning algorithm on the back end to decide what were the best deals. I think on my GitHub now I have some code, an older version of that up there. That illustrated some of my skills.

Another thing I did was in law school. I did a study on how law firms use social media. Again, I created my own data set. I went to over 1,000 different law firm websites in Orange County and scraped them, classified them by type of law. I used data science to classify them based on keyword and natural language processing. And then I figured out what social media sites they were using to get it and what the size of the law firm was.

I constructed this data set, and then I also surveyed over 400 attorneys. I built this massive survey and understanding of how lawyers in Orange County are using social media, and I used data science to do that.

Those are two examples from my portfolio of projects that were real-world data, were things I was interested in, and I just went out and did a project. I didn’t have to have someone tell me, “Go do this.” It wasn’t a school assignment. It wasn’t something else like that.
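
Editor’s sketch: for readers wondering what keyword-based classification of scraped website text might look like, here is a minimal Python illustration. It is not Beau’s code. The keyword lists, the practice-area names and the list of social media domains are all hypothetical, and the interview does not say which terms or methods were actually used.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the interview does not say which terms were used.
PRACTICE_KEYWORDS = {
    "family": {"divorce", "custody", "adoption"},
    "intellectual property": {"patent", "trademark", "copyright"},
    "personal injury": {"accident", "injury", "negligence"},
}

# Hypothetical set of social media domains to look for in a page's HTML.
SOCIAL_SITES = ("facebook.com", "twitter.com", "linkedin.com")


def classify_practice(text):
    """Label page text with the practice area whose keywords appear most often."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        area: sum(words[keyword] for keyword in keywords)
        for area, keywords in PRACTICE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def find_social_links(html):
    """Return the social media domains mentioned anywhere in the page's HTML."""
    lowered = html.lower()
    return [site for site in SOCIAL_SITES if site in lowered]
```

Counting exact keyword matches is about the simplest possible approach. It ignores word variants such as plurals and says nothing about context, which is where fuller natural language processing would come in.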

Mike: And what I love about those projects is you’re showing the hiring manager that not only are you passionate about it, but you are curious and you went out and did these projects on your own to help solve a problem or help answer a question.

Beau: Yeah, absolutely. And you know what? It doesn’t need to be doing a big project like that on your own. One example of someone who I hired, something in his portfolio that I really liked, was he was working for a government agency where they did all of their data analysis in Excel. And Excel is great for a lot of things. When your data set starts to get bigger and you’re trying to do more complex things, it can be a real pain.

He noticed there was this huge inefficiency and decided he was going to write a Python script, unasked. I think he was an intern at the time … A Python script to do everything he was doing in Excel. And he built the script and sold his boss and everyone else in the company on using this specific Python script. They ended up cutting the time for the specific task by 30 times or something. For me, that was a really powerful example of someone who identified there was a problem. There’s a better way that I can do it. He went out and learned what he needed to to write the script in Python and then ultimately had a better result for the company he was working for.

That kind of thing in your portfolio — even if you can’t, because of intellectual property reasons or whatever, share the code for that, having that story or description of what you did, that’s huge.
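
Editor’s sketch: the interview does not say what the intern’s script actually did, so the following is only an assumed example of the kind of spreadsheet task that is easy to move into Python: summing a value column per group from a CSV export. The function name and the column names (`department`, `amount`) are hypothetical.

```python
import csv
import io
from collections import defaultdict


def total_by_group(csv_file, group_col, value_col):
    """Sum value_col for each distinct value of group_col, like a pivot table."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row[group_col]] += float(row[value_col])
    return dict(totals)


# Made-up sample standing in for a sheet exported from Excel as CSV.
sample = io.StringIO("department,amount\nroads,100.5\nparks,40\nroads,9.5\n")
totals = total_by_group(sample, "department", "amount")
```

In practice the file would be opened with `open(path, newline="")` instead of `io.StringIO`, and the same function could then be rerun on each new export without manual steps.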

Mike: Definitely. Beau, I want to thank you so much for your time and for being our guest today, sharing your insights with us. Where can everyone learn more about you and also your live video events?

Beau: The best place is LinkedIn. I post a lot on LinkedIn. I’ll post about the video events there. We’re always recording. I have a good group of other prominent data scientists I’ve been doing these events with. So connect with me on LinkedIn.

Mike: I just put up a short url: ex.pn/beauwalker, and I’ll set up a redirect that will go straight to your LinkedIn profile.

Beau: Okay, perfect. Awesome.

Mike: So people can follow you there, because you’re doing awesome work on LinkedIn, by the way. All your posts, it’s killer. You’re being very, very helpful. You’re so responsive to everybody.

Beau: Thank you. It’s a lot of fun.

Mike: I’ll set up a redirect. So if everyone wants to follow Beau, make sure to go to ex.pn/beauwalker. Thank you again for your time today, and we’ll chat soon.

Beau: Okay. Thanks so much.

About Beau Walker

Beau Walker is a Senior Data Scientist at Liquid Biosciences and Data Science Instructor at the Data Science Boot Camp at the University of California, Irvine. He also serves as a Data Science Mentor with Thinkful.

Beau Walker earned his Bachelor of Science degree in Biology and Master of Science degree in Ecology and Evolutionary Biology from Brigham Young University. He also earned his J.D. in Intellectual Property Law from the University of California, Irvine. Make sure to follow Beau on LinkedIn for data science advice and check out his online Q&A sessions on Instagram Live.
