Every week, we talk about important data and analytics topics with data science leaders from around the world on Facebook Live. You can subscribe to the DataTalk podcast on iTunes, Google Play, Stitcher, SoundCloud and Spotify.
This data science video and podcast series is part of Experian’s effort to help people understand how data-powered decisions can help organizations develop innovative solutions and drive more business.
In this #DataTalk, we talked about how human resources teams use machine learning with Parvaneh Shafiei, Senior Data Scientist at EY in Milan, Italy.
To keep up with upcoming events, join our Data Science Community on Facebook or check out the archive of recent data science videos. To suggest future data science topics or guests, please contact Mike Delgado.
Check out her presentation on “How to Start Your Journey as a Data Scientist”
Here’s a full transcript:
Mike Delgado: Hello, friends. Welcome to our weekly DataTalk, where we talk to data science leaders from around the world. We are super excited today because we’re talking to Parvaneh Shafiei, who also goes by Perry. I’ll probably be using Perry because it’s easier for me to pronounce.
Today’s topic is HR analytics, the future of machine learning and human resources. Perry is a senior data scientist at EY in Milan, Italy, and she’s also the founder of R-Ladies, which is awesome. So we’ll talk a little bit about that as well. Perry, thank you so much for being part of our show today.
Parvaneh Shafiei: Thank you, Mike. It’s such a pleasure to join your online show. I’m super excited to be here.
Mike Delgado: For those watching the video, you’ll see me do some movements like this, because I’m in a new room and if I don’t move, the sensors will turn off the lights. So those are all the awkward movements I’ll be doing throughout the show today.
Parvaneh Shafiei: Mike, as a correction — sorry to interrupt — I’m not the founder of R-Ladies, but I’m the founder of R-Ladies Milan. R-Ladies is a global organization, but each city has a specific group.
Mike Delgado: Tell me about R-Ladies, how you founded it and what you guys are up to.
Parvaneh Shafiei: R-Ladies is a global group that has the objective to encourage more ladies and girls to be more into data science and artificial intelligence, because we have a lot of gap between females and males in this kind of sector, specifically on the technology. Once I participated in a meetup related to data, and one guy started to talk about R-Ladies and how it is distributed around the world. I was like, wow, I like this. This is so new. I love technology, and I feel that I’m in the minority most of the time in technology groups. So, I was like, OK, let’s see if there’s anyone in Italy or Milan. I didn’t find anything in Milan and just one group in Italy. So, I was like, wow, maybe I should start this group in Milan. Why not?
I started this group with a friend in July 2017. We hold various events each month, mostly workshops. The objective is to involve more ladies to speak or present data, to talk about their data experiences, what they are doing, what their goals are, and data science. I get a lot of support from ladies, and even from the boys. We sometimes have boys as speakers.
Mike Delgado: Men are allowed.
Parvaneh Shafiei: Of course. We need diversity. Well, actually, we need some priority. The priority is going to be girls, but most of the time there are also male participants. It’s really a great pleasure to be able to welcome any type of gender.
Mike Delgado: Are there any requirements to join the meetups?
Parvaneh Shafiei: No, not at all. You must have passion for the data, and that’s it.
Mike Delgado: That’s cool. That’s saying like no Python allowed.
Parvaneh Shafiei: Well, let’s say we are also open to Python. I like that, having more community to create for data science for ladies in order to support them. Sometimes, some of the speakers after the meetup tell me, “This was totally new experience for us, and we would like to share more because we get a lot of energy from the community.” These are the changes that I like to create, and this is the positive part. In the world of data science, since there is no specific solution to the different business challenges that we have, we need diversity and a lot of ideas. And where we can get this diversity? From different people with different mindsets, from different genders with different beliefs. That’s the key to it.
Mike Delgado: Definitely. I love that not only are you working as a data scientist, but you’re so passionate, you’re trying to get everyone in, get more diversity, get more people educated. It’s so awesome that you’re doing that.
Parvaneh Shafiei: Thank you.
Mike Delgado: Tell me a little bit about your path from schooling and getting into data science, because we have so many people in our data science community who are curious about how you got to where you are today.
Parvaneh Shafiei: I studied software engineering. I was a web and software developer, which is not so far from the world of data science and technology in general. But after a while I get bored, and I need a lot of diversity in my life and in my work. So, I was like, wow, this new, cool thing called data analytics is so interesting, and I had all this passion about the algorithms and machine learning and these topics. So, I was like, OK, I’m interested in starting a new road, so how about studying computer science in university. At that time I was in Iran. I’m originally from Iran.
To have an independent experience, I decided to move to another country to study. I selected Italy, which I love [inaudible 00:06:32]. I studied computer science, focused on the data in the [inaudible 00:06:38], which is one of the greatest universities in Italy. I studied a lot of topics not specific to data, but about image processing as well. After that, while I was doing my studies, I started to get training in one company called [inaudible 00:06:57]. They’re a consulting company based in Torino. I was a trainee at that company, and little by little I started to do some online education from various websites, specifically [inaudible 00:07:12]. Little by little I started to get more skills in data science. Since I was a programmer before, it wasn’t that hard for me to grasp a new concept. Then I moved to a different company called Reply, which is still a consulting company in Milan. Finally, I settled down in [inaudible 00:07:38].
Mike Delgado: Wow. You say you settled down, but you are just so active, doing so many different things. It’s so cool to hear your history. And what a great place to live, in Milan.
Parvaneh Shafiei: It’s an amazing place. I love it.
Mike Delgado: So, today’s topic is all about HR analytics.
Parvaneh Shafiei: Yes.
Mike Delgado: It’s a fascinating topic. Can you talk a little bit about how HR teams are beginning to use machine learning or data science to help them, and in what ways it’s helping them?
Parvaneh Shafiei: I started to work in the HR sector in [inaudible 00:08:23]. So, it’s really, really wide and open and a safe door to open in order to apply data science and advanced analytics method. We can start from a lot of small processes, from recruitment till the end of the employee’s journey, which is like retention or the [inaudible 00:08:47]. There are a lot of components of the HR processes that we can support through data analytics and machine learning technologies. From there, finding the right talents, from the [inaudible 00:09:02] to onboarding and then measuring employee performance or even understanding who is the next best person to be placed in this specific position. Or how we train our employees and customize the training based on their skills, based on the future career that they consider for them and their retention, how we can keep the employees who have value for us.
Mike Delgado: That’s brilliant. You know, one of the frustrations that I think a lot of us have is that from the human side, there’s going to be a lot of bias.
Parvaneh Shafiei: Sure.
Mike Delgado: Humans can be biased on how they hire, like gender or schooling. If you’re an alumnus of a certain school, you might be biased toward hiring people from that college. The goal of being a good recruiter is to try to eliminate that bias because you want to bring in the right talent, and it doesn’t matter necessarily what school you went to or gender identity or whatever. You just want to bring in the best talent. So can you talk a little bit about how data science can help to remove bias in the recruitment process?
Parvaneh Shafiei: Recruitment is an interesting HR process. It’s really time-consuming, and it costs a lot of effort. Consider if you have an open position, and you’re attracting a lot of people to send you CVs. Let’s say you are receiving 200 CVs each day, so you’re having a lot of trouble reading all of them.
The best thing that you can do is just do it randomly, or just search based on the specific keyword that you can find in the CVs. Let’s say you can open 200 CVs at the same time. This is where HR analytics will help you summarize this kind of CV content into a more meaningful concept. If you’re searching for a resource and talent for, let’s say, an analytics position, the best idea is to search in the CVs to find which contain words related to analytics. We can create an algorithm to automate this process, to find the CVs that include the kinds of analytic, data processing, [inaudible 00:11:49], any related keyword that can describe the content of CVs where they talk or relate to person. And then you can just, let’s say, restrict 100 CVs to 10 CVs. This creates a lot of improvement in your process in order just to focus on 20 people instead of 200 CVs. It’s going to help you a lot by removing unnecessary efforts in the work that you have to do.
Also, you can create, for example, algorithms to find the best matches for your open positions, no matter what kind of background they’re coming from, if they’re male or female, or if they’re educated at a specific university. You’re just matching their top performance characteristic to the characteristic that you can grasp from the CVs, and why not? Why not use this interesting method instead of a manual and time-consuming process?
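Editor's note: the keyword screening described above can be sketched in a few lines of Python. This is a minimal illustration and not the guest's actual method. The keyword list, the function names keyword_score and shortlist, and the sample CVs are all hypothetical, and a real system would likely use richer text processing than exact keyword matching.

```python
import re

# Hypothetical keyword list for an analytics role; not taken from the interview.
KEYWORDS = {"analytics", "data", "statistics", "machine learning", "sql"}


def keyword_score(cv_text, keywords=KEYWORDS):
    """Count how many distinct keywords appear in the CV text (case-insensitive)."""
    text = cv_text.lower()
    return sum(
        1 for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", text)
    )


def shortlist(cvs, top_n=10, keywords=KEYWORDS):
    """Return the names of the top_n CVs, ranked by keyword score, highest first."""
    ranked = sorted(
        cvs.items(),
        key=lambda item: keyword_score(item[1], keywords),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_n]]


# Made-up sample CVs, for illustration only.
sample_cvs = {
    "cv_a": "Five years in data analytics, strong SQL and statistics background.",
    "cv_b": "Experienced pastry chef and restaurant manager.",
    "cv_c": "Built machine learning models on customer data.",
}

print(shortlist(sample_cvs, top_n=2))
```

The score here looks only at the CV text, so fields such as gender or university never enter the ranking, which is the point made in the conversation about matching on content.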
Mike Delgado: Yeah, no doubt. The recruitment process is time-consuming, with the recruiter having to make phone calls. Being able to have something to help create and find the top talent to make the CV pool better …
Parvaneh Shafiei: Exactly.
Mike Delgado: It’s going to help the recruiter, help you save time and money.
Parvaneh Shafiei: Yeah, sure.
Mike Delgado: So, for [inaudible 00:13:24] community is always fascinated about what sorts of algorithms you are typically working with or maybe building to help you with this process.
Parvaneh Shafiei: It depends on the kind of objective that we are searching for. For example, in churn analytics, the objective is to predict if the person has the chance to leave or not, so the output is going to be a one or a zero. In this kind of situation, the algorithms that are binary classifiers are going to help us, such as logistic regression or decision trees, SVM, random forest or neural network. Typically it depends on the nature of the problems that we’re facing. Normally I try different kinds of algorithms, and I select the one that has high performance with respect to the others.
So it depends totally on the nature of problem that they are facing. For example, for the recruitment process, maybe I just need to automate the process of summarizing the CV. I need to apply the text processing algorithms in order to grasp the general concept or summarize the CV or find the right keywords or sometimes sentiment analysis to understand if the CV has a positive sentiment or negative one.
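Editor's note: as a rough illustration of the binary classification idea mentioned above, here is a small logistic regression written in plain Python. It is a sketch under stated assumptions, not the model used in any real project. The two features (training courses taken, years since last promotion), the six data rows, and the function names are invented for the example.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of the positive class (1 = employee leaves) for one row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented toy data: [training_courses, years_since_promotion]; 1 = left, 0 = stayed.
X = [[0, 4], [1, 3], [1, 5], [5, 1], [6, 0], [4, 1]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(X, y)
print(round(predict_proba(weights, bias, [0, 5]), 2))  # few courses, long wait
print(round(predict_proba(weights, bias, [6, 0]), 2))  # many courses, recent promotion
```

In practice one would compare several classifiers on held-out data, as described in the interview, and keep the one that performs best.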
Mike Delgado: Tell me about the data. I get a lot of resume data, or data that people type in when they’re applying online. A lot of that is structured. Is there unstructured data, too, that you’re working with?
Parvaneh Shafiei: Most of the time, we may have structured information. If we are talking about the data about their education, about the training or about the performance, these are the structured data. But of course we have a lot of unstructured data — let’s say the contents of the CV, or if we want to take the brand measurement of the company. We do some type of social network analysis in order to understand how people are talking about the specific company.
So we are extracting a lot of information such as tweets or Facebook posts in order to analyze if people are talking positively or negatively and what the main topics are, so we have a lot of unstructured data as well. It’s not just structured. This is the most challenging part of the science, that we have both structured and unstructured data. Even if we switch to the performance analysis, sometimes we have, let’s say, a written statement about the person’s performance. So, instead of reading all these performance reviews, we can apply text processing techniques to have a summarized version of the performance reviews. It’s another example of unstructured data.
Mike Delgado: That’s cool. So earlier you talked about recruitment, but you also mentioned retention.
Parvaneh Shafiei: Exactly.
Mike Delgado: And [inaudible 00:17:00] because, obviously, companies want to hold on to their best employees. So, I’m curious, how is data science helping with holding on to employees or flagging to management that a person might be considering leaving the company?
Parvaneh Shafiei: This is one of the best-known projects in the HR sector, let’s say, churn analytics. We did this project for one of our clients to find out why employees are going away. What we did was aggregate a lot of data from different sources. This data comes from, let’s say, the organizational information, the training information, not the sector but the performance information, and we aggregated this kind of information together to create a single structured dataset.
And then we did some exploratory analysis to find out the big picture and then create the story, telling about what happened in the company and the current situation.
And then we tried to do some root cause analysis to find out where and why the churn has happened. What are the main metrics defining the churn? Was it about this specific manager? Was it about this specific university? Was it about this specific performance?
And we had a significant insight about that training data, that the employees who did a lot of unnecessary training had higher retention, while the employees with fewer training courses churned sooner with respect to the others. The next step was doing the predictive part, who is at risk of leaving compared to the other employees. We created three clusters of employees: high risk, medium risk and low risk. This gives the company a great insight to understand who needs immediate attention to keep and retain.
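Editor's note: one simple way to arrive at three risk groups like those described above is to bucket each employee's predicted probability of leaving. The sketch below assumes fixed cut-offs of 0.4 and 0.7, which are arbitrary, and the employee IDs and scores are made up. The interview mentions clusters, so the actual project may have used a clustering method instead of thresholds.

```python
def risk_tier(prob, medium_at=0.4, high_at=0.7):
    """Map a predicted probability of leaving to 'low', 'medium' or 'high'."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if prob >= high_at:
        return "high"
    if prob >= medium_at:
        return "medium"
    return "low"


def group_by_tier(scores):
    """Group employee IDs by risk tier, given a dict of ID -> probability."""
    tiers = {"high": [], "medium": [], "low": []}
    for employee_id, prob in scores.items():
        tiers[risk_tier(prob)].append(employee_id)
    return tiers


# Made-up scores, for illustration only.
example_scores = {"emp_01": 0.82, "emp_02": 0.15, "emp_03": 0.55, "emp_04": 0.70}
print(group_by_tier(example_scores))
```

The high-risk list is the group that, in the words of the interview, needs immediate attention.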
Mike Delgado: That is so cool. That predictive part is so fascinating.
Parvaneh Shafiei: Yes, exactly. For some employees, it’s kind of scary that it’s able to predict when you are going to leave the company. They can understand your decision before you say it. But let’s focus on the positive part, that they are going to help you to stay in the company and improve the situation that you are working in. If you don’t fit that well with this position you have, if you are valuable to the company, they’re going to create a better environment for you, or change your sector or providing better training courses.
Mike Delgado: I don’t know if you’re looking at employee survey data, like how people are feeling.
Parvaneh Shafiei: Yes.
Mike Delgado: I know you have lots of data that you’re working with, but survey data seems kind of … people can lie, right?
Parvaneh Shafiei: Yes.
Mike Delgado: How does that factor in as you’re looking at survey data? People might be very sad in their jobs, but they say that they’re happy. So that survey data can just be kind of unreliable sometimes.
Parvaneh Shafiei: Exactly. That’s the topic that we are talking about today. [inaudible 00:21:14] felt that the traditional HR process to understand people is to do a survey. Normally there’s an annual survey to understand employee sentiment, and sometimes they do it through a third-party company to tell the employees that it’s OK to give your true view about us.
But sometimes the employees are not honest enough, or they don’t want to give their real view, or they’re just having like half-time, they don’t want to dedicate a lot of time to respond to a survey that takes 10 or 15 minutes, sometimes 30 minutes. So they’re just doing it without any attention. That’s the part where analytics is going to help you. Instead of asking the employees directly how they are feeling, you can get it from their data. And that’s the fascinating part about advanced analytics, that you can understand people’s sentiments and performance without asking them directly. In my view, combining both methods is going to be better than just relying on the advanced analytics results. You can compare the responses of employees in the survey and then compare it, but you have achieved true advanced analytics, and that’s the true value.
Mike Delgado: Yeah, it’s brilliant. The work that you’re doing, it’s fascinating, especially the predictions as you’re helping companies to find. So, in your testing, how accurate are the predictions that somebody is probably thinking of moving on within the next six months? Is there like a percentage?
Parvaneh Shafiei: In general, the quality of the predictive algorithms totally depends on the quality of data that you can collect. If you don’t have good data, you can’t have a precise model. The objective of predictive algorithms is not to create the best algorithm that is so precise in predictions. We are searching to create a general and simple predictive model that can describe the general behavior of the employees. That’s the main objective of the predictive models. For sure, we are going to find some confidence interval so we can represent that this predictive model is working, but they can give you the probability that within, let’s say, 60 percent to 80 percent, these employees have a high probability of leaving, because they have a characteristic so similar to the people who have left already. We are giving some confidence interval that, OK, this is the confidence interval the predictive model is giving you. That’s the advantage of the advanced analytics.
Mike Delgado: It must be funny for you, working in this role and you’re talking to your boss about it and your boss is like, well, let’s see where Perry stands. Is Perry going to be leaving?
Parvaneh Shafiei: Yeah. That’s the funny part. One of the companies where I worked, the HR sector came randomly to one of my searches. And then this [inaudible 00:24:48] analytics came to my mind and then I divided with the team that I was working. At that time there was a contest about proposing new ideas, what company can work on it. And one of them was exactly predicting who is going to go every six months, and in that period I found another offer for my work and I left. So, I was like, well, this is the best use case that I could introduce to my company about it.
Mike Delgado: That’s awesome. And then you can predict when your boss is leaving, too.
Parvaneh Shafiei: Exactly. I’m sure they aren’t happy about it, but if the company gives you a better offer and tries to keep you, this is the best approach that you can have.
Mike Delgado: That’s brilliant. I love it. This is so fascinating, learning about all the things you’re doing. So, before we end, we always ask our guests a few common questions just to get their thoughts.
Parvaneh Shafiei: Yep.
Mike Delgado: I think I already know the answer to the first one. What is your favorite programming language?
Parvaneh Shafiei: Yes, bet you already know. It’s R, because for me it’s easier to use, it’s more user-friendly, and there’s a large community that supports you to answer your questions, or a QA team, various packages to tackle different problems in the data processing. So I love it.
Mike Delgado: I would love to see a debate: R versus Python at a meetup.
Parvaneh Shafiei: That’s a good question.
Mike Delgado: That would be pretty funny.
Parvaneh Shafiei: But, again, all these things are just tools to tackle a problem. So it matters how you can use it, if it’s easy for you to use Python to tackle a specific problem, so just go for it. It doesn’t matter if you are using R or Python or even a Java-based programming language. It’s just important to solve the problem.
Mike Delgado: Right on.
Parvaneh Shafiei: Of course, also, we have to consider this capability of the solution that they are searching, we are resolving. So this is another topic that depends on the kind of environment that we’re talking about.
Mike Delgado: Brilliant. What advice do you have for those in the data science community who want to start their careers, or people who are just interested? Maybe they’re in college right now, and they’re listening to you, and they’re like, “Oh, I want to get involved in data science, but I’m not sure where to start.” What advice do you have for them?
Parvaneh Shafiei: First, be sure that data is something that you love to do. If it is, you don’t have the fear to work with the various kinds of data sets. It can be images, it can be text, it can be voice or videos. It depends on the organization. And if you are not coming from a technical background such as computer science, be sure to be able to use the programming languages easily. Select one of the languages such as R or Python — any of them is fine — and then you have to cover the basic part, which is learning about the statistics and the mathematics and the probability discussions. And after you covered the mathematical part, you have mostly the part that you need to work with to do some data visualization, or do some data mining and create and be a good storyteller. From what you have done and what you’re applying on your data, you have to create a good story.
And after that, we reach the part of working with the databases. Sometimes you are lucky enough that they provide you the specific data in Excel or CSV or Word format, but sometimes you have to work with the databases in order to perform various queries to extract your information. So you may need to learn SQL-based queries and languages. So just go for that one. And then we reach the part of advanced analytics and machine learning, where you have to get acquainted with various algorithms from the most basic ones, such as linear regression and logistic regression, to the more advanced ones, such as the neural networks and deep learning.
It’s a journey, and there is no end to this journey.
Sometimes you may need to learn other skills such as the text processing, social network analysis, organizational network analysis. It depends what you are facing. What I love about data science is that you need to learn every day. I continue to learn every day, and there are a lot of online courses that can help you such as Coursera, [inaudible 00:29:55], DataCamp that work on a specific topic, and a lot of online communities such as Kaggle, which is a data science website that hosts various data science competitions. Feel free to use this kind of online communities and be able to use data easily. And after all this learning, you must apply it to a real data science problem. So extract the various datasets from the websites I just gave you, play with them, create reports, present in various communities and meetups, even on the website in the [inaudible 00:30:40]. As I said, you can share your kernel, sharing your results so others can see, comment and give you feedback. Then you are learning from others, and that’s the best part.
Mike Delgado: Love that. I can’t wait to get this transcribed. You’ve provided so much good advice — the constant learning. If you’re going to enter data science, you have to be ready.
Parvaneh Shafiei: Exactly.
Mike Delgado: This is a field where you’re never going to get stagnant. If you get stagnant, you’re going to get out.
Parvaneh Shafiei: Exactly. Every day there is something new that you have to learn. It is advancing a lot, so you have to be improving at the same time. This is why I’m not getting tired of it.
Mike Delgado: Yeah. That’s wonderful. OK, last question. What advice do you have for people who are hiring data scientists? If you were hiring somebody for your team, what would you look for in that candidate?
Parvaneh Shafiei: Do you mean the companies that are hiring data scientists? How to use them the best?
Mike Delgado: Yeah. Like, if you were hiring a data scientist to work on your team, what would you be looking for? Like skill set, personality type? What’s important to you when you’re interviewing somebody?
Parvaneh Shafiei: Ah, OK, for attracting the right talent. Since we don’t have a specific task in data science, sometimes it is just about the automation of the processes. I try to search on the various data skills that the person has. Maybe he or she is a good presenter and can create a good story out of the data, but might not be the best one to create predictive models. Or somebody is a good communicator, which is related to the data story part as well, but not so technical. There’s sometimes a person that’s just so technical, but not so good at presenting and communicating ideas. Fine.
It’s not just about one specific skill. I’m not searching for a person to have all their skills at the same time. I’m trying to search for somebody with a good background in data science and then focusing on a specific task, a good storyteller, a good visualization, or a great person who has developed a lot of predictive models, or a person who is just doing a lot of data processing tasks. I’m trying to find a different person with a different skill set and then match them all together to create a team with different skills and abilities.
Mike Delgado: That’s awesome. Perry, this has been a blast chatting with you, learning about all your amazing work. You’re doing HR analytics. I want to let those who are listening to the podcast know, if you’d like to watch the video or read the full transcription, get links to follow Perry on LinkedIn, the URL is ex.pn/datatalk41. That will bring you to the blog posts with a full transcription. By the way, Perry, if you have any other links or resources you want me to put on that blog, you can email them to me, and I’ll make sure to put them up.
Parvaneh Shafiei: Of course. I can maybe provide a presentation about how to start the journey as a data scientist.
Mike Delgado: Oh, yeah.
Parvaneh Shafiei: So you can share it with your audience. I will be more than glad to do it.
Mike Delgado: Yeah, I would love that. I’ll definitely put that on the blog. Thank you so much for being our guest. It was an honor talking with you.
Parvaneh Shafiei: Thank you, too.
Mike Delgado: Best wishes, and for everyone who’s listening today, we will be back next week with another DataTalk. Thank you so much, and take care, everybody.
Parvaneh Shafiei: Thank you so much, Mike. It was a great pleasure.
Mike Delgado: Thank you, Perry.
Parvaneh is a senior data scientist in People Advisory Services (PAS) at EY with more than two years of experience in the application of machine learning methodologies (descriptive and predictive) in business for identifying and structuring innovative initiatives and solving problems. She is also the founder of R-Ladies Milan, a local chapter of R-Ladies, a worldwide organization whose mission is to promote gender diversity in the R community. She earned her Master’s Degree in Data Processing from Politecnico di Milano.
Check out our upcoming data science live video chats.