Every week, we talk about important data and analytics topics with data science leaders from around the world on Facebook Live. You can subscribe to the DataTalk podcast on iTunes, Google Play, Stitcher, SoundCloud and Spotify.
This data science video and podcast series is part of Experian’s effort to help people understand how data-powered decisions can help organizations develop innovative solutions and drive more business.
To keep up with upcoming events, join our Data Science Community on Facebook or check out the archive of recent data science videos. To suggest future data science topics or guests, please contact Mike Delgado.
In this #DataTalk, we learned from Dr. Paul K. Newton at USC about how biology-based mathematical models are used to optimize chemotherapy treatments.
Here is a transcript:
Mike Delgado: Hello, and welcome to Experian’s weekly #DataTalk, a show where we talk to data science leaders from around the world. Today’s topic is Biology-Based Mathematical Models to Optimize Chemotherapy Treatments, and we’re super excited to have Dr. Paul Newton, professor of aerospace and mechanical engineering, mathematics and medicine at USC. He is also the Editor-in-Chief of the Journal of Nonlinear Science.
This is a very special edition, Dr. Newton, because we’ve never covered anything about medicine and data. It’s a very special episode for me, and it’s an honor to have you. Maybe we can get started with you sharing your story: your academic journey and the path you took to where you are now.
Dr. Newton: By training, I’m an applied mathematician. I was an undergraduate at Harvard, and I majored in physics and applied math. Then I got my Ph.D. in applied math at Brown University, which has a separate applied math department as well as a math department, so they have two separate departments. Those are two different tracks.
If you get a Ph.D. in applied math, you’re learning all kinds of things, from probability theory to computational science to data analytics methods to differential equations to linear algebra. Those are the topics you would learn as an applied mathematician. Whereas if you go in the pure math track, you’re proving theorems and you’re doing all kinds of different things.
I like science a lot. I was always interested in physics and biology, so I went on that track. That’s my background. I did not have any particular training in cancer biology. That came about 10 years ago, when I teamed up with a group of people at the Scripps Research Institute in San Diego. I got a phone call, a cold call, from a guy there named Peter Kuhn, a specialist in circulating tumor cell biology. He takes blood samples from patients, working with the Scripps Green Hospital there. His group then extracts the small number of tumor cells that have come from a cancer patient’s tumor and looks at the genomics of those cells, as well as all kinds of physical properties associated with them.
He thought it would be good to try to get one of the Physical Sciences-Oncology Centers that the National Cancer Institute was announcing. As I said, this was about 10 years ago. He wanted to write a proposal for one of these national centers, where a group of applied mathematicians, engineers and physicists would work together with biologists and oncologists on what they call a physical sciences approach to cancer biology. We teamed up and wrote a proposal on the fluid mechanics of blood flow in the body: how circulating tumor cells travel through the bloodstream, how they get trapped at various sites and how they eventually form metastases.
We wrote a proposal that I look back on now and smile at, because a lot of the things we thought we were going to be able to do didn’t pan out. On the other hand, a lot of the things that did pan out we didn’t mention at all in our proposal. Anyway, we got one of the centers. There were, I think, 11 or 12 of these centers throughout the United States. I worked with that group for five years, going to San Diego a lot, I think between about 2009 and 2014.
Since then I’ve branched out, and I work with groups of oncologists and biologists at various places around the country in cancer centers, including USC.
Mike Delgado: It is so cool how you started building this center. Was this the first of its kind? I’ve never heard of anything like this.
Dr. Newton: It was the first of its kind in the sense that the National Cancer Institute had this big initiative and a big push because they felt that cancer research had become a little bit insular. People make progress for sure, but the progress that people were making had plateaued a little bit. So the National Cancer Institute and NIH in general were looking for new ways to invest money that might have a bigger payoff.
Their big push was to get data people, physicists, engineers and quantitative scientists mixing it up with biologists and oncologists, who are tremendously smart and dedicated people but don’t necessarily have the same quantitative training as somebody who comes from an engineering, applied math, physics or data analysis background. That was really the brainchild of the National Cancer Institute.
Then they put out a big call for proposals and probably received 50 or so, maybe more, and they ended up funding about 10 or 11 of them.
Mike Delgado: That is awesome. We have a lot of people in our data science community who want to start their careers looking at different paths. I’m curious, for your center when you’re looking to hire or bring on a data scientist, what skill set is important for them to have?
Dr. Newton: That’s a good question. For me, it’s important to have a mix of different kinds of people, because it’s such a broad area that no one person is going to have a strong background in statistics, in data analysis (machine learning and topics like that), in mathematical modeling (differential equations and physical modeling) and in Python and coding.
I’d say it’s impossible to find one person, but when you get lots of people together, five, six, seven, eight, nine, 10 people, and then you throw in a specialist, an oncologist who works with patients, and a biologist who has a wet lab doing single-cell genomics and cell analysis, it becomes a powerful thing. All of those tools are potentially useful. We look for a team approach rather than trying to find one or two people who can cover everything.
Mike Delgado: People talk about using data for good and data philanthropy, and it’s become a buzzword. What’s great about your work is that it truly uses data science for the good of all humanity.
Dr. Newton: To be honest, USC’s Viterbi School of Engineering has also had a big push, which our dean calls engineering plus. That’s a buzzword as well, but it makes sense. There are lots of things to do in life, and there are certainly lots of interesting things one can do as an engineer, an applied mathematician or a scientist, but a lot of schools are moving toward identifying areas where you’re doing more than science for science’s sake. You’re trying to have some sort of social goal in mind, or some sort of personal purpose. That’s a big push at lots of different schools, this engineering plus or science plus approach.
Mike Delgado: It’s beautiful to see the work that you and your team are doing at USC. For those who are new to the broadcast, I first saw an article on the USC website about Dr. Newton and his work on chemotherapy treatments. I had no idea how much data and mathematics worked alongside medicine; I just never thought about it. When I read the article, I was like, “That’s neat.” Can you first share what traditional chemotherapy treatments are like and then move into what you decided to do with your research?
Dr. Newton: Chemotherapy started in the 1950s or ’60s, and scientists and oncologists started developing protocols. At that point in time, the idea was basically that a tumor was made up of a collection of identical cancer cells growing at a faster rate than the healthy cells in the body. They might not have exactly believed that all the cells were equal, but that was the operating assumption, because they had no ability to distinguish them.
Once you view a tumor as a homogeneous collection of rapidly dividing cells, then clearly the approach would be to try to kill as many of these dividing cells as you possibly can and to eradicate the tumor. That was the operating philosophy and, to a large extent, is still the operating philosophy, and it totally makes sense. If you have a group of insects in a field and they’re destroying your field, you’re going to try and kill as many as possible.
The problem with that approach … Actually, before I get into the problem, let me just say what would follow from that assumption is that you would try to use the maximum amount of chemotherapy that you can in order to kill the maximum amount of cells. The problem is that patients, obviously, can’t tolerate an unlimited amount of chemotherapy. So there needed to be some sort of a balance between the maximum amount that a patient could tolerate and trying to kill as many cells as possible.
The protocol that developed is called the maximum tolerated dose. Oncologists ran clinical trials to determine the maximum tolerated dose for patients in different age groups, for males, for females and so forth. They developed MTD, maximum tolerated dose, protocols, which are basically on/off chemotherapeutic regimens: the patient comes into the hospital, you give them a very high dose of chemotherapy for, let’s say, an hour, and then you let them rest for a week.
Then, they come back in the next week and you give them another dose, and you go through this for a period of months. That’s an on, off, on, off schedule. That’s called MTD therapy and to a large extent, not entirely, but to a large extent, that’s the standard operating procedure. You can tweak the amount, what the dose level is on that, but MTD is the operating procedure.
Another dose protocol that people sometimes use is called low-dose metronomic therapy, or LDM, where you give a very low dose of chemotherapy but continually. That’s an interesting approach that has benefits as well. The total amount of chemo you give would be more or less the same as with the maximum tolerated dose, because although you’re giving a lower amount, you’re giving it over a long period of time.
You can think of low-dose metronomics, in some ways, as being like treating diabetes with an insulin pump that is always on you, continually giving you a small amount in order to keep the disease in check. No one has ever compared those two approaches in any kind of quantitative way and, as you can imagine, they are just the two extremes. In principle, there are lots of other things you could do.
As I said, if you view the tumor as a collection of homogeneous cells, then you would want to try to kill as many as possible. But in the past 10 years people have realized that a tumor is actually made up of a heterogeneous population of cancer cells that are all very different. They’re genetically different, they differ in their growth rates, they differ in how they respond to chemotherapy, and so on.
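To make the MTD/LDM contrast concrete, here is a minimal sketch that builds hourly dosing schedules for both protocols and compares their total and peak dose. The dose units, the one-hour weekly bolus and the 12-week horizon are hypothetical placeholders chosen for illustration, not clinical values, and the sketch ignores pharmacokinetics entirely.

```python
HOURS_PER_WEEK = 24 * 7


def mtd_schedule(weeks: int, bolus: float = 100.0) -> list[float]:
    """Hourly doses for an MTD-style regimen (hypothetical units):
    one high-dose hour at the start of each week, then rest."""
    schedule: list[float] = []
    for _ in range(weeks):
        schedule.append(bolus)
        schedule.extend([0.0] * (HOURS_PER_WEEK - 1))
    return schedule


def ldm_schedule(weeks: int, weekly_total: float = 100.0) -> list[float]:
    """Hourly doses for an LDM-style regimen: the same weekly total,
    spread evenly over every hour of the week."""
    rate = weekly_total / HOURS_PER_WEEK
    return [rate] * (weeks * HOURS_PER_WEEK)


mtd = mtd_schedule(weeks=12)
ldm = ldm_schedule(weeks=12)
print(f"MTD: total={sum(mtd):.1f}, peak={max(mtd):.2f}")
print(f"LDM: total={sum(ldm):.1f}, peak={max(ldm):.2f}")
```

In this toy setup both schedules deliver the same cumulative dose (1,200 units over 12 weeks); what differs is the peak, which is 168 times lower under LDM. Intermediate schedules, such as a moderate dose every few days, can be expressed the same way.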
Now, the analogy you should have in your mind is suppose you have a field and you have lots of competing kinds of insects that are ravaging your field. There might be one dominant group of insects that are the most visible and doing the most damage, but there might be another smaller group of very damaging insects that are also potentially harmful.
If you just go in and blast those insects with DDT and kill the most damaging large subpopulation, the danger is that another subpopulation of insects could very well survive, because maybe it’s resistant to DDT, and then take over the field. Then you’ve got a worse problem on your hands than you started with, because you’ve selected for that small subpopulation resistant to DDT.
That is what happens, and it is the main mechanism of chemotherapeutic resistance. When you blast a tumor with just a single kind of chemotherapy, you can be selecting for a subgroup of cells that is going to cause way more problems for you down the road. You may see benefits in the short run: as you kill the dominant group of cells, the tumor might shrink, so it looks as if you’re making progress. But then, inevitably, the tumor recurs and starts to grow, and it’s a much worse situation because those cells are resistant to the chemotherapy.
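The selection effect described above can be illustrated with a toy two-population model: drug-sensitive cells S and resistant cells R share one carrying capacity, the drug kills only S, and resistance carries a growth-rate cost. This is a minimal sketch under those stated assumptions, integrated with a simple Euler step; the parameter values are hypothetical and are not taken from Dr. Newton’s work.

```python
def simulate(drug_kill: float, t_end: float = 200.0, dt: float = 0.1,
             s0: float = 0.5, r0: float = 0.01,
             growth_s: float = 0.3, growth_r: float = 0.2,
             capacity: float = 1.0) -> tuple[float, float, float]:
    """Euler-integrate a hypothetical sensitive/resistant competition model.

    dS/dt = growth_s * S * (1 - (S + R) / capacity) - drug_kill * S
    dR/dt = growth_r * R * (1 - (S + R) / capacity)

    Returns final S, final R and the smallest total burden seen.
    """
    s, r = s0, r0
    min_total = s + r
    for _ in range(round(t_end / dt)):
        crowding = 1.0 - (s + r) / capacity  # shared space limits both types
        ds = growth_s * s * crowding - drug_kill * s  # the drug hits S only
        dr = growth_r * r * crowding                  # R ignores the drug
        s += dt * ds
        r += dt * dr
        min_total = min(min_total, s + r)
    return s, r, min_total


untreated = simulate(drug_kill=0.0)
treated = simulate(drug_kill=0.5)  # continuous high dose
print("untreated: S=%.3f R=%.3f" % untreated[:2])
print("treated:   S=%.3f R=%.3f, smallest burden=%.3f" % treated)
```

Under continuous treatment the total burden first shrinks, because the dominant sensitive population dies, and then regrows almost entirely as resistant cells, matching the recurrence pattern described above. Without treatment, the faster-growing sensitive cells keep the resistant cells at a small fraction of the tumor.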
Then you can ask yourself, “What would be the best approach now that I know that there are a bunch of competing subpopulations of different kinds of cancer cells?” Or you’re in a field and you have a whole bunch of different kinds of insects. Then, the thing to do is to … Let’s say that in an ideal world you could continually monitor the different levels of those subpopulations in the field.
Then what you would try to do is to manage that competition. You would take doses of chemotherapy or doses of DDT and you would try to kill the most damaging insects, but you wouldn’t try to wipe them out necessarily. You would try to reduce their numbers so that they are then competing head-to-head with another subpopulation and spending a lot of effort, and energy, and time competing against each other in a head-to-head battle instead of just one subpopulation dominating.
Then the goal becomes: how do you manage that competition to keep the different kinds of cancer cells fighting against each other in such a way that the tumor is smaller than it would be if it were untreated, but not completely eradicated? What you’re doing then is managing the cancer instead of trying to wipe it out. There are some benefits to that.
That becomes a tricky thing for lots of different reasons. One reason is that, at this stage, it’s not possible to continually identify all the different subpopulations of cells in a tumor the way you might survey all the different insects in a field, which are much easier to monitor. So that’s a challenge. The other challenge is that even if you assume you can tell what the different subpopulations are and which cells are resistant, what is the best approach to manage that competition? That is how we viewed the problem, and there are other groups viewing tumors this way, as a sort of ecology, so people call this tumor ecology: using evolutionary principles to manage a tumor instead of using maximum tolerated dose.
That’s an intro into the thought process that goes on behind people who use Darwinian evolution ideas to manage this competition among all the different kinds of cells.
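One way to “manage the competition” is a feedback rule: treat only while the total burden is high and pause when it falls, so that sensitive cells survive to keep competing with resistant ones. The sketch below applies a hypothetical on/off threshold rule to a toy sensitive/resistant model (shared carrying capacity of 1, drug kills only sensitive cells). The thresholds, rates and the `run` helper are illustrative assumptions, not a clinical protocol or the USC group’s actual controller.

```python
def run(policy: str, t_end: float = 20.0, dt: float = 0.1,
        upper: float = 0.5, lower: float = 0.25,
        drug_kill: float = 0.5) -> tuple[float, float, int]:
    """Simulate a toy tumor under 'continuous' or 'adaptive' dosing.

    Returns final sensitive burden, final resistant burden and the
    number of times the drug was switched on or off.
    """
    if policy not in ("continuous", "adaptive"):
        raise ValueError(f"unknown policy: {policy}")
    s, r = 0.5, 0.01               # hypothetical initial burdens
    growth_s, growth_r = 0.3, 0.2  # resistance costs some growth rate
    drug_on, switches = True, 0
    for _ in range(round(t_end / dt)):
        total = s + r
        if policy == "adaptive":
            # Hysteresis: treat once burden reaches `upper`, pause at `lower`.
            wanted = drug_on
            if total >= upper:
                wanted = True
            elif total <= lower:
                wanted = False
            if wanted != drug_on:
                switches += 1
                drug_on = wanted
        crowding = 1.0 - total  # carrying capacity fixed at 1 (assumption)
        kill = drug_kill if drug_on else 0.0
        s += dt * (growth_s * s * crowding - kill * s)
        r += dt * (growth_r * r * crowding)
    return s, r, switches


for policy in ("continuous", "adaptive"):
    s, r, switches = run(policy)
    print(f"{policy:>10}: sensitive={s:.3f} resistant={r:.3f} switches={switches}")
```

After the same 20 time units, continuous dosing has nearly eliminated the sensitive cells, leaving the resistant ones unopposed, while the threshold rule keeps a sensitive population alive to crowd them. Whether that actually delays progression depends on the competition model and its parameters, and this toy assumes the burden is observed perfectly at every step, which sidesteps the monitoring problem raised above.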
Mike Delgado: What you just explained is amazing: you’re using evolutionary theory, especially game theory, to help with cancer treatments. Is that the proper term?
Dr. Newton: Yeah.
Mike Delgado: To make cancer cells compete with each other. I’ve never heard of that.
Dr. Newton: It’s a pretty new field. I must say, there’s our group, and there’s a group of people at the Moffitt Cancer Center in Tampa, Florida, where my former Ph.D. student, Jeffrey West, is now a postdoc. He is working with a group of clinicians there who are trying to develop clinical trials based on these ideas. They have some very good people using these methods.
Mike Delgado: Wow. You have this amazing team at USC that you’re working with. Tell me about the role of the data scientist or the people who are now analyzing data. Where are you placing them, or what are they focused on in this research?
Dr. Newton: I have a group of about five Ph.D. students working on various aspects of this, and then we work with other labs. I mentioned a biologist named Peter Kuhn, who has a lab, and there’s an oncologist here at Keck named David, who is a clinician. We work with different groups of people, but my students do several different things. Some of them do mathematical modeling: as you said, they’re looking at game theory models of evolution and at how to balance these competing subpopulations using what we call feedback control, or adaptive control theory, to design chemotherapeutic schedules that get these subpopulations of cells competing against each other.
These would be people getting their Ph.D. in, let’s say, applied math, physics or engineering, and they build computational models based on a system of equations. Typically these are called the replicator equations, but it doesn’t matter what they’re called. Or you could build stochastic, cell-based models that use evolutionary game theory principles. That’s one kind of person.
Then another kind of data science person would be more of a machine learning and big data person. I have a student who is going to be defending his thesis in August who has done an awesome Ph.D. thesis using those kinds of techniques to look at all kinds of different cancer models. He got his master’s degree in computer science and is now getting his Ph.D. in aerospace and mechanical engineering. He’s already got a full-time job lined up as a data scientist at JPL. Amazingly, his whole thesis is about data science approaches to healthcare and biology, and he got snapped up by a data science group at JPL. He’s happy about it; he wanted to stay in Pasadena.
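For readers unfamiliar with the replicator equations: each cell type’s frequency x_i changes according to dx_i/dt = x_i((Ax)_i − xᵀAx), that is, it grows in proportion to how much its fitness (Ax)_i exceeds the population’s mean fitness, where A is a payoff matrix describing how each type fares against each other type. The sketch below integrates them with an Euler step. The 2×2 payoff matrix is a hypothetical hawk-dove-style example in which each type does better when the other is common, not a matrix from Dr. Newton’s models.

```python
def replicator_step(x: list[float], payoff: list[list[float]],
                    dt: float) -> list[float]:
    """One Euler step of dx_i/dt = x_i * (f_i - mean_fitness)."""
    n = len(x)
    # f_i = (A x)_i: expected payoff of type i against the current mix.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    return [x[i] + dt * x[i] * (fitness[i] - mean_fitness) for i in range(n)]


def simulate(x0: list[float], payoff: list[list[float]],
             t_end: float = 100.0, dt: float = 0.01) -> list[float]:
    """Integrate the replicator equations from frequencies x0."""
    x = list(x0)
    for _ in range(round(t_end / dt)):
        x = replicator_step(x, payoff, dt)
    return x


# Hypothetical payoffs: row = focal cell type, column = opponent type.
PAYOFF = [[0.0, 3.0],
          [1.0, 2.0]]

final = simulate([0.1, 0.9], PAYOFF)
print([round(v, 3) for v in final])
```

With this matrix the two types coexist: f_0 − f_1 = x_1 − x_0, so whichever type is more common is penalized, and the frequencies settle at 0.5 each. The frequencies keep summing to 1, which is why the replicator equations track the composition of a population rather than its size.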
Mike Delgado: That’s cool. It must be hard to keep such a smart team together because everyone’s wanting to tear your team apart.
Dr. Newton: The challenge is that there’s a bit of a ramp-up period. When you get a new graduate student, you’ve got to train them for a couple of years. They have to take lots of classes and pass exams to get through the master’s level into the Ph.D. level before they get really serious about research. It’s a big investment to train someone for a couple of years, and then they work on their Ph.D. thesis for maybe two or three years, sometimes even four, after that.
Mike Delgado: What other types of projects have you done with medicine and math, maybe in the past, that you’re happy about?
Dr. Newton: One of the big projects we started out with at the Scripps Research Institute, as I mentioned, was basically on forecasting for different kinds of cancer. In other words, we looked at metastasis, and we wanted to build what are called dynamical systems, or forecasting models, of how metastatic cancer is going to progress for different kinds of cancer. We used Markov chain models, which are a relatively simple dynamical systems approach to prediction that you can train on data, if you have it.
This is an ongoing project that we’ve now written lots of papers on, and we work with groups at Sloan Kettering and MD Anderson, as well as Keck. What we do in that project is work with longitudinal data sets. Longitudinal data sets track large cohorts of cancer patients for, let’s say, 10, 15, 20 or 25 years. They record every single time a patient gets a treatment and the date every time the cancer spreads to a different site, so we have long-term dynamical trajectories for thousands of patients of different ages, different genetic types and different kinds of cancer. A lot of these cancer centers have these longitudinal data sets just stored away in their files, but they don’t know what to do with them. They’ve never analyzed them.
Seven or eight years ago, we realized that these longitudinal data sets were super interesting, super important and useful for developing models, and we started using them to train our Markov chain models of progression. That has proven useful. Our group is probably best known for those kinds of models, since we’ve been doing that the longest.
Mike Delgado: I think it’s awesome that you thought about the studies that have been going on at these different cancer centers and then went through the process of collecting the data. When you approach these cancer centers, obviously, you’re at a university. How do you deal with the privacy of the patients?
Dr. Newton: It’s delicate. First of all, you can’t just knock on the door of a cancer center, introduce yourself and say that you’d like their data. That’s not going to work particularly well. It might work at Keck, because I’m a professor at USC and I know people at Keck. You sign privacy agreements, and you go through training on how to handle medical data. All of that is important.
Really, the most important thing if you’re going to work with a cancer center is to have someone at that center whom you know, whom you’ve met at conferences and talked with. So there’s a ramp-up period there: you have to develop a certain level of trust and comfort with people. Then, almost invariably, they will say they have data sitting around that no one has looked at, and that it would be interesting for them if you could do something with it.
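The core idea of training a Markov chain on longitudinal records can be sketched in a few lines: count how often patients move from one site to the next, normalize those counts into transition probabilities (the maximum-likelihood estimate for a first-order chain), and then propagate a probability distribution forward to forecast where the disease is likely to appear next. The trajectories below are made-up toy sequences, not patient data, and treating a site with no observed exits as absorbing is an assumption of this sketch. It is not the group’s actual model.

```python
from collections import defaultdict

Trajectory = list[str]


def estimate_transitions(trajectories: list[Trajectory]) -> dict[str, dict[str, float]]:
    """Maximum-likelihood transition probabilities from observed site sequences."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    states: set[str] = set()
    for path in trajectories:
        states.update(path)
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    matrix: dict[str, dict[str, float]] = {}
    for state in states:
        total = sum(counts[state].values())
        if total == 0:
            # No observed exits: treat the site as absorbing (an assumption).
            matrix[state] = {state: 1.0}
        else:
            matrix[state] = {dst: c / total for dst, c in counts[state].items()}
    return matrix


def propagate(dist: dict[str, float], matrix: dict[str, dict[str, float]],
              steps: int) -> dict[str, float]:
    """Push a probability distribution over sites forward `steps` transitions."""
    for _ in range(steps):
        nxt: dict[str, float] = defaultdict(float)
        for state, p in dist.items():
            for dst, q in matrix[state].items():
                nxt[dst] += p * q
        dist = dict(nxt)
    return dist


# Made-up toy trajectories: sites in the order the disease appeared there.
TRAJECTORIES = [
    ["primary", "lymph", "lung"],
    ["primary", "bone"],
    ["primary", "lymph", "bone", "liver"],
    ["primary", "lung"],
]

P = estimate_transitions(TRAJECTORIES)
print(P["primary"])
print(propagate({"primary": 1.0}, P, steps=2))
```

After two transitions, this toy chain puts half its probability on lung and a quarter each on bone and liver. Real longitudinal data would also raise issues this sketch ignores, such as censoring (patients who leave follow-up) and the time elapsed between events.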
The thing that’s amazed me is how much data is out there in the medical community. In my little world, it’s just the cancer world. Just in the cancer world there’s so much data that hospitals have that doctors just collect over years and years, and it’s just sitting there in files and has all this information in it, and no one has extracted that information from it. I think that’s a huge area of opportunity to develop.
Mike Delgado: Yeah, that’s amazing. Like you said, there’s so much time involved in building those relationships, building that trust —
Dr. Newton: It helps a lot to be at a big university that has a medical school and lots of professional schools. You typically would start out working within your university system as a graduate student or even some undergraduates who get into this, but mostly graduate students, postdocs and faculty members who work together with the oncologists at that university.
Mike Delgado: What’s interesting is that I’ve seen a parallel between data ethics and the medical community. When someone becomes a doctor, they take an oath to do everything they can to save human life. There is now a movement, and I think it was started by DJ Patil, who was the U.S. Chief Data Scientist under Barack Obama.
I see him now talking about how we need a set of data ethics standards that data scientists subscribe to, to make sure we’re doing everything we can in the data science community to protect data and privacy. I’m curious about your thoughts on what sort of training or guidelines your scientists follow to make sure data is properly taken care of.
Dr. Newton: It’s very delicate, no question. The whole data question, as we’ve seen in the last couple of weeks, is a really delicate issue, and healthcare and medical data are even more sensitive.
Most universities have programs and systems in place that train researchers in the basics of how to maintain confidentiality and things like that. There are checks and balances in place. I’m not saying that it’s not difficult. It is difficult, and you do have to learn some basic tools there. It’s a big subject, no question, and there’s definitely room for improvement.
The whole field of data science is just moving so fast and is ahead of a lot of the checks and balances, as you can see from Mark Zuckerberg’s latest testimony. My sense is that he’s trying to do the best he can, but the field is moving so fast, and he can’t control it all, and they’re trying to stay ahead of a moving wave.
Mike Delgado: Yeah, it’s huge. Dr. Newton, just one last question. It’s a question that comes up a lot in our data science community, and it’s around what advice you would give somebody who is finishing up graduate school, or finishing up college, and is looking to start their career in data science. What advice would you give them to help them on their way?
Dr. Newton: That’s a good question. The key to making yourself marketable, if that’s your goal (which it probably is for most people), is to get broad training in lots of different things. When you go into an interview with a team of people, let’s say at a place like Google or Facebook, they’re going to be asking questions from all kinds of directions: machine learning, statistics, mathematical modeling, computer science. Of course you’re going to specialize in one of those areas, but if you have at least one course in each of the others, you can understand the questions they’re asking and see how those techniques could be useful. That helps a lot.
I think that was the key to the grad student I was telling you about who got this job at JPL. He’s very broad-based, and he can converse on lots of different kinds of topics, from modeling to statistics to genetics to machine learning. I think taking one class in all those different areas that you’re not specializing in can really pay off.
Mike Delgado: That’s great advice. Dr. Newton, thank you so much for being our guest in this week’s #DataTalk. For those listening to the podcast, if you’d like to watch the video or read the transcription, you can go to the blog, and the short URL is just ex.pn/newton. That’s also the place where we’ll be embedding the podcast.
Dr. Newton, thank you so much for being our guest. It was an honor to have you. You guys are doing tremendous work, using data for good with mathematics and medicine to help humanity. Thank you for everything that you’re doing, and it’s an honor to have you on our broadcast.
Dr. Newton: Thanks very much, Michael.
Paul Newton received his B.S. degree in Applied Mathematics/Physics at Harvard University in 1981, with a thesis written under the supervision of G.F. Carrier, and his Ph.D. in 1986 from the Division of Applied Mathematics at Brown University. He then moved to the Mathematics Department at Stanford University to work as a post-doctoral scholar under J.B. Keller. He became Assistant (1987) and Associate Professor (1993) in the Mathematics Department at the University of Illinois Urbana-Champaign (UIUC) and the Center for Complex Systems Research (CCSR), headed by Stephen Wolfram. In 1993 he moved to the Aerospace & Mechanical Engineering Department at USC as Associate Professor, and was promoted to Full Professor in 1998. He is currently Professor of Aerospace & Mechanical Engineering, Mathematics, and Medicine (Norris Comprehensive Cancer Center) at USC. He also serves as the Editor-in-Chief of the Journal of Nonlinear Science, Springer-Verlag Publishing.