Interview with Chris Wetherill – SafeAuto
About Chris Wetherill
Christopher Wetherill is a Data and Decision Science Analyst at Safe Auto Insurance Company, where he analyzes data in order to solve problems or help business leaders make more informed decisions. Before becoming a Data and Decision Science Analyst, Mr. Wetherill was a research assistant and Ph.D. student of virology and computational biology at Virginia Polytechnic Institute and State University, or Virginia Tech. He also served as a statistician at John Carroll University and a researcher at the Cleveland Clinic Neurological Center for Pain. Mr. Wetherill earned a bachelor’s degree in Psychology from John Carroll University.
Mr. Wetherill’s work has been presented to a number of scientific organizations, including the Midwestern Psychological Association, the Botanical Society of America, and the Cleveland chapter of the American Statistical Association. Research topics ranged from predictive analyses of professional football games to semantic satiation among ambiguous words. He has also contributed to two collaborative open-source e-books: Applied statistics: An introduction to statistical analysis, and Data + Design: A simple introduction to separating and visualizing information, both of which can be accessed through GitHub.
[OnlineEducation.com] On your blog, you describe your work as doing “fun data-related things” for an auto insurance company, but your official title is Data and Decision Science Analyst. Annual employment surveys conducted by the Institute for Advanced Analytics at North Carolina State University suggest the “decision analyst” job title is becoming more common, at least among its own graduates. What is a data and decision science analyst and how is the position different from that of a data analyst, if at all? What makes it fun?
[Mr. Wetherill] Well, to start, I want to be careful not to generalize too broadly when describing my position: it’s a bit of a unique team and I’m honestly just not sure how widely it differs from other positions with a similar title. Formally, the role of the position is to work with business units to solve emerging business problems, but in our day-to-day work, it’s really a blend of computer science, data analysis, and data science. At its core, the job is to help the business digest data in an informed and reasoned way that enables us to make evidence-based business decisions.
As data collection and storage capabilities have pretty consistently boomed in recent years, companies have more often than not struggled to stay ahead of that curve, and we are no exception. We’ve found that we’re taking in far more data than we have the capacity to digest, and so a big part of what I do is develop mechanisms to comb through those data, curate them in some way that our different business units are able to easily work with, and create dashboards, reports, and models to allow end users to interact with those data in ways that they haven’t previously been able to.
Certainly, a portion of this role is simple data and business analytics (data in; monthly report out); but so much more is dedicated to finding new ways to interact with our data and to empower less or non-technical end users to do the same in a robust, validated, and scalable manner. What makes this so fun is that we are developing analytic capabilities that the company has never had before: we’re the first to tackle (and more often, to even define) any of these problems.
The most important takeaway I’ve gotten from this role is that it increasingly isn’t enough to just know SQL, or to just know statistics, or to just know software engineering. Rather, you need to be comfortable shepherding the data from start to finish: you need to be able to query it, analyze it, prepare it, and present it, developing out a reproducible and automated data extraction–analysis–reporting pipeline as you go while still keeping in mind the business as a whole and how your data will be consumed and utilized within that context.
[OnlineEducation.com] One could say most analytics experts analyze data and apply its insights to some end, but goals, methods, and skills seem to vary. What special skills and knowledge do data and decision analysts generally need? What about technical expertise? If analysts need to know how to program, what languages are most important in today’s job market?
[Mr. Wetherill] I’m always a little irked by the question of which languages are best to learn. In my experience, every language has its time in the spotlight sooner or later: for instance, in the data world, SAS seems to be on the decline; R is the cool kid on the block, followed by Python; Julia will probably be the next hit.
Knowing how to write code is no longer optional in the field; however, programming languages come and go, and the specific ones that you learn are generally far less important than the skills and mindset that accompany them. Anyone entering the field of data science will benefit from the ability to decompose complex, thorny problems and to approach them in an almost modular, piecemeal way.
So much of the work that we do is a little fuzzy around the edges: we have a general sense of where we want to end up and what business problems we want to address, but no clear path to get there. We need to be able to identify what data are needed to answer the problem and arrive at the end goal; what the granularity of those data is; where those data are stored, or whether we even capture them with our current processes; what preprocessing, manipulation, and computed measures we will need to get the data into a usable form; and how the end users will interact with and use the data and any results that we publish.
Individually, none of these is a particularly difficult question to answer, but without decomposing the problem and stepping through each of its component parts, it suddenly becomes a much thornier thing to tackle. Almost without exception, the people who have most thrived in this position and the applicants who have been the most successful have been the ones with this mindset, even if they had never written a line of code before starting with us (although I certainly don’t recommend you wait until the middle of an interview to teach yourself how to write code).
[OnlineEducation.com] Data talent is in high demand across many different industries. Prior to assuming your current position, you used your data expertise in a research capacity—biomedical research specifically. How does data science advance our understanding of our health and our world, and then improve them?
[Mr. Wetherill] Prior to [becoming] a Data and Decision Science Analyst, I was in Virginia Tech’s Translational Biology, Medicine, and Health doctorate program (it’s a mouthful, I know), studying virology and computational biology. In the work we were doing, a big focus was on how genetic information could be leveraged to better understand how various [viruses] reproduce, infect others, and mutate to evade both our immune system and medical therapies.
Increasingly, we see science turning to bigger data and more sophisticated analyses to address problems of public health concern, from predicting how widely Ebola virus would spread in west Africa to identifying ways to make more effective and longer lasting flu vaccines. The fact—and the exciting part to me—is that we don’t have to rely purely on mechanistic studies at the lab bench to see advances that have a tangible impact on our lives and our health: instead of relying on some serendipitous result, we can utilize simulations and other analytical techniques to identify specific avenues of research that are more likely to produce compelling results than others.
Really, although the precise definition of translational science is hotly debated (seriously, Google it if you don’t believe me), what it signifies to me is the understanding that no individual scientific discipline can afford to operate in a vacuum any longer. Just as in data science, you have to be comfortable with everything from data collection to its final presentation. In the biological and life sciences, it’s no different: it’s wonderful that you’re an expert in molecular virology, but you will still need an understanding of statistics and probably some experience in software engineering, and an appreciation for public health and epidemiology certainly won’t hurt. The important takeaway is that the work you do isn’t limited by your job description: the natural world is an interconnected, messy, and complicated place, and if you truly want a chance at understanding it, your work needs to extend beyond the walls of your office.
[OnlineEducation.com] It seems as though many data scientists and analytics professionals strive to solve problems, but what about those that affect their work? What are some of the challenges data experts face on the job and how do they overcome them?
[Mr. Wetherill] One of the issues that we run into daily is that there are almost never any clear-cut solutions to the problems that we’re tackling. Every user-facing product that we create has different enough requirements and diverse enough users that there are seldom any cookie cutter solutions where we can just throw in some arbitrary data and forget about the rest. As a function of that, it can be really easy to go down the rabbit hole and play with new toys all day—there’s no shortage of great analytic tools out there to choose from.
More challenging, though, is presenting data to non-technical audiences in a way that can be easily and accurately interpreted. Say, for instance, we’re delivering a dashboard that can raise a flag whenever the observed count or average value of customer payments along any of a few dozen different channels drops below the values we would expect for that day, indicating that one of our payment processing systems might be out. We might have some truly great predictive modeling going on behind the scenes, but that isn’t relevant to expose to our users, nor would it be helpful trying to explain the model or its underlying assumptions and limitations to them. Instead, we need to do our best to control, non-intrusively to the end user, for cases where those assumptions are violated and leave them with just a nice, clean summary of how our systems are performing.
That can be a tricky balance to strike, because many times these analyses do come with caveats: their results hold true in certain situations and not others; they are only generalizable to such an extent. And although being mindful of this might well lead to a better or more responsible interpretation of the data, that isn’t a luxury that you will always have and it can be easy to lose sight of the fact that you and your audience may have widely differing backgrounds.
To that end, I think the best thing any data analyst or scientist can do is to take the time to learn the business. Nothing ever ends with the data: they will always be a jumping off point for, and driver of, at the end of the day, business decisions. And approaching these problems with an understanding of how the data are actually used and required in the context of the business will only ever make the applications that you develop more powerful.
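The payment-channel flag described in this answer can be illustrated with a minimal sketch. It does not reproduce the predictive modeling Mr. Wetherill mentions, which the interview does not specify; it substitutes a simple rule that flags a channel when today's payment count falls more than k standard deviations below that channel's recent average. The function name, channel names, counts, and the default k are all hypothetical.

```python
from statistics import mean, stdev


def flag_low_channels(history, today, k=3.0):
    """Return the channels whose count today is unusually low.

    history: dict mapping channel name -> list of past daily payment counts
    today:   dict mapping channel name -> today's observed payment count
    k:       how many standard deviations below the mean counts as "low"
             (an assumed default, not a value from the interview)
    """
    flagged = []
    for channel, past_counts in history.items():
        # Skip channels with too little history or no observation today.
        if len(past_counts) < 2 or channel not in today:
            continue
        expected = mean(past_counts)
        threshold = expected - k * stdev(past_counts)
        if today[channel] < threshold:
            flagged.append(channel)
    return sorted(flagged)


# Hypothetical data: a week of daily payment counts for two channels.
history = {
    "web": [120, 118, 125, 122, 119, 121, 123],
    "phone": [40, 42, 38, 41, 39, 40, 43],
}
today = {"web": 15, "phone": 41}

print(flag_low_channels(history, today))  # prints ['web']
```

The end user would see only the list of flagged channels, in keeping with the point above that the model's internals and assumptions stay behind the scenes while the summary stays clean.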
[OnlineEducation.com] Most career fields call for a certain set of personal skills and aptitudes. Can you describe some of the qualities and characteristics employers might look for in data science and analytics candidates? Or habits that help them succeed once in the workforce?
[Mr. Wetherill] Of course, candidates in this field will need a working knowledge of the commonly-used tools—typically R, Python, Hadoop, the SQL flavor of your choice, and probably a couple others. But those are all relatively easy to learn: anyone can start a free instance on Amazon EC2, fork a few GitHub repositories, and start playing around. What, in my mind, is both harder to learn and more important is some amount of comfort in the inherent uncertainty around the work you’re doing: you’ll almost never go into a project knowing everything you need to or having all the necessary tools already at your disposal.
You have to be creative and figure out workable, but usually not perfect, solutions to problems as they come up. Many times this means using tools that no one else in the company has ever used or heard of; being everything from helpdesk to system administrator to software engineer; and building buy-in for the solutions you develop.
At some level, you have to be comfortable playing the contrarian and going up against the established dogma of your company: sometimes you will have institutional buy-in, but many other times you’ll be the one spearheading things. If you always wait for the approvals or are reluctant to try anything new because it might cause an upset, you’ll never add value, or at least not nearly as much as you could be adding. At the end of it all, you’re coming on board to solve problems within the company that no one else is quite sure how to solve: take advantage of that, because if you aren’t constantly pushing yourself to the point where you’re forced to admit, “I don’t know,” you likely aren’t doing a very good job. The candidates whom I have seen have the most success have consistently been not the ones with the longest laundry list of programming languages on their resumes, but the ones who can sit down in front of a completely novel problem, admit they don’t know what they’re doing or what the answer is, and then in the same breath start hacking away at it until they’ve figured it out.