I am starting a new job in November. This is not a prank like last time. But before the grand reveal of where, first I’ll subject you to a lengthy blog post about my thoughts about the how and why. Hopefully this provides an additional perspective to the excellent posts by Lana Yarosh and Jason Yip on their experiences on the computer/information science academic job market. But those of you who know the rhythms of the academic job market are already realizing that (spoiler alert), I’m not starting a tenure-track faculty role. Instead, I’m going to spend the next few years being a data scientist. But I definitely promise not to be this guy:
This blog post is a mixture of how to get into data science as well as how to leave academia for industry. I want to be clear that this is not my farewell letter to academia, but rather advice to other PhDs, especially in the social sciences, who are considering going into industrial data science. This is the amalgamation of notes I’ve kept, thoughts I’ve restrained myself from tweeting, and lessons from innumerable pep-talks from close friends and family who have counseled me through this process. I hope my experience can clarify some of the fuzzy contours of a process that academia leaves you completely unprepared for. But fair warning, this is still a really, really, really long treatise. I’ve tried to make up for that with a liberal application of GIFs.
This story about a boy leaving a plum post-doc at a great lab on good terms for a non-tenure-track data science position is broken into four acts. The first, “The Big Decision”, is about the choice to pursue opportunities outside of the safe confines of academia. The second, “The Search”, is about my experience starting the search outside the Ivory Tower’s ladder. The third, “The Recruitment”, touches on some of the frustrations and anxieties I confronted through the process. And the final act, “The Little Decision”, is about my process of negotiating and choosing an offer.
At the outset, let me admit that I’m writing from a position of relative privilege as a network and computational social scientist who can pass as a “sexy” data scientist rather than, say, students of literature or biology, who will not be granted the same assumptions about the merit of their interests and applicability of their training. That said, if you spend 4+ years in graduate school without ever taking classes that demand general programming and/or data analysis skills, I unapologetically believe that your very real illiteracy has held you back from your potential as a scholar and citizen. That’s tough love, but as someone who only started to learn programming via Python in the fourth year of my PhD, I can attest that it can be remedied, often more quickly and easily than you’d believe. The rest of the world thinks this stuff is some arcane dark art, but I guarantee you’ll surprise yourself at how quickly you’ll be reading developer documentation, be able to ask and answer technical questions on StackOverflow, and ultimately be able to “pass” as the imposter that almost every other data scientist dabbling in these magicks feels too.
Stage 1: The Big Decision
The anchor and the list
Why am I interested in data science if I want to end up in academia? In my case, I am anchored in time and space as my partner still has two years left in her degree program even at the end of my two-year post-doc contract. We both plan on moving when she is done, so it doesn’t make sense for me to find a tenure-track job here. Although graduating up to a soft-money, non-tenure-track “research assistant professor” role was a possible way to ride out the next two years, I wanted to use this second two-year window to branch out and try something new. In particular, I had never done an internship while in graduate school, nor had I had a “real” job in between undergraduate and graduate school, so I was curious about what life outside the asylum was like. 12 months into my post-doc, I began to scratch the itch of persistent recruiters and to think about what life in an industry research lab, corporate data science group, or start-up setting would be like. And while it is not Bismarck, North Dakota, it is nevertheless the case that the hub of the data science universe is not in Boston, Massachusetts (SF and NYC are). So the requirement to stay here altered the calculus for the kinds of jobs I could consider. But these formed the outlines of some of the things that I put on a pro-con list that I started (but should have actually written down) before the search. I think writing down the pros and cons before starting the search is important, both in terms of documenting what originally motivated you as well as capturing how your thinking evolved. So write a pro-con list before starting.
Skilling up isn’t selling out
Going into this process, the whispers that followed my academic colleagues who had preceded me in going to industry rang loudly in my ears. “He was so talented.” “She had so many best papers.” You can be forgiven if you think these are eulogies for deceased colleagues. Academia has a strange path dependency where venturing off the farm means you can never return. Well not never, but rarely. But my time in grad school and as a post-doc was marked by spending lots of time being on the outside looking in at other people’s cool data. Instead of devising ever-more-clever ways to move already-collected data from someone else’s machine to my own machine, I decided to look to industry as an opportunity to “skill up” and develop technical competencies. I learn best by doing, but there is little reward for “doing” machine learning, map-reduce, or graph databases in the social sciences if they are not in service of a research question. Others have pointed out that going into industry also lets you work on products that are essential infrastructure in the information economy, gives you the flexibility to move between product areas, or provides greater work-life balance. But for me, I very much hope to bring a really interesting battery of tools, skills, and practices back into the academy in a few years’ time.
Stage 2: The Search
Strength of All The Ties
Mark Granovetter’s famous “strength of weak ties” theory was formulated specifically in the context of job searching: strong ties share the same information but weaker ties provide new information about opportunities. I naively hoped that spamming people I trusted with an informal, out-of-the-blue e-mail with the subject “Looking for job” would lead them to respond. Some of these initial messages received enthusiastic responses within hours, others languished for weeks, and some were likely forgotten. But the point of casting these stones is that you neither know how far their ripples will propagate, nor what surprises they might dislodge. Of course, be judicious about whom you reach out to and how, but also discard the academic job market logic of thinking that the only jobs available are the ones with publicly-posted calls. In fact, spending your time applying through sites like Monster.com won’t generate the leads you need. Oh, and come to conferences like CSCW and ICWSM where lots of amazing social and computer scientists from academia and industry get together!
Recruiters are important, but they’re not your friend
As I’m sure many others have experienced, I had recruiters contacting me seemingly within minutes of making my first GitHub commit. In addition to spamming colleagues, I also entertained some of the opportunities recruiters sent along. I admit that being a recruiter must be a terribly difficult job of trying to match insatiable demand for unicorns with the fallibility of actual human beings. Recruiters play an important role, not only in exposing you to a wider set of possibilities than your network might offer but also in steeling you for the realities of the search ahead. While recruiters are super enthusiastic, they aren’t a new friend trying to hook you up with a job as a favor: they get paid when you take an offer and they will pressure you to interview repeatedly and take any offer that comes along. This is often hard for academics who have come up through a system that demands deference to others’ agendas under the assumption they have your interests at heart as future advocates. Your days of delayed gratification are behind you. You will need to learn to assert yourself so their interests do not override your own and to protect your time from distractions like peripherally-related opportunities. These skills, politely practiced on recruiters, will become even more valuable when you enter the negotiation stage later on.
Growing into the role
By virtue of even being awarded a PhD, you have accomplished something that makes you uniquely expert in the world. You’re going to be hired because your background includes some combination of “harder” technical skills in terms of using tools to perform analyses and “softer” integrative skills in terms of asking the right questions of complex data. But your interests and skills are surely extremely narrow and will need to expand considerably to fit into the nebulous boundaries of the “data scientist” who’s expected to have some combination of hacking, stats, and expertise. If you’re a social scientist like me, you’ve probably never taken machine learning and won’t know what k-means clustering means or why you should prune a decision tree. If you’re a computer scientist and have never taken a statistics course, you may not know how to interpret effects from a regression model or how to design a behavioral experiment.
The intuitions behind these aren’t particularly hard, but you will have to study and practice using these to be conversant in them during interviews. There is no shortage of blog posts on how to break into data science, but I would especially recommend Trey Causey’s. I also strongly recommend Doing Data Science and Data Science for Business as two exemplary introductory books that don’t get lost in the weeds of formalized math or conceptual abstraction. Again, resist the academic urge to understand why they work from first principles, but focus instead on how you can use them to tackle problems and develop heuristics for the kinds of problems they won’t work on. Part of growing into the data scientist box will involve quickly teaching yourself to do these analyses beyond what the introductory documentation and manuals say. And you should seek out interesting data sets in “pop culture” (think entertainment, sports, geography, etc.), document these analyses on a GitHub repo, participate in Kaggle competitions, or implement a website with some interactive features. After you tweet (surely you already tweet!), go post your analysis to /r/DataIsBeautiful, HackerNews, or DataTau.
Stage 3: The Recruitment
Don’t do coding interviews
I am not a computer scientist nor am I a software engineer, so the experience of the “coding interview” was foreign to me. If you’re not familiar with the exercise, after the initial phone screens with recruiters or managers, you’re put on the phone or in a room with an engineer and asked to program some basic function to “make sure you know how to code” or “see how you think through problems.” You will never ever actually need to implement a Fibonacci number generator or sorting algorithm in your actual job. But in a coding interview, you will need to be able to demonstrate you can implement a workable version from scratch within minutes in a closed-book environment while a stranger judges you. This is a terrible approach to recruitment that selects on personality types rather than technical competence. If I had to do it all over again, I would simply refuse to do them, and make that clear ahead of time. If there are concerns about your ability to code, have them do a code review on the analyses you’ve posted to a GitHub repo or Kaggle competition. If they want to see how you think through a data analysis problem, walk through a case study. If some domain expertise needs to be demonstrated, arrange for a “take home” assignment to return after 24 hours. After 4+ years in a PhD program, you’ve earned the privilege to be treated better than the humiliation exercises 20-year-old computer science majors are subjected to for software engineering internships.
Ignore the hangups of academia
I’ve alluded to this above, but it can’t be reinforced enough: industry is not academia. If you’ve made the choice to go into industry, you need to re-calibrate your dystopian comic strip mindset from “PhD” and prepare yourself for “Dilbert”. You’re entering into a new kind of relationship where success is measured by pursuing a very applied research program that will demand flexibility, scalability, and attention to detail. You really need to be fundamentally honest with yourself about this: industry will not be academia by another means. Your manager will provide a different kind of mentorship that is likely both more and less hands-on than you’re used to. Getting things done fast matters more than worrying about novelty. The pace of work revolves around fiscal quarters rather than academic semesters. You will be compensated and promoted for implementing ideas that are “good” because they make or save money. On the flip side, you don’t need to commit to toiling away somewhere for 4+ years. Your work will be used by thousands or millions of people. There shouldn’t be a shortage of extremely motivated people who have amazing skills and ideas. There will be many fringe benefits that you won’t need “seniority” to take advantage of. You could gross more in your first year as an industrial data scientist than you did in all your years in graduate school, combined. You’re right to think many of these are deeply unfair, but don’t forgo the privileges that an industry role entitles you to out of deference to academic norms you’ve internalized but that no longer apply. I admit to having many, if not all, of these hangups, but you need to confront and tame them lest they lead you to self-sabotage.
Serenity in the face of chaos
I’m probably not alone in having the tendency to conjure detailed future scenarios far before the prerequisite actions have remotely come to pass: “Wouldn’t it be great to live in X”, “I can’t wait to work on Y”, “Z is such a brilliant person.” But while you are enthusiastically pursuing several leads, things well outside of your control will shut them down. Higher-ups might surprise the rest of the group with re-organizations, the politics of hiring decisions might surprise a potential manager, etc. These aren’t a reflection on your performance in an interview, but the disappointment you’ll feel when something you wanted, and for which you’re both qualified and recognized, is nevertheless “taken away” is still very real. So in addition to keeping the fantasies on ice, never stop pursuing other opportunities during the search, no matter how “sure” something feels. You’ll need a backup plan in the worst case and leverage in the best case, so keep other recruiting efforts going even after there are offers on the table.
Stage 4: The Little Decision
Negotiating offers: thar be bigints here
Make sure to take advantage of websites like Glassdoor and PayScale to get a sense for what others at the company or in similar roles earn. Does a median starting salary of $120k seem like an outrageous sum of money when you’d really be happy with just $90k when all your assistant professor friends at Big State U are just making $60k? It turns out your potential employer would also be happy to pay you less than the market rate! But that starting salary you negotiate becomes the base on which all your raises, bonuses, and future salary negotiations will be based. I made this “mistake”, but apparently it’s widely acknowledged that you should never disclose your current salary or how much you expect to make. Remember, this is a business negotiation where they are hiring you to make them a lot of money, some (small) fraction of which they’ll return to you as compensation. If the transactional logic of shaking the most money out of a for-profit corporation that will unflinchingly lay you off in a heartbeat makes you uncomfortable, there are an increasing number of exciting data science opportunities in government and non-profit spaces too. But after an offer has been made, the worst thing an employer can say to your salary request for what seems like a really big number is “No”. Really, it doesn’t go on your permanent record or anything. Having other offers and using them as leverage in a negotiation is not dishonest, especially if you’re upfront throughout the process about looking at other roles (as you should be doing). There are lots of other excellent resources out there on negotiating offers, and never ever accept the first one given to you!
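To put a rough number on why the starting base matters, here is a minimal Python sketch. The $120k and $90k figures come from the example above; the 3% annual raise and the five-year horizon are purely illustrative assumptions, and the function name is made up for this post.

```python
def cumulative_pay(starting_salary, annual_raise=0.03, years=5):
    """Total gross salary over `years`, with a fixed percentage raise each year.

    annual_raise and years are illustrative assumptions, not market data.
    """
    total = 0.0
    salary = starting_salary
    for _ in range(years):
        total += salary              # pay earned this year
        salary *= 1 + annual_raise   # next year's raise is applied to this year's base
    return total


# Difference between starting at the median and starting at "happy with just $90k".
gap = cumulative_pay(120_000) - cumulative_pay(90_000)
print(f"Five-year gap: ${gap:,.0f}")
```

Under these assumptions the $30k difference in starting salary grows to roughly $159k over five years, before counting bonuses or the next employer anchoring on your current pay.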
Equity and other fringe benefits
If you’re going into a start-up environment, issues around equity loom larger than salary, but it’s a complicated game. Remember again, if you’re a data scientist with a PhD, you’re worth more than an entry-level engineer and you should be asking along the lines of what other mid or senior-level engineers get: something like between 10 and 50 “points” (0.1% – 0.5% of equity), which may vary substantially depending on the size and stage of the company. But always remember that your stake is likely to get diluted as the company grows, and the employee equity pool is usually last in line to cash out after the other investors and founders. A fraction of a percent doesn’t sound like a lot, but when $100 million exits aren’t rare, software developers aren’t buying big houses or starting non-profits by dutifully saving up their salaries. You should use sites like AngelList and Wealthfront to get a sense for what the going rate is in an industry, role, etc. How you choose to balance the trade-off between more salary and more equity comes down to your tolerance for risk and your faith in the founders’ vision and other investors’ patience. And don’t forget to go back to that pro-con list at the start. If there are other fringe things you would like to keep doing like attending/submitting to relevant conferences (whether academic-focused like KDD, ICWSM, and CSCW, or industry-focused like Strata, UseR, PyCon), having time to consult for a non-profit, teaching at a local college, etc., negotiation is the best time to make those expectations clear.
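The equity arithmetic above can be sketched in a few lines of Python. The 10 to 50 points and the $100 million exit are the figures from the paragraph; the 20% dilution per funding round and the two rounds are assumptions chosen only for illustration, and the sketch ignores strike prices, liquidation preferences, vesting, and taxes, all of which can shrink the number further.

```python
def equity_payout(points, exit_value, dilution_per_round=0.2, rounds=2):
    """Rough value of an equity stake at exit.

    points: stake in hundredths of a percent (10 points = 0.1% of the company).
    dilution_per_round, rounds: hypothetical values, not figures from any real company.
    """
    stake = points / 10_000                                   # 10 points -> 0.001
    diluted_stake = stake * (1 - dilution_per_round) ** rounds
    return diluted_stake * exit_value


for points in (10, 50):
    payout = equity_payout(points, exit_value=100_000_000)
    print(f"{points} points -> ${payout:,.0f}")
```

With these assumed numbers, 10 points works out to about $64k and 50 points to about $320k at a $100 million exit, which is why it is worth asking how many more rounds of funding the founders expect before you weigh equity against salary.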
The Reprisal of the Pro and Con list
Now you have some offers on the table and the pro-con list you wrote up before starting recruitment. Like me, your thinking probably changed a lot going through the process. You will have gone through many reality-distortion fields, drunk a good amount of kool-aid, and probably seen some sausage-making throughout the process. This will lead to you coming up with cons you hope to never have to confront again and pros you never dreamed of. Like any good little Bayesian, you should use this new information to update your prior beliefs to come to a better decision. In my case, my priorities started off with wanting privileged access to data, working in an industrial/corporate setting, and developing new analytical skills. After the process and talking with friends and colleagues, I realized that many of these still applied, but I had overlooked how important being able to engage in the academic conversations was to me, especially if I wanted to stay competitive on the academic market for the medium term. But I can’t stress enough how important it is to have some sort of objective list of criteria that you write down or store in other people’s minds so that you can ground yourself on these values during a very exciting but disorienting search.
The Grand Reveal
With all of that bluster out of the way, I’m very excited to announce that I will be joining the Harvard Business School as a research associate in November. This is not a tenure-track job but I will become one of the first data scientists on their HBX platform, which is their unique MOOC initiative focused on business education. I will be doing a mixture of both platform and academic research to understand the factors that contribute to learning and success in these contexts using both observational and experimental data.
I realize that the bloom is very much off the flower after some very public failures and very justified criticism in the MOOC space. But I also think there are important niches these can fill, even if they can’t and shouldn’t supplant other modes of education. I believe that HBX has identified a really interesting niche and strategy as well as made a big commitment in people and resources, so I’m excited to dig into where and how these approaches are succeeding or faltering. I’ve also been told Harvard employs a number of smart people and has something of a soapbox from which to publicize information. Seriously, I’m thrilled to be at the intersection of traditional business strategy and education, data-driven decision making, and collaborating with brilliant HBS faculty like Bharat Anand and HarvardX colleagues like Justin Reich.
“But Brian, you just spent a billion words talking a big game about industrial data science — what gives?” You’re right, I still don’t have any full-time experience working in industrial data science. You’re also right that I’m still in academia. Going back to my pro and con list — which is going to be different for every person — this role gave me an ideal mix between academia and industry: it is focused on research on learning at scale but it’s working on a product with very real customers and competitors. If I was looking for a longer-term career change, was more willing to relocate, had different skills and interests, or wasn’t solving a two-body problem, I would have made very different decisions.
Returning to the question of why write up something I’m not actually doing, I wanted to share my perspective of “how I almost went into industry” after four months of interviews with nearly a dozen different companies. Very little in academia prepares you to go on a market like this, but getting social scientists into data science roles is vital to ensure the right questions are being asked and the best inferences are being made from many types of data. The recruitment process will be upsetting and disorienting, and the episodes from above may or may not resonate among those actually in industry. And I hope others will share their stories. But I wanted to especially target those of you who are in academia and are considering making the jump: you’re not alone and you should go for it.
Thanks for making it this far and feel free to get in touch if you have any questions. And many thanks to Alan, Lauren, Michael, Patti, Trey, and Ricarose for super valuable feedback on earlier versions of this post!