So you've been offered that sweet data science/machine learning gig you've been working so hard for. What now? Before you figure out which skills you need to freshen up on, or the optimal driving route to work to avoid traffic, you need to make sure this new role is the right fit and that you'll be happy working there.
If you spend time on /r/datascience, you'll quickly notice that a hot topic is junior data scientists being offered (or accepting) a data science role and then discovering that the role isn't really a data science role at all. It's more of a data analysis or data engineering role, and the junior data scientist ends up either writing SQL queries and building dashboards all day, or architecting a data lake and building data pipelines. In my experience, this usually happens because the employer heard about data science and how it's transforming businesses, and hopped on the AI train. The problem is that 1) they don't have the in-house experience or the data architecture to support a data science team and 2) they don't know how to hire data scientists.
The combination of the two can produce some scary results, as what generally happens is they find some fancy DS/ML job descriptions online and instead of attracting senior data scientists (the ones who may actually be up for the challenge of a nascent data environment*, but usually prefer established teams), they attract junior data scientists (who don't have the experience to discern whether this is a fitting role or not). The junior data scientist mentions her interest in neural networks and drops the term "heteroskedasticity" and (BOOM) she has the job. And now she's expected to produce results from a data environment that isn't quite data-science-ready.
Unfortunately, there isn't much literature out there to help junior data scientists distinguish ill-fitting roles from roles that will help them grow. I'm hoping this post helps make that distinction.
The following are some characteristics you should run away from if you're looking to land a data science/machine learning role as a junior job seeker. Some of them may actually appeal to senior data scientists who are up for the challenge, but the warnings can still apply to them as well.
1) No data architecture
If the firm doesn't have an established data environment, then guess what? Nine times out of ten, you'll be responsible for building this, instead of working on actual data science projects. This is one thing I see over and over. RUN.
2) No defined objectives
If, during the interview, they say something like "we don't really have actual projects yet, but we just KNOW that there's value in our data," that's an indicator that they haven't thought much about data science projects and thus don't really have a plan for you to provide value. RUN.
3) No data engineers or machine learning engineers
Senior data scientists may be up for this challenge, but expecting a junior data scientist to be able to perform well in this environment is unrealistic. Having no data engineers or no machine learning engineers means you'll be building data pipelines and focusing on writing SQL queries or moving data around. And when you do finish a data science project, since the firm doesn't have any machine learning engineers, you'll be expected to put this project into production, which can be overwhelming unless you have prior software engineering experience. RUN.
4) No comprehensive vision for the team
This one can probably be lumped in with #2, but it goes further than merely not having defined objectives. This one speaks more to the data science manager. I've been lucky to have great managers in my career, ones that defined a comprehensive vision for the team in terms of projects and personal growth, both of which are important. If the hiring manager doesn't have a clear vision for the team, then, junior data scientists, beware. You can spot this when the hiring manager says something along the lines of "data science is so new and we're all just figuring it out" or "we're still trying to prove the value of data science".
The other side of this is growth. You want to make sure your manager cares about your long-term growth. Ask if they have one-on-ones with their team. Ask about their long-term team goals. Ask about long-term projects. Ask about a mentorship program. Usually the answers to these questions will give you an idea of the vision for the team and whether or not you'll grow to your potential in this role. Your long-term growth is just as important as what you work on day to day. If the hiring manager doesn't have a clear, comprehensive vision for the team, RUN.
5) Focus on tools over problems
This is a tricky one, because everyone wants to use the fancy tools and solve the biggest problems, so when a hiring manager mentions these, we're instantly attracted. But if the team and manager focus on tools over problems, this may signal that they are more concerned with building cool stuff than with providing practical value. In my experience, managers who focus on problems over tools will ensure that data science projects are providing value to business owners. Let me go ahead and break something to you early on: no one outside of your team cares what tools you're using; they just want the job done. A focus on tools over problems is often a sign that the team isn't providing value to the business. And if you're working on a mature data science team, you're likely already using the cool new tools to solve business problems anyway. I don't think that this alone is a deal breaker, but if it's combined with any of the above, then RUN.
6) You're the first DS hire (with no plans to hire a senior)
When I graduated and got my first job, I worked directly with a senior software engineer. Every single line of code I wrote crossed his eyes, and although this was frustrating at times, every minute was a learning experience. I felt like I was maturing in dog years. Every month I learned more than I did in the prior five months. I was a sponge. If I have one piece of advice for anyone starting their career, it would be to find a job where there are senior-level employees you can learn from. This should be one of your main priorities as a budding data scientist, and if the hiring team doesn't have plans to hire at least one senior data scientist, then RUN.
1) Find your niche
Find a role where you can carve out an area of expertise for yourself. For example, if you want to specialize in natural language processing (NLP), then instead of working where everyone is an expert at NLP, consider finding a role where no one is an expert—you'll stand out a lot more, and probably provide more value. But be sure that they have the data and business problems to support the area you want to specialize in.
2) Interview them
This is where a lot of young candidates go wrong. They don't ask enough questions. I see it all the time. Junior candidates just don't know that they're supposed to ask questions, but this is one of the most important steps in finding the right fit. Ask about the team. Ask about the company. Ask about opportunities to be mentored or to mentor others (equally important). Ask about inter-department collaboration. Ask about the types of data science projects you'll be working on. Ask about the successes of the team. Ask about the failures (diplomatically, of course). Ask about the vision of the team. Anything you're interested in, ask about. An interview isn't just a time for them to ask you questions; it's a time for both parties to ask discerning questions and make sure it's a good fit on both sides.
3) Ask about the role specifically
This is really a sub-section of the previous one, but it's important enough that I decided to give it a section of its own. It's the most important thing I focus on when I go on interviews. More than work-life balance, or what time everyone arrives at the office, or group dynamics; more than all these things, I care about the role itself. My biggest goal when interviewing for a new job is to determine what I'll be working on and whether I'll be sharpening the skills I care about most. Since this is the most important thing to me, I make sure to leave no stone unturned. Failing to ask these questions is how most data scientists get into a situation where the job they're doing isn't the job they thought they'd be doing; that is, they're playing the part of data analyst or data engineer instead of data scientist or machine learning engineer.
Don't accept the wrong role—ask questions!
Come work with me!
On a final note, I'd like to extend an invitation to you. We have a few positions opening in the coming year:
• Two Data Scientist roles
• One Machine Learning Engineering role
• Two internships
We're doing really cool work and I can promise that you won't become an Excel or SQL jockey who just runs reports or builds dashboards. You'll be doing actual data science/machine learning work. Feel free to email me if you have any questions! (tmthyjames at gmail dot com).
* By data environment, I mean an environment that is ripe for a data scientist to come in and begin leveraging the data. For example, if all your data is stored in monthly updated Excel files that are scattered across multiple servers, then a data scientist would have a hard time centralizing the data to begin an analysis. On the other hand, if all the data is located in an easily accessible, centralized database, then the data scientist could easily collect the data to begin an analysis.