Strategies to Hire the Best Data Team for your Organization

Boston

Demand for Data Skills Continues to Rise

Over the past 10 years we’ve seen an explosion in the generation of data, coupled with the rise of the data scientist to help make sense of this ever-expanding corpus of information. Technological advancements in AI, machine learning and computational power have increased the impact that these scientists and analysts can have on organizations and the speed with which they can achieve results. Companies are also realizing the importance of data as a strategic asset and the profitability of combining internal and operational data with vast sources of third-party data. Since these benefits have been thoroughly demonstrated across a variety of industries, implementation of big data and predictive analytics technologies is moving beyond the early adopter phase of the technology lifecycle.

According to LinkedIn, data science roles have grown by more than 650% since 2012. Even at this rate, the supply of candidates doesn’t match the demand, with hundreds of companies searching for candidates with data-related experience and qualifications. LinkedIn’s 2018 Workforce Report identified a national shortage of 151,717 people with data science skills. Whereas data science was previously the domain of the tech industry, demand for these roles has expanded to virtually every sector.

Data Science Unicorns

One of the first job postings for a data scientist appeared in 2008:

“Be challenged at LinkedIn. We’re looking for superb analytical minds of all levels to expand our small team that will build some of the most innovative products at LinkedIn.

No specific technical skills are required (we’ll help you learn SQL, Python, and R). You should be extremely intelligent, have quantitative background, and be able to learn quickly and work independently. This is the perfect job for someone who’s really smart, driven, and extremely skilled at creatively solving problems. You’ll learn statistics, data mining, programming, and product design, but you’ve gotta start with what we can’t teach – intellectual sharpness and creativity.”

In the decade since the role emerged on the data landscape, its expectations and requirements have ballooned out of control. A typical current job description will list expertise with data transformation, machine learning, algorithm development, geospatial transformation, web development, NoSQL and big data architecture as requirements for a junior data scientist. Some job descriptions list unrealistic educational requirements, such as PhDs in AI or neurocomputing, fields still in their infancy.

According to a report from Burning Glass and IBM, which analyzed over 130 million unique job postings across data-related jobs, 39% of data science or advanced analyst postings required an advanced degree. That share is three to four times higher than for other data-related roles.

In the startup world, “unicorn” is a term used to describe an organization that grows quickly to achieve a valuation of one billion dollars or more. The name of the mythical creature implies how rare these organizations are. The term can also be applied to the data science community. You’ll often hear hiring managers bemoan the fact that they cannot find a data specialist who can do everything: mine data, write complicated data processing procedures, understand how to augment structured data with unstructured sources, recognize system requirements, be diligent about quality and testing, generate reports, create data visualizations and communicate all relevant information in concise business language to senior management.

Rather than spend countless recruiting cycles searching for these data science unicorns, it’s more realistic and practical to break down the definition of a unicorn into its component parts and find specialists who are skilled in different layers of the data stack.

Understanding Your Hiring Needs & Various Data Roles

Before designing the actual interview process, it’s important to understand the goals of your ideal data team and the roles necessary to achieve those results.

The tech community tends to interview based on very quantitative factors and to focus on a candidate’s technical aptitude with specific languages and platforms. However, the attributes actually desired in a data scientist or analyst are closer to that original LinkedIn data science job description from 2008, which focuses on “intellectual sharpness and creativity”. TCB Analytics interviewed a community of 200 data professionals and managers to find out what skills really mattered when building out a data team. The top response was problem solving and curiosity, followed by communication skills. Coding, statistics and knowledge of algorithms were all at the very bottom.

As your organization looks to expand its data team, it’s important to align those desired attributes with the interview process for your candidates.

Data teams typically incorporate the following roles:

Data architects: Experts who are good at extracting data, setting up cloud-based storage and monitoring that infrastructure for scale and cost. These are the people who design the foundational layer of your data products to ensure your internal and external customers will benefit from high availability and that the solutions will scale efficiently. Relevant technologies and methodologies in this space often include ETL, cloud-based databases such as those offered on Amazon Web Services (AWS), database design and NoSQL.

Data engineers: Employees who are skilled at moving data from your storage layer into aggregates and reference tables. They know enough about data architecture to communicate backend requirements, but their main focus is making sure the data is prepared for analysis. They should be able to find complementary data sources, test data integrity and organize data for your analysts. The ideal candidate for this role would have experience with ETL and knowledge of scripting languages such as Python and/or Perl, and should be masterful at manipulating large data files from a command line.

Data analysts: People who are naturally curious and inherently love to solve problems. They should be proficient in tools like Tableau for data visualization and R for statistical programming, so they can cut data, quickly generate reports and create dashboards for consumption in a web-based interface.

Creating a collaborative environment that encourages exploration is important for retaining these analysts, and it provides team-based coaching and career advancement. Informal peer review processes and collaborative platforms like Slack are especially helpful in stimulating curiosity and friendly competition. By monitoring these platforms and interactions, you’ll have a strong sense of the skills on your team and can identify the personalities who are best suited to present reports to management.

Data scientists: Staff who write predictive algorithms and advanced statistical models to extrapolate insights from data or anticipate future business trends. While the demand for these positions is rising dramatically, we advise against rushing to hire a full-time data scientist unless there is a real need, defined goals and a mature data environment, since these resources are very expensive and often under-utilized. Unless data is a core component of your business or you are consistently using analytics to make business decisions internally, it’s often more cost-effective to outsource this function until your data environment is mature enough to support a full-time data science staff.

While your candidates should be specialists in their designated areas, they should also know enough about the other roles on the team to be mindful of the challenges and opportunities that exist across the business. That mutual respect is vital to having a world-class data team that self-teaches, challenges one another and scales efficiently.

Interviewing Best Practices

Given the potential benefits of data science and analytics to any organization and the limited talent pool of skilled resources, it’s critical to examine your interviewing process and determine whether it’s truly designed to source a top-tier data team.

After determining the optimal roles necessary to create an effective data and analytics team for your organization, it’s important to design an interview and assessment process that is best suited for your ideal candidates. Given the recent demand for data analytics and data scientist skills, it has become an increasingly daunting task for managers to adequately test and qualify candidates.

Avoid hyper-focused interviews. As previously mentioned, a variety of team members can tackle data problems, and these roles overlap. If the interview questions are too focused on a narrow skill set, it’s easy to miss out on a candidate who could fill another role on the data team. It’s helpful to provide a test that can be given to candidates regardless of education, with various expected response types based on experience level and area of expertise. Adding arbitrary education requirements may unintentionally limit the diversity of your hiring pool.

Look beyond the algorithms and analysis. A solid functional understanding of data practices and principles is necessary for any member of the data team, but other less quantifiable skills are just as important. Curiosity, communication skills, and overall team fit are not evaluated by applying an algorithm to a statistical problem. Task the candidate with presenting their results to your team. This step is extremely important and helps weed out candidates who put together an impressive written report but fail to effectively communicate their results. It may not be necessary for the candidate to translate the results into business-relevant benefits, but they should be able to convey their results and their process in reaching those results. Not only does this step help your organization gauge the communication skills of a candidate, but it also allows you to evaluate cultural fit.

Don’t create unnecessarily stressful environments. Don’t whiteboard test candidates in real time. This practice adds unnecessary stress to an environment that’s inherently high stress, and it is not particularly relevant to real-world situations. In fact, if an emergency environment is your typical process for analyzing data, that doesn’t reflect well on your organization. Memorization of algorithms, a necessity for whiteboard tests, is less important than an understanding of data, analytics and statistical principles.

While it’s helpful for the candidate to meet with existing team members to assess cultural fit, avoid a gauntlet of all-day interviews. You can artificially narrow your candidate pool with the requirement of an all-day in-person interview, which isn’t feasible for some candidates.

It’s also important to remember that your interviewing process is one of the first impressions that a candidate will have of your organization. With the booming demand for data-related roles, your candidate will most likely have several job opportunities from which to select.

Stop looking for a unicorn. Don’t expect to hire one person who excels at everything from data munging, engineering, analysis and visualization to application development and executive presentations on findings from data analysis. The person responsible for managing your data team should take your test and determine the skill levels and responses appropriate for your open positions. Look for a well-rounded team that provides unique perspectives.

Interview Toolkit

Our team at TCB Analytics has interviewed hundreds of individuals with various backgrounds over the past decade, and we have identified the need for a more efficient way of quantifying technical and cultural fit. We created a simple data exercise that can be completed by any candidate for a data-related role, regardless of experience or education levels. This test has been administered to dozens of candidates across a variety of industries.

Dataset

First, you’ll need a dataset for the test. If you’re comfortable sharing internal or proprietary data with the candidate for analysis, that approach can help screen candidates if domain expertise is an important consideration for the position. Otherwise, we recommend a dataset with terms that are easily understood by your interviewing population. There are numerous repositories of open datasets online that are available for download. Two examples include the Registry of Open Data on AWS and Data.gov.

You’re looking for candidates with expertise in data processing languages like R or Python, so your dataset will ideally be too large to process in Excel. However, the dataset should be small enough to process on a single laptop, especially if the test is conducted remotely on personal equipment. If the candidate is very hesitant to use or try R or Python, that should be a red flag, since those languages are the two standards in commercial data analysis for sizeable datasets.

For the purposes of this example, we’ve chosen a dataset that involves beer, with approximately 1.5 million beer reviews. (Unfortunately, since this test was created, BeerAdvocate has pulled the original dataset. If you’re looking for a comparable dataset, check out the example from Epicurious.)
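As a rough way to check whether a candidate dataset clears the "too large for Excel" bar, you can count its rows before handing it out. The sketch below is illustrative only: the function names and the tiny in-memory sample are assumptions, and the only hard fact it relies on is Excel's per-worksheet limit of 1,048,576 rows.

```python
import csv
import io

# Excel worksheets are capped at 1,048,576 rows, so a larger file
# cannot be opened in full as a single sheet.
EXCEL_MAX_ROWS = 1_048_576


def count_data_rows(handle):
    """Count rows in a CSV file-like object, excluding the header row."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    return sum(1 for _ in reader)


def too_large_for_excel(n_rows):
    """True when the data rows plus one header row exceed Excel's limit."""
    return n_rows + 1 > EXCEL_MAX_ROWS


# Tiny in-memory stand-in (hypothetical) for a real file opened with
# open("reviews.csv", newline="").
sample = io.StringIO("beer_name,review_overall\nPale Ale,4.0\nStout,4.5\n")
n_sample_rows = count_data_rows(sample)
print(n_sample_rows, too_large_for_excel(n_sample_rows))

# A dataset of roughly 1.5 million reviews would not fit in one worksheet.
print(too_large_for_excel(1_500_000))
```

Row count is only one side of the constraint; file size on disk is a reasonable proxy for whether the data will still fit comfortably in memory on a laptop.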

Questions

We recommend the following questions, tailored to your chosen dataset:

  1. Which brewery produces the strongest beers by ABV%?
  2. If you had to pick 3 beers to recommend using only this data, which would you pick?
  3. Which of the factors (aroma, taste, appearance, palate) are most important in determining the overall quality of a beer?
  4. Lastly, if I typically enjoy a beer due to its aroma and appearance, which beer style should I try?
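To give a sense of what a basic response to question 3 might look like, the sketch below ranks each factor by its Pearson correlation with the overall score. This is one simple approach among many, not a reference answer; the review values and column names are made up for illustration, and a candidate might equally reach for regression or another technique.

```python
from math import sqrt

# Hypothetical toy reviews; keys and scores are assumptions for illustration.
reviews = [
    {"aroma": 4.0, "taste": 4.5, "appearance": 4.0, "palate": 4.0, "overall": 4.5},
    {"aroma": 3.0, "taste": 3.0, "appearance": 3.5, "palate": 3.0, "overall": 3.0},
    {"aroma": 4.5, "taste": 5.0, "appearance": 4.0, "palate": 4.5, "overall": 5.0},
    {"aroma": 2.5, "taste": 2.0, "appearance": 3.0, "palate": 2.5, "overall": 2.0},
    {"aroma": 3.5, "taste": 3.5, "appearance": 4.0, "palate": 3.0, "overall": 4.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_factors(rows, factors, target="overall"):
    """Return (factor, correlation) pairs, most correlated with target first."""
    target_vals = [row[target] for row in rows]
    scores = {f: pearson([row[f] for row in rows], target_vals) for f in factors}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_factors(reviews, ["aroma", "taste", "appearance", "palate"])
for factor, r in ranking:
    print(f"{factor}: {r:.3f}")
```

Correlation alone does not separate the factors' individual contributions, since the factors are typically correlated with each other. How a candidate acknowledges that limitation is often as informative as the ranking itself.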

At a minimum, this test requires that the candidates perform the following functions:

  • Read the data into R or Python
  • Summarize and explore the data
  • Aggregate and manipulate the data (simple means, thresholds, grouping and subsetting)
  • Visualize and communicate results (extra points for presenting the findings and code in a well-documented RMarkdown or Jupyter Notebook)
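The first three steps above can be sketched in a few lines of Python. The example below uses only the standard library and a small inline sample so that it runs anywhere; the column names (`brewery_name`, `beer_name`, `beer_abv`, `review_overall`) and the data are assumptions for illustration, and many candidates would use pandas or R for the same steps. It answers a version of question 1 by averaging ABV per brewery.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample standing in for the real file; column names are assumed.
RAW = """brewery_name,beer_name,beer_abv,review_overall
Alpha Brewing,Big Stout,9.5,4.5
Alpha Brewing,Big Stout,9.5,4.0
Alpha Brewing,Light Lager,4.0,3.0
Beta Brewery,Session Ale,4.5,3.5
Beta Brewery,Barleywine,,4.0
"""


def load_reviews(handle):
    """Read reviews into a list of dicts, converting numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["beer_abv"] = float(row["beer_abv"]) if row["beer_abv"] else None
        row["review_overall"] = float(row["review_overall"])
        rows.append(row)
    return rows


def mean_abv_by_brewery(rows):
    """Average ABV per brewery, counting each distinct beer once."""
    beers = defaultdict(dict)
    for row in rows:
        if row["beer_abv"] is not None:  # skip reviews with a missing ABV
            beers[row["brewery_name"]][row["beer_name"]] = row["beer_abv"]
    return {brewery: mean(list(abvs.values())) for brewery, abvs in beers.items()}


# Read and summarize: row count and missing values.
reviews = load_reviews(io.StringIO(RAW))
missing_abv = sum(r["beer_abv"] is None for r in reviews)
print(len(reviews), "reviews,", missing_abv, "with missing ABV")

# Aggregate: mean ABV per brewery, then pick the highest.
abv = mean_abv_by_brewery(reviews)
strongest = max(abv, key=abv.get)
print(strongest, abv[strongest])
```

Note the two judgment calls baked into this sketch: reviews with a missing ABV are dropped, and each beer is counted once rather than once per review. Both are choices a candidate should be able to explain.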

However, some of the questions and concepts can be non-trivial and this becomes evident when giving the test to more experienced candidates. For example, we’ve had wildly varied responses to this question: “If you had to pick 3 beers to recommend using only this data, which would you pick?” More experienced candidates for data science roles have developed full-blown recommendation systems or used Principal Components Analysis. More junior analysts have used simple averaging and ranking to recommend a beer.

Another good example arises when answering questions 2-4. These questions require a certain amount of data for the findings to be considered valid. Some of the beers only have 1 or 2 reviews, so it would make sense to determine a cutoff before including those beers in the analysis. The candidate should justify how they determined the cutoff, but the responses will vary based on the approach.
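A minimal sketch of the cutoff idea, combined with the simple averaging-and-ranking approach to question 2, might look like the following. The beer names, scores and the default threshold of three reviews are all hypothetical; the threshold in particular is exactly the kind of choice the candidate should justify.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (beer_name, overall_score) pairs for illustration.
reviews = [
    ("Hazy IPA", 4.5), ("Hazy IPA", 4.0), ("Hazy IPA", 4.5),
    ("Porter", 4.0), ("Porter", 4.0), ("Porter", 3.5),
    ("Rare Sour", 5.0),  # a single glowing review
    ("Pilsner", 3.5), ("Pilsner", 4.0), ("Pilsner", 3.0),
]


def recommend(pairs, min_reviews=3, top_n=3):
    """Rank beers by mean score, ignoring beers with too few reviews."""
    scores = defaultdict(list)
    for beer, score in pairs:
        scores[beer].append(score)
    eligible = {
        beer: mean(vals)
        for beer, vals in scores.items()
        if len(vals) >= min_reviews  # the review-count cutoff
    }
    return sorted(eligible, key=eligible.get, reverse=True)[:top_n]


picks = recommend(reviews)
no_cutoff = recommend(reviews, min_reviews=1)
print("with cutoff:   ", picks)
print("without cutoff:", no_cutoff)
```

Without the cutoff, the beer with a single perfect review jumps to the top of the list, which illustrates why a recommendation built on one or two reviews is hard to defend.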

Administration

To provide enough time to adequately explore and analyze the data, we recommend giving the candidate a week to perform the exercise. However, we do not expect a candidate to spend more than a few hours of their time on this test. Instruct the candidate to document their code, visualize their answers and prepare a presentation of their results. By allowing the candidate to take the test remotely rather than in a short in-person whiteboard session, you’re more closely mirroring the working conditions of an actual job environment. You should also schedule time for the candidate to present their findings, either in person or virtually, depending on the requirements of the position.

Evaluating Results

You’re looking for the candidate to use the data to answer the questions and then provide some evidence to back up their answers. There is no single right answer to these questions, but you’ll want to determine the level of depth you’d expect for each role based on the job requirements and the candidate’s background.

Review the candidate’s coding and writing skills in their written results. This should reflect their ability to understand a question, use the right data to answer the question and document their results to promote easy collaboration.

Gauge their ability to communicate their findings. Have them present these findings not only to the technical team, but also to executives if you expect this candidate to be presenting complicated results to business stakeholders. They should tailor their presentation accordingly.

Be mindful of their ability to take feedback and constructive criticism from your team. Some candidates may be more defensive about their approach and not respond well to questions. This is a clear sign that someone may be difficult to work with in a team-based environment.