Building a Data Science team

 
1- The Data team
2- When do you need data science
3- Qualifications and skills
4- Interview and onboarding
5- Management and type of the team 

 

 
1- When do we need a Data Science team?

In a startup, don't worry about using machine learning from day 1. First we need to set up the basic infrastructure:

  1. Database
  2. Software
  3. Servers

None of the above requires a data scientist; it requires a database admin and/or a software engineer. Recruit data scientists once that foundation is in place.
In mid-size companies, by contrast, we can bring in the data science team from day 1.

2- Qualifications and skills by role and responsibility

Data Engineer

  1. Chooses the right storage for all the data.
  2. Decides how to build and manage databases, knows how to use SQL, NoSQL, Hadoop.
  3. Willing to find answers on their own.
  4. Solely responsible for the infrastructure, hardware selection, security constraints.
  5. They need to know how data science works and how data is going to be pulled.
  6. Works well under pressure.

Data Scientist

  1. Runs experiments.
  2. Pulls and cleans data.
  3. Communicates information and results.
  4. They might need to do a little bit of data engineering as well.
  5. Qualifications in statistics.
  6. They need to know about machine learning.
  7. Usually knows how to use R and Python.
  8. They know how to build visualizations with tools like D3.js.
  9. They know how to retrieve data from the database.
  10. Primarily focused on statistics.
  11. They may be software engineers who took a course on data science.
  12. Comfortable acknowledging what they know and what they don't know.

Data Science Manager

  1. Recruit, support and motivate the team.
  2. Set objectives and stay transparent even if the experiment failed.
  3. Put the right people in the right place.
  4. Report to higher management.
  5. Good communication skills.
  6. Some kind of background in software development and machine learning.
  7. Know the roles of the data scientist and the data engineer.
  8. Know what can be done and what can’t be done using data (maybe the solution cannot scale or the current machine learning algorithms cannot solve it).

Where to find the team

  1. LinkedIn
  2. Monster
  3. Data science and ML competitions.
  4. Kaggle.com (has a very large job board, which is a good place to search).
  5. Hire.com is a good place to find people who took online classes.

The challenge is that a lot of people call themselves data scientists, but we can filter on the specific skills they know, such as R and Python.

3- Interviewing for Data Science

  1. Schedule individual meetings and ask: how would you tackle this kind of problem?
  2. Do they have experience in this kind of experiment?
  3. Allow them to do a presentation of their skills.
  4. Technical problem: have them solve a very small but real problem, such as analyzing a small dataset. It might take them an hour.
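
As an illustration of how small the technical problem in item 4 can be, here is a sketch of an exercise in Python using only the standard library. The dataset, its column names, and the `load_signups` and `summarize` helpers are all hypothetical, invented for this example:

```python
import csv
import io
import statistics

# Hypothetical toy dataset: daily sign-ups, with one missing value to clean.
RAW_CSV = """day,signups
mon,12
tue,15
wed,
thu,9
fri,14
"""


def load_signups(raw: str) -> list[int]:
    """Parse the CSV text and drop rows whose signups value is missing."""
    reader = csv.DictReader(io.StringIO(raw))
    return [int(row["signups"]) for row in reader if row["signups"].strip()]


def summarize(values: list[int]) -> dict[str, float]:
    """Return a few basic descriptive statistics for the cleaned values."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


signups = load_signups(RAW_CSV)
print(summarize(signups))
```

An exercise of this size covers pulling, cleaning, and summarizing data, and leaves room for the candidate to explain their choices (for example, why the missing row was dropped rather than filled in).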

Onboarding the Data Science team

  1. Share policies on how to communicate with other team members, etc.
  2. Give them a small problem/project to get them up and running.

4- Managing the Data Science team

  1. Conduct individual meetings.
  2. Peer review and presentations.
  3. Evaluate progress, reiterate the goal, and surface any showstoppers pending on the data team.
  4. Group meetings are good too; use brainwriting techniques.
  5. Data science team members will interact with other units in the organization. We can stay aware of this by being cc'd on emails or chats, or by being told by the organization who to collaborate with.
  6. The best way is to have an open policy (the manager of the data science team is easily reachable), rather than relying only on 1-1 meetings.
  7. Identify opportunities for the data science team: learning new tools, promotions, etc.

Evaluating the success of the team

  1. Solving organization-wide problems (increase usage by x % or by some amount of time), which are usually vague.
  2. Solving an internal problem (code refactoring, etc.), which is concrete.
  3. Vague metrics (organization-wide) vs. very specific metrics (specific projects).
  4. Take responsibility for failures, even if the organization asked the team to run the experiment and the hypothesis we were given turned out to be wrong.
  5. Propose concrete steps to treat failure.
  6. Celebrate success.
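
A goal of the "increase usage by x %" kind in item 1 becomes easier to evaluate once the baseline and the target are written down. A minimal sketch in Python, assuming weekly active users as the usage metric; the function names and all numbers are hypothetical and only illustrate the arithmetic:

```python
def percent_change(baseline: float, current: float) -> float:
    """Return the relative change from baseline to current, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a percent change")
    return (current - baseline) / baseline * 100.0


def goal_met(baseline: float, current: float, target_percent: float) -> bool:
    """Check whether the observed change reaches the agreed target."""
    return percent_change(baseline, current) >= target_percent


# Hypothetical numbers: weekly active users before and after a project.
baseline_users = 2000
current_users = 2300
print(f"{percent_change(baseline_users, current_users):.1f}% change")
print("goal met:", goal_met(baseline_users, current_users, target_percent=10.0))
```

Agreeing on the baseline period and the target before the project starts is what turns a vague organization-wide goal into a specific metric.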

Embedded vs. Dedicated Groups

  1. Embedded teams:
    1. Sit with other teams like marketing, etc.
    2. Big advantage: promotes collaboration and exposes the difficulties other teams face.
    3. Communication: when working on a concrete problem that comes from another team (marketing, leadership, etc.), one option is to embed the person inside that department or to keep an open line of communication.
    4. Support: have someone ready to talk to in case of need.
    5. Empowerment: the data often doesn't tell you what you want to hear, and it is common for the data to show that the hypothesis was wrong. We need to communicate this, which is why we need to empower the data scientists.
  2. Dedicated team: only communicates within the team.

The best combination is a bit of both: reach out to other departments to get the right information, but always have a base to go back to.

 

How does data science interact with other groups

Through consulting, which commonly covers:

  1. Coming up with a solution that needs machine learning.
  2. A person on the team educates other people about what data science is and how we can use it.
  3. We can hold one-off meetings to share what we can do, or conduct a bootcamp.
  4. We can also develop a new product that answers the questions/requirements.

Empowering others to use data

  1. Conduct data science training.
  2. Point people to resources to learn from.
  3. Write up information ourselves.
  4. Give internal talks (conduct TekTalks: how do you pull data from databases, how do you visualize data, what is ML, etc.).
  5. Have a page where people can interact with the data.

Data science idea evaluation

  1. See how data science can or cannot solve a certain problem (why not propose a forum where people put their problems and see if we can solve them through data science)
  2. We need to draw other people's attention to the problem-solving capabilities of data science.

Common interaction difficulties

  1. Lack of interaction (maybe because there is nothing to do)
  2. Lack of empowerment (because they are embedded in another team)
  3. Not knowing when to stop (it doesn't have to be perfect)
  4. Lack of understanding of the problem.

Common internal difficulties

  1. Interpersonal miscommunication (set up a code of conduct from day 1, covering code review and how to interact within the team).
  2. Define success and failure: don't define success only as learning machine learning algorithms, but rather as understanding the problem and analyzing it.
  3. Have people communicate with each other, open criticism.
  4. Be clear what will happen if the code of conduct is breached.
  5. Increase interaction and follow up in case of need.
  6. Identify reduced motivation and fix it (maybe the person is not motivated when working with the marketing team).