Executive Data Science specialization: A brief summary

A short summary of what I learned from the course and how it related to real-world applications I applied within the context of enterprises and startups.
data science image part 1
 

A crash course in Data Science

First off, let’s define the difference between statistics and data science.

Statistics like to have models that are clear and applies systematic procedures.

 

 

 

Data science uses statistics, can deal with multiple models and doesn’t have to be straightforward. A data scientist would need a deep understanding of computer science, statistics and a good level of communication.

 

A crash course in Data Science

First off, let’s define the difference between statistics and data science.

Statistics like to have models that are clear and applies systematic procedures.

Data science uses statistics, can deal with multiple models and doesn’t have to be straightforward. A data scientist would need a deep understanding of computer science, statistics and a good level of communication.

Supervised vs unsupervised learning

Supervised: Rely on old data to make predictions on new data (per example, using Machine Learning linear regression to determine the fruits in a basket, whether they were oranges, apples or bananas)

Unsupervised: Rely on data to learn its structure without using explicitly-provided labels (per example, using Machine Learning clustering algorithms to group all similar fruits, though it doesn’t know what they are: oranges, apples, bananas)

Structure of a Data Science project

 

  1. Ask the right question: What is the hypothesis that we are trying to solve?
  2. Data: Aggregate any related data that could be meaningful.
  3. Exploratory data analysis: Look what could be missing in your data.
  4. Formal modeling: statistics, see if this works.
  5. Interpretation: Verbal thinking with others.
  6. Communication: Collaborate with the team to bring a common consensus.
  7. Decision: Come up with decisions based on the results.

One very important aspect in the above approach that is worth detailing is Exploratory Data Analysis that comprises 2 main goals:


1- Is the data that you have suitable for answering the question that you have. 
Then so this will depend on a variety of factors depending on very basic information such as is there enough data, are there too many missing values, to more fundamental ones, like are you missing certain variables or do you need to collect more data to get those variables, etc? 

2- Start to develop a sketch of the solution
If the data are appropriate for answering your question, you can start using it to sketch out what the answer might be to get a sense of what it will look like (example: basic decision tree to identify patterns).

 

The output of a Data Science project

  1. The result should be reproducible.
  2. The documentation of hypothesis, approach, and code should be done.
  3. The conclusion should be precise and unknowns should be limited to the least.

The four secrets of a successful Data Science project

  1. New knowledge is created.
  2. Decisions or policies are made based on the outcome of the experiment.
  3. A report, presentation or app with impact is created.
  4. It is learned that the data can’t answer the question being asked of it.

Is the project about hype or has a real impact

Answer the below to define if the project is about buzzwords or a real contribution:

1- What is the question you are trying to answer?

2- Do you have the data to answer the question?

3- If you could answer the question, could you use the answer? (example: The winner of Netflix prize increased 10% the accuracy to predict how much they are going to enjoy a movie based on their preferences, but computing power was huge so they ended not using it)