5-Step Learning Path to Become a Data Scientist in 2019


What Is Data Science, Big Data and Data Analysis?

The most cost-effective learning path for building a career in Data Science and Machine Learning, explained with a clear roadmap.

Data Science is all about solving business problems, and there is no formal education system with a clearly defined syllabus for it. Let us understand the basic terminology and how to build a career in Data Science.

Let us first understand what data is. Data is the raw information we get from applications, tools and machines, or collect manually.

Historically, we used to collect most data points through sampling or census activities, which involved a lot of manual work. As a result, data was scarce and costly. Since we had very limited data available, we could analyse it quite easily.

During the 1980s and 90s, companies gradually started deploying CRM (Customer Relationship Management) tools and ERP applications, and customer data started to pile up. Along with that, we started using various types of sensors in our manufacturing processes.

All of these started producing data dumps, which were largely unstructured. That’s when we started working on large data sets.

The work involved cleaning the data dumps and mapping them to other available data sets to get meaningful insights for the business. Data from social media and various online platforms has now been added to the list as well.

While the explosion of data happened, we didn’t train enough people to analyse these huge data dumps. That’s why there is a huge shortage of good data scientists and a complete imbalance between demand and supply.

It is very important to understand the basic difference among the three branches as shown above. In this article we will be talking about the middle branch that is Data Science.

Before that I will give you a brief about the other two:

Big Data:

This is mostly about data management. As we have seen, data is generated on many different platforms: search history, social media, CRM and so on. The volume is huge and the formats vary. We use tools like Hadoop to manage this kind of big data.

In this field, you will learn data storage, query languages like SQL, data processing and so on. There are also tools for data visualization, which you can learn.

Data Analysis:

This is very much like business intelligence. Here we are not trying to predict or model anything, as we do in Data Science. We just prepare trends and insights from the available data sets using tools like Excel, Power BI, Tableau etc.

This is the most basic of the three branches.

Now let us understand what exactly Data Science is and the steps to master it.

Basics of Data Science

Data Modeling

This is the most important and in-demand vertical of Data Science, and it includes Machine Learning.

Here we try to build models/algorithms to predict an outcome. It also involves classifying data into different homogeneous groups. For example, we build models based on customer profile and past behaviour to predict the propensity to pay a debt.

Another real-life example is sales, where we build models to predict buyer behaviour and upsell products accordingly. In other words, we try to predict whether a certain type of customer would buy a particular product or not.

This whole exercise involves 6-7 key steps, as shown in the figure.

The 1st step is all about gathering the data and cleaning all the data points. You will get a lot of unstructured data in different formats, and all of it needs to be sorted out. We do a lot of exploratory analysis and visualization to understand the data and its trends.

The 2nd step is about choosing the correct model or algorithm, and this is where knowledge of Statistics and Algebra is required. We have a number of independent variables, and we form hypotheses about them. Then we see which variables are actually impacting the outcome.

We generally create models based on historical data. This data set is divided into Training and Testing data sets, and the model is created, or trained, on the Training data set.

After this, we run the model on the Testing data set and compare its predictions with the actual results to measure its accuracy. We must make sure that our model predicts the right outcome on data it has not seen during training.
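To make the Training/Testing idea concrete, here is a minimal sketch in plain Python. The data (an income figure and whether the debt was paid), the helper names and the very simple "threshold" model are all assumptions made up for illustration; a real project would normally use a library and a proper algorithm.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    correct = sum(1 for features, label in rows if model(features) == label)
    return correct / len(rows)

def fit_threshold(rows):
    """'Train' by picking the income cut-off that best fits the given rows."""
    candidates = sorted({income for income, _ in rows})
    best = max(candidates, key=lambda t: accuracy(lambda x: x >= t, rows))
    return lambda x: x >= best

# Hypothetical historical data: (income, paid_debt) pairs.
data = [(income, income >= 50) for income in range(0, 100, 5)]
train, test = train_test_split(data)

model = fit_threshold(train)      # the model only ever sees the training rows
print("train accuracy:", accuracy(model, train))
print("test accuracy:", accuracy(model, test))
```

The point of the split is that the test accuracy is measured on rows the model never saw, so it is a fairer estimate of how the model will behave on new customers.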

Machine Learning – Key to Data Science

“Machine Learning is the science of making computers learn and act like humans by feeding data and information without being explicitly programmed”

Machine Learning started as a medium for model building and predicting behaviour; now it is getting integrated with AI (Artificial Intelligence).

But I will keep it simple for you; it is not so technical that you can’t learn it. Basically, there are three branches of Machine Learning.

Supervised Learning:

In supervised learning, the model predicts an outcome (dependent) variable from a given set of predictors (independent variables). Using this set of independent variables and the training data, we generate a model/function that maps inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data. Examples of Supervised Learning algorithms are Regression, Decision Tree, Random Forest, KNN, Logistic Regression etc.
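As an illustration of supervised learning, below is a small sketch of KNN (k-nearest neighbours), one of the algorithms listed above, written in plain Python. The customer data and labels are invented for the example, and the function name knn_predict is just a name chosen here.

```python
from collections import Counter
import math

def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled data: (age, purchases last year) -> customer type.
train = [
    ((25, 1), "occasional"), ((30, 2), "occasional"), ((22, 0), "occasional"),
    ((45, 9), "frequent"), ((50, 12), "frequent"), ((41, 10), "frequent"),
]

print(knn_predict(train, (48, 11)))   # frequent
print(knn_predict(train, (27, 1)))    # occasional
```

The labels in the training rows are the "outcome variable", and the numbers are the "predictors". The model simply maps a new input to the label most common among its closest known examples.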

Unsupervised Learning:

Here we do not have any target or outcome variable to predict or estimate. It is used for clustering a population into different groups, which is widely used for segmenting customers for specific interventions.

Examples could be market segmentation, text mining etc. Some of the algorithms used are the Apriori algorithm and K-means.
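To show what clustering without a target variable looks like, here is a rough sketch of K-means on one-dimensional data in plain Python. The spend figures and the starting centers are assumptions chosen for the example; library implementations handle many dimensions and pick starting points more carefully.

```python
def kmeans_1d(values, centers, iterations=10):
    """Basic K-means on 1-D data; `centers` are the starting guesses."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical monthly spend per customer; no labels are given.
spend = [12, 15, 14, 80, 85, 90, 11, 88]
centers, clusters = kmeans_1d(spend, centers=[0, 100])
print(centers)    # [13.0, 85.75]
print(clusters)   # [[12, 15, 14, 11], [80, 85, 90, 88]]
```

Nobody told the algorithm which customers are low or high spenders. It found the two segments from the data alone, which is the idea behind customer segmentation.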

Reinforcement Learning:

In reinforcement learning, the machine is trained to make specific decisions. It works this way: the machine is exposed to an environment where it trains itself continually using trial and error. The machine learns from experience and tries to capture the best possible knowledge to make accurate business decisions.

Examples are gaming, stock trading, robot movements etc. Such problems are usually formalised as a Markov Decision Process.
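The trial-and-error idea can be sketched with Q-learning, a common algorithm of this kind (it is not named in the text above and is used here only as an illustration). The toy environment is an assumption: a corridor of 5 cells where the machine starts at the left end and is rewarded only on reaching the right end.

```python
import random

N_STATES = 5          # corridor cells 0..4; the reward sits in cell 4
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Environment: move along the corridor, reward 1.0 on reaching the end."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):                     # cap on episode length
            # Explore at random sometimes (and on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward = step(state, ACTIONS[a])
            # Nudge the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if state == N_STATES - 1:
                break
    return q

q = q_learning()
# Best action found for each non-terminal cell (+1 means "step right").
policy = [ACTIONS[row.index(max(row))] for row in q[:-1]]
print(policy)
```

At the start the machine moves at random. Over many episodes the reward at the end of the corridor is propagated backwards through the table, and the learned policy should settle on stepping right in every cell.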

Do you need to understand statistical concepts?

This is the most common question that everyone asks me, and the answer is a big “YES”. As you have seen so far, the basic requirement of data science is predicting on the basis of past behaviour or historical data.

If you study Statistics, you will find that it does the same thing: we try to estimate values for a population based on sample data.
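Here is a small sketch of that idea using Python's standard statistics module. The "population" is simulated (order values drawn from a normal distribution with mean 50), which is purely an assumption for the example; in practice you would only have the sample.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: order values for 100,000 customers.
population = [rng.gauss(50, 10) for _ in range(100_000)]

# In real life we can usually afford to look at only a sample.
sample = rng.sample(population, 400)

mean = statistics.mean(sample)
std_err = statistics.stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval for the population mean.
low, high = mean - 1.96 * std_err, mean + 1.96 * std_err
print(f"sample mean {mean:.2f}, 95% interval {low:.2f} to {high:.2f}")
print(f"actual population mean {statistics.mean(population):.2f}")
```

With only 400 of the 100,000 values, the sample mean lands close to the population mean, and the interval tells us how much uncertainty remains.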

Along with statistics, we also need to understand the mathematical concepts of calculus, probability and algebra. However, statistics forms the base of predictive modelling.

Many of you might have a programming background, and you will be tempted to jump into Python coding and call yourself a data scientist because Python coding is not that difficult. But I would suggest that everyone first understand the underlying concepts of model building and machine learning.

To do this you have to learn statistical and mathematical concepts.

What should I learn? – Concepts or coding

This is another question that many of you might have. Many online learning platforms promote coding with R or Python as the way to become a machine learning expert.

So what happens? We take up these courses and learn coding, but we don’t understand the background. In a real-world scenario, you will then only be able to do what your supervisor tells you.

For example, you may learn how to use a clustering technique in Python, but you will not know when to use clustering and which technique will work best in a given scenario.

To solve a real-world problem, we need to know the different techniques and which one can be applied. In our models, we look for better accuracy in predicting. To do this, we may end up using multiple techniques and concepts. All of this needs to be logical and grounded in statistical or mathematical concepts.

Initially, you may be doing only coding work, but if you want to grow, concepts are very important. My suggestion is to start by building a sound base of concepts.

The 2nd step should be learning Python or R for coding work.

So, how shall I start learning Data Science?

Data Science is an evolving field, and there is no perfect course or certificate that can claim to cover all its aspects. Most professionals in this field are self-learning and taking up multiple online courses. I have taken many courses as well, even though I work at a senior level and have been in this field for the last 13 years.

If you are new to this field, I will try to explain which courses you can take up and how to move forward. I am keeping it simple and explaining it in 5 steps.

Keeping cost in mind, I will also recommend the most cost-effective way to gain knowledge. I do not recommend searching YouTube and getting lost in a plethora of videos. Some are good, but others are just average, and you will not understand where to start or how they link to each other.

Here are 5 simple steps for a beginner to follow:

Step 1: Learn Statistics

If you are from a Statistics background or have used statistical tools previously, then you have already crossed one major hurdle.

You need to know the basic concepts of forming hypotheses and testing them, regression-related concepts, factor and clustering concepts etc.
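To give a flavour of the regression concepts mentioned above, here is a minimal sketch of fitting a straight line by ordinary least squares in plain Python. The advertising figures are made up so that the answer is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: advertising spend vs. units sold.
spend = [1, 2, 3, 4, 5]
units = [12, 14, 16, 18, 20]      # exactly 10 + 2 * spend

slope, intercept = fit_line(spend, units)
print(slope, intercept)            # 2.0 10.0
print(intercept + slope * 6)       # predicted units for a spend of 6: 22.0
```

Understanding where the slope comes from is exactly the kind of concept that lets you judge whether a regression is the right tool, rather than just calling a library function.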

Recommended Courses (Pick one of them):

Statistics for Data Science and Business Analysis (Udemy) by 365 Careers

Duration : Approx 5 Hours
Price : Approx $10 (Lowest price) – One of the best at this price
Rating : 4.5 out of 5
You can Sign up Here

The Data Science Course 2019: Complete Data Science Bootcamp (Udemy) by 365 Careers

Duration : Approx 20.5 Hours
Price : Approx $10 (Lowest price) – One of the best at this price
Rating : 4.5 out of 5
You can Sign up Here

This one covers both Statistics and mathematics basics

Step 2 : Mathematics for Data Science

Knowledge of linear algebra and calculus will help you understand the mathematics behind the algorithms and how adjusting parameters affects the algorithms at run-time and their results. Even if you have a technical background, there is no harm in brushing up your memory.
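As a small example of how calculus shows up in practice, here is a sketch of gradient descent, the optimisation idea behind many machine learning algorithms. The function being minimised, f(w) = (w - 3)^2, and the two learning rates are assumptions picked to make the effect of a parameter visible.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimise f(w) = (w - 3)**2 by following its derivative 2 * (w - 3)."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)
        w -= learning_rate * gradient
    return w

print(gradient_descent(0.1))   # settles very close to 3, the true minimum
print(gradient_descent(1.1))   # overshoots further on every step and diverges
```

The code is the same in both calls; only the learning rate changes. Knowing the mathematics tells you why: each step multiplies the distance from the minimum by (1 - 2 * learning_rate), which shrinks it for 0.1 and grows it for 1.1.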

If you don’t like mathematics, don’t get disheartened. Just try to understand the basic concepts and move ahead; don’t stop here. The key is to keep moving, and you can always refer back to your videos or notes.

Mathematics is covered in the course mentioned below as well as in the Machine Learning course by Andrew Ng (covered in Step 4).

Recommended Courses :

The Data Science Course 2019: Complete Data Science Bootcamp (Udemy) by 365 Careers

Duration : Approx 20.5 Hours

Price : Approx $10 (Lowest price) – One of the best at this price

Rating : 4.5 out of 5

You can Sign up Here

This one covers both Statistics and mathematics basics

Step 3 : Learn Python and R

It is always better to start learning Python and R before we move into Machine Learning concepts. This is especially important for someone with very little or no programming experience.

Machine Learning A-Z™: Hands-On Python & R In Data Science (Udemy) by Kirill Eremenko

Duration : Approx 41 Hours

Price : Approx $10 (Lowest price)

Rating : 4.5 out of 5

You can Sign up Here

Complete Python Bootcamp: Go from zero to hero in Python 3 (Udemy) by Jose Portilla

Duration : Approx 24 Hours

Price : Approx $10 (Lowest price)

Rating : 4.5 out of 5

You can Sign up Here

Step 4 : Advanced Machine Learning Concepts

So far, I have recommended courses from Udemy because they are very cost-effective. Always try to enrol when an offer is running; they then generally cost around $10.

Now that you have a good understanding of the various tools and techniques used in Data Science, you can go for premium courses, which generally cost anything between $100 and $300. These are some of the best, industry-leading courses.

Andrew Ng is a dynamic instructor. He is the co-founder of Coursera and was the founding lead of Google Brain. He inspires confidence and self-belief, especially when sharing practical implementation tips.

Machine Learning courses by Andrew Ng are highly recommended by most of the industry experts.

You can do steps 3 and 4 simultaneously, learning Python alongside the Machine Learning course.

Machine Learning (Stanford University-Coursera) by Andrew Ng

Duration : Approx 55 Hours

Price : Approx $ 48/month (It will take 2-3 months to complete, you can apply for financial aid option for free course)

Rating : 4.9 out of 5

You can Sign up Here

Applied Data Science with Python Specialization (Coursera) by University of Michigan

Duration : Approx 99 Hours

Price : Approx $ 48/month (It will take 4-5 months to complete, you can apply for financial aid option for free course)

Rating : 4.5 out of 5

You can Sign up Here

Step 5 : Deep Learning and AI

Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods. This is the next step within machine learning and AI. I would suggest that you first understand the basics and then get into deep learning.

Recommended Courses :

Deep Learning Specialization by Deeplearning.ai (Coursera), course by Andrew Ng

Duration : Approx 120 Hours

Price : Approx $ 48/month (It will take 4-5 months to complete, you can apply for financial aid option for free course)

Rating : 4.9 out of 5

You can Sign up Here

Machine Learning and Deep Learning are the most sought-after skills today, and they will remain so in the foreseeable future. This is the right time to start learning and jump-start your career.

Since this is a fast-evolving industry, be ready to learn new techniques, and aim to learn something new every quarter.

I have done exhaustive research and come up with the Best Machine Learning Courses, Best Deep Learning Courses and Best AI Courses, which cover various aspects, technologies and programming languages.

I have included only the best. There are other good courses available, but I have tried to keep it simple.
