In this post, I lay out a concrete self-study path for applied Machine Learning that you can use to orient yourself and figure out the next step in your journey to master Machine Learning.
To get started with any technology, not only Machine Learning, it's important to stay consistent, follow a proper plan, and not panic when things go wrong. Staying calm and positive matters a lot when you begin your journey in AI or Data Science, because the field involves a lot of maths even though the code itself is usually simple. My suggestion is to learn from your errors and extract the relevant information from them to move forward. 😊
Machine Learning RoadMap
Machine Learning is a huge field of study that includes many algorithms, techniques, and theories, along with the maths behind how the algorithms work. Machine Learning is not only about building a model; it is also about understanding how the whole process works: how to prepare the data, how to train the model, and how to generate and present the results.
This roadmap will help you get the best out of it. I will give a detailed explanation of each topic, including how to approach it and how to handle the barriers you face in different situations.
👉 Let's move on, taking one step at a time.
1) Programming Language 💻
Before you begin with any technology, it's essential to decide on a plan, and the first decision is the programming language. Several languages are used in this field, and you can choose whichever suits you and interests you most. For Data Science and Machine Learning, you can go with Python or R.
From my experience, my suggestion is to go with Python because it is beginner-friendly and has a rich set of easy-to-use libraries. Here are the topics to cover while learning Python.
- Python Basics:- Learn the basic syntax first, then control statements, loops, and exception handling.
- Python Data Structures:- Learn and practice the standard Python data structures and built-in functions. The basic data structures are list, tuple, set, dictionary, and string. Play with each of them using the various built-in functions.
- Python Functions:- Learn how to write functions in Python, both multi-line functions and single-line (lambda) functions, and learn list comprehensions as well.
- OOP:- Python supports object-oriented programming, and knowledge of OOP is very important when you build APIs for Machine Learning. Learn the concepts of classes and objects, abstract classes, class methods, object introspection, constructors, inheritance, overloading, and overriding.
- Modules:- After completing all of this, move on to the main pre-built Python modules and their built-in functions. Important modules include math, random, statistics, and re (regular expressions).
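To make the topics above concrete, here is a small sketch that touches exception handling, a single-line function, a list comprehension, built-in data structures, and inheritance with overriding. All names in it (`safe_divide`, `Model`, `LinearModel`, and the sample values) are made up for illustration.

```python
# Illustrative only: every name below is invented for this sketch.

def safe_divide(a, b):
    """Return a / b, or None when b is zero (exception handling)."""
    try:
        return a / b
    except ZeroDivisionError:
        return None

# A single-line (lambda) function used inside a list comprehension
square = lambda x: x * x
even_squares = [square(n) for n in range(10) if n % 2 == 0]

# Built-in data structures
scores = {"alice": 82, "bob": 91}       # dictionary
unique_tags = {"ml", "python", "ml"}    # a set drops the duplicate

class Model:
    """Base class with a constructor and a method that can be overridden."""
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name}: generic model"

class LinearModel(Model):
    """Inherits the constructor and overrides describe()."""
    def describe(self):
        return f"{self.name}: linear model"

print(safe_divide(1, 0), even_squares, LinearModel("lr").describe())
```

Try changing each piece and predicting the result before you run it; that is the quickest way to get comfortable with the basics.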
2) Statistics ✌
Spend some time getting familiar with basic statistical terms. They will help you later in data analysis and in understanding how the Machine Learning algorithms work once you reach them. Some of the basic topics in statistics that you should cover are listed below.
- Introduction to basic terms
- Variables and Random Variables
- Population, Sample, Population Mean, and Sample Mean
- Population Distribution, Sample Distribution, and Sampling Distribution
- Central Tendency (Mean, Median, and Mode)
- Range
- Measure of Dispersion
- Variance
- Standard Deviation
- Gaussian/Normal Distribution
- Standard Normal Distribution
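Several of the terms above can be tried out directly with Python's built-in `statistics` module. The sketch below uses a small made-up sample and treats it first as a whole population and then as a sample drawn from one, which is exactly the distinction between population and sample variance.

```python
import statistics

# Made-up data, purely for illustration
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Dispersion
value_range = max(sample) - min(sample)
pop_variance = statistics.pvariance(sample)    # divides by n (population)
pop_std = statistics.pstdev(sample)            # square root of pop_variance
sample_variance = statistics.variance(sample)  # divides by n - 1 (sample)

print(mean, median, mode, value_range, pop_variance, pop_std, sample_variance)
```

Notice that the sample variance comes out larger than the population variance on the same numbers, because it divides by n - 1 instead of n.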
Some topics sit a level above these, at the intermediate level, and beyond them is advanced statistics. As a beginner, you do not need to learn advanced statistics right now; you will pick it up alongside your data analysis work. At the intermediate level, you can learn the following topics.
- Z Score
- Probability Density Function
- Cumulative Distribution Function
- Hypothesis Testing
- Kernel Density Estimation
- Different Types of Plots and Graphs
- Central Limit Theorem
- Skewness and Kurtosis
- Covariance
- Pearson Correlation Coefficient
- Spearman Correlation Coefficient
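A few of these intermediate topics are easy to compute by hand, which helps build intuition. The sketch below implements the Z score, the Pearson correlation coefficient, and the Spearman correlation coefficient in plain Python. The helper names and the data are made up, and the `ranks` helper assumes there are no tied values, which keeps the code short.

```python
import statistics

def z_scores(values):
    """Z score: how many standard deviations each value is from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """Rank each value from 1 to n. Assumes no ties, for simplicity."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up data: marks grow with hours, but not in a straight line
hours = [1, 2, 3, 4, 5]
marks = [1, 4, 9, 16, 25]

print(z_scores(hours))
print(pearson(hours, marks), spearman(hours, marks))
```

Because the relationship here is monotonic but curved, the Spearman coefficient reaches 1 while the Pearson coefficient stays slightly below it. That difference is the reason both coefficients are worth knowing.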
3) Data Analysis 👌
Now that you know a programming language, you can kickstart your journey by developing basic data analysis skills. Machine Learning depends completely on data, and data is the most crucial part of building any machine learning project or model. Before working with data, it's essential to understand it: what the data says and what intuition it gives you.
As a common rule of thumb, building a machine learning model is about 80 percent understanding and improving the data and only about 20 percent modeling.
It's very important to analyze and prepare your data by removing the anomalies it contains. You can start data analysis with the two major Python libraries listed below.
- NumPy: NumPy stands for Numerical Python. It is a Python package for computing with and processing single-dimensional and multi-dimensional arrays. NumPy provides a convenient and efficient way to handle large amounts of data and perform basic mathematical operations on it. Its array operations are fast, which makes it practical for large datasets.
- Pandas: Pandas is an open-source library that provides high-performance data manipulation in Python. Data analysis requires a lot of work, including restructuring, cleaning, and merging data. There are many tools and techniques for these tasks, but we prefer pandas because it is fast, easy to use, and more expressive than many other tools.
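Here is a minimal sketch of what working with these two libraries looks like. The numbers, the column names, and the city names are made up; the point is only to show array maths in NumPy and a group-by summary in pandas.

```python
import numpy as np
import pandas as pd

# NumPy: vectorised maths on a 2-D array (made-up numbers)
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
col_means = data.mean(axis=0)   # mean of each column
centered = data - col_means     # broadcasting subtracts the means row by row

# pandas: a small, made-up table
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "sales": [100, 150, 200, 250],
})
sales_by_city = df.groupby("city")["sales"].mean()  # average sales per city

print(col_means)
print(sales_by_city)
```

Both operations run without an explicit Python loop, which is where most of the speed and readability of these libraries comes from.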
Once you are familiar with these libraries, learn a visualization library and then the techniques for Feature Engineering. Visualization libraries for exploratory data analysis include Matplotlib and seaborn. Matplotlib alone is sufficient for a beginner; you can learn seaborn while practicing or simply keep it as a backup. There are also visualization tools like Tableau and Power BI, which are efficient for preparing reports and presenting them to a client. It's better to learn these tools once you are familiar with the libraries above and with machine learning processes and algorithms, so do not get distracted from the path by starting on them now.
Feature Engineering includes techniques like:
- Handling Missing Values
- Handling Categorical Features
- Feature Transformation
- Handling Outliers
- Handling Imbalanced Datasets
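As a rough sketch of the first three techniques in the list, the example below fills missing values, one-hot encodes a categorical column, and applies a log transform using pandas and NumPy. The table is made up, and the choices shown (median for numbers, most frequent value for categories, `log1p` as the transform) are just common defaults, not the only options.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in each column
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "city": ["Pune", "Delhi", None, "Pune"],
})

# Handling missing values: median for numeric, most frequent value for categorical
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Handling categorical features: one-hot encoding
encoded = pd.get_dummies(df, columns=["city"])

# Feature transformation: log transform to compress large values
encoded["log_age"] = np.log1p(encoded["age"])

print(encoded)
```

Outliers and imbalanced datasets need their own techniques, which are best learned on a real dataset where you can see their effect on a model.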
👊 After these techniques come the techniques for Feature Scaling and Feature Selection. Providing the right features to predict the target variable is important. If you feed the model variables that have no relationship with the target, they add noise, the model's performance suffers, and the results become harder for you to interpret. That is why feature selection is one of the most critical steps before modeling.
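One simple way to try both ideas is sketched below, assuming scikit-learn and pandas are installed. It standardizes the features with `StandardScaler` and then keeps only the features whose absolute correlation with the target is above a threshold. The data is made up, and the 0.5 threshold is an arbitrary value chosen for this illustration; correlation filtering is only one of several feature selection methods.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up data: "size" drives "price", while "noise" is unrelated
df = pd.DataFrame({
    "size": [50, 60, 80, 100, 120],
    "noise": [3, 1, 4, 1, 5],
    "price": [100, 120, 160, 200, 240],
})
features = ["size", "noise"]

# Feature scaling: each column ends up with mean 0 and standard deviation 1
scaled = StandardScaler().fit_transform(df[features])

# Feature selection (filter style): keep features correlated with the target
correlations = df[features].corrwith(df["price"]).abs()
selected = correlations[correlations > 0.5].index.tolist()  # 0.5 is arbitrary

print(correlations)
print(selected)
```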
4) Machine Learning Algorithms 👋
Now you have happily reached what you were aiming for. Get started with supervised learning: explore each algorithm by understanding the mathematical intuition behind it, then implement it with a Python library on different datasets, and evaluate the results using the evaluation metrics suited to each task.
Supervised Learning Algorithms: We have already started with supervised learning. You can get a basic understanding of supervised learning and its procedure by reading this article, and step by step I will be posting articles explaining the intuition behind each algorithm.
Supervised learning is categorized into 2 types: Regression and Classification.
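To see both types side by side, here is a minimal sketch assuming scikit-learn is installed. It trains a linear regression on the library's built-in diabetes dataset and a logistic regression on its built-in iris dataset, then scores each with a metric suited to the task. The choice of datasets, models, and the 80/20 split are assumptions made for this example.

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split

# Regression: the target is a continuous number
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
reg = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, reg.predict(X_test))

# Classification: the target is a class label
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))

print(f"Regression R^2: {r2:.3f}")
print(f"Classification accuracy: {acc:.3f}")
```

The workflow is the same in both cases: split the data, fit on the training part, and evaluate on the held-out part. Only the model and the metric change.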
Unsupervised Learning Algorithms:
Unsupervised learning works without any kind of supervision, which means labeled data is not provided to the model. Instead, the algorithm tries to find hidden patterns in the data and groups similar data points together. Unsupervised learning is commonly divided into 2 categories: Clustering and Association.
- K-Means Clustering
- Hierarchical Clustering
- DBSCAN Clustering
- Principal Component Analysis (PCA), a dimensionality reduction technique that is often used together with clustering
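As a small sketch of two items from the list, the example below runs PCA and K-Means on scikit-learn's built-in iris data while ignoring the labels. It assumes scikit-learn is installed, and the choice of 3 clusters and 2 components is an assumption made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load only the features; the labels are deliberately ignored
X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# PCA: compress 4 features into 2 components (handy for plotting)
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# K-Means: group the rows into 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

print(X_2d.shape)
print(sorted(set(labels)))
```

Plotting `X_2d` colored by `labels` with Matplotlib is a good next exercise for seeing what the algorithm found.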
Association rules allow you to build associations amongst data objects inside large databases. It's about discovering relationships between variables in the database.
5) Ensembling Algorithms
Now it's time to deepen your knowledge of algorithms by understanding decision trees thoroughly and learning about ensembles built from them, especially the advanced tree boosting algorithms. Boosting algorithms include AdaBoost, XGBoost, LightGBM, CatBoost, etc. With these algorithms, you should also pay attention to hyperparameter tuning. Whatever algorithm you use, try to understand which of its parameters can be tuned to improve performance, and try different hyperparameter tuning techniques such as GridSearchCV and RandomizedSearchCV.
Try different cross-validation techniques to get a more reliable estimate of your model's performance and to compare hyperparameter settings fairly.
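Here is a minimal sketch of hyperparameter tuning with cross-validation, assuming scikit-learn is installed. It uses scikit-learn's own `GradientBoostingClassifier` as a stand-in for the boosting libraries named above, since XGBoost, LightGBM, and CatBoost are separate packages. The parameter grid is deliberately tiny and its values are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values (arbitrary, kept small so it runs quickly)
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

# 5-fold stratified cross-validation keeps class proportions in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_   # mean cross-validated accuracy
print(best_params, round(best_score, 3))
```

RandomizedSearchCV has a very similar interface but samples a fixed number of combinations instead of trying all of them, which helps when the grid gets large.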
6) Build a Basic Project (Explore Kaggle)
Kaggle is one of the best web portals for learning and exploring data science skills. Visit it, take part in the knowledge-based competitions, and explore the datasets. Pick any public dataset in a domain of your choice, analyze it, summarize the results, and present your analysis and approach to the Kaggle community.
7) Flask and REST API
Flask is a micro web framework written in Python. It is a collection of libraries and modules that lets a web application developer write applications without having to bother about low-level details such as protocols, thread management, etc.
Flask will help you put a web interface or REST API in front of your model and present it to the end user. If you want to learn Flask, Corey Schafer has one of the best video tutorial series on YouTube.
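To give a feel for it, here is a minimal sketch of a REST endpoint that returns a prediction as JSON, assuming Flask is installed. The `/predict` route, the `size` field, and the `predict_price` function are all hypothetical; in a real project `predict_price` would call your trained model instead of a hard-coded rule.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_price(size):
    """Hypothetical stand-in for a trained model's predict step."""
    return 2.0 * size

@app.route("/predict", methods=["POST"])
def predict():
    # Read the JSON body sent by the client, e.g. {"size": 50}
    payload = request.get_json()
    size = float(payload["size"])
    return jsonify({"prediction": predict_price(size)})

# Flask's built-in test client lets you try the endpoint without a server
client = app.test_client()
response = client.post("/predict", json={"size": 50})
print(response.get_json())
```

Once the endpoint works in the test client, you can start Flask's development server and call the same route from a browser form or any other program.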
8) Build End-to-End Projects
Now it's time to turn your understanding into experience. Try different domains, pick datasets suited to different algorithms, and build end-to-end projects. End-to-end means presenting your project to an end user by deploying it on a cloud platform so that someone else can use it. A popular platform for this is Heroku, though it no longer offers a free plan, so check the current pricing before you deploy.
- Learning Data Science is not just about building models; try to understand the concepts and the maths as well.
- Until errors show up, there is no real learning and no fun. You will face errors and get stuck, so do not panic when that happens. Just google the error; Stack Overflow is always there.
- When you are dealing with a problem, visit the official documentation of the library in question and look for explanations and examples there.
- Practice is the most important thing. Do not just learn; practice different techniques daily, and the more you explore on your own, the more you will improve.
- Stay up to date with current projects and research, and keep reading articles like this one on our blog as well as on Medium, Analytics Vidhya, KDnuggets, etc. to stay motivated. Learning is a never-ending process.
Conclusion
This is a path for moving forward on your Machine Learning journey. There are many more libraries, tools, and techniques that you will come across as you keep moving step by step, and as you engage with the community you will find different ways to learn and grow. We will also be publishing the complete intuition behind each algorithm, with hands-on practice.