From 6014579353829239de507948a57f5cf92a791be3 Mon Sep 17 00:00:00 2001
From: nprimo
Date: Tue, 28 Nov 2023 11:29:32 +0000
Subject: [PATCH] chore: run prettier

---
 subjects/ai/pipeline/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/subjects/ai/pipeline/README.md b/subjects/ai/pipeline/README.md
index 8ad037822..c6189d5a3 100644
--- a/subjects/ai/pipeline/README.md
+++ b/subjects/ai/pipeline/README.md
@@ -10,7 +10,6 @@ Today we will focus on the data preprocessing and discover the Pipeline object f
 - The **step 1** is always necessary. Models use numbers, for instance string data can't be processed raw.
 - The **steps 2** is always necessary. Machine learning models use numbers, missing values do not have mathematical representations, that is why the missing values have to be imputed.
 - The **step 3** is required when the dimension of the data set is high. The dimension reduction algorithms reduce the dimensionality of the data either by selecting the variables that contain most of the information (SelectKBest) or by transforming the data. Depending on the signal in the data and the data set size the dimension reduction is not always required. This step is not covered because of its complexity. The understanding of the theory behind is important. However, I suggest to give it a try during the projects.
-
 - The **step 4** is required when using some type of Machine Learning algorithms. The Machine Learning algorithms that require the feature scaling are mostly KNN (K-Nearest Neighbors), Neural Networks, Linear Regression, and Logistic Regression. The reason why some algorithms work better with feature scaling is that the minimization of the loss function may be more difficult if each feature's range is completely different.
 
 These steps are sequential. The output of step 1 is used as input for step 2 and so on; and, the output of step 4 is used as input for the Machine Learning model.
@@ -363,4 +362,5 @@ The pipeline you will implement has to contain 3 steps:
 1. Train the pipeline on the train set and predict on the test set. Give the score of the model on the test set.
 
 ---
+
 ---