chore: run prettier

pull/2324/head
nprimo 12 months ago committed by Niccolò Primo
parent
commit
6014579353
2
subjects/ai/pipeline/README.md

@@ -10,7 +10,6 @@ Today we will focus on the data preprocessing and discover the Pipeline object f
- **Step 1** is always necessary. Models use numbers; string data, for instance, can't be processed raw.
- **Step 2** is always necessary. Machine learning models use numbers, and missing values have no mathematical representation, which is why they have to be imputed.
- **Step 3** is required when the dimension of the data set is high. Dimension reduction algorithms reduce the dimensionality of the data either by selecting the variables that contain most of the information (SelectKBest) or by transforming the data. Depending on the signal in the data and the size of the data set, dimension reduction is not always required. This step is not covered because of its complexity, and understanding the theory behind it is important. However, I suggest giving it a try during the projects.
- **Step 4** is required for some types of Machine Learning algorithms. The algorithms that benefit most from feature scaling are KNN (K-Nearest Neighbors), Neural Networks, Linear Regression, and Logistic Regression. Some algorithms work better with feature scaling because minimizing the loss function can be more difficult when the features have very different ranges; for distance-based methods such as KNN, features with large ranges would otherwise dominate the distance.
These steps are sequential. The output of step 1 is used as input for step 2, and so on; the output of step 4 is used as input for the Machine Learning model.
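
To make the sequential idea concrete, here is a minimal pure-Python sketch of steps 1, 2 and 4 chained on a single column. It is not the README's code and it does not use scikit-learn: the function names (`to_numbers`, `impute_mean`, `min_max_scale`, `run_steps`) and the sample data are hypothetical, step 1 is simplified to parsing numeric strings, and step 3 is skipped, as the text says it is not covered.

```python
from statistics import mean


def to_numbers(column):
    """Step 1 (simplified): turn string data into numbers; None stays missing."""
    return [float(v) if v is not None else None for v in column]


def impute_mean(column):
    """Step 2: replace missing values with the mean of the observed values."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Step 4: rescale the values to the [0, 1] range."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1  # avoid dividing by zero on a constant column
    return [(v - lo) / span for v in column]


def run_steps(column, steps):
    """Feed the output of each step into the next one, in order."""
    for step in steps:
        column = step(column)
    return column


# Hypothetical raw column: numeric strings with one missing value.
raw = ["12.5", None, "7.5", "10.0"]
result = run_steps(raw, [to_numbers, impute_mean, min_max_scale])
print(result)  # [1.0, 0.5, 0.0, 0.5]
```

The order matters: scaling before imputing would fail on the missing value, and imputing before conversion would try to average strings. A pipeline object plays the role of `run_steps` here, with the model as the final step.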
@@ -363,4 +362,5 @@ The pipeline you will implement has to contain 3 steps:
1. Train the pipeline on the train set and predict on the test set. Give the score of the model on the test set.
---
---