From da46bf9359de30bf41a918eb91a5a42770990d79 Mon Sep 17 00:00:00 2001
From: Oumaima Fisaoui <48260689+Oumaimafisaoui@users.noreply.github.com>
Date: Wed, 2 Oct 2024 10:48:14 +0100
Subject: [PATCH] Chore(AI): fix problems of accuracy

---
 subjects/ai/credit-scoring/README.md       | 2 +-
 subjects/ai/credit-scoring/audit/README.md | 4 ++--
 subjects/ai/emotions-detector/README.md    | 8 ++++----
 subjects/ai/kaggle-titanic/README.md       | 4 ++--
 subjects/ai/nlp-scraper/README.md          | 1 +
 5 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/subjects/ai/credit-scoring/README.md b/subjects/ai/credit-scoring/README.md
index 66a7f65c6..3ed8741a7 100644
--- a/subjects/ai/credit-scoring/README.md
+++ b/subjects/ai/credit-scoring/README.md
@@ -26,7 +26,7 @@ There are 3 expected deliverables associated with the scoring model:
 - The trained machine learning model with the features engineering pipeline:
   - Do not forget: **Coming up with features is difficult, time-consuming, requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.**
-  - The model is validated if the **AUC on the test set is higher than 50%**.
+  - The model is validated if the **AUC on the test set is at least 55%, and ideally 62% or higher**.
   - The labelled test data is not publicly available. However, a Kaggle competition uses the same data. The procedure to evaluate test set submission is the same as the one used for the project 1.
   - Here are the [DataSets](https://assets.01-edu.org/ai-branch/project5/home-credit-default-risk.zip).
diff --git a/subjects/ai/credit-scoring/audit/README.md b/subjects/ai/credit-scoring/audit/README.md
index 1cceee536..7eba4ec1f 100644
--- a/subjects/ai/credit-scoring/audit/README.md
+++ b/subjects/ai/credit-scoring/audit/README.md
@@ -46,7 +46,7 @@ project

 ###### Is the model trained only the training set?

-###### Is the AUC on the test set higher than 50%?
+###### Is the AUC on the test set between 55% and 62% (both included), or higher than 62%?

 ###### Does the model learning curves prove that the model is not overfitting?

@@ -59,7 +59,7 @@ project
 ```prompt
 python predict.py
- AUC on test set: 0.50
+ AUC on test set: 0.62
 ```
diff --git a/subjects/ai/emotions-detector/README.md b/subjects/ai/emotions-detector/README.md
index 4d3e31136..f95b70dee 100644
--- a/subjects/ai/emotions-detector/README.md
+++ b/subjects/ai/emotions-detector/README.md
@@ -164,10 +164,10 @@ Balance technical prowess with psychological insight: as you fine-tune your CNN

 ### Resources

-- https://machinelearningmastery.com/what-is-computer-vision/
+- [What is computer vision](https://machinelearningmastery.com/what-is-computer-vision/)

-- Use a pre-trained CNN: https://arxiv.org/pdf/1812.06387.pdf
+- [Use a pre-trained CNN](https://arxiv.org/pdf/1812.06387.pdf)

-- Hack the CNN https://medium.com/@ageitgey/machine-learning-is-fun-part-8-how-to-intentionally-trick-neural-networks-b55da32b7196
+- [Hack the CNN](https://medium.com/@ageitgey/machine-learning-is-fun-part-8-how-to-intentionally-trick-neural-networks-b55da32b7196)

-- https://arxiv.org/pdf/1812.06387.pdf
+- [Convolutional Neural Network](https://arxiv.org/pdf/1812.06387.pdf)
diff --git a/subjects/ai/kaggle-titanic/README.md b/subjects/ai/kaggle-titanic/README.md
index 93aeed23f..9e1c8f64c 100644
--- a/subjects/ai/kaggle-titanic/README.md
+++ b/subjects/ai/kaggle-titanic/README.md
@@ -74,7 +74,7 @@ All people having 100% of accuracy on the Leaderboard cheated, there's no point
 ```console
 project
 │ README.md
-│ environment.yml
+│ requirements.txt
 │ username.txt
 │
 └───data
@@ -90,7 +90,7 @@ project

 - `README.md` introduction of the project, shows the username, describes the features engineering and the best score on the **leaderboard**. Note the score on the test set using the exact same pipeline that led to the best score on the leaderboard.

-- `environment.yml` contains all libraries required to run the code.
+- `requirements.txt` contains all libraries required to run the code.

 - `username.txt` contains the username, the last modified date of the file **has to correspond to the first day of the project**.

diff --git a/subjects/ai/nlp-scraper/README.md b/subjects/ai/nlp-scraper/README.md
index 69545a6bf..71209fb80 100644
--- a/subjects/ai/nlp-scraper/README.md
+++ b/subjects/ai/nlp-scraper/README.md
@@ -155,6 +155,7 @@ project
 ├── data
 │   └── ...
 ├── nlp_enriched_news.py
+├── requirements.txt
 ├── README.md
 ├── results
 │   ├── training_model.py