@ -26,7 +26,7 @@ There are 3 expected deliverables associated with the scoring model:
- The trained machine learning model with the features engineering pipeline:
- Do not forget: **Coming up with features is difficult, time-consuming, requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.**
- The model is validated if the **AUC on the test set is higher than 50%**.
- The model is validated if the **AUC on the test set is at minimum 55%, ideally to 62% included (or in best cases higher than 62% if you can !)**.
- The labelled test data is not publicly available. However, a Kaggle competition uses the same data. The procedure to evaluate test set submission is the same as the one used for the project 1.
- Here are the [DataSets](https://assets.01-edu.org/ai-branch/project5/home-credit-default-risk.zip).
@ -74,7 +74,7 @@ All people having 100% of accuracy on the Leaderboard cheated, there's no point
```console
project
│ README.md
│ environment.yml
│ requirements.txt
│ username.txt
│
└───data
@ -90,7 +90,7 @@ project
- `README.md` introduction of the project, shows the username, describes the features engineering and the best score on the **leaderboard**. Note the score on the test set using the exact same pipeline that led to the best score on the leaderboard.
- `environment.yml` contains all libraries required to run the code.
- 'requirements.txt` contains all libraries required to run the code.
- `username.txt` contains the username, the last modified date of the file **has to correspond to the first day of the project**.