mirror of https://github.com/01-edu/public.git
Credit scoring
Preliminary
project
│   README.md
│   environment.yml
│
└───data
│   │   ...
│
└───results
│   │
│   ├───model (free format)
│   │   │   my_own_model.pkl
│   │   │   model_report.txt
│   │
│   ├───feature_engineering
│   │   │   EDA.ipynb
│   │
│   ├───clients_outputs
│   │   │   client1_correct_train.pdf (free format)
│   │   │   client2_wrong_train.pdf (free format)
│   │   │   client_test.pdf (free format)
│   │
│   ├───dashboard (optional)
│   │   │   dashboard.py (free format)
│   │   │   ...
│
└───scripts (free format)
│   │   train.py
│   │   predict.py
│   │   preprocess.py
Is the structure of the project as shown above?
Does the README file introduce the project, summarize how to run the code, and show the username?
Does the environment file list all the libraries needed to run the code, together with their versions?
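One way to cross-check `environment.yml` against what is actually installed is to ask the running interpreter for each library's version. The sketch below is only an illustration: it uses the standard-library `importlib.metadata.version` (Python 3.8+), and the package list in it is a hypothetical placeholder, not the project's real dependency list. Note that it expects distribution names (`scikit-learn`), not import names (`sklearn`).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Hypothetical list: replace it with the libraries the project actually imports.
for name, ver in installed_versions(["pandas", "scikit-learn", "matplotlib"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any library printed as `NOT INSTALLED`, or with a version different from the one pinned in `environment.yml`, points to an incomplete environment file.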
Does `EDA.ipynb` explain the exploratory data analysis in detail?
Machine learning model
Is the model trained only on the training set?
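A common way this check fails is data leakage: preprocessing statistics (for example the mean and standard deviation used for scaling) computed on all rows, test rows included. The sketch below assumes a simple hold-out split by client id and a hand-written standard scaler; `split_ids`, `fit_scaler`, `transform` and the `incomes` data are illustrative stand-ins, not part of the project. The point it shows is that parameters are learned from the training rows only and then reused, unchanged, on the test rows.

```python
import random
from statistics import mean, pstdev


def split_ids(ids, test_size=0.2, seed=0):
    """Shuffle ids with a fixed seed and split them into (train, test)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_values):
    """Learn scaling parameters from the training rows only."""
    sigma = pstdev(train_values)
    return mean(train_values), (sigma if sigma > 0 else 1.0)


def transform(values, params):
    """Apply previously fitted parameters; never refit on the data passed in."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


incomes = {i: 1000.0 + 50.0 * i for i in range(20)}  # illustrative data only
train_ids, test_ids = split_ids(incomes)
assert not set(train_ids) & set(test_ids)  # no client appears in both sets

params = fit_scaler([incomes[i] for i in train_ids])  # fitted on train only
X_train = transform([incomes[i] for i in train_ids], params)
X_test = transform([incomes[i] for i in test_ids], params)  # same params reused
```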
Is the AUC on the test set higher than 75%?
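The ROC AUC can be read as the probability that a randomly chosen defaulting client receives a higher score than a randomly chosen non-defaulting one. The pure-Python sketch below computes exactly that by comparing every positive/negative pair (ties count as half). It is meant for checking small examples by hand; on the full test set, scikit-learn's `roc_auc_score` computes the same quantity far more efficiently. The labels and scores in the example are illustrative, not project results.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This is O(n_pos * n_neg): fine for a sanity
    check, too slow for hundreds of thousands of rows.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"AUC on test set: {auc:.2f}")  # prints "AUC on test set: 0.75"
print("higher than 75%:", auc > 0.75)  # False: exactly 0.75 is not higher
```

The strict comparison matters: a model scoring exactly 0.75 does not satisfy "higher than 75%".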
Do the model's learning curves show that the model is not overfitting?
Was training stopped early enough to avoid overfitting?
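On a learning curve, overfitting typically shows up as a validation loss that stops decreasing and starts rising while the training loss keeps falling. Patience-based early stopping turns that observation into a rule. The sketch below assumes one validation loss per epoch (or per boosting round) and is a generic illustration of the rule, not the project's required method; libraries offer built-in equivalents, for example the Keras `EarlyStopping` callback with its `patience` argument.

```python
import math


def early_stopping(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for patience-based early stopping.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. The model to keep is
    the one from `best_epoch`. `stop_epoch` is None if training ran to the end.
    """
    best_loss = math.inf
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Validation loss falls, then rises again: the usual overfitting signature.
val_losses = [0.70, 0.60, 0.55, 0.56, 0.58, 0.60]
print(early_stopping(val_losses, patience=2))  # (2, 4)
```

Keeping the model from `best_epoch` rather than the last epoch trained is what makes the stop "early enough".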
Does the text document `model_report.txt` describe the methodology used to train the machine learning model?
Does `predict.py` run without any error and return the following?
```prompt
python predict.py
AUC on test set: 0.76
```
This article gives a complete example of a good modelling approach:
Model's interpretability
Feature importance:
Is the importance of every feature used by the model computed and shown in a visualisation?
Is the mapping between the feature importances and the feature names correct? Be careful to associate each variable with its own importance: the preprocessing pipeline sometimes removes features, for instance during the feature selection step.
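The mismatch usually happens when the importances (one per column the model saw) are zipped with the original column names (one per column before selection). The sketch below assumes the selection step exposes a boolean mask with one entry per original column, which is the shape of mask that scikit-learn selectors return from `get_support()`. `map_importances` and the example names and values are hypothetical. The function raises an error instead of silently pairing the wrong names.

```python
def map_importances(feature_names, selected_mask, importances):
    """Pair each importance with the name of the feature it belongs to.

    feature_names: column names *before* feature selection.
    selected_mask: one boolean per original column, True if the column was kept.
    importances:   one value per *kept* column, in the order the model saw them.
    Returns (name, importance) pairs sorted from most to least important.
    """
    if len(feature_names) != len(selected_mask):
        raise ValueError("mask length does not match the number of feature names")
    kept = [name for name, keep in zip(feature_names, selected_mask) if keep]
    if len(kept) != len(importances):
        raise ValueError(
            f"{len(kept)} kept features but {len(importances)} importances"
        )
    return sorted(zip(kept, importances), key=lambda pair: pair[1], reverse=True)


# "income" was dropped by feature selection, so only 3 importances exist.
ranking = map_importances(
    ["age", "income", "debt", "term"],
    [True, False, True, True],
    [0.5, 0.2, 0.3],
)
for name, value in ranking:
    print(f"{name:>8}: {value:.2f}")
```

Naively zipping the four original names with the three importances would have credited `income` with the importance of `debt`.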
Descriptive variables:
These variables, for example the age of the client, are important for understanding who the client is. The data may have been scaled or otherwise modified in the preprocessing pipeline, but the data visualised here should be "raw". This part is validated if the visualisations are computed for the 3 clients.
- visualisations that show at least 10 variables describing the client and its loan(s)
- visualisations that show the comparison between this client and other clients.
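Two small calculations sit behind these visualisations: recovering a raw value from a standard-scaled one (`x = z * std + mean`), and placing a client relative to the other clients, for instance as a percentile rank. The sketch below is illustrative only: the scaler parameters and the `ages` list are made-up values, and a real pipeline would reuse the mean and standard deviation learned during preprocessing.

```python
from bisect import bisect_right


def unscale(z, mean, std):
    """Undo a standard scaling z = (x - mean) / std to recover the raw value x."""
    return z * std + mean


def percentile_rank(value, population):
    """Percentage of clients whose value is less than or equal to `value`."""
    ordered = sorted(population)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


ages = [23, 31, 38, 45, 52, 60, 67]  # illustrative raw ages, not project data
client_age = unscale(0.5, mean=40.0, std=10.0)  # hypothetical scaler parameters
print(client_age)  # 45.0
print(f"older than or as old as {percentile_rank(client_age, ages):.0f}% of clients")
```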