mirror of https://github.com/01-edu/public.git
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
221 lines
5.6 KiB
221 lines
5.6 KiB
2 years ago
|
#### Exercise 0: Environment and libraries
|
||
|
|
||
|
##### The exercice is validated is all questions of the exercice are validated.
|
||
|
|
||
|
##### Activate the virtual environment. If you used `conda` run `conda activate your_env`
|
||
|
|
||
|
##### Run `python --version`
|
||
|
|
||
|
###### Does it print `Python 3.x`? x >= 8
|
||
|
|
||
|
###### Does `import jupyter`, `import numpy`, `import pandas`, `import matplotlib` and `import sklearn` run without any error ?
|
||
|
|
||
|
---
|
||
|
|
||
|
---
|
||
|
|
||
|
#### Exercise 1: Logistic regression with Scikit-learn
|
||
|
|
||
|
##### The question 1 is validated if the predicted class is `0`.
|
||
|
|
||
|
##### The question 2 is validated if the predicted probabilities are `[0.61450526 0.38549474]`
|
||
|
|
||
|
##### The question 3 is validated if the output is:
|
||
|
|
||
|
```console
|
||
|
Coefficient:
|
||
|
[[0.81786797]]
|
||
|
Intercept:
|
||
|
[-0.87522391]
|
||
|
Score:
|
||
|
0.7142857142857143
|
||
|
```
|
||
|
|
||
|
---
|
||
|
|
||
|
---
|
||
|
|
||
|
#### Exercise 2: Sigmoid
|
||
|
|
||
|
##### The question 1 is validated if the plot looks like this:
|
||
|
|
||
|
![alt text][ex2q1]
|
||
|
|
||
|
[ex2q1]: ../w2_day2_ex2_q1.png "Scatter plot"
|
||
|
|
||
|
---
|
||
|
|
||
|
---
|
||
|
|
||
|
#### Exercise 3: Decision boundary
|
||
|
|
||
|
##### The exercice is validated is all questions of the exercice are validated
|
||
|
|
||
|
##### The question 1 is validated if the outputted plot looks like this:
|
||
|
|
||
|
![alt text][ex3q1]
|
||
|
|
||
|
[ex3q1]: ../w2_day2_ex3_q1.png "Scatter plot"
|
||
|
|
||
|
##### The question 2 is validated if the coefficient and the intercept of the Logistic Regression are:
|
||
|
|
||
|
```console
|
||
|
Intercept: [-0.98385574]
|
||
|
Coefficient: [[1.18866075]]
|
||
|
```
|
||
|
|
||
|
##### The question 3 is validated if the plot looks like this:
|
||
|
|
||
|
![alt text][ex3q2]
|
||
|
|
||
|
[ex3q2]: ../w2_day2_ex3_q3.png "Scatter plot"
|
||
|
|
||
|
##### The question 4 is validated if `predict_probability` outputs the same probabilities as `predict_proba`. Note that the values have to match one of the class probabilities, not both. To do so, compare the output with: `clf.predict_proba(X)[:,1]`. The shape of the arrays is not important.
|
||
|
|
||
|
##### The question 5 is validated if `predict_class` outputs the same classes as `cfl.predict(X)`. The shape of the arrays is not important.
|
||
|
|
||
|
##### The question 6 is validated if the plot looks like the plot below. As mentioned, it is not required to shift the class prediction to make the plot easier to understand.
|
||
|
|
||
|
![alt text][ex3q6]
|
||
|
|
||
|
[ex3q6]: ../w2_day2_ex3_q5.png "Scatter plot + Logistic regression + predictions"
|
||
|
|
||
|
##### The question 7 is validated if the plot looks like this:
|
||
|
|
||
|
![alt text][ex3q7]
|
||
|
|
||
|
[ex3q7]: ../w2_day2_ex3_q6.png "Logistic regression decision boundary"
|
||
|
|
||
|
---
|
||
|
|
||
|
---
|
||
|
|
||
|
#### Exercise 4: Train test split
|
||
|
|
||
|
##### The exercise is validated is all questions of the exercise are validated
|
||
|
|
||
|
##### The question 1 is validated if X_train, y_train, X_test, y_test match the output below. The proportion of class `1` is **0.125** in the train set and **1.** in the test set.
|
||
|
|
||
|
```console
|
||
|
X_train:
|
||
|
[[ 1 2]
|
||
|
[ 3 4]
|
||
|
[ 5 6]
|
||
|
[ 7 8]
|
||
|
[ 9 10]
|
||
|
[11 12]
|
||
|
[13 14]
|
||
|
[15 16]]
|
||
|
|
||
|
|
||
|
y_train:
|
||
|
[0. 0. 0. 0. 0. 0. 0. 1.]
|
||
|
|
||
|
|
||
|
X_test:
|
||
|
[[17 18]
|
||
|
[19 20]]
|
||
|
|
||
|
|
||
|
y_test:
|
||
|
[1. 1.]
|
||
|
```
|
||
|
|
||
|
##### The question 2 is validated if the proportion of class `1` is **0.3** for both sets.
|
||
|
|
||
|
---
|
||
|
|
||
|
---
|
||
|
|
||
|
#### Exercise 5: Breast Cancer prediction
|
||
|
|
||
|
##### The exercice is validated is all questions of the exercice are validated
|
||
|
|
||
|
##### The question 1 is validated if the proportion of class `Benign` is 0.6552217453505007. It means that if you always predict `Benign` your accuracy would be 66%.
|
||
|
|
||
|
##### The question 2 is validated if the proportion of one of the classes is the approximately the same on the train and test set: ~0.65. In my case:
|
||
|
|
||
|
- test: 0.6571428571428571
|
||
|
- train: 0.6547406082289803
|
||
|
|
||
|
##### The question 3 is validated if the output is:
|
||
|
|
||
|
```console
|
||
|
# Train
|
||
|
Class prediction on train set:
|
||
|
[4 2 4 2 2 2 2 4 2 2]
|
||
|
|
||
|
Probability prediction on train set:
|
||
|
[0.99600415 0.00908666 0.99992744 0.00528803 0.02097154 0.00582772
|
||
|
0.03565076 0.99515326 0.00788281 0.01065484]
|
||
|
|
||
|
Score on train set:
|
||
|
0.9695885509838998
|
||
|
|
||
|
#Test
|
||
|
|
||
|
Class prediction on test set:
|
||
|
[2 2 2 4 2 4 2 2 2 4]
|
||
|
|
||
|
Probability prediction on test set:
|
||
|
[0.01747203 0.22495309 0.00698756 0.54020801 0.0015289 0.99862249
|
||
|
0.33607994 0.01227679 0.00438157 0.99972344]
|
||
|
|
||
|
Score on test set:
|
||
|
0.9642857142857143
|
||
|
|
||
|
```
|
||
|
|
||
|
Only the 10 first predictions are outputted. The score is computed on all the data in the folds.
|
||
|
For some reasons, you may have a different data splitting as mine. The requirement for this question is to have a score on the test set bigger than 92%.
|
||
|
|
||
|
If the score is 1, congratulate you peer, he's just leaked his first target. The target should be dropped from the X_train or X_test ;) !
|
||
|
|
||
|
##### The question 4 is validated if the confusion matrix on the train set is similar to:
|
||
|
|
||
|
```console
|
||
|
array([[357, 9],
|
||
|
[ 8, 185]])
|
||
|
```
|
||
|
|
||
|
and if the confusion matrix on the test set is similar to:
|
||
|
|
||
|
```console
|
||
|
array([[90, 2],
|
||
|
[ 3, 45]])
|
||
|
```
|
||
|
|
||
|
As said, for some reasons, the results may be slightly different from mine because of the data splitting. However, the values in the confusion matrix should be close to these results.
|
||
|
|
||
|
---
|
||
|
|
||
|
---
|
||
|
|
||
|
#### Exercise 6: Multi-class (Optional)
|
||
|
|
||
|
##### The exercice is validated is all questions of the exercice are validated
|
||
|
|
||
|
##### The question 1 is validated if each classifier has as input a binary data as below:
|
||
|
|
||
|
```python
|
||
|
def train(X_train, y_train):
|
||
|
clf = LogisticRegression()
|
||
|
clf1 = LogisticRegression()
|
||
|
clf2 = LogisticRegression()
|
||
|
|
||
|
clf.fit(X_train, y_train == 0)
|
||
|
clf1.fit(X_train, y_train == 1)
|
||
|
clf2.fit(X_train, y_train == 2)
|
||
|
|
||
|
return clf, clf1, clf2
|
||
|
```
|
||
|
|
||
|
##### The question 2 is validated if the predicted classes on the test set are:
|
||
|
|
||
|
```console
|
||
|
array([0, 0, 2, 1, 2, 0, 2, 1, 1, 1, 0, 1, 2, 0, 1, 1, 0, 0, 2, 2, 0, 0,
|
||
|
0, 2, 2, 2, 0, 1, 0, 0])
|
||
|
```
|
||
|
|
||
|
Even if I had this warning `ConvergenceWarning: lbfgs failed to converge (status=1):` I noticed that `LogisticRegression` returns the same output.
|