Exercise 0: Environment and libraries
The exercise is validated if all questions of the exercise are validated.
Activate the virtual environment. If you used conda, run conda activate your_env.
Run python --version. Does it print Python 3.x, with x >= 8?
Do import jupyter, import numpy, import pandas, import matplotlib and import sklearn run without any error?
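The checks above can also be scripted. The following is a minimal sketch, not part of the original audit: the function name check_environment and the constant REQUIRED_MODULES are hypothetical, and the script only tests that each module can be located, which is weaker than actually importing it.

```python
import importlib.util
import sys

# Modules the audit asks about; the names are taken from the checklist above.
REQUIRED_MODULES = ["jupyter", "numpy", "pandas", "matplotlib", "sklearn"]


def check_environment(min_minor=8):
    """Return (python_ok, missing) without importing the heavy libraries."""
    # True when the interpreter is at least Python 3.<min_minor>.
    python_ok = sys.version_info >= (3, min_minor)
    # find_spec returns None when a top-level module cannot be located.
    missing = [name for name in REQUIRED_MODULES
               if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print("Python version OK:", python_ok)
    print("Missing modules:", missing or "none")
```

An empty list of missing modules is a good sign, but the audit question is only fully answered by running the import statements themselves in the activated environment.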
Exercise 1: Logistic regression with Scikit-learn
Is the predicted class for question 1 equal to 0?
Are the predicted probabilities for question 2 [0.61450526 0.38549474]?
Is the output for question 3 like this?
Coefficient:
[[0.81786797]]
Intercept:
[-0.87522391]
Score:
0.7142857142857143
Exercise 2: Sigmoid
Does the plot for question 1 look like this?
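The reference plot is not reproduced here, but the expected shape can be checked numerically: the sigmoid is an S-shaped curve that goes from 0 to 1 and crosses 0.5 at x = 0. A minimal sketch in plain Python, where the function name sigmoid is an assumption about how the exercise names it:

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


# A few sample points: values rise from near 0 to near 1.
for x in (-10, -1, 0, 1, 10):
    print(x, round(sigmoid(x), 4))
```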
Exercise 3: Decision boundary
The exercise is validated if all questions of the exercise are validated
Does the outputted plot for question 1 look like this?
Are the coefficient and the intercept of the Logistic Regression for question 2 these?
Intercept: [-0.98385574]
Coefficient: [[1.18866075]]
Does the plot for question 3 look like this?
For question 4, does predict_probability output the same probabilities as predict_proba? Note that the values have to match one of the class probabilities, not both. To check, compare the output with clf.predict_proba(X)[:,1]. The shape of the arrays is not important.
Does predict_class output the same classes as clf.predict(X) for question 5? The shape of the arrays is not important.
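As a reference point for questions 4 and 5, here is a minimal sketch of what the two functions compute for a single feature, using the intercept and coefficient reported for question 2. The signatures are hypothetical, since the exercise may define predict_probability and predict_class differently (for example, taking arrays).

```python
import math

# Values reported for question 2 above (one feature, so one coefficient).
INTERCEPT = -0.98385574
COEFFICIENT = 1.18866075


def predict_probability(x, coefficient=COEFFICIENT, intercept=INTERCEPT):
    """Probability of class 1: sigmoid(coefficient * x + intercept)."""
    return 1.0 / (1.0 + math.exp(-(coefficient * x + intercept)))


def predict_class(x, coefficient=COEFFICIENT, intercept=INTERCEPT):
    """Class 1 when the probability of class 1 exceeds 0.5, else class 0."""
    return 1 if predict_probability(x, coefficient, intercept) > 0.5 else 0


# The decision boundary is the x where the probability equals 0.5,
# i.e. where coefficient * x + intercept == 0.
boundary = -INTERCEPT / COEFFICIENT
print("boundary:", boundary)
print([predict_class(x) for x in (0.0, 0.5, 1.0, 2.0)])
```

Points to the left of the boundary are assigned class 0 and points to the right class 1, which is what the decision boundary plots in this exercise should show.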
Does the plot for question 6 look like the plot below? As mentioned, it is not required to shift the class prediction to make the plot easier to understand.
Does the plot look like this for question 7?
Exercise 4: Train test split
The exercise is validated if all questions of the exercise are validated
Do X_train, y_train, X_test, y_test match the output below for question 1? The proportion of class 1 is 0.125 in the train set and 1.0 in the test set.
X_train:
[[ 1 2]
[ 3 4]
[ 5 6]
[ 7 8]
[ 9 10]
[11 12]
[13 14]
[15 16]]
y_train:
[0. 0. 0. 0. 0. 0. 0. 1.]
X_test:
[[17 18]
[19 20]]
y_test:
[1. 1.]
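The output above is what an unshuffled 80/20 split produces when the rows are ordered and the last three are labelled 1. The sketch below reproduces it in plain Python; the helper ordered_split and the way the input is built are assumptions, and the exercise itself presumably uses scikit-learn's train_test_split with shuffle=False.

```python
def ordered_split(X, y, test_fraction=0.2):
    """Split without shuffling: the last test_fraction of rows is the test set."""
    n_test = int(round(len(X) * test_fraction))
    cut = len(X) - n_test
    return X[:cut], X[cut:], y[:cut], y[cut:]


# Assumed input: ten ordered rows, the last three labelled 1.
X = [[2 * i + 1, 2 * i + 2] for i in range(10)]   # [1, 2] ... [19, 20]
y = [0.0] * 7 + [1.0] * 3

X_train, X_test, y_train, y_test = ordered_split(X, y)
print(sum(y_train) / len(y_train))  # proportion of class 1 in the train set
print(sum(y_test) / len(y_test))    # proportion of class 1 in the test set
```

The mismatch between the two proportions is the point of the question: without shuffling or stratification, the test set does not resemble the train set.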
Is the proportion of class 1 equal to 0.3 for both sets in question 2?
Exercise 5: Breast Cancer prediction
The exercise is validated if all questions of the exercise are validated
Is the proportion of class Benign 0.6552217453505007 for question 1? It means that if you always predict Benign, your accuracy would be about 66%.
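The link between class proportion and baseline accuracy can be made explicit. This sketch assumes 458 Benign rows out of 699, which is what the quoted proportion corresponds to, and assumes Benign is coded 2 and Malignant 4, as the class predictions further down suggest.

```python
from collections import Counter

# Assumed label counts: 458 Benign (coded 2) and 241 Malignant (coded 4),
# which is what the proportion quoted above corresponds to (458 / 699).
labels = [2] * 458 + [4] * 241

# A classifier that always predicts the majority class is right exactly
# as often as that class appears.
majority_class, majority_count = Counter(labels).most_common(1)[0]
baseline_accuracy = majority_count / len(labels)

print(majority_class, baseline_accuracy)
```

Any trained model should be judged against this baseline rather than against 50%.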
Is the proportion of one of the classes approximately the same on the train and test set: ~0.65 for question 2? In my case:
- test: 0.6571428571428571
- train: 0.6547406082289803
Is this the output for question 3?
# Train
Class prediction on train set:
[4 2 4 2 2 2 2 4 2 2]
Probability prediction on train set:
[0.99600415 0.00908666 0.99992744 0.00528803 0.02097154 0.00582772
0.03565076 0.99515326 0.00788281 0.01065484]
Score on train set:
0.9695885509838998
#Test
Class prediction on test set:
[2 2 2 4 2 4 2 2 2 4]
Probability prediction on test set:
[0.01747203 0.22495309 0.00698756 0.54020801 0.0015289 0.99862249
0.33607994 0.01227679 0.00438157 0.99972344]
Score on test set:
0.9642857142857143
Only the first 10 predictions are shown. The score is computed on all the data in each set. Your data split may differ from mine, so the requirement for this question is a score on the test set greater than 92%.
If the score is 1, congratulate your peer, who has just leaked their first target. The target should be dropped from X_train and X_test ;) !
Is the confusion matrix on the train set similar to this in question 4?
array([[357, 9],
[ 8, 185]])
Is the confusion matrix on the test set similar to this?
array([[90, 2],
[ 3, 45]])
As noted above, your results may differ slightly from mine because of the data split. However, the values in the confusion matrix should be close to these.
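A quick consistency check when comparing results: the accuracy is the sum of the diagonal of the confusion matrix divided by the total number of samples. Applied to the two matrices above, this gives the train and test scores listed for question 3. A minimal sketch in plain Python, where the helper name accuracy_from_confusion is hypothetical:

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correctly classified (diagonal) / all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Matrices copied from the expected output above.
train_cm = [[357, 9], [8, 185]]
test_cm = [[90, 2], [3, 45]]

print(accuracy_from_confusion(train_cm))
print(accuracy_from_confusion(test_cm))
```

If a peer's split differs, their matrices will differ too, but the same relationship between their matrix and their score should hold.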
Bonus
Exercise 6: Multi-class (Optional)
The exercise is validated if all questions of the exercise are validated
Does each classifier take binary data as input, as below, for question 1?
from sklearn.linear_model import LogisticRegression

def train(X_train, y_train):
    # One binary classifier per class: each target is True for "its" class only.
    clf = LogisticRegression()
    clf1 = LogisticRegression()
    clf2 = LogisticRegression()
    clf.fit(X_train, y_train == 0)
    clf1.fit(X_train, y_train == 1)
    clf2.fit(X_train, y_train == 2)
    return clf, clf1, clf2
Are these the predicted classes on the test set for question 2?
array([0, 0, 2, 1, 2, 0, 2, 1, 1, 1, 0, 1, 2, 0, 1, 1, 0, 0, 2, 2, 0, 0,
0, 2, 2, 2, 0, 1, 0, 0])
Even though I got the warning ConvergenceWarning: lbfgs failed to converge (status=1), I noticed that LogisticRegression returns the same output.
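For question 2, one common way to turn three binary classifiers into a single class prediction is to pick, for each sample, the classifier that reports the highest probability for its own class. The sketch below shows only that combination step in plain Python. The function name and the probability values are hypothetical; with the fitted classifiers, each list would presumably come from predict_proba(X_test)[:, 1].

```python
def one_vs_rest_predict(class_probabilities):
    """class_probabilities[k][i]: classifier k's probability that sample i is class k."""
    n_samples = len(class_probabilities[0])
    predictions = []
    for i in range(n_samples):
        scores = [probs[i] for probs in class_probabilities]
        # The predicted class is the index of the most confident classifier.
        predictions.append(scores.index(max(scores)))
    return predictions


# Hypothetical probabilities for three samples (illustration only).
probs_class0 = [0.90, 0.10, 0.20]
probs_class1 = [0.30, 0.70, 0.35]
probs_class2 = [0.05, 0.40, 0.80]

print(one_vs_rest_predict([probs_class0, probs_class1, probs_class2]))
```

Note that the three probabilities for a sample need not sum to 1, since each classifier is trained independently.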