eslopfer f8fae31cf0 docs(ai-audits): fix format errors, rephrase, and typos 1 year ago

Exercise 0: Environment and libraries

The exercise is validated if all questions of the exercise are validated.
Activate the virtual environment. If you used conda, run conda activate your_env
Run python --version
Does it print Python 3.x, with x >= 8?
Do import jupyter, import numpy, import pandas, import matplotlib and import sklearn run without any error?
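
The checks above can be run in one go with something like the sketch below. The helper names python_is_recent_enough and missing_modules are made up for illustration. Note that importlib.util.find_spec only checks that a module can be located, so a real import in the interpreter remains the definitive test.

```python
import importlib.util
import sys

# Modules listed in this exercise.
REQUIRED_MODULES = ["jupyter", "numpy", "pandas", "matplotlib", "sklearn"]


def python_is_recent_enough(version_info=sys.version_info):
    """Return True for Python 3.x with x >= 8."""
    return version_info[0] == 3 and version_info[1] >= 8


def missing_modules(names=REQUIRED_MODULES):
    """Return the modules from `names` that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version OK:", python_is_recent_enough())
print("Missing modules:", missing_modules())
```

An empty list of missing modules, together with a True version check, suggests the environment is ready.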


Exercise 1: Logistic regression with Scikit-learn

Is the predicted class for question 1 equal to 0?
Are the predicted probabilities for question 2 [0.61450526 0.38549474]?
Is the output for question 3 like this?
Coefficient:
 [[0.81786797]]
Intercept:
 [-0.87522391]
Score:
 0.7142857142857143
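
A small consistency check on the values above, as a sketch only: it assumes that questions 1 and 2 are evaluated on the same input and that the classes are 0 and 1, neither of which this audit states explicitly. Under those assumptions, the two probabilities should sum to 1 and the largest one should sit at index 0.

```python
# Values copied from the expected output of question 2.
predicted_probabilities = [0.61450526, 0.38549474]

# The class probabilities of a binary classifier should sum to 1.
total = sum(predicted_probabilities)

# The predicted class is the index of the largest probability
# (assumption: classes are labelled 0 and 1, in that order).
predicted_class = max(
    range(len(predicted_probabilities)),
    key=lambda k: predicted_probabilities[k],
)

print("Sum of probabilities:", round(total, 8))
print("Predicted class:", predicted_class)
```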


Exercise 2: Sigmoid

Does the plot for question 1 look like this?

alt text
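
For reference, a minimal sketch of the sigmoid function itself, using only the standard library. The name sigmoid and the sample points are illustrative choices, not taken from the exercise. Whatever range the plot uses, the curve should pass through 0.5 at 0, stay between 0 and 1, and be symmetric around that point.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)) for a single float."""
    return 1.0 / (1.0 + math.exp(-z))


# A few sample points; the plot should show the same S-shaped trend.
for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 4))
```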



Exercise 3: Decision boundary

The exercise is validated if all questions of the exercise are validated.
Does the output plot for question 1 look like this?

alt text

Are the coefficient and the intercept of the Logistic Regression for question 2 these?
Intercept:  [-0.98385574]
Coefficient:  [[1.18866075]]
Does the plot for question 3 look like this?

alt text

For question 4, does predict_probability output the same probabilities as predict_proba? Note that the values have to match one of the class probabilities, not both. To do so, compare the output with: clf.predict_proba(X)[:,1]. The shape of the arrays is not important.
Does predict_class output the same classes as clf.predict(X) for question 5? The shape of the arrays is not important.
Does the plot for question 6 look like the plot below? As mentioned, it is not required to shift the class prediction to make the plot easier to understand.

alt text

Does the plot look like this for question 7?

alt text
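
For questions 4 and 5, the sketch below shows one way the two functions could be written from the intercept and coefficient of question 2. The signatures are assumptions (the exercise may define them differently), the sketch handles a single feature value rather than an array, and it assumes the classes are 0 and 1.

```python
import math

# Values copied from the expected output of question 2.
INTERCEPT = -0.98385574
COEFFICIENT = 1.18866075


def predict_probability(x, coefficient=COEFFICIENT, intercept=INTERCEPT):
    """Probability of class 1 for a single feature value x."""
    return 1.0 / (1.0 + math.exp(-(coefficient * x + intercept)))


def predict_class(x, threshold=0.5):
    """Class 1 if the probability of class 1 exceeds the threshold, else class 0."""
    return int(predict_probability(x) > threshold)


# The decision boundary is where the probability equals 0.5,
# i.e. where coefficient * x + intercept == 0.
decision_boundary = -INTERCEPT / COEFFICIENT

print("Decision boundary:", round(decision_boundary, 4))
print("P(class 1 | x=2):", round(predict_probability(2), 4))
print("Class for x=0 and x=2:", predict_class(0), predict_class(2))
```

Values to the left of the decision boundary are assigned class 0 and values to the right class 1, which is what the plots for questions 6 and 7 should reflect.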



Exercise 4: Train test split

The exercise is validated if all questions of the exercise are validated.
Do X_train, y_train, X_test, y_test match the output below for question 1? The proportion of class 1 is 0.125 in the train set and 1.0 in the test set.
X_train:
 [[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]
 [11 12]
 [13 14]
 [15 16]]


y_train:
 [0. 0. 0. 0. 0. 0. 0. 1.]


X_test:
 [[17 18]
 [19 20]]


y_test:
 [1. 1.]
Is the proportion of class 1 0.3 for both sets in question 2?
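
The proportions quoted for question 1 can be reproduced with the sketch below. It rebuilds the data from the expected output and assumes an unshuffled split with 20% of the rows in the test set, which is inferred from that output rather than stated here. The helpers ordered_split and class_proportion are hypothetical names.

```python
# Data reconstructed from the expected output of question 1.
X = [[i, i + 1] for i in range(1, 20, 2)]  # [[1, 2], [3, 4], ..., [19, 20]]
y = [0.0] * 7 + [1.0] * 3


def ordered_split(X, y, test_size=0.2):
    """Split without shuffling: the last `test_size` fraction is the test set."""
    n_test = round(len(X) * test_size)
    cut = len(X) - n_test
    return X[:cut], X[cut:], y[:cut], y[cut:]


def class_proportion(labels, positive=1.0):
    """Share of labels equal to `positive`."""
    return sum(1 for label in labels if label == positive) / len(labels)


X_train, X_test, y_train, y_test = ordered_split(X, y)

print("Class 1 in train:", class_proportion(y_train))
print("Class 1 in test:", class_proportion(y_test))
```

Because all the class 1 samples sit at the end of the data, an unshuffled split leaves the two sets with very different class proportions.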


Exercise 5: Breast Cancer prediction

The exercise is validated if all questions of the exercise are validated.
Is the proportion of class Benign 0.6552217453505007 for question 1? This means that always predicting Benign would give an accuracy of about 66%.
Is the proportion of one of the classes approximately the same (~0.65) on the train and test sets for question 2? In my case:
  • test: 0.6571428571428571
  • train: 0.6547406082289803
Is this the output for question 3?
# Train
Class prediction on train set:
 [4 2 4 2 2 2 2 4 2 2]

Probability prediction on train set:
 [0.99600415 0.00908666 0.99992744 0.00528803 0.02097154 0.00582772
 0.03565076 0.99515326 0.00788281 0.01065484]

Score on train set:
 0.9695885509838998

# Test

Class prediction on test set:
 [2 2 2 4 2 4 2 2 2 4]

Probability prediction on test set:
 [0.01747203 0.22495309 0.00698756 0.54020801 0.0015289  0.99862249
 0.33607994 0.01227679 0.00438157 0.99972344]

Score on test set:
 0.9642857142857143

Only the first 10 predictions are shown. The score is computed on all the data in each fold. Your data split may differ from mine, so your values may differ too. The requirement for this question is a score on the test set above 92%.

If the score is 1, congratulate your peer: they have just leaked their first target. The target should be dropped from X_train and X_test ;) !

Is the confusion matrix on the train set similar to this in question 4?
array([[357,   9],
       [  8, 185]])

And is the confusion matrix on the test set similar to this?

array([[90,  2],
       [ 3, 45]])

As mentioned above, the results may differ slightly from mine because of the data splitting. However, the values in the confusion matrix should be close to these results.
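
The two confusion matrices above are consistent with the scores of question 3 and the class proportions of question 2, which the sketch below checks. It assumes rows hold the true classes (the convention of scikit-learn's confusion_matrix) and that the first row is the Benign class. The helper names are illustrative.

```python
# Matrices copied from the expected output of question 4.
train_matrix = [[357, 9], [8, 185]]
test_matrix = [[90, 2], [3, 45]]


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def first_class_proportion(matrix):
    """Share of samples whose true class is the first row's class."""
    return sum(matrix[0]) / sum(sum(row) for row in matrix)


print("Train accuracy:", accuracy(train_matrix))
print("Test accuracy:", accuracy(test_matrix))
print("First class share, train:", first_class_proportion(train_matrix))
print("First class share, test:", first_class_proportion(test_matrix))
```

If a peer's matrices differ because of the split, the same two checks still apply: the diagonal divided by the total should equal their score, and the row sums should match their class proportions.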



Bonus

Exercise 6: Multi-class (Optional)

The exercise is validated if all questions of the exercise are validated.
Does each classifier take a binary target as input, as below, for question 1?
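from sklearn.linear_model import LogisticRegression

def train(X_train, y_train):
    # One binary classifier per class (one-vs-rest).
    clf = LogisticRegression()
    clf1 = LogisticRegression()
    clf2 = LogisticRegression()

    # Each target is a boolean array: True for the class of interest, False otherwise.
    clf.fit(X_train, y_train == 0)
    clf1.fit(X_train, y_train == 1)
    clf2.fit(X_train, y_train == 2)

    return clf, clf1, clf2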
Are these the predicted classes on the test set for question 2?
array([0, 0, 2, 1, 2, 0, 2, 1, 1, 1, 0, 1, 2, 0, 1, 1, 0, 0, 2, 2, 0, 0,
       0, 2, 2, 2, 0, 1, 0, 0])

Even though I got the warning ConvergenceWarning: lbfgs failed to converge (status=1), I noticed that LogisticRegression returns the same output.
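
One way to combine the three binary classifiers into a single prediction is to pick, for each sample, the class whose classifier gives the highest probability. The sketch below shows only that step. The function predict_one_vs_rest and the probabilities are hypothetical. In practice each row would likely come from something like clf.predict_proba(X_test)[:, 1] for the corresponding classifier.

```python
def predict_one_vs_rest(class_probabilities):
    """Pick, for each sample, the class with the highest probability.

    class_probabilities[k][i] is the probability that sample i belongs to class k.
    """
    n_samples = len(class_probabilities[0])
    predictions = []
    for i in range(n_samples):
        scores = [probabilities[i] for probabilities in class_probabilities]
        predictions.append(scores.index(max(scores)))
    return predictions


# Hypothetical probabilities for 3 samples, one row per classifier.
probabilities = [
    [0.90, 0.20, 0.10],  # class 0 vs rest
    [0.30, 0.70, 0.40],  # class 1 vs rest
    [0.05, 0.10, 0.80],  # class 2 vs rest
]

print(predict_one_vs_rest(probabilities))  # [0, 1, 2]
```

Note that the probabilities of the three classifiers for one sample do not have to sum to 1, since each classifier is trained independently.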