docs(machine-learning-pipeline): fix audits format

2 years ago · 6e6b8bba86
1 changed file with 22 additions and 22 deletions
--- a/subjects/ai/pipeline/audit/README.md
+++ b/subjects/ai/pipeline/audit/README.md
@@ -1,6 +1,6 @@
 #### Exercise 0: Environment and libraries

-##### The exercice is validated is all questions of the exercice are validated.
+##### The exercise is validated if all questions of the exercise are validated.

 ##### Activate the virtual environment. If you used `conda` run `conda activate your_env`.

@@ -8,7 +8,7 @@

 ###### Does it print `Python 3.x`? x >= 8

-##### Does `import jupyter`, `import numpy`, `import pandas`, `import matplotlib` and `import sklearn` run without any error?
+###### Do `import jupyter`, `import numpy`, `import pandas`, `import matplotlib` and `import sklearn` run without any error?

 ---

@@ -16,15 +16,15 @@

 #### Exercise 1: Imputer 1

-##### The exercise is validated is all questions of the exercise are validated.
+##### The exercise is validated if all questions of the exercise are validated.

-##### The question 1 is validated if the `imp_mean.statistics_` returns:
+###### For question 1, does `imp_mean.statistics_` return the following?

 ```console
    array([ 4., 13.,  6.])
 ```
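
Reviewer note: as a rough sketch of what questions 1 to 3 check, the snippet below computes per-column means on a train set, ignoring missing values, and reuses those means to fill both the train and the test set. It is plain Python rather than scikit-learn's `SimpleImputer`; the helper names `fit_column_means` and `fill_missing` and the sample data are assumptions made for illustration, not taken from the exercise.

```python
import math


def fit_column_means(rows):
    """Return the mean of each column, ignoring NaN entries."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(sum(observed) / len(observed))
    return means


def fill_missing(rows, means):
    """Replace each NaN with the stored mean of its column."""
    return [
        [means[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


nan = float("nan")

# Hypothetical data, chosen for illustration only.
train = [[7.0, 6.0, 5.0], [4.0, nan, 5.0], [1.0, 20.0, 8.0]]
test = [[nan, 1.0, 2.0], [7.0, nan, 9.0], [nan, 2.0, 4.0]]

means = fit_column_means(train)            # learned from the train set only
filled_train = fill_missing(train, means)
filled_test = fill_missing(test, means)    # the test set reuses the train means

print(means)  # [4.0, 13.0, 6.0]
```

The point of the design is that the test set never contributes to the statistics: missing test values are filled with means learned on the train set.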

-##### The question 2 is validated if the filled train set is:
+###### For question 2, is the filled train set the following?

 ```console
    array([[ 7.,  6.,  5.],
@@ -32,7 +32,7 @@
        [ 1., 20.,  8.]])
 ```

-##### The question 3 is validated if the filled test set is:
+###### For question 3, is the filled test set the following?

 ```console
    array([[ 4.,  1.,  2.],
@@ -46,9 +46,9 @@

 #### Exercise 2: Scaler

-##### The exercise is validated is all. questions of the exercise are validated.
+##### The exercise is validated if all questions of the exercise are validated.

-##### The question 1 is validated if the scaled train set is as below. And by definition, the mean on the axis 0 should be `array([0., 0., 0.])` and the standard deviation on the axis 0 should be `array([1., 1., 1.])`.
+###### For question 1, is the scaled train set as below? By definition, the mean along axis 0 should be `array([0., 0., 0.])` and the standard deviation along axis 0 should be `array([1., 1., 1.])`.

 ```console
 array([[ 0.        , -1.22474487,  1.33630621],
@@ -56,7 +56,7 @@ array([[ 0.        , -1.22474487,  1.33630621],
       [-1.22474487,  1.22474487, -1.06904497]])
 ```
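
Reviewer note: the "mean 0, standard deviation 1" property can be checked by hand. The sketch below standardizes each column in plain Python, dividing by the population standard deviation (divide by `n`, not `n - 1`), which is the convention scikit-learn's `StandardScaler` uses. The function name `standardize` and the input matrix are assumptions for illustration only.

```python
import math


def standardize(rows):
    """Scale each column to mean 0 and population standard deviation 1."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_cols)]
    # Population standard deviation: divide by n, not n - 1.
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / n)
        for j in range(n_cols)
    ]
    scaled = [
        [(row[j] - means[j]) / stds[j] for j in range(n_cols)] for row in rows
    ]
    return scaled, means, stds


# Hypothetical train set, chosen for illustration only.
train = [[1.0, -1.0, 2.0], [2.0, 0.0, 0.0], [0.0, 1.0, -1.0]]

scaled, means, stds = standardize(train)

# Recompute the column statistics of the scaled data as a sanity check.
_, scaled_means, scaled_stds = standardize(scaled)
check_means = [sum(row[j] for row in scaled) / len(scaled) for j in range(3)]
check_stds = [
    math.sqrt(sum((row[j] - check_means[j]) ** 2 for row in scaled) / len(scaled))
    for j in range(3)
]
print([round(m, 8) for m in check_means])
print([round(s, 8) for s in check_stds])
```

To scale a test set, the same `means` and `stds` learned on the train set would be reused, so the test columns will generally not have mean 0 and standard deviation 1 exactly.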

-##### The question 2 is validated if the scaled test set is:
+###### For question 2, is the scaled test set the following?

 ```console
 array([[ 1.22474487, -1.22474487,  0.53452248],
@@ -70,9 +70,9 @@ array([[ 1.22474487, -1.22474487,  0.53452248],

 #### Exercise 3: One hot Encoder

-##### The exercise is validated is all questions of the exercise are validated.
+##### The exercise is validated if all questions of the exercise are validated.

-##### The question 1 is validated if the output is:
+###### For question 1, is the output the following?

    |    |   ('C++',) |   ('Java',) |   ('Python',) |
    |---:|-----------:|------------:|--------------:|
@@ -81,7 +81,7 @@ array([[ 1.22474487, -1.22474487,  0.53452248],
    |  2 |          0 |           1 |             0 |
    |  3 |          1 |           0 |             0 |

-##### The question 2 is validated if the output is:
+###### For question 2, is the output the following?

    |    |   ('C++',) |   ('Java',) |   ('Python',) |
    |---:|-----------:|------------:|--------------:|
@@ -96,9 +96,9 @@ array([[ 1.22474487, -1.22474487,  0.53452248],

 #### Exercise 4: Ordinal Encoder

-##### The exercise is validated is all questions of the exercise are validated
+##### The exercise is validated if all questions of the exercise are validated.

-##### The question 1 is validated if the output of the Ordinal Encoder on the train set is:
+###### For question 1, is the output of the Ordinal Encoder on the train set the following?

 ```console
 array([[2.],
@@ -108,7 +108,7 @@ array([[2.],

 Check that `enc.categories_` returns `[array(['bad', 'neutral', 'good'], dtype=object)]`.
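
Reviewer note: the order `bad`, `neutral`, `good` is not alphabetical, so it has to be supplied explicitly; by default scikit-learn's `OrdinalEncoder` sorts the categories it finds, and the order is set through its `categories` parameter. The plain-Python sketch below shows the mapping this order implies. The helper names `fit_ordinal` and `transform_ordinal` and the sample values are assumptions for illustration only.

```python
def fit_ordinal(categories):
    """Map each category to its position in the given order."""
    return {category: float(index) for index, category in enumerate(categories)}


def transform_ordinal(values, mapping):
    """Encode each value as a one-element row, like a single-column array."""
    return [[mapping[value]] for value in values]


# Explicit order, lowest to highest.
order = ["bad", "neutral", "good"]
mapping = fit_ordinal(order)

# Hypothetical column of labels, chosen for illustration only.
labels = ["good", "bad", "neutral"]
encoded = transform_ordinal(labels, mapping)

print(encoded)  # [[2.0], [0.0], [1.0]]
```

With an alphabetical order instead (`bad`, `good`, `neutral`), `good` would be encoded as `1.0` rather than `2.0`, which is why the audit checks `enc.categories_`.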

-##### The question 2 is validated if the output of the Ordinal Encoder on the test set is:
+###### For question 2, is the output of the Ordinal Encoder on the test set the following?

 ```console
 array([[2.],
@@ -122,9 +122,9 @@ array([[2.],

 #### Exercise 5: Categorical variables

-##### The exercise is validated is all questions of the exercise are validated
+##### The exercise is validated if all questions of the exercise are validated.

-##### The question 1 is validated if the number of unique values per feature outputted are:
+###### For question 1, is the number of unique values per feature in the output the following?

 ```console
 age             6
@@ -139,7 +139,7 @@ irradiat        2
 dtype: int64
 ```

-##### The question 2 is validated if the transformed test set by the `OneHotEncoder` fitted on the train set is as below. Make sure the transformer takes as input a dataframe with the columns in the order defined `['node-caps' , 'breast', 'breast-quad', 'irradiat']` :
+###### For question 2, is the test set transformed by the `OneHotEncoder` fitted on the train set as below? Make sure the transformer takes as input a dataframe with the columns in the defined order `['node-caps', 'breast', 'breast-quad', 'irradiat']`:

 ```console
 #First 10 rows:
@@ -157,7 +157,7 @@ array([[1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0.],

 ```

-##### The question 3 is validated if the transformed test set by the `OrdinalEncoder` fitted on the train set is as below with the columns ordered as `["menopause", "age", "tumor-size","inv-nodes", "deg-malig"]`:
+###### For question 3, is the test set transformed by the `OrdinalEncoder` fitted on the train set as below, with the columns ordered as `["menopause", "age", "tumor-size", "inv-nodes", "deg-malig"]`?

 ```console
 #First 10 rows:
@@ -175,7 +175,7 @@ array([[1., 2., 5., 0., 1.],

 ```

-##### The question 4 is validated if the column transformer transformed that is fitted on the X_train, transformed the X_test as:
+###### For question 4, does the column transformer fitted on `X_train` transform `X_test` as below?

 ```console
 # First 2 rows:
@@ -189,7 +189,7 @@ array([[1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1., 2., 5., 0., 1.],

 #### Exercise 6: Pipeline

-##### The question 1 is validated if the prediction on the test set are:
+###### For question 1, are the predictions on the test set the following?

 ```console
 array([0, 0, 2, 1, 2, 0, 2, 1, 1, 1, 0, 1, 2, 0, 1, 1, 0, 0, 2, 2, 0, 0,
@@ -197,6 +197,6 @@ array([0, 0, 2, 1, 2, 0, 2, 1, 1, 1, 0, 1, 2, 0, 1, 1, 0, 0, 2, 2, 0, 0,
       0, 1, 1, 1, 1, 1])
 ```

-and the score on the test set is **98%**.
+and is the score on the test set **98%**?

 **Note: Keep in mind that 98% accuracy is uncommon when working with real-life data. Whenever a score exceeds 97%, check that there is no leakage in the data. On financial data sets, the signal-to-noise ratio is low, and forecasting stock prices is a difficult problem: there, an accuracy higher than 70% should be treated as a warning to check for data leakage!**
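
Reviewer note: one simple leakage check, among several possible ones, is to look for rows that appear verbatim in both the train and the test set. The sketch below does this in plain Python; the function name `shared_rows` and the sample rows are assumptions for illustration only. It will not catch other forms of leakage, such as a feature that encodes the target or preprocessing fitted on the full data set.

```python
def shared_rows(train_rows, test_rows):
    """Return the test rows that also appear verbatim in the train set."""
    seen = {tuple(row) for row in train_rows}
    return [row for row in test_rows if tuple(row) in seen]


# Hypothetical rows (two features followed by a label), for illustration only.
train = [[5.1, 3.5, 0], [4.9, 3.0, 0], [6.2, 2.9, 1]]
test = [[4.9, 3.0, 0], [5.9, 3.0, 2]]

leaked = shared_rows(train, test)
print(len(leaked), "of", len(test), "test rows also appear in train")
```

A non-empty result does not prove leakage on its own, since genuine duplicates exist in some data sets, but it is a cheap first signal to investigate when a score looks too good.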