##### There are at least 300 news articles stored in the file system or the database.
##### Run the scraper with `python scraper_news.py` and fetch 300 articles. If needed, stop the program manually when enough data has been retrieved.
##### Run the scraper with `python scraper_news.py` and fetch 3 documents. The scraper is not expected to stop by itself after fetching 3 documents; you can stop it manually.
###### Does it run without any error and store the 3 files as expected?
###### Does it run without any error and store the articles as described in the subject?
##### Topic classifier
@@ -28,26 +26,24 @@
###### Does the topic classifier achieve an accuracy higher than 95% on the given datasets?
##### Scandal detection
###### Does the `README.md` explain the choice of embeddings and distance?
##### NLP engine output on 300 articles
###### Does the DataFrame flag the top 10 articles with the highest likelihood of environmental scandal?
###### Can you run `python nlp_enriched_news.py` without any error?
###### Is the distance or similarity saved in the DataFrame?
###### Does the DataFrame saved in the `csv` file contain 300 different rows?
##### NLP engine output on 300 articles
###### Are the columns of the DataFrame as defined in the subject `Deliverable` section?
###### Does the DataFrame contain 300 different rows?
###### Does the output of the NLP engine correspond to the output defined in the subject `Deliverable` section?
###### Are the columns of the DataFrame as defined in the subject `Deliverable` section?
##### Analyse the output: relevance of the topic(s) matched, relevance of the sentiment, relevance of the scandal detected (if detected on the three articles) and relevance of the company(ies) matched.
##### Analyse the DataFrame with 300 articles: relevance of the topics matched, relevance of the sentiment, relevance of the scandal detected and relevance of the companies matched. The algorithms are not 100% accurate, so you should expect a few issues in the results.
###### Is the information presented consistent and accurate?
##### NLP engine on 3 articles
##### Scandal detection
###### Can you run `python nlp_enriched_news.py` without any error?
###### Does the `README.md` explain the choice of embeddings and distance?
###### Does the output of the NLP engine correspond to the output defined in the subject `Deliverable` section?
###### Does the DataFrame flag the top 10 articles with the highest likelihood of environmental scandal?
##### Analyse the output: relevance of the topic(s) matched, relevance of the sentiment, relevance of the scandal detected (if detected on the three articles) and relevance of the company(ies) matched.
###### Is the distance or similarity saved in the DataFrame?
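The DataFrame checks above (row count, distinct rows, a saved distance or similarity, and the top 10 flag) can be partly automated. The following is a minimal sketch, not part of the subject: the helper `audit_rows` and the column names `score` and `flagged` are hypothetical, and the real names should be taken from the subject's `Deliverable` section. It uses only the standard `csv` module so that it runs without extra dependencies. Set `higher_is_more_likely=False` when the saved value is a distance where a smaller value means a more likely scandal.

```python
import csv
import io


def audit_rows(rows, score_col, flag_col, expected_rows=300, top_n=10,
               higher_is_more_likely=True):
    """Run basic audit checks on a list of row dicts (e.g. from csv.DictReader)."""
    # Two rows count as "different" unless every column holds the same value.
    distinct = {tuple(sorted(row.items())) for row in rows}
    # The distance/similarity must be present and non-empty on every row.
    has_score = bool(rows) and all(
        row.get(score_col) not in (None, "") for row in rows
    )

    flagged_ok = False
    if has_score:
        # Rank row indices so the most scandal-likely rows come first.
        ranked = sorted(
            range(len(rows)),
            key=lambda i: float(rows[i][score_col]),
            reverse=higher_is_more_likely,
        )
        expected_flagged = set(ranked[:top_n])
        actual_flagged = {
            i for i, row in enumerate(rows)
            if str(row.get(flag_col, "")).strip().lower() in ("true", "1", "yes")
        }
        flagged_ok = actual_flagged == expected_flagged

    return {
        "row_count_ok": len(rows) == expected_rows,
        "rows_distinct": len(distinct) == len(rows),
        "score_saved": has_score,
        "top_n_flagged_ok": flagged_ok,
    }


# Tiny in-memory example; column names and values are made up for illustration.
sample = io.StringIO(
    "Headline,score,flagged\n"
    "a,0.9,True\n"
    "b,0.1,False\n"
    "c,0.7,True\n"
    "d,0.3,False\n"
)
rows = list(csv.DictReader(sample))
result = audit_rows(rows, "score", "flagged", expected_rows=4, top_n=2)
```

For a real audit, the rows would be read from the `csv` file produced by `nlp_enriched_news.py`, with `expected_rows=300` and `top_n=10`. Tied scores at the top 10 boundary can make the flag check fail even when the output is acceptable, so treat a failure there as a prompt for manual inspection.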