From e5fb4783594ff3f8fae5b6315a54dec1c7ececa5 Mon Sep 17 00:00:00 2001 From: lee Date: Fri, 14 Aug 2020 15:15:09 +0100 Subject: [PATCH 1/5] wget --- subjects/wget/README.md | 139 ++++++++++++++++++++++++++++++++++ subjects/wget/audit/README.md | 0 2 files changed, 139 insertions(+) create mode 100644 subjects/wget/README.md create mode 100644 subjects/wget/audit/README.md diff --git a/subjects/wget/README.md b/subjects/wget/README.md new file mode 100644 index 000000000..aa6b49b51 --- /dev/null +++ b/subjects/wget/README.md @@ -0,0 +1,139 @@ +## wget + +### Objectives + +This project objective consists on recreating some functionalities of [`wget`](https://www.gnu.org/software/wget/manual/wget.html) using **Go** + +This functionalities will include: + +- The normal usage of `wget`, downloading a file given an URL, example: `wget https://some_url.ogr/file.zip` +- Downloading a single file and saving it under a different name +- Downloading and saving the file in a specific directory +- Set the download speed, limiting the rate speed of a download +- Continue interrupted downloads +- Downloading a file in background +- Downloading multiple files at the same time, by reading a file containing multiple download links. All this asynchronously +- Main feature, will be to download an entire website, [mirroring a website](https://en.wikipedia.org/wiki/Mirror_site). + +### Introduction + +Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. 
+ +To see more about wget you can visit the manual by using the command `man wget`, or you can visit the website [here](https://www.gnu.org/software/wget/manual/wget.html) + +#### Usage + +Your program must have as arguments the link from were you want to download the file, for instance: + +```console +student@student$ ./wget https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg +``` + +The program should be able to give feedback, displaying the: + +- Time that the program started, this must include the following format **yyyy-mm-dd hh:mm:ss** +- Status of the request. For the program to proceed to the download it must present a response to the request as status OK (`200 OK`) if not it should say which status it got and finish the operation with an error warning +- Size of the content downloaded, The content length can be presented as raw (bytes) and rounded to Mb or Gb depending on the size of the file downloaded +- Name and path of the file that is about to be saved +- A progress bar, having the following: + - A amount of `KiB` that was downloaded + - A percentage of how much was downloaded + - Time that remains to finish the download +- Time the download finished respecting the previous format + +It should look something like this + +```console +student@student$ go run main.go https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg +start at 2017-10-14 03:46:06 +sending request, awaiting response... status 200 OK +content size: 56370 [~0.06MB] +saving file to: ./EMtmPFLWkAA8CIS.jpg + 55.05 KiB / 55.05 KiB [================================================================================================================] 100.00% 1.24 MiB/s 0s + +Downloaded [https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg] +finished at 2017-10-14 03:46:07 +``` + +#### Flags + +Your program should be able to handle different flags. + +1. 
Download a file and save it under a different name by using the flag `-O` followed by the name you wish to save the file, example: + +```console +student@student$ go run main.go -O=meme.jpg https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg +start at 2017-10-14 03:46:06 +sending request, awaiting response... status 200 OK +content size: 56370 [~0.06MB] +saving file to: ./meme.jpg + 55.05 KiB / 55.05 KiB [================================================================================================================] 100.00% 1.24 MiB/s 0s + +Downloaded [https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg] +finished at 2017-10-14 03:46:07 +student@student$ ls -l +-rw-r--r-- 1 student student 56370 ago 13 16:59 meme.jpg +-rw-r--r-- 1 student student 11489 ago 13 10:28 main.go +``` + +--- + +2. It should also handle the path to were your file is going to be saved using the flag `-P` followed by the path to where you want to save the file, example + +```console +student@student$ go run main.go -P=~/Downloads/ -O=meme.jpg https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg +start at 2017-10-14 03:46:06 +sending request, awaiting response... status 200 OK +content size: 56370 [~0.06MB] +saving file to: ~/Downloads/meme.jpg + 55.05 KiB / 55.05 KiB [================================================================================================================] 100.00% 1.24 MiB/s 0s + +Downloaded [https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg] +finished at 2017-10-14 03:46:07 +student@student$ ls -l ~/Downloads/meme.jpg +-rw-r--r-- 1 student student 56370 ago 13 16:59 /home/student/Downloads/meme.jpg +``` + +--- + +3. The program should handle speed limit, basically the program can control the speed of the download by using the flag `--rate-limit`. 
If you download a huge file you can limit the speed of your download, preventing the program from using the full possible bandwidth of your connection, example: + +```console +student@student$ go run main.go --rate-limit=400k https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg +``` + +--- + +4. Downloading different files should be possible, for this the program will receive `-i` flag followed by a file name that will contain all links that are to be downloaded. Example: + +```console +student@student$ ls +download.txt main.go +student@student$ cat download.txt +http://ipv4.download.thinkbroadband.com/20MB.zip +http://ipv4.download.thinkbroadband.com/10MB.zip +student@student$ go run main.go -i=download.txt +content size: [10485760, 20971520] +finished 10MB.zip +finished 20MB.zip + +Download finished: [http://ipv4.download.thinkbroadband.com/20MB.zip http://ipv4.download.thinkbroadband.com/10MB.zip] + +``` + +The Downloads should work asynchronously, it should download both files at the same time. You are free to display what you want for this option. + +--- + +5. [**Mirror a website**](https://en.wikipedia.org/wiki/Mirror_site), this option should download the entire website being possible to use "part" of the website offline and for other useful [reasons](https://www.quora.com/How-exactly-does-Mirror-Site-works-and-how-it-is-done). For this you will have to download the websites file system and save it into a folder that will have the domain name. Example: `http://www.example.com`, the folder name will be `www.example.com` containing every file from the mirrored website. 
+ +--- + +This project will help you learn about: + +- GNU Wget +- HTTP +- [FTP](https://en.wikipedia.org/wiki/File_Transfer_Protocol) +- Algorithms +- Mirror websites +- File system(fs) diff --git a/subjects/wget/audit/README.md b/subjects/wget/audit/README.md new file mode 100644 index 000000000..e69de29bb From 2fdcbb7517918d8bed693933634e032cbe26a5bb Mon Sep 17 00:00:00 2001 From: lee Date: Mon, 17 Aug 2020 18:00:13 +0100 Subject: [PATCH 2/5] audit --- subjects/wget/README.md | 33 +++++++++---- subjects/wget/audit/README.md | 89 +++++++++++++++++++++++++++++++++++ 2 files changed, 112 insertions(+), 10 deletions(-) diff --git a/subjects/wget/README.md b/subjects/wget/README.md index aa6b49b51..be844ffb6 100644 --- a/subjects/wget/README.md +++ b/subjects/wget/README.md @@ -4,7 +4,7 @@ This project objective consists on recreating some functionalities of [`wget`](https://www.gnu.org/software/wget/manual/wget.html) using **Go** -This functionalities will include: +These functionalities will include: - The normal usage of `wget`, downloading a file given an URL, example: `wget https://some_url.ogr/file.zip` - Downloading a single file and saving it under a different name @@ -12,8 +12,8 @@ This functionalities will include: - Set the download speed, limiting the rate speed of a download - Continue interrupted downloads - Downloading a file in background -- Downloading multiple files at the same time, by reading a file containing multiple download links. All this asynchronously -- Main feature, will be to download an entire website, [mirroring a website](https://en.wikipedia.org/wiki/Mirror_site). 
+- Downloading multiple files at the same time, by reading a file containing multiple download links asynchronously +- Main feature, will be to download an entire website, [mirroring a website](https://en.wikipedia.org/wiki/Mirror_site) ### Introduction @@ -23,7 +23,7 @@ To see more about wget you can visit the manual by using the command `man wget`, #### Usage -Your program must have as arguments the link from were you want to download the file, for instance: +Your program must have as arguments the link from where you want to download the file, for instance: ```console student@student$ ./wget https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg @@ -33,13 +33,13 @@ The program should be able to give feedback, displaying the: - Time that the program started, this must include the following format **yyyy-mm-dd hh:mm:ss** - Status of the request. For the program to proceed to the download it must present a response to the request as status OK (`200 OK`) if not it should say which status it got and finish the operation with an error warning -- Size of the content downloaded, The content length can be presented as raw (bytes) and rounded to Mb or Gb depending on the size of the file downloaded +- Size of the content downloaded, the content length can be presented as raw (bytes) and rounded to Mb or Gb depending on the size of the file downloaded - Name and path of the file that is about to be saved - A progress bar, having the following: - - A amount of `KiB` that was downloaded + - A amount of `KiB` or `MiB` (depending on the download size) that was downloaded - A percentage of how much was downloaded - Time that remains to finish the download -- Time the download finished respecting the previous format +- Time that the download finished respecting the previous format It should look something like this @@ -78,7 +78,7 @@ student@student$ ls -l --- -2. 
It should also handle the path to were your file is going to be saved using the flag `-P` followed by the path to where you want to save the file, example +2. It should also handle the path to where your file is going to be saved using the flag `-P` followed by the path to where you want to save the file, example: ```console student@student$ go run main.go -P=~/Downloads/ -O=meme.jpg https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg @@ -102,9 +102,11 @@ student@student$ ls -l ~/Downloads/meme.jpg student@student$ go run main.go --rate-limit=400k https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg ``` +This flag should accept different value types, example: k and M. So you can put the rate limit as `rate-limit=200k` or `rate-limit=2M` + --- -4. Downloading different files should be possible, for this the program will receive `-i` flag followed by a file name that will contain all links that are to be downloaded. Example: +4. Downloading different files should be possible. For this the program will receive `-i` flag followed by a file name that will contain all links that are to be downloaded. Example: ```console student@student$ ls @@ -125,7 +127,18 @@ The Downloads should work asynchronously, it should download both files at the s --- -5. [**Mirror a website**](https://en.wikipedia.org/wiki/Mirror_site), this option should download the entire website being possible to use "part" of the website offline and for other useful [reasons](https://www.quora.com/How-exactly-does-Mirror-Site-works-and-how-it-is-done). For this you will have to download the websites file system and save it into a folder that will have the domain name. Example: `http://www.example.com`, the folder name will be `www.example.com` containing every file from the mirrored website. +5. 
[**Mirror a website**](https://en.wikipedia.org/wiki/Mirror_site), this option should download the entire website being possible to use "part" of the website offline and for other useful [reasons](https://www.quora.com/How-exactly-does-Mirror-Site-works-and-how-it-is-done). For this you will have to download the websites file system and save it into a folder that will have the domain name. Example: `http://www.example.com`, the folder name will be `www.example.com` containing every file from the mirrored website. The flag should be `--mirror`. + +To mirror a website you will have to implement the following `wget` flags so that the web mirror is complete (you do not need to do the literal flags, but just the theory behind it, so your flag `--mirror` need to behave like the following wget flags combined): + +- [`--mirror`](https://www.gnu.org/software/wget/manual/wget.html) download recursive +- [`--convert-links`](https://www.gnu.org/software/wget/manual/wget.html), after the download is complete it will convert all links in the document to make them suitable for local viewing +- [`--page-requisites`](https://www.gnu.org/software/wget/manual/wget.html), downloads all files that are necessary to properly display a given HTML page +- [`--no-parent`](https://www.gnu.org/software/wget/manual/wget.html), this will not let the program ascend to the parent directory when retrieving + +### Hint + +You can take a look into the [html package](https://godoc.org/golang.org/x/net/html) for some help --- diff --git a/subjects/wget/audit/README.md b/subjects/wget/audit/README.md index e69de29bb..d57a60546 100644 --- a/subjects/wget/audit/README.md +++ b/subjects/wget/audit/README.md @@ -0,0 +1,89 @@ +#### Functional + +##### Try to run the following command "`./wget https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg`" + +###### Did the program download the file "`EMtmPFLWkAA8CIS.jpg`"? 
+ +##### Try to run the following command with a link at your choice "`./wget `" + +###### Did the program download the expected file? + +##### Try to run the following command "`./wget https://golang.org/dl/go1.15.linux-amd64.tar.gz`" + +###### Did the program download the file "`go1.15.linux-amd64.tar.gz`"? + +###### Did the program displayed the start time? + +###### Did the start time and the end time respected the format? (yyyy-mm-dd hh:mm:ss) + +###### Did the program displayed the status of the response? (200 OK) + +###### Did the Program displayed the content length of the download? + +###### Is the content length displayed as raw (bytes) and rounded (Mb or Gb)? + +###### Did the program displayed the name and path of the file that was saved? + +##### Try to download a big file, for example: "`./wget http://ipv4.download.thinkbroadband.com/100MB.zip`" + +###### Did the program download the expected file? + +###### While downloading, did the progress bar show the amount that is being downloaded? (KiB or MiB) + +###### While downloading, did the progress bar show the percentage that is being downloaded? + +###### While downloading, did the progress bar show the time that remains to finish the download? + +###### While downloading, did the progress bar progressed smoothly (kept up with the time that the download took to finish)? + +##### Try to run the following command, "`./wget -O=test_20MB.zip http://ipv4.download.thinkbroadband.com/20MB.zip`" + +###### Did the program downloaded the file with the name "`test_20MB.zip`"? + +##### Try to run the following command, "`./wget -O=test_20MB.zip -P=~/Downloads/ http://ipv4.download.thinkbroadband.com/20MB.zip`", then go to the folder "`~/Downloads/`" + +###### Can you see the file downloaded? + +##### Try to run the following command, "`./wget --rate-limit=300k http://ipv4.download.thinkbroadband.com/20MB.zip`" + +###### Was the download speed always lower than 300KB/s? 
+ +##### Try to run the following command, "`./wget --rate-limit=700k http://ipv4.download.thinkbroadband.com/20MB.zip`" + +###### Was the download speed always lower than 700KB/s? + +##### Try to run the following command, "`./wget --rate-limit=2M http://ipv4.download.thinkbroadband.com/20MB.zip`" + +###### Was the download speed always lower than 2MB/s? + +##### Try to create a text file with the name "`downloads.txt`" and save into it the links below. Then run the command "`./wget -i=downloads.txt`" + +``` +https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg +http://ipv4.download.thinkbroadband.com/20MB.zip +http://ipv4.download.thinkbroadband.com/10MB.zip +``` + +###### Did the program download all the files from the downloads.txt file? (EMtmPFLWkAA8CIS.jpg, 20MB.zip, 10MB.zip) + +###### Did the downloads occurred in an asynchronous way? (tip: look to the download order) + +#### Mirror + +##### Try to run the following command "`./wget --mirror http://corndog.io/`", then try to open the "`index.html`" with a browser + +###### Is the site working? + +##### Try to run the following command "`./wget --mirror https://theuselessweb.com/`" + +###### Is the site working? + +##### Try to run the following command to mirror a website at your choice "`./wget --mirror `" + +###### Did the program mirror the website? + +#### Bonus + +###### +Does the project runs quickly and effectively? (Favoring recursive, no unnecessary data requests, etc) + +###### +Does the code obey the [good practices](https://public.01-edu.org/subjects/good-practices/README.md)? 
From b97e14c9daa8de78e39445c07955f3bab5b5e24b Mon Sep 17 00:00:00 2001 From: OGordoo Date: Tue, 18 Aug 2020 15:01:18 +0100 Subject: [PATCH 3/5] typos correction --- subjects/wget/README.md | 32 ++++++++++++++++---------------- subjects/wget/audit/README.md | 32 ++++++++++++++++---------------- 2 files changed, 32 insertions(+), 32 deletions(-) diff --git a/subjects/wget/README.md b/subjects/wget/README.md index be844ffb6..e5c550eff 100644 --- a/subjects/wget/README.md +++ b/subjects/wget/README.md @@ -2,24 +2,24 @@ ### Objectives -This project objective consists on recreating some functionalities of [`wget`](https://www.gnu.org/software/wget/manual/wget.html) using **Go** +This project objective consists on recreating some functionalities of [`wget`](https://www.gnu.org/software/wget/manual/wget.html) using **Go**. -These functionalities will include: +These functionalities will consist in: -- The normal usage of `wget`, downloading a file given an URL, example: `wget https://some_url.ogr/file.zip` +- The normal usage of `wget`: downloading a file given an URL, example: `wget https://some_url.ogr/file.zip` - Downloading a single file and saving it under a different name - Downloading and saving the file in a specific directory - Set the download speed, limiting the rate speed of a download -- Continue interrupted downloads +- Continuing interrupted downloads - Downloading a file in background - Downloading multiple files at the same time, by reading a file containing multiple download links asynchronously -- Main feature, will be to download an entire website, [mirroring a website](https://en.wikipedia.org/wiki/Mirror_site) +- Main feature will be to download an entire website, [mirroring a website](https://en.wikipedia.org/wiki/Mirror_site) ### Introduction Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. 
-To see more about wget you can visit the manual by using the command `man wget`, or you can visit the website [here](https://www.gnu.org/software/wget/manual/wget.html) +To see more about wget you can visit the manual by using the command `man wget`, or you can visit the website [here](https://www.gnu.org/software/wget/manual/wget.html). #### Usage @@ -31,9 +31,9 @@ student@student$ ./wget https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg The program should be able to give feedback, displaying the: -- Time that the program started, this must include the following format **yyyy-mm-dd hh:mm:ss** -- Status of the request. For the program to proceed to the download it must present a response to the request as status OK (`200 OK`) if not it should say which status it got and finish the operation with an error warning -- Size of the content downloaded, the content length can be presented as raw (bytes) and rounded to Mb or Gb depending on the size of the file downloaded +- Time that the program started: it must have the following format **yyyy-mm-dd hh:mm:ss** +- Status of the request. For the program to proceed to the download, it must present a response to the request as status OK (`200 OK`) if not, it should say which status it got and finish the operation with an error warning +- Size of the content downloaded: the content length can be presented as raw (bytes) and rounded to Mb or Gb depending on the size of the file downloaded - Name and path of the file that is about to be saved - A progress bar, having the following: - A amount of `KiB` or `MiB` (depending on the download size) that was downloaded @@ -96,7 +96,7 @@ student@student$ ls -l ~/Downloads/meme.jpg --- -3. The program should handle speed limit, basically the program can control the speed of the download by using the flag `--rate-limit`. If you download a huge file you can limit the speed of your download, preventing the program from using the full possible bandwidth of your connection, example: +3. 
The program should handle speed limit. Basically the program can control the speed of the download by using the flag `--rate-limit`. If you download a huge file you can limit the speed of your download, preventing the program from using the full possible bandwidth of your connection, example: ```console student@student$ go run main.go --rate-limit=400k https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg @@ -106,7 +106,7 @@ This flag should accept different value types, example: k and M. So you can put --- -4. Downloading different files should be possible. For this the program will receive `-i` flag followed by a file name that will contain all links that are to be downloaded. Example: +4. Downloading different files should be possible. For this the program will receive the `-i` flag followed by a file name that will contain all links that are to be downloaded. Example: ```console student@student$ ls @@ -127,14 +127,14 @@ The Downloads should work asynchronously, it should download both files at the s --- -5. [**Mirror a website**](https://en.wikipedia.org/wiki/Mirror_site), this option should download the entire website being possible to use "part" of the website offline and for other useful [reasons](https://www.quora.com/How-exactly-does-Mirror-Site-works-and-how-it-is-done). For this you will have to download the websites file system and save it into a folder that will have the domain name. Example: `http://www.example.com`, the folder name will be `www.example.com` containing every file from the mirrored website. The flag should be `--mirror`. +5. [**Mirror a website**](https://en.wikipedia.org/wiki/Mirror_site). This option should download the entire website being possible to use "part" of the website offline and for other useful [reasons](https://www.quora.com/How-exactly-does-Mirror-Site-works-and-how-it-is-done). For this you will have to download the website file system and save it into a folder that will have the domain name. 
Example: `http://www.example.com`, will be stored in a folder with the name `www.example.com` containing every file from the mirrored website. The flag should be `--mirror`. To mirror a website you will have to implement the following `wget` flags so that the web mirror is complete (you do not need to do the literal flags, but just the theory behind it, so your flag `--mirror` need to behave like the following wget flags combined): -- [`--mirror`](https://www.gnu.org/software/wget/manual/wget.html) download recursive -- [`--convert-links`](https://www.gnu.org/software/wget/manual/wget.html), after the download is complete it will convert all links in the document to make them suitable for local viewing -- [`--page-requisites`](https://www.gnu.org/software/wget/manual/wget.html), downloads all files that are necessary to properly display a given HTML page -- [`--no-parent`](https://www.gnu.org/software/wget/manual/wget.html), this will not let the program ascend to the parent directory when retrieving +- [`--mirror`](https://www.gnu.org/software/wget/manual/wget.html): download recursive +- [`--convert-links`](https://www.gnu.org/software/wget/manual/wget.html): after the download is complete it will convert all links in the document to make them suitable for local viewing +- [`--page-requisites`](https://www.gnu.org/software/wget/manual/wget.html): downloads all files that are necessary to properly display a given HTML page +- [`--no-parent`](https://www.gnu.org/software/wget/manual/wget.html): this will not let the program ascend to the parent directory when retrieving ### Hint diff --git a/subjects/wget/audit/README.md b/subjects/wget/audit/README.md index d57a60546..dc61f182f 100644 --- a/subjects/wget/audit/README.md +++ b/subjects/wget/audit/README.md @@ -1,16 +1,16 @@ #### Functional -##### Try to run the following command "`./wget https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg`" +##### Try to run the following command `"./wget 
https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg"` -###### Did the program download the file "`EMtmPFLWkAA8CIS.jpg`"? +###### Did the program download the file `"EMtmPFLWkAA8CIS.jpg"`? -##### Try to run the following command with a link at your choice "`./wget `" +##### Try to run the following command with a link at your choice `"./wget "` ###### Did the program download the expected file? -##### Try to run the following command "`./wget https://golang.org/dl/go1.15.linux-amd64.tar.gz`" +##### Try to run the following command `"./wget https://golang.org/dl/go1.15.linux-amd64.tar.gz"` -###### Did the program download the file "`go1.15.linux-amd64.tar.gz`"? +###### Did the program download the file `"go1.15.linux-amd64.tar.gz"`? ###### Did the program displayed the start time? @@ -24,7 +24,7 @@ ###### Did the program displayed the name and path of the file that was saved? -##### Try to download a big file, for example: "`./wget http://ipv4.download.thinkbroadband.com/100MB.zip`" +##### Try to download a big file, for example: `"./wget http://ipv4.download.thinkbroadband.com/100MB.zip"` ###### Did the program download the expected file? @@ -36,27 +36,27 @@ ###### While downloading, did the progress bar progressed smoothly (kept up with the time that the download took to finish)? -##### Try to run the following command, "`./wget -O=test_20MB.zip http://ipv4.download.thinkbroadband.com/20MB.zip`" +##### Try to run the following command, `"./wget -O=test_20MB.zip http://ipv4.download.thinkbroadband.com/20MB.zip"` -###### Did the program downloaded the file with the name "`test_20MB.zip`"? +###### Did the program downloaded the file with the name `"test_20MB.zip"`? 
-##### Try to run the following command, "`./wget -O=test_20MB.zip -P=~/Downloads/ http://ipv4.download.thinkbroadband.com/20MB.zip`", then go to the folder "`~/Downloads/`" +##### Try to run the following command, `"./wget -O=test_20MB.zip -P=~/Downloads/ http://ipv4.download.thinkbroadband.com/20MB.zip"`, then go to the folder `"~/Downloads/"` ###### Can you see the file downloaded? -##### Try to run the following command, "`./wget --rate-limit=300k http://ipv4.download.thinkbroadband.com/20MB.zip`" +##### Try to run the following command, `"./wget --rate-limit=300k http://ipv4.download.thinkbroadband.com/20MB.zip"` ###### Was the download speed always lower than 300KB/s? -##### Try to run the following command, "`./wget --rate-limit=700k http://ipv4.download.thinkbroadband.com/20MB.zip`" +##### Try to run the following command, `"./wget --rate-limit=700k http://ipv4.download.thinkbroadband.com/20MB.zip"` ###### Was the download speed always lower than 700KB/s? -##### Try to run the following command, "`./wget --rate-limit=2M http://ipv4.download.thinkbroadband.com/20MB.zip`" +##### Try to run the following command, `"./wget --rate-limit=2M http://ipv4.download.thinkbroadband.com/20MB.zip"` ###### Was the download speed always lower than 2MB/s? -##### Try to create a text file with the name "`downloads.txt`" and save into it the links below. Then run the command "`./wget -i=downloads.txt`" +##### Try to create a text file with the name `"downloads.txt"` and save into it the links below. Then run the command `"./wget -i=downloads.txt"` ``` https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg @@ -70,15 +70,15 @@ http://ipv4.download.thinkbroadband.com/10MB.zip #### Mirror -##### Try to run the following command "`./wget --mirror http://corndog.io/`", then try to open the "`index.html`" with a browser +##### Try to run the following command `"./wget --mirror http://corndog.io/"`, then try to open the `"index.html"` with a browser ###### Is the site working? 
-##### Try to run the following command "`./wget --mirror https://theuselessweb.com/`" +##### Try to run the following command `"./wget --mirror https://theuselessweb.com/"` ###### Is the site working? -##### Try to run the following command to mirror a website at your choice "`./wget --mirror `" +##### Try to run the following command to mirror a website at your choice `"./wget --mirror "` ###### Did the program mirror the website? From 47dc525935c6fb7c2e6a68b0d99f832961f1f536 Mon Sep 17 00:00:00 2001 From: lee Date: Tue, 18 Aug 2020 15:41:10 +0100 Subject: [PATCH 4/5] audit corrections --- subjects/wget/audit/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/subjects/wget/audit/README.md b/subjects/wget/audit/README.md index dc61f182f..991ce78d6 100644 --- a/subjects/wget/audit/README.md +++ b/subjects/wget/audit/README.md @@ -40,9 +40,9 @@ ###### Did the program downloaded the file with the name `"test_20MB.zip"`? -##### Try to run the following command, `"./wget -O=test_20MB.zip -P=~/Downloads/ http://ipv4.download.thinkbroadband.com/20MB.zip"`, then go to the folder `"~/Downloads/"` +##### Try to run the following command, `"./wget -O=test_20MB.zip -P=~/Downloads/ http://ipv4.download.thinkbroadband.com/20MB.zip"` -###### Can you see the file downloaded? +###### Can you see the expected file in the "~/Downloads/" folder? 
##### Try to run the following command, `"./wget --rate-limit=300k http://ipv4.download.thinkbroadband.com/20MB.zip"` From c6a21e81eb9eb0288aa897f6dfa1b213786c6f16 Mon Sep 17 00:00:00 2001 From: lee Date: Thu, 20 Aug 2020 11:44:37 +0100 Subject: [PATCH 5/5] adding flags to mirror --- subjects/wget/README.md | 39 +++++++++++++++++++++++++++-------- subjects/wget/audit/README.md | 24 +++++++++++++++++++++ 2 files changed, 54 insertions(+), 9 deletions(-) diff --git a/subjects/wget/README.md b/subjects/wget/README.md index e5c550eff..19a94dec4 100644 --- a/subjects/wget/README.md +++ b/subjects/wget/README.md @@ -129,24 +129,45 @@ The Downloads should work asynchronously, it should download both files at the s 5. [**Mirror a website**](https://en.wikipedia.org/wiki/Mirror_site). This option should download the entire website being possible to use "part" of the website offline and for other useful [reasons](https://www.quora.com/How-exactly-does-Mirror-Site-works-and-how-it-is-done). For this you will have to download the website file system and save it into a folder that will have the domain name. Example: `http://www.example.com`, will be stored in a folder with the name `www.example.com` containing every file from the mirrored website. The flag should be `--mirror`. -To mirror a website you will have to implement the following `wget` flags so that the web mirror is complete (you do not need to do the literal flags, but just the theory behind it, so your flag `--mirror` need to behave like the following wget flags combined): +The default usage of the flag will be to retrieve and parse the HTML or CSS from the given URL. This way retrieving the files that the document refers through tags. The tags that will be used for this retrieval must be `a`, `link` and `img` that contains attributes `href` and `src`. 
+ +- [`--mirror`](https://www.gnu.org/software/wget/manual/wget.html): download recursive +- [`--convert-links`](https://www.gnu.org/software/wget/manual/wget.html): after the download is complete it will convert all links in the document to make them suitable for local viewing +- [`--page-requisites`](https://www.gnu.org/software/wget/manual/wget.html): downloads all files that are necessary to properly display a given HTML page +- [`--no-parent`](https://www.gnu.org/software/wget/manual/wget.html): this will not let the program ascend to the parent directory when retrieving +You will have to implement some optional flags to go along with the `--mirror` flag. + +Those flags will work based on [Follow links](https://www.gnu.org/software/wget/manual/wget.html#Following-Links). The command `wget` has several mechanisms that allow you to fine-tune which links it will follow. For this project you will have to implement the behavior of the following (note that these flags will be used in conjunction with the `--mirror` flag): + +- [Types of Files](https://www.gnu.org/software/wget/manual/wget.html#Types-of-Files) (`--reject`, shorthand `-R`) + +> This flag takes a comma-separated list of file suffixes that the program will avoid downloading during the retrieval. + +example: + +```console +student@student$ ./wget --mirror -R=jpg,gif https://example.com +``` + +- [Directory-Based Limits](https://www.gnu.org/software/wget/manual/wget.html#Directory_002dBased-Limits) (`--exclude`, shorthand `-X`) + +> This flag takes a comma-separated list of paths that the program will avoid following and retrieving. So if the URL is `https://example.com` and the directories are `/js`, `/css` and `/assets`, you can exclude any of them, for instance with `-X=/js,/assets`. The mirrored file system will then contain only `/css`. 
+ +example: + +```console +student@student$ ./wget --mirror -X=/assets,/css https://example.com +``` ### Hint -You can take a look into the [html package](https://godoc.org/golang.org/x/net/html) for some help +You can take a look into the [html package](https://godoc.org/golang.org/x/net/html) for some help.\ +Try the real flags from the wget command to better understand their usage. --- This project will help you learn about: -- GNU Wget +- [GNU Wget](https://www.gnu.org/software/wget/manual/wget.html) - HTTP - [FTP](https://en.wikipedia.org/wiki/File_Transfer_Protocol) -- Algorithms +- Algorithms (recursion) - Mirror websites -- File system(fs) + - Follow links +- File system (fs) diff --git a/subjects/wget/audit/README.md b/subjects/wget/audit/README.md index 991ce78d6..148cbb21e 100644 --- a/subjects/wget/audit/README.md +++ b/subjects/wget/audit/README.md @@ -74,6 +74,30 @@ http://ipv4.download.thinkbroadband.com/10MB.zip ###### Is the site working? +##### Try to run the following command `"./wget --mirror https://oct82.com/"`, then try to open the `"index.html"` with a browser + +###### Is the site working? + +##### Try to run the following command `"./wget --mirror --reject=gif https://oct82.com/"`, then try to open the `"index.html"` with a browser + +###### Did the program download the site without the GIFs? + +##### Try to run the following command `"./wget --mirror https://trypap.com/"`, then use the command `"ls"` to see the file system of the created folder. + +``` +css img index.html +``` + +###### Does the created folder have the same file system as above? + +##### Try to run the following command `"./wget --mirror -X=/img https://trypap.com/"`, then use the command `"ls"` to see the file system of the created folder. + +``` +css index.html +``` + +###### Does the created folder have the files above? + ##### Try to run the following command `"./wget --mirror https://theuselessweb.com/"` ###### Is the site working?
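
The `--reject`/`-R` and `--exclude`/`-X` flags introduced in this patch both come down to deciding, for each link found while mirroring, whether to follow it. The following is a minimal Go sketch of that decision, not a reference implementation. The helper name `shouldSkip` and its signature are hypothetical, and two behaviors are assumptions made here rather than requirements of the subject: suffixes are matched case-insensitively as `.<suffix>` at the end of the URL path, and a directory matches only at a path-segment boundary.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// shouldSkip reports whether a link discovered while mirroring should be ignored.
// Hypothetical helper: the name and signature are assumptions for illustration.
// rejectSuffixes holds the values given to -R/--reject (e.g. "jpg", "gif").
// excludeDirs holds the values given to -X/--exclude (e.g. "/assets", "/css").
func shouldSkip(rawURL string, rejectSuffixes, excludeDirs []string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return true // this sketch simply skips links it cannot parse
	}

	// -R / --reject: skip files whose path ends in ".<suffix>" (case-insensitive, by assumption).
	lowerPath := strings.ToLower(u.Path)
	for _, suffix := range rejectSuffixes {
		suffix = strings.ToLower(strings.TrimPrefix(suffix, "."))
		if suffix != "" && strings.HasSuffix(lowerPath, "."+suffix) {
			return true
		}
	}

	// -X / --exclude: skip anything at or below one of the listed directories.
	for _, dir := range excludeDirs {
		dir = strings.TrimSuffix(dir, "/")
		if dir == "" {
			continue
		}
		if u.Path == dir || strings.HasPrefix(u.Path, dir+"/") {
			return true
		}
	}
	return false
}

func main() {
	// Flag values as they would arrive from "-R=jpg,gif" and "-X=/assets,/css".
	reject := strings.Split("jpg,gif", ",")
	exclude := strings.Split("/assets,/css", ",")

	links := []string{
		"https://example.com/index.html",
		"https://example.com/img/cat.GIF",
		"https://example.com/assets/app.js",
		"https://example.com/cssfiles/a.html",
	}
	for _, link := range links {
		fmt.Printf("%-40s skip=%v\n", link, shouldSkip(link, reject, exclude))
	}
}
```

Matching on `dir+"/"` rather than on the bare prefix keeps `-X=/css` from also excluding an unrelated directory such as `/cssfiles`. Real `wget` supports richer patterns for these options, so comparing against the actual command, as the hint suggests, is still worthwhile.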