You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

153 lines
7.3 KiB

4 years ago
## wget
### Objectives
4 years ago
This project objective consists on recreating some functionalities of [`wget`](https://www.gnu.org/software/wget/manual/wget.html) using **Go**.
4 years ago
4 years ago
These functionalities will consist in:
4 years ago
4 years ago
- The normal usage of `wget`: downloading a file given an URL, example: `wget https://some_url.ogr/file.zip`
4 years ago
- Downloading a single file and saving it under a different name
- Downloading and saving the file in a specific directory
- Set the download speed, limiting the rate speed of a download
4 years ago
- Continuing interrupted downloads
4 years ago
- Downloading a file in background
4 years ago
- Downloading multiple files at the same time, by reading a file containing multiple download links asynchronously
4 years ago
- Main feature will be to download an entire website, [mirroring a website](https://en.wikipedia.org/wiki/Mirror_site)
4 years ago
### Introduction
Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
4 years ago
To see more about wget you can visit the manual by using the command `man wget`, or you can visit the website [here](https://www.gnu.org/software/wget/manual/wget.html).
4 years ago
#### Usage
4 years ago
Your program must have as arguments the link from where you want to download the file, for instance:
4 years ago
```console
student@student$ ./wget https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg
```
The program should be able to give feedback, displaying the:
4 years ago
- Time that the program started: it must have the following format **yyyy-mm-dd hh:mm:ss**
- Status of the request. For the program to proceed to the download, it must present a response to the request as status OK (`200 OK`) if not, it should say which status it got and finish the operation with an error warning
- Size of the content downloaded: the content length can be presented as raw (bytes) and rounded to Mb or Gb depending on the size of the file downloaded
4 years ago
- Name and path of the file that is about to be saved
- A progress bar, having the following:
4 years ago
- A amount of `KiB` or `MiB` (depending on the download size) that was downloaded
4 years ago
- A percentage of how much was downloaded
- Time that remains to finish the download
4 years ago
- Time that the download finished respecting the previous format
4 years ago
It should look something like this
```console
student@student$ go run main.go https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg
start at 2017-10-14 03:46:06
sending request, awaiting response... status 200 OK
content size: 56370 [~0.06MB]
saving file to: ./EMtmPFLWkAA8CIS.jpg
55.05 KiB / 55.05 KiB [================================================================================================================] 100.00% 1.24 MiB/s 0s
Downloaded [https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg]
finished at 2017-10-14 03:46:07
```
#### Flags
Your program should be able to handle different flags.
1. Download a file and save it under a different name by using the flag `-O` followed by the name you wish to save the file, example:
```console
student@student$ go run main.go -O=meme.jpg https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg
start at 2017-10-14 03:46:06
sending request, awaiting response... status 200 OK
content size: 56370 [~0.06MB]
saving file to: ./meme.jpg
55.05 KiB / 55.05 KiB [================================================================================================================] 100.00% 1.24 MiB/s 0s
Downloaded [https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg]
finished at 2017-10-14 03:46:07
student@student$ ls -l
-rw-r--r-- 1 student student 56370 ago 13 16:59 meme.jpg
-rw-r--r-- 1 student student 11489 ago 13 10:28 main.go
```
---
4 years ago
2. It should also handle the path to where your file is going to be saved using the flag `-P` followed by the path to where you want to save the file, example:
4 years ago
```console
student@student$ go run main.go -P=~/Downloads/ -O=meme.jpg https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg
start at 2017-10-14 03:46:06
sending request, awaiting response... status 200 OK
content size: 56370 [~0.06MB]
saving file to: ~/Downloads/meme.jpg
55.05 KiB / 55.05 KiB [================================================================================================================] 100.00% 1.24 MiB/s 0s
Downloaded [https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg]
finished at 2017-10-14 03:46:07
student@student$ ls -l ~/Downloads/meme.jpg
-rw-r--r-- 1 student student 56370 ago 13 16:59 /home/student/Downloads/meme.jpg
```
---
4 years ago
3. The program should handle speed limit. Basically the program can control the speed of the download by using the flag `--rate-limit`. If you download a huge file you can limit the speed of your download, preventing the program from using the full possible bandwidth of your connection, example:
4 years ago
```console
student@student$ go run main.go --rate-limit=400k https://pbs.twimg.com/media/EMtmPFLWkAA8CIS.jpg
```
4 years ago
This flag should accept different value types, example: k and M. So you can put the rate limit as `rate-limit=200k` or `rate-limit=2M`
4 years ago
---
4 years ago
4. Downloading different files should be possible. For this the program will receive the `-i` flag followed by a file name that will contain all links that are to be downloaded. Example:
4 years ago
```console
student@student$ ls
download.txt main.go
student@student$ cat download.txt
http://ipv4.download.thinkbroadband.com/20MB.zip
http://ipv4.download.thinkbroadband.com/10MB.zip
student@student$ go run main -i=download.txt
content size: [10485760, 20971520]
finished 10MB.zip
finished 20MB.zip
Download finished: [http://ipv4.download.thinkbroadband.com/20MB.zip http://ipv4.download.thinkbroadband.com/10MB.zip]
```
The Downloads should work asynchronously, it should download both files at the same time. You are free to display what you want for this option.
---
4 years ago
5. [**Mirror a website**](https://en.wikipedia.org/wiki/Mirror_site). This option should download the entire website being possible to use "part" of the website offline and for other useful [reasons](https://www.quora.com/How-exactly-does-Mirror-Site-works-and-how-it-is-done). For this you will have to download the website file system and save it into a folder that will have the domain name. Example: `http://www.example.com`, will be stored in a folder with the name `www.example.com` containing every file from the mirrored website. The flag should be `--mirror`.
4 years ago
To mirror a website you will have to implement the following `wget` flags so that the web mirror is complete (you do not need to do the literal flags, but just the theory behind it, so your flag `--mirror` need to behave like the following wget flags combined):
4 years ago
- [`--mirror`](https://www.gnu.org/software/wget/manual/wget.html): download recursive
- [`--convert-links`](https://www.gnu.org/software/wget/manual/wget.html): after the download is complete it will convert all links in the document to make them suitable for local viewing
- [`--page-requisites`](https://www.gnu.org/software/wget/manual/wget.html): downloads all files that are necessary to properly display a given HTML page
- [`--no-parent`](https://www.gnu.org/software/wget/manual/wget.html): this will not let the program ascend to the parent directory when retrieving
4 years ago
### Hint
You can take a look into the [html package](https://godoc.org/golang.org/x/net/html) for some help
4 years ago
---
This project will help you learn about:
- GNU Wget
- HTTP
- [FTP](https://en.wikipedia.org/wiki/File_Transfer_Protocol)
- Algorithms
- Mirror websites
- File system(fs)