From 07b35e898d56e6c2ebe6ebf444fc3bfab8876db5 Mon Sep 17 00:00:00 2001
From: miguel
Date: Thu, 26 Jan 2023 14:49:21 +0000
Subject: [PATCH] docs(string_tokenizer) typos

---
 subjects/devops/string_tokenizer/README.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/subjects/devops/string_tokenizer/README.md b/subjects/devops/string_tokenizer/README.md
index 49995f58..2ebbf094 100644
--- a/subjects/devops/string_tokenizer/README.md
+++ b/subjects/devops/string_tokenizer/README.md
@@ -6,8 +6,8 @@ Tokenization is the process of breaking down a string into smaller pieces, calle
 
 Create a file `string_processing.py` which will have a function `tokenize(sentence)` that given a sentence will do the following:
 
-- removes all punctuation marks and special characters
-- separates all words like so: `"it's not 3" => ['it', 's', 'not', '3']`
+- remove all punctuation marks and special characters
+- separate all words like so: `"it's not 3" => ['it', 's', 'not', '3']`
 - put all the words in lowercase
 - return a list of all the words.
 
@@ -30,7 +30,7 @@ $ python test.py
 
 ### Hints
 
-The `re` library is a module for working with regular expressions it provides a set of functions for working with regular expressions, including:
+The `re` library is a module for working with regular expressions. It provides a set of functions for working with regular expressions, including:
 
 - `re.sub()` : Replaces all occurrences of a regular expression pattern in a string with a replacement string.
 
@@ -67,25 +67,25 @@ this is a test sentence.
 ```
 
 The `.split()` method is used to split the sentence into a list of words.
 
+```python
 text = "This is a test sentence."
 words = text.split()
 print(words)
-
-````
+```
 
 and the output:
 
 ```console
 ['This', 'is', 'a', 'test', 'sentence.']
-````
+```
 
 ### References
 
 - [string methods](https://www.w3schools.com/python/python_ref_string.asp)
 - [replace](https://www.w3schools.com/python/ref_string_replace.asp)
 - [split](https://www.w3schools.com/python/ref_string_split.asp)
-- import "string" module and [get all string punctuations](https://docs.python.org/3/library/string.html#string.punctuation)
+- [String punctuations](https://docs.python.org/3/library/string.html#string.punctuation)
 - [Tokenization in text analysis](https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization)
 - [Word segmentation](https://en.wikipedia.org/wiki/Text_segmentation#Word_segmentation)
 
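
Note for reviewers (not part of the patch): the README this patch edits asks for a `tokenize(sentence)` function built with `re.sub()`. A minimal sketch of one possible solution, assuming a character-class pattern that the README itself does not specify:

```python
import re

def tokenize(sentence):
    # Replace every character that is not a letter, digit, or whitespace
    # with a space, so "it's" splits into "it" and "s".
    cleaned = re.sub(r"[^a-zA-Z0-9\s]", " ", sentence)
    # Lowercase, then split on whitespace to get the list of words.
    return cleaned.lower().split()

print(tokenize("it's not 3"))  # ['it', 's', 'not', '3']
```

This matches the example in the patched README (`"it's not 3" => ['it', 's', 'not', '3']`); other patterns, e.g. one built from `string.punctuation`, would also satisfy the exercise.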