mirror of https://github.com/01-edu/public.git
string_tokenizer_count
Instructions
Create a file string_tokenizer_count.py that contains a function tokenizer_counter which takes in a string as a parameter and returns a dictionary of words and their count in the string.
- The function should remove any punctuation from the string and convert it to lowercase before counting the words.
- The function should return a dictionary of words and their count, sorted alphabetically by word.
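The two requirements above can be illustrated separately. This is a minimal sketch, not the expected solution: the helper name `normalize` is hypothetical, and it assumes "punctuation" means the ASCII punctuation characters. It also relies on plain dictionaries preserving insertion order (Python 3.7+), so a dictionary built from sorted pairs stays sorted.

```python
from string import punctuation

def normalize(text):
    # Hypothetical helper: delete ASCII punctuation, then lowercase
    return text.translate(str.maketrans("", "", punctuation)).lower()

print(normalize("Hello, World!"))  # hello world

# A dict built from sorted (word, count) pairs keeps that order
counts = {"world": 1, "hello": 2}
print(dict(sorted(counts.items())))  # {'hello': 2, 'world': 1}
```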
Usage
Here is an example of how to use the function:
string = "This is a test sentence, with various words and 123 numbers!"
result = tokenizer_counter(string)
print(result)
And its output:
{'123': 1, 'a': 1, 'and': 1, 'is': 1, 'numbers': 1, 'sentence': 1, 'test': 1, 'this': 1, 'various': 1, 'with': 1, 'words': 1}
Hints
- The re module can be used to remove non-alphanumeric characters.
- The collections module can be used to count the words.
- The operator module can be used to sort the dictionary alphabetically by word.
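One way the three hinted modules might fit together is sketched below. It is only one possible approach, under the assumptions that words are separated by whitespace and that only ASCII letters and digits should be kept; the regular expression is an illustrative choice, not something the exercise prescribes.

```python
import re
from collections import Counter
from operator import itemgetter

def tokenizer_counter(text):
    # Lowercase, then drop everything except a-z, 0-9, and whitespace
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    # Count the whitespace-separated words
    counts = Counter(cleaned.split())
    # Sort the (word, count) pairs by word; dict preserves that order
    return dict(sorted(counts.items(), key=itemgetter(0)))

print(tokenizer_counter("The cat and the hat."))
# {'and': 1, 'cat': 1, 'hat': 1, 'the': 2}
```

Returning a plain `dict` rather than the `Counter` itself keeps the printed form a regular dictionary literal.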