mirror of https://github.com/01-edu/public.git

string_tokenizer_count

Instructions

Create a file string_tokenizer_count.py that contains a function tokenizer_counter which takes in a string as a parameter and returns a dictionary of words and their count in the string.

The function should remove any punctuation from the string and convert it to lowercase before counting the words.
The function should return a dictionary of words and their count, sorted alphabetically by word.

Usage

Here is an example of how to use the function in a test.py script:

from string_tokenizer_count import tokenizer_counter

string = "This is a test sentence, with various words and 123 numbers!"
result = tokenizer_counter(string)
print(result)

And its output:

$ python3 test.py
{'123': 1, 'a': 1, 'and': 1, 'is': 1, 'numbers': 1, 'sentence': 1, 'test': 1, 'this': 1, 'various': 1, 'with': 1, 'words': 1}
$

Hints

The re module can be used to remove non-alphanumeric characters.
The Counter class of the collections module can be used to count the words.
The operator module can be used to sort the dictionary alphabetically by word.
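
As a rough illustration of how the three hints could fit together, here is a minimal sketch, not a reference solution. It assumes that "punctuation" means any character other than an ASCII letter, a digit, or whitespace, and that words are separated by whitespace; the variable names are illustrative only.

```python
import re
from collections import Counter
from operator import itemgetter


def tokenizer_counter(text):
    # Lowercase first, then drop every character that is not an ASCII
    # letter, a digit, or whitespace (an assumption about "punctuation").
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    # split() with no argument splits on any run of whitespace.
    counts = Counter(cleaned.split())
    # Sort the (word, count) pairs by word; dicts keep insertion order.
    return dict(sorted(counts.items(), key=itemgetter(0)))
```

Because punctuation is deleted rather than replaced by a space, a word such as "don't" becomes "dont" under this sketch.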

Here is an example of how to sort a dictionary in Python, using a test.py script:

dictionary = {'a': 5, 'c': 1, 'b': 3}
# item[1] sorts by value; use item[0] to sort by key instead
sorted_dict = dict(sorted(dictionary.items(), key=lambda item: item[1]))
print(sorted_dict)
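
The example above orders the entries by value, while the exercise asks for alphabetical order by word, which is the key. A short sketch of both orderings with operator.itemgetter, as suggested in the hints, might look like this (the names by_key and by_value are illustrative only):

```python
from operator import itemgetter

dictionary = {'a': 5, 'c': 1, 'b': 3}

# itemgetter(0) picks the key of each (key, value) pair.
by_key = dict(sorted(dictionary.items(), key=itemgetter(0)))
# itemgetter(1) picks the value of each (key, value) pair.
by_value = dict(sorted(dictionary.items(), key=itemgetter(1)))

print(by_key)    # {'a': 5, 'b': 3, 'c': 1}
print(by_value)  # {'c': 1, 'b': 3, 'a': 5}
```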