lab_2_spellcheck package
Submodules
Lab 2.
- lab_2_spellcheck.main.add_letter(word: str, alphabet: list[str]) → list[str]
Generate all possible words by inserting a letter from the alphabet at every possible position in the word.
- Parameters:
word (str)
alphabet (list[str])
- Returns:
A list of words with one additional letter inserted.
- Return type:
list[str]
In case of corrupt input arguments, an empty list is returned.
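As a rough illustration of the insertion step, here is a minimal sketch. The name `add_letter_sketch` and the input checks are assumptions made for this example; the package's own validation rules and output ordering may differ.

```python
def add_letter_sketch(word: str, alphabet: list) -> list:
    """Insert each alphabet letter at each position of the word (illustrative only)."""
    # Assumption: "corrupt" input means a wrong type or a non-string letter.
    if not isinstance(word, str) or not isinstance(alphabet, list):
        return []
    if not all(isinstance(letter, str) for letter in alphabet):
        return []
    # A word of length n has n + 1 insertion points, including both ends.
    return [
        word[:position] + letter + word[position:]
        for position in range(len(word) + 1)
        for letter in alphabet
    ]


print(add_letter_sketch("ab", ["x"]))  # ['xab', 'axb', 'abx']
```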
- lab_2_spellcheck.main.build_vocabulary(tokens: list[str]) → dict[str, float] | None
Build a vocabulary from the tokens.
- Parameters:
tokens (list[str])
- Returns:
Dictionary mapping each word to its relative frequency.
- Return type:
dict[str, float] | None
In case of corrupt input arguments, None is returned.
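Relative frequency here can be read as the count of a word divided by the total number of tokens. The sketch below shows that reading; `build_vocabulary_sketch` is a hypothetical name, and treating an empty token list as corrupt input is an assumption.

```python
from collections import Counter
from typing import Optional


def build_vocabulary_sketch(tokens: list) -> Optional[dict]:
    """Map each token to count / total (illustrative only)."""
    if not isinstance(tokens, list) or not tokens:
        return None  # assumption: an empty list counts as corrupt input
    if not all(isinstance(token, str) for token in tokens):
        return None
    total = len(tokens)
    return {word: count / total for word, count in Counter(tokens).items()}


print(build_vocabulary_sketch(["a", "b", "a", "c"]))  # {'a': 0.5, 'b': 0.25, 'c': 0.25}
```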
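- lab_2_spellcheck.main.calculate_distance(first_token: str, vocabulary: dict[str, float], method: Literal['jaccard', 'frequency-based', 'levenshtein', 'jaro-winkler'], alphabet: list[str] | None = None) → dict[str, float] | None
Calculate the distance between the token and each vocabulary word using the specified method.
- Parameters:
first_token (str)
vocabulary (dict[str, float])
method (Literal['jaccard', 'frequency-based', 'levenshtein', 'jaro-winkler'])
alphabet (list[str] | None)
- Returns:
Dictionary mapping vocabulary words to their calculated distance scores.
- Return type:
dict[str, float] | None
In case of corrupt input arguments or an unsupported method, None is returned.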
- lab_2_spellcheck.main.calculate_frequency_distance(word: str, frequencies: dict, alphabet: list[str]) → dict[str, float] | None
Suggest the most probable correct spelling for the word.
- Parameters:
word (str)
frequencies (dict)
alphabet (list[str])
- Returns:
Dictionary mapping words to their frequency-based distance scores.
- Return type:
dict[str, float] | None
In case of corrupt input arguments, None is returned.
- lab_2_spellcheck.main.calculate_jaccard_distance(token: str, candidate: str) → float | None
Calculate Jaccard distance between two strings.
- Parameters:
token (str)
candidate (str)
- Returns:
Jaccard distance score in range [0, 1].
- Return type:
float | None
In case of corrupt input arguments, None is returned. If both strings are empty, 1.0 is returned.
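A common way to compute a Jaccard distance between strings is one minus the ratio of shared to total distinct letters. The sketch below assumes that letter-set interpretation; `jaccard_distance_sketch` is a hypothetical name, and the package may tokenize the strings differently.

```python
from typing import Optional


def jaccard_distance_sketch(token: str, candidate: str) -> Optional[float]:
    """1 - |A ∩ B| / |A ∪ B| over the sets of letters (illustrative only)."""
    if not isinstance(token, str) or not isinstance(candidate, str):
        return None
    token_letters, candidate_letters = set(token), set(candidate)
    union = token_letters | candidate_letters
    if not union:
        return 1.0  # both strings empty, as documented above
    return 1.0 - len(token_letters & candidate_letters) / len(union)


print(jaccard_distance_sketch("cat", "cut"))  # 0.5
```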
- lab_2_spellcheck.main.calculate_jaro_distance(token: str, candidate: str, matches: int, transpositions: int) → float | None
Calculate the Jaro distance between two strings.
- Parameters:
token (str)
candidate (str)
matches (int)
transpositions (int)
- Returns:
Jaro distance score.
- Return type:
float | None
In case of corrupt input arguments, None is returned.
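For orientation, the textbook Jaro similarity averages three ratios built from the match and transposition counts. The sketch below assumes that the distance is one minus that similarity and that `transpositions` is already the halved count; both are assumptions, and `jaro_distance_sketch` is a hypothetical name.

```python
from typing import Optional


def jaro_distance_sketch(
    token: str, candidate: str, matches: int, transpositions: int
) -> Optional[float]:
    """1 - Jaro similarity, given precomputed counts (illustrative only)."""
    if not isinstance(token, str) or not isinstance(candidate, str):
        return None
    if not isinstance(matches, int) or not isinstance(transpositions, int):
        return None
    if matches < 0 or transpositions < 0:
        return None
    # Assumption: no matches (or an empty string) means maximal distance.
    if matches == 0 or not token or not candidate:
        return 1.0
    similarity = (
        matches / len(token)
        + matches / len(candidate)
        + (matches - transpositions) / matches
    ) / 3
    return 1.0 - similarity


# "MARTHA" vs "MARHTA": 6 matches, 1 transposition, similarity 17/18.
print(jaro_distance_sketch("MARTHA", "MARHTA", 6, 1))
```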
- lab_2_spellcheck.main.calculate_jaro_winkler_distance(token: str, candidate: str, prefix_scaling: float = 0.1) → float | None
Calculate the Jaro-Winkler distance between two strings.
- Parameters:
token (str)
candidate (str)
prefix_scaling (float)
- Returns:
Jaro-Winkler distance score.
- Return type:
float | None
In case of corrupt input arguments or corrupt outputs of the functions it calls, None is returned.
- lab_2_spellcheck.main.calculate_levenshtein_distance(token: str, candidate: str) → int | None
Calculate the Levenshtein edit distance between two strings.
- Parameters:
token (str)
candidate (str)
- Returns:
Minimum number of single-character edits (insertions, deletions, substitutions) required to transform token into candidate.
- Return type:
int | None
In case of corrupt input arguments, None is returned.
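The edit distance described above is usually computed with a dynamic-programming matrix over all prefix pairs. A minimal self-contained sketch follows; `levenshtein_sketch` is a hypothetical name, and the package splits this work across separate matrix helpers.

```python
from typing import Optional


def levenshtein_sketch(token: str, candidate: str) -> Optional[int]:
    """Classic dynamic-programming edit distance (illustrative only)."""
    if not isinstance(token, str) or not isinstance(candidate, str):
        return None
    rows, cols = len(token) + 1, len(candidate) + 1
    matrix = [[0] * cols for _ in range(rows)]
    # Base cases: turning a prefix into the empty string takes len(prefix) edits.
    for i in range(rows):
        matrix[i][0] = i
    for j in range(cols):
        matrix[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if token[i - 1] == candidate[j - 1] else 1
            matrix[i][j] = min(
                matrix[i - 1][j] + 1,         # deletion
                matrix[i][j - 1] + 1,         # insertion
                matrix[i - 1][j - 1] + cost,  # substitution or match
            )
    # The bottom-right cell holds the distance between the full strings.
    return matrix[-1][-1]


print(levenshtein_sketch("kitten", "sitting"))  # 3
```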
- lab_2_spellcheck.main.count_transpositions(token: str, candidate: str, token_matches: list[bool], candidate_matches: list[bool]) → int | None
Count the number of transpositions between two strings based on matching letters.
- Parameters:
token (str)
candidate (str)
token_matches (list[bool])
candidate_matches (list[bool])
- Returns:
Number of transpositions.
- Return type:
int | None
In case of corrupt input arguments, None is returned.
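In the usual Jaro formulation, matched letters are read off both strings in order, and the number of positions where they disagree is halved. The sketch below assumes that convention, which the package may or may not follow; `count_transpositions_sketch` is a hypothetical name.

```python
from typing import Optional


def count_transpositions_sketch(
    token: str, candidate: str, token_matches: list, candidate_matches: list
) -> Optional[int]:
    """Half the number of out-of-order matched letters (illustrative only)."""
    if not isinstance(token, str) or not isinstance(candidate, str):
        return None
    if not isinstance(token_matches, list) or not isinstance(candidate_matches, list):
        return None
    if len(token_matches) != len(token) or len(candidate_matches) != len(candidate):
        return None
    # Keep only the letters flagged as matched, preserving their order.
    matched_token = [c for c, flag in zip(token, token_matches) if flag]
    matched_candidate = [c for c, flag in zip(candidate, candidate_matches) if flag]
    out_of_order = sum(a != b for a, b in zip(matched_token, matched_candidate))
    return out_of_order // 2  # assumption: the halved count is what is returned


print(count_transpositions_sketch("MARTHA", "MARHTA", [True] * 6, [True] * 6))  # 1
```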
- lab_2_spellcheck.main.delete_letter(word: str) → list[str]
Generate all possible words by deleting one letter from the word.
- Parameters:
word (str) – The input incorrect word.
- Returns:
A sorted list of words with one letter removed at each position.
- Return type:
list[str]
In case of corrupt input arguments, an empty list is returned.
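A short sketch of the deletion step, assuming duplicates are kept and only non-string input counts as corrupt; `delete_letter_sketch` is a hypothetical name.

```python
def delete_letter_sketch(word: str) -> list:
    """Drop the letter at each position in turn, then sort (illustrative only)."""
    if not isinstance(word, str):
        return []
    return sorted(word[:i] + word[i + 1:] for i in range(len(word)))


print(delete_letter_sketch("cat"))  # ['at', 'ca', 'ct']
```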
- lab_2_spellcheck.main.fill_levenshtein_matrix(token: str, candidate: str) → list[list[int]] | None
Fill a Levenshtein matrix with edit distances between all prefixes.
- Parameters:
token (str)
candidate (str)
- Returns:
Completed Levenshtein distance matrix.
- Return type:
list[list[int]] | None
In case of corrupt input arguments, None is returned.
- lab_2_spellcheck.main.find_correct_word(wrong_word: str, vocabulary: dict[str, float], method: Literal['jaccard', 'frequency-based', 'levenshtein', 'jaro-winkler'], alphabet: list[str] | None = None) → str | None
Find the most similar word from vocabulary using the specified method.
- Parameters:
wrong_word (str)
vocabulary (dict[str, float])
method (Literal['jaccard', 'frequency-based', 'levenshtein', 'jaro-winkler'])
alphabet (list[str] | None)
- Returns:
Word from vocabulary with the lowest distance score. In case of ties, the word closest in length to the wrong word is chosen, and among those the lexicographically first.
- Return type:
str | None
In case of an empty vocabulary, None is returned.
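The tie-breaking rule can be expressed as a single sort key. The sketch below takes an already computed word-to-distance mapping as input; `pick_best_sketch` and its `distances` argument are hypothetical and stand in for whatever the package computes internally.

```python
from typing import Optional


def pick_best_sketch(wrong_word: str, distances: dict) -> Optional[str]:
    """Lowest distance, then closest length, then alphabetical (illustrative only)."""
    if not distances:
        return None
    return min(
        distances,
        key=lambda word: (
            distances[word],                   # primary: lowest distance
            abs(len(word) - len(wrong_word)),  # tie-break 1: closest length
            word,                              # tie-break 2: lexicographic order
        ),
    )


scores = {"cat": 0.2, "cart": 0.2, "act": 0.2, "dog": 0.9}
print(pick_best_sketch("cta", scores))  # act
```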
- lab_2_spellcheck.main.find_out_of_vocab_words(tokens: list[str], vocabulary: dict[str, float]) → list[str] | None
Find words that are out of vocabulary.
- Parameters:
tokens (list[str])
vocabulary (dict[str, float])
- Returns:
List of incorrect words.
- Return type:
list[str] | None
In case of corrupt input arguments, None is returned.
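In its simplest form this is a membership filter over the tokens. The sketch below keeps token order and repeated tokens, which is an assumption; `find_out_of_vocab_sketch` is a hypothetical name.

```python
from typing import Optional


def find_out_of_vocab_sketch(tokens: list, vocabulary: dict) -> Optional[list]:
    """Keep the tokens that are not keys of the vocabulary (illustrative only)."""
    if not isinstance(tokens, list) or not isinstance(vocabulary, dict):
        return None
    if not all(isinstance(token, str) for token in tokens):
        return None
    return [token for token in tokens if token not in vocabulary]


print(find_out_of_vocab_sketch(["the", "cta", "sat"], {"the": 0.5, "sat": 0.5}))  # ['cta']
```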
- lab_2_spellcheck.main.generate_candidates(word: str, alphabet: list[str]) → list[str] | None
Generate all possible candidate words for a given word using four basic operations.
- Parameters:
word (str)
alphabet (list[str])
- Returns:
A combined list of candidate words generated by all operations.
- Return type:
list[str] | None
In case of corrupt input arguments, None is returned.
- lab_2_spellcheck.main.get_matches(token: str, candidate: str, match_distance: int) → tuple[int, list[bool], list[bool]] | None
Find matching letters between two strings within a distance.
- Parameters:
token (str)
candidate (str)
match_distance (int)
- Returns:
A tuple of the number of matching letters, a boolean list indicating matches in token, and a boolean list indicating matches in candidate.
- Return type:
tuple[int, list[bool], list[bool]] | None
In case of corrupt input arguments, None is returned.
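The windowed matching used in Jaro-style metrics can be sketched as follows: each letter of the token may pair with an equal, not yet used letter of the candidate at most `match_distance` positions away. The greedy left-to-right pairing and the name `get_matches_sketch` are assumptions of this example.

```python
from typing import Optional


def get_matches_sketch(token: str, candidate: str, match_distance: int) -> Optional[tuple]:
    """Greedy windowed letter matching (illustrative only)."""
    if not isinstance(token, str) or not isinstance(candidate, str):
        return None
    if not isinstance(match_distance, int) or match_distance < 0:
        return None
    token_matches = [False] * len(token)
    candidate_matches = [False] * len(candidate)
    matches = 0
    for i, letter in enumerate(token):
        start = max(0, i - match_distance)
        end = min(len(candidate), i + match_distance + 1)
        for j in range(start, end):
            # Each candidate letter can be matched at most once.
            if not candidate_matches[j] and candidate[j] == letter:
                token_matches[i] = True
                candidate_matches[j] = True
                matches += 1
                break
    return matches, token_matches, candidate_matches


count, in_token, in_candidate = get_matches_sketch("MARTHA", "MARHTA", 2)
print(count)  # 6
```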
- lab_2_spellcheck.main.initialize_levenshtein_matrix(token_length: int, candidate_length: int) → list[list[int]] | None
Initialize a 2D matrix for Levenshtein distance calculation.
- Parameters:
token_length (int)
candidate_length (int)
- Returns:
Initialized matrix with base cases filled.
- Return type:
list[list[int]] | None
In case of corrupt input arguments, None is returned.
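The base cases are the first row and first column, which hold the cost of building a prefix from the empty string. The sketch below assumes a matrix of size (token_length + 1) × (candidate_length + 1) with zeros elsewhere; `initialize_matrix_sketch` is a hypothetical name.

```python
from typing import Optional


def initialize_matrix_sketch(token_length: int, candidate_length: int) -> Optional[list]:
    """Zero matrix with the first row and column set to 0..n (illustrative only)."""
    if not isinstance(token_length, int) or not isinstance(candidate_length, int):
        return None
    if token_length < 0 or candidate_length < 0:
        return None
    matrix = [[0] * (candidate_length + 1) for _ in range(token_length + 1)]
    for i in range(token_length + 1):
        matrix[i][0] = i  # i deletions turn a length-i prefix into ""
    for j in range(candidate_length + 1):
        matrix[0][j] = j  # j insertions turn "" into a length-j prefix
    return matrix


for row in initialize_matrix_sketch(2, 3):
    print(row)
# [0, 1, 2, 3]
# [1, 0, 0, 0]
# [2, 0, 0, 0]
```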
- lab_2_spellcheck.main.propose_candidates(word: str, alphabet: list[str]) → tuple[str, ...] | None
Generate candidate words by applying single-edit operations (delete, add, replace, swap) to the word.
- Parameters:
word (str)
alphabet (list[str])
- Returns:
A tuple of unique candidate words generated from the input.
- Return type:
tuple[str, ...] | None
In case of corrupt input arguments, None is returned.
- lab_2_spellcheck.main.replace_letter(word: str, alphabet: list[str]) → list[str]
Generate all possible words by replacing each letter in the word with letters from the alphabet.
- Parameters:
word (str)
alphabet (list[str])
- Returns:
A sorted list of words with one letter replaced at each position.
- Return type:
list[str]
In case of corrupt input arguments, an empty list is returned.
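A sketch of the replacement step follows. Skipping a replacement that would reproduce the original letter is an assumption of this example, as is the name `replace_letter_sketch`; the package may keep such no-op results.

```python
def replace_letter_sketch(word: str, alphabet: list) -> list:
    """Swap each position for each alphabet letter, then sort (illustrative only)."""
    if not isinstance(word, str) or not isinstance(alphabet, list):
        return []
    if not all(isinstance(letter, str) for letter in alphabet):
        return []
    return sorted(
        word[:i] + letter + word[i + 1:]
        for i in range(len(word))
        for letter in alphabet
        if letter != word[i]  # assumption: no-op replacements are skipped
    )


print(replace_letter_sketch("ab", ["a", "x"]))  # ['aa', 'ax', 'xb']
```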
- lab_2_spellcheck.main.swap_adjacent(word: str) → list[str]
Generate all possible words by swapping each pair of adjacent letters in the word.
- Parameters:
word (str) – The input incorrect word.
- Returns:
A sorted list of words where two neighboring letters are swapped.
- Return type:
list[str]
In case of corrupt input arguments, an empty list is returned.
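A short sketch of the swap step, assuming only non-string input counts as corrupt; `swap_adjacent_sketch` is a hypothetical name.

```python
def swap_adjacent_sketch(word: str) -> list:
    """Exchange letters i and i + 1 for every valid i, then sort (illustrative only)."""
    if not isinstance(word, str):
        return []
    return sorted(
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
    )


print(swap_adjacent_sketch("cat"))  # ['act', 'cta']
```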
- lab_2_spellcheck.main.winkler_adjustment(token: str, candidate: str, jaro_distance: float, prefix_scaling: float = 0.1) → float | None
Apply the Winkler adjustment to boost distance for strings with a common prefix.
- Parameters:
token (str)
candidate (str)
jaro_distance (float)
prefix_scaling (float)
- Returns:
Winkler adjustment score.
- Return type:
float | None
In case of corrupt input arguments, None is returned.
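In the textbook Jaro-Winkler formulation, the similarity grows by prefix_length × prefix_scaling × (1 − similarity), with the common prefix capped at four letters. Expressed on a distance equal to one minus the similarity, the same term becomes prefix_length × prefix_scaling × distance. The sketch below computes that term under those assumptions; how the package combines it with the Jaro distance is not stated here, and `winkler_adjustment_sketch` is a hypothetical name.

```python
from typing import Optional


def winkler_adjustment_sketch(
    token: str, candidate: str, jaro_distance: float, prefix_scaling: float = 0.1
) -> Optional[float]:
    """prefix_length * prefix_scaling * jaro_distance (illustrative only)."""
    if not isinstance(token, str) or not isinstance(candidate, str):
        return None
    if not isinstance(jaro_distance, float) or not isinstance(prefix_scaling, float):
        return None
    prefix_length = 0
    # Assumption: the common prefix is capped at four letters, as in the textbook form.
    for token_letter, candidate_letter in zip(token[:4], candidate[:4]):
        if token_letter != candidate_letter:
            break
        prefix_length += 1
    return prefix_length * prefix_scaling * jaro_distance


# "MARTHA" and "MARHTA" share the prefix "MAR" (length 3).
print(winkler_adjustment_sketch("MARTHA", "MARHTA", 1 / 18))
```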