Extracts the text content of an HTML page, processes it, and collects the unique words from the processed text. The notebook applies cleaning, normalization, tokenization, lemmatization or stemming, and stop-word removal.
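A minimal sketch of the pipeline described above, not the notebook's actual code. The topics suggest the notebook relies on requests and BeautifulSoup; to stay runnable without installs or network access, this version swaps in the standard library's `html.parser` for text extraction, a regex tokenizer, a tiny hard-coded stop-word set, and a crude suffix-stripping stemmer. The names `extract_text`, `normalize`, `tokenize`, `naive_stem`, `unique_words`, and `STOP_WORDS` are all hypothetical.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop-word list; a real one is much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_text(html):
    """Return the text of an HTML string, with text nodes joined by spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def normalize(text):
    """Apply Unicode NFKC normalization, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split normalized text into runs of ASCII letters (drops digits and punctuation)."""
    return re.findall(r"[a-z]+", text)


def naive_stem(token):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def unique_words(html):
    """Run the full pipeline and return the sorted unique stems."""
    tokens = tokenize(normalize(extract_text(html)))
    return sorted({naive_stem(t) for t in tokens if t not in STOP_WORDS})


if __name__ == "__main__":
    sample = "<h1>Cleaning Text</h1><p>The cleaned text is tokenized.</p>"
    print(unique_words(sample))
```

With requests and BeautifulSoup installed, `extract_text` could instead fetch the page and call `BeautifulSoup(html, "html.parser").get_text()`, and the stemmer and stop-word set could come from a library such as NLTK; the remaining steps would stay the same.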
tokenizer
text-extraction
requests
data-extraction
beautifulsoup
text-processing
tokenization
stemming
lemmatization
stopwords-removal
text-cleaning
text-normalization
extract-html
text-tokenization
text-lemmatization
Updated Apr 5, 2024 - Jupyter Notebook