How this lab is wired

Each panel sends your text to a Flask endpoint in app.py. That endpoint runs the same scikit-learn, NumPy, and gensim calls as the reference notebook (CountVectorizer, TfidfVectorizer, OneHotEncoder, a hand-rolled Bag-of-Words counter, an N-gram generator, and a gensim.models Word2Vec/FastText trainer) and streams the intermediate results back as JSON. The page then reveals those results stage by stage along the pipeline tape at the top of each panel, so you can watch tokenization happen before the vocabulary appears, and the vocabulary settle before the vectors fill in.
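The stage-by-stage reveal described above can be pictured as one JSON message per pipeline stage. The sketch below is only an illustration of that idea, not the code in app.py: the function name `stream_stages`, the `stage` and `data` field names, and the tokenizer are all hypothetical, and the Flask routing is left out so the snippet runs on the standard library alone.

```python
import json
import re


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def stream_stages(text):
    """Yield one JSON line per stage, in the order the page would reveal them.

    The "stage" and "data" keys are assumptions made for this sketch.
    """
    sentences = [line.strip() for line in text.splitlines() if line.strip()]
    yield json.dumps({"stage": "raw", "data": sentences})

    tokens = [tokenize(s) for s in sentences]
    yield json.dumps({"stage": "tokens", "data": tokens})

    vocabulary = sorted({tok for doc in tokens for tok in doc})
    yield json.dumps({"stage": "vocabulary", "data": vocabulary})


if __name__ == "__main__":
    for line in stream_stages("The cat sat\nThe dog sat"):
        print(line)
```

Because each stage is emitted as soon as it is computed, a front end could animate the tape without waiting for the last stage to finish.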

Raw sentences → Tokenize → Build vocabulary → One-hot vectors → sklearn check
Input corpus (one sentence per line)
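The first four stages of this panel can be sketched by hand in a few lines. This is a minimal illustration that assumes whitespace tokenization and an alphabetically sorted vocabulary; the function name `one_hot_encode` is hypothetical, and the final "sklearn check" stage (a comparison against OneHotEncoder) is not reproduced here.

```python
def one_hot_encode(sentences):
    """Return (vocabulary, vectors), where vectors[s][t] is the one-hot row
    for token t of sentence s."""
    # Tokenize: lowercase and split on whitespace (an assumption of this sketch).
    tokenized = [s.lower().split() for s in sentences]

    # Build vocabulary: every distinct token, sorted so column order is stable.
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    column = {word: i for i, word in enumerate(vocabulary)}

    # One-hot vectors: a row of zeros with a single 1 in the token's column.
    vectors = []
    for doc in tokenized:
        rows = []
        for tok in doc:
            row = [0] * len(vocabulary)
            row[column[tok]] = 1
            rows.append(row)
        vectors.append(rows)
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, vecs = one_hot_encode(["the cat sat", "the dog"])
    print(vocab)
    for row in vecs[0]:
        print(row)
```

Every row has exactly one 1, so the vectors carry identity but no notion of similarity between words.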
Raw corpus → Tokenize docs → Fit vocabulary → Count matrix → Transform new doc
Input corpus (one document per line)
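The fit and transform split in this panel can be mimicked without scikit-learn. The sketch below is a hand-rolled stand-in, not CountVectorizer itself: the helper names are hypothetical, and the token pattern copies CountVectorizer's default of lowercased tokens with at least two word characters, which is why a one-letter word such as "I" never reaches the vocabulary.

```python
import re

# Same shape as CountVectorizer's default token_pattern: 2+ word characters.
TOKEN = re.compile(r"\b\w\w+\b")


def tokenize(doc):
    return TOKEN.findall(doc.lower())


def fit_vocabulary(corpus):
    """Map each distinct term in the corpus to a column index (sorted order)."""
    terms = sorted({tok for doc in corpus for tok in tokenize(doc)})
    return {term: col for col, term in enumerate(terms)}


def transform(docs, vocabulary):
    """Count known terms per document; terms unseen at fit time are ignored."""
    matrix = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for tok in tokenize(doc):
            col = vocabulary.get(tok)
            if col is not None:
                row[col] += 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    corpus = ["I like cats", "cats like milk"]
    vocab = fit_vocabulary(corpus)
    print(vocab)
    print(transform(corpus, vocab))
    print(transform(["dogs like milk milk"], vocab))  # "dogs" has no column
```

The last call shows what the "Transform new doc" stage is about: the vocabulary is frozen after fitting, so a new document can only be described with columns that already exist.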
Raw corpus → Tokenize → Vocabulary → BoW matrix → Binary BoW → Cosine similarity
Input corpus (one document per line)
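The Bag-of-Words stages above fit in a short standard-library sketch. It assumes whitespace tokenization and a sorted vocabulary; the function names are hypothetical and are not taken from the lab's own counter.

```python
import math
from collections import Counter


def bow_matrix(docs):
    """Return (vocabulary, matrix) with one row of term counts per document."""
    tokenized = [d.lower().split() for d in docs]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    counts = [Counter(doc) for doc in tokenized]
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


def binarize(matrix):
    """Binary BoW: record presence (1) or absence (0) instead of counts."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


def cosine(u, v):
    """Cosine similarity of two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    vocab, matrix = bow_matrix(["the cat sat on the mat", "the dog sat"])
    print(vocab)
    print(matrix)
    print(binarize(matrix))
    print(round(cosine(matrix[0], matrix[1]), 3))
```

Cosine similarity compares direction rather than length, so a long document and a short one can still score as similar when they use the same words in similar proportions.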
Sentence → Unigrams → Bigrams → Trigrams → N-gram matrices
Input
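Unigrams, bigrams, and trigrams are the same sliding window with a different width. A minimal sketch, assuming whitespace tokenization and a hypothetical helper named `ngrams`:

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple, left to right."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    tokens = "the quick brown fox".split()
    for n in (1, 2, 3):
        print(n, ngrams(tokens, n))

    # An n-gram count row is a Counter over those tuples.
    print(Counter(ngrams("to be or not to be".split(), 2)))
```

A sentence of k tokens yields k - n + 1 n-grams, and none at all once n exceeds k.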
Raw corpus → Term frequency (TF) → Inverse doc. freq. (IDF) → TF × IDF → sklearn matrix → Top words / doc
TF-IDF(t, d)  =  [ count(t, d) / total words in d ]  ×  log( N / (1 + df(t)) )
Input corpus (one document per line)
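The formula above can be evaluated term by term. The sketch below follows it literally, with two assumptions that the panel does not state: the logarithm is taken as the natural log, and documents are tokenized on whitespace. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document, using
    (count(t, d) / total words in d) * log(N / (1 + df(t)))."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # df(t): number of documents that contain t at least once.
    df = Counter(tok for doc in tokenized for tok in set(doc))

    scores = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / (1 + df[term]))
            for term, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the bird flew"]
    for row in tf_idf(docs):
        print({term: round(score, 4) for term, score in row.items()})
```

With the 1 + df(t) denominator, a term that appears in every document gets a slightly negative score, and a term missing from exactly one document scores zero. TfidfVectorizer's defaults use a different smoothed IDF and L2-normalize each row, so the hand-computed numbers should not be expected to match the sklearn matrix digit for digit.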
Training sentences → Train Word2Vec → Similarity → Most similar → PCA plot → FastText OOV
Training sentences (one per line; trains live, takes a couple of seconds)
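The FastText OOV stage rests on one idea: FastText represents a word through its character n-grams, so a word never seen in training still shares pieces with words that were. The sketch below only enumerates those pieces; it does not train or query a model. The helper name `char_ngrams` is hypothetical, and the 3 to 6 character range and the `<` and `>` boundary markers are assumed here to match FastText's usual defaults.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """List the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


if __name__ == "__main__":
    seen = set(char_ngrams("cat", 3, 3))
    unseen = set(char_ngrams("cats", 3, 3))
    print(sorted(seen))
    print(sorted(unseen))
    print(sorted(seen & unseen))  # pieces an unseen word can borrow
```

Word2Vec has one vector per whole word, so it has nothing to offer for an unseen word, while a subword model can assemble a vector from the overlapping n-grams.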