ML System Design Mock Interview - Build an ML System That Classifies Which Tweets Are Toxic

  Views: 9,979

Exponent

Days ago

Comments: 10
@diegofabiano8489 7 months ago
I honestly like the machine learning system design interviews much better. The one with the Meta engineer, where he actually applied the steps, was awesome!
@DrAhdol 7 months ago
Something I'd like to see more of in these ML videos is acknowledgement of approaches that don't rely on neural networks. For something like this, you could use multinomial naive Bayes with bag-of-words or TF-IDF features and get good performance with very fast inference, as a baseline to compare against the more complex NN models.
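A minimal sketch of the kind of baseline described above: multinomial naive Bayes over bag-of-words counts, written with the standard library only. The training sentences, labels, and function names (`train_nb`, `predict`) are made up for illustration and are not from the video. Raw counts stand in for TF-IDF scores here, and a real baseline would more likely use an existing library implementation.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up examples purely for illustration (1 = toxic, 0 = not toxic).
TRAIN = [
    ("you are an idiot", 1),
    ("i hate you so much", 1),
    ("shut up you loser", 1),
    ("have a great day", 0),
    ("thanks for the helpful video", 0),
    ("i love this tutorial", 0),
]


def tokenize(text):
    # Whitespace tokenization is an assumption; real pipelines do more cleanup.
    return text.lower().split()


def train_nb(examples, alpha=1.0):
    """Collect per-class word counts and log priors (alpha = Laplace smoothing)."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    total_docs = sum(class_docs.values())
    return {
        "alpha": alpha,
        "vocab": vocab,
        "word_counts": word_counts,
        "log_prior": {c: math.log(n / total_docs) for c, n in class_docs.items()},
        "totals": {c: sum(word_counts[c].values()) for c in class_docs},
    }


def predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    alpha, vocab = model["alpha"], model["vocab"]
    scores = {}
    for label, log_prior in model["log_prior"].items():
        denom = model["totals"][label] + alpha * len(vocab)
        score = log_prior
        for tok in tokenize(text):
            if tok in vocab:  # out-of-vocabulary tokens are ignored
                score += math.log((model["word_counts"][label][tok] + alpha) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train_nb(TRAIN)
print(predict(model, "you loser"))              # toxic-leaning words
print(predict(model, "great tutorial thanks"))  # benign-leaning words
```

Inference is a handful of dictionary lookups and additions per token, which is why this kind of model makes a cheap reference point before reaching for a neural network.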
@kaanbicakci 5 months ago
Calling the shuffle() method on a tf.data.Dataset instance before splitting the dataset can cause data leakage. The dataset is reshuffled on every iteration, so each time one of the take()/skip() splits is iterated, the order of the elements gathered from "dataset" is different, which may introduce overlapping samples. Here's a small example (the output will be different every time, but you should see the overlap after running it multiple times):

import tensorflow as tf
import numpy as np

num_rows = 10

# Ten unique IDs, shuffled (reshuffled on each iteration by default), then batched.
dataset = tf.data.Dataset.from_tensor_slices(np.arange(1, num_rows + 1))
dataset = dataset.cache()
dataset = dataset.shuffle(num_rows)
dataset = dataset.batch(2)
dataset = dataset.prefetch(1)

# Each split iterates "dataset" separately, so each sees a different order.
train = dataset.take(2)
val = dataset.skip(2).take(1)
test = dataset.skip(3).take(1)

def extract_ids(ds):
    ids = []
    for batch in ds:
        ids.extend(batch.numpy())
    return np.array(ids)

train_ids = extract_ids(train)
val_ids = extract_ids(val)
test_ids = extract_ids(test)

train_val_overlap = np.intersect1d(train_ids, val_ids)
train_test_overlap = np.intersect1d(train_ids, test_ids)
val_test_overlap = np.intersect1d(val_ids, test_ids)

print("Train IDs:", train_ids)
print("Val IDs:", val_ids)
print("Test IDs:", test_ids)
print("Train-Val Overlap:", train_val_overlap)
print("Train-Test Overlap:", train_test_overlap)
print("Val-Test Overlap:", val_test_overlap)
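One way to avoid the overlap described above is to fix the order once and only then slice it into splits. The sketch below shows that idea with the standard library rather than tf.data; `split_once` and its parameters are hypothetical names chosen for illustration. In tf.data itself, a comparable effect should be obtainable by passing `reshuffle_each_iteration=False` to `shuffle()`, so that every iteration sees the same order.

```python
import random


def split_once(ids, n_train, n_val, seed=0):
    """Shuffle exactly once, then cut the fixed order into disjoint slices."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # single shuffle with a fixed seed
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # everything left over
    return train, val, test


train_ids, val_ids, test_ids = split_once(range(1, 11), n_train=4, n_val=2)

print("Train IDs:", train_ids)
print("Val IDs:", val_ids)
print("Test IDs:", test_ids)
print("Train-Val Overlap:", set(train_ids) & set(val_ids))
print("Train-Test Overlap:", set(train_ids) & set(test_ids))
print("Val-Test Overlap:", set(val_ids) & set(test_ids))
```

Because all three slices come from the same shuffled list, the overlaps are empty by construction, and the training portion can still be reshuffled per epoch afterwards without touching validation or test data.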
@mandanafasounaki2192 6 months ago
Great work, solid coding skills. The thing I would want to add is that when we use the BERT tokenizer, all the information that needs to be extracted from the text for classification is already embedded in the vectors. A simple perceptron could work well on top of the embeddings. But your approach is great for demonstrating the development lifecycle of an ML project.
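To make the "simple perceptron on top of the embeddings" idea concrete, here is a minimal sketch of a perceptron trained on fixed feature vectors. The two-dimensional vectors and labels are made-up stand-ins for sentence embeddings that a pretrained encoder would produce; nothing here computes real BERT embeddings, and `train_perceptron` and `classify` are hypothetical helper names.

```python
# Made-up 2-D "embeddings" (1 = toxic, 0 = not toxic), for illustration only.
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
y = [1, 1, 0, 0]


def classify(w, b, x):
    """Linear threshold unit: 1 if w.x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron rule: update weights only on misclassified examples."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = target - classify(w, b, x)
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b


w, b = train_perceptron(X, y)
print([classify(w, b, x) for x in X])
```

Only the small linear layer is trained here; the vectors are treated as frozen inputs, which is what makes this setup cheap compared with training a sequence model end to end.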
@jackjill67 7 months ago
First useful video... otherwise most people just talk through
@alexb2997 5 months ago
Just to speak up for recurrent networks: it's a little unfair to LSTMs to suggest they might struggle with long-term dependencies in tweets. Transformers do have an architecture better suited to long-range retrieval, but LSTMs are a variant of RNNs designed specifically to handle long-term dependencies. For tweet-length documents, you'd be fine. I'm not saying don't use a transformer, just don't write off recurrent models so quickly.
@TooManyPBJs 5 months ago
Isn't it a bit duplicative to add LSTM with BERT tokens since BERT is already sequence aware?
@alexb2997 5 months ago
The tokens are just simple vocab indices; there's no sequence encoding involved at that stage. The sequence magic happens within the transformer, which wasn't used here.
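The point about tokens being plain vocabulary indices can be illustrated with a toy tokenizer. The vocabulary and the `encode` helper below are invented for this sketch; a real BERT tokenizer uses a much larger WordPiece subword vocabulary, but the output is likewise a list of integer IDs.

```python
# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = {"[UNK]": 0, "you": 1, "are": 2, "not": 3, "nice": 4, "toxic": 5}


def encode(text):
    """Map each whitespace token to its vocabulary index (unknowns -> [UNK])."""
    return [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in text.lower().split()]


a = encode("you are not toxic")
b = encode("toxic you are not")

print(a)
print(b)
# Each word gets the same ID wherever it appears: the IDs carry no context.
print(sorted(a) == sorted(b))
```

Reordering the sentence only reorders the same IDs, so any notion of word order or context has to come from whatever model consumes the sequence, whether that is a transformer or an LSTM.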
@ABHILASHTRIPATHI-p6d 7 months ago
How did you scrape this data from Twitter? The Twitter API has lots of restrictions. Can you please explain that?