Fine tuning Embeddings Model

Views: 1,003

1 day ago

Fine tuning with the new Sentence Transformers v3.0.
Join Skool Community for $129:
www.skool.com/data-society-42...
Have questions or ideas, or want to meet like-minded people?
Join the Discord: / discord
Don't fall behind the AI revolution. I can help integrate machine learning/AI into your company.
mosleh587084.typeform.com/to/...
Notebook: github.com/mosh98/RAG_With_Mo...
In this video you will learn:
1. How to fine-tune an embeddings model
2. What types of datasets can be used
3. How to test a fine-tuned embeddings model
What is Sentence Transformers?
Sentence Transformers is a framework for creating and fine-tuning embedding models, and v3.0 improves it significantly. The update adds a new training API built around `SentenceTransformerTrainer`, which improves multi-GPU training and provides detailed loss logging. It also lets you choose the similarity function (cosine, dot, euclidean, or manhattan) via `similarity_fn_name`, so the model can be adapted to a specific task. It supports hyperparameter optimization, a capability carried over from the broader `transformers` library, and expands the available loss functions and dataset formats to cover a wide range of training scenarios. The release stays backward compatible, but moving to the new API is encouraged to get the full benefits.
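To make the four similarity options concrete, here is a minimal pure-Python sketch of what each one computes on two plain vectors. It is an illustration, not the library's implementation: the `similarity` helper and the `SIMILARITY_FNS` table are hypothetical names, and returning negated distances for euclidean and manhattan (so that larger always means more similar) is an assumption about the convention.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product divided by the product of the vector norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    """Negated L2 distance (assumption: negated so larger = more similar)."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Negated L1 distance (same assumption as above)."""
    return -sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical lookup table keyed by the same names the description lists.
SIMILARITY_FNS = {
    "cosine": cosine,
    "dot": dot,
    "euclidean": euclidean,
    "manhattan": manhattan,
}


def similarity(a, b, similarity_fn_name="cosine"):
    """Score two vectors with the chosen similarity function."""
    return SIMILARITY_FNS[similarity_fn_name](a, b)
```

Cosine ignores vector length while dot product does not, which is one reason the choice can matter for a given task.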
You can use either a BGE or a nomic-embed-text model as the base model to fine-tune.
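A rough sketch of what fine-tuning with the v3.0 trainer can look like is shown below. It is not the code from the video's notebook: the checkpoint name `BAAI/bge-small-en-v1.5`, the output directory, the hyperparameters, the choice of `CosineSimilarityLoss`, and the made-up rows in `build_pairs` are all assumptions for illustration. The heavy imports live inside `fine_tune` so the data helper can be used on its own.

```python
def build_pairs():
    """Return a tiny, made-up pair-wise dataset with scores in [0, 1]."""
    return [
        {"sentence1": "How do I reset my password?",
         "sentence2": "Steps to change a forgotten password",
         "score": 0.9},
        {"sentence1": "How do I reset my password?",
         "sentence2": "Opening hours of the office",
         "score": 0.1},
    ]


def fine_tune(base_model="BAAI/bge-small-en-v1.5", output_dir="bge-finetuned"):
    """Fine-tune base_model on the toy pairs (downloads the model when called)."""
    # Imported here so build_pairs() works without the training dependencies.
    from datasets import Dataset
    from sentence_transformers import (
        SentenceTransformer,
        SentenceTransformerTrainer,
        SentenceTransformerTrainingArguments,
    )
    from sentence_transformers.losses import CosineSimilarityLoss

    model = SentenceTransformer(base_model)
    train_dataset = Dataset.from_list(build_pairs())

    # Placeholder settings; real runs need far more data and steps.
    args = SentenceTransformerTrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,
        per_device_train_batch_size=2,
        logging_steps=1,
    )
    trainer = SentenceTransformerTrainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        loss=CosineSimilarityLoss(model),
    )
    trainer.train()
    model.save(output_dir)
    return model
```

Calling `fine_tune()` requires the `sentence-transformers` and `datasets` packages and network access to fetch the base model.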
Intro 0:00
Sentence Transformer v3.0 0:49
Download packages 1:08
Load Dataset 1:20
How to Adapt it to your data 2:15
Loading Data and Training Arguments 3:45
Training and Testing 4:45

Comments: 9

@thevadimb 13 days ago

First, thank you for your video - I really appreciate your work! A question - I see the validation loss is actually growing... Am I missing some point here?

@moslehmahamud 12 days ago

You are right, I didn’t properly train the model with sufficient data or enough steps/epochs. Please don’t be like me, hahaha. Hope that answers your question.

@ashleeclaral3271 10 days ago

What should my own custom dataset look like?

@moslehmahamud 10 days ago

You can try using a pair-wise, labeled dataset to train the embeddings model.
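As one possible reading of "pair-wise, labeled", each row could hold two texts and a numeric similarity label. The sketch below checks rows of that shape before training. The column names `sentence1`, `sentence2`, and `score`, the [0, 1] score range, and the `validate_pairs` helper are assumptions for illustration, not something the reply specifies.

```python
# Assumed column layout for a pair-wise, labeled dataset.
REQUIRED_COLUMNS = ("sentence1", "sentence2", "score")


def validate_pairs(rows):
    """Return a list of problems found; an empty list means the rows look usable."""
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            problems.append(f"row {i}: missing {missing}")
            continue
        # Both texts must be non-empty strings.
        for col in ("sentence1", "sentence2"):
            if not isinstance(row[col], str) or not row[col].strip():
                problems.append(f"row {i}: {col} must be a non-empty string")
        # The label is assumed to be a similarity score between 0 and 1.
        score = row["score"]
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            problems.append(f"row {i}: score must be a number in [0, 1]")
    return problems
```

For example, `validate_pairs([{"sentence1": "a cat", "sentence2": "a kitten", "score": 0.8}])` returns an empty list, while a row without a `score` key is reported as a problem.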

@rahul01483 12 days ago

Do you have any video on how I can train on my own dataset from scratch and create an embedding vector store?

@moslehmahamud 10 days ago

Yes, a new video will be uploaded tomorrow (as of writing), using an HF model to get embeddings. You can use a Chroma DB to store the embeddings. Hope that helps.

@rahul01483 9 days ago

@moslehmahamud Sure, it helps, as I have been using chromadb for some time now... would love to see your impl.