Thank you. I didn't really understand the Tags issue, why it is so complex to add a tag to a dataset, and what the difference is between tags and aspects.
@richardduncan3403 · 5 days ago
Nice simple starting point example to follow along with
@Steelxfaze · 7 days ago
Wow, this is so intimidating. Where did you learn how to do all this?
@nodematic · 7 days ago
I personally learn best through side projects and work/consulting projects - a just-in-time learning approach, with hands-on practice and videos for topics as needed. After working through fine-tuning for a few projects, the concepts and terminology become way easier. The bigger picture, for real projects, will often look something like this: kzbin.info/www/bejne/j6WypZRmeZ1gaMU.
@Steelxfaze · 6 days ago
@@nodematic That makes a lot of sense. I'm currently just learning how to use models in Python in my own projects (though I'm still struggling quite a bit with them), but I hope this will get easier with time and I can start fine-tuning.
@ivankomakech3912 · 8 days ago
Hello, thanks for the wonderful tutorial. However, I get an error on the step that publishes the image to the Artifact Registry. The error says: denied: Unauthenticated request. Unauthenticated requests do not have permission "artifactregistry.repositories.uploadArtifacts" on resource "projects/***/locations/us-central1/repositories/charity-wave-repo" (or it may not exist). Based on this, I have no idea how to proceed. Please help. Thanks.
@nodematic · 8 days ago
It sounds like your authentication isn't set up. Try `gcloud init` to set that up.
@Chris-kq7ir · 8 days ago
What if I have just one docx file that I want to fine-tune the model with? How can I achieve that?
@nodematic · 8 days ago
If it's a huge docx file, you could break it up and fine-tune on that. If you just need some sort of RAG/grounding, a no-code RAG solution like this would be best: kzbin.info/www/bejne/fWTZqn5rj8yGoLM (it works with docx files).
@Chris-kq7ir · 6 days ago
@@nodematic Thanks. Would data preparation be the same no matter the file type? Just extract the text, then convert it into key pairs?
@nodematic · 5 days ago
For fine-tuning, yes, you'll just need the extracted text to train on. Something like python-docx could do the extraction. Split the extracted text into samples small enough that you don't run out of memory during the fine-tuning process (and get the best GPU you can for this, like a Colab A100).
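For example, a rough sketch of that extraction-and-chunking step (the file name and chunk size here are just placeholders, not values from the video):

```python
# Minimal sketch - assumes python-docx is installed (pip install python-docx)
# and that "my_document.docx" exists; the chunk size is an arbitrary example.
from docx import Document

def extract_text(path):
    doc = Document(path)
    return "\n".join(p.text for p in doc.paragraphs if p.text.strip())

def chunk_text(text, max_chars=2000):
    # Split into samples small enough to fit in GPU memory during fine-tuning.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

samples = chunk_text(extract_text("my_document.docx"))
print(f"{len(samples)} training samples")
```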
@superfreiheit1 · 14 days ago
I would like to download 1000 arXiv papers and chat with an LLM about the content. Do you know how to do it?
@nodematic · 13 days ago
I would suggest Google Cloud Agent Builder. The service has evolved since this video, but this will give you the idea: kzbin.info/www/bejne/n5zGo5V9nZaDj6s. Basically, you add the PDFs to a Google Cloud Storage bucket, then create a no-code chatbot agent on top of that bucket.
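If you'd rather script the bucket upload instead of using the console, something along these lines should work (the bucket and folder names are placeholders; assumes the google-cloud-storage client and authenticated credentials):

```python
# Sketch: upload local PDFs to a Cloud Storage bucket for Agent Builder to index.
# Assumes `pip install google-cloud-storage` and application-default credentials.
from pathlib import Path
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-arxiv-papers")  # placeholder bucket name

for pdf in Path("papers").glob("*.pdf"):
    blob = bucket.blob(f"arxiv/{pdf.name}")
    blob.upload_from_filename(str(pdf))
    print(f"Uploaded {pdf.name}")
```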
@nodematic · 8 days ago
Here's a video: kzbin.info/www/bejne/fWTZqn5rj8yGoLM
@aswathymg9081 · 15 days ago
I am trying to download a 2.2 GB file, but only a partial download occurs... what is the reason?
@aswathymg9081 · 15 days ago
How can I download the uploaded file from the tus server?
@ndvhareesh · 16 days ago
I'm so glad I found your channel; this is next-level content.
@DarkShadow-321 · 18 days ago
Thanks
@sounishnath513 · 19 days ago
If I choose to bypass Cloud Logging and directly insert application service logs into BigQuery, what are the implications, considering that real-time log analysis is not a requirement? Please advise.
@nodematic · 18 days ago
That would work fine - no major concerns. Just keep in mind that the more structured the log data, the more useful and intuitive BigQuery-based analysis will be. Unstructured or semi-structured logs are going to be easier to analyze in Log Explorer.
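To illustrate the "structured" part if you do write logs straight to BigQuery, a minimal sketch (the table ID and schema are invented for the example; assumes the google-cloud-bigquery client and an existing table):

```python
# Sketch: write structured application logs directly to BigQuery.
# Assumes `pip install google-cloud-bigquery`, authenticated credentials, and a
# table whose schema matches the row below; the table ID is a placeholder.
from datetime import datetime, timezone
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.app_logs.service_logs"  # placeholder

rows = [{
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "severity": "INFO",
    "service": "checkout",
    "message": "order created",
    "order_id": "12345",
}]

errors = client.insert_rows_json(table_id, rows)  # streaming insert
if errors:
    print("Insert errors:", errors)
```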
@sounishnath513 · 21 days ago
Thanks, nodematic team.
@mandeepmails · 22 days ago
Really good content.
@mallninja9805 · 23 days ago
This tutorial represents the third unique config I've tried that has failed to get Superset up and running. This one fails with "[wait-for-postgres, superset-init-db] ... * timed out waiting for the condition". Superset is just not ready for prime time.
@Jean-AlainMignon · 23 days ago
Very well structured and detailed video comparing self-hosted and Google-managed Prometheus, thanks!
@redboy-1899 · 25 days ago
But in the f16 merge it was saving as .bin; how do I save as safetensors natively?
@jacobjonm0511 · 26 days ago
Nice work
@alewhois · 28 days ago
Congratulations! Very nice content.
@刘春峰-w2e · 29 days ago
Very informative. Just what I need. Thanks.
@DarkShadow-321 · 1 month ago
Thank you
@sounishnath513 · 1 month ago
Thanks for sharing.
@ROKKor-hs8tg · 1 month ago
Is it free for everyone, or only for Pro at $9?
@nodematic · 1 month ago
Pro is required
@Mrsamssful · 1 month ago
I had this error when I launched the job: "denied: Unauthenticated request. Unauthenticated requests do not have permission "artifactregistry.repositories.uploadArtifacts" on resource". When I looked at the logs, I found this message: "Warning: The gcloud CLI is not authenticated (or it is not installed). Authenticate by adding the "google-github-actions/auth" step prior to this one." So you must configure authentication before that job step by adding the "google-github-actions/auth" action. For example:
- id: auth
  uses: google-github-actions/auth@v2
  with:
    credentials_json: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
- name: install gcloud cli
  uses: google-github-actions/setup-gcloud@v2
  with:
    project_id: ${{ secrets.GOOGLE_PROJECT }}
@siddhesh-shinde-trellix · 1 month ago
Nice demo. Really helpful.
@MitosNãoKagghamNaSuaCabeça · 1 month ago
I love the high quality of NotebookLM (what an amazing tool), but the one problem, a crucial one, is that only two voices are available. People are using them like crazy, so they're starting to sound oversaturated; that's why I'm searching for a free and unlimited way to change the voices while keeping the quality. I found an AI that does the job with the same quality, but it's a pity it's neither cheap nor unlimited. The search goes on...
@ChuckBaggett · 1 month ago
They're playing music when they could be explaining what they are doing. Practically no one understands what they're doing. That lack of explanation led me to thumbs-down it and stop watching. Hopefully I'll find something more useful.
@ahmadrana-c1y · 1 month ago
Is it worth using Unsloth with Amazon SageMaker?
@lvngleyptyltdnorthsideva1148 · 1 month ago
Hi, my name is Bongani, from South Africa. Firstly, thank you for such an informative video - short and straight to the point. I'm a non-technical co-founder in our startup, and I would like to ask you something that is somewhat technical. I'm in a region where the model has limited data on local languages; it's mainly good for detecting profanity. I would like to fix that by creating my own audio datasets, transcribing them, and then feeding those into the model to improve it. Is that something that is possible to do? I'm from a sound engineering background, now working in the telehealth space.
@martinterreni1616 · 1 month ago
Very good overview video
@deltaexistsss · 1 month ago
Great tutorial! Just wish we could use checkpoints with the online version...
@LukaszBrodziak · 1 month ago
When I follow the tutorial, I get an error: denied: Unauthenticated request. Unauthenticated requests do not have permission "artifactregistry.repositories.uploadArtifacts"
@MegaKrishnas · 1 month ago
I am getting the same error as well :(
@sounishnath513 · 1 month ago
I'm grateful that I found this channel without the recommendation.
@mrmuranga · 1 month ago
This is super awesome 😎😎... thorough and easy to follow... thanks a lot 👍🏿👍🏿
@rafaelfox10 · 1 month ago
Thank you!
@saireddy-o6p · 2 months ago
Hi, I got an error like: "Not found: Dataset peak-catbird-440802-b3:dataform was not found in location US". I have given all the permissions you mentioned, and the dataset location is US in BigQuery.
@jitendrakumarnayak8857 · 2 months ago
Such an intuitive video... very disheartening to see so few views, likes, and subscribers 😑... hope you continue making such videos.
@igorcastilhos · 2 months ago
I have a folder with many PDFs, and I would like to fine-tune a model to summarize these PDFs and respond to questions on my website. Is there a way to do that using the example in this video?
@abdulsami5843 · 1 month ago
You would first have to figure out the parsing logic to correctly extract the text, and then put it into a summarizer. If all you want is a summary, there are many good models available on Hugging Face that you can use directly, or just get a Gemini API key and use Gemini for it; it should do a decent job.
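For example, a minimal sketch of that extract-then-summarize flow (the PDF library, model choice, and file path are assumptions for illustration, not recommendations from the video):

```python
# Sketch: extract text from a PDF and summarize it with an off-the-shelf model.
# Assumes `pip install pypdf transformers`; model and length limits are examples.
from pypdf import PdfReader
from transformers import pipeline

reader = PdfReader("report.pdf")  # placeholder path
text = "\n".join(page.extract_text() or "" for page in reader.pages)

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# Summarize only the first chunk; long PDFs need chunking and summary merging.
result = summarizer(text[:3000], max_length=150, min_length=40, truncation=True)
print(result[0]["summary_text"])
```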
@igorcastilhos · 1 month ago
@ thank you
@abdulsami5843 · 1 month ago
@@igorcastilhos Also, if you want to use the PDFs as context to answer questions, then you probably need to parse them and put them into a vector store so that they can be retrieved when needed; this is called RAG.
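A bare-bones sketch of that parse/embed/retrieve idea (the chunks, embedding model, and in-memory "store" are simplifications for illustration; a real setup would use a proper vector database such as Chroma, Milvus, or pgvector):

```python
# Sketch: embed PDF chunks and retrieve the most relevant ones for a question.
# Assumes `pip install sentence-transformers numpy`.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["chunk one of a parsed PDF...", "chunk two...", "chunk three..."]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

question = "What does the resolution say about filing deadlines?"
q_vec = model.encode([question], normalize_embeddings=True)[0]

scores = chunk_vecs @ q_vec  # cosine similarity (vectors are normalized)
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:2]]
# top_chunks would then be passed to the LLM as context for answering.
print(top_chunks)
```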
@igorcastilhos · 1 month ago
@@abdulsami5843 I'm using Ollama with the Web UI tool. Inside it, I'm adding some PDFs of attorneys' resolutions to the Knowledge collection, so that they can ask about them whenever they want. The main reason to use Ollama (llama3.2) instead of the OpenAI API is that it is free. But I'm having problems accessing the Web UI at localhost:3000 on our server from my machine; it doesn't show the models installed on the server machine.
@igorcastilhos · 1 month ago
@@abdulsami5843 Also, the RAG feature doesn't have very nice documentation. In my case, we have a distributed folder in Microsoft Windows (like C:/), and the attorneys and advocates will send new PDFs into that folder through the website. I wanted to use RAG for it, but it is very hard.
@TronggMjnh · 2 months ago
How can I download it as Excel to my computer?
@dante7222 · 2 months ago
How do I submit a project in Python, with the main (driver) file main.py, the other files that main.py imports, and other project files (requirements.txt, configs.json, etc.)?
@liakat26 · 2 months ago
Fantastic tutorial. Can it recognize handwriting and extract it too? Google has amazing people.
@umairrkhn · 2 months ago
Great video!
@lesptitsoiseaux · 2 months ago
Could you link the actual notebook you are using somewhere, please? Feedback-wise: I'm 47, I've been learning forever, and you have an awesome pace. Quality-wise, I'd pay for the content.
@nodematic · 2 months ago
Thanks! Here's the notebook: colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing
@NhatNguyen-bq6jj · 2 months ago
Thanks!
@orafaelgf · 2 months ago
Great, thanks. How do I bring up the interface from UC? Do you have any video explaining that?
@nodematic · 2 months ago
At the time of video creation, the interface wasn't available, but we'll try to cover this in a future video.
@espetosfera8966 · 2 months ago
Very good.
@hamedparsa8880 · 2 months ago
Thanks for the good tutorial, ma'am. If you only save the QLoRA adapter, how can we use it in tools like LM Studio?
@nodematic · 2 months ago
You will always need both the base model and the adapter layers to actually run/use the model - it's just a question of whether you want to merge and store everything "together". I haven't tried LM Studio, but from a quick look, I would suggest saving/publishing your model as merged weights (it should be simpler to pull into LM Studio). The "GGUF Conversion" portion of the fine-tuning notebook might actually be best for the export, based on LM Studio's website: "LM Studio supports any GGUF Llama, Mistral, Phi, Gemma, StarCoder, etc model on Hugging Face". Hope that helps!
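If you're using an Unsloth-style notebook, the export step is roughly along these lines (directory names and the quantization method are placeholders, and `model`/`tokenizer` are the objects returned by the notebook's fine-tuning cells, so double-check against the notebook's own GGUF section):

```python
# Sketch: after fine-tuning with Unsloth, save merged weights and/or a GGUF file
# that LM Studio can load. Paths and quantization method are example values.

# Merged 16-bit weights (base model + LoRA adapters combined):
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")

# GGUF export (quantized), which LM Studio can import directly:
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")
```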
@hamedparsa8880 · 2 months ago
@@nodematic Yes, I've exported as GGUF and used the merged option, and it worked. Although the model was overfitting to a small dataset, which made its behavior go weird in some cases. But I'm working on expanding my training data, and maybe adding different system prompts into the data as well... The question is: how do I divide my data into training/test datasets and check the loss function on the test dataset? Does that Colab notebook support it, or will I need to figure it out myself? ++ Thanks for the answer. 💙
@nodematic · 2 months ago
The loss is reported after each step in the training (you'll see this in the training cell in the notebook). A good approach is to see where the loss starts to "level off" (decrease significantly more slowly), and use the model at that point. You could also consider reducing the LoRA alpha value to put less emphasis on the adapter layers (and increase the emphasis on the base model). The expanded training set is a good idea, especially if you have fewer than ~200 examples. There isn't a traditional training/test split in fine-tuning like there is in other AI/ML problems - partly because responses are difficult to score quantitatively and with precision. Instead, people will do a post-training quality step of RLHF to integrate feedback on which fine-tuned model answers were good and which were bad. There are also some advanced methods to limit overfitting, but they're well beyond the scope of most small-model fine-tuning.
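For reference, the alpha being discussed is the one set in the LoRA config; a generic PEFT-style sketch (the values and target modules are examples and vary by model and notebook):

```python
# Sketch: a LoRA configuration where lora_alpha controls how strongly the
# adapter layers are weighted relative to the base model. Values are examples.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                 # adapter rank
    lora_alpha=16,        # lower alpha = less emphasis on the adapters
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # model-dependent
    task_type="CAUSAL_LM",
)
```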
@Grynjolf · 2 months ago
Does this require a load balancer? Or can I region-block my instances directly? I'm trying to stay within the free tier and block everything but US traffic.
@nodematic · 2 months ago
The demonstrated setup requires a load balancer - you'd have to DIY something if you're routing traffic straight to your VM public IP.
@prabhupreetamdas3971 · 2 months ago
Thanks for the informational video. I really liked the way you demonstrated the workflows and detailed steps of installing Milvus.
@hamedparsa8880 · 2 months ago
Nice tutorial. Got me subscribed! ❤ Buuuut, I want a new, more detailed fine-tuning tutorial on Gemma 2 9B... especially for coding purposes.
@nodematic · 2 months ago
Thanks. We'll add that to our video ideas.
@hamedparsa8880 · 2 months ago
@@nodematic can't wait for that to come out! 💯 keep it up! 💪🏼
@jonatasnascimento6584 · 2 months ago
Hello, I'm from Brazil. I'm new to AI. I would like to build an artificial intelligence to automate university work, as I have a lot of work and I can't keep up with it. I want an AI that can write papers like me using my texts. What adjustments or training should I do? Do I need to change a parameter?
@jonatasnascimento6584 · 2 months ago
In my mind, I'm planning to use about 10 review texts of my articles and one expanded summary. I want the AI to write like me without triggering AI plagiarism detection.