Apologies for the 720p video quality. I had camera issues and needed to use my (lower-quality) laptop camera to keep things moving and get the video out.
@francycharuto · 7 months ago
Just in time! Thank you!
@TrelisResearch · 7 months ago
great stuff
@irbsurfer1585 · 7 months ago
Fascinating demonstration on text anonymization! Wondering how this could extend to audio streams, especially with the new GPT-4o's audio capabilities. Here's a thought: could we use a WebSocket to buffer incoming audio, run it through Whisper for transcription, and then use spaCy to mark timestamps for anonymization? FFmpeg could then edit these segments by inserting silences, and HTTPX could handle streaming the edited audio to GPT-4o. Seems like a promising way to maintain privacy in audio-based LLM interactions. Would love to hear your thoughts on this!
@TrelisResearch · 7 months ago
That's a cool thought. What you describe sounds doable, although it adds latency! In principle, it should be possible to develop a spaCy-type model that works on audio directly, and then you could have an overdubbing-type tool to replace Faker. That definitely goes beyond what's available today, but probably some day this will exist as modules we can piece together.
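For anyone who wants to tinker, here's a rough sketch of just the transcription-then-silencing part of that pipeline (the WebSocket buffering and HTTPX streaming are left out). It assumes openai-whisper with word-level timestamps, spaCy's small English model, and pydub for the audio edits; file names and entity labels are placeholders, and exact library behaviour may differ between versions.

```python
# Sketch: silence named-entity spans in an audio file before sending it onward.
# Assumes openai-whisper (word_timestamps support), spaCy, and pydub are installed.
import whisper
import spacy
from pydub import AudioSegment

AUDIO_PATH = "input.wav"  # placeholder input file

# 1. Transcribe with per-word timestamps
asr = whisper.load_model("base")
result = asr.transcribe(AUDIO_PATH, word_timestamps=True)

# Flatten words and rebuild the transcript while tracking character offsets
words, transcript, offset = [], "", 0
for segment in result["segments"]:
    for w in segment.get("words", []):
        token = w["word"]
        words.append({"text": token, "start": w["start"], "end": w["end"],
                      "char_start": offset, "char_end": offset + len(token)})
        transcript += token
        offset += len(token)

# 2. Find entity character spans with spaCy
nlp = spacy.load("en_core_web_sm")
doc = nlp(transcript)
entity_spans = [(ent.start_char, ent.end_char) for ent in doc.ents
                if ent.label_ in {"PERSON", "ORG", "GPE", "DATE"}]

# 3. Map entity character spans to time ranges via the word offsets
def overlaps(a_start, a_end, b_start, b_end):
    return a_start < b_end and b_start < a_end

redact_times = []
for c_start, c_end in entity_spans:
    hit = [w for w in words if overlaps(w["char_start"], w["char_end"], c_start, c_end)]
    if hit:
        redact_times.append((hit[0]["start"], hit[-1]["end"]))

# 4. Replace those ranges with silence (pydub works in milliseconds)
audio = AudioSegment.from_file(AUDIO_PATH)
for t_start, t_end in redact_times:
    ms_start, ms_end = int(t_start * 1000), int(t_end * 1000)
    audio = audio[:ms_start] + AudioSegment.silent(duration=ms_end - ms_start) + audio[ms_end:]

audio.export("anonymized.wav", format="wav")
```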
@seththunder2077 · 7 months ago
Can you do a follow-up please where you use the Presidio wrapper with LangChain in a RAG application? I assume we'd also need to implement it on our documents and not just the prompt, if we are using closed-source embeddings and an LLM too.
@TrelisResearch · 7 months ago
The LangChain docs are really good and they have some Colab notebooks you can check out. I played around with one and it's in the ADVANCED-inference repo, but LangChain is a pretty high-level wrapper that makes it hard to understand what's underneath, so it wasn't too suited to this vid. But it's very good for this application as it has good wrappers. Probably I won't do another video because I need to get back to pre-training stuff and also want to do some voice+text modeling, like GPT-4o. Appreciate the comment!
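As a rough illustration of what that wrapper looks like in a RAG setting (not from the video; package names and entity fields are assumptions based on the LangChain docs, and APIs may differ by version), you'd anonymize both the documents and the prompt with the same reversible anonymizer, then de-anonymize the model's answer:

```python
# Sketch: LangChain's Presidio wrapper around a RAG flow.
# Requires langchain_experimental plus presidio-analyzer/anonymizer and a spaCy model.
from langchain_experimental.data_anonymizer import PresidioReversibleAnonymizer

anonymizer = PresidioReversibleAnonymizer(
    analyzed_fields=["PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS"]
)

# Anonymize document text *before* it is embedded / stored in the vector DB
doc_text = "Patient Jane Doe can be reached at jane.doe@example.com."
safe_doc = anonymizer.anonymize(doc_text)

# Use the same anonymizer on the prompt so the fake identifiers stay consistent
safe_prompt = anonymizer.anonymize("Summarise the record for Jane Doe.")

# safe_doc and safe_prompt are all the closed-source embedder / LLM ever sees.
# Whatever the model returns can be mapped back to the real identifiers:
model_response = safe_prompt  # stand-in for the hosted LLM's answer
print(anonymizer.deanonymize(model_response))
```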
@baohuynh5462 · 7 months ago
Many thanks. I’m currently working on training a smaller LLM with about 3 billion parameters. Do you have any tips or experiences you could share on achieving good results with a small model? Thanks in advance
@TrelisResearch · 7 months ago
Actually, I'm using a 3B-class model here: Phi-3 mini, and you can see that it gives good results. Maybe start with that as your base model. For specifics on getting good results, take a look through the Trelis Research fine-tuning playlist.
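For reference, a minimal way to load that base model with Hugging Face transformers (model id and generation settings are illustrative choices, not from the video):

```python
# Sketch: loading Phi-3 mini as a small base/instruct model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

prompt = "Redact the personal data in: John Smith lives at 12 Main St."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```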
@baohuynh5462 · 7 months ago
@TrelisResearch thank you so much!
@ukcp265 · 7 months ago
Can Llama 3 replace anonymization?
@TrelisResearch · 7 months ago
Can you expand on your question?
@onoff5604 · 7 months ago
It is possible that I am not able to hear the negatives in what you are saying, so that "can" sounds like "can't" and vice versa, but it sounds like you are saying that it is MORE secure for hospitals and schools to send private information (which MUST NEVER EVER EVER be sent to ANY third party EVER under ANY circumstances, by law and by ethics) over the internet to Microsoft (setting aside that particular company's appalling security policies and track record) than to never have that information sent anywhere outside each institution's own offline computers... that is just beyond illogical. Can you clarify please?
@TrelisResearch · 7 months ago
well the Irish accent can('t) be a bit tough!!! But yes, if legally you are not allowed to send information to any third party, then you would strictly be limited to running an LLM locally. I'm not an expert on which rules apply where, but clearly many health services use AWS/Azure, so they are relying on security standards and compliance like SOC 2. And yeah, it's true that there are security breaches at big companies. At the same time, it would be naive to think that it is straightforward to achieve impenetrable security locally. That said, the amount of data affected in a breach is clearly more limited if everyone stores their data locally.