Anonymizing Sensitive Data in LLM Prompts

2,754 views

Trelis Research

1 day ago

Comments: 14
@TrelisResearch 7 months ago
Apologies for the 720p video quality. I had camera issues and needed to use my (lower-quality) laptop camera to keep things moving and get the video out.
@francycharuto 7 months ago
Just in time! Thank you!
@TrelisResearch 7 months ago
great stuff
@irbsurfer1585 7 months ago
Fascinating demonstration on text anonymization! Wondering how this could extend to audio streams, especially with the new GPT-4o's audio capabilities. Here's a thought: could we use a WebSocket to buffer incoming audio, run it through Whisper for transcription, and then use spaCy to mark timestamps for anonymization? FFmpeg could then edit those segments by inserting silences, and HTTPX could handle streaming the edited audio to GPT-4o. Seems like a promising way to maintain privacy in audio-based LLM interactions. Would love to hear your thoughts on this!
@TrelisResearch 7 months ago
That's a cool thought. What you describe sounds doable, although it adds latency! In principle, it should be possible to develop a spaCy-type model that works on sound directly, and then you could have an overdubbing-type tool to replace Faker. That definitely goes beyond what's available today, but probably some day this will exist as modules we can piece together.
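The "mark timestamps, then silence them" step from the pipeline described above can be sketched in plain Python. This is only a sketch under assumptions: the word-level timestamps are in the shape Whisper can emit, the flagged indices stand in for what a spaCy NER pass might return, and the `ffmpeg_volume_filter` helper is a hypothetical name that just builds an FFmpeg `volume` filter string.

```python
def silence_intervals(words, pii_indices, pad=0.05):
    """Merge flagged words into (start, end) intervals, with slight padding."""
    intervals = []
    for i in sorted(pii_indices):
        start = max(0.0, words[i]["start"] - pad)
        end = words[i]["end"] + pad
        if intervals and start <= intervals[-1][1]:
            # Overlapping/adjacent flagged words collapse into one interval.
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((start, end))
    return intervals

def ffmpeg_volume_filter(intervals):
    """Build an ffmpeg -af expression that mutes each interval."""
    return ",".join(
        f"volume=enable='between(t,{s:.2f},{e:.2f})':volume=0"
        for s, e in intervals
    )

# Hypothetical word-level timestamps, as Whisper can produce.
words = [
    {"word": "My", "start": 0.0, "end": 0.2},
    {"word": "name", "start": 0.2, "end": 0.4},
    {"word": "is", "start": 0.4, "end": 0.5},
    {"word": "Jane", "start": 0.5, "end": 0.8},
    {"word": "Doe", "start": 0.8, "end": 1.1},
]
ivals = silence_intervals(words, {3, 4})  # indices a NER pass might flag
print(ivals)                              # one merged interval for both names
print(ffmpeg_volume_filter(ivals))
```

The filter string could then be passed to `ffmpeg -i in.wav -af "<filter>" out.wav` to produce the redacted audio before it is streamed onward.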
@seththunder2077 7 months ago
Can you please do a follow-up where you use the Presidio wrapper with LangChain in a RAG application? I assume we'd also need to implement it on our documents, and not just the prompt, if we are using closed-source embeddings and an LLM too.
@TrelisResearch 7 months ago
The LangChain docs are really good, and they have some Colab notebooks you can check out. I played around with one and it's in the ADVANCED-inference repo, but LangChain is a pretty high-level wrapper that makes it hard to see what's underneath, so it wasn't too suited to this vid. It is very good for this application, though, as it has good wrappers. I probably won't do another video because I need to get back to pre-training, and I also want to do some voice+text modelling, like GPT-4o. Appreciate the comment!
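What Presidio (and LangChain's wrapper around it) does under the hood is an analyze → anonymize → de-anonymize round trip: detect PII, swap it for placeholders before the prompt leaves your machine, and restore the real values in the LLM's reply. A minimal stdlib stand-in of that flow is sketched below; the regex detectors and the `<EMAIL_0>` placeholder format are illustrative assumptions only — real Presidio uses NER-based recognizers, not regexes alone.

```python
import re

# Illustrative detectors; Presidio ships far more robust recognizers.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def anonymize(text):
    """Replace detected PII with placeholders; return text plus the mapping."""
    mapping = {}
    for label, pattern in DETECTORS.items():
        for i, match in enumerate(pattern.findall(text)):
            placeholder = f"<{label}_{i}>"
            mapping[placeholder] = match
            text = text.replace(match, placeholder)
    return text, mapping

def deanonymize(text, mapping):
    """Restore the original values in the LLM's response."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

prompt = "Contact jane@example.com or +1 555-123-4567 about the invoice."
safe_prompt, mapping = anonymize(prompt)
print(safe_prompt)  # placeholders instead of the email and phone number

# ... send safe_prompt to the remote LLM, receive `reply` back ...
reply = "I emailed <EMAIL_0> as requested."
print(deanonymize(reply, mapping))
```

For a RAG setup, as the comment above notes, the same `anonymize` step would have to run over the documents before embedding, not only over the prompt.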
@baohuynh5462 7 months ago
Many thanks. I’m currently working on training a smaller LLM with about 3 billion parameters. Do you have any tips or experiences you could share on achieving good results with a small model? Thanks in advance
@TrelisResearch 7 months ago
Actually, I'm using a 3B model here: Phi-3 mini, and you can see that it gives good results. Maybe start with that as your base model. Regarding specifics on getting good results, take a look through the Trelis Research fine-tuning playlist.
@baohuynh5462 7 months ago
@TrelisResearch thank you so much!
@ukcp265 7 months ago
Can Llama 3 replace anonymization?
@TrelisResearch 7 months ago
Can you expand on your question?
@onoff5604 7 months ago
It is possible that I am not able to hear the negatives in what you are saying, so that "can" sounds like "can't" and vice versa, but it sounds like you are saying that it is MORE secure for hospitals and schools to send private information (which MUST NEVER EVER be sent to ANY third party under ANY circumstances, by law and by ethics) over the internet to Microsoft (setting aside that particular company's appalling security policies and track record) than to never have that information sent anywhere outside each institution's own offline computers. That is just beyond illogical. Can you clarify, please?
@TrelisResearch 7 months ago
Well, the Irish accent can('t) be a bit tough! But yes, if legally you are not allowed to send information to any third party, then you would strictly be limited to running an LLM locally. I'm not an expert on which rules apply where, but clearly many health services use AWS/Azure, and so they are relying on security standards and compliance like SOC 2. And yes, it's true that there are security breaches at big companies. At the same time, it would be naive to think that it is straightforward to achieve impenetrable security locally. That said, the amount of data affected in breaches is clearly more limited if everyone stores their data locally.