This was a really nice interview and an interesting project. It’s incredible what superpowers we developers have gained over the last two years. Things that, had you asked for them 10 years ago, I would’ve said might be possible with a year and a few million dollars’ worth of headcount are now an API call away. I have LLMs integrated into nearly every part of my workflow and my tooling. The way I work now looks almost nothing like the way it used to. I want to know more about the price difference between Gemini Flash and Whisper for transcription, particularly with the many flavors of local Whisper that are available. I’ll have to do some research on this.
@swillison 10 days ago
OpenAI charge $0.006 / minute for their Whisper API - so an hour of audio would cost 36 cents. Gemini 1.5 Flash is $0.075 for 1 million tokens and every second of audio is charged as 25 tokens, which means an hour is 90,000 tokens and hence costs just 0.675 cents - so it's over 50x cheaper!
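A quick way to check that arithmetic, as a minimal sketch. The prices and the 25 tokens per second figure are taken from the comment above and may have changed since; the function and constant names are made up for illustration.

```python
# Prices as quoted in the comment above (assumed, may be out of date).
WHISPER_USD_PER_MINUTE = 0.006
GEMINI_FLASH_USD_PER_MILLION_TOKENS = 0.075
GEMINI_TOKENS_PER_AUDIO_SECOND = 25


def whisper_cost_usd(audio_seconds: float) -> float:
    """Cost of transcribing audio with the Whisper API, billed per minute."""
    return (audio_seconds / 60) * WHISPER_USD_PER_MINUTE


def gemini_flash_cost_usd(audio_seconds: float) -> float:
    """Cost of sending audio to Gemini 1.5 Flash, billed per input token."""
    tokens = audio_seconds * GEMINI_TOKENS_PER_AUDIO_SECOND
    return (tokens / 1_000_000) * GEMINI_FLASH_USD_PER_MILLION_TOKENS


one_hour = 60 * 60  # seconds
whisper = whisper_cost_usd(one_hour)
gemini = gemini_flash_cost_usd(one_hour)

print(f"Whisper API:      ${whisper:.4f} per hour of audio")
print(f"Gemini 1.5 Flash: ${gemini:.5f} per hour of audio")
print(f"Ratio:            {whisper / gemini:.1f}x")
```

This only counts audio input tokens for Gemini; any output tokens for the transcript itself would be billed on top.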
@ftk525 9 days ago
@swillison If you use GPU spot instances yourself you can run Whisper large v3 turbo at about a penny per hour. Since this project only requires timestamping, and appears to have a high tolerance for timestamps not being exactly accurate, I would think your guest would be well served with just Whisper tiny, which you can run at roughly 10x on a single CPU - basically free.
@scottieapplseed 26 days ago
Fantastic tool and fun examples that actually demonstrate fun little use cases.
@arpitgarg5172 1 month ago
Is there a place where one could explore all the published datasettes?
Can I use this plugin along with MySQL? I'm getting errors.
@MatthewTerry-suade 3 months ago
Thank you, very informative
@bbcc2960 3 months ago
Thank you for sharing.
@energyexecs 4 months ago
Thank you, Simon Willison. Great information. I especially like how you demonstrated the development of your own tools. Finally, my thought is that your presentation, in an executive summary format, would educate policymakers in both the enterprise and government sectors who seem to fear AI. For example, my company has an early policy that employees are not allowed to use AI or ChatGPT. At the same time, my use case, leveraging RAG to augment our LLM, was accepted by our AI Review Committee. My thought is that enterprise companies will be careful and prudent in rolling out LLMs and AI tools because they will want “security rails” in place. Thank you.
@canadianrepublican1185 4 months ago
Thank you.
@schalkdormehl3057 4 months ago
Mask, in 2023...
@Tony_Indiana 4 months ago
I just took a poll, and people asked if you could show us OSINT using a model like Mistral that is mostly uncensored (Dolphin/Instruct), or whatever your preference is, then GPT-4. Everyone who responded agreed that would be something we would pay for: 2024 tips and tricks for LLMs and OSINT. But there are advantages to uncensored models.
@tutacat 5 months ago
This is the best fundamental way of describing embeddings.
@tutacat 5 months ago
This is what Microsoft Recall wants to do
@tutacat 5 months ago
This man is truly based.
@codenocode 5 months ago
this is really nice! thanks for sharing.
@Clammer999 5 months ago
I’m totally new to embeddings and this video inspired me to want to learn even more!
@codenocode 6 months ago
I've recently stumbled across your work, which I read about in Gergely's book "The Software Engineer's Guidebook". Fantastic find. Love the creativity here.
@enigmeta 6 months ago
Love this! It would be useful to mention that you need to run Datasette with the --root option in order to make modifications; it took me a while to find this.
@Speejays2 6 months ago
Is it possible to replace the OpenAI API key with a local vision model instead?
@monKeman495 6 months ago
There should be some kind of authorization-based restriction on internal LLM tokens for the general public.
@_ramen 6 months ago
very great demo, thanks for sharing! this is an excellent example of practical use of embeddings.
@brcosmin 6 months ago
Thanks for linking yourself on ycombinator, very interesting talk and quite engaging delivery.
@QINGCHARLES 7 months ago
The future is wild. Imagine how good this will be 6 months or a year from now.
@MichelBinkhorst 7 months ago
New to Datasette. Just installed it on OSX with Homebrew, and added the Extract plugin, but I'm not seeing the 'database actions' button. Am I missing something?
@jmottishaw 7 months ago
same here on Windows in a fresh venv
@AP-hv5dh 7 months ago
🔥
@subinalex88 7 months ago
Nice
@ecosse64 7 months ago
That's fantastic. Does it work across multiple websites and in different languages? For example, if you wanted to provide a list of specific events in a country where both English and Spanish or Italian are spoken but have a single database in English.
@kai.diefenbach 7 months ago
Awesome!
@anne-marieroy8812 7 months ago
Thanks very interesting and useful.
@sebastianwagner5843 7 months ago
Things start to become magical.
@muddasirkhan805 7 months ago
This was so good! Please do more of these - I am still in awe!! Thank you!
@zgintasz2 8 months ago
"vibes-based search" lol. love the term you invented.
@korolyovPavel 8 months ago
Cool
@curtisblake261 10 months ago
Impressive fast talking and fast scrolling. A lot of knowledge and experience for sure. I guess I'll have to do some digging if I want to really benefit from this lecture.
@asiddiqi123 10 months ago
Why are you wearing a mask 😷?
@KayButtonJay 9 months ago
Maybe he doesn’t want to get people sick, genius
@schalkdormehl3057 4 months ago
@KayButtonJay he wouldn't if he didn't wear a mask either.
@rileydavidjesus 11 months ago
What a genius
@silentbob1236 11 months ago
Looks great, but setting it up is not easy... I have installed plugins, created a config file for the API key, and started Datasette, but nothing ever changes. A setup video for Windows or Linux demonstrating how to set up plugins would be appreciated!
@swillison 11 months ago
Did you run Datasette with the --root option and click the link to sign in as root? That's the most likely cause for it not working. Feel free to open an issue on GitHub if that doesn't help - and I agree, I need to build a tutorial for this.
@BillyRichardson 11 months ago
Very cool! thanks for sharing
@sennetor 11 months ago
Synthetic data 😁
@johnh6959 1 year ago
OMG, the pause button got a workout. Cheers!
@miikalewandowski7765 1 year ago
What does the semantic vectorization of a word look like in a mathematical sense? Is it like every word has its spatial ID (coordinate) and gets kind of multiplied with a vector array of associative IDs?