Advancing AI - DBRX & The AI Playground · 24:14
Turning 40 @ SQLBits! · 6:25 · 2 months ago
Comments
@alexischicoine2072 · 3 days ago
Deletion vectors are amazing. They improve concurrency as well, which is detailed on the docs page about isolation and serialization. If you need to delete customer data for compliance, they're great. And if you need to replicate your data to another region, you won't create as many extra files that need to be transferred and stored, so you can get good savings from that too. Imagine you have multi-gigabyte Parquet files in a huge table and you need to delete a record here and there; it makes a massive difference.
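A minimal sketch of what that looks like in practice, assuming a Databricks environment and a hypothetical Unity Catalog table `main.crm.customers` (the table, column, and value are placeholders; `delta.enableDeletionVectors` and `REORG ... APPLY (PURGE)` are the standard Delta/Databricks commands, but check your runtime version):

```python
from pyspark.sql import SparkSession

# In a Databricks notebook `spark` already exists; this line only matters elsewhere.
spark = SparkSession.builder.getOrCreate()

# Opt the table in to deletion vectors (enabled by default on newer runtimes).
spark.sql("""
    ALTER TABLE main.crm.customers
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
""")

# A compliance-style point delete: the rows are marked in a small deletion-vector
# file instead of rewriting the large Parquet files they live in.
spark.sql("DELETE FROM main.crm.customers WHERE customer_id = '12345'")

# Periodically purge the soft-deleted rows so they are physically removed.
spark.sql("REORG TABLE main.crm.customers APPLY (PURGE)")
```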
@FaithDamah · 5 days ago
❤❤❤
@ManCar1608 · 6 days ago
How come the base layer shows what is in the source system when you have applied cleansing?
@samulimustonen2047 · 11 days ago
I was asked about medallion architecture in a job interview and I didn't know it, beyond having heard the name. The interviewer explained it enthusiastically and I just commented that it's the same as any other process; he didn't seem to like my response. I didn't get picked for the role, and I also didn't want it, as the interviewer seemed arrogant. It's not rocket science.
@kb1629 · 12 days ago
Only available for F64 or higher SKUs
@pini22ki · 12 days ago
Does this work in Databricks SQL mode?
@pasco_luya · 13 days ago
Awesome demo! I don't see where the online tables were ever used. Can you clarify? Thank you 😊
@Juan-PaulHynek · 15 days ago
Brilliantly put together, thanks Simon!
@ezequielchurches5916 · 16 days ago
Is clickstream_raw mapped to the Bronze layer, and clickstream_cleaned to the Silver layer? How can I map each Delta table to the medallion layers?
@johnnywinter861 · 16 days ago
oomph... throwing shade at the SQL editor 😵
@alexischicoine2072 · 16 days ago
Where would you use a variable instead of a widget?
@AdvancingAnalytics · 16 days ago
I realised as I finished filming that I should have thrown SQL widgets in as well. The main difference is that variables can be derived from the data (e.g. select max(id) from mytable), but widgets can be passed in externally. Both are super useful!
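A rough sketch of that difference in a Databricks notebook (the table `mytable` and widget name `run_date` are placeholders; SQL session variables need a fairly recent runtime, roughly DBR 14.1+):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-defined in a Databricks notebook

# Variable: derived from the data itself, inside the session.
spark.sql("DECLARE OR REPLACE VARIABLE max_id BIGINT DEFAULT 0")
spark.sql("SET VAR max_id = (SELECT MAX(id) FROM mytable)")
spark.sql("SELECT * FROM mytable WHERE id = session.max_id").show()

# Widget: passed in externally (UI, job parameter), then read inside the notebook.
dbutils.widgets.text("run_date", "2024-01-01")   # dbutils only exists in Databricks
run_date = dbutils.widgets.get("run_date")
spark.sql(f"SELECT * FROM mytable WHERE load_date = '{run_date}'").show()
```

Qualifying the variable as `session.max_id` also avoids the column-vs-variable ambiguity mentioned below.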
@selimberntsen7868 · 16 days ago
Great video, will really come in handy! Can these features only be used on SQL pools or also on Spark clusters?
@alexischicoine2072 · 16 days ago
That syntax leaves you confused as to whether the identifier is a column or a variable. I think I prefer the $ syntax that Snowflake has.
@alexischicoine2072 · 16 days ago
Nice name, no servers here. Can't wait to try serverless for jobs.
@TheDataArchitect · 17 days ago
Great job bro.
@skipa9906 · 18 days ago
Great video. Where can we get the sample files?
@carlmerritt4355 · 18 days ago
Amazing video - subbed
@julius8183 · 22 days ago
What a great video - again! You're a legend. Well spoken and explained.
@julius8183 · 22 days ago
Finally a good, solid video that explains it well. Thanks! I would love to see a follow-up where you actually land some data in Bronze and transform it to Silver in development. What does the data look like in the containers? How does the Catalog tab show the data in Databricks? How are they related? I want to know these things but can barely find anyone explaining them well. I basically want to build an enterprise lakehouse from scratch. Thanks!
@NoahPitts713 · 24 days ago
This is a great step in the right direction for the browser UI! Happy days ahead
@katetuzov9745 · 24 days ago
Before each recipe provide a short life story that is vaguely related to the topic 😂
@katetuzov9745 · 24 days ago
Seriously though, 1 million tokens seems like a lot, until you factor in that each question + answer takes around 600 tokens.
@AdvancingAnalytics · 24 days ago
1 million tokens, assuming 600 per request, is about 1,666 requests. I think in DBU costs that puts it at around £0.01 per request in British money. Not bad at all, but it could get expensive if you were using this for per-row calculations in ETL, etc.
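The back-of-the-envelope version of that, with the per-million-token price left as an explicit assumption since it varies by model and endpoint:

```python
# Assumed price per 1M tokens in GBP; swap in the current pay-per-token rate.
price_per_million_tokens_gbp = 16.0

tokens_per_request = 600                                  # question + answer, as above
requests_per_million = 1_000_000 // tokens_per_request    # ~1,666 requests

cost_per_request = price_per_million_tokens_gbp / requests_per_million
print(f"{requests_per_million} requests at roughly £{cost_per_request:.4f} each")
# With these assumptions: 1666 requests at roughly £0.01 per request.
```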
@DanKeeley · 25 days ago
Seems speed is the focus of all the new models, e.g. the brand new, fast and technically leading open-source Llama 3. I have to say it's quite fun to see leaders knocked off their perch every few weeks.
@AdvancingAnalytics · 24 days ago
Ironically, my video production was too slow to avoid the Llama 3 announcements the day after I filmed this 😅
@xpaulone · 25 days ago
Hi, is it possible to enable Change Data Feed on the mirrored Delta table?
@mohammedsafiahmed1639 · 26 days ago
first
@alisabenesova8911 · 26 days ago
Hate notebooks, and hate having to do this in a browser :D
@mkrichey1 · 27 days ago
I love that they would take such a powerful tool and open-source it in the middle of an AI gold rush. Excited to use it, and also to see what it prompts in the future 😃
@Mahmoudalgindy · 27 days ago
Hmm, this means I have a lot more to learn. Thanks so much. Joining the data world has me learning every minute of my life 😫
@josephjoestar995 · 28 days ago
I was looking forward to your video on this! We do have a use case where we want to perform some analysis on our data; it'll be interesting to see if DBRX can do this for us.
@alexischicoine2072 · 29 days ago
I'm curious how expensive it would be to have a custom DBRX model, and what the advantages are versus a RAG approach. One nice thing with RAG is that it's much easier to control access to proprietary information: if you don't put it in the prompt, the model doesn't have it. With a custom model it might leak information to the wrong users, so you'd probably need a separate custom model for different permission levels.
@AdvancingAnalytics · 29 days ago
There will certainly be a tipping point where simply having a fine-tuned model on reserved capacity beats pay-per-token. If you're repeatedly throwing thousands of tokens at a model all day, it'll get fairly pricey. Definitely a huge amount of access/responsibility consideration when you're fine-tuning though. Sounds like a video idea 😅
@john-paulallard256 · 1 month ago
Your videos are great. I'm attempting a selective overwrite on Delta tables but can't get it to work with Unity Catalog, though I can with the Hive metastore. Any advice or pointers?
@jpcst · 1 month ago
Great and simple explanation! Thanks for sharing this!
@paulnilandbbq · 1 month ago
Nice, off to the browser to try the debugger!
@Dustii208 · 1 month ago
Just pronounce HIPAA as hip-ah, much easier. Thanks for the updates, these are so useful.
@vijayjavlekar · 1 month ago
Great thoughts. Good to know I wasn't the only one thinking in this direction. Wondering if we can go one step further and have only a two-tier architecture: 1. a Staging/Bronze layer, 2. a combined Silver/Gold layer, directly building star-schema Delta tables?
@DataMyselfAI · 1 month ago
Thanks Simon, love the news series :)
@allthingsdata · 1 month ago
The BROWSE permission didn't go down well with one of our teams, who fear that now everyone can see what they have and it's going to rain access requests.
@AdvancingAnalytics · 1 month ago
Hah - well the data enthusiast in me sees that as a good thing, people wanting to use and get value from the data! But if they don't want it discoverable, they don't /have/ to grant BROWSE access to any groups!
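For reference, a minimal sketch of what opting in looks like, assuming a Unity Catalog workspace and a hypothetical catalog and group name:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-defined in a Databricks notebook

# BROWSE lets a principal discover that objects exist (names, comments, lineage)
# without being able to read the data itself; simply don't grant it if you want
# the objects to stay hidden from browsing.
spark.sql("GRANT BROWSE ON CATALOG finance TO `data_discovery_users`")

# Reading the data still requires SELECT, which users would request separately:
# spark.sql("GRANT SELECT ON SCHEMA finance.reporting TO `data_discovery_users`")
```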
@Milhouse77BS · 1 month ago
Title says "Dec 2023 - Jan 2024". Maybe it was meant to end with "Apr 2024"?
@yawningbrain · 1 month ago
the LLM hallucinated :D
@AdvancingAnalytics · 1 month ago
Haha, copied the old title! Updated!!
@nicolas22199 · 1 month ago
Hey, great video. Quick question: when I'm doing the merge in Synapse using Python, I actually get duplicated rows, both the original and the updated one. Do you know why this is happening?
@rahulsood81 · 1 month ago
Can you explain the difference between using Auto Loader and Structured Streaming (readStream/writeStream)? Also, when/how should we use foreachBatch in Databricks?
@VladimirTheAesthete · 1 month ago
Is there a way to use Databricks Connect on clusters that do not have Unity Catalog enabled? I am using Runtime 13.3, if that is relevant.
@sivakumarkanagaraj · 1 month ago
Thank you. You solved a million-dollar question I had.
@petersandovalmoreno5213 · 1 month ago
Can we write to these volumes?
@zacharythatcher7328 · 1 month ago
Thanks for the update. I'm considering Fabric, but I'm having trouble understanding the consumption estimates Microsoft provides in their docs. In particular, they claim that "Every 4MB, per 10,000" will cost 306 CU seconds per read on OneLake. Do y'all have any idea what this means, or could you potentially bring it up with some of your contacts and push for clarification? I ask because, depending on how you interpret it, Fabric warehouse reads can cost 250x more than SQL serverless pools on Synapse, or 40x less.
@eivindhaugen8640 · 1 month ago
Many great announcements. From Arun's blog - the most underrated: External sharing and Metric layer (as mentioned).
@NickJe · 1 month ago
MGM Vegas to announce folders! 😂
@danhorus · 1 month ago
"Not folders. Probably subfolders." 🤣
@KangoV · 1 month ago
I'd love to be able to use this from Java. I have a TAXII server that provides access to cybersecurity data in Databricks. This potentially opens the door for lots of external services.
@sumashruthika7852 · 1 month ago
Great video!
@udayshuklabcp2782 · 1 month ago
Looks Exciting 🎉
@yuvakarthiking · 1 month ago
Hi Simon, I'm having trouble sending data to Domo using the pydomo lib. As I'm using an external location in UC as the source path, os.list and other os functions aren't able to read the files at the abfss path. Is there any solution to this?