I am fascinated by the voice. There was a cleaning process or something that makes it sound so clear. I love that she gives deep information on how it works; that's how I understand something well.
@jaimemcarlosa 3 years ago
I 💙 Google BigQuery
3 years ago
The Execution details tab helped a lot. I had to refactor some legacy queries with more than 600 stages into a cleaner and more optimized version. Love your product, guys.
@googlecloudtech 3 years ago
Great to hear!
@dheer211 2 years ago
I remember seeing an internal architecture diagram for BQ that comprises shuffle, Dremel, and the networking fabric (sorry, I forgot its name). Can someone point me to that Google blog or video? Thanks.
@ankitlakum1 3 years ago
Thanks 😊
@AvinashSingh-vj3rk 3 years ago
Nice video 👍
@RATANAGARWALITINFORMER 3 years ago
GOOD
@jaeseokpark6241 3 years ago
Wonderful!!!
@gabrieldjebbar7098 2 years ago
Great!
@arjunk5959 3 years ago
Nice info!!
@majorcemp3612 3 years ago
So... in your example there is data skew because the wait avg is lower than the wait max and the read avg is lower than the read max? How would you improve that? (Adding slots? If yes, how many more?)
@leighajarett221 3 years ago
I would suggest trying to filter the data to get a more uniform distribution!
@majorcemp3612 3 years ago
@@leighajarett221 There is already a WHERE-like filter, so what other filters would you use? Or what else would you use? Also, here we take into account that the difference between avg and max is in the "wait" and "read" phases, so what could be the problem?
@leighajarett221 3 years ago
@@majorcemp3612 To get it more uniform, you can try looking at the data to understand the distribution and then filtering the data to get rid of the "tail end" of the curve. For example, if I have a range of values from 0 to 100 and most of my rows have 95-100, this might overwhelm the slots that are processing data with those keys. Instead, you could try filtering the data to focus on just the subset of that information you need (e.g. I only care about values of 95 or above). But that might not always be possible depending on the question you are asking. Alternatively, you can split this up into two different queries, one where you analyze the values 0-95 and the other 95-100, so each has a more uniform distribution of that key. Hope that helps!
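A minimal sketch of the idea above, in plain Python rather than BigQuery SQL: count rows per key to see the distribution, then split the rows at a threshold so each subset is more uniform. The sample rows, the helper names `key_distribution` and `split_by_threshold`, and the threshold of 95 are illustrative assumptions, not anything taken from the video.

```python
from collections import Counter


def key_distribution(values):
    """Count how many rows fall on each key value."""
    return Counter(values)


def split_by_threshold(values, threshold):
    """Split rows into (below threshold, at or above threshold)."""
    low = [v for v in values if v < threshold]
    high = [v for v in values if v >= threshold]
    return low, high


# Hypothetical skewed data: keys 0-94 appear once each,
# while keys 95-100 appear 50 times each.
rows = list(range(0, 95)) + [95, 96, 97, 98, 99, 100] * 50

dist = key_distribution(rows)
low, high = split_by_threshold(rows, 95)

# Each subset now has a roughly uniform number of rows per key,
# so it could be analyzed by its own query.
print("rows per key, low subset:", set(key_distribution(low).values()))
print("rows per key, high subset:", set(key_distribution(high).values()))
```

In BigQuery itself the same inspection would presumably be a `GROUP BY` count on the key, followed by two queries with complementary `WHERE` conditions.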
@shatakshiagrawal3062 1 year ago
@@leighajarett221 great explanation!
@cslearner582 1 year ago
Great video. One question: are 'slot' and 'worker' the same in this context?
@think-tank6658 10 months ago
Good question
@RazvanCristianLung 3 years ago
Streams? Why can't we delete newly added rows?
@leighajarett221 3 years ago
These videos take a few months to produce but don't worry, it's on my list!
@RazvanCristianLung 3 years ago
@@leighajarett221 thank you
3 years ago
Just from the complexity of the architecture and the number of caches involved, it is reasonable to say that deleting recent data is a really expensive operation and can slow down the query mechanism as a whole. There are a few workarounds that I found to bypass this issue. Try using updated_at timestamps to get the most up-to-date version of a certain record, or using materialized views with the filtered data.
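A rough sketch of the `updated_at` workaround, shown on in-memory Python dictionaries rather than a real BigQuery table: instead of deleting a recently added row, append a newer version and keep only the latest one per record when reading. The `id` and `status` fields, the sample rows, and the `latest_versions` helper are hypothetical; only the `updated_at` idea comes from the comment.

```python
from datetime import datetime


def latest_versions(records):
    """Keep only the most recent version of each record, by updated_at."""
    latest = {}
    for rec in records:
        current = latest.get(rec["id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest[rec["id"]] = rec
    return latest


# Hypothetical append-only rows: record 1 was written twice.
records = [
    {"id": 1, "status": "new", "updated_at": datetime(2021, 5, 1, 10, 0)},
    {"id": 2, "status": "new", "updated_at": datetime(2021, 5, 1, 10, 5)},
    {"id": 1, "status": "cancelled", "updated_at": datetime(2021, 5, 1, 11, 0)},
]

current = latest_versions(records)
for rec_id, rec in sorted(current.items()):
    print(rec_id, rec["status"], rec["updated_at"].isoformat())
```

A "deleted" record can then be represented by a newer row carrying a status flag, which readers filter out after picking the latest version.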