How does query processing work in BigQuery?

Views: 45,210

A day ago

Comments: 22

@divertechnology 3 years ago

I am fascinated by the voice; there must have been a cleaning process or something that makes it sound so clear. I love that she gives deep information on how it works; that's how I understand something well.

@jaimemcarlosa 3 years ago

I 💙 Google BigQuery

3 years ago

The Execution details tab helped a lot. I had to refactor some legacy queries with more than 600 stages into a cleaner and more optimized version. Love your product, guys.

@googlecloudtech 3 years ago

Great to hear!

@dheer211 2 years ago

I remember seeing an internal architecture diagram for BQ that comprises shuffle, Dremel, and a networking fabric (sorry, forgot its name). Can someone point me to that Google blog or video? Thanks!

@ankitlakum1 3 years ago

Thanks 😊

@AvinashSingh-vj3rk 3 years ago

Nice video 👍

@RATANAGARWALITINFORMER 3 years ago

GOOD

@jaeseokpark6241 3 years ago

Wonderful!!!

@gabrieldjebbar7098 2 years ago

Great !

@arjunk5959 3 years ago

Nice info !!

@majorcemp3612 3 years ago

So ... in your example there is data skew because the wait avg is lower than the wait max and the read avg is lower than the read max? How would you improve that? (Adding slots? If yes, how many more?)

@leighajarett221 3 years ago

I would suggest trying to filter the data to get a more uniform distribution!

@majorcemp3612 3 years ago

@leighajarett221 there is already a WHERE-like filter, so what other filters would you use? Or what else would you use? Also, here the difference between avg and max is in the "wait" and "read" phases, so what could be the problem?

@leighajarett221 3 years ago

@majorcemp3612 To get it more uniform, you can try looking at the data to understand the distribution and then filtering the data to get rid of the "tail end" of the curve. For example, if I have a range of values from 0 to 100 and most of my rows fall in 95-100, this might overwhelm the slots that are processing data with those keys. Instead, you could try filtering the data to focus on just the subset of that information you need (e.g. I only care about values of 95 or above). But that might not always be possible, depending on the question you are asking. Alternatively, you can split this up into two different queries - one where you analyze the values 0-95 and another for 95-100 - so each has a more uniform distribution of that key. Hope that helps!
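The splitting idea in the reply above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: it does not reproduce how BigQuery actually assigns work to slots. It assumes rows are routed to equal-width range buckets (a stand-in for "the slots processing those keys"), and the helper names `bucket_loads` and `skew_ratio` are hypothetical. It measures skew the same way the avg-versus-max comparison in the question does: the busiest bucket divided by the average bucket.

```python
def bucket_loads(values, lo, hi, num_buckets):
    """Count rows per equal-width bucket over the inclusive range [lo, hi].

    Toy stand-in for work distribution; not BigQuery's real scheduling.
    """
    width = (hi - lo + 1) / num_buckets
    loads = [0] * num_buckets
    for v in values:
        idx = min(int((v - lo) / width), num_buckets - 1)
        loads[idx] += 1
    return loads


def skew_ratio(loads):
    """Busiest bucket divided by the average bucket; 1.0 means perfectly even."""
    avg = sum(loads) / len(loads)
    return max(loads) / avg if avg else 0.0


# Made-up data: one row for each value 0..94, 200 rows for each value 95..100.
values = list(range(0, 95)) + [v for v in range(95, 101) for _ in range(200)]

# One "query" over the whole range: the last bucket receives almost everything.
whole = bucket_loads(values, 0, 100, 5)

# Two "queries": the sparse range and the dense range, each bucketed on its own.
low = bucket_loads([v for v in values if v < 95], 0, 94, 5)
high = bucket_loads([v for v in values if v >= 95], 95, 100, 3)

print("whole range:", whole, round(skew_ratio(whole), 2))
print("low range:  ", low, round(skew_ratio(low), 2))
print("high range: ", high, round(skew_ratio(high), 2))
```

In this toy setup the single pass is heavily skewed toward the last bucket, while each of the two split passes is evenly loaded. Real queries will rarely split this cleanly, so treat it as an illustration of the reasoning rather than a tuning recipe.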

@shatakshiagrawal3062 1 year ago

@leighajarett221 great explanation!

@cslearner582 1 year ago

Great video. One question: is the 'slot' and 'worker' the same in this context?

@think-tank6658 10 months ago

Good question

@RazvanCristianLung 3 years ago

Streams? Why can't we delete newly added rows?

@leighajarett221 3 years ago

These videos take a few months to produce but don't worry, it's on my list!

@RazvanCristianLung 3 years ago

@leighajarett221 thank you

3 years ago

Just from the complexity of the architecture and the number of caches involved, it is reasonable to say that deleting recent data is a really expensive operation and can slow down the query mechanism as a whole. There are a few workarounds that I found to bypass this issue. Try using updated_at timestamps to get the most up-to-date version of a certain record, or use materialized views with the filtered data.
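The updated_at workaround mentioned above can be sketched in plain Python. This is a minimal illustration, not BigQuery code: it assumes an append-only table where every change is written as a new row, and the `is_deleted` flag, the `current_records` helper, and the sample rows are all hypothetical additions for the example. Instead of deleting a recent row, a newer row marks the record as deleted, and readers keep only the latest version per id.

```python
from datetime import datetime


def current_records(rows):
    """Return the latest row per id, skipping ids whose latest row is a delete marker."""
    latest = {}
    for row in rows:
        seen = latest.get(row["id"])
        # Keep whichever version of this id has the newest updated_at.
        if seen is None or row["updated_at"] > seen["updated_at"]:
            latest[row["id"]] = row
    # Drop records whose most recent version is flagged as deleted.
    return {rid: r for rid, r in latest.items() if not r["is_deleted"]}


# Made-up append-only history: id 1 is updated once, id 2 is later "deleted".
rows = [
    {"id": 1, "value": "a", "updated_at": datetime(2021, 1, 1), "is_deleted": False},
    {"id": 1, "value": "b", "updated_at": datetime(2021, 1, 2), "is_deleted": False},
    {"id": 2, "value": "c", "updated_at": datetime(2021, 1, 1), "is_deleted": False},
    {"id": 2, "value": "c", "updated_at": datetime(2021, 1, 3), "is_deleted": True},
]

result = current_records(rows)
print(result)
```

The trade-off is that readers pay for the "latest version" filter on every read, which is where a materialized view over the filtered data, as the comment suggests, could help.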