Amazing video Mark, your explanation and visualisation of everything were so nice!
@learndatawithmark 11 months ago
Thanks! That's very kind of you :)
@pawarbi4675 a year ago
Excellent, how do you use this in practice? Check the cardinality of each column and then choose an encoding before saving the Parquet file? And if a schema is defined in Spark for each column before saving to Parquet, are we effectively doing the same thing?
@learndatawithmark a year ago
I'm not sure what Spark does, actually; I'd have to check. I still find it kind of surprising that the Parquet writers don't just optimise everything for you, that would make more sense to me! I also need to see how much the saving on space impacts the query side. In theory there should be a trade-off between the two, but I'm not sure how big it is.
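If you want to try the cardinality-check idea yourself, something along these lines is a rough starting point. It's only a sketch with pyarrow; the columns and the 50% threshold are made up for illustration. The idea is to count the distinct values in each column and only dictionary-encode the low-cardinality ones:

import pyarrow as pa
import pyarrow.parquet as pq

# Toy table: 'country' is low cardinality, 'user_id' is high cardinality
table = pa.table({
    "country": ["UK", "US", "UK", "FR"],
    "user_id": ["a1", "b2", "c3", "d4"],
})

# Arbitrary rule of thumb: treat a column as high cardinality if more
# than half of its values are distinct
threshold = 0.5
high_card = [
    name for name in table.column_names
    if len(table.column(name).unique()) / table.num_rows > threshold
]

# Only dictionary-encode the low-cardinality columns
pq.write_table(
    table,
    "example.parquet",
    use_dictionary=[c for c in table.column_names if c not in high_card],
)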
@nmstoker a year ago
Thanks for the nice video. This makes sense where you have one or a few massive files, but if you've got a boatload of such files, is there a way to make the computer apply rules of thumb for you (so it scales as a process rather than having a person spend five minutes per file thousands of times over)?
@learndatawithmark a year ago
Which bit in particular do you mean, or just in general? I reckon you could probably automate everything I did in this video to retrospectively look at a bunch of existing Parquet files and see if there's a better way to store things. I definitely wouldn't recommend doing it manually!
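As a rough sketch of what that automation could look like (assuming pyarrow, with a made-up directory path): read the footer metadata of every file and print the encodings and compressed size of each column chunk, which is usually enough to spot files worth rewriting.

import pathlib
import pyarrow.parquet as pq

# Walk a directory of Parquet files and report what's in the footer metadata
for path in pathlib.Path("data/").glob("*.parquet"):
    meta = pq.ParquetFile(path).metadata
    print(path.name)
    for rg in range(meta.num_row_groups):
        for col in range(meta.num_columns):
            chunk = meta.row_group(rg).column(col)
            print(f"  {chunk.path_in_schema}: encodings={chunk.encodings}, "
                  f"compressed={chunk.total_compressed_size} bytes")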