How are integers encoded in Apache Parquet?

767 views

Learn Data with Mark

Days ago

Comments: 9
@danieleden1856 1 year ago
Fantastic breakdown, thanks, Mark
@learndatawithmark 1 year ago
Glad you liked it!
@andreasnordbass 11 months ago
Awesome, hard to find good content on this topic!
@dominikseljan3043 11 months ago
Amazing video, Mark! Your explanation and visualisation of everything were so nice!
@learndatawithmark 11 months ago
Thanks! That's very kind of you :)
@pawarbi4675 1 year ago
Excellent! How do you use this in practice? Do you check the cardinality of each column and then choose the encoding before saving the Parquet file? If the schema is defined in Spark for each column before saving to Parquet, are we effectively doing the same thing?
@learndatawithmark 1 year ago
I'm not sure what Spark does, actually; I'd have to check. I still find it kinda surprising that Parquet writers don't just optimise everything for you; that would make more sense to me! I need to see how much the space saving affects the query side. In theory there should be a trade-off between the two, but I'm not sure how big it is.
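The "check the cardinality, then choose an encoding" idea from the question above can be sketched as a small rule of thumb for an integer column. This is a hypothetical heuristic, not what Spark or any Parquet writer actually does: the function name `suggest_int_encoding` and the `max_distinct_ratio` threshold are assumptions, and the only real things here are the Parquet encoding names `PLAIN`, `RLE_DICTIONARY` and `DELTA_BINARY_PACKED`.

```python
from typing import Sequence


def suggest_int_encoding(values: Sequence[int], max_distinct_ratio: float = 0.1) -> str:
    """Hypothetical rule of thumb for picking a Parquet encoding for an int column.

    The threshold and the delta-width test are illustrative assumptions only.
    """
    if not values:
        return "PLAIN"

    # Low cardinality: a small dictionary plus short RLE/bit-packed indices
    # is usually compact, so suggest dictionary encoding.
    distinct = len(set(values))
    if distinct / len(values) <= max_distinct_ratio:
        return "RLE_DICTIONARY"

    # Otherwise compare how many bits the consecutive differences need with
    # how many bits the raw values need; small deltas favour delta encoding.
    deltas = [b - a for a, b in zip(values, values[1:])]
    delta_bits = max((abs(d).bit_length() for d in deltas), default=0)
    value_bits = max(abs(v).bit_length() for v in values)
    if delta_bits < value_bits // 2:
        return "DELTA_BINARY_PACKED"

    return "PLAIN"


print(suggest_int_encoding([1, 2, 3] * 100))                    # repeated ids
print(suggest_int_encoding(list(range(1_000_000, 1_001_000))))  # increasing ids
```

In recent pyarrow versions, a `PLAIN` or `DELTA_BINARY_PACKED` suggestion could be passed to `pyarrow.parquet.write_table` through its `column_encoding` argument (which requires dictionary encoding to be switched off for those columns via `use_dictionary`), while `RLE_DICTIONARY` corresponds to leaving `use_dictionary` on; check the documentation for the version you use.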
@nmstoker 1 year ago
Thanks for the nice video. This makes sense where you have one or a few massive files, but if you've got a boatload of such files, is there a way to make the computer apply rules of thumb for you (so it scales as a process rather than having a person spend five minutes per file, thousands of times over)?
@learndatawithmark 1 year ago
Which bit in particular do you mean, or just in general? I reckon you could probably automate everything I did in this video to retrospectively look at a bunch of existing Parquet files and see whether there's a better way to store things. I definitely wouldn't recommend doing it manually!