OnehouseHQ

Comments

@yuweixiao1943 5 months ago

Will the existing data in Postgres be synced to Hudi too, or only the changes made since the stream was created?

@padam_discussion 6 months ago

Interesting video... great

@SoumilShah 8 months ago

great video

@onehouseHQ 8 months ago

Glad you enjoyed it!

@HoorayforOranges 9 months ago

Thank you so much for this. This is the only video I could find that takes a real deep dive into the data without propaganda towards any one candidate.

@JG-zu6nq 1 year ago

Mistake at 22:41: in Iceberg, there is no limitation that you "can't cross over the boundary" in a query after partition evolution.

@kjweller 1 year ago

You can cross the boundary, but the query predicates need to be right to get the same performance across both partition schemes.

@JG-zu6nq 1 year ago

@kjweller What exactly does that mean? One just has to write select * from table where ts > timestamp '2023-08-21 00:00:00', and even if the partitioning was evolved from, say, daily to hourly on 08/25, that query will work and prune the partitions.

@kjweller 1 year ago

@JG-zu6nq Take an example: say you were partitioning daily by date and you want to evolve this to partition by userId, or vice versa. A query with only one of the two predicates will be efficient only for the section of the data partitioned by that column. Partition evolution works great when moving between different granularities of the same column (e.g., daily to hourly on the same timestamp), but struggles when moving to a different column.
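To make the pruning argument in this exchange concrete, here is a toy Python model. It is not Iceberg's API or query planner: DataFile, ts_bounds, prune_by_ts, and prune_by_user are hypothetical names, and the file layout is invented for illustration. The assumption it encodes is that each data file keeps the partition spec it was written with, and a predicate can skip a file only when that file's spec is derived from the predicate's column.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class DataFile:
    # Hypothetical toy model: each data file remembers the partition spec that wrote it.
    spec: str  # "day" or "hour" (derived from ts), or "user" (identity on user_id)
    partition: Union[datetime, str]  # start of the day/hour bucket, or a user_id


def ts_bounds(f: DataFile) -> Optional[Tuple[datetime, datetime]]:
    """Half-open [start, end) ts range covered by a file's partition, or None if unknown."""
    if f.spec == "day":
        return f.partition, f.partition + timedelta(days=1)
    if f.spec == "hour":
        return f.partition, f.partition + timedelta(hours=1)
    return None  # a user_id partition says nothing about ts, so it cannot be skipped on ts


def prune_by_ts(files: List[DataFile], lower: datetime) -> List[DataFile]:
    """Keep only files that may hold rows matching ts > lower."""
    kept = []
    for f in files:
        bounds = ts_bounds(f)
        if bounds is None or bounds[1] > lower:
            kept.append(f)
    return kept


def prune_by_user(files: List[DataFile], user: str) -> List[DataFile]:
    """Keep only files that may hold rows matching user_id = user."""
    # Day/hour files carry no user_id information, so all of them must be scanned.
    return [f for f in files if f.spec != "user" or f.partition == user]


if __name__ == "__main__":
    daily = [DataFile("day", datetime(2023, 8, d)) for d in range(20, 25)]

    # Case 1: daily -> hourly on 08/25 (same column, finer granularity).
    hourly = [DataFile("hour", datetime(2023, 8, 25, h)) for h in range(3)]
    files = daily + hourly
    print(len(prune_by_ts(files, datetime(2023, 8, 21))), "of", len(files))
    print(len(prune_by_ts(files, datetime(2023, 8, 25, 1))), "of", len(files))

    # Case 2: daily -> user_id (different column).
    by_user = [DataFile("user", u) for u in ("u1", "u2")]
    files = daily + by_user
    print(len(prune_by_ts(files, datetime(2023, 8, 21))), "of", len(files))
    print(len(prune_by_user(files, "u1")), "of", len(files))
```

With the cutoff from the comment above, the ts query keeps 7 of 8 files, and with a cutoff of 2023-08-25 01:00 it keeps 2 of 8, skipping files from both the daily and the hourly spec. In the daily-to-user_id case, the ts query cannot skip either user_id file and the user_id query cannot skip any daily file, so each keeps 6 of 7.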

@paulfunigga 1 year ago

@kjweller What about schema evolution? Your article says that Hudi's schema evolution is only good with Spark SQL. What if I use Hudi with Trino? Will schema evolution be poor there? Also, does Hudi work well with Trino at all? In Trino's Slack channel they said that they prioritize Iceberg.

@paulfunigga 1 year ago

@kjweller Also, in your "which format to choose" section, why didn't you add another point: that Hudi's table services are managed, compared to Iceberg and Delta Lake? I think that's a big deal.
