05 - Columnar Databases & Compression (CMU Intro to Database Systems / Fall 2022)

20,495 views

CMU Database Group

1 day ago

Comments: 22

@sontungnguyen4527 1 year ago

Sound quality is much better in this one compared to previous lectures. Thank you!

@Tryagainbtf2 2 years ago

At 31:20, it is mentioned that nobody uses cubes anymore. The company (ActiveViam) I work at produces an in-memory column store paired with an aggregation engine/data cube, supporting real-time transactions. It is a very niche product, but it does still exist and fill a specific need. Loving the lecture! Cheers

@Azxsdwrrbh 2 years ago

A graduated CMU student comes back to listen to the newly added lecture.

@foxl9195 2 years ago

Do you do databases after graduation? :)

@Azxsdwrrbh 2 years ago

@foxl9195 I took 15645 at school.

@foxl9195 2 years ago

I mean, do you develop databases these days?

@OOpSjm 11 months ago

@foxl9195 This is an intro to the database course.

@foxl9195 2 years ago

1:10:50 On incremental encoding: Why does Andy say you don't need to replay the whole thing? If we want to decode the word "robbing", we have the "ing" suffix and the length of the prefix, 4, but to get that prefix we first need to decode "robbed", and so on until we have decoded all the values back to the word that doesn't share a common prefix with the previous word. In this example that's "rob".

@monfera 2 years ago

One technique with differential (delta) encoding is to periodically restate the full value (not necessarily in the same physical column, as it may need more bits) so there's a latch-on point. It's a bit like how video compression does a bunch of delta compression between frames but also periodically restates the full frame (that's lossy encoding, so it also keeps errors from accumulating too much).

@monfera 2 years ago

I just see here it's a different technique, the prefix can be referenced from the Common Prefix table, so there's no need to start from the beginning. It's not super clear to me how it saves space. Maybe it uses a tree structure with structural sharing for representing the strings in the Common Prefix table, or it relies on better compressibility of the two derived vectors compared to the original vector (esp. if there's exact repetition of the prefix, often the case if there are lots of words in alphabetical order). Just guessing though
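
The question in this thread can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the scheme from the lecture: each value stores the length of the prefix it shares with the previous value plus its own suffix, and a full value is restated every `restart_every` entries, which is the "latch-on point" idea mentioned above. The function names and the `restart_every` parameter are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode(words, restart_every=4):
    """Incremental encoding: (shared prefix length, suffix) per word.

    Every `restart_every`-th entry is stored in full (prefix length 0),
    so decoding never has to look further back than that entry.
    """
    out = []
    prev = ""
    for i, w in enumerate(words):
        k = 0 if i % restart_every == 0 else common_prefix_len(prev, w)
        out.append((k, w[k:]))
        prev = w
    return out


def decode_at(encoded, i, restart_every=4):
    """Decode only entry i by replaying from the nearest restart point."""
    start = (i // restart_every) * restart_every
    value = ""
    for k, suffix in encoded[start:i + 1]:
        value = value[:k] + suffix  # keep k chars of the previous value
    return value


words = ["rob", "robbed", "robbing", "robot"]
enc = encode(words)
print(enc)                # (prefix length, suffix) pairs
print(decode_at(enc, 2))  # replays "rob" and "robbed" to rebuild "robbing"
```

Without restart points, decoding entry i does require walking back to the last fully stored value, as the original comment argues. The restart interval bounds that walk at the cost of storing some values uncompressed.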

@建平許 2 years ago

Wow! The new chapter.

@olegpatraschku3736 1 year ago

At 55:53 and later on, I think the id column didn't skip id 5 on purpose?

@haitaoyang-gf5fk 8 months ago

About the bit-packing algorithm: how does the data system know the max value of a column? Does it scan the whole column?
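
One plausible answer, sketched under assumptions rather than taken from the lecture: the values are already in hand when a block of a column is compressed, so the encoder can take one pass over that block to find the maximum and derive the bit width from it. The helper names below are hypothetical, and the sketch assumes non-negative integers.

```python
def bit_width(values):
    """Bits needed for the largest value in this block (at least 1)."""
    return max(1, max(values).bit_length())


def pack(values, width):
    """Pack values back to back into one big integer, `width` bits each."""
    word = 0
    for i, v in enumerate(values):
        word |= v << (i * width)
    return word


def unpack_at(word, i, width):
    """Random access: read value i without unpacking the others."""
    return (word >> (i * width)) & ((1 << width) - 1)


block = [13, 191, 56, 92, 81, 120, 231, 172]
w = bit_width(block)            # one scan over the block at encode time
packed = pack(block, w)
print(w, unpack_at(packed, 6, w))
```

Computing the width per block rather than per column means a single large outlier only widens the block that contains it.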

@allencheri9286 2 years ago

OLTP, row store. OLAP, column store. Which store is used by HTAP? A hybrid store?

@stackunderflow5951 2 years ago

Typically they store the OLTP data in the row store component, which is asynchronously replicated into the column store component. Something identifies the category of each query (OLTP or OLAP) and routes it to the appropriate component.

@manmohanmundhraa3087 1 year ago

Can small tables with, say, fewer than 5 columns replace a columnar database?

@rainwave5 1 year ago

51:50 Someone wasn't paying attention in earlier lectures 😄

@juan-tj1xf 1 year ago

hIT IT!

@sasuke_2145 2 years ago

Can somebody explain this part: kzbin.info/www/bejne/p2W6ZqVpfLdjhbc ? I didn't quite get the part about fixed-length values in a column. How does stitching the columns back together work when fetching an entire row?

@mephistotel87 1 year ago

Same here. You can't just jump to this offset, because columns are stored in pages, which are unordered and scattered around the database file. I still don't understand this part.
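
One way the two ideas could fit together, as a sketch under assumptions and not a description of any particular system: if every value in a column has the same size, tuple i maps to a logical page number and a slot by plain arithmetic, and a per-column page directory translates the logical page number to wherever that page physically lives. `VALUES_PER_PAGE`, the dictionary layout, and the function names are all hypothetical.

```python
import struct

VALUES_PER_PAGE = 4  # hypothetical; real pages hold far more values


def build_column(values, fmt, physical_ids):
    """Split a column into fixed-size pages stored under arbitrary physical ids."""
    size = struct.calcsize(fmt)
    pages, directory = {}, []
    for n, start in enumerate(range(0, len(values), VALUES_PER_PAGE)):
        chunk = values[start:start + VALUES_PER_PAGE]
        pid = physical_ids[n]          # pages may be scattered in any order
        pages[pid] = b"".join(struct.pack(fmt, v) for v in chunk)
        directory.append(pid)          # logical page n -> physical page pid
    return {"fmt": fmt, "size": size, "pages": pages, "directory": directory}


def read_value(col, i):
    """Locate tuple i's value by arithmetic plus one directory lookup."""
    logical_page, slot = divmod(i, VALUES_PER_PAGE)
    page = col["pages"][col["directory"][logical_page]]
    return struct.unpack_from(col["fmt"], page, slot * col["size"])[0]


def stitch_row(columns, i):
    """Rebuild row i by reading the same tuple index from every column."""
    return tuple(read_value(c, i) for c in columns)


ids = build_column(list(range(100, 110)), "<i", [7, 2, 9])
prices = build_column([1.5 * k for k in range(10)], "<d", [30, 10, 20])
print(stitch_row([ids, prices], 6))
```

The physical page ids are deliberately out of order here to show that the offset arithmetic only needs the logical order recorded in the directory, not physically contiguous pages.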