05 - Buffer Pools + Memory Management (CMU Databases Systems / Fall 2019)

Рет қаралды 42,790

Күн бұрын

Пікірлер

@kuntaliem 5 жыл бұрын

Thank you for the awesome video. I guess PostgreSQL also supports some form of scan sharing. For each relation, it keeps track of the starting location (implemented in ss_get_location()) from where a scan should begin. It tries to synchronize multiple scans on the same relation.

@marcoq7160 4 жыл бұрын

0:15 DJ Drop Tables has some problems 0:48 Administrivia 1:18 Database workloads 1:47 Bifurcated environment: OLTP data silos -> Extract-Transform-Load (ETL) -> OLAP data warehouse 3:17 HTAP shifts some of the analytical queries to the OLTP side 3:59 Database storage 4:54 Spatial vs Temporal control 5:48 Disk-oriented DMBS 6:49 Today's agenda: buffer pool manager, replacement policies, other memory pools 7:22 Buffer pool organization: array of fixed-size frames 9:11 Buffer pool meta-data: page table, dirty flag, pin counter 12:30 Locks (for DB's logical contents) vs latches (for internal structures, mutex) 15:19 Page table (in-memory) vs Page directory (on disk) 16:30 Allocation policies: global vs local policies 17:53 Buffer pool optimizations: multiple buffer pools, pre-fetching, scan sharing, buffer pool bypass 18:42 Multiple buffer pools (reduce latch contention and improve locality) 21:26 Multiple buffer pools: 1) object ID, 2) hashing 22:58 Pre-fetching (based on a query plan) 28:24 Scan sharing 32:55 Where do intermediate query results go? 35:07 Reminder: relational model is unordered 36:05 A question from student 36:44 Buffer pool bypass 38:21 OS page cache (use direct I/O, O_DIRECT to bypass it) 40:13 Postgres demo 55:06 Why don't other systems use OS page cache as Postgres does? (hard to guarantee consistent behavior across different OSs) 58:19 Buffer replacement policies 1:00:15 Least-recently used (LRU) 1:01:00 Clock (approximation of LRU) 1:03:36 Sequential flooding 1:05:23 Better policies: LRU-K (estimates the interarrival times of references on a page by page basis) 1:06:21 Better policies: localization (evict on a per txn/query basis) 1:07:10 Better policies: priority hints 1:08:39 Dirty pages (trade-off) 1:10:10 Background writing 1:11:06 Other memory pools 1:11:20 Conclusion 1:11:47 Project #1 1:12:45 Task #1 - clock replacement policy 1:13:37 Task #2 - buffer pool manager 1:14:23 Getting started

@luisponce3580 4 жыл бұрын

beast

@anuragshah3433 11 ай бұрын

Why did query performance not improve post moving every tuple to buffer pool at 49:13? Time stamp of query run with tuples being fetched from disk - 43:37.

@linziye7714 5 жыл бұрын

precious resource for database! Thank you CMU!

@AndersonSilva-dg4mg 5 жыл бұрын

Hooray! new video. Thankyou very much!

@anikethdeshpande8336 2 жыл бұрын

@4:33 ---> which is the new hardware where we can push the execution logic to disk ?

@andypavlo 2 жыл бұрын

pages.cs.wisc.edu/~jignesh/publ/SmartSSD.pdf

@rodypar317 2 жыл бұрын

kzbin.info/aero/PL5Q2soXY2Zi-Mnk1PxjEIG32HAGILkTOF

@roly4301 4 жыл бұрын

Thank you making available these high-quality lectures and programming assignments. Was wondering if there's any chance you could release the grading scripts for previous offerings of the course?

@andypavlo 4 жыл бұрын

We're working on it.

4 жыл бұрын

Yes. I love to. This is an another great resource for this amazing course.

@AshishNegi1618 4 жыл бұрын

@Vipul Bhardwaj i am also interested. But i think concern for Andy is that online discussion forum will cause solutions to leak. we can privately discuss in our emails.

@prathibhapb 3 жыл бұрын

22:50 - How does Approach #2 (Hashing) help if each database has its own buffer pool?

@zhenghanghu2430 2 жыл бұрын

I think the point is if you don't have the one-buffer-pool-per-table set up, then you can use hashing to find out the buffer pool each page belongs to. If the one-buffer-pool-per-table set up is applied, then you don't need hashing. (Or, you can have multiple buffer pools for each table, that way hashing can be useful.) We also see in the lecture we can have a buffer pool for each query, that case we don't need hashing neither.

@ljratner 5 жыл бұрын

Slide 16 is missing from the PDF.

@drevil7vs13 4 жыл бұрын

30:58 I wonder if Oracle supports cursor sharing for 2 queries with the same sql_id but with different bind variable values

@moazeldefrawy4379 3 жыл бұрын

It probably does? since the values for each row are stored together (aka .. it's a row database)

@yashwinenamadi1673 4 жыл бұрын

Why does any os allow DBMS to maintain its own memory i.e., the buffer pool? DBMS is a mere process running on the os. Can't the buffer pool be swapped out of memory by the os to accommodate other processes' memory? Or is there a hardware support for this buffer pool dedicated to be managed by a DBMS software?

@meamzcs 3 жыл бұрын

Sure the OS can but that's not that DBs job to worry about. The OS does its own bookkeeping to determine what memory pages it swaps and if it's the buffer pool that's swapped out that means most likely every other page in memory is better to have in memory than the buffer pool... And once the DBMS accesses the buffer pool again you will get a page fault anyway which will lead to the OS pulling it back into memory...

@sebastiannyberg3008 Жыл бұрын

The memory used by the buffer pool is no different from any other virtual memory given by sbrk. The buffer pool memory can be "swapped out" to a swap partition to avoid OOMkill, but not written out by the OS like pages from a file opened without O_DIRECT. Note that with a DBMS you usually don't want swap partitions due to the hard-to-debug performance degradation.

@sebastiannyberg3008 Жыл бұрын

Note that the Linux kernel used to separate the buffer cache from the page cache, but the two were merged some ways back. The OS obviously needs to bring in the blocks from disk, and does some (minimal) caching even when using O_DIRECT. You can read more about the O_DIRECT flag in the open(2) syscall manpage

@williamweiler449 3 жыл бұрын

How do multiple buffer pools reduce latch contention if threads still must contend for latches on the same page?

@rofamohamed2398 3 жыл бұрын

Each buffer pool has a page table related to it, so you have multiple buffer pools and also multiple page tables, which will reduce the latch contention due to we increased the number of page tables. Hope i cleared it enough.

@williamweiler449 3 жыл бұрын

@@rofamohamed2398 Ah, so latches are placed on the page table, not the buffer page itself? thanks for the reply

@rofamohamed2398 3 жыл бұрын

@@williamweiler449 Yeah, latches are placed on the page table, to prevent concurrent access to the position that will hold the pointer to the buffer pool frame, which will be helpful when we need to get a specific page to know which frame the page resides in and that's the job of page table (to know the positions of the pages inside the buffer pool).

@himanshuhbk953 5 жыл бұрын

Can non-CMU students submit these assignments ?

@jcdyer3 4 жыл бұрын

Probably not, that's asking for a big donation of unpaid work from the course staff, but you can do them yourself, and then research the answers to check if you were right.

@drakenguyen2390 4 жыл бұрын

thank you for this high quality content. This is a long shot but is there anyway we can have a solution for the project? Like sample code for for eg.

@bisakhmondal8371 3 жыл бұрын

Coolest Prof. ever. Thanks for the awesome contents. Btw who is "tk"(he/she is in every video)!! XD

@sheikhmuhammedshoaib1124 4 жыл бұрын

Can we get project source code for reference?

@alielzahaby3315 2 жыл бұрын

Should I implement the assignments when im listening online or is it additional thing?

@alielzahaby3315 2 жыл бұрын

Like would it matter that much with understanding the course?

@willw2596 3 жыл бұрын

I'm not buying the benefit of pre-fetching, that assumes the pages on the disk are prearranged to be adjacent for the particular query type. In real life, pages are allocated in random in regarding to the ordering of the data. The scanning of the index pages looks dubious. B+Tree pages are only allocated when split, which happens randomly based on the data inserted. I mean you can say this big chunk of pages have all the index pages and load all of them at one shot, but so can mmap to load the big chunk of pages at one shot.

@arhyth 5 жыл бұрын

do distributed databases also have buffer pools? i would assume they don't. as the communication cost of keeping members in sync with each other defeats the use of buffer pools.

@adamjaffe1712 5 жыл бұрын

I think each node in the distributed database will manage its own buffer pool. There is no need to synchronize it since the buffer pool is an optimization to avoid disk access - it doesn't affect the correctness of the underlying data.

@AshishNegi1618 4 жыл бұрын

currently in this lecture we are studying single machine database internal details. Different machines don't share memory and hence any of buffer pools etc data structures. Buffer pools is way to access 100 GB database file with 1 GB ram - like virtual memory - without higher layers knowing about it.

@prathibhapb 3 жыл бұрын

Distributed databases can still have node local buffer pools as they have some data storage in each node.