Thank you for the awesome video. I guess PostgreSQL also supports some form of scan sharing. For each relation, it keeps track of the starting location (implemented in ss_get_location()) from where a scan should begin. It tries to synchronize multiple scans on the same relation.
@marcoq71604 жыл бұрын
0:15 DJ Drop Tables has some problems 0:48 Administrivia 1:18 Database workloads 1:47 Bifurcated environment: OLTP data silos -> Extract-Transform-Load (ETL) -> OLAP data warehouse 3:17 HTAP shifts some of the analytical queries to the OLTP side 3:59 Database storage 4:54 Spatial vs Temporal control 5:48 Disk-oriented DMBS 6:49 Today's agenda: buffer pool manager, replacement policies, other memory pools 7:22 Buffer pool organization: array of fixed-size frames 9:11 Buffer pool meta-data: page table, dirty flag, pin counter 12:30 Locks (for DB's logical contents) vs latches (for internal structures, mutex) 15:19 Page table (in-memory) vs Page directory (on disk) 16:30 Allocation policies: global vs local policies 17:53 Buffer pool optimizations: multiple buffer pools, pre-fetching, scan sharing, buffer pool bypass 18:42 Multiple buffer pools (reduce latch contention and improve locality) 21:26 Multiple buffer pools: 1) object ID, 2) hashing 22:58 Pre-fetching (based on a query plan) 28:24 Scan sharing 32:55 Where do intermediate query results go? 35:07 Reminder: relational model is unordered 36:05 A question from student 36:44 Buffer pool bypass 38:21 OS page cache (use direct I/O, O_DIRECT to bypass it) 40:13 Postgres demo 55:06 Why don't other systems use OS page cache as Postgres does? (hard to guarantee consistent behavior across different OSs) 58:19 Buffer replacement policies 1:00:15 Least-recently used (LRU) 1:01:00 Clock (approximation of LRU) 1:03:36 Sequential flooding 1:05:23 Better policies: LRU-K (estimates the interarrival times of references on a page by page basis) 1:06:21 Better policies: localization (evict on a per txn/query basis) 1:07:10 Better policies: priority hints 1:08:39 Dirty pages (trade-off) 1:10:10 Background writing 1:11:06 Other memory pools 1:11:20 Conclusion 1:11:47 Project #1 1:12:45 Task #1 - clock replacement policy 1:13:37 Task #2 - buffer pool manager 1:14:23 Getting started
@luisponce35804 жыл бұрын
beast
@anuragshah343311 ай бұрын
Why did query performance not improve post moving every tuple to buffer pool at 49:13? Time stamp of query run with tuples being fetched from disk - 43:37.
@linziye77145 жыл бұрын
precious resource for database! Thank you CMU!
@AndersonSilva-dg4mg5 жыл бұрын
Hooray! new video. Thankyou very much!
@anikethdeshpande83362 жыл бұрын
@4:33 ---> which is the new hardware where we can push the execution logic to disk ?
Thank you making available these high-quality lectures and programming assignments. Was wondering if there's any chance you could release the grading scripts for previous offerings of the course?
@andypavlo4 жыл бұрын
We're working on it.
4 жыл бұрын
Yes. I love to. This is an another great resource for this amazing course.
@AshishNegi16184 жыл бұрын
@Vipul Bhardwaj i am also interested. But i think concern for Andy is that online discussion forum will cause solutions to leak. we can privately discuss in our emails.
@prathibhapb3 жыл бұрын
22:50 - How does Approach #2 (Hashing) help if each database has its own buffer pool?
@zhenghanghu24302 жыл бұрын
I think the point is if you don't have the one-buffer-pool-per-table set up, then you can use hashing to find out the buffer pool each page belongs to. If the one-buffer-pool-per-table set up is applied, then you don't need hashing. (Or, you can have multiple buffer pools for each table, that way hashing can be useful.) We also see in the lecture we can have a buffer pool for each query, that case we don't need hashing neither.
@ljratner5 жыл бұрын
Slide 16 is missing from the PDF.
@drevil7vs134 жыл бұрын
30:58 I wonder if Oracle supports cursor sharing for 2 queries with the same sql_id but with different bind variable values
@moazeldefrawy43793 жыл бұрын
It probably does? since the values for each row are stored together (aka .. it's a row database)
@yashwinenamadi16734 жыл бұрын
Why does any os allow DBMS to maintain its own memory i.e., the buffer pool? DBMS is a mere process running on the os. Can't the buffer pool be swapped out of memory by the os to accommodate other processes' memory? Or is there a hardware support for this buffer pool dedicated to be managed by a DBMS software?
@meamzcs3 жыл бұрын
Sure the OS can but that's not that DBs job to worry about. The OS does its own bookkeeping to determine what memory pages it swaps and if it's the buffer pool that's swapped out that means most likely every other page in memory is better to have in memory than the buffer pool... And once the DBMS accesses the buffer pool again you will get a page fault anyway which will lead to the OS pulling it back into memory...
@sebastiannyberg3008 Жыл бұрын
The memory used by the buffer pool is no different from any other virtual memory given by sbrk. The buffer pool memory can be "swapped out" to a swap partition to avoid OOMkill, but not written out by the OS like pages from a file opened without O_DIRECT. Note that with a DBMS you usually don't want swap partitions due to the hard-to-debug performance degradation.
@sebastiannyberg3008 Жыл бұрын
Note that the Linux kernel used to separate the buffer cache from the page cache, but the two were merged some ways back. The OS obviously needs to bring in the blocks from disk, and does some (minimal) caching even when using O_DIRECT. You can read more about the O_DIRECT flag in the open(2) syscall manpage
@williamweiler4493 жыл бұрын
How do multiple buffer pools reduce latch contention if threads still must contend for latches on the same page?
@rofamohamed23983 жыл бұрын
Each buffer pool has a page table related to it, so you have multiple buffer pools and also multiple page tables, which will reduce the latch contention due to we increased the number of page tables. Hope i cleared it enough.
@williamweiler4493 жыл бұрын
@@rofamohamed2398 Ah, so latches are placed on the page table, not the buffer page itself? thanks for the reply
@rofamohamed23983 жыл бұрын
@@williamweiler449 Yeah, latches are placed on the page table, to prevent concurrent access to the position that will hold the pointer to the buffer pool frame, which will be helpful when we need to get a specific page to know which frame the page resides in and that's the job of page table (to know the positions of the pages inside the buffer pool).
@himanshuhbk9535 жыл бұрын
Can non-CMU students submit these assignments ?
@jcdyer34 жыл бұрын
Probably not, that's asking for a big donation of unpaid work from the course staff, but you can do them yourself, and then research the answers to check if you were right.
@drakenguyen23904 жыл бұрын
thank you for this high quality content. This is a long shot but is there anyway we can have a solution for the project? Like sample code for for eg.
@bisakhmondal83713 жыл бұрын
Coolest Prof. ever. Thanks for the awesome contents. Btw who is "tk"(he/she is in every video)!! XD
@sheikhmuhammedshoaib11244 жыл бұрын
Can we get project source code for reference?
@alielzahaby33152 жыл бұрын
Should I implement the assignments when im listening online or is it additional thing?
@alielzahaby33152 жыл бұрын
Like would it matter that much with understanding the course?
@willw25963 жыл бұрын
I'm not buying the benefit of pre-fetching, that assumes the pages on the disk are prearranged to be adjacent for the particular query type. In real life, pages are allocated in random in regarding to the ordering of the data. The scanning of the index pages looks dubious. B+Tree pages are only allocated when split, which happens randomly based on the data inserted. I mean you can say this big chunk of pages have all the index pages and load all of them at one shot, but so can mmap to load the big chunk of pages at one shot.
@arhyth5 жыл бұрын
do distributed databases also have buffer pools? i would assume they don't. as the communication cost of keeping members in sync with each other defeats the use of buffer pools.
@adamjaffe17125 жыл бұрын
I think each node in the distributed database will manage its own buffer pool. There is no need to synchronize it since the buffer pool is an optimization to avoid disk access - it doesn't affect the correctness of the underlying data.
@AshishNegi16184 жыл бұрын
currently in this lecture we are studying single machine database internal details. Different machines don't share memory and hence any of buffer pools etc data structures. Buffer pools is way to access 100 GB database file with 1 GB ram - like virtual memory - without higher layers knowing about it.
@prathibhapb3 жыл бұрын
Distributed databases can still have node local buffer pools as they have some data storage in each node.