Comments
@nickmillerable 3 days ago
A real cliffhanger.
@kirkwolak6735 3 days ago
Wow... 2024 is destroying me. My Bingo card has the following open squares:
- UFO Lands and Alien sniffs the President of the USA (inappropriately)
- Nikolay Samokhvalov switches away from PostgreSQL
- Michael Christofides comes out with a line of Hair Products
- Tom Lane releases a Thread Based version of PostgreSQL
If I get ANY one of those... I will have BINGO! LOL. With Love and Humor guys!
@rosendo3219 3 days ago
bro he is not joking
@DavidPechCZ 3 days ago
Hi, I heard this episode yesterday and I could apply it literally today (TOAST compression - a 2TB table with a single JSONB field). Just thanks, guys!
@NikolaySamokhvalov 3 days ago
you're welcome!
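For anyone who wants to look at the same thing on their own JSONB-heavy table, a minimal sketch (table and column names are hypothetical; SET COMPRESSION and pg_column_compression() require PG14+):

  -- Which compression method stored values actually use (NULL = inline / not compressed)
  SELECT pg_column_compression(payload) AS method, count(*)
  FROM events
  GROUP BY 1;

  -- Switch new writes of this column to lz4 (existing rows keep their old compression)
  ALTER TABLE events ALTER COLUMN payload SET COMPRESSION lz4;

  -- How much of the table lives in TOAST: total size vs main fork only
  SELECT pg_size_pretty(pg_table_size('events'))    AS table_incl_toast,
         pg_size_pretty(pg_relation_size('events')) AS main_fork_only;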
@LearningSFR 10 days ago
Can you guys do an episode on what an appropriate ratio of database administrators to servers is - i.e., what is humanly possible to manage? Especially in cloud environments or startup companies, the database requirements grow fast, but not many new DBAs are hired to maintain these databases at the same pace as the environment grows. I love the podcast. Keep up the good work.
@jocketf3083 15 days ago
Thanks for another great episode! We once ran out of space after a small application change. Because of (...) reasons we needed to have our temp storage limit set high. The application change altered a query in a way that made it take a very long time to finish, and the query slowly consumed temp storage space as it went along. Since the application kept kicking off new instances of that query, we ran out of space pretty fast! Captain Hindsight has a few lessons for us there, but at least the fix was easy. To be safe, we failed over to a standby replica and set the application's account to NOLOGIN. Once the application deployment had been rolled back, we unbanned the account. We then took our time to clone the database to our old primary and let it rejoin our Pgpool load balancer as a replica.
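A guardrail for exactly that failure mode, as a sketch (the values are arbitrary examples, not recommendations):

  -- Cancel any query whose temporary files exceed this per-process limit
  ALTER SYSTEM SET temp_file_limit = '50GB';
  -- Log every temp file over 100 MB so such queries show up early
  ALTER SYSTEM SET log_temp_files = '100MB';
  SELECT pg_reload_conf();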
@kirkwolak6735 17 days ago
Great stuff as usual. I believe everyone should have something monitoring free disk space and alerting at some low-disk threshold, extra early. It would have been nice to hear what "formulas" you guys tend to use - like at least 3x the daily WAL maximum, or some such. We've hit this in the past with another vendor, because someone left tracing on and a TON of log files were being produced, which filled the disk...
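One way to feed such a formula from SQL, as a sketch: check the current pg_wal size, and sample total WAL written to estimate the daily rate.

  -- Current size of the pg_wal directory
  SELECT pg_size_pretty(sum(size)) AS pg_wal_size FROM pg_ls_waldir();

  -- Total WAL ever written; sample this twice and subtract to get bytes per day
  SELECT pg_current_wal_lsn(),
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')) AS wal_written_total;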
@jirehla-ab1671 22 days ago
If I were to add a message queue for real-time OLTP database workloads, would that also introduce latency that makes the OLTP workload no longer real time? If so, then what's the point of message queues if they're going to be used for real-time OLTP database workloads?
@NikolaySamokhvalov 22 days ago
The whole point of a message queue is to "detach" some work, to make it async. This allows you to respond faster (lower latency) while still guaranteeing that the work will be done. And if this work is done very soon, it's almost real time. But you're right, there is a certain trade-off here, and this "detaching" usually makes sense when the system becomes more complex.
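A minimal sketch of that "detach" pattern inside Postgres itself (table and job fields are hypothetical), using FOR UPDATE SKIP LOCKED so many workers can pull jobs without blocking each other:

  CREATE TABLE job_queue (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    payload    jsonb NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    done_at    timestamptz
  );

  -- Producer: enqueue work inside the OLTP transaction (cheap, little added latency)
  INSERT INTO job_queue (payload) VALUES ('{"action": "send_email", "user_id": 42}');

  -- Consumer: grab one pending job; SKIP LOCKED keeps workers from blocking each other
  WITH next_job AS (
    SELECT id FROM job_queue
    WHERE done_at IS NULL
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
  )
  UPDATE job_queue q
  SET done_at = now()
  FROM next_job
  WHERE q.id = next_job.id
  RETURNING q.id, q.payload;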
@davidfetter 26 days ago
If that unix socket regression is real, it's very likely a bug. Also, the fact that there's a huge difference between the TCP version and the unix socket version suggests that there are improvements to be had in the listener code.
@deadok68 1 month ago
Hi guys, very proud of you and your creativity. Found this channel through some of Nikolay's podcasts from about 5 years ago and ended up here; I've already listened to almost 1/3 of the episodes. Thanks!
@RU-qv3jl 1 month ago
It would also be interesting to use different machines and try it out with the different connection pooling options. You would add latency for sure. Sadly I don’t have the credits to try something like that :(
@mehmanjafarov5432 1 month ago
Hi @NikolaySamokhvalov. I regularly listen to your podcasts, and I've been actively researching memory management topics in the documentation as well as various sources on Google. While doing so, I came across several misleading or inaccurate blog posts regarding certain cases. Therefore, I have a specific question: are vacuum buffers considered a form of local memory (similar to work memory) or shared memory? Thanks
@NikolaySamokhvalov 29 days ago
Shared (a 256 kB ring buffer)
@NikolaySamokhvalov 29 days ago
see also: BUFFER_USAGE_LIMIT and vacuum_buffer_usage_limit in PG16+
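A sketch of what that looks like in practice on PG16+ (the table name is hypothetical):

  -- One-off: let this VACUUM use a bigger ring buffer (0 would remove the limit entirely)
  VACUUM (BUFFER_USAGE_LIMIT '2MB', VERBOSE) my_table;

  -- Cluster-wide knob for VACUUM/ANALYZE and autovacuum (the old hard-coded ring was 256 kB)
  ALTER SYSTEM SET vacuum_buffer_usage_limit = '2MB';
  SELECT pg_reload_conf();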
@RU-qv3jl 1 month ago
Really good content as always, thanks for sharing your knowledge.
@Neoshadow42 1 month ago
Subtitles are incredible, thanks guys!
@poppop101010 1 month ago
great content thnx for the effort!
@kirkwolak6735 1 month ago
I loved this one! I love how PG allows the entire record to be easily encoded and stored. We implemented an audit feature like this in Oracle; it was way too much code. We stored the OLD and NEW record. When I saw how easy it was for a single table in PG... I started falling in love... For us, the table name and the timestamp were always attached. To answer the question "How do you show this to the manager?" (the record changes): assuming you stored the table_name and table_id columns with it, you would create a visible link pointing to that record if an updated record exists. And if it is a deleted record, you will either need to merge it into the results or show it NEAR the messages with a roughly similar timestamp. You don't have to show them - just show that they exist, with an easy way to get to them. FWIW, on day 1 in training we showed users that all of their edits were stored, and deletes were stored as well. We only had to recover a couple of times.
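A minimal sketch of such an audit trigger in Postgres (table and trigger names are made up), storing OLD and NEW as JSONB plus table name and timestamp:

  CREATE TABLE audit_log (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    table_name text        NOT NULL,
    op         text        NOT NULL,
    old_row    jsonb,
    new_row    jsonb,
    changed_at timestamptz NOT NULL DEFAULT now()
  );

  CREATE OR REPLACE FUNCTION audit_row_change() RETURNS trigger AS $$
  BEGIN
    IF TG_OP = 'INSERT' THEN
      INSERT INTO audit_log (table_name, op, new_row)
      VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
      RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
      INSERT INTO audit_log (table_name, op, old_row, new_row)
      VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
      RETURN NEW;
    ELSE  -- DELETE
      INSERT INTO audit_log (table_name, op, old_row)
      VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
      RETURN OLD;
    END IF;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER orders_audit
  AFTER INSERT OR UPDATE OR DELETE ON orders
  FOR EACH ROW EXECUTE FUNCTION audit_row_change();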
@drasticfred 1 month ago
I always add a reserve "flag column" to my tables, usually of type int, no matter what the table is for. It comes in very handy and gives flexibility to glue it to any other table, service, logic, etc.
@obacht7 1 month ago
Thank you for another nice episode! I like that you started out with a very gentle introduction to what the topic is about, why it is important, and what the main issues are related to Postgres. In some of the past episodes, I was sometimes a bit lost because I couldn't follow your deep knowledge quickly while not knowing enough about the Postgres-specific challenges/internals myself. So thanks for setting the stage a bit for the beginners and Postgres-"foreigners" (pun intended) 👍
@NikolaySamokhvalov 1 month ago
Thanks, I needed to hear this. Passed it on to Michael too. I think we'll do it more - basics at the beginning of an episode.
@LearningSFR 1 month ago
Awesome work. I would love to hear more about logical replication on highly intensive workloads (a master node with hundreds of databases x 1 replication slot per database).
@jianhe5119 1 month ago
At 1:10:35: I use tmux, and when I select text with the mouse, it automatically copies the text, so I don't have the "search with Google" option.
@chralexNET 1 month ago
In a personal project I am making, I am trying to build a backend and database where foreign keys aren't the default (without thinking) mechanism to use in all cases; I wrote a comment about that for episode 69. I think this video validates a lot of what I am experimenting with, but I definitely think using foreign keys is okay on tables that you know have very low activity, because it reduces the complexity of the application code, which otherwise has to handle relationships not being guaranteed to be valid. In the end what I'll end up with is something where foreign keys aren't used, where hard deletes without cascades are used on the root ("parent") tables, and where the application will clean up data during regular maintenance and do the full database vacuum during maintenance. It will work well for my project, because it will have daily maintenance with downtime, where I am aiming for that to be 5 minutes or less. It is a personal project, so 99.99% uptime isn't a concern of mine, but performance during the advertised operational hours of the system is important. The backend is basically just infrastructure for some game servers for old games, and the thing about these old games is that they get more unstable the longer they run. So the user-facing application (the game server) will have downtime, and I am just using that downtime window for my backend and database as well.
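A sketch of the kind of maintenance cleanup that stands in for ON DELETE behavior when FKs are omitted (table and column names are hypothetical):

  -- Delete child rows whose parent is gone, in bounded batches so transactions stay short
  WITH doomed AS (
    SELECT s.ctid
    FROM game_sessions s
    WHERE NOT EXISTS (SELECT 1 FROM players p WHERE p.id = s.player_id)
    LIMIT 10000
  )
  DELETE FROM game_sessions s
  USING doomed
  WHERE s.ctid = doomed.ctid;
  -- Repeat until it deletes 0 rows, then VACUUM during the maintenance window.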
@NikolaySamokhvalov 1 month ago
Thanks for sharing your experience. Worth noting: the "heavy loads" I mention all the time are quite rare - say, starting at 10k TPS. Below that, I would use FKs without hesitation.
@chralexNET 1 month ago
@NikolaySamokhvalov What I am getting from this is experience in building a system that can work without relying on foreign keys, just as one thing. It actually tries to do a lot of the things you have been talking about on your streams, even if it is on a small scale. Most importantly, it will give me experience building this sort of system, both the application side and the database side, but the most important part is on the application side, because it changes what kind of code should be written.
@chralexNET 1 month ago
@NikolaySamokhvalov Uhm, I had to edit my comment because I misread what you wrote - I thought you wrote "worth nothing", but you wrote "worth noting", so I went on a bit of a tangent. Sorry about that; you can just forget what I wrote before, unless that is actually what you meant. And it is a good point that I should only expect benefits at the higher TPS.
@NikolaySamokhvalov 1 month ago
@chralexNET No worries. My comment was my own worry that when I talk about edge/corner-case problems, I forget to mention that to hit those problems you need to grow your workloads quite high. So it might provoke a false impression like "FKs are really bad" - that's something I wouldn't want to happen. They are good. It's just that really heavy loads are challenging, and edge cases are not good :)
@RU-qv3jl 1 month ago
I mean, I think that the benefits of partitioning are obvious. I also think that there are a lot of people who don't know the internals and won't think about it. And with partitioning it is worth cautioning not to go too far: by default the planner will only reorder, I think, 8 tables or something like that? So too many partitions can lead to worse plans as you run into the genetic optimiser more quickly, right? I think that would also be worth discussing (says me, just part way through the episode) :) Another really nice chat by the way, thanks. I always like hearing your thoughts.
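The limits being referred to, as a sketch of how to inspect and adjust them (defaults in recent Postgres versions: join_collapse_limit and from_collapse_limit are 8, geqo_threshold is 12):

  -- Joins are reordered exhaustively only up to this many relations
  SHOW join_collapse_limit;
  SHOW from_collapse_limit;   -- same idea for flattening subqueries
  -- Above this many relations the genetic query optimizer (GEQO) takes over
  SHOW geqo_threshold;

  -- Example: allow a larger join search space for one complex reporting session
  SET join_collapse_limit = 16;
  SET geqo_threshold = 20;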
@rosendo3219 1 month ago
gratz boys on episode 100! always listening to you in my car while driving to my boring work
@bhautikin 1 month ago
One of the best episodes!
@dshukertjr 1 month ago
Congrats on episode 100! Sorry if this has been covered in past episodes, but I would love to know more about the following:
1. Why do you seem to discourage using foreign keys?
2. It seemed like all three of the companies rarely perform joins within their databases - do they perform joins in the application layer instead? Is it common for large-scale databases not to join within the database?
@NikolaySamokhvalov 1 month ago
FKs are great and I personally use them everywhere. However, they have risks: 1) perf. overhead required to maintain them (that's ok usually), 2) perf. cliff related to multixact IDs - going to demonstrate soon with the PostgresAI bot.
@utenatenjou2139 1 month ago
At large scale, having (constraint) foreign keys makes managing data really difficult. Under a complex structure, imagine when there is data that needs to be corrected. Note: for small data sets, no problem.
@anbu630 1 month ago
Congrats on your 100th episode!! Watched your 1st one, and here's the 100th one as well :-)
@JamesBData 1 month ago
Congrats on reaching episode 100!
@RajanGhimiree 2 months ago
Can you guys make a complete episode on logical replication, from configuration to replicating data from a source server to a replica server?
@davidcarvalho2985 2 months ago
Okay, you guys convinced me. I will try pgbadger. Thanks for this interview by the way. Really nice
@kirkwolak6735 2 months ago
So, I was wondering... wouldn't it be nice if there were 2-3 types of plans based on some of the parameter values, so you get the most optimal plan, and maybe the optimizer does parameter peeking to determine which of the X plans to choose... And then I realized: wow... the application could do this. Create 3 prepared statements for the same query and execute against the one TUNED for the query parameter types, forcing the best plan to be used by design... Hmmm... We have this situation. We have a complicated search, but the value we are searching for can be small (lots of hits) vs large (few hits). It wants to choose the wrong plan after a few queries and then a switch. Unfortunately, this is inside of a procedure where the statement is prepared around us. We would basically have to duplicate the complex query just to add the condition so that it executes the right way. But I might still try that.
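Something close to this already exists via plan_cache_mode (PG12+); a sketch, with a hypothetical table and prepared statement:

  -- By default (auto) the planner may switch to a generic plan after ~5 executions;
  -- forcing custom plans makes it look at the actual parameter values every time
  SET plan_cache_mode = force_custom_plan;   -- alternatives: auto, force_generic_plan

  PREPARE item_search(text) AS
    SELECT * FROM items WHERE name = $1;
  EXECUTE item_search('rare-sku-123');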
@kirkwolak6735 2 months ago
Yes, you should test with your extensions. You should have a few general procedures you run that exercise all of the extensions. And you should monitor log sizes, in case something is going wrong and it's only visible in the log files. I like using htop in Linux and watching how much memory the various processes are using, and the total, in case memory consumption has changed... this can lead to issues. Also, read the documentation for the release. YES, it is good documentation, but it can feel a bit overwhelming because they document so much...
@Marekobi 2 months ago
This is gold !! :)
@pdougall1 2 months ago
Can y'all talk about the best way to think about adding indexes? What is the problem when adding too many on a table, for instance? Or when to reach for one when a query is slow? Confounding factors when there are other queries using the same column (not sure that's relevant). I'm sure there is a lot to consider that is just unknown unknowns for me.
@NikolaySamokhvalov 2 months ago
hey Patrick - have you listened to episode "068 Over-indexing"?
@pdougall1 2 months ago
@NikolaySamokhvalov I have not, but I definitely will. It also looks like there's one on under-indexing as well! Might be exactly what I'm looking for, thanks!
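One quick check in the over-indexing area, a sketch using the standard pg_stat_user_indexes view (stats are cumulative since the last stats reset, and unique/PK indexes still enforce constraints even with zero scans):

  SELECT schemaname,
         relname      AS table_name,
         indexrelname AS index_name,
         pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
         idx_scan
  FROM pg_stat_user_indexes
  WHERE idx_scan = 0
  ORDER BY pg_relation_size(indexrelid) DESC;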
@kirkwolak6735 2 months ago
Michael, thank you for sticking to your guns to get your explanation out there. There is a subtle difference in the AUDIENCE you two seem to be addressing. Nikolay seems not to care about launching a long-running query... because when he sits down, he likely either knows he has a problem already, OR he's got such deep experience in PG that he knows to check a few things before he starts pounding out a query. I believe he implies this when he talks about how he adds the LIMIT based on what he is expecting (e.g., when he might be wrong, he will do a LIMIT 2 and let the error guide him). Whereas you were (IMO) coming from the perspective of a novice (like me) who *thought* that just adding a LIMIT was *always* a decent safety approach. And my understanding was limited to (LIMIT + ORDER BY = red flag). Your point goes deeper than that. So now I realize the correct formula is: (LIMIT + (ORDER BY | index range scan) = red flag). Meaning the optimizer might be doing what looks like a simple range scan on some column, but it is orthogonal to the data being found, and can quickly become a semi-seq-scan (find the first row with the index, then seq-scan in reverse until the number of records hits the limit... which may never happen, making it scan to the beginning/end). Those are two wildly different target audiences. And I could be completely wrong - it's my guess. Of course I look up to both of you, so I apologize if I misstated your positions!
@michristofides 2 months ago
Thank you Kirk, for the kind words and the wonderful summary! I think you're spot on, and am glad to hear it was helpful
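A sketch of the pattern Kirk describes (hypothetical table and indexes):

  -- Suppose only an index on (created_at) exists
  SELECT *
  FROM events
  WHERE account_id = 42
  ORDER BY created_at DESC
  LIMIT 10;
  -- The planner may walk the created_at index backwards, discarding rows that don't match
  -- account_id, expecting to hit 10 matches "soon"; if account 42 has few or no recent rows,
  -- it can end up scanning most of the index. A composite index removes the risk:
  CREATE INDEX CONCURRENTLY events_account_created_idx ON events (account_id, created_at);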
@pdougall1 2 months ago
Y'all are great! It's really important to hear professional DB people talking about how all of this works in practice, beyond the basic explanations that can be found in books (books are also really important, btw).
@hamzaaitboutou8563 3 months ago
more of this please <3
@iury0x58 3 months ago
Great content, guys! Binging the channel
@iury0x58 3 months ago
Thank you for this content. Very nice
@davidfetter 3 months ago
I just love the way this episode captured the processes that actually go into doing the thing! BTW, the repository for the web site is, as far as I know, also a git repository, and I suspect that rebase requests--NEVER use merge--would be easier to get into it than patches sent to the -hackers mailing list for the core code would be.
@keenmate9719 3 months ago
Looking forward to this one... paging and limits - it's like naming and cache invalidation :-)
@nishic1 3 months ago
Wow. Excellent video. Very informative.
@nitish5924 3 months ago
What about massive updates? We recently had a use case where we have a Postgres database with 250 million rows and we introduced a new date column; we are facing so many issues backfilling this column today. It would be great if you could share your insights on how to handle such massive updates.
@NikolaySamokhvalov 3 months ago
It's very similar - batching is very much needed. Additional complexity comes from index write amplification - all indexes have to be updated (unlike for DELETEs), unless it's a HOT UPDATE.
@kirkwolak6735 2 months ago
@NikolaySamokhvalov Excellent point on indexing adding writes. I would certainly add the column, batch the updates, and only when the updates are finished would I consider adding the index on that column. Otherwise it feels like a footgun!
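A sketch of the batched backfill pattern described above (table and columns are hypothetical; the batch size is arbitrary):

  -- Adding the column itself is a cheap, metadata-only change
  ALTER TABLE orders ADD COLUMN shipped_on date;

  -- Backfill in batches driven by an external loop (psql \watch, a script, etc.),
  -- so each transaction stays short and autovacuum can keep up
  WITH batch AS (
    SELECT id
    FROM orders
    WHERE shipped_on IS NULL AND shipped_at IS NOT NULL
    ORDER BY id
    LIMIT 10000
    FOR UPDATE SKIP LOCKED
  )
  UPDATE orders o
  SET shipped_on = o.shipped_at::date
  FROM batch
  WHERE o.id = batch.id;
  -- Repeat until 0 rows are updated; create any index on shipped_on only after the backfill.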
@awksedgreep 3 months ago
What you need is UUIDs across maybe 25 tables with FKs between each, no ON DELETE CASCADE, and a need to keep the data from all 25 tables elsewhere (an archive schema). Getting <200 deletes a second after moving the data off to the archive schema. Had one set of deletes on a large table going for 20 days. Not speaking from experience or anything, haha.
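A sketch of moving rows to an archive schema and deleting them in one batched statement (schema, table, and column names are hypothetical):

  WITH doomed AS (
    SELECT id
    FROM public.orders
    WHERE created_at < now() - interval '2 years'
    ORDER BY id
    LIMIT 5000
  ),
  moved AS (
    DELETE FROM public.orders o
    USING doomed
    WHERE o.id = doomed.id
    RETURNING o.*
  )
  INSERT INTO archive.orders
  SELECT * FROM moved;
  -- Also make sure every referencing (FK) column is indexed; without that, each parent
  -- delete has to scan the child table, which is a common cause of very slow deletes.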
@TimUckun 4 months ago
Why doesn't Postgres have unsigned integers? Also, why doesn't timestamp with time zone actually store the time zone of the timestamp? If my app is crossing time zones, I really want to know the time zone of the writer.
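On the second question - timestamptz normalizes everything to UTC internally, so if the writer's zone matters it has to be stored alongside; a sketch (the table is hypothetical):

  CREATE TABLE user_events (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    occurred_at timestamptz NOT NULL DEFAULT now(),                 -- absolute instant (UTC inside)
    writer_tz   text NOT NULL DEFAULT current_setting('TimeZone')   -- e.g. 'Pacific/Auckland'
  );

  -- Show the instant in the writer's own local time
  SELECT occurred_at AT TIME ZONE writer_tz AS writer_local_time, writer_tz
  FROM user_events;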
@wstrzalka 4 months ago
Love your podcast. It's fun to listen to for both advanced and basic topics, as something new always pops up. And the attached articles list makes waiting for the next episode more bearable 🐘
@chralexNET 4 months ago
One thing about timestamps: in my current project I only use timestamp without time zone, because I always save UTC for all my dates and times. If I do that, I guess there are no problems with using timestamp without time zone then? Edit: Okay, reading more of the wiki article, they suggest not doing that, because they think it is going to give you issues if you do calculations with other timestamps that have time zones - but I am never going to do that, because all my dates and times will be in UTC.
@kirkwolak6735 4 months ago
I switched away from that, and one of the reasons was DST.
@chralexNET 3 months ago
@kirkwolak6735 I don't know why that would cause any issues. UTC is not affected by DST. What a client application does when it receives a timestamp is work with that timestamp in UTC in its logic, and for displaying the timestamp it explicitly converts it to the client's local time by adding the time zone offset to the timestamp.
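For reference, a small sketch of how the two types differ (the values are purely illustrative):

  SET TimeZone = 'UTC';
  -- timestamptz: the input offset is honored, then the value is stored as an absolute instant
  SELECT '2024-03-31 01:30:00+02'::timestamptz;   -- 2024-03-30 23:30:00+00
  -- timestamp (without time zone): the offset is silently ignored
  SELECT '2024-03-31 01:30:00+02'::timestamp;     -- 2024-03-31 01:30:00
  -- Displaying in a client's zone without manual offset math (DST handled automatically):
  SELECT now() AT TIME ZONE 'Europe/Copenhagen' AS copenhagen_local;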
@ilyaportnov181 4 months ago
Comparison of timestamps is quite a nontrivial topic due to time zones :)
@andherium 4 months ago
Would love it if you guys could make a video on schemas and roles
@dartneer 4 months ago
a lot of gold bits here! Thank you guys..👏💪
@PostgresTV 4 months ago
thanks! keep watching - and let us know if there are ideas (there is a doc: docs.google.com/document/d/1PNGSn_d0A7gTR4C0p6geQyHOTgdpitKCRvafCNnT-44/edit)
@sanity_equals_nil 4 months ago
It would be interesting to hear about how caching is implemented in Postgres.
@PostgresTV 4 months ago
Good topic - added to the list of ideas: docs.google.com/document/d/1PNGSn_d0A7gTR4C0p6geQyHOTgdpitKCRvafCNnT-44/edit#
@maudrid 4 months ago
You mentioned some pitfalls with trigram. Do you have links to any media that goes into more detail?
@NikolaySamokhvalov 4 months ago
Well, the bottom of the docs for pg_trgm already gives an idea that things are not easy and require significant effort:
a) it's too low-level (it cannot be used, for example, with full text search to have fast correction of typos without creating an additional table and a dance around it);
b) under heavy loads and for large volumes - GIN fastupdate, pending lists - this will become harder and harder to tune to avoid performance issues (same as for any other GIN indexes in general).
So these days, I would perhaps consider some fast LLM for proper typo corrections.
@PostgresTV 4 months ago
also, what the bot says about it: postgres.ai/chats/018e9250-abff-73fd-af40-1b06ad17919d // Nik
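For anyone who wants to experiment with what's described above, a minimal pg_trgm sketch (the table is hypothetical):

  CREATE EXTENSION IF NOT EXISTS pg_trgm;

  -- fastupdate = off avoids the GIN pending-list growth mentioned above, at the cost of slower writes
  CREATE INDEX CONCURRENTLY words_trgm_idx
    ON words USING gin (word gin_trgm_ops) WITH (fastupdate = off);

  -- Typo-tolerant lookup via trigram similarity
  SET pg_trgm.similarity_threshold = 0.3;
  SELECT word, similarity(word, 'postgress') AS sim
  FROM words
  WHERE word % 'postgress'
  ORDER BY sim DESC
  LIMIT 10;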