Your channel has great potential; it already has its own style. I hope you keep up the momentum, and I will be watching, meaning I find your videos useful and interesting 😁 Thanks.
@apalrdsadventures2 жыл бұрын
Thanks! I'm really enjoying these projects, and sharing them with you all
@gustersongusterson41202 жыл бұрын
Thanks for the video, I really appreciate the practical step by step explanation. Looking forward to more ceph videos!
@apalrdsadventures2 жыл бұрын
Thanks! I have a video on custom rules (forcing HDD/SSD/NVMe for a pool) and Erasure Coding next, which is the first step beyond what the Proxmox GUI can provide on its own. Proxmox *just* added Erasure Coding support on their end in February, so AFAIK it's not even in the subscription branch yet and not in the GUI either
@mikebakkeyt Жыл бұрын
Excellent content thank you. Ceph scares me but I will get there at some point hopefully. Really like your editing which removes all the whitespace which too many others leave in.
@bluesquadron5932 жыл бұрын
Man, super enjoyed this presentation of CEPH
@apalrdsadventures2 жыл бұрын
Thanks! I've really enjoyed working with Ceph, but it's just too much content for a single video. This should at least be enough to get started.
@bluesquadron5932 жыл бұрын
@@apalrdsadventures Yeah, I didn't get into it much at all, just set it up from some YouTube tutorial (not much of it) and I'm enjoying it in my three node cluster. My drives are NVMe, so I get decent speed moving VMs around.
@apalrdsadventures2 жыл бұрын
That would certainly speed things up! I filmed a segment on how to set a crush rule to force pools to a specific device type, and also on creating and adding erasure coded pools to Proxmox, so those will make the next episode.
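For anyone who wants a preview of the device-class part, it's roughly the following two commands (rule and pool names here are just examples, so check them against the docs for your Ceph release):
# create a replicated rule that only picks OSDs carrying the "ssd" device class
ceph osd crush rule create-replicated replicated_ssd default host ssd
# point an existing pool at the new rule (data rebalances onto matching OSDs afterwards)
ceph osd pool set vm_pool crush_rule replicated_ssd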
@rimonmikhael2 жыл бұрын
You got it man.. I read hyperconverged and I was like damnnnnn lol 😆.. $250 for what we pay millions for between Azure, AWS and 3 data centers.. that's gotta be awesome 👌 👏 but it's still fun to watch, thank you
@apalrdsadventures2 жыл бұрын
It's awesome that it works at all, but definitely way too many corners cut in storage bandwidth to make it usable for anything real
@ewenchan1239 Жыл бұрын
LOVE this video! Thank you! I'm just getting around to setting up my 3-node HA Proxmox Cluster with ceph and this video is TREMENDOUSLY helpful.
@Chris_Cable2 жыл бұрын
If anyone is getting the message "Error ENOENT: all mgr daemons do not support module 'dashboard', pass --force to force enablement" when trying to enable the dashboard, running apt install ceph-mgr-dashboard on all the nodes in the cluster fixed it for me.
@marconwps8 ай бұрын
Proxmox 8.2.2: I tried it and it doesn't work. I hope it gets integrated in the next release of the software.
@twincitiespcmd2 жыл бұрын
Really appreciate the step by step detail on CEPH as it relates to Proxmox and an inexpensive home lab setup. Looking forward to future videos!
@apalrdsadventures2 жыл бұрын
Thanks!
@twincitiespcmd2 жыл бұрын
@@apalrdsadventures I have started to look at your other videos. They're great! Technical step-by-step how-tos for the home lab without breaking the budget. I really like that you have content on Proxmox. One minor suggestion: it would be great if you could enlarge your screen when you are typing commands so we can follow along with what you are doing. Thanks for all you're doing!
@apalrdsadventures2 жыл бұрын
Some programs are definitely harder to capture than others, it's something I'm trying to improve as I get better at the production side
@JosephJohnson-sq4bu Жыл бұрын
Just stumbled into this absolute gem. Thank you for the incredible content
@funkijote8 ай бұрын
So youtubers are just out here hyperconverging their hardware for views now? Disgusting! (Thanks, this was very helpful)
@kelownatechkid6 ай бұрын
Fun video! It's cool that Proxmox makes Ceph available to new users in a stripped-down way. I've found it to be excellent for home use; since it has none of the limitations of traditional NAS solutions, it allows the kind of random cobbled-together setups that are common outside of enterprise. CephFS in particular is an absolute godsend, and as a whole Ceph is among the most reliable software projects I've ever used. Issues have always been possible to work out and I've never lost any data despite hardware failures. I've changed and added parts, altered CRUSH configs, and upgraded across major versions without any downtime too (from 15->16->17 over the years). It's real FOSS too (I've had some small PRs merged), and despite the IBM happenings, it actually feels like the community aspect is still growing
@MarkConstable2 жыл бұрын
I'm about to jump into Ceph so I watched this one again and really appreciate your Ceph coverage. We anxiously await the next PROMISED instalment. Heh, no pressure 🙂
@apalrdsadventures2 жыл бұрын
I actually just reinstalled everything with the latest versions (PVE 7.3 / Ceph 17) two days ago, for an episode on more diverse pools (erasure coding, SSD/HDD mix, different failure domains, tiered caching ...). Actually started filming already! CephFS is still on the horizon though.
@MarkConstable2 жыл бұрын
@@apalrdsadventures I am really looking forward to this next one. I should have 4 nodes ready to go tomorrow, if Mr. Bezos is on time. Three of them with a pair of 2TB SSDs... a Minisforum HM90, a TerraMaster F2-423 and a QNAP TS-453D, all with a pair of 2.5GbE NICs. A bit of a mongrel cluster but it's at least all-flash and should be good enough to get me through 2023.
@apalrdsadventures2 жыл бұрын
I'm actually planning on talking about mixing SSDs and HDDs too, so no need for all flash. It's a super well supported use case with Ceph.
@MarkConstable2 жыл бұрын
@@apalrdsadventures The other problem I am trying to solve is that my 10-year-old HP Microserver running PBS barely gets to 30 MB/s, so if I reboot a VM with 2+ TB of storage it can take 10+ hours to rebuild its dirty-bitmap.
@apalrdsadventures2 жыл бұрын
I actually have an HP Microserver too, and it's faster than any of my thin-client-based nodes... so my setup will be a lot slower than yours. Maybe the VM doesn't need such large disks, and could mount CephFS (or RGW or RBD) on its own? In general I do separate network mounts within the VM for 'bulk' data and then deal with data replication of those separately from Proxmox
@fbifido2 Жыл бұрын
@3:49 - what speed NIC should be used for public & private? Which one gets the most traffic?
@tomschi94854 ай бұрын
*Thanks a lot for sharing your know-how! Your videos are great.*
@mistakek2 жыл бұрын
Great video. I'm going to have to watch this a few times. You really go into great detail on Proxmox, which is exactly what I've been looking for.
@apalrdsadventures2 жыл бұрын
Glad you like it! I have another episode in the works on erasure coded pools, but it's just too long for one video
@dn44192 жыл бұрын
This was really helpful. Great explanation and just the right amount of detail for me. Thank you very much!
@apalrdsadventures2 жыл бұрын
Glad it was helpful!
@fbifido2 Жыл бұрын
@6:23 - do you have to install the Ceph dashboard on each host? what if pve1 goes down, would you still have access to the Ceph-dashboard?
@vinaduro2 жыл бұрын
I was eagerly awaiting this video. 🙂
@apalrdsadventures2 жыл бұрын
Hopefully it lived up to your expectations
@vinaduro2 жыл бұрын
@@apalrdsadventures it might actually have made my life more complicated, because now I'm considering changing my Proxmox cluster to Ceph, instead of ZFS on top of Truenas. Both have pros and cons, I guess I need to figure out what's best for my situation. Although, considering it's a home lab, every situation is my situation. lol
@apalrdsadventures2 жыл бұрын
The real benefit to Ceph is that you get redundancy at the host level in storage. With TrueNAS and ZFS you get drive failure redundancy, but not host failure. Proxmox clustering already has host level redundancy in compute, but if the storage isn't redundant then it becomes a single point failure (and also a traffic bottleneck over the network, potentially). Realistically, host failure is actually 'host down for maintenance' in the home lab, and is a real thing that does happen more frequently than we'd like.
@vinaduro2 жыл бұрын
@@apalrdsadventures Yup, and the 'host down for maintenance' tends to cause a lot of complaints from the wife. I guess it's the same as trying to decide how much redundancy you can afford in an array. Cost vs. convenience. This is the reason why we have our home labs though, so we can play around, and change our minds whenever we want.
@apalrdsadventures2 жыл бұрын
I still use TrueNAS + single node Proxmox for Home Assistant and keep as little as possible on that Proxmox box so experimentation doesn't break important things. While filming the thin client series I had a bunch of problems with the Proxmox host running out of RAM and the OOM killer killing off the security camera VM, but now I can film that sequence on the cluster
@JohnSmith-yz7uh2 жыл бұрын
You can use ZFS as a backend for Ceph. This way you get the best of both, but speed is not a priority in that setup, although that is true for ZFS in general
@apalrdsadventures2 жыл бұрын
You really can't use ZFS as a backend for Ceph. You can use ZFS as a backend for Gluster, since Gluster is a filesystem only and distributes files across the cluster to be stored on other filesystems. So ZFS underneath Gluster is a good idea. Ceph (with Bluestore) uses the raw disk, and you don't really gain any of the zfs benefits since Ceph already has all of those features on its own (data integrity checking, scrubs, data redundancy, snapshots) but get an extra layer of caching and read-modify-write. LVM to merge a few disks into a single OSD isn't a terrible idea, and LVM to split up an SSD into db disks is common, but LVM is way lighter than ZFS and isn't duplicating features that Ceph already has.
@JohnSmith-yz7uh2 жыл бұрын
@@apalrdsadventures hmm, I could have sworn I've seen a tutorial on it. Could have been Gluster; all I remember was that the ZFS pools had to have the exact same name on each cluster member
@apalrdsadventures2 жыл бұрын
With Gluster it's the recommended setup to use ZFS. Older versions of Ceph stored data in chunk files on top of another filesystem, so back then ZFS may have been recommended. With the newer backend ('bluestore') it's not recommended to have much if anything between the OSD and the disk. With Proxmox, you can do ZFS replication with a separate ZFS pool on every single node (all having the same name) and Proxmox can sync data across the cluster using zfs replication. But then you have to keep a copy of the VM disk on any Proxmox node which could potentially run the VM if it were to be migrated due to HA rules.
@goodcitizen4587 Жыл бұрын
Thanks, very good demo and presentation.
@apalrdsadventures Жыл бұрын
Glad you liked it!
@mikeguschke3 ай бұрын
Enjoyed your demonstration of the Ceph implementation. I'm looking into converting my ReadyNAS RNDP4000 to a storage server with multiple HDDs. How would you proceed with this scenario utilizing Proxmox, Ceph, etc.? How would you configure the enclosure?
@mzimmerman19882 жыл бұрын
Thanks!
@apalrdsadventures2 жыл бұрын
Thanks for the donation! Glad you enjoyed the video!
@cberthe067 Жыл бұрын
Do you plan to continue your video series on Ceph? Talking about CephFS, the Ceph balancer, Ceph healing, etc...
@curtisjones87952 жыл бұрын
Love your channel. Thanks for the great video!
@apalrdsadventures2 жыл бұрын
Glad you enjoyed it!
@enkaskal2 жыл бұрын
outstanding experiment! thanks for sharing 😀👍🏆
@PCMagikHomeLab2 жыл бұрын
Great vid! Nice to see you again in a new project :)
@apalrdsadventures2 жыл бұрын
Glad you enjoyed it!
@SB-qm5wg Жыл бұрын
I didn't even know prox had a ceph wizard. Cool 👍
@stuarttener6194 Жыл бұрын
I currently use a TrueNAS server with ZFS (an old IBM x3650 "M1" or 7979) well fortified with 48GB of RAM, 8 SAS drives, dual Ethernet ports, and dual Xeon 5500 series CPUs. It works rather well but uses a lot more watts than my 2 NUCs, which each have a 2TB SSD in them. The system is rather noisy as well (though I aim to put all my x3650 servers in a rack in my garage anyway). I have read a lot about the overhead that Ceph can place on any sizable server, not to mention small home-lab-style servers (especially given the lightweight "servers" you used along with USB sticks, though my 2 NUCs have i7 CPUs, 32GB of RAM and SSDs). I would be interested to know what kind of overhead you observed with a VM (or more than one, if you tested that) running on each server, as juxtaposed against the overhead placed upon said "servers" by running Ceph as well. Thank you for your videos, some are quite interesting to me. Stuart
@apalrdsadventures Жыл бұрын
In general it's not as big of an overhead problem as a latency problem, in disk IO for the VMs. Every IOP has to go over the network to the 'first' OSD, and from there across the cluster network to the other OSDs involved in that PG. It's not so much that the work is significantly harder than say ZFS, but all of the network hops involved add latency. So random IO and synchronous IO performance tanks (vs ZFS), but high queue depth synchronous throughput is still fine until you run into disk or network bottlenecks. At least for sata/sas drives, for NVMe it's a bit of a different story, it's not particularly well optimized for NVMe even on high end hardware yet. With my USB drives I also have issues where Ceph faults OSDs for being too slow, because the USB drives are actually really slow.
@stuarttener6194 Жыл бұрын
@apalrdsadventures So it sounds like you are suggesting that if someone does not have a 10Gb or faster network to leverage for Ceph's cluster network, then Ceph is going to really hurt performance and have really bad latency? It seems like in my use case I am way better off keeping the TrueNAS SCALE ZFS NAS going and skipping Ceph.
@apalrdsadventures Жыл бұрын
It's not going to kill network performance, it's going to hurt random read/write disk performance of the VMs. Sequential and high queue depth IO is limited by network bandwidth. Spread out over a number of nodes with a number of OSDs and VMs the performance is quite good in aggregate, so the scalability is a lot better than a single node.
@stuarttener6194 Жыл бұрын
So if my use case is having a dozen VMs running, most of them sitting there doing little work each day (FreeIPA and pfSense do get used for routing and login authorization on my home lab LAN), and I have the VMs distributed across the 3 nodes, do you mean to suggest it will likely run okay? Or will it seem very slow and I'll end up moving back to TrueNAS for shared storage? I know it's a bit of a guess, just curious to know your thoughts.
@apalrdsadventures Жыл бұрын
I'd guess that neither of those do much disk activity and won't really care about somewhat slower disk IO. The VM will still do its own filesystem caching, so it would be more like running the VMs on a spinning drive (which tends to have poor IOPs but can have good sequential bandwidth). Using a shared network storage has the same effect, so it shouldn't be significantly worse until you start running out of network bandwidth (which Ceph will do sooner than NFS since it has to do network IO for replication). Try it and see how you like it.
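If you want rough numbers before committing the VMs to it, something like this from any node gives a quick feel for latency and throughput (the pool name is just an example; the write bench leaves its test objects behind for the read pass, so clean them up at the end):
# 30 seconds of 4K writes at queue depth 1 - a worst-case latency view
rados bench -p testpool 30 write -b 4096 -t 1 --no-cleanup
# sequential reads of the objects just written, then remove the bench objects
rados bench -p testpool 30 seq -t 16
rados -p testpool cleanup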
@subhobroto2 жыл бұрын
Great ceph videos! It would be awesome to learn how to replicate data between 2 separate ceph clusters for geographical data redundancy
@apalrdsadventures2 жыл бұрын
I'm currently working through a series of videos that cover one ceph cluster, erasure coding, CephFS, etc.
@subhobroto2 жыл бұрын
@@apalrdsadventures got it. Yeah - if you showed how to expose RBDs to systems that are not in Proxmox, that would be nice. Imagine I have a Proxmox HA cluster and wanted to expose a reliable (due to Ceph) volume to another machine (external to Proxmox, say a PC/Laptop/Raspi) on the same network. The issue I have with Proxmox's Ceph is that they are behind Ceph releases, which is fine if Ceph just exists to support Proxmox storage but not so great if my objective is to use Ceph itself.
@apalrdsadventures2 жыл бұрын
My plan was to expose CephFS to other systems, but RBD would be similar (using CephX to authorize new clients outside of Proxmox's automation would be the key bit). Proxmox is actually entirely up to date with Ceph's stable release (16.2.7), they don't use Debian's package repos for Ceph and have a deb repo just for up-to-date Ceph.
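The CephX bit is roughly the sketch below (client name, pool, and image are placeholders); you'd copy the resulting keyring plus the cluster's ceph.conf / monitor addresses to the external machine:
# create a key that can only use RBD images in one pool
ceph auth get-or-create client.external mon 'profile rbd' osd 'profile rbd pool=vm_pool' -o /etc/ceph/ceph.client.external.keyring
# on the external machine (with the keyring and ceph.conf in place), map an image with that identity
rbd map vm_pool/my-image --id external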
@thanhlephuong7687 Жыл бұрын
Thanks so much! I have spent a lot of time with Proxmox, but it only took me a few minutes to understand this.
@patrickjoseph34122 жыл бұрын
I have the Wyse 5010; it has a SATA port and will fit an SSD if you remove it from its case. The SSD I used was a Samsung 850 1TB.
@metafysikos13 Жыл бұрын
Hello dear Apalrd! Your awesome video guided me to set up a Ceph cluster inside my 3-node Proxmox cluster! Thank you very much for that! I have one question though. I'm using a separate 1TB NVMe disk in each Proxmox node just for Ceph. So my Ceph cluster is made of 3 OSDs, 3 monitors, 1 manager and 1 pool. I am also using a separate 10Gbit private LAN just for Ceph's cluster/private network. Ceph's public network is using the 1Gbit uplink of each Proxmox node. Everything works and I get no error messages whatsoever. But the "strange" thing is the read/write performance of Ceph. I was expecting something around 1GByte per second of maximum performance, but instead I'm getting 160MBytes per second for reads and writes when I benchmark Ceph. Is this normal? Also, when I use only the 1Gbit uplink for Ceph's public AND private network, Ceph's benchmark results are something like: - Reads: 150MBytes per second - Writes: 75MBytes per second. So, reads are about the same when using either the 1Gb uplink or the 10Gb LAN, and writes are doubled with the 10Gb LAN. I feel that something is not right here. P.S. I also tested the 10Gbit LAN network performance from node to node using iperf and I get 9.4Gbit of bandwidth and 1.1GByte of transfer per second. P.S.2 I am using a 10G switch which has a switching capacity of 320Gbit per second. P.S.3 Sorry for the long message! Have a good day and cheers!
@metafysikos13 Жыл бұрын
With a little research, what I understand is that data destined for Ceph storage is transferred through the public network, then to the cluster network. So, if your public network is 1Gbit, you won't get the read/write speed you expect from the 10Gbit private network. Maybe I got it all wrong, I don't know.
@apalrdsadventures Жыл бұрын
Your research is correct. The 'public' network is what Ceph clients use to access Ceph data. The 'cluster' network is used by Ceph to transfer data among itself - for replication, erasure coding, and to rebalance PGs. The Ceph Client isn't the user end client, it's whatever software is accessing the Ceph cluster (often a gateway for user traffic). A normal write has the client connect to the 'first' OSD in a PG via the public network, and that OSD will then connect to the rest of the OSDs involved in replication / erasure coding via the cluster network (so client -> OSD X via public, OSD X -> OSD Y, OSD Z via cluster). In this case, Proxmox (qemu) is the 'client', so it will access Ceph via the public network. So going to 10G cluster will speed up writes since they normally involve 3 transfers and 2 of them will go across the cluster network, but not reads, since there is still the initial access via the public link either way. The reason you see >100MB/s (the expected limit with gigabit) is since one of the three OSDs is on the local system, so a random access has a 1/3 chance of going to the same system as the test and not going over the network at all.
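For reference, the split itself is just two lines in the [global] section of /etc/pve/ceph.conf on a Proxmox cluster (the subnets below are made up, use your own), and the OSDs pick it up after a restart:
[global]
    public_network  = 192.168.1.0/24
    cluster_network = 10.10.10.0/24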
@metafysikos13 Жыл бұрын
@@apalrdsadventures I actually did it that way. I cleaned all ceph configuration from my 3 proxmox nodes and reconfigured it to use my 10G LAN as public and private network. I created my monitors and pools from scratch and now my benchmark results are way better. I get something like: - Reads: 1700MB/s - Writes: 650MB/s So now i have to simulate some tests so I can see if this performance is acceptable for my production environment between my web apps, desktop apps and databases. Dude thank you again so much! Keep up the good work!
@eherlitz2 жыл бұрын
Just what I was looking for, thank you! I mainly have the need for HA storage for containers like MinIO and logging from various machines (e.g. vm and docker), but where downtime of such storage is a no-go. I figured that ceph is a great match for this but I'd like to hear your opinion.
@apalrdsadventures2 жыл бұрын
Ceph can be a great match depending on how big you need to scale. At the small scale, it's a big pain to setup and expand. As you scale up, the benefits to using Ceph over really anything else become huge. But, it can certainly keep storage highly available to go with other HA compute solutions.
@mattblakely7036 Жыл бұрын
Is Ceph possible with 2 nodes, using a QDevice for quorum? I'm really interested in having 1 VM and 1 CT online 100% with zero downtime across 2 nodes in HA. This would allow the VM or CT to migrate without rebooting or starting up again.
@CJ-vg4tg9 ай бұрын
Hi there. Thanks for the detailed vids. Is there any way of installing the mgr dashboard on Proxmox 8?
@thinguyen937 Жыл бұрын
I got this error, please tell me how to fix it : Error ENOENT: module 'dashboard' reports that it cannot run on the active manager daemon: PyO3 modules may only be initialized once per interpreter process (pass --force to force enablement)"
@ernestoditerribile2 жыл бұрын
Very different scale of computing. We use Lenovo TruScale with 64 maxed-out ThinkSystem SR670 V2 systems, using Proxmox and Ceph, to have a reliable low-latency datacenter. Even €2,500,000 is not enough to buy everything in that datacenter. We only use Cisco, IBM (only for the supercomputers) and Lenovo.
@apalrdsadventures2 жыл бұрын
I'd love to work up to a larger scale, but working with Ceph at a small scale is still a ton of fun
@ernestoditerribile2 жыл бұрын
@@apalrdsadventures I got your video in my recommendations and thought it was good. I was surprised that people even try to use Ceph in a home environment, with such a cheap solution. It's also a good way to get young kids into networking by playing around with it, or for IT students to get into Proxmox, VMware, Ceph, and all kinds of different Linux distributions.
@apalrdsadventures2 жыл бұрын
I've certainly learned a lot about Ceph just by making this video, too! It's a great solution, even for small-medium sized data, and a lot of people probably overlook it due to the perceived complexity.
@ewenchan12393 ай бұрын
Just as a quick update to this video -- the command "ceph dashboard create-self-signed-cert" no longer works in Proxmox 8.2.4 with Ceph Quincy 17.2.
@apalrdsadventures3 ай бұрын
There has been a whole saga of the ceph dashboard breaking (entirely, not Proxmox-specific); the simple fix on the Proxmox side was to temporarily remove the self-signed-cert option, which required a Python dependency that didn't like running in the ceph manager (multiple sub-interpreters).
@ewenchan12393 ай бұрын
@@apalrdsadventures Yeah -- I remember that for the newer(est?) version of Ceph. But for Ceph Quincy 17.2 -- I just tried deploying it again last night and ran into the error at the `ceph dashboard create-self-signed-cert` step. FORTUNATELY, the error message also TELLS you how to generate, I think, the RSA 2048-bit key (or maybe it was an X.509 cert -- I don't remember off the top of my head now), and then how to import both the key and the cert for the dashboard. So at least I was able to copy-and-paste those instructions into my deployment notes, and then use them, and that worked. (I thought that they had patched that Python issue where it broke Ceph?) In any case, just wanted to bring this to your attention (and anybody else who might be using this video as a guide/tutorial on how to deploy Ceph on Proxmox). Thank you!
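For anyone else hitting this, the workaround I ended up with looks roughly like the commands below (the file paths, subject, and 10-year validity are arbitrary choices, and the exact steps may differ between Ceph releases):
# generate a self-signed cert manually instead of using create-self-signed-cert
openssl req -new -nodes -x509 -subj "/O=homelab/CN=ceph-dashboard" -days 3650 -keyout dashboard.key -out dashboard.crt
# hand both files to the dashboard module, then restart the module
ceph dashboard set-ssl-certificate -i dashboard.crt
ceph dashboard set-ssl-certificate-key -i dashboard.key
ceph mgr module disable dashboard
ceph mgr module enable dashboard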
@jaykavathe Жыл бұрын
I used your guide to set up my 3-node Ceph cluster. But somehow during a Proxmox update my cluster seems to have become corrupted/unresponsive. Can I wipe my nodes, reinstall Ceph and import my current OSDs/data? Any help or a video on re-importing Ceph would be hugely appreciated.
@johnwalshaw2 жыл бұрын
What are your thoughts about using Optane on each Proxmox ceph cluster host for DB and WAL disk? e.g. each host with 2TB NVME OSD plus 118GB Optane for DB and WAL.
@Breeegz2 жыл бұрын
In your small "hyperconverged" cluster, I remember you had a 2.5Gbit USB-Ethernet adapter. I built a slightly larger, more expensive version out of Lenovo Tiny's, and I'm feeling constrained. You mentioned that there's a performance benefit to having a "Private" Ceph network and a "Public" Ceph network, do you think that the trade-offs of adding a USB 3.1 Ethernet Adapter is worth that performance? I'm getting 50-70MB/s write, with some spikes to 150MB/s when I share the Public/Private. If I added a 1Gbit backend, what kind of performance gain could I see? What about adding a 2.5Gbit backend? For perspective, a 1Gbps backend would run me about $35, where the 2.5Gbit would cost me roughly $200. Basically, I should have bought full towers (optiplex's or equivalent) so I could add cheap NICs, bonded them, room for more drives/ect..
@apalrdsadventures2 жыл бұрын
I haven't tested my own cluster with a separate private/public network, but in general the public network is used for client access (Proxmox/Qemu -> Ceph) and the private is used to replicate from one OSD to another (Ceph -> Ceph). A single write requires one public and two private transactions (Qemu -> first OSD, first OSD -> second and third replicas), so it should theoretically see twice as much bandwidth in the standard 3 way replica config. Some of those transactions go straight to the local system OSDs and bypass the network, and some will go between the two other nodes on the network, so that's how you end up with the 50-70 MB/s write speed on a network that should be able to do ~110MB/s. So, it depends on how badly you need to improve the 50-70MB/s and how much bandwidth also you need for VM traffic, which in my setup is sharing the same 1G link. I did buy USB 2.5G NICs and they will be in a future Ceph video, so I had the same idea as you. I think the costs I anticipated were lower, but I'm not buying a new switch. For me, with USB flashdrives, they are slower than the network so it's not a major issue yet.
@Breeegz2 жыл бұрын
@@apalrdsadventures I would appreciate it if you ran a test on 1Gbit combined, 1+1Gbit and 1+2.5Gbit in that upcoming video. The write test I was doing was "dd if=/dev/random of=file1 bs=10M count=100", and then I would "rsync -ah --progress file1 file2", so I could easily run these two commands to take a second reading. As far as my 50-70 MB/s is concerned, I'm trying to squeeze out all I can from this cluster, and I can see how very very BIG Ceph is, so tuning it is difficult. If 70 MB/s is all that's expected, then I know I'm not missing a config or some sort of tuning: PGs, WAL, separate DB disk, etc. Everywhere I look, the consensus is: run it on dual 10Gbit links!! Dual 40Gbit links!! You will be sorry if you don't have at least a 100Gbit backend link!!!! ...then there's the one guy that says one 1Gbit is enough. I just want to know what I can expect so I can make the $0 decision, the $35 decision, or the $200 decision. (Yes, I would need a new switch to handle 2.5Gbit.)
@apalrdsadventures2 жыл бұрын
There's a big difference in what you need for a production network at scale (which assumes your scale is large enough to require Ceph) and what you need for Ceph to run at all. The recommendation for 10G/40G comes from a company selling boxes with up to a petabyte of storage each, so the smallest cluster they are considering is probably on the order of 100TB. Depending on the use of the data (archival vs 'hot' data), 10G to 100G would be prudent for a zfs+nfs array of that size anyway. Obviously with a slow network, you'll get much lower IO bandwidth than with a fast network. With Ceph and a single network, you'll also get lower IO bandwidth than you would with NFS/SMB over the same network due to the additional bandwidth of Ceph replication across nodes. Your numbers match some back of the hand math and seem in the right ballpark to me. You'll also be much more vulnerable to PG rebalancing (especially with a failed disk or when new disks are added), since a massive amount of data will need to move to its new location (or be replicated if a replica is lost). This happens in the background as the pool is active and more IO on the pool just delays the rebalancing.
@Breeegz2 жыл бұрын
@@apalrdsadventures I really appreciate your time. None of the above is lost on me. I know I'm not building enterprise level file stuff, which is why it's hard to search for what to expect on the internet, because so many people on Reddit are trying to build enterprise level stuff. I want to play with the big toys at home, and I think I'm pretty close, but I may have designed myself into a corner. Before I make another ($200) step in that direction, I'd like to have a better understanding of what to expect. If it really bumps my performance, how much are we talking? That's the crux of my issue, and I'm not asking you to solve it, think of it as a hopeful suggestion for future content.
@apalrdsadventures2 жыл бұрын
I'm working on getting a full 2.5G setup for my cluster, but I've been having issues with the 10Gbe to SFP+ transceivers I bought negotiating down to 2.5G. My Mikrotik transceiver works fine, but then I got a cheaper brand for the cluster project and they aren't working even though they claim to support 802.3bz. I might just get the 2.5G switch, eventually I'll need it anyway since I'm planning on bringing the microserver into the cluster on 2.5G + dual 1G and also working on an ingest station that I'd like to connect at 2.5G. Another option is to add more nodes to spread the bandwidth across more of them. The scale-out nature of Ceph works well for this.
@niklasxl2 жыл бұрын
So are there reasons not to use Proxmox as a NAS with this, apart from parity/duplicate data needing to go over Ethernet instead of staying within a node? This seems like a flexible way for home servers to easily expand both compute and storage.
@apalrdsadventures2 жыл бұрын
It's extremely flexible, but the minimum setup is 3 servers for a functional Ceph cluster. If you want an all-in-one solution, this is not it. If you want a highly available solution for both compute and data, this is definitely it. ZFS / TrueNAS is usually a single point of failure even with clustered Proxmox. You can keep backup copies synchronized so you don't lose data, but with Ceph the data is not only duplicated as it's written (so a sync write that completed is safe from a host failure immediately), but also keep the data online and available to clients during a host failure. It's like how RAID is not a backup but lets you continue operating when a drive fails, except for entire servers. The only downside is that you normally use an NFS and/or SMB gateway for filesystem users and that gateway server can become a single point failure for clients who are not native Ceph users. Proxmox is a native Ceph user, but your desktops/laptops probably are not and will go through a gateway server.
@niklasxl2 жыл бұрын
@@apalrdsadventures oh this is really interesting might have to try it at some point :D
@apalrdsadventures2 жыл бұрын
Basically, do you want to scale up or scale out? that's the zfs vs ceph question.
@niklasxl2 жыл бұрын
@@apalrdsadventures i dont really know yet exactly what i want :D but flexibility and availability are always nice. but basically just a home server(s) / lab
@araujobsdport Жыл бұрын
Really nice example! Well done :)
@apalrdsadventures Жыл бұрын
Thanks a lot!
@v0idgrim2 жыл бұрын
I have a question. In this setup, is the VM running on all the nodes (active-active) or is it running on one node (active-passive), and how long would recovery take for the VM to be usable/reachable again in case of, say, a power failure on one of the nodes?
@apalrdsadventures2 жыл бұрын
The Ceph side is all active, so data is always accessible from clients at any time. The Ceph monitor (web dashboard) is active-passive but you don't need it for data access Proxmox is active-passive, it will migrate a VM to a new node when a node goes down and restart it from disk.
@thestreamreader Жыл бұрын
Any experience with Harvester? It seems like it might be a great option and has a different clustering system.
@karloa719410 ай бұрын
It has been a year now since you made this video. Are you still running Ceph?
@apalrdsadventures10 ай бұрын
Only in testing, I only have two 'real' nodes in the lab (+ the 3 thin clients) but the thin clients are too slow to do more than experiment.
@marco114 Жыл бұрын
I got errors and am reluctant to start over.
@JuriCalleri2 жыл бұрын
I subscribed (and liked) because these videos arrive at the perfect moment! I'm trying to build a Proxmox cluster, hyper-converged and HA, but I only have 2x identical computers (simple Ryzen 3 4c/8t 32GB rigs), an Intel NUC 8th gen and 1 Raspberry Pi 4. Your previous video helped me understand the QDevice and how to actually get a cluster with quorum to work, and I can use that on the hardware I have, but it is not clear to me if I can install Ceph (or GlusterFS) on the 2 identical nodes and leave the Intel NUC or the Raspberry Pi out of it, but still replicate the VM disk to them for that single VM that, no matter what, has to be HA. Or, maybe, do Ceph and Gluster only work when installed on 3 nodes? Like, literally, your videos showed up at the perfect time, like the Room of Requirement in Harry Potter. That's wicked! Thanks!
@apalrdsadventures2 жыл бұрын
Ceph doesn't handle 2-node clusters nearly as gracefully as Proxmox does. You do not need more than 1 monitor, but if you have more than 1, then you will have quorum requirements for the monitors, which means you really need 3. Additionally, you will have replication requirements that may not be able to be fulfilled with only 2 nodes - by default the 3/2 replication rule requires there to be 3 OSDs to store a given placement group, on 3 different hosts. With only 2 hosts, it will be forever stuck at the min_size, which means failure of either host takes the pool offline. You should be able to run the Ceph monitor on the Pi 4 to get to the quorum requirement, although I'm not sure who builds the latest version of Ceph for aarch64. That doesn't fix the replication issue. You can reduce the replication rule from host to OSD (so then you need at least 3 disks instead of 3 hosts), but a single host failure can then bring you at or below min_size and again take the pool offline. 45Drives recommends starting with extra Ceph nodes in virtual machines initially (i.e. 3x VMs on 1x host, migrating the 3x VMs to 3x hosts when you build more nodes) to deal with this issue without configuring your cluster in a way that allows less redundancy, if you plan on growing into a proper cluster in the future. This just saves you from having to recreate pools with new rules when you expand into a proper HA setup, but it doesn't fix the HA issue for a 2-node Ceph cluster. In your setup I'd recommend using ZFS instead of Ceph and relying on Proxmox's ZFS replication for HA VMs. They will potentially be 'behind' (since it syncs the VM disks every 15 minutes instead of truly keeping the storage coherent across the network like Ceph does), but it works in 2-node clusters.
@JL-db2yc2 жыл бұрын
@@apalrdsadventures thank you for this detailed answer! I have a similar setup to what Juri Calleri described and had the same question. Based on your recommendation I will keep to ZFS.
@zparihar Жыл бұрын
Question: Can you add a WAL Disk and DB disk later? (after you've created your CEPH OSD's)?
@apalrdsadventures Жыл бұрын
Usual recommendation is to delete / re-create OSDs when changing things, although it may be possible to move the LVM VGs around
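A rough sketch of the recreate path on Proxmox (the OSD ID and device paths are examples, and you should wait for the cluster to fully rebackfill before touching the next OSD):
# take one OSD out and let Ceph drain it
ceph osd out 3
# once all PGs are active+clean again, destroy it
pveceph osd destroy 3 --cleanup
# recreate it with the RocksDB/WAL on the faster device
pveceph osd create /dev/sdb --db_dev /dev/nvme0n1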
@BigBadDodge4x4 Жыл бұрын
I have 4 homes, plus servers at a datacenter. All sites connected via VPN's. If I put a proxmox system at each site, can they be clustered? Or should I just put one cluster at datacenter site ( has dual 10Gig internet lines).
@apalrdsadventures Жыл бұрын
Proxmox is not happy about higher-latency links. I'd keep the cluster to the datacenter only. You can share a backup server between them though, which makes migrating VMs not that difficult (backup -> restore).
@Catskeep2 жыл бұрын
Thanks for the great video..!! I'm still a little confused because I have a language limitation.. I want to ask: if I have a VM that is on host1, and then host1 has a problem, let's say it goes down, will Ceph automatically move the VM to host2 or host3? If the answer is yes, will the VM move in a powered-on state or a powered-off state?
@apalrdsadventures2 жыл бұрын
By default, Proxmox will not move VMs. If you configure the VM as a high availability resource, then it will wait to be sure that the host has gone down (~3 minutes by default) before restarting it on another host. At that point, the VM will be booted from the disk image, so it won't transfer it live if the host goes down. You can configure it to transfer the VM live when the host is shut down (for maintenance).
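The HA piece is just registering the VM as a resource, roughly like this (the VM ID and the restart count are placeholders):
# make VM 100 an HA resource that should be kept running
ha-manager add vm:100 --state started --max_restart 1
# check what the HA stack thinks is going on
ha-manager status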
@MikeDeVincentis2 жыл бұрын
Awesome video. Just what I've been looking for to get started. I have 3 Dell R410s I'm currently building out. How does expansion of storage across the cluster work? If I put 1 physical drive in each server to use as an OSD (I have an SSD for boot and another SSD for cache, and can add 4 spinning drives), can I expand later by adding a single disk at a time to each server? I'd probably start with 3 x 10TB drives for storage and expand as needed.
@apalrdsadventures2 жыл бұрын
It depends on if you want OSD level or host level redundancy. By default the rules require host-level. So having 3xR410s each with 1x10TB each will get you 30TB in the pool and 10TB usable with replication (or 20TB with erasure coding, although not all pool types can be erasure coded). Adding 1x10TB to a random server won't get you anything since it can't maintain host level redundancy, but adding 1x10TB to each server will double your capacity. However, it will suddenly spend a bunch of time moving data around to rebalance the cluster when you do this, so performance might take a hit during the process.
@fbifido2 Жыл бұрын
How would you upgrade your hyper-converged setup? The hypervisor & Ceph???
@PonlayookMeemeskul2 жыл бұрын
If there's a need to frequently migrate VMs across nodes (overcommitting the number of VMs on limited physical resources per node), would Ceph solve the problem of a newly migrated VM having its data available to start right away? And what is the "actual" usable storage once the setup has been completed following this tutorial? Thank you very much
@apalrdsadventures2 жыл бұрын
Yes, mostly. With Proxmox, migration across nodes will always require the VM to either shut down or have its RAM migrated, which will take the VM offline for a short period. With any type of shared storage (Ceph, NFS, iSCSI, SMB), Proxmox will sync the VM disk to the shared storage right before it moves the VM, and rely on the shared storage to keep the VM disk changes up to date. With Ceph, you get shared storage that's also guaranteed to be consistent as entire hosts go down, whereas with something like NFS/SMB you can cluster but it's harder to guarantee a file write is atomic across the whole cluster. The only case where you'll be waiting on VM data is when you use ZFS replication instead of shared storage. Actual usable storage is 1/3 of the total, since it's keeping 3 copies of the data, assuming all disks are equal sized. With some more manual setup you can use erasure coding for the data (you still need 3x replication for metadata), which has math more like RAIDx (5,6,7,...), but usable capacity depends a lot more on how many nodes you have and how much storage is in each node. tl;dr it's usually better to have more nodes than large nodes in Ceph if you want host-level redundancy.
@PonlayookMeemeskul2 жыл бұрын
@@apalrdsadventures Thanks a million for the very detailed answer, really appreciate it. My friend has a few GPUs lying around from his mining rig, so I'm building 2 gaming PCs for our daughters and us dads to play games remotely. So far, I'm juggling 4 VMs on these 2 hosts, where I'll "migrate" and boot up a VM on the host where the GPU isn't being used (kinda like four of us doing time-share on the GPUs lol). The only problem is the data. First I was just going to run the games off a NAS, but decided to look into Ceph, which seems like fun. Thanks again, looking forward to more of your vids. Cheers.
@apalrdsadventures2 жыл бұрын
You'll likely have better performance in the VMs with a NAS, since Ceph tends to be better at parallel IOPS across many VMs and less good at throughput and latency for individual VMs. If you don't need to sustain the failure of a storage node, then a NAS will work fine.
@carsten6122 жыл бұрын
Just hit like for the statement "just for the clickbait - it is hyperconverged" :D
@DawidKellerman2 жыл бұрын
Please also discuss snapshots in Ceph!!!
@apalrdsadventures2 жыл бұрын
There's only so much I can fit in one video! But I already have a follow up planned
@NirreFirre2 жыл бұрын
A bit too deep into sysadmin territory for me, but Ceph seems to be very similar to MongoDB clusters in a lot of areas. Cool, but our ops have consolidated on NetApp ONTAP and Trident stuff. My dev teams just want huge, robust and fast storage :)
@user-gw9el1ew2f2 жыл бұрын
Great video! Can't wait for Ceph file clustering in the next video
@apalrdsadventures2 жыл бұрын
I can't wait either, Ceph is a monster topic though so working through basic RBD first.
@FuzzyScaredyCat Жыл бұрын
Newer versions seem to require that ceph-mgr-dashboard is installed on all nodes otherwise you get an error: *Error ENOENT: all mgr daemons do not support module 'dashboard', pass --force to force enablement*
@apalrdsadventures Жыл бұрын
It needs to be installed on all nodes which have Manager installed, in my case I only installed manager on one node since it isn't a critical service
@meroxdev2 жыл бұрын
For a setup with 3 OptiPlexes, will Ceph storage work if each OptiPlex has only 1 SSD (so 1 SSD per node, which also has Proxmox installed on it), or does Ceph need dedicated disks? Thank you! Amazing content 🤜🤛
@apalrdsadventures2 жыл бұрын
It will work with only 1 disk per node. You'd probably want to install Proxmox on Debian instead of using the Proxmox installer, setup a custom partition layout with most of it going to Ceph, and then install Proxmox and Ceph on that. Performance wise, your options are only 3x replication (so the usable space is the size of the smallest host) or k=2 m=1 erasure code (1 redundant shard, total space is double that of the smallest host).
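If you do go the custom-partition route, turning a leftover partition into an OSD is roughly the line below (the partition name is just an example; the Proxmox GUI won't offer partitions, so it has to be done from the shell on each node):
# create a bluestore OSD on a spare partition of the boot disk
ceph-volume lvm create --data /dev/sda4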
@randomvideosoninternet7897 Жыл бұрын
Sir, how do I add an sdb disk on Proxmox? There is no sdb disk on my Proxmox. Thank you
@zparihar2 жыл бұрын
Great video bud!
@apalrdsadventures2 жыл бұрын
Glad you liked it! Plenty of Ceph projects in the works, eventually
@cypher20012 жыл бұрын
For UNDER $20.00, you could have got a low-end 120GB SSD, shucked it, and it's a direct replacement for the 16GB drive. It just pops right into the slot. The only modification I've had to make is bending the memory shield a little to accommodate it.
@apalrdsadventures2 жыл бұрын
Sharing the OS drive with Ceph is a bit more painful than it should be on Proxmox, since the installer doesn't let you do custom partitions
@itsmedant Жыл бұрын
@@apalrdsadventures I was able to get custom partitions installed with Proxmox, but it’s still saying I don’t have a disk available for an OSD. Do you have any idea how to do the install into a different partition?
@tomokitaniguchi7908 Жыл бұрын
I keep getting the following error when i try to use the ceph pool "modinfo: ERROR: Module rbd not found.” Did I miss a step?
@apalrdsadventures Жыл бұрын
RBD should be installed by the Proxmox kernel package, which the Proxmox installer should have installed. Did you install on Debian or something?
@ShahzadKhanSK2 жыл бұрын
Thanks for explaining the concept. I recently started tinkering with Proxmox. I have two physical and one virtual node. Each node has two 1TB SSDs, so 2TB per node. For HA, I am using a NAS (a single NVMe) and all of my HA VMs are stored there. Any idea how this storage could be configured to take advantage of Ceph?
@apalrdsadventures2 жыл бұрын
Proxmox does work fine with 2 nodes, but Ceph really needs at least 3 nodes with storage to work. So, unless you want to migrate your NAS to Proxmox as well, it won't really be a good experience.
@ShahzadKhanSK2 жыл бұрын
@@apalrdsadventures thanks for explaining this. I got all 3 nodes up and Ceph is working as expected. My situation: I have two SSDs, and the OS is running on a separate SSD in each node. Should I use both SSDs as OSDs, or one as an OSD and the second as a DB/WAL disk? What would be a good composition? Any idea?
@apalrdsadventures2 жыл бұрын
DB/WAL disks are for when you have significantly faster storage available to store metadata. Since SSDs are (roughly) the same speed, they should be separate OSDs. If you have NVMe, sometimes it's recommended to partition a drive and run a few OSDs on it for better multithreaded performance of the OSD. For SATA this is not recommended.
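If you ever want to try the multiple-OSDs-per-NVMe idea, the tooling for it is roughly the line below (the device path is an example, and this is only worth doing on fast NVMe):
# carve one NVMe drive into two OSDs
ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1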
@ShahzadKhanSK Жыл бұрын
@@apalrdsadventures I have one 1TB NVMe and one 1TB SSD. I can create two 500G partitions on the NVMe and leave a single partition on the 1TB SSD. The OSD performance will improve on the NVMe because of two different threads, and the SSD still operates on a single thread. Did I picture that right?
@blckhwk80242 жыл бұрын
Nice explanation, thanks!
@jefferytse Жыл бұрын
First, your videos have been awesome, thank you. I'm in the process of migrating some of the VMs from VMware to Proxmox. Initially, I was going to do HA for the VMs onto a second server, but after watching this video I wonder if I should use Ceph instead. Tried to join your Discord but it's not working. Would love to pick your brain.
@apalrdsadventures Жыл бұрын
HA can be used with Ceph or ZFS, are you talking about ZFS replication vs Ceph then?
@jefferytse Жыл бұрын
@@apalrdsadventures let me clarify this. I was going to do ZFS on each individual server, but now I'm at a crossroads choosing between ZFS or Ceph. I also have ZFS over iSCSI set up. I want to make sure that I won't lose accessibility or data if any of the servers go down
@posalab2 жыл бұрын
For an HA lab study, it's simpler to build a GlusterFS data store on the same 3 external drives. Just my humble opinion... But Ceph is obviously a good choice.
@thestreamreader Жыл бұрын
I am reading that Ceph doesn't run very well on slower hardware like what's shown here. Can you do a video on using GlusterFS with ZFS as the underlying disk layer, and then explain the benefits of that vs Ceph and vice versa? I think the other option is ZFS in HA, but then your sync only happens at 1-minute intervals.
@apalrdsadventures Жыл бұрын
Gluster really just pushes resource limits outside of its control onto the OS (i.e. ZFS) where Ceph manages the full stack on its own. Ceph also deals natively with block devices while Gluster only replicates file IO, so you end up with the qcow2 driver on top as well. Hence, the suggestion to use Ceph for native block device VMs, plus its better integration with Proxmox.
@thestreamreader Жыл бұрын
@@apalrdsadventures Is it safe to run ceph on a 1gb network?
@apalrdsadventures Жыл бұрын
Safe? Definitely. Fast? Depends on your expectations but it does certainly have some performance loss vs zfs, in return you get strong guarantees that data is always safe from host level failure cluster wide (where zfs can only tolerate disk failure). Latency is also rather important to Ceph, so an underutilized network will help.
@thebrotherhoodlc2 жыл бұрын
Do you need 10GbE+ Ethernet for this to be used in production scenarios?
@apalrdsadventures2 жыл бұрын
It depends on the bandwidth of your storage and if you actually need to saturate that bandwidth or just need the space. I have really slow USB3 flash drives with a 120MB/s read speed, which is roughly the same as gigabit Ethernet. In my specific case, gigabit is well matched to these really slow flash drives, assuming there isn't any additional traffic from the Proxmox VMs themselves (or you are using a separate NIC for that traffic). If you go to spinning rust, you should get roughly 150MB/s write speed per disk, which means you should look at 2.5GbE minimum, but with a realistic number of spinning drives per node, 10G would be good. If you want to run NVMe or SSD based storage and want to saturate it bandwidth-wise, you'll need above 10G for the cluster network at a minimum. 45Drives usually recommends 10G public and 40G cluster for their large spinning rust pools (~40 drives + SSD DB drives per host). If you are just doing Ceph for redundancy and scale-out space (i.e. archiving data) and not for any scaling out of speeds, you can of course use gigabit and tolerate everything being slow. Rebalance and backfill will take a long time, so the pool will be degraded for a long time if you have a drive or host failure to recover from.
@thebrotherhoodlc2 жыл бұрын
@@apalrdsadventures Awesome thanks
@s.m.ehsanulamin72352 жыл бұрын
While implementing this I did not find /dev/sdb, so I am not able to create the OSD. Could you propose some kind of solution?
@apalrdsadventures2 жыл бұрын
If you ls -l /dev/disk/by-id it should show you all of the disk names. If you don't recognize the disk there, it's a hardware issue. -l will show you what the hard link is to, so you can see the /dev path for the disk name. It's a bit unfortunate that Proxmox doesn't directly use the by-id paths, since those are more reliable with hardware changes.
@s.m.ehsanulamin72352 жыл бұрын
@@apalrdsadventures Actually I can see /dev/sda with its partitions, but I cannot see /dev/sdb. If I cannot see it, then how can I create the Ceph OSD? I'd be glad to have some sort of solution from your side. What can I do next?
@sjefen62 жыл бұрын
Would a 2 node proxmox + nas with ceph and qdevice be a feasible option?
@apalrdsadventures2 жыл бұрын
It depends on how you deploy Ceph, but a 2 node cluster is really not possible in Ceph while maintaining high availability. Ceph's monitor needs 3 for high availability. You could install a monitor on the qdevice. The manager is not required, so you can install 2 of them and not lose access to the pool when the managers are all down. But you still can't get high availability storage with only 2 storage nodes in Ceph. You need at least 3 nodes to meet the placement group requirements (size 3 / min size 2), allowing you to lose a node and continue operating. Otherwise, with 2 nodes, you will always be at min size and losing either node makes the pool inaccessible.
@sjefen62 жыл бұрын
@@apalrdsadventures Yeah. Then wouldn't running Ceph and a QDevice on a NAS allow one Proxmox node to continue operating if the other Proxmox node fails?
@mithubopensourcelab4822 жыл бұрын
Are snapshots possible for rolling back under Ceph????
@apalrdsadventures2 жыл бұрын
Ceph does have a snapshot features and Proxmox will use it through the Snapshot menu for VMs
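From the CLI it's the same thing, for example (the VM ID, snapshot name, and pool name are arbitrary); under the hood these become RBD snapshots on the Ceph pool:
# snapshot VM 100, then roll back to it later
qm snapshot 100 before-upgrade
qm rollback 100 before-upgrade
# the matching Ceph-side snapshots are visible with
rbd snap ls vm_pool/vm-100-disk-0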
@Megatog615 Жыл бұрын
What are your thoughts on MooseFS/LizardFS?
@apalrdsadventures Жыл бұрын
Ceph and Gluster are much more commonly deployed. Ceph also has the advantage of not requiring a metadata server (which can be a bottleneck) for non-filesystem workloads, and also natively supports workloads with more limited semantics (RBD for block devices and RGW for S3-compatibility, which has fewer features than POSIX compliance so it's lighter weight to implement). Proxmox also natively integrates Ceph, and Ceph is *extremely* flexible in dealing with mixed storage setups and mixed levels of redundancy
@AdrianuX19852 жыл бұрын
I come across negative comments about CEPH quite often. What is your opinion?
@apalrdsadventures2 жыл бұрын
It's not for people who aren't ready to scale OUT. But I've had no negative experiences running it on the cluster.
@camaycama74792 жыл бұрын
I have an 8-server cluster (mainly Dell R830 and R730). I've always been tempted to start using Ceph but... watching your video, I think I'll give it a try. Can Ceph restore a failed quorum? That would assume the OS drives are part of Ceph, which is scary.
@apalrdsadventures2 жыл бұрын
I'm running with a ZFS OS drive for Proxmox, so no the OS drive is not part of Ceph. I don't believe there's a way to boot off the Ceph cluster, but you can use partitions or LVM to give Ceph some of the boot drive space (just don't put Ceph on a zfs zvol). Ceph will recover from a quorum failure (of the 3+ monitors) on its own, but during the quorum failure it will be inaccessible to clients entirely. The Managers (stats and dashboard) are active-passive and can all fail without affecting pool IO. If you fail out enough OSDs without monitor failures you can also get into a scenario where IO failures start to occur because the PGs can't meet the minimum replication rules with OSDs that remain. That's also possible if your replication rules are impossible to achieve with the number of OSDs and hosts in the system.
@camaycama74792 жыл бұрын
@@apalrdsadventures thank you so much! This clarifies even more the fact that I have to test it on the test-lab. Cheers!
@mtartaro2 жыл бұрын
FYI, erasure coding is really zero suppression and usually requires a minimum number of nodes.
@apalrdsadventures2 жыл бұрын
You can build an erasure coded pool with 3 nodes as well, you just don't get the same storage efficiency or failure tolerance as you can get with a much larger setup (if you stick with host-level redundancy instead of OSD-level). The only option would be a 2/1 pool (2 data shards + 1 coded shard). For the same efficiency, a 4/2 pool would handle twice as many failures, or you could go even higher (i.e. 5/1 or 14/4 or whatever) to get good storage efficiency if your cluster is big enough to spread out the shards. Erasure coding is functionally equivalent to RAID5/6 or RAIDZ1/2/3+ but with a lot more control over how the data is split and how much failure tolerance you have, so like all of Ceph, it's a great solution if you have enough data to make it worthwhile.
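As a sketch, a 2+1 profile on a 3-node cluster looks something like this (profile and pool names are examples; Proxmox also wants a small replicated pool alongside it for the RBD metadata):
# define the erasure code profile: 2 data shards + 1 coding shard, one shard per host
ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=host
# create the data pool with that profile and allow RBD to do partial overwrites on it
ceph osd pool create ec_data erasure ec21
ceph osd pool set ec_data allow_ec_overwrites true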
@chriswiggins3896 Жыл бұрын
Add a 2.5Gbe usb dongle to increase network throughput.
@zippytechnologies2 жыл бұрын
Now we just need a Ceph NAS setup for Samba, using a VM to manage it...
@apalrdsadventures2 жыл бұрын
I'm working on CephFS (and erasure coded pools in RBD in Proxmox), but the video was already getting too long to include all of that information at once
@zippytechnologies2 жыл бұрын
@@apalrdsadventures sounds like a new video shall soon be made, no?
@apalrdsadventures2 жыл бұрын
The next Ceph video is going to be erasure coded RBD pools. No guarantees on timing of that. CephFS will come after that video.
@kimcosmos2 жыл бұрын
The WAL reduces write latency, but what about a cache for the read latency of that spinning rust? The NVMe DB is only for the metadata and lookup speed. VM disks on NVMe are nice but not deduplicated. LAN game levels need a cache. "If you ARE using OSD level redundancy, then don't use partitions for your DB disks." Lol. Node-level redundancy instead. No redundancy and with a partitioned NVMe DB? Can't it be in a replicated pool? With a separate backup system for data you care about. E.g. entire redundant clusters like HA storage pods? Or a local ML DB cache. P.S. I thought the special vdev in ZFS used cached writes to speed repeat reads (e.g. gaming levels)? So that the storage merely slows if it fails.
@apalrdsadventures2 жыл бұрын
When you lose the db disk of an osd, you lose the osd. If you can only tolerate the failure of some number of OSDs (instead of host-level redundancy, which can tolerate the failure of some number of hosts), you need to make sure the db disks can't cause more osds to fail than your redundancy level can recover from. In general, you should not be using any raid layers under ceph and should only use lvm for partitioning drives for db/wal disks, not to rely on it for RAID1 or something like that. I didn't mention it in the video, but yes you can add disks in a way that forces the system to rebalance to what its new layout will be before removing the old disks (it keeps the old 'layout' active but calculates what data will need to be on the new disks and fills them before switching clients to the new layout - this is called a 'backfill'). The other option is a direct disk swap where you disable rebalancing, remove the failed osd, add a new osd on the new disk (which will claim the same ID as the old one), and enable rebalancing again. If the new osd has the same ID and capacity, the CRUSH map shouldn't change, so it won't have to move data all over the cluster like a normal osd add/delete would do. The ZFS special vdev contains the uberblock and all of the metadata (and meta-metadata), so its health is more important than that of the data drives. Once you add the special device, new metadata goes to the special device only until it's full, so failures of the special device mean failure of the whole pool. Single-sector failures might not be quite as bad since zfs triplicates meta-metadata and doubles metadata so there are backup copies (it does not do this for file data by default). ZFS doesn't do tiered write caching at all, but separating metadata to faster storage makes most operations faster since it's faster to look up directories and file block maps. The SLOG is different, it's purely to write the ZIL (zfs intent log) to disk faster so it can return a sync write guarantee with lower latency. It is NOT a write cache, the transaction group is still kept in memory and still needs to be written to the data disks before it removes it from the dirty data buffer, meaning it will still increase write pressure while the data is on the ZIL but not yet on the data disks. The ZIL is only read when the system has a hard shutdown, and if you lose it, you only lose the transaction groups which weren't completed on the data disks (a few seconds of data, but data ZFS guaranteed to the application was safe).
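For reference, the ZFS side of that looks roughly like the commands below (pool and device names are examples; the special vdev should be mirrored, since losing it loses the pool):
# add a mirrored special vdev for metadata (and small blocks, if configured)
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1
# add a separate log device (SLOG) for the ZIL
zpool add tank log /dev/nvme2n1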
@amyslivets2 жыл бұрын
Cool. Keep going 👍🏻
@apalrdsadventures2 жыл бұрын
I have more videos coming, eventually!
@peanut-sauce2 жыл бұрын
So is Ceph even useful with only two nodes?
@apalrdsadventures2 жыл бұрын
Not really, 3 nodes is a much better setup. Replication rules by default require 3 copies to be on 3 hosts. It's technically possible to go down to 2 monitors and change the rules to be per-OSD instead of per-host, and 3 drives is still the absolute minimum.
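The knobs involved, for reference (pool and rule names are examples; think hard before loosening either of these):
# default replicated pool settings: 3 copies, writes allowed with 2 present
ceph osd pool set small_pool size 3
ceph osd pool set small_pool min_size 2
# a rule that only requires separate OSDs rather than separate hosts
ceph osd crush rule create-replicated replicated_osd default osd
ceph osd pool set small_pool crush_rule replicated_osd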
@peanut-sauce2 жыл бұрын
@@apalrdsadventures So if I want to set up a proxmox cluster with only 2 devices in total (no NAS) do I just forgo shared storage and not do live migration?
@apalrdsadventures2 жыл бұрын
At 2 nodes, your best bet is local ZFS on each node and ZFS replication between the two. Non-live migration, but still high availability is possible as long as you have a Qdevice or a third node acting as quorum only. Ceph really doesn't work great below 3 nodes, that's really its minimum.
@peanut-sauce2 жыл бұрын
@@apalrdsadventures Oh, I see. Thanks for being so helpful! But is there any point in a proxmox cluster at all with two nodes and no quorum-keeper?
@apalrdsadventures2 жыл бұрын
If you need more than one node for CPU/RAM reasons, being able to migrate is handy. You can also force the cluster to maintain quorum with a node out during maintenance (pvecm expected 1), so at least if you shut down a node intentionally to work on it you can keep the system running. You don't get HA without a third source of quorum, but you can live migrate manually
@thinguyen937 Жыл бұрын
I use the command: ceph mgr module enable dashboard
@EzbonJacob Жыл бұрын
Great video. I've learned a lot about Proxmox from this channel. I have one question: with the ceph-mgr dashboard I'm getting an "Access Denied" after logging in with the user we created. I'm not seeing any helpful logs anywhere on why I'm getting an HTTP 403 on the dashboard. Any suggestions on how to debug this?
@AamraNetworksAWS Жыл бұрын
Hi, I have a question: while installing the Ceph dashboard on Proxmox version 7.3-3, I'm getting an error. I have followed your other video kzbin.info/www/bejne/pKrLeqSbrN53eM0 but could not manage to install it. If possible, please share a step-by-step tutorial or a guide to follow.
@ilducedimas8 ай бұрын
ceph is tough
@norriemckinley28502 жыл бұрын
Great
@apalrdsadventures2 жыл бұрын
Glad you enjoyed it!
@naturkotzladen Жыл бұрын
Besides all the valuable tech tips, please set up some affiliate links for your shirt collection; I would buy the make-fail shirt NOW... ;-)
@apalrdsadventures Жыл бұрын
Almost all of the shirts I wear are from other YouTube creators - the make-fail-make-fail-make one is from Evan & Katelyn - shopevanandkatelyn.com/products/make-fail-mens-tee
@jwspock169010 ай бұрын
Top
@FrontLineNerd Жыл бұрын
Nope. The dashboard does not work as shown. Can't create the cert, can't create the user. Those commands don't work.
@AdrianuX19852 жыл бұрын
+1
@apalrdsadventures2 жыл бұрын
Glad you liked it!
@yatokanava2 жыл бұрын
Thank you! A very clear and illustrative video!
@moogs Жыл бұрын
Love ceph but it’s slow AF…
@apalrdsadventures Жыл бұрын
For the decrease in performance you gain guarantees that writes are committed across the cluster with host level redundancy, which is a tradeoff a lot of clusters are willing to make for data security and to use lower cost / less failure-resistant hosts.
@bober1019 Жыл бұрын
wtf is this? A tutorial on getting the worst possible storage performance? Post your benchmarks if you dare.