Nice work! Thanks for making this easy! I need to try it out someday!
@Jims-Garage · 6 months ago
Thanks, Tim. I'm finding it particularly useful for K3s servers and my firewall. Having the VMs fail over automatically means there's no disruption to the cluster, no pulling pods, etc.
@ewenchan1239 · 6 months ago
1) You don't TECHNICALLY need a separate drive; you just need a separate PARTITION that Ceph can take over and have full control of. For example, on my OASLOA Mini PCs (N95, 16 GB RAM, 512 GB NVMe 2242 M.2 SSD), I partitioned the 512 GB NVMe SSD on each of my 3 nodes so that 128 GB goes to the Proxmox install and local-lvm, and the rest is a separate partition that Ceph has dominion over. (My OASLOA Mini PC doesn't HAVE another slot for additional storage devices, so I had to make do with what it has.) Once it's partitioned like that, you can put the 3 nodes into a Proxmox HA cluster as usual, and then set up the Ceph cluster through the Proxmox GUI as well, both for the initial install and for your first monitor.
2) Re: iGPU passthrough: this is why I DON'T recommend installing any VMs/CTs until the infrastructure is set up the way you want it. Set up clustering and Ceph first, THEN set up your VMs/CTs. That way, the IOMMU groups will have stabilised and will be USABLE for what you're trying to do before you deploy VMs/CTs/services.
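To put rough numbers on the single-drive layout above: with 128 GB of each 512 GB NVMe kept for Proxmox, each node gives about 384 GB to Ceph. The sketch below assumes a 3-way replicated pool (the Proxmox default size of 3) and treats Ceph's default nearfull warning ratio of 0.85 as the practical ceiling; it ignores GB/GiB differences and BlueStore overhead, so read the result as an estimate only.

```python
# Rough usable-capacity estimate for a replicated Ceph pool.
# Assumptions: one equally sized OSD per node, replication factor `size`,
# and Ceph's default nearfull ratio (0.85) treated as the practical limit.


def usable_capacity_gb(nodes: int, osd_gb: float, size: int = 3,
                       nearfull_ratio: float = 0.85) -> float:
    """Return roughly how much data fits before the nearfull warning."""
    if nodes < size:
        raise ValueError("a host failure domain needs at least `size` nodes")
    raw_gb = nodes * osd_gb        # total raw space across all OSDs
    logical_gb = raw_gb / size     # every object is stored `size` times
    return logical_gb * nearfull_ratio


if __name__ == "__main__":
    ceph_partition_gb = 512 - 128  # NVMe size minus the Proxmox/local-lvm share
    print(f"{usable_capacity_gb(3, ceph_partition_gb):.1f} GB usable")
```

In other words, roughly a third of the raw partition space, minus headroom.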
@Jims-Garage · 6 months ago
Thanks for the tips, I'll consider that on the next deployment.
@ewenchan1239 · 6 months ago
@@Jims-Garage No problem. In my case, my VM/CT disks live on Ceph RBD/CephFS, so clustering and Ceph had to be up and running before I could do anything else. I know you're storing the VM/CT disks on local storage rather than on the Ceph storage system, so you were able to start installing VMs/CTs before your Ceph system was set up.
@0xKruzr · 6 months ago
yeah, but you don't want to write-exhaust the device if it's also booting the node.
@ewenchan1239 · 6 months ago
@@0xKruzr Depends on how much traffic you're putting on the system/cluster. In my case, my 3-node HA Proxmox cluster running Ceph exists only to serve Windows AD DC, DNS, and AdGuard Home, so none of that is intensive. The monthly backups are probably more write-intensive than anything else that happens during the rest of the month. (My N95 Mini PC, with only 16 GB of RAM, is too slow to really do much of anything else.)
@MrNGm · 4 months ago
In the constrained setup ewenchan1239 describes, using a separate partition on a single drive may be acceptable. Readers with other setups and/or reliability requirements should take into account that Ceph's reliability stems (among other things) from being able to spread data chunks across a larger number of OSDs (object storage daemons), such that the unavailability of 1, 2, or 10 OSDs doesn't impact the cluster. Whether that holds depends on the configured rules regarding failure domains (for further reading, see CRUSH maps in the Ceph documentation). I would always advise reading a bit more about Ceph, its high-level architecture, and its failure modes. In the setup ewenchan1239 describes (3-way replicated Ceph with Proxmox), the cluster will become unavailable if, for example, you're performing maintenance on 1 host and the disk of another one fails. Nevertheless, with a setup where VM data is accessible on all hypervisors through shared (network) storage, maintenance on a single hypervisor becomes a lot simpler.
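The maintenance-plus-disk-failure scenario can be illustrated with a deliberately simplified model of one placement group in a 3-host replicated pool. It assumes size=3 and min_size=2 (the usual Proxmox defaults) with one replica per host; it is not Ceph's real PG state machine, only the counting argument behind it.

```python
# Simplified model of a single placement group (PG) in a replicated pool.
# Assumptions: size=3, min_size=2, one replica per host (host failure domain).
# This mirrors the counting argument only, not Ceph's actual PG states.


def pg_status(size: int, min_size: int, replicas_down: int) -> str:
    """Classify a PG by how many of its replicas are currently unavailable."""
    up = size - replicas_down
    if up <= 0:
        return "no replica online"
    if up < min_size:
        return "blocked (below min_size)"  # data still exists, but client I/O pauses
    if up < size:
        return "degraded (I/O continues)"
    return "healthy"


if __name__ == "__main__":
    scenarios = {
        "all hosts up": 0,
        "one host in maintenance": 1,
        "maintenance plus a failed disk on a second host": 2,
    }
    for name, down in scenarios.items():
        print(f"{name}: {pg_status(3, 2, down)}")
```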
@Layer2Clouds · 3 months ago
Great video! We support hosted Proxmox clusters in the US, and your guides are a go-to for our clients! Thank you, Jim.
@Jims-Garage · 3 months ago
@@Layer2Clouds wow, thanks for sharing. That's great to hear.
@Chris-rm1pn · 6 months ago
MS-01s also have vPro, which supports Serial over LAN, so if you lock yourself out and the host doesn't have the GPU available, you can use that to fix issues.
@Jims-Garage · 6 months ago
Thanks, I'm still to get that working. It's quite buggy from my limited trialling.
@Chris-rm1pn · 6 months ago
@@Jims-Garage I recommend using MeshCentral and their guides if you haven't tried it; it's the best working solution I've found so far.
@Andy-fd5fg · 6 months ago
Long live the serial port! 'Tis a shame they don't have a physical 9-pin serial connector.
@cschwartz · 6 months ago
@@Jims-Garage Agreed. The implementation unfortunately is lacking and quirky. I loaded the MeshCommander firmware on it to get web-based KVM without needing MeshCommander software running on a client or as a hosted app. Even that had quirks, though it did add functionality. I ended up giving up, going LACP with the 2.5G ports, and reverting to a trusty Raritan IP-KVM and a USB TTY console. I never could get the WoL aspect working, and the machine had to be booted for it to function.
@cschwartz · 6 months ago
@@Andy-fd5fg TTY to USB... no need for a DB9
@muhammadabidsaleem7048 · 6 months ago
Thank you, Jim. Keep posting new videos, especially on SDN, please.
@davidbuchaca · 4 months ago
Very nice and detailed tutorial! abbadon, sanguinius, dorn, proposing names for the following nodes: lion, khan, corax
@Jims-Garage · 4 months ago
@@davidbuchaca awesome! Sage choices too!
@DS-ou7xm · 6 months ago
It's OK, mate, nothing wrong with having cold and flu symptoms... And awesome video... thanks
@johnwalshaw · 6 months ago
I opted for 3x Nextorage NEM-PA2TB for the 2GB DDR4 SDRAM cache. Very happy so far. It's great having a 3-node Ceph cluster.
@Jims-Garage · 6 months ago
That's great, sounds like a solid setup.
@nadtz · 6 months ago
If I hadn't already built a new Proxmox host before the MS-01 came out, I might have gone this route (though with dedicated hardware for OPNsense). It's kind of crazy what Minisforum was able to pack into the MS-01 for the price, and that Ceph + Proxmox HA is available to home users for free.
@Jims-Garage · 6 months ago
I agree. There are quirks but it's impressive.
@Carlos-Rodrigues · a month ago
I'd been waiting for this machine for so many years. Now I have 4 MS-01s: 3 for the cluster and another just for OPNsense. It's fast. It's stable. It's amazing. I just wonder if I can create a network with the MS-A1 over Thunderbolt so I can use it as a backup server with PBS.
@Insightfill · 6 months ago
Oh! I've been looking forward to this one!
@Jims-Garage · 6 months ago
Hope you like it!
@NickS34252 · 2 months ago
Excellent video. I've been following along while tinkering with my own cluster. When it comes to fast nodes like the MS-01, it's a bit tricky to figure out what to put into Ceph vs local storage given the performance limitations.
@Jims-Garage · 2 months ago
@@NickS34252 Thanks. I totally agree! I'm often scratching my head wondering which I should use.
@cschwartz · 6 months ago
If you're going to keep doing iGPU passthrough, have you thought of exposing a TTY console via USB-to-serial? That way you can still connect if the hardware changes and PVE decides to rename your NICs.
@Jims-Garage · 6 months ago
Good idea, I'll look into that. Thanks
@fbifido2 · 6 months ago
@4:33: the Thunderbolt backhaul doesn't show up as a network bridge inside Proxmox?
@Jims-Garage · 6 months ago
Eno5 and eno6 are the thunderbolt adapters. You could create a bridge if you wanted.
@rodneykahane4994 · 3 months ago
Not sure what the performance implications are, but the NVMe OSDs that were created were classified as SSD. In the Advanced tab, you can manually select the drive type (HDD, SSD, or NVMe).
@Jims-Garage · 3 months ago
@@rodneykahane4994 thanks, let me check that!
@johnvandenhurk8650 · a month ago
First of all, I love your videos and have watched many of them. I've had a similar Ceph configuration on an MSI Cubi Proxmox cluster using Samsung 990 Pro NVMe SSDs. I was pretty happy with it until I noticed that, less than six months in, SMART monitoring is failing on two of the NVMe drives. Wearout for the three 990 Pros is 150%, 255%, and 6%. On the Proxmox forum I was told this is due to consumer-grade SSDs. The 255% is on the node that does the most IO, but by no means are these heavily loaded systems. I wonder what your experience with wearout from Ceph has been so far?
@Jims-Garage · a month ago
@@johnvandenhurk8650 Thanks. It does chew through consumer SSDs. Mine is at about 40%; I think it's good for about 4 years in total.
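A quick way to turn a wear reading into a rough lifetime figure is linear extrapolation of the NVMe SMART "Percentage Used" value (as reported by smartctl). This sketch assumes wear accrues at a constant rate, which real workloads rarely do, and the example inputs are hypothetical rather than figures from this thread.

```python
# Linear projection of SSD endurance from the NVMe "Percentage Used" value.
# Assumption: wear grows at a constant rate; example numbers are hypothetical.


def months_remaining(percent_used: float, months_in_service: float,
                     limit: float = 100.0) -> float:
    """Months left until `percent_used` reaches `limit` at the current rate."""
    if percent_used <= 0 or months_in_service <= 0:
        raise ValueError("need a positive wear reading and service time")
    rate = percent_used / months_in_service  # percent of rated endurance per month
    return max(limit - percent_used, 0.0) / rate


if __name__ == "__main__":
    # Hypothetical drive: 40% used after 16 months -> 2.5% per month.
    print(months_remaining(40, 16))   # 60 / 2.5 = 24.0 months left
    # A reading already above 100% is past its rated endurance.
    print(months_remaining(255, 6))
```

Note that the NVMe field is a single byte, so 255 is the highest value a drive can report.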
@johnvandenhurk8650 · a month ago
@@Jims-Garage Thanks for the swift response! Perhaps it's only mine that have an issue, but mine are failing within a year. I'll reach out to my vendor and create a ticket. I hope yours hold up better! How happy are you with your MS-01s? I'm considering an upgrade to an MS-01 (i9-12900) cluster for the SFP+.
@jeffersonsantos4603 · 6 months ago
Great job, man. Do you get full network performance for OPNsense via the VirtIO bridges?
@Jims-Garage · 6 months ago
Yeah, it maxes out 10Gb via iperf3 and full 2Gb up/down via speedtest.net
@romseaaccthree1448 · 6 months ago
@@Jims-Garage I'm assuming this is a same-VLAN iperf test. Would you also be able to test iperf results for inter-VLAN traffic?
@georgelza · 10 days ago
... Have you done a video where you expose Ceph storage to a K8s cluster via a CSI driver? I have a Proxmox cluster with Ceph configured on it, running a K8s cluster, and I'd like to place my shared block storage for the EBS onto my Ceph pool.
@sku2007 · 6 months ago
There's some PCIe passthrough translation in PVE 8, meaning you can set the hardware for each node and use the "friendly name" in the VM (I don't remember their wording right now; it's somewhere under Datacenter).
@Jims-Garage · 6 months ago
Thanks, wasn't aware of that. I'll take a look
@sku2007 · 6 months ago
It's called Resource Mappings, right below Metric Server.
@Jims-Garage · 6 months ago
@@sku2007 thanks, I took a look just now and the i226-v isn't on the node. Very odd!
@sku2007 · 6 months ago
@@Jims-Garage Very odd! Even when passed through, the hardware gets listed by lspci in the host shell. With lspci -v you'll see the line "Kernel driver in use: vfio-pci".
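Checking which devices are bound to vfio-pci can be scripted by scanning `lspci -v` output for the "Kernel driver in use: vfio-pci" line mentioned above. A minimal sketch, assuming lspci's usual layout of one blank-line-separated block per device; the function name is hypothetical, and the script only calls lspci if it is installed.

```python
# List PCI devices whose kernel driver is vfio-pci, based on `lspci -v` output.
# lspci prints one block per device, separated by blank lines; the first line
# of each block names the device and indented lines hold details such as
# "Kernel driver in use: ...".
from __future__ import annotations

import shutil
import subprocess


def vfio_devices(lspci_output: str) -> list[str]:
    """Return the header line of every device bound to vfio-pci."""
    devices = []
    for block in lspci_output.strip().split("\n\n"):
        lines = block.splitlines()
        if not lines:
            continue
        for line in lines[1:]:
            if line.strip() == "Kernel driver in use: vfio-pci":
                devices.append(lines[0].strip())
                break
    return devices


if __name__ == "__main__":
    if shutil.which("lspci"):
        out = subprocess.run(["lspci", "-v"], capture_output=True,
                             text=True, check=False).stdout
        for dev in vfio_devices(out):
            print(dev)
    else:
        print("lspci not found")
```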
@Jims-Garage · 6 months ago
@@sku2007 I've tried all of those to no avail. I'm going to boot a live Linux installation. If I don't see it, I'll RMA it.
@MarkConstable · 6 months ago
I'm pretty sure if you had used the GUI to set up Ceph, you would have had fewer problems. I've done it a number of times and did not have to use the CLI at all.
@Jims-Garage · 6 months ago
The CLI is necessary for the backhaul network. If it were simply the vmbr0 route, then you're right, the GUI would be a good choice.
@vonwerderc · 6 months ago
Very interesting. I'm curious how HA with OPNsense would work. Wouldn't the WAN connection from your modem only go into one node? If that one dies, how would the other nodes be connected?
@Jims-Garage · 6 months ago
The WAN connection goes into a switch that splits the internet to the nodes via a VLAN. They are all members.
@headlibrarian1996 · 6 months ago
How does routing work then? Only one member of the cluster should get the traffic and the switch wouldn’t know which one that is.
@Jims-Garage · 6 months ago
@@headlibrarian1996 well there's only one firewall at a time.
@zxxz-ob7ll · 4 months ago
The grim reality of the universe requires a grim order. The machine requires perfection. Any error can become a catastrophe
@Jims-Garage · 4 months ago
Prophetic
@CastilloCrasher · 3 months ago
How would one tap into this Ceph cluster from a Kubernetes cluster running on VMs in the HA Proxmox cluster?
@Jims-Garage · 3 months ago
@@CastilloCrasher You'd simply select the Ceph storage as the storage volume for the VM. You can see that in my follow-up OPNsense video, where the OPNsense VM uses Ceph storage to become HA as a single instance.
@majoryoshi · 6 months ago
I could be mistaken on this, but regarding your HA OPNsense, is there any reason you couldn't plug your WAN into a switch (even an unmanaged one would do the trick) and plug the WAN ports on your nodes into said switch? Since you're doing HA through Proxmox/Ceph and not through OPNsense, I see no reason why that wouldn't work. Please correct me if I'm wrong, though.
@Jims-Garage · 6 months ago
That's what I'm going to try.
@dimitristsoutsouras2712 · 5 months ago
Nice presentation of the procedure and of your special-case scenario as well. At the part where you created a CephFS (after you created individual Ceph managers), where is that FS created? On the same 1TB NVMe storage? If yes, shouldn't there be some kind of partition separation between VM storage and ISOs, or do those object storage services arrange that automatically (what goes where)?
@hyperprotagonist · 6 months ago
He’s only gone and bloody done it 👏
@Jims-Garage · 6 months ago
Haha, thanks. A lot of late nights behind this one for something that on the surface is quite straightforward!
@hyperprotagonist · 6 months ago
@@Jims-Garage Kudos for persevering. On Twitter you highlighted the setbacks, on Discord you kept everyone reassured, and in the video your demeanour was as if it was merely a hiccup. You weren't lying when you said I didn't know half of it 😂
@DavidC-rt3or · 5 months ago
After setting up a test PBS server and backing up the nodes of the cluster, I'm trying to find the steps for restoring a node that is in a cluster and runs Ceph, just to make sure all the needed information was backed up and to know how to restore (ahead of time :) ). Ideas?
@Copernicus22 · 6 months ago
Hi, very impressive work! Are those Ceph benchmark speeds normal, though? I was expecting more given 25Gbit/NVMe.
@Jims-Garage · 6 months ago
Normal for consumer devices. Ceph isn't about performance, it's about reliability. It's perfectly fine from my experience. Anything super heavy you want local.
@Copernicus22 · 6 months ago
@@Jims-Garage OK, thanks. Yeah, I did it once years ago; I think I had similar results with Ceph using MicroK8s.
@Eli-q5z9h · 27 days ago
In the system file /etc/hosts, do I put the IP addresses of the public network or the Ceph network?
@janstasik9094 · 2 months ago
Hello, may I ask about the stability of the MS-01 since you deployed TB4 and Ceph? I've ordered the boxes, but meanwhile I've read horrible stories about the MS-01: how hard it is to deploy vPro, that Proxmox installation is a nightmare, that BIOS upgrades and microcode deployment are nearly unrealistic, that the TB4 ports are impossible to configure and run, and that overall Ceph and box stability is a nightmare, needing a reboot every 3 days, etc. What is your real-life experience? Is it worth buying them? From my side, it's the best hardware for a homelab. Thank you.
@Jims-Garage · 2 months ago
I haven't had a single issue since buying them about 3 months ago. They've been on all that time, are on the stock BIOS, and are running Ceph via TB4. Proxmox installation is the same as on any other device. I don't use vPro as I don't have a need to, but I've heard it's a nightmare. The only issue I had was needing to disable ASPM in the BIOS.
@janstasik9094 · 2 months ago
@@Jims-Garage Thanks...
@JonatanCastro · 6 months ago
This is amazing. I just got the MS-01 to create some content for my channel, but I'd definitely love to have the hardware needed for a Ceph setup. Anyway, I digress; I just want to ask how quick it is to move a CT, considering you can't live-migrate them, but on the other hand, the storage is already shared!
@kienanvella · 6 months ago
You can absolutely run Ceph with spinning disks, but you need quite a few of them, and you definitely want some SSD DB/WAL devices. I'm running a 4-node cluster with 24 spinning disks, 6 per node, at a 3:1 OSD to DB/WAL drive ratio (3 OSDs share one DB/WAL SSD). Having said that, it's not stupendously fast, especially for my write-heavy workload, but it's fast "enough". I've got about 35 guests, including a Zabbix server with DB, 3x Elasticsearch, and a Graylog system. It was quite affordable, however, buying used drives in bulk.
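The drive counts in that layout follow from simple arithmetic. The sketch below reproduces them for 4 nodes with 6 HDD OSDs each at 3 OSDs per DB/WAL SSD; the 480 GB SSD size is a hypothetical value chosen for illustration, not something stated in the comment.

```python
# Arithmetic for an HDD cluster whose OSDs share SSDs for BlueStore DB/WAL.
# Assumption: the 480 GB DB/WAL SSD size is hypothetical.
import math


def db_wal_plan(nodes: int, hdds_per_node: int, osds_per_ssd: int,
                ssd_gb: float) -> dict:
    ssds_per_node = math.ceil(hdds_per_node / osds_per_ssd)
    return {
        "total_osds": nodes * hdds_per_node,
        "ssds_per_node": ssds_per_node,
        "ssds_total": nodes * ssds_per_node,
        "db_gb_per_osd": ssd_gb / osds_per_ssd,  # each OSD gets an equal slice
    }


if __name__ == "__main__":
    print(db_wal_plan(nodes=4, hdds_per_node=6, osds_per_ssd=3, ssd_gb=480))
```

One design consequence worth keeping in mind: if a shared DB/WAL SSD fails, every OSD that uses it (three, in this layout) goes down with it.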
@Jims-Garage · 6 months ago
That's awesome, thanks for sharing. I'll do some more testing.
@monish05m · 6 months ago
May I ask for a video on how to set up the virtual NIC you have running on your OPNsense? Thanks, and I really loved your video.
@simuman · 6 months ago
Hey Jim, really like your videos. I tried this a few months back, and I'm not sure if I got this Ceph system wrong or not, but I couldn't get it to work with external NAS storage connected through a mapped CIFS mount, as HA did not recognize the IP address of the media for Plex on failover. Do you know if this is possible, or have I got the wrong end of the stick about HA and how it works?
@Irish2086 · 6 months ago
I have been looking for this answer for a while... How would one figure out the right number for a 5-, 7-, or 9-node Ceph configuration? I only found information about a 3-node config.
@headlibrarian1996 · 6 months ago
I like 5 more than 3, but 5 MS-01s is fairly pricey, and you can't do a full-mesh Thunderbolt network with 5. With five, shutting down a node for maintenance doesn't completely degrade the cluster, and erasure coding works better with more nodes. A 5-node Qotom cluster is interesting because they have 2 SFP+ 10G ports, but I don't know how well it would actually perform. You could put one set of SFP interfaces on a dumb switch for the private backhaul network, and you'd need 5 ports on your main switch for the public-facing interfaces.
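Two numbers drive the 3 vs 5 vs 7 vs 9 node question: how many voters must agree for quorum, and how many direct links a full mesh needs. This sketch assumes every node runs a Ceph monitor and holds one corosync vote, and that each node has two Thunderbolt ports (as on the MS-01); both are simplifications for illustration.

```python
# Node-count arithmetic for small Proxmox/Ceph clusters.
# Assumptions: one monitor / one corosync vote per node, and a full-mesh
# backhaul built from point-to-point cables (2 Thunderbolt ports per node).


def cluster_math(nodes: int, ports_per_node: int = 2) -> dict:
    majority = nodes // 2 + 1  # votes needed for quorum
    return {
        "nodes": nodes,
        "quorum": majority,
        "tolerated_failures": nodes - majority,
        "mesh_links": nodes * (nodes - 1) // 2,  # one cable per node pair
        "ports_needed_per_node": nodes - 1,
        "mesh_fits": nodes - 1 <= ports_per_node,
    }


if __name__ == "__main__":
    for n in (3, 5, 7, 9):
        m = cluster_math(n)
        print(f"{n} nodes: quorum {m['quorum']}, survives {m['tolerated_failures']} "
              f"down, mesh needs {m['mesh_links']} links "
              f"({'fits' if m['mesh_fits'] else 'does not fit'} 2 ports/node)")
```

An even node count adds a vote without adding failure tolerance (4 nodes still survive only 1 failure), which is why odd counts are the usual recommendation.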
@lsimsdj · 2 months ago
My mini PCs have one 512GB NVMe SSD each... Will this not work? Does it mean I need to buy an additional NVMe SSD for each mini PC in the cluster?
@Jims-Garage · 2 months ago
Correct, CEPH requires a dedicated drive.
a month ago
@9:33 Try to ALWAYS have a serial console. That never fails.
@RoiskiaFilms · 6 months ago
I just noticed that naming scheme and I'm confused. Failbaddon the Harmless and then the two primarchs? Anyway, great video. Looking forward to trying this myself in the future.
@Jims-Garage · 6 months ago
Thanks 👍 Cadia stands (oh wait!) 😲
@orgind7778 · 6 months ago
Thanks, great video
@Jims-Garage · 6 months ago
Glad you enjoyed it
@voldllc9621 · 6 months ago
I did not see you create shared storage for VM and CT disks. CephFS cannot host these because it gives you POSIX file storage only, not block storage. You need RADOS block storage.
@Jims-Garage · 6 months ago
Thanks, as mentioned that was in the previous video.
@voldllc9621 · 6 months ago
Sorry, I missed that, probably because I saw you installing Ceph from scratch and, after creating a replicated pool, going straight to CephFS for ISO and CT template file storage. ISOs and CT templates are not crucial for HA.
@DavidC-rt3or · 6 months ago
In my setup I've got one CRUSH rule and pool with SSDs for the VM disks, and another with HDDs for the VMs' data virtual disks. Not a high volume/performance need.
@snowballeffects · 4 months ago
So... that lockout problem when you pass through the GPU: I have a standby PCI (yup, PCI 😂) GPU that I popped into that previously, annoyingly unused slot, leaving the original GPU in place. Plug in the SVGA monitor 😂 and boom, hello CLI 😅
@Jims-Garage · 4 months ago
@@snowballeffects nice, that's a good failsafe!
@cberthe067 · 6 months ago
Is there no erasure coding in the CRUSH rule?
@Jims-Garage · 6 months ago
It's a trade-off from my understanding. Erasure coding ensures better replication (data loss prevention) but impacts performance. As I always abstract my data, I'm less worried about it as a long-term storage mechanism (more for failover capability).
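To make the trade-off concrete: a replicated pool of size n keeps n full copies, while an erasure-coded pool with k data and m coding chunks stores k+m chunks per object and can lose up to m of them without losing data. The sketch compares raw-space efficiency, loss tolerance, and minimum host count (assuming a host failure domain) for a few example layouts; the profiles listed are illustrations, not recommendations for this cluster.

```python
# Compare replication and erasure coding (EC) layouts in a host failure domain.
# Replication of size n is modelled as k=1 data chunk plus m=n-1 extra copies.
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    k: int  # data chunks
    m: int  # coding chunks (or extra replicas)

    @property
    def efficiency(self) -> float:
        """Fraction of raw capacity available for data."""
        return self.k / (self.k + self.m)

    @property
    def min_hosts(self) -> int:
        """Each chunk must land on a different host."""
        return self.k + self.m


LAYOUTS = [
    Layout("replicated size=3", k=1, m=2),
    Layout("EC 2+1", k=2, m=1),
    Layout("EC 3+2", k=3, m=2),
]

if __name__ == "__main__":
    for lay in LAYOUTS:
        print(f"{lay.name}: {lay.efficiency:.0%} usable, survives "
              f"{lay.m} lost chunks, needs {lay.min_hosts} hosts")
```

EC 3+2 needs five hosts, which ties in with the earlier point that erasure coding works better with more nodes.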
@BenjaminBenStein · 6 months ago
🎉
@MelroyvandenBerg · 5 months ago
Is COVID back again in the country? Blehh.
@Jims-Garage · 5 months ago
@@MelroyvandenBerg yeah, I think there has been a summer spike
@dazealex · 5 months ago
@@Jims-Garage Even here in California.
@mridulranjan1069 · 5 months ago
You didn't show or guide through the setup of anything, just talked, showed your face and a couple of screenshots. Seriously man, what CRAP!
@Jims-Garage · 5 months ago
@@mridulranjan1069 did you ensure that your monitor was on and that the sound wasn't muted?
@randallsalyer · 3 months ago
The fix for your IPv4 issue is now in the setup documentation. You have it after your source line, just FYI; hope you see this. Add this as the last line of the interfaces file, unless there is a source line, in which case put it immediately before the source lines (or delete the source line):
/etc/network/interfaces
# This must be the last line in the file unless there is a sources line in which case put this immediately above the sources line (or delete the sources line)
post-up /usr/bin/systemctl restart frr.service
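The placement rule above (last line of /etc/network/interfaces, or immediately above the first source line) can be applied by a small script. This is only a sketch: the function name is hypothetical, it works on text and prints a preview rather than editing the file in place, and you would still want a backup of the original before writing anything back.

```python
# Apply the placement rule for the FRR restart hook in /etc/network/interfaces:
# append it as the last line, or insert it immediately above the first
# `source`/`source-directory` line if one exists.
from pathlib import Path

POST_UP = "post-up /usr/bin/systemctl restart frr.service"


def add_frr_post_up(text: str) -> str:
    lines = text.rstrip("\n").split("\n")
    if any(line.strip() == POST_UP for line in lines):
        return text  # already present; leave the content unchanged
    for i, line in enumerate(lines):
        if line.lstrip().startswith(("source ", "source-directory ")):
            lines.insert(i, POST_UP)
            break
    else:
        lines.append(POST_UP)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    path = Path("/etc/network/interfaces")
    if path.exists():
        # Preview only: print the result instead of overwriting the file.
        print(add_frr_post_up(path.read_text(encoding="utf-8")), end="")
```

Running it twice is harmless, since an existing hook line is left alone.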