Nice work! Thanks for making this easy! I need to try it out someday!
@Jims-Garage 5 months ago
Thanks, Tim. I'm finding it particularly useful for K3S Servers and my firewall. Having the VMs failover automatically means there's no disruption to the cluster, no pulling pods etc.
@Layer2Clouds 2 months ago
Great video - we support hosted Proxmox clusters in the US and your guides are a go-to for our clients! Thank you, Jim.
@Jims-Garage 2 months ago
@@Layer2Clouds wow, thanks for sharing. That's great to hear.
@ewenchan1239 5 months ago
1) You don't TECHNICALLY need a separate drive, you just need a separate PARTITION that Ceph can take over and have full control over. For example, in my OASLOA Mini PC (N95, 16 GB, 512 GB NVMe 2242 M.2 SSD), I partitioned the 512 GB NVMe SSD on each of my 3 nodes such that 128 GB is given for the Proxmox install, and the local-lvm, and then the rest is a separate partition that is given to Ceph to have dominion over. (My OASLOA Mini PC doesn't HAVE another slot where I can add additional storage devices, so I had to make do with what it has.) Once you have it partitioned like that, you can proceed with putting the 3 nodes into a Proxmox HA cluster, per usual, and you can then set up the Ceph cluster as well, also via the Proxmox GUI to perform the initial install, and also to set up your first monitor.
2) re: iGPU passthrough
This is why I DON'T recommend you install any VMs/CTs until the infrastructure has been set up to be what you want it to be. Set up the clustering and Ceph first, THEN set up your VMs/CTs. That way, the IOMMU groups will stabilise, such that it will be USABLE for what you're trying to do with it before deploying VMs/CTs/services.
@Jims-Garage 5 months ago
Thanks for the tips, I'll consider that on the next deployment.
@ewenchan1239 5 months ago
@@Jims-Garage No problem. In my case, my storage depended on Ceph RBD/CephFS being up and running before I could store the VM/CT disks, so the clustering and Ceph had to be up and running before I could do anything else. I know that you are storing the VM/CT disks on local storage, rather than on the Ceph storage system, so you were able to start installing VMs/CTs before your Ceph system was set up.
@0xKruzr 5 months ago
yeah, but you don't want to write-exhaust the device if it's also booting the node.
@ewenchan1239 5 months ago
@@0xKruzr Depends on how much traffic you're putting on the system/cluster. In my case, my 3-node HA Proxmox cluster running Ceph exists only to serve Windows AD DC, DNS, and AdGuard Home. So none of that is intensive. The monthly backups are probably more write-intensive than anything else that happens for the rest of the month. (My N95 Mini PC, with only 16 GB of RAM, is too slow to really do much of anything else.)
@MrNGm 3 months ago
In the constrained setup ewenchan1239 describes, using a separate partition on a single drive may be acceptable. Readers with other setups and/or reliability requirements should take into account that Ceph's reliability stems from (among other things) being able to spread data chunks across a larger number of OSDs (object storage daemons), such that the unavailability of 1, 2, or 10 OSDs doesn't impact the cluster. The latter depends on the configured rules regarding failure domains (further reading in the Ceph documentation: CRUSH maps). I would always advise reading a bit more on Ceph, its high-level architecture, and its failure modes. In the setup ewenchan1239 describes (3-replicated Ceph with Proxmox), the cluster will become unavailable if, for example, you're performing maintenance on 1 host and the disk of another one fails. Nevertheless, with a setup where VM data is accessible on all hypervisors through shared (network) storage, maintenance on a single hypervisor becomes a lot simpler.
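The failure scenario described above (one host in maintenance, another host's disk failing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: one OSD per host, a replicated pool with size=3, and min_size=2 (an assumed setting, not confirmed anywhere in this thread). The function name and the state labels are hypothetical simplifications, not Ceph's own status strings.

```python
def pg_state(replica_hosts, down_hosts, min_size=2):
    """Classify one placement group from the hosts that hold its replicas.

    Assumption: one OSD per host, so a host being down means its replica
    is unavailable. min_size=2 is an assumed pool setting.
    """
    alive = [h for h in replica_hosts if h not in down_hosts]
    if len(alive) == len(replica_hosts):
        return "healthy"      # all replicas reachable
    if len(alive) >= min_size:
        return "degraded"     # still serving I/O, but with reduced redundancy
    return "inactive"         # fewer than min_size replicas: I/O is blocked


hosts = ["node1", "node2", "node3"]

print(pg_state(hosts, set()))                # no failures
print(pg_state(hosts, {"node1"}))            # node1 down for maintenance
print(pg_state(hosts, {"node1", "node2"}))   # maintenance plus a failed disk
```

With three hosts and one OSD each, a second simultaneous outage leaves a single replica, which is below the assumed min_size, so the pool stops serving I/O until a replica returns. More OSDs and hosts give CRUSH more places to put replicas, which is the point made above.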
@muhammadabidsaleem7048 5 months ago
Thank you, Jim. Keep posting new videos, especially on SDN, please.
@davidbuchaca 3 months ago
Very nice and detailed tutorial! abbadon, sanguinius, dorn, proposing names for the following nodes: lion, khan, corax
@Jims-Garage 3 months ago
@@davidbuchaca awesome! Sage choices too!
@Chris-rm1pn 5 months ago
MS-01s also have vPro, which supports Serial over LAN, so if you lock yourself out and the host isn't using the GPU, you can use that to fix issues.
@Jims-Garage 5 months ago
Thanks, I'm still to get that working. It's quite buggy from my limited trialling.
@Chris-rm1pn 5 months ago
@@Jims-Garage I recommend using MeshCentral and their guides if you haven't tried them; it's the best working solution I've found so far.
@Andy-fd5fg 5 months ago
Long live the serial port! 'Tis a shame they don't have a physical 9-pin serial connector.
@cschwartz 5 months ago
@@Jims-Garage agreed. The implementation is unfortunately lacking and quirky. I loaded the MeshCommander firmware on it to get web-based KVM without needing the MeshCommander software running on a client or hosted app. Even that had quirks, but it enhanced functionality. I ended up giving up, going LACP with the 2.5 ports, and reverting to a trusty Raritan IP KVM and a USB TTY console. I never could get the WoL aspect of it working, and it had to be in a booted state to function.
@cschwartz 5 months ago
@@Andy-fd5fg TTY to USB... no need for a DB9.
@johnwalshaw 5 months ago
I opted for 3x Nextorage NEM-PA2TB for 2GB DDR4 SDRAM. Very happy so far. It's great having a 3 node CEPH cluster.
@Jims-Garage 5 months ago
That's great, sounds like a solid setup.
@DS-ou7xm 5 months ago
It's OK, mate, nothing wrong with having cold and flu symptoms... And awesome video, thanks.
@Insightfill 5 months ago
Oh! I've been looking forward to this one!
@Jims-Garage 5 months ago
Hope you like it!
@NickS34252 1 month ago
Excellent video - I've been following along while tinkering with my own cluster. When it comes to fast nodes like the MS-01, it's a bit tricky to figure out what to put into ceph vs local storage given the performance limitations.
@Jims-Garage 1 month ago
@@NickS34252 thanks. I totally agree! I'm often scratching my head thinking which should I use.
@nadtz 5 months ago
If I hadn't already built a new proxmox host before the MS01 came out I might have gone this route (though with dedicated hardware for opnsense), it's kind of crazy what minisforum was able to pack into the MS01 for the price and that ceph + proxmox HA is available for home users for free.
@Jims-Garage 5 months ago
I agree. There are quirks but it's impressive.
@Carlos-Rodrigues 3 days ago
I was waiting for this machine for so many years. Now I have four MS-01s: 3 for the cluster and another just for OPNsense. It's fast. It's stable. It's amazing. I just wonder if I can create a network with the MS-A1 through Thunderbolt so I can use it as a backup server with PBS.
@zxxz-ob7ll 3 months ago
The grim reality of the universe requires a grim order. The machine requires perfection. Any error can become a catastrophe
@Jims-Garage 3 months ago
Prophetic
@hyperprotagonist 5 months ago
He’s only gone and bloody done it 👏
@Jims-Garage 5 months ago
Haha, thanks. A lot of late nights behind this one for something that on the surface is quite straightforward!
@hyperprotagonist 5 months ago
@@Jims-Garage kudos for persevering. On Twitter you highlighted the setbacks, on Discord you kept everyone reassured, and in the video your demeanour was as if it was merely a hiccup. You weren't lying when you said I didn't know the half of it 😂
@MarkConstable 5 months ago
I'm pretty sure if you used the GUI to set up Ceph you would have had fewer problems. I've done it a number of times and did not have to use the CLI at all.
@Jims-Garage 5 months ago
The CLI is necessary for the backhaul network. If it was simply the vmbr0 route then you're right, the GUI would be a good choice.
@rodneykahane4994 2 months ago
Not sure what the performance implications are, but the NVMe OSDs that were created were classified as ssd. In the advanced tab, you can manually select the drive type (hdd, ssd, or nvme).
@Jims-Garage 2 months ago
@@rodneykahane4994 thanks, let me check that!
@cschwartz 5 months ago
If you are going to continue to do iGPU passthrough, have you thought of passing a TTY console via USB to serial, that way you can connect up should HW change and pve wants to move around your NIC naming.
@Jims-Garage 5 months ago
Good idea, I'll look into that. Thanks
@johnvandenhurk8650 5 days ago
First of all, I love your videos and have watched many of them. I have had a similar Ceph configuration on an MSI Cubi Proxmox cluster using Samsung 990 Pro NVMe SSDs. I was pretty happy with this until I noticed that, less than six months in, SMART monitoring is failing on two of the NVMe drives. Wearout for the three 990 Pros is 150%, 255%, and 6%. On the Proxmox forum I'm told that this is due to consumer-grade SSDs. The 255% is from the node that does the most IO, but by no means are these heavily loaded systems. I wonder what your experience has been so far with wearout because of Ceph?
@Jims-Garage 5 days ago
@@johnvandenhurk8650 thanks. It does chew through consumer SSDs. Mine is on about 40%, I think it's good for about 4 years in total.
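A rough way to reason about lifetime estimates like the one above is a linear extrapolation of the drive's reported wear percentage. The sketch below assumes wear grows at a constant rate, which real workloads may not follow; the function name and the example inputs (40% used after a hypothetical 20 months in service) are illustrative only and are not figures from this thread. Note also that, to my understanding, the NVMe "Percentage Used" field saturates at 255, so a reading of 255% may understate the real wear.

```python
def months_until_worn_out(percent_used, months_in_service):
    """Linearly extrapolate how many more months until wear reaches 100%.

    Assumption: wear accumulates at a constant rate. Returns 0.0 if the
    drive already reports 100% or more.
    """
    if percent_used <= 0 or months_in_service <= 0:
        raise ValueError("need a positive wear reading and service time")
    rate_per_month = percent_used / months_in_service
    return max(0.0, (100 - percent_used) / rate_per_month)


# Hypothetical inputs: 40% used after 20 months in service.
remaining = months_until_worn_out(40, 20)
print(f"about {remaining:.0f} more months at the current rate")
```

Re-running the estimate every few months with fresh SMART readings shows whether the wear rate is actually steady or accelerating.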
@johnvandenhurk8650 5 days ago
@@Jims-Garage Thanks for the swift response! perhaps it is only mine that have an issue, but mine are failing within a year. I will reach out to my vendor and create a ticket. I hope yours are better! How happy are you with your MS-01's? I'm considering an upgrade to an MS01 (i9-12900) cluster for the SFP+
@orgind7778 5 months ago
Thanks great video
@Jims-Garage 5 months ago
Glad you enjoyed it
@majoryoshi 5 months ago
I could be mistaken on this, but in regards to your HA OPNsense, is there any reason why you couldn't plug your WAN into a switch (even an unmanaged one would do the trick) and plug the WAN ports on your nodes into said switch? Since you're doing HA through Proxmox/Ceph and not through OPNsense, I see no reason why that wouldn't work. Please correct me if I'm wrong though.
@Jims-Garage 5 months ago
That's what I'm going to try.
@dimitristsoutsouras2712 4 months ago
Nice presentation of the procedure and your special-case scenario as well. At the part where you created a CephFS (after you created the individual Ceph managers), where is that FS created? On the same 1 TB NVMe storage? If yes, shouldn't it have some kind of partition separation between VM storage and ISOs, or do those object storage services arrange that automatically (what goes where)?
@jgarfield 2 months ago
How would one tap into this Ceph cluster from a Kubernetes cluster running on VMs in the HA Proxmox cluster?
@Jims-Garage 2 months ago
@@jgarfield you'd simply select the storage volume on the ceph as the storage volume for the VM. You can see that in my OPNSense video afterwards whereby the OPNSense uses the ceph storage to make it HA with a single node.
@sku2007 5 months ago
there's some pcie passthrough translation in pve8. meaning you can set the hw for each node and in the vm the "friendly name" (don't know their wording right now, it's in datacenter somewhere)
@Jims-Garage 5 months ago
Thanks, wasn't aware of that. I'll take a look
@sku2007 5 months ago
it's called resource mappings, right below metric server
@Jims-Garage 5 months ago
@@sku2007 thanks, I took a look just now and the i226-v isn't on the node. Very odd!
@sku2007 5 months ago
@@Jims-Garage very odd! even when forwarded, the HW gets listed with lspci in host shell. with lspci -v you'll see a line with Kernel driver in use: vfio-pci
@Jims-Garage 5 months ago
@@sku2007 I've tried all of those to no avail. I'm going to load a live Linux installation. If I don't see it I'll rma
@monish05m 5 months ago
May I ask for a video on how to set up that virtual NIC you have running on your OPNsense? Thanks, and I really loved your video.
@fbifido2 5 months ago
@4:33 - the thunderbolt backhaul does not show up as a network bridge inside Proxmox ???
@Jims-Garage 5 months ago
Eno5 and eno6 are the thunderbolt adapters. You could create a bridge if you wanted.
@jeffersonsantos4603 5 months ago
Great job, man. Do you have full network performance for Opnsense via the VirtIO bridges?
@Jims-Garage 5 months ago
Yeah, it maxes out 10Gb via iperf3 and full 2Gb up/down via speedtest.net
@romseaaccthree1448 5 months ago
@@Jims-Garage i'm assuming this is for the same VLAN iperf test. Would you also be able to test iperf results for inter VLAN traffic?
@vonwerderc 5 months ago
Very interesting. I'm curious how HA with OPNsense would work. Wouldn't the WAN connection from your Modem only go into one node? If that one dies, how would the other nodes be connected?
@Jims-Garage 5 months ago
The WAN connection goes into a switch that splits the internet to the nodes via a vLAN. They are all members.
@headlibrarian1996 5 months ago
How does routing work then? Only one member of the cluster should get the traffic and the switch wouldn’t know which one that is.
@Jims-Garage 5 months ago
@@headlibrarian1996 well there's only one firewall at a time.
@snowballeffects 3 months ago
SO... that lock out problem when you pass through the GPU - I have a standby PCI (yup PCI 😂) GPU that I popped into that previously annoyingly unused slot - leaving the original gpu in place. plug in the SVGA monitor 😂 and boom - hello cli 😅
@Jims-Garage 3 months ago
@@snowballeffects nice, that's a good failsafe!
@JonatanCastro 5 months ago
This is amazing, I just got the MS-01 to create some content for my channel, but definitely would love to have the needed hardware to do a CEPH setup. Anyway, I digress; just want to ask you how quick it is to move a CT, considering you can't live migrate them, but on the other hand, the storage is already shared!
@janstasik9094 1 month ago
Hello, may I ask about the stability of the MS-01 since you deployed TB4 and Ceph? I've ordered the boxes, but meanwhile I've read horror stories about the MS-01: how hard it is to deploy vPro, that Proxmox installation is a nightmare, that BIOS upgrades and microcode deployment are nearly unrealistic, how impossible it is to configure and run the TB4 ports, and that overall Ceph and box stability is a nightmare, with a reboot needed every 3 days, etc. What is your real-life experience? Is it worth buying them? From my side, it's the best hardware for a homelab. Thank you.
@Jims-Garage 1 month ago
I haven't had a single issue since buying them about 3 months ago. They've been on all that time, are on stock BIOS, and are running Ceph via TB4. Proxmox installation is the same as on any other device. I don't use vPro as I don't have a need to, but I've heard it's a nightmare. The only issue I had was having to disable ASPM in the BIOS.
@janstasik9094 1 month ago
@@Jims-Garage Thanks...
@simuman 5 months ago
Hey jim, really like your videos. I tried this a few months back and not sure if I got this ceph system wrong or not, but couldn't get it to work with a connected external NAS storage through mapped CIFS mount as the HA did not recognize the IP address for media for plex on fail over. Do you know if this is possible or have I got the wrong end of the stick about HA and how it works?
@DavidC-rt3or 4 months ago
After having setup somewhat of a test PBS server and backing up the nodes of the cluster, trying to find the steps of how to do a restore of a node that is in a cluster and has ceph.. just to make sure all of the needed information was backed up and how to restore (ahead of time :) ) Ideas?
22 days ago
@9:33 Try to _ALWAYS_ have a serial console. That never fails.
@Copernicus22 5 months ago
Hi, very impressive work! are those ceph benchmark speeds normal though? I was expecting more given 25gbit/NVMe?
@Jims-Garage 5 months ago
Normal for consumer devices. Ceph isn't about performance, it's about reliability. It's perfectly fine from my experience. Anything super heavy you want local.
@Copernicus22 5 months ago
@@Jims-Garage ok thanks, yeah I did it once years ago, I think I had similar results with Ceph using microk8s.
@RoiskiaFilms 5 months ago
I just noticed that naming scheme and i am confused. Failbaddon the Harmless and then the two primarchs? Anyway, great video. Looking forward to try this myself in the future.
@Jims-Garage 5 months ago
Thanks 👍 Cadia stands (oh wait!) 😲
@kienanvella 5 months ago
You can absolutely run with spinning disks with ceph, but you need quite a few of them, and definitely want some SSD DB/WAL devices. I'm running a cluster of 4 nodes, with 24 spinning disks, 6 per node. 3:1 OSD to DB/WAL drive ratio (3 OSDs share one DB/WAL SSD). Having said that, it's not stupendously fast - especially for my write-heavy workload, but it's fast 'enough'. I've got about 35 guests, which includes a Zabbix server with DB, 3x elasticsearch, and a graylog system. It was quite affordable however, buying used drives in bulk.
@Jims-Garage 5 months ago
That's awesome, thanks for sharing. I'll do some more testing.
@Irish2086 5 months ago
I have been looking for this answer for a while... How would one figure out the right numbers for a 5-, 7-, or 9-node Ceph configuration? I have only found information about a 3-node config.
@headlibrarian1996 5 months ago
I like 5 more than 3, but 5 MS-01s is fairly pricey and you can’t do a full-mesh thunderbolt network with 5. With five shutting down a node for maintenance doesn’t completely degrade the cluster and erasure coding works better with more nodes. A 5-node Qotom cluster is interesting because they have 2 SFP+ 10G ports, but I don’t know how well it would actually perform. You could have one set of SFP interfaces on a dumb switch for the private backhaul network, and you need 5 ports on your main switch for the public facing interfaces.
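The full-mesh limit mentioned above follows from simple counting: in a switchless full mesh every node needs a direct link to every other node. The sketch below works that out; the function names are hypothetical, and the figure of two Thunderbolt ports per MS-01 is an assumption used for the example, not something stated in this thread.

```python
def full_mesh_requirements(nodes):
    """Ports per node and total cables for a switchless full mesh."""
    return {
        "ports_per_node": nodes - 1,          # one direct link to each peer
        "cables": nodes * (nodes - 1) // 2,   # each pair shares one cable
    }


def max_full_mesh_nodes(ports_per_node):
    """Largest cluster that can be fully meshed with this many ports each."""
    return ports_per_node + 1


print(full_mesh_requirements(3))   # fits within two ports per node
print(full_mesh_requirements(5))   # needs four ports per node
print(max_full_mesh_nodes(2))      # assumed two Thunderbolt ports per node
```

Beyond that size the backhaul has to go through a switch (as suggested above with the SFP+ ports) or a topology other than a full mesh.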
@lsimsdj 1 month ago
My mini PCs have one 512 GB NVMe SSD each... Will this not work? Does it mean I need to buy an additional NVMe SSD for each mini PC in the cluster?
@Jims-Garage 1 month ago
Correct, CEPH requires a dedicated drive.
@BenjaminBenStein 5 months ago
🎉
@voldllc9621 5 months ago
I did not see you creating a shared storage for vm and ct disks. Cephfs cannot host these because that gives you posix file storage only, not block storage. You need RADOS block storage.
@Jims-Garage 5 months ago
Thanks, as mentioned that was in the previous video.
@voldllc9621 5 months ago
Sorry, I missed that, probably since I saw you installing Ceph from scratch, and after creating a replicated pool, going straight to CephFS for ISO and CT template file storage. ISOs and CT templates are not crucial for HA.
@DavidC-rt3or 5 months ago
In my setup I've got one crush rule and pool setup for ssd's for the vm disk and another with hdd's for data virtual disk of the vms. Not a high volume/performance need
@cberthe067 5 months ago
Is there no erasure coding in the CRUSH rule?
@Jims-Garage 5 months ago
It's a trade off from my understanding. Erasure coding ensures better replication (data loss prevention) but impacts on performance. As I always abstract my data I'm less worried about it as a long term storage mechanism (more for failover capability).
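One side of that trade-off that is easy to quantify is raw-versus-usable capacity. A replicated pool stores `size` full copies of every object, while an erasure-coded pool splits each object into k data chunks plus m coding chunks. The sketch below compares the two; the function names, the 3 TB raw figure, and the k=2, m=1 profile are illustrative assumptions, not settings taken from the video.

```python
def usable_fraction_replicated(size):
    """Fraction of raw capacity that holds unique data with `size` copies."""
    return 1 / size


def usable_fraction_ec(k, m):
    """Fraction of raw capacity that holds data with k data + m coding chunks."""
    return k / (k + m)


raw_tb = 3.0  # hypothetical: three nodes with 1 TB each

replicated = raw_tb * usable_fraction_replicated(3)
erasure = raw_tb * usable_fraction_ec(2, 1)

print(f"replicated size=3: {replicated:.2f} TB usable, survives losing 2 copies")
print(f"erasure k=2, m=1:  {erasure:.2f} TB usable, survives losing 1 chunk")
```

On a three-node cluster the k=2, m=1 profile is about the only erasure-coded layout that fits with one chunk per host, and it tolerates fewer failures than three-way replication, so the capacity gain comes at a cost in both redundancy and the extra work of encoding and reconstructing chunks.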
@MelroyvandenBerg 4 months ago
is covid back again in the country? blehh.
@Jims-Garage 4 months ago
@@MelroyvandenBerg yeah, I think there has been a summer spike
@dazealex 4 months ago
@@Jims-Garage Even here in California.
@mridulranjan1069 4 months ago
You didn't show or guide through the setup of anything, just talked, showed your face and a couple of screenshots. Seriously man, what CRAP!
@Jims-Garage 4 months ago
@@mridulranjan1069 did you ensure that your monitor was on and that the sound wasn't muted?
@randallsalyer 2 months ago
The fix for your IPv4 is now in the setup documentation; you have it after your source line. Just FYI, hope you see this. Also, add this as the last line of the interfaces file, unless there is a sources line, in which case put it immediately before the sources line (or delete the sources line):
/etc/network/interfaces
# This must be the last line in the file unless there is a sources line in which case put this immediately above the sources line (or delete the sources line)
post-up /usr/bin/systemctl restart frr.service
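The placement rule described above (last line of the file, or immediately above the source line if one exists) can be sketched as a small text transformation. This is a hedged illustration only: the function name is hypothetical, it works on a string rather than editing /etc/network/interfaces directly, and the sample file content is made up for the example.

```python
POST_UP = "post-up /usr/bin/systemctl restart frr.service"


def add_frr_restart(interfaces_text):
    """Insert the post-up line above the first source line, else append it.

    Returns the text unchanged if the line is already present.
    """
    lines = interfaces_text.splitlines()
    if any(line.strip() == POST_UP for line in lines):
        return interfaces_text
    for i, line in enumerate(lines):
        if line.strip().startswith("source"):
            lines.insert(i, POST_UP)   # immediately above the source line
            break
    else:
        lines.append(POST_UP)          # no source line: make it the last line
    return "\n".join(lines) + "\n"


# Hypothetical minimal interfaces file, for illustration only.
sample = (
    "auto lo\n"
    "iface lo inet loopback\n"
    "\n"
    "source /etc/network/interfaces.d/*\n"
)
print(add_frr_restart(sample))
```

Reviewing the printed result and then editing the real file by hand (with a backup) is safer than writing to it programmatically on a node you could lock yourself out of.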