In this week's video I expound upon the joys of being on call and what can happen when you do get called.
Comments: 71
@PE4Doers 4 months ago
This video brings back memories of my early days in computer field engineering. It's been over 49 years, and now I will be retiring one month from today as a computer security compliance officer for a State agency. 😊
@NetworkAdminLife 4 months ago
Happy retirement! God bless!
@caelan5301 4 months ago
CCNA in training here. Absolutely loving the content, it's really interesting to see what this job looks like in the real world and how troubleshooting actually works.
@NetworkAdminLife 4 months ago
Don't know if I'm the best model for troubleshooting. But it's how *I* do things. You may find a better way. God bless!
@TheDexterFishbourne 4 months ago
Just did the same. Took the position in Jan 2024 and was told that the whole-room UPS was bad and had been bypassed. The building took a lightning strike last night, which somehow tripped the transfer switch over to the non-functional UPS and killed the entire server room. Was troubleshooting power from 12am till 1:30am, could not find the hidden xfer switch, so ran extension cords out of the server room to multiple outlets. Then finally got the electrician out at 9am and had him completely remove any physical connection to the old UPS so it won't happen again. The little things you find when cleaning up after others.
@NetworkAdminLife 4 months ago
Heh, yep. I call them easter eggs. Those little surprises that someone left you without any disclosure. Our last server room outage was caused by a raccoon that short-circuited our main power feed. UPS bank kicked in like it was supposed to. But the batteries were more than 10 years old so it only lasted a few hours. Now suddenly we've got budget to replace the entire UPS. And no, the raccoon did not survive the encounter and in fact started a small brush fire when its flaming carcass landed in the weeds below the power poles. Yep. Network Admin Life. God bless!
@nightwing09x 4 months ago
Glad you got a solid workaround in place, it’s possible you’ll need a replacement, if they can’t figure it out. Hope you get some good rest, you deserve it. Love your content!
@NetworkAdminLife 4 months ago
Yeah, they RMA'd the switch. However, we may have to RMA the RMA. Upcoming video. God bless!
@Capriceii 4 months ago
Right there with you, brother. Go weeks, even months, and then boom, you are up for 3 days in a row. But that is what we signed up for. You’re making me reconsider Extreme. Keep up the great work!
@NetworkAdminLife 4 months ago
99% of the time, it just works. And then it doesn't. God bless!
@Obsessedwithcoding 4 months ago
love these vids man, keep it up.
@NetworkAdminLife 4 months ago
Glad you like them! I'll keep them coming. God bless!
@MarkBurfeind 4 months ago
It must be something in the air! I got a call last Monday at 2:58AM about our network being offline, which also caused the building chillers to stop functioning. In my scenario, the Extreme NAC took a dump, but finally started working again at 7:30am. 🤦‍♂️ I feel like Extreme products are more iffy when it comes to reliability compared to HPE/Aruba.
@NetworkAdminLife 4 months ago
Yeah, those NACs are strange beasts at times. Ours (we have two) will periodically stop talking. Luckily not both at once. God bless!
@xXSilentSniper 4 months ago
I just started a new role as a Sys Admin for an automotive part manufacturer and they mentioned on call as part of my duties. So that should be fun haha.
@NetworkAdminLife 4 months ago
It can be. Hopefully it's just for easy stuff that they call you. God bless!
@daniellauck9565 1 month ago
I love when things happen "for some reason". Today I spent 4 hours troubleshooting a VoIP problem that came from nothing and went away without explanation!!!😢
@NetworkAdminLife 1 month ago
Oh yeah, aren't those fun? I've had that happen on my network and my pickup truck. God bless!
@ShepardComN7 4 months ago
Yeah, on call you never know when. And they call you for some big problems usually. Since you work in a hospital, no downtime. Keep the videos coming!
@NetworkAdminLife 4 months ago
Yep, you just never know. God bless!
@shmew22 4 months ago
As someone who's been in the Extreme world for years: that one switch, even though it may seem good, will ruin your day. I'd be willing to bet that if you remove that slot and replace it, your headache will go away.
@NetworkAdminLife 4 months ago
You, sir, would win that bet. More videos coming on this subject! God bless.
@technicalthug 4 months ago
I noticed what appears to be Door control systems in the background of your video (black boxes on the wall behind you, Labeled DSX) with the doors left open. I'm glad to know it's not just our Hospital in Australia that suffers from messy Security installers. Do you suffer from them leaving spare parts like DC Power supplies on the floor and Lead-Acid batteries and not putting the doors back on? Not the tidiest of folks.
@NetworkAdminLife 4 months ago
Good eye. Those are managed by our engineering group and they are the ones that leave it messy. :-) Only rivaled by our security camera hardware. But somehow it all keeps doing its thing. God bless!
@Jamesaepp 4 months ago
Genuine question. Was this incident worthy of an on call response? As you mentioned, it sounds like your 911 centre already has procedures to compensate for a network outage and the network was designed in such a way to not impact all the operators in the event of an issue. In my books (non medical environments), that is a "system is degraded but not down, I'll take a look in the morning, call me back when it is a total outage" scenario. Same way if a drive fails in a RAID array. I'm not coming in at 12AM to swap disks and watch the resilver. That can wait until normal hours.
@NetworkAdminLife 4 months ago
Once I found out what the true problem was, I'd say no. It could have waited. Except that what was reported to the Help Desk was that ALL computers were running slow and ALL phones were rebooting over and over. When I asked the HD to double check, they got the same report. When I went in, it turned out to be only a small subset of users in the Emergency Department. Stupid user tricks. However, the ER is just one of those places where you don't want to take chances on affecting patient care, so I went in. God bless!
@esra_erimez 4 months ago
Oh my God, my dad completely sympathizes!
@NetworkAdminLife 4 months ago
Tell your dad thank you! And God bless!
@212helpdesk 4 months ago
I hope you get overtime or comp days.
@NetworkAdminLife 4 months ago
We do get paid overtime if we come in. Also night pay if it's after 8pm or before 5 am. God bless!
@samjones4327 4 months ago
Great morning 2 u brother! Grace & Peace 2 U and your family! This one was just a "One of those days" type of issue. Thanx 4 sharing this one because you don't often hear about Kernel Panic and things running out of memory when it comes to Linux. Windows maybe, LOL. Anyways, I'm glad things were resolved once again with your expertise. I hope your nap was superb! God Bless U and keep you! Always praying 4 you and our brothers and sisters in Christ and Tech🙏🏽
@NetworkAdminLife 4 months ago
Thank you brother Sam. Yeah, this was a weird one and had Extreme scratching their heads. They still are. It's looking like we're going to have to RMA the RMA! God bless!
@DelticEngine 4 months ago
For me, it would depend on how old the equipment is and how hot it runs as to where I would start looking. Without knowing anything about a particular installation, for any device that gets a 'ghost' in it I suspect the power supply or a power fault. From my own experience, a failing power supply is like electronic cancer which as long as it goes undiagnosed slowly spreads through the system. Most computerised errors are relatively straightforward, in that they are a logical diagnosis and solution. Power supply issues cause strange and seemingly random errors because the dirty power corrupts internal data and what's worse is that a failing power supply starts affecting and damaging the electronics downstream. A power supply problem would also explain why the internal diagnostic appeared to 'pass'. If the power supply is a module and I didn't have the specialised equipment to test it then I'd automatically replace it, or at least substitute another one. Has the device had any firmware updates recently before the problem started? It could be bad firmware including a faulty download or a bad flash. Sometimes some devices don't take well to a firmware flash and it can be hard to get the new firmware to install properly or be accepted by the device. Failing internal storage can also cause problems. I don't know how much you can do with those switches, but if it's possible I'd try completely wiping the device before reflashing a firmware, let it set itself up from there and then set up any custom configuration. Personally, I build and repair systems, including servers, as well as having a background in electronics. I've repaired a number of systems where the client couldn't see why they should pay more for 'just a power supply' and have the same lower quality part installed. Then they wonder why they have weird problems and issues down the line. Where possible, I fit the best quality power supply I can because I know what happens and how they fail. 
Finally, are there any internal links or jumpers that are field configurable? I mention it because ESD damage is a real issue these days, with so few people wearing ESD protection or taking any ESD precautions because 'they've never had a problem'. Except that's not how ESD damage works; it's not a case of 'it either works or does not'. There is often a gradual failure over time, often with intermittent issues, or everything is fine until some part of a component that was damaged at some point finally fails and nobody has any idea why.
@NetworkAdminLife 4 months ago
For these switches there are no user serviceable parts inside. The power supplies, fan module, and stacking module are the only parts an end user can remove and replace. In an upcoming video I actually do find out what caused this failure and it is surprising! God bless!
@SB-qm5wg 4 months ago
You got any auto-healthchecks on those switches for ram/uptime? I have never seen OOM on a switch before. SFPs being a brat after power-up is semi-common in my exp. We kept spares right in the cage even though we weren't supposed to. No rhyme or reason. One that didn't work last time would work the new time. Glad you figured it out.
@NetworkAdminLife 4 months ago
Somehow this failure caused a loop because the next day I saw some links on our core disabled due to SLPP. Not sure what happened. New switch is in place and all seems well now. God bless!
@pierren2960 4 months ago
Would like to know what the issue was. RMA'ing the box is always the last resort; done that in the past after hours of collecting logs and troubleshooting. Always PITA. And hope you got some rest in the meantime. Cheers!
@NetworkAdminLife 4 months ago
Yeah, most of the night was spent gathering logs and waiting for engineering to take a look. They are still trying to figure it out and it also looks like we might have to RMA the RMA! God bless.
@alexanderg9106 4 months ago
Yeah, for these cases I always have a spare and a backup of the config. Sometimes there are devices that have gotten to the point where you don't trust them, and with the config backup a replacement is done quickly. And then I talk to the vendor's engineer: just send a new one and you can play with this unit as long as you wish. :)
@keithsauer3574 4 months ago
Boy, that's the truth. You can be on call for weeks and weeks and it's really quiet. Then something happens and it's a doozy. You're 100% right on support. We have both vendors in our environment and Extreme's support has always been great, while Palo Alto's support is just very slow and sometimes absent. Hope you patched your Palos for that GlobalProtect CVE, or maybe were lucky enough to be on 10.1.x and earlier where it didn't apply.
@NetworkAdminLife 4 months ago
Ha! I told the Extreme tech that it could be worse, I could be talking to Palo Alto support. We're on 10.1.11 right now. The only thing we have to do now is upgrade for that cloud certificate thingy. God bless!
@alexanderg9106 4 months ago
Hope you get some rest. I remember those days when you leave as everyone else comes in, after you were there all night. One suggestion: since this is a stack, you should NEVER place both uplinks on the same switch. I usually use the master and the secondary. Since it is a stack, you can still use LACP for the uplink. That way, if you lose one switch in the stack, the whole stack doesn't go down.
@NetworkAdminLife 4 months ago
That's a good idea! So good it's already implemented. :-) We have 4 uplinks per stack. Two on the top switch in the stack, and two on the bottom switch in the stack. This is why the rest of the stack didn't go down: the other two uplinks were working just fine. God bless!
@alexanderg9106 4 months ago
@NetworkAdminLife Thanks for filling me in. In the video I got the impression there were only 2. Hope someone else can still learn from the idea / discussion.
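The uplink layout discussed in this exchange (two uplinks on the top switch, two on the bottom) can be expressed as a small check. This is only a sketch: the `slot:port` naming and the slot numbers 1 and 8 are assumptions made for illustration, not the actual port names or stack size from the video.

```python
from collections import defaultdict


def uplink_members(uplinks):
    """Group uplink ports by stack member.

    Ports use a hypothetical 'slot:port' naming, e.g. '1:49'.
    """
    members = defaultdict(list)
    for port in uplinks:
        slot, _, _ = port.partition(":")
        members[int(slot)].append(port)
    return dict(members)


def survives_single_member_loss(uplinks):
    """True if at least one uplink remains after any one stack member fails."""
    return len(uplink_members(uplinks)) >= 2


# Two uplinks on the top switch, two on the bottom (slot numbers are made up).
spread = ["1:49", "1:50", "8:49", "8:50"]
# Both uplinks on the same switch, the layout the comment warns against.
stacked_on_one = ["1:49", "1:50"]

print(survives_single_member_loss(spread))
print(survives_single_member_loss(stacked_on_one))
```

The check only looks at which stack members carry uplinks. It says nothing about whether the LAG itself is configured correctly on the core side.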
@grandtoasty66298 4 months ago
Besides Intermapper for basics, do you have any form of assurance platform in place? Something that is collecting and establishing KPIs for the network's overall health? Nine times out of ten, having an assurance platform e.g., Cisco DNA/Catalyst Center in the case of Cisco, can be instrumental in seeing trend deviations that you may not otherwise see until something catastrophic e.g., network down, occurs. If this was an STP problem, do you have the switches configured for error disable, so that the faulty/broken switch would be automatically isolated?
@NetworkAdminLife 4 months ago
We have XIQ Site Engine but just the basics are enabled. We don't have the staff to set up a lot of these really fancy monitoring systems. I'm the only guy here for the network. God bless!
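For readers wondering what "seeing trend deviations" might look like without a full assurance platform, here is a minimal sketch. It assumes you already collect periodic memory-utilization samples from a switch by some means (the collection itself is not shown), and the sample values, window size, and threshold are all made up for illustration. It is not how XIQ Site Engine or Catalyst Center works internally.

```python
from statistics import mean, stdev


def deviations(samples, window=5, threshold=3.0):
    """Return indexes of samples that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(samples)):
        base = samples[i - window:i]
        mu, sigma = mean(base), stdev(base)
        # A perfectly flat baseline has sigma == 0; skip it to avoid
        # flagging on a zero-width band.
        if sigma > 0 and abs(samples[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical memory utilization in percent, one sample per polling interval.
memory_pct = [41, 42, 41, 43, 42, 42, 41, 78, 80]
print(deviations(memory_pct))
```

A rolling baseline like this flags the first jump but quickly absorbs the new level into its baseline, so a real setup would usually pair it with a fixed ceiling alarm as well.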
@paulierco 4 months ago
You don't have Local IT? We send Local IT to do all the checks and we stay with them remotely. My company has 150k+ users.
@NetworkAdminLife 4 months ago
I am local IT. God bless!
@jttech44 4 months ago
I'd probably nuke the config and reflash the firmware on a bench, then, with nothing plugged into it, check and see if that memory leak is fixed. If it looks good, pass some phony traffic through it and see if it's still fixed. If it still looks good after that, load the config and back in the rack it goes. Keep a close eye on the memory usage, and if the problem comes back, the whole unit gets shipped back to EXN and it's their problem. Do you get comp time for this sort of thing?
@NetworkAdminLife 4 months ago
Would be an interesting exercise but I have other things going on that take up my time. I'll let Extreme have the fun on this one. I get paid overtime for any after hours support. God bless!
@jttech44 4 months ago
@NetworkAdminLife Nice, getting that time and a half, praise God our provider
@joerockhead7246 4 months ago
get some rest.
@rmo9808 4 months ago
Definitely take a quick nap before driving anywhere
@NetworkAdminLife 4 months ago
I did, thank you! God bless!
@NetworkAdminLife 4 months ago
Good advice. But by this time all sleep was evading me so I just drove home. I did talk to some folks on my Ham radio to make sure I stayed awake on the 30 minute drive home. God bless!
@dcoll17802 4 months ago
Did you set up your spanning tree root? And BPDU root guard?
@NetworkAdminLife 4 months ago
Extreme Professional Services set it all up. I just keep it running. God bless!
@bhaveshkc1171 4 months ago
Sir, how can I communicate with you?
@NetworkAdminLife 4 months ago
We're communicating right now. God bless!
@A_Good_Boy. 4 months ago
How much do you make as a network administrator? My guess is around 59-60k
@NetworkAdminLife 4 months ago
That's not a question we ask in the good ol' USA. I don't disclose my salary. God bless!
@changwang7596 4 months ago
Has the new guy been put on call yet?
@NetworkAdminLife 4 months ago
Well, no. But the new guy is the boss and if he doesn't want to be on call we can't force him... and we can't blame him. God bless!