
[Technology] Data centre outage - Infrastructure IT folk







maffew

Well-known member
Dec 10, 2003
8,861
Worcester England
Sounds bad, thankfully not us. I believe we've got all our services mirrored to another data centre so things like this don't happen.

Yeah, not cool. Several sites needing manual table rebuilds, 100s of support calls and obviously a few dead HDs, plus many client visits today to explain (luckily doesn't affect me right now)
 


Springal

Well-known member
Feb 12, 2005
23,714
GOSBTS
The CEO of UKFAST bloody loves the sound of his own voice and publicity.

SLA worth nothing except some invoice credits which I doubt covers the actual client impact.

No-one should be relying on a 3rd party to provide DR etc for them, any organisation worth their salt would build their own DR plans.

The excuse they've given sounds shonky too.
 


maffew

Well-known member
Dec 10, 2003
8,861
Worcester England
The CEO of UKFAST bloody loves the sound of his own voice and publicity.

SLA worth nothing except some invoice credits which I doubt covers the actual client impact.

No-one should be relying on a 3rd party to provide DR etc for them, any organisation worth their salt would build their own DR plans.

The excuse they've given sounds shonky too.

Well, some of the smarter clients did, so downtime was minimal, but these clients have more IT-savvy folk. You can go so far with DR, then the next level of DR, then the next; it all costs money, as you obviously understand. Why the UPS and gennys never kicked in remains to be explained.
 




maffew

Well-known member
Dec 10, 2003
8,861
Worcester England
The CEO of UKFAST bloody loves the sound of his own voice and publicity.

SLA worth nothing except some invoice credits which I doubt covers the actual client impact.

No-one should be relying on a 3rd party to provide DR etc for them, any organisation worth their salt would build their own DR plans.

The excuse they've given sounds shonky too.

Don't forget some of these clients may have c.30 staff and an IT support person
 


Springal

Well-known member
Feb 12, 2005
23,714
GOSBTS
Well, some of the smarter clients did, so downtime was minimal, but these clients have more IT-savvy folk. You can go so far with DR, then the next level of DR, then the next; it all costs money, as you obviously understand. Why the UPS and gennys never kicked in remains to be explained.

Still can't see how a 'pick axe' went through a main power line like that. Power lines are protected by meshing to give anyone a warning... it can be missed by diggers, but not so a 'pick axe'
 


Bry Nylon

Test your smoke alarm
Helpful Moderator
Jul 21, 2003
19,783
Playing snooker
Anyone get affected by this?

https://tamebay.com/2017/12/ukfast-outage-tool-down-100s-of-uk-businesses.html

c.500 clients and c.2,000 servers down our side, blew three-, four- and five-nines SLAs in a couple of hours

Yeah, not cool. Several sites needing manual table rebuilds, 100s of support calls and obviously a few dead HDs, plus many client visits today to explain (luckily doesn't affect me right now)

Did anyone try switching it off and back on again? :shrug:
 




beorhthelm

A. Virgo, Football Genius
Jul 21, 2003
35,265
SLA worth nothing except some invoice credits which I doubt covers the actual client impact.

likely no more than a month's invoice at that.
the trouble with relying on your own DR is a) the cost if done properly and b) accountability. an SLA is worth a lot more to the management in ticking a box and shifting responsibility than the actual up-time or financial aspect.

the multi-redundant systems always seem to fail or expose another weakness when it comes to it, you may as well host servers in a shed.
 


maffew

Well-known member
Dec 10, 2003
8,861
Worcester England
Still can't see how a 'pick axe' went through a main power line like that. Power lines are protected by meshing to give anyone a warning... it can be missed by diggers, but not so a 'pick axe'

Have to admit power to a DC is not my scope of knowledge; more of a concern is how long it takes to fire up a "redundant" generator. Whilst a lot of businesses did seem to have failover plans, they weren't implemented due to being told it would be up in x mins
 


Westdene Seagull

aka Cap'n Carl Firecrotch
NSC Patreon
Oct 27, 2003
20,938
The arse end of Hangleton
SLA worth nothing except some invoice credits which I doubt covers the actual client impact.

Problem being that all the CFOs in companies want IT / Telecoms services for the cheapest possible price. They sign the contracts with SLA credits clearly defined and then complain when they get £1.27 as a credit when the outage cost them £50k in business / staff time. I've seen this in many of my roles and it still amazes me. If a service outage will cost you that much money then invest a proper amount in redundancy ..... strangely few do.
 




graysgull

New member
Aug 23, 2003
131
Begs the question how they managed to cut mains power to start with as it is standard practice in data centres to have minimum of 2, and preferably 3 diversely routed incoming power supplies from different suppliers to maintain resilience. Then multiple UPS units to support while generators, which should be regularly tested, kick in.

I suspect that some heads will roll as it is obvious that planning was inadequate, and testing appears non-existent.



 


maffew

Well-known member
Dec 10, 2003
8,861
Worcester England
Problem being that all the CFOs in companies want IT / Telecoms services for the cheapest possible price. They sign the contracts with SLA credits clearly defined and then complain when they get £1.27 as a credit when the outage cost them £50k in business / staff time. I've seen this in many of my roles and it still amazes me. If a service outage will cost you that much money then invest a proper amount in redundancy ..... strangely few do.

Well, it really does depend on the size of the organisation. If an IT-dependent company has a CFO then they likely need a CTO who can advise on risk. A lot of businesses affected by this kind of outage may be hosting just a website or indeed a forum, and for the vast majority that is enough
 


Joey Jo Jo Jr. Shabadoo

Waxing chumps like candles since ‘75
Oct 4, 2003
10,899
We still host everything onsite at the moment so weren't hit. Sounds like someone hadn't been doing proper testing and maintenance. We regularly carry out both on and off load tests of our generator. Although we do have the added issue that a lack of UPS/generator would cut off patient oxygen supply, so a power cut becomes a potentially life-threatening incident.

Have to admit power to a DC is not my scope of knowledge; more of a concern is how long it takes to fire up a "redundant" generator. Whilst a lot of businesses did seem to have failover plans, they weren't implemented due to being told it would be up in x mins

UPS batteries should be capable of holding services until a diesel generator kicks in, the generator should be up and running within 60 seconds or so and powering the entire infrastructure as required. Like I said above it sounds like the hosting company hadn't been testing correctly. This doesn't excuse the companies who use the hosting from having their own DR plans on top of whatever redundancy they expect from their providers.

A previous data centre I worked in had two diesel generators, both capable of providing full power for the building for 72 hours before refuelling was required. This gave some redundancy should one fail to start; both could be filled while running and were serviced by separate fuel tanks. There were also two separate incoming power supplies, both fed from different sub-stations. The chances of a power outage were heavily reduced but there was still a strict testing and servicing plan in place just in case.
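As a rough sketch of the timing argument above (the UPS has to hold the load until the generator is running), here is a small Python check. The 60 second start time is the figure mentioned above; the function name, the example battery runtimes and the 2x safety margin are made up for illustration and are not taken from any real site.

```python
def ups_bridges_gap(ups_runtime_s, generator_start_s=60, safety_factor=2.0):
    """Return True if the UPS battery runtime covers generator start-up.

    ups_runtime_s     -- seconds the UPS can carry the full load (hypothetical input)
    generator_start_s -- seconds for the generator to start and take the load
    safety_factor     -- assumed margin for a slow or repeated start attempt
    """
    return ups_runtime_s >= generator_start_s * safety_factor


# Illustrative values only: a 10 minute battery versus a 90 second one.
print(ups_bridges_gap(600))  # enough headroom for a 60 s start with 2x margin
print(ups_bridges_gap(90))   # not enough headroom with the same margin
```

A check like this only says the numbers add up on paper; it is no substitute for the on and off load tests described above.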
 




RandyWanger

Je suis rôti de boeuf
Mar 14, 2013
5,995
Done a Frexit, now in London
Begs the question how they managed to cut mains power to start with as it is standard practice in data centres to have minimum of 2, and preferably 3 diversely routed incoming power supplies from different suppliers to maintain resilience. Then multiple UPS units to support while generators, which should be regularly tested, kick in.

I suspect that some heads will roll as it is obvious that planning was inadequate, and testing appears non-existent.




Very good point, I'd expect minimum of 2 separate lines in.
 


maffew

Well-known member
Dec 10, 2003
8,861
Worcester England
We still host everything onsite at the moment so weren't hit. Sounds like someone hadn't been doing proper testing and maintenance. We regularly carry out both on and off load tests of our generator. Although we do have the added issue that a lack of UPS/generator would cut off patient oxygen supply, so a power cut becomes a potentially life-threatening incident.



UPS batteries should be capable of holding services until a diesel generator kicks in, the generator should be up and running within 60 seconds or so and powering the entire infrastructure as required. Like I said above it sounds like the hosting company hadn't been testing correctly. This doesn't excuse the companies who use the hosting from having their own DR plans on top of whatever redundancy they expect from their providers.

A previous data centre I worked in had two diesel generators, both capable of providing full power for the building for 72 hours before refuelling was required. This gave some redundancy should one fail to start; both could be filled while running and were serviced by separate fuel tanks. There were also two separate incoming power supplies, both fed from different sub-stations.

You are kinda right, but if you are selling a service to SMEs and most years you hit the five nines uptime, it's ridiculous to think that they would pay substantially more for a further level of redundancy
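For a sense of scale on the "nines" being thrown around in this thread, the downtime each level allows can be worked out directly. A minimal sketch, assuming availability is measured over a 365-day year (real SLAs are often measured per month, and the function name here is just illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_minutes(availability_pct, period_hours=HOURS_PER_YEAR):
    """Minutes of downtime allowed in the period at the given availability."""
    return (1 - availability_pct / 100) * period_hours * 60


for label, pct in [("three nines", 99.9),
                   ("four nines", 99.99),
                   ("five nines", 99.999)]:
    print(f"{label} ({pct}%): {downtime_budget_minutes(pct):.1f} min/year")
```

That works out at roughly 8.8 hours a year for three nines, about 53 minutes for four and a little over 5 minutes for five, so an outage of a couple of hours is well past the four and five nines budgets for the year.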
 


Springal

Well-known member
Feb 12, 2005
23,714
GOSBTS
From their own update:

'We prove the start signal on a weekly basis which fires up the generators and the UPS tests itself every day at 8am. We are the only data centre that we are aware of to hold the NICEIC accreditation meaning we are a fully licensed electrical contractor and can manage and maintain our data centres without the need for external contractors.'

Maybe they shouldn't be holding that accreditation :lol:
 


bluenitsuj

Listen to me!!!
Feb 26, 2011
4,305
Willingdon
Yes we got hit with this yesterday. Very frustrating.
 




Publius Ovidius

Well-known member
Jul 5, 2003
45,919
at home
The CEO of UKFAST bloody loves the sound of his own voice and publicity.

SLA worth nothing except some invoice credits which I doubt covers the actual client impact.

No-one should be relying on a 3rd party to provide DR etc for them, any organisation worth their salt would build their own DR plans.

The excuse they've given sounds shonky too.

Well, I work for a billion dollar availability company covering most of the world's biggest companies and corporations, one of the top 3 availability companies in the world. If our customers didn't trust our "3rd party" offering, we wouldn't be in business.

We have taken back many companies who thought they could do DR better than us, i.e. people who specialise in it and have the infrastructure to cope with multiple invocations.

We have SLAs of 99.98% availability on many customers, especially in specialised industries and local government.

That is the real world of "3rd party offerings"
 


Deleted member 22389

Guest
Problem being that all the CFOs in companies want IT / Telecoms services for the cheapest possible price. They sign the contracts with SLA credits clearly defined and then complain when they get £1.27 as a credit when the outage cost them £50k in business / staff time. I've seen this in many of my roles and it still amazes me. If a service outage will cost you that much money then invest a proper amount in redundancy ..... strangely few do.

Very true. I used to admin a very large website; the final straw was when our server went down at 2am and I was told that they didn't have a Linux engineer in until the morning. That was money being lost. Not nice when I had the boss on my back. In the end I sorted the problem myself. After that incident we went with Rackspace. It was expensive but the quality of service was first class. Phones answered within a couple of rings, and any problems sorted ASAP 24/7.
 


