
The price of the single point of failure


Above: Illustration by eamesBot/DepositPhotos

BitDepth#1420 for August 21, 2023

After TSTT’s network experienced widespread failures on August 09, the company issued general apologies about the issue, promising prompt restorative action.


    It wasn’t until Communications Workers’ Union Secretary-General Clyde Elder accused the company of neglect and poor maintenance that CEO Lisa Agard responded with details on the reasons for the hours-long outage.

    “A GE breaker at Nelson Exchange designed to function uninterruptedly for 30 years malfunctioned when there was an electrical surge,” Agard told the TT Guardian.

    “This equipment is only eight years old. This led to a series of events which ultimately led to important telecommunications equipment not being supported by power.”

    TSTT CEO Lisa Agard.

    That statement raises questions about the reference to a breaker, which isolates equipment from an electrical surge, and its implied role in disabling support systems that should have supplied redundant stored power, normally from an industrial UPS.

    A request to discuss the technical details of the incident was ignored by TSTT, so what’s left is an admission of reliance on a single, probably expensive piece of equipment, which represented a point of failure that appears to have had no clear redundancies.

    The problems that arise from a single point of failure are not unique to technology.

    The concept was most dramatically illustrated by the story of Achilles, whose mother, according to mythology, dipped him in the River Styx, granting him immortality save for his heel, where she’d held him.

    If there is any moral to be taken from this fanciful sidebar on the Trojan War, it is that the part of any system that is weakest and least replaceable will eventually suffer a catastrophic failure.

    In February 2022, while the country was firmly under pandemic lockdown, a 21-metre-tall Palmiste tree, its bole rotted through with fungus infestation, came crashing down in Grant Trace, Rousillac.

    The falling tree severed a single-phase TTEC 12 kV distribution line, then hit the 220 kV circuit which transfers most of the power from Trinidad Generation Unlimited (TGU) to TTEC.

    TGU generates one-third of the country’s electricity, and the sudden loss of that much capacity caused a cascading collapse of the grid, shutting off electricity for more than twelve hours for most of Trinidad.

    Six months later, again in Rousillac, a landslip caused the partial collapse of a 220 kV transmission tower, leading to load shedding and an outage for 30 per cent of the country, mostly in the south of Trinidad.

    These are not mechanical failures. They are design problems, architectures with limited redundancy that made critical systems more fragile than they needed to be.

    Ultimate redundancy is two of everything, but that’s both needlessly and prohibitively expensive.
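The trade-off between redundancy and cost can be put into rough numbers. The sketch below is illustrative only: it assumes identical components that fail independently of each other, and the 99 per cent availability figure is a hypothetical value, not a measurement from any system mentioned here.

```python
HOURS_PER_YEAR = 24 * 365


def parallel_availability(availability: float, copies: int) -> float:
    """Chance that at least one of `copies` identical components is working.

    Assumes failures are independent, which real equipment sharing a room,
    a power feed or a maintenance crew often is not.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - availability) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected hours per year during which nothing is working."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one component
    for copies in (1, 2, 3):
        combined = parallel_availability(single, copies)
        hours = expected_downtime_hours(combined)
        print(f"{copies} unit(s): {combined:.6f} available, "
              f"about {hours:.2f} hours down per year")
```

Under those assumptions, the second unit removes most of the expected downtime and the third adds comparatively little, which is one way to see why duplicating everything is rarely worth the cost.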

    Design thinking in networks, whether they carry electricity or data, is a measured consideration of the eventual failure of components and how their function can be replaced or rerouted with minimal impact on the end user.
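One way to make that consideration concrete is to model a network as a set of links and ask which components, if removed, cut the supply off from the thing it serves. The sketch below does this by brute force. The topology and every component name in it are hypothetical, invented for illustration, and do not describe TSTT's or TTEC's actual equipment.

```python
from collections import deque


def build_graph(links):
    """Turn a list of (a, b) links into an undirected adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, start, goal, removed=None):
    """Breadth-first search from start to goal, skipping the removed node."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in graph[node]:
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def single_points_of_failure(links, source, target):
    """Components whose loss alone disconnects source from target."""
    graph = build_graph(links)
    return sorted(
        node for node in graph
        if node not in (source, target)
        and not reachable(graph, source, target, removed=node)
    )


# Hypothetical topology: one breaker and one UPS feed two redundant cores.
LINKS = [
    ("utility", "breaker"),
    ("breaker", "ups"),
    ("ups", "core_a"),
    ("ups", "core_b"),
    ("core_a", "exchange"),
    ("core_b", "exchange"),
]

if __name__ == "__main__":
    print(single_points_of_failure(LINKS, "utility", "exchange"))
```

In this made-up layout the two cores back each other up, while the breaker and the UPS each sit alone on the only path. Adding a second breaker in parallel would take the breaker off the list but leave the UPS on it.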

    Illustration by rudall30/DepositPhotos.com

    It’s an idea that has wider relevance to society. Any company or government hierarchy that doesn’t have a proper succession plan is also courting the problems that result from a single point of failure.

    Pandemic restrictions brought the importance of redundancy and resilience home, quite literally.

    Having a secondary source of internet connectivity to support two remote teachers and a remote student at home became critical after Flow experienced an extended outage.

    Keeping that internet connection going across electrical brownouts and outages with a UPS went quickly from luxury to necessity.

    The contemplation of redundancy and efficiency in networks is a challenge even in small systems.

    During a complete revamp of my desktop workstation, its peripherals and connections, a topological map of the system, freed from the tangle of wires and boxes, revealed a systemic problem.

    I’d replaced both my workstation and laptop systems, both upgraded to 100BASE-T ports along the way, but the systems were still connected through a switch and cabling rated for 10BASE-T Ethernet.

    Replacing the switch and cabling, a simple and inexpensive change, doubled throughput speeds, which is particularly useful since I move large files back and forth across that hardwired connection.

    It would be a mistake to think about a single point of failure as being only technology related.

    It’s a weakness of systems designed by humans, not a cruel whim of fate. It is the fragile heel of systems presumed to be robust or invulnerable, inadequately tested and believed to be more secure and protected than they actually are.

    Finding and assessing these weaknesses is even more critical when continuous availability and access are the characteristic most highly prized in a system.

    It’s why skydivers carry a reserve parachute. With every jump, they hear the wind roaring past their ears and see the ground growing closer.
    Their enthusiasm for designing redundancy into their planning is commensurately more urgent before they step through the door of an aircraft.

    2 Comments
    dennise
    1 year ago

    Very thoughtful article. Love the story about your own workspace.

    For 5 years @keitademming and I did summer camps for young people focussing on systems thinking and design thinking. We eventually abandoned the idea because of a lack of traction.

    What I am taking away from this is that at the highest level of our corporations, design thinking is not part of the way we work. My question is what has to happen for us to transform our approach?
