Lessons from Bret – A Tech post-mortem


Tropical Storm Bret on June 19, 2017. Photo courtesy NASA.

I’ve dealt with disaster communications over the years, and disasters in general. While we can view people suffering in the disaster-porn of the moment, it’s important that there be some introspection in the technology sector to deal with the things that went wrong.

Things went wrong. The intent here is to be constructive so that, in the future, these things won’t happen. It’s a fact of life that systems fail. It’s our response to these failures that defines who we are as individuals and as a society.

The Broad Strokes

The ODPM (Office of Disaster Preparedness and Management) site was up and down throughout the period before and after Bret. The Trinidad and Tobago Meteorological Service site, too, was not dependable. Coming on the heels of my writing about the Cybercrime Bill, the concept of ‘critical infrastructure’ comes to mind: the very things people should turn to when disaster strikes Trinidad and Tobago failed.

The ODPM itself, with its app, didn’t send out any alerts that I or anyone I know received. To balance that, Facebook pages run by government offices were fairly dependable in sharing information, mainly because so many people are on Facebook. Twitter, too, was used effectively by some in the media, though I have not heard or seen anything from government on that.

Also, there were the telecommunications providers (bMobile and Digicel), as well as a well-moderated WhatsApp group that I was in. That there are applications that allow posting to multiple social networking sites at the same time should not have to be pointed out in this day and age. It has not been a secret, and marketers have gone out of their way to make their products known.
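The cross-posting point is simple enough to sketch. The following is a minimal Python illustration, not any real product's API: `publish_everywhere` and the channel functions are hypothetical stand-ins for whatever SMS gateway, Facebook page, Twitter account or website an office actually uses. Each channel is tried independently, so one site being down does not stop the message from going out everywhere else.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "sender" is any function that delivers a message or raises on failure.
# The channels used below are hypothetical placeholders, not real APIs.
Sender = Callable[[str], None]


@dataclass
class FanOutResult:
    delivered: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)


def publish_everywhere(message: str, channels: Dict[str, Sender]) -> FanOutResult:
    """Send one message to every channel; a failing channel doesn't block the others."""
    result = FanOutResult()
    for name, send in channels.items():
        try:
            send(message)
            result.delivered.append(name)
        except Exception as exc:  # record the failure and keep going
            result.failed[name] = f"{type(exc).__name__}: {exc}"
    return result


if __name__ == "__main__":
    outbox: List[str] = []

    def website_down(_message: str) -> None:
        raise ConnectionError("site not responding")

    channels = {
        "facebook": outbox.append,  # stand-in for a real page post
        "twitter": outbox.append,   # stand-in for a real tweet
        "website": website_down,    # simulates an overloaded site
        "sms": outbox.append,       # stand-in for an SMS gateway
    }
    result = publish_everywhere("Tropical Storm Warning in effect.", channels)
    print("delivered:", result.delivered)
    print("failed:", result.failed)
```

In practice each sender would wrap a real service's client library; the point is only that one message fans out to every channel, and failures are recorded rather than fatal.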

These failings are lessons, lessons that can and should be addressed. Around the country, as flood waters recede and life returns to normality, the people are owed disaster communication systems that work and are used appropriately. As such, I’ll write about a few systems and what they did right as well as what they did wrong. There will be commentary on the specific systems, and there will be a more in-depth commentary for the more technically experienced afterward.

The ODPM App

I didn’t know about the ODPM App until the day before Bret hit, but the ODPM managed to get the word out on it. I downloaded and installed it, and it prompted me for my name, email address and phone number. What struck me as odd is that it didn’t ask me for my address. In emergency communication systems, addresses are used to determine where a person lives and what might affect them in a specific area.

As we have seen, some people came through Bret completely unaffected. Others are, a day later, still dealing with flood waters. It’s reasonable to believe that people in flood-prone areas could have been given better warning. People living close to river banks, for example, could have been alerted when the river was close to overflowing, told to evacuate to higher ground, and perhaps even pointed to appropriate emergency shelters near them.

If that sounds magical, it isn’t. I’ve worked on and troubleshot such systems at ECN. What the ODPM App did have was a list of emergency shelters, as well as emergency numbers. While there is some concern about whether these were up to date and whether the shelters were activated, the ODPM app did have the information in it. It also had information on how to prepare, and so on.
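Targeted alerting of this kind does not need exotic technology. What follows is a minimal Python sketch, assuming the app stored a location for each registered user (which, as noted above, the ODPM App did not ask for) and that a river gauge reports water levels against a warning threshold. When the threshold is crossed, everyone within a chosen radius of the gauge is paired with their nearest shelter. The `Place` type, the `plan_flood_alerts` function, the threshold values and all coordinates are hypothetical; a real system would use proper flood-zone maps rather than a simple radius.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Place:
    name: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(a: Place, b: Place) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest(origin: Place, candidates: List[Place]) -> Optional[Place]:
    """Closest candidate to origin, or None if there are no candidates."""
    return min(candidates, key=lambda c: haversine_km(origin, c), default=None)


def plan_flood_alerts(
    gauge: Place,
    level_m: float,
    warning_level_m: float,
    residents: List[Place],
    shelters: List[Place],
    radius_km: float,
) -> List[Tuple[str, Optional[str]]]:
    """If the gauge reaches the warning level, pair each resident within
    radius_km of the gauge with their nearest shelter."""
    if level_m < warning_level_m:
        return []
    plan = []
    for resident in residents:
        if haversine_km(gauge, resident) <= radius_km:
            shelter = nearest(resident, shelters)
            plan.append((resident.name, shelter.name if shelter else None))
    return plan


if __name__ == "__main__":
    # Made-up coordinates for illustration only; not real gauges, homes or shelters.
    gauge = Place("River gauge (hypothetical)", 10.50, -61.40)
    residents = [Place("Resident 1", 10.51, -61.40), Place("Resident 2", 10.70, -61.40)]
    shelters = [Place("Shelter A", 10.52, -61.40), Place("Shelter B", 10.40, -61.40)]
    print(plan_flood_alerts(gauge, 4.2, 4.0, residents, shelters, radius_km=3.0))
    # prints [('Resident 1', 'Shelter A')]
```

Each pair in the result is what would feed an SMS or push notification: who to warn, and where to send them.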

That, though, is static information. During any disaster, one has to assume that the Internet, along with other forms of communication, may not be accessible, so it is good to have that information in the ODPM app. But the app did nothing else. It’s certainly colourful, though ‘Succes’ should be spelled ‘Success’, amongst other things. Could the app do more? Quite possibly; I would hope so. It just didn’t.

The ODPM Website

The ODPM website was down so much during the event that I dropped it from my triage of communication channels. Looking at the site before and after, it could have been useful to a lot of people, but that requires uptime. I’ll delve into this more in the technical commentary; I’d wager that at least one person warned of potential problems and was ignored, but I am a bit of an idealist.

The Trinidad and Tobago Meteorological Service Website

Whenever I checked the Trinidad and Tobago Meteorological Service site, it failed more gracefully than the ODPM website: it told me, plainly, that the site had taken too long to respond. While this is still a failure, it’s a graceful one, and that makes a big difference.

The graceful failure may have been accidental, but it still let people understand that the site was taking too long to respond. Someone on Facebook told me that they were getting updates from the site every 15 to 30 minutes, and I believe them. It simply couldn’t handle the load during Bret.

NOAA became my source of information rather than the MET Office.
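The Met Service's failure mode points at a cheap improvement: when the origin is slow, show the last good bulletin, clearly marked as stale, instead of only an error. Below is a minimal Python sketch of that "serve stale on error" idea. `AdvisoryCache`, the five-second timeout and the fallback message are assumptions for illustration, not how either site actually works.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Advisory:
    text: str
    fetched_at: float    # Unix timestamp of when this copy was retrieved
    stale: bool = False  # True when serving an older copy or a fallback notice


def fetch_url(url: str, timeout_s: float) -> str:
    """Fetch a page, giving up after timeout_s seconds (raises OSError on failure)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8", errors="replace")


class AdvisoryCache:
    """Serve the latest advisory, falling back to the last good copy on failure."""

    def __init__(self, fetch: Callable[[str, float], str] = fetch_url, timeout_s: float = 5.0):
        self.fetch = fetch
        self.timeout_s = timeout_s
        self.last_good: Optional[Advisory] = None

    def get(self, url: str) -> Advisory:
        try:
            text = self.fetch(url, self.timeout_s)
        except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
            if self.last_good is not None:
                # Same text and timestamp as the last success, flagged as stale.
                return Advisory(self.last_good.text, self.last_good.fetched_at, stale=True)
            return Advisory(
                "The advisory service is not responding. Please try again shortly.",
                time.time(),
                stale=True,
            )
        self.last_good = Advisory(text, time.time())
        return self.last_good
```

Keeping `fetched_at` lets the page say how old the bulletin is, which matters during a storm. In production this is usually done at the web server or CDN layer rather than in application code; nginx, for example, has a `proxy_cache_use_stale` directive for serving cached responses when the upstream errors or times out.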

bMobile

I tend to think of bMobile as state-run because of who owns it, so I don’t really think of it as private sector. It’s in that grey area in my mind; as divorced as it may be from government, the government still owns controlling shares in it. Thus, it comes next here. I’m a bMobile customer.

I had service throughout the event in San Fernando, though I imagine there may have been problems elsewhere, which is to be expected. I got no information from bMobile itself by text, and I was a bit surprised at that. Their Facebook page, on the other hand, had things like Tech Tips for Storm Preparation and a service notice before the storm hit, which was good.

They even notified the public of the next day’s office closures right around when the storm’s effects were being felt. After the storm, they offered free 5-minute calls for a good portion of the day so that people could get in touch and check on one another. I do hate to be impressed by such things.

I was impressed. It showed, from the private sector (or not, it’s still grey to me), that bMobile understands their role in the lives of their customers, and that people would want to check on their family and friends. That’s a way of building brand loyalty.

Digicel

I’m not a Digicel customer, so I don’t know whether they sent out any text alerts. On their Facebook page, they shared a LoopTT article, “Citizens urged not to panic as country on Tropical Storm warning”, while posting ‘Take all relevant precautions’. They later posted the map of emergency shelter locations and then, at 7:36 p.m., which was a little too late to be useful, a link to the LoopTT article “Tropical Storm Warning: Items You Need for a storm”.

During the storm, they did post the Tropical Storm Bret Advisory. Afterward, they shared an article about the Caroni River running high, noting that there might be rain for the next 5 to 8 hours, as well as some drone footage of flooded areas. It seems to me that Digicel did things bMobile did not, and that bMobile did things that Digicel did not. Perhaps they can learn from each other.

WhatsApp: The TT.Talk WhatsApp Group

When it comes to community, crowdsourcing is one of the best methods of communication, but without appropriate moderation it can quickly devolve into one of the worst. It’s one of the concepts that I wrestled with when it came to the Alert Retrieval Cache, and I still wrestle with it now. No matter how I looked at the problem, it boiled down to human moderators keeping things civil, informative and clear.

I found out that the person behind the group was Khalid Shageer, whom I did not contact; I got his name from one of the moderators.

During the storm, this was probably the best source of information I found. Updates from several government offices were passed around, along with updates from around the country that included video and photographs. There were, of course, a few idiots posting unrelated things, but good moderation kept them squelched.

From my perspective, as winds howled overhead and water dripped down a wall next to me, it was the most informative and engaging source. I used it to inform others in my social networks. Where others failed, the community stood tall.

It was made available by WhatsApp and rode on the combined capabilities of bMobile and Digicel during the storm. Most importantly, it consisted of a self-organizing network of volunteers who shared and disseminated information in a timely manner, enabled by TT Talk: “a small group of like minded individuals with a thirst for knowledge and a drive to succeed.”

This is the sort of thing that Trinidad and Tobago needs to be tapping into.

Non-Technical Commentary

This is for the people who don’t speak technology but may be in charge of the systems mentioned here.

It’s interesting that, so soon after my commentary on the CyberBill 2017 and the need for audits of ‘critical infrastructure’, an unrelated example should show up. These sites are critical infrastructure during an emergency, and if I were to pretend to be a lawyer, I might try to make a case under the CyberBill that the government itself is culpable for not using best practices. If you’re going to sell me on the CyberBill protecting government websites, at least make the government software worthy of it.

These sites did not perform when they needed to. While the poor performance of the Trinidad and Tobago Meteorological Service site is excusable, the ODPM website was not up to par. From my experience with the technologies behind these sites (I used to be an active developer of such technology), it’s clear that they were not properly stress-tested, and that their setup is not what it should be.

The only reasons I can see for this are speculative, and I prefer not to speculate as I do not know the requirements given for these sites, their development process, or their planned life cycle.

Sites like these are not simply developed and left alone. They are maintained, they are constantly tested, and they are redundant. They scale to demand. And in a disaster scenario, it’s best that they aren’t hosted in the area where the disaster is, which is why others and I have tried pushing for a multi-state solution. There is the development budget, which should come with strict requirements from the government office, and then there are the recurring costs for maintenance and ongoing development.

The good news about the sites is that they use the same base technology, Drupal, and I know for a fact that Drupal can handle much more than what was thrown at these sites. I worked for a company, Treehouse Agency, that worked on the U.S. Department of Energy website.

The current White House website also uses the same base. This base technology could be shared across multiple sites, so that back-end maintenance for all government websites using it is done once instead of many times. Reduced costs. Reduced cycle time. Improved maintenance. It’s really not that hard.

However, as I understand it, there is no standardization of technology across Ministries and governmental websites. That’s a big problem, bigger than this. That needs to be addressed.

The ODPM App? I don’t know how much money was spent on it, but it sits on my phone occupying space. I don’t even know whether the information in it is updated when that information changes. Again, software is not just something developed; it is maintained. With no alerts being sent to me, it seems like a paperweight on my phone. It’s colourful, though. I suppose that counts for something.

bMobile and Digicel? Telecommunications providers? They could be working better with government offices, and in my opinion they should be. As luck would have it, a paper has actually been written on this very subject: Strengthening cooperation between telecommunications operators and national disaster offices in Caribbean countries. I know that one of the co-authors (Shiva Bissessar) is here in Trinidad and Tobago, and I know that there will be follow-up articles on this related to policy.

The takeaway from this is that things are not working right, that there are ways to fix them, and that the government has a role to play in that.

Social media played a large role in all of this. The open lines of communication, the unfettered and uncensored lines of communication through social media, allowed information to get around fairly well throughout the event, but not necessarily to where it needed to be. Social media has a tendency to pat itself on the back too much.

And, again, a hat tip to those at TT Talk who made a positive impact. A small group of like-minded individuals with a thirst for knowledge and a drive to succeed seems like an asset anywhere.

If you’re non-technical, feel free to skip the next section and go to Conclusions.

Technical Commentary

This is where I speak to my fellow geeks. First of all, I expect that there will be some anger about things. I expect that there will be people who warned of the same things that I mentioned, if not more, and that they might have been ignored. I feel your pain; I have felt it more than once in the private sector throughout my career, and I get it. Still, I have to write this to be thorough.

I’m assuming good, of course. I’m assuming that you weren’t given solid requirements, that you did things within budget, and that you wanted to do more. I could be completely wrong. I could be partially right. I’m not privy to that information, and I’m not sure I would want to be.

But stuff failed.  

The Drupal development on the Trinidad and Tobago Meteorological Service and ODPM websites is good, but the backend needs work: it needs stress testing and the changes that testing reveals. At a glance, it seemed that database communication on the ODPM site was problematic, and that the map code for the Trinidad and Tobago Meteorological Service site was served from another server that lacked bandwidth. I don’t know for certain. But they weren’t working.
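Stress testing is what would have exposed these limits before Bret did. As a rough illustration only, here is a minimal Python load-test sketch using the standard library. The URL, request counts and concurrency level are placeholders, and this is not a reconstruction of how either site was (or wasn't) tested.

```python
import math
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# A probe makes one request and returns its duration in seconds, or None on failure.
Probe = Callable[[str], Optional[float]]


def timed_get(url: str, timeout_s: float = 10.0) -> Optional[float]:
    """Time one GET request; return None if it errors or times out."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
    except OSError:  # connection errors, HTTP error statuses (HTTPError) and timeouts
        return None
    return time.perf_counter() - start


def load_test(url: str, total_requests: int, concurrency: int,
              probe: Probe = timed_get) -> Dict[str, float]:
    """Fire total_requests requests, `concurrency` at a time, and summarise them."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: probe(url), range(total_requests)))
    ok = sorted(r for r in results if r is not None)
    errors = total_requests - len(ok)
    summary: Dict[str, float] = {
        "requests": total_requests,
        "errors": errors,
        "error_rate": errors / total_requests,
    }
    if ok:
        summary["median_s"] = statistics.median(ok)
        # Nearest-rank 95th percentile of successful response times.
        summary["p95_s"] = ok[math.ceil(0.95 * len(ok)) - 1]
    return summary
```

Run something like this only against a staging copy or with the site owner's agreement, and raise `concurrency` step by step to find where the error rate climbs. Dedicated tools such as Apache JMeter or Locust do this far more thoroughly; the sketch only shows what stress testing measures.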

I don’t know whether there are maintenance contracts for this stuff. What I do know is that they failed, they shouldn’t have, and I’m not sure that good software engineering practices were followed. The U.S. has requirements for government websites, as do other countries.

It seems, from the outside looking in and assuming good, that the best practices for such sites weren’t followed for whatever reasons.

Stuff failed.

No site should fail under these circumstances, the very circumstances where it’s supposed to shine.

The ODPM Application – is there a plan for it? I hope there is. I have ideas, such as using GIS information and mapping software to allow for alerts within geographical areas. I know the GIS information exists here in Trinidad and Tobago. Access to it, on the other hand… well, I know that might be problematic.
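The core of alerting within geographical areas is a point-in-polygon test: is this subscriber's location inside the area under warning? The minimal Python sketch below uses the standard ray-casting (even-odd) method. The polygon, subscriber IDs and coordinates are hypothetical, and coordinates are treated as flat (x, y) pairs, which is a reasonable approximation over an area the size of Trinidad and Tobago.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray running right from pt.
    An odd count means pt is inside. Points exactly on an edge are ambiguous."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def subscribers_in_area(subscribers: Dict[str, Point], area: List[Point]) -> List[str]:
    """IDs of subscribers whose stored location falls inside the alert area."""
    return [sid for sid, loc in subscribers.items() if point_in_polygon(loc, area)]


if __name__ == "__main__":
    # Hypothetical alert area and subscribers, in abstract coordinates.
    flood_zone = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    subscribers = {"sub-1": (1.0, 1.0), "sub-2": (3.0, 1.0), "sub-3": (0.5, 1.5)}
    print(subscribers_in_area(subscribers, flood_zone))  # prints ['sub-1', 'sub-3']
```

With real GIS data one would normally use a spatial database or library instead, such as PostGIS's `ST_Contains` or Shapely's `Polygon.contains`, which also handle multi-part areas and points on a boundary.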

I’ve seen a CTO stay up into the wee hours personally examining network communication when the company’s software was having trouble. That problem, like so many others I had to fix or help fix, happened because best practices were not followed.

If any CTO I worked with in the past, or any CEO, CFO or anyone else, saw these sorts of problems with their code in production, there would be blood.

Get the solution proposals ready and slam them in now, if you haven’t already.

Conclusions

Stuff failed. It could be done better. It should be done better, but given my lack of visibility into these things, I cannot say what should or should not be done. What I do know is that solid software engineering best practices need to be followed not just on the ODPM and Trinidad and Tobago Meteorological Service websites, but on all government websites, and for that matter, on all websites in general.

As someone who has worked with emergency communications and the Domestic Communications Infrastructure (DCI) in the U.S., I find it disappointing to see these things not done, but understandable, because I have had to bring some legacy systems up to date myself.

Stuff always fails. The more complex the system, the more likely it is to fail. Constant maintenance and testing are needed on the websites, and the disaster office, as well as the Trinidad and Tobago Meteorological Service, might want to consider more of a relationship with the telecommunications providers.

This was a learning experience, one that should be hardening those systems. Let’s not waste this experience.

bMobile deserves a salute here. While Digicel did some things that bMobile (TSTT) did not, the combination of what both did would have made for better communication.

And those like-minded individuals with a thirst for knowledge and a drive to succeed, along with so many volunteers sharing and disseminating information on social media, through social networking and otherwise? A salute. Well done.

  • Richard Jobity

    There are a lot of old-tech and low-tech things that, I now realize, we apparently no longer do.

    Once upon a time, we got SMS alerts on our phones telling us of storm warnings – at least we did the last time a storm looked as if it was about to pass through. Not everyone has smartphones, but if you have any sort of cell phone, it has SMS. For communication, use every medium you have available.

    Before that, in the pre-Internet days, we were told every single year to buy portable radios and keep extra batteries so we could stay informed if the electricity went. It’s still valid, even if fewer young people listen to the radio. Older people do. But apparently we no longer do that, or, in *some* cases, can no longer afford to buy radios. For communication, use every medium you have available.

    Even before the wide availability of radio, we had town criers (the guys with the big tweeter horns on top of their cars who go around areas with death announcements, community announcements, etc.). They still exist. There is no reason not to use them, even though it is startlingly low-tech. They still work, especially in areas with indifferent cell reception. Next time, we could do worse than consider using them. For communication, use every medium you have available.

    • Taran Rampersad

      Yes. I didn’t mention those things, and I think that could be fodder for another article. I’m planning one on related policy, and I think touching on these things is important.

      For example: ham radio, something still in use that you don’t hear about. More interoperability with telecommunications providers, something I recently learned was pushed for by someone here in Trinidad... I’d fallen out of touch when CARICOM was looking for a solution in the late 2000s, and they were more interested in their budget than in the solution (or so it seemed to me).

      I know two 72-year-old men, one in South Oropouche, one in Carlsen Field; neither knew about the storm when I called them to make sure that they knew.

      But underlying this, I think, was the culture of not caring. We jamming still. Etc.