THE DUSTY SERVER
My early career was spent working as a lecturer at the college where I had studied a few years before. Our IT department was responsible for hosting a wide range of applications on internal servers, most of which they understood very little about beyond the instruction manual provided by the software developer.
Unfortunately, the software I relied on the most was an open-source LMS (which we called a VLE back then!), and as such didn’t come with much of a user manual. The server would sit on a shelf in the corner of the server room, gathering dust and minding its own business until every now and again some mischievous students ganged together to overload it and bring it down.
I’d often get a call to take a look and offer assistance since I knew more about the software than most of the IT team, and eventually, I ended up looking after it full time. Thus started my career in ed-tech!
THE VIRTUAL SERVER
By the time I moved across to working in the corporate space, things had moved on quite considerably. Everyone was virtualising their servers and hosting them on high-powered hosts, and this brought a huge amount of flexibility to deal with increased workloads.
Looking after what was basically the same open-source software, I worked with a number of customers who implemented servers designed to handle considerable load, including thousands of users at a time.
It was genuinely a highlight of my career to be able to work with some of these highly experienced teams, full of skilled individuals who knew much more than I would ever know about virtualisation, Unix, databases, and all the rest of it.
In hindsight though, the original problems I faced all those years before continued to be prevalent. Despite the huge amount of talent these individuals had, they didn’t know enough about the software in question to be able to troubleshoot it effectively when things went wrong.
And things really did go wrong! With any technical project, you will experience issues that need troubleshooting on both the software and the infrastructure, but when “the customer is always right” it can be very difficult to persuade them that their hosting might be causing a problem – especially where the infrastructure team can’t see the problem.
The worst such example I can remember was a production database that never quite seemed to be working correctly; when restoring databases we could tell it responded more slowly than the staging database that was supposedly identical. We mentioned it a few times but it wasn’t causing problems for end users and so didn’t get much attention.
One day a significant number of users accessed the system simultaneously and the database failed. In the aftermath, we worked with the customer to repeat weeks’ worth of load testing on the staging environment but couldn’t reproduce the problem. After all that time and resource it eventually transpired that the production database was, firstly, not as up to date as the staging database and, secondly, had only half the CPU power of the staging environment.
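In hindsight, a simple automated comparison of the two environments might have surfaced the mismatch much earlier. The sketch below is purely illustrative: the DatabaseSpec fields, the find_drift helper, and the example values are all hypothetical, and are not the customer's real setup or the actual figures from the incident.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DatabaseSpec:
    """Hypothetical summary of one database environment."""
    engine_version: str
    cpu_cores: int
    memory_gb: int


def find_drift(production: DatabaseSpec, staging: DatabaseSpec) -> dict:
    """Return {field: (production_value, staging_value)} for every mismatch."""
    drift = {}
    for field in fields(DatabaseSpec):
        prod_value = getattr(production, field.name)
        staging_value = getattr(staging, field.name)
        if prod_value != staging_value:
            drift[field.name] = (prod_value, staging_value)
    return drift


# Illustrative values only, not the real figures from the incident.
production = DatabaseSpec(engine_version="1.0", cpu_cores=4, memory_gb=32)
staging = DatabaseSpec(engine_version="1.1", cpu_cores=8, memory_gb=32)

for name, (prod_value, staging_value) in find_drift(production, staging).items():
    print(f"{name}: production={prod_value} staging={staging_value}")
```

In practice the two specs would be gathered from the environments themselves rather than typed in by hand, and a check like this would run before any load test is trusted as representative of production.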
THE NON-EXISTENT SERVER
Having the amazing opportunity of a fresh start with THRIVE, we wanted to deal with these types of problems once and for all. Luckily, by 2018 the cloud hosting space was offering some hugely compelling features that are really difficult to replicate in a virtual server setup.
THRIVE’s various services are hosted on a serverless infrastructure in the AWS London data centers. We use a combination of containerised services running on AWS Fargate as well as smaller serverless functions running on AWS Lambda. Both of these technologies allow us to scale our platform based upon the needs of our customers and without any consideration for physical hardware constraints.
This approach has benefits beyond the scalability – maintenance is also significantly easier to handle because we don’t manage any operating systems in the infrastructure. We’re also able to leverage a range of other AWS tools to deploy services automatically, ensure security, and monitor performance proactively.
We might still encounter a few issues of course, but with this approach, our developers are working closely with the infrastructure team to deploy and maintain services, and as a result, we can troubleshoot problems much more quickly.
AWS are a great partner to work with, and we can rely on them to deliver a consistent service in a highly secure environment. This in turn helps us to work more methodically and adhere to standards like ISO 27001, for which we were certified just six months after launching THRIVE.
After 14 years we are finally solving the problems that plagued the dusty server all those years ago.