Haplo Retrospective

Security

We met our ethical obligation to protect our users’ personal data and privacy with an uncompromising stance on security. This also enabled the business to succeed in a market which demanded high security software.

New graduates were able to write complex customised line-of-business applications which passed penetration tests without issue, simply because we told them to “do things the easy way”.

Building the foundations first

We did this in a way which was lightweight and helped us to work rapidly, by setting out a few principles and letting them guide our behaviours and processes:

We made sure that everyone prioritised and thought about security in their work through a culture of security. This started in the onboarding process, and was continually reinforced with gentle reminders and checks in our processes.

It is a lot of work to get information security right. We chose to spend the effort up front, putting considerable thought and time into the foundational work. This was repaid many times over by massively reducing the day-to-day burden on the developers and the company.

I specifically rejected the approach of adding “security” later. Patching holes after the fact is more time consuming, a reactive approach is unpredictable and messes with your schedules, and you’re far less likely to build a secure service.

Outcome

Working on solid security foundations is very effective. We were many times better than anyone expected us to be, with many amusing moments. My favourites include:

On the commercial side, the predictability helped the business enormously. As we knew that we would pass any security test, and that our security efforts were a constant small effort, contracts were awarded smoothly and there was never any last minute panic.

Principles into practice

Our principles were thoughtfully applied across everything we did, whether it was how we wrote software, or how we interacted with clients. You have to be absolutely perfect to deliver a secure service, whereas an attacker only needs to find one hole.

The trick, of course, is to be a high security company without it getting in anyone’s way. Security must enable a business, not prevent things from getting done.

Security through correctness

I approached security as a quality issue. Correct software does not have security problems, because a security hole is obviously a bug. This reframes the problem, moving security from something that’s treated as an extra step, to something that is fundamental to the process of writing software.

Of course, it’s hard to look at code and determine whether it has a security hole or not. Therefore, you need to minimise the amount of code which has to be secure.

We did this by creating an open source Platform which was as small as possible, and which was changed relatively infrequently. This presented a server-side JavaScript API which was used by the vast majority of the code. As the JavaScript code ran in a sandbox, it was not possible to circumvent this API and use private Platform interfaces.

The JavaScript API was deliberately written so that it was very difficult, if not impossible, to write code with security holes. This was the key to enabling early career developers to write code which passed penetration tests immediately.

A few examples:

All of these measures reduced the amount of code which was security critical, and almost eliminated the need for the developer to think about security. In the rare cases where there was a need for careful review, the security critical code was small, and was immediately obvious.

The easy thing is the secure thing to do

We put a lot of effort into making the Platform easy to use as a security measure.

If things are difficult, it’s human nature to circumvent them. (And if people can’t, they get another job!) So, if you provide secure tools that developers love, they will use them correctly and securely.

While the APIs were strict, they were easy to use. And on top of the API, we provided many pre-built components that had declarative APIs that were easy to compose into applications.

We strongly preferred declarative APIs, which were driven by data structures, because they described what the developer wanted to achieve, not the steps to achieve it. This had two benefits. Firstly, all the development effort was focused on a single high quality reusable component, and secondly, we reduced the amount of code that had to be written. If you don’t write code, it can’t have bugs or security holes.
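
To illustrate the idea of a declarative API driven by a data structure, here is a minimal sketch in Python. It is not the Platform's actual API: the names `PERSON_TABLE` and `render_table`, and the shape of the specification, are hypothetical. The developer only describes what to display, and the single shared component does all the escaping in one place.

```python
from html import escape

# Hypothetical declarative specification: it says WHAT to show, not how.
PERSON_TABLE = {
    "title": "People",
    "columns": [
        {"heading": "Name", "field": "name"},
        {"heading": "Email", "field": "email"},
    ],
}


def render_table(spec, rows):
    """Single reusable component. Every value is HTML-escaped here, once,
    so code which only supplies a spec and rows cannot forget to escape."""
    columns = spec["columns"]
    head = "".join(f"<th>{escape(c['heading'])}</th>" for c in columns)
    body = ""
    for row in rows:
        cells = "".join(
            f"<td>{escape(str(row.get(c['field'], '')))}</td>" for c in columns
        )
        body += f"<tr>{cells}</tr>"
    caption = escape(spec["title"])
    return f"<table><caption>{caption}</caption><tr>{head}</tr>{body}</table>"
```

In this sketch, application code which calls `render_table(PERSON_TABLE, rows)` never builds HTML itself, so the only code needing security review is the component.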

I am very proud of our HSVT templating language, as it’s a really good example of this philosophy of making it easy to be secure. When you use it, you don’t feel it’s constraining what you can express. It has lots of functionality which makes writing templates a delight. And when you make a syntax error, the error message you get is clear and tells you the exact line and character of the problem.

Simple rules, with no human judgement involved

All our security rules were simple and easy to understand. Importantly, they did not involve human judgement in any way. Something was obviously permitted, or not allowed.

The best example was our rule on company equipment: you were not permitted to connect anything to it which wasn’t owned by the company.

This is very easy to follow. Yet it eliminates all sorts of security problems, from trivial malware attacks via USB sticks to sophisticated (and admittedly unlikely) attacks through subverted hardware.

It wasn’t hard to follow. We had a supply of USB sticks for the odd occasion when we needed to transfer files, and after one had been plugged into someone else’s computer, we gave it away. A refusal to connect random things at a client site, when the reasoning was explained, made the client feel confident about our security measures. We had a “personal devices only” charger in the office so people weren’t tempted to charge their phone from their work laptop. And so on.

Similarly, we didn’t allow any unapproved software to be installed, which also included plug-ins and extensions to approved software like browsers and text editors. To avoid this getting in the way of work, our approval process for new software was very fast.

This approach also extended to our infrastructure. There was a simple rule that all networks were untrusted, whether the office network or the various networks in the datacentre, even the private subnet. This forced us to encrypt properly everywhere, mutually authenticate the servers which hosted our products, and configure devices securely on all networks. It even made it trivial to move to working from home when the pandemic started.

There was an amusing moment when an ISO27001 auditor thought they’d caught us out when they noticed there was a key in the lock of the office networking cabinet. They looked disappointed when I reminded them that all networks were untrusted.
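
As a rough illustration of what “all networks are untrusted” implies for server configuration, the sketch below builds a TLS server context which refuses any client that does not present a certificate. It uses Python's standard `ssl` module. The function name `make_server_context` and its parameters are hypothetical, and this is an assumption about how such a rule could be enforced, not a description of how our servers were configured.

```python
import ssl


def make_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Build a server-side TLS context which demands a client certificate.

    The file paths are placeholders. A real deployment would always pass
    the server's certificate and key, and the CA used to issue client
    certificates.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Mutual authentication: the handshake fails unless the client
    # presents a certificate which chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

The same context would be used whether the peer is on the office network, a datacentre subnet, or the public internet, which is the point of the rule.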

Minimise the attack surface

If you’re not running something, it can’t cause you security problems. So we ran as little as possible.

Security products were carefully considered. For example, anti-virus software sounds a good idea, but attackers can upload whatever they want, and the AV software will parse it. Unfortunately, these parsers are numerous, of dubious quality, and run in a privileged position where they have access to pretty much everything on a system. So we sandboxed it, and only let it see one file at a time.

Dependencies and software libraries were resisted. Obviously you shouldn’t reinvent the wheel, but whenever you use a library, you are paying a price which is not always apparent. You need to make sure you’re using it properly. You’ve got to keep on top of updates. And there’s potentially lots of code that you’re not using, which could be flawed.

For many years, we used Rails’ ActiveRecord ORM for database access in the Platform. While it was kind of nice and definitely got us off to a good start, it had downsides. It had security problems that needed to be continually updated or patched, questionable APIs which trusted the data in HTTP requests too much, and had many potentially insecure features that needed disabling or mitigating. And it prevented us updating our Ruby version, because versions which worked on the new Ruby had significant API changes.

When it came time to update, we didn’t want to face this again. So we, slightly reluctantly, wrote our own. Because it only had to do what we needed, it could be very small, and all features would be completely tailored to our use case. In particular, all SQL had to be specified upfront with insertion points, avoiding SQL injection.
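
The general technique of fixing the SQL text upfront and only allowing values in at declared insertion points can be sketched as follows. This uses Python's standard `sqlite3` module purely for illustration. The `Statement` class, `FIND_BY_NAME`, and the table layout are hypothetical, and do not reflect our actual database layer.

```python
import sqlite3


class Statement:
    """SQL text is fixed when the statement is defined. Values can only
    enter through the named insertion points, bound by the database driver."""

    def __init__(self, sql, *param_names):
        self.sql = sql
        self.param_names = frozenset(param_names)

    def run(self, conn, **values):
        if set(values) != self.param_names:
            raise ValueError("values must match the declared insertion points")
        # The driver binds the values. They are never spliced into the SQL.
        return conn.execute(self.sql, values).fetchall()


# Defined once, upfront. Callers cannot alter the query text.
FIND_BY_NAME = Statement("SELECT id, name FROM people WHERE name = :name", "name")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO people (name) VALUES (?)", [("Alice",), ("Bob",)])
    hostile = "Alice' OR '1'='1"
    normal = FIND_BY_NAME.run(conn, name="Alice")
    attacked = FIND_BY_NAME.run(conn, name=hostile)
    return normal, attacked
```

In this sketch the hostile string is treated as an ordinary name which matches no rows, rather than as SQL.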

As well as choosing our dependencies carefully, we managed them carefully. We used a deployment process where we repackaged each dependency, and referred to the version with its cryptographic digest in a signed deployment manifest for the Platform. While this was a bit of extra effort, particularly when upgrading a dependency, we knew exactly what we were running on our servers.
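
A minimal sketch of the idea of pinning each dependency by cryptographic digest in a signed manifest is shown below. The manifest format and function names are hypothetical, and an HMAC with a shared key stands in for a real public-key signature only to keep the example self-contained. It is not our deployment tooling.

```python
import hashlib
import hmac
import json


def digest(data: bytes) -> str:
    """SHA-256 digest of a repackaged dependency, as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Stand-in for a real signature: an HMAC over the canonical JSON."""
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_dependency(name, data, manifest, signature, key) -> bool:
    """Accept a package only if the manifest is authentic AND the
    package bytes match the digest recorded for that name."""
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        return False  # the manifest itself has been altered
    expected = manifest.get(name)
    return expected is not None and hmac.compare_digest(expected, digest(data))
```

Under these assumptions, changing either a package or its manifest entry causes verification to fail, so the manifest is an exact record of what is deployed.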

Security culture

You can have all the security processes and controls you want, but if your colleagues don’t see them as important, they won’t be terribly effective. We deliberately created a security culture in our team by embedding it into everything we did, and prioritising it in our leadership.

This began on a new colleague’s first day as a formal part of the onboarding process. The first step was a bit of theatre. Our contract included a small clause which stressed the importance of security and the need to follow the security policies. We asked a new joiner to read the policies, then review these clauses, and then sign next to them to show they had read and understood.

Within their first few days, there was an onboarding session on security. We made a couple of important points:

This was effective at getting people’s attention, and highlighting their responsibility for maintaining our record of good security by following the rules and policies.

This does, however, need security to be seen as important by the leaders in the company. As founders, we put security first, never made any compromises, never took any shortcuts, and were vocal about our uncompromising stance. It was clear that there was never anything which was more important than security, and we would unquestioningly support anyone who prioritised it.

Our impressive results in security were a direct result of this culture. A team needs a common shared goal and understanding to value something that is not measurable.

Miscellaneous

Cloud

When we started, cloud computing was very young and untrusted. We had to start by self-hosting, because the cloud barely existed, and then we kept on doing so for longer than most companies. We had the expertise to do it well, and our customers preferred that we had full control over the hosting environment.

However, that changed in the last few years. Customers began to view the cloud as a known quantity with known best practices which were “secure”, and felt comfortable if you did things a certain way and enabled the cloud security products. So we evolved, and started to host more on the cloud. (After the acquisition, the new owner moved everything to AWS within a few months.)

The cloud also makes it easier to work internationally. Customers want their data hosted in their jurisdiction, with the services close to them for performance. Cloud providers make it easy to bring up a cluster pretty much anywhere in the world.

Even though we adopted the cloud for hosting, we still took a very strong line on security, and did not use any other vendor to process data. This perhaps made things a little harder for ourselves, but dramatically reduced the things which could go wrong, and the number of vendors to justify in security reviews.

ISO27001

I initially assumed that ISO27001 would just be annoying paperwork, and wouldn’t help us to be secure. I was wrong.

Unlike the lesser certifications like Cyber Essentials, ISO27001 is not prescriptive in how you do things. It sets out the things you must achieve through a long list of “controls”, but how you actually do them is up to you.

We chose the toughest auditors we could find, and made sure their approach was consultative rather than just ticking boxes. The process went through all our operations, and worked out how we achieved them in a way which worked for us technically and commercially.

This systematic review, while it didn’t reveal any flaws, gave us confidence we hadn’t missed anything, and enabled us to make incremental improvements and plan how we continued to improve as we grew.

I was, however, right about the paperwork.

Physical security

We avoided many of the problems with physical security by:

This allowed us to focus our physical security efforts on the laptops. Measures we used included:

These rules also brought our non-developer colleagues into the security culture, by showing them how their actions affected the security of our systems and involving them in everyday security.

Resistance to social engineering

We were “hands on” with our customers, with close working relationships over multiple years. As it was usual for us to have support requests and phone calls which related to the access controls for our customers’ data, social engineering was a significant risk in our threat model.

This risk was managed by a policy to treat all communications which impacted security as suspicious until verified. We made sure everyone was aware of how they might be tricked into compromising security by explaining and repeating all the ways in which social engineering attacks can succeed.

We also needed to be internally resistant to social engineering. Again, all instructions were treated with suspicion, and the black and white rules made it very easy to see which requests were breaking the security policy.

Continual education

A security culture needs to be reinforced by continual education and reminders, otherwise it becomes forgotten. We integrated this into our day-to-day work:

When we became ISO27001 certified, we formalised the process by keeping records to satisfy our auditor.

How this can go wrong

Because you are only actually secure if you’re perfect, there are plenty of ways you can go wrong. (Realistically you just need to make it more expensive to attack you than would be gained by the attack.) Few of these potential failures are unique to our approach.

If you think you’re doing security well, you can very easily end up with a false sense of security. It is pretty much impossible to look at any bit of software and say it is secure, not least because it’s built on lots of code from other people. You still have to remember that you need defence in depth, to be able to detect attacks, and respond appropriately.

Security needs to be led from the top of the organisation. It was the top priority for all the leaders in our organisation. If compromises are made for expediency, it will destroy the culture of security.

I don’t think it’s possible to retrofit a foundational approach to security in a team or a product. You will always be dealing with the results of the old reactive approach. However, it could be used to take a more rigorous approach to future work.

If you built all the foundations first, you’d spend ages building them, and never get around to building a product to sell. So you have to evolve and improve, and be unafraid to revisit old code. The biggest example for us was our adoption of HSVT for HTML templating. A small amount of our product code still uses Handlebars, which is OK, but not perfect. The Platform, however, still uses ERB, which is terrible and requires a lot of attention to detail from the developer. We will eventually replace it all with HSVT, but for the moment, we have to rely on the fact that it hasn’t changed much, and is very carefully reviewed.

If you ban the use of all hardware and software, you can easily miss out on opportunities to work more effectively. This risk has to be mitigated by fast evaluation and approvals, and a willingness to accept reasonable risk for business benefits.

Finally, some of our lower level APIs place restrictions on the code you can write, and this sometimes results in inefficient code. In particular, our database APIs don’t allow you to take full advantage of SQL, which is a shame. We have managed this without compromising by carefully extending the API as we find really bad cases, and designing it in the first place to fit our use cases well.

