A scared executive's guide to digital privacy
Feb 10, 2021
In Finland, where I live, a hacker recently broke into a database belonging to Vastaamo, a psychotherapy company, and ransomed its clients' data for bitcoin. The founder/CEO was eviscerated, and the company's valuation has plummeted. The hack succeeded because the password for their main server was root. This went unnoticed in a company with hundreds of employees serving thousands of clients.
The reaction of most of my friends in the Finnish tech sector wasn't "Ha, those idiots, what a dumb password." It was, "Oh fuck, do any of our servers have the password root?" Privacy fiascos freak us out because we realize that the systems we manage are too complex for us to understand. They often represent millions of lines of code spread across hundreds of services authored and edited by many different individuals. And yet, when something goes wrong, it's us getting fired and even sued personally in class action lawsuits. Shaking in your boots yet? You should be!
This article is for you, scared executive. By the end, I hope you have enough concepts under your belt to be able to have a heart-to-heart with your engineering team about some basic steps you can take to avoid being the next Vastaamo.
At Meeshkan, we use 1Password as a company-wide password manager. Every single employee and contractor gets a 1Password account on day one, and every password is generated using this tool. Generating passwords is way easier and way safer than coming up with them on your own, and once you get in the habit of using a tool like this, you'll never go back to managing your passwords in a text file, on Post-it notes, or (even worse) in your head. If you don't have one already, set up a team-wide or company-wide policy to use a password manager, import all existing passwords into it, and change the ones that are flagged as weak.
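To give a feel for what "generated" means here, this is a minimal Python sketch of the kind of thing a password manager does under the hood, plus a crude check for the Vastaamo-style disaster. It is not how any particular password manager works: the function names, the 12-character minimum and the tiny deny-list are all assumptions made up for illustration.

```python
import secrets
import string

# Characters to draw from. Real tools let you tune this per site.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

# Hypothetical deny-list for illustration; real audits use far larger lists.
KNOWN_BAD = {"root", "admin", "password", "123456"}


def generate_password(length: int = 24) -> str:
    """Return a random password built from a cryptographically secure source."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_obviously_weak(password: str) -> bool:
    """Flag passwords that are too short or on the deny-list."""
    return len(password) < 12 or password.lower() in KNOWN_BAD
```

The point of the `secrets` module (as opposed to `random`) is that its output is suitable for security-sensitive use. Running something like `is_obviously_weak` over the credentials you import is a cheap first pass before you rotate anything.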
Chances are that your stack is deployed on a public cloud like AWS, GCP, Azure, Heroku or Vercel. All of these services have key management services (KMS). For example, at Meeshkan, we use AWS Key Management Service to hold the keys used by all of our cloud resources, such as authentication tokens and passwords to databases. KMSs provide automatic key rotation, ensuring that new tokens are provisioned regularly for mission-critical services.
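As a rough illustration of what this looks like from your engineers' side, here is a small Python sketch that decrypts a database password using a KMS client. It assumes the `boto3` library and its KMS `decrypt` call; the function name and the `DB_PASSWORD_ENCRYPTED` variable in the comment are hypothetical, and this is not a description of our actual setup. The client is passed in as a parameter so the code only touches AWS when you hand it a real client.

```python
import base64


def decrypt_secret(kms_client, ciphertext_b64: str) -> str:
    """Decrypt a base64-encoded ciphertext with a KMS client.

    `kms_client` is assumed to behave like boto3.client("kms"):
    decrypt(CiphertextBlob=bytes) returns a dict with a "Plaintext" bytes value.
    """
    ciphertext = base64.b64decode(ciphertext_b64)
    response = kms_client.decrypt(CiphertextBlob=ciphertext)
    return response["Plaintext"].decode("utf-8")


# Hypothetical usage inside a cloud service that has permission to use the key:
#
#   import os, boto3
#   kms = boto3.client("kms")
#   db_password = decrypt_secret(kms, os.environ["DB_PASSWORD_ENCRYPTED"])
```

The plaintext password only ever exists in the memory of a service that is allowed to call the key. Nobody has to paste it into a config file or a chat message.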
Using a key management service has a trickle-down effect on the rest of your stack that leads to sound architecture choices. Because only cloud services can access these keys, all of your testing needs to happen in the cloud. This means that you'll have to design your stack so that test environments can be deployed to the cloud relatively quickly, which requires infrastructure as code + CI/CD to pull off at scale. At Meeshkan, a copy of our entire stack (DB, servers, website, web app, cron jobs, ML pipeline: the whole nine yards!) deploys to the cloud in less than four minutes, and all of this stems from our privacy-first approach.
At Meeshkan, our recorder script (which captures user behavior so that we can write tests from it) never records credit card data, passwords, or hidden input fields. This means that this data is never even sent over the wire, let alone stored in our database, let alone picked up by our algorithms.
So, when you're looking over data anonymization, start with your frontend and mobile engineers. People tend to think of data privacy in terms of databases, so these engineers often get overlooked, but they're the ones that will make sure you don't expose yourself to unnecessary risk.
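Here is a sketch of the kind of rule you can ask those engineers about. A real recorder would run in the browser or the app, so treat this Python version purely as an illustration of the decision logic. The event shape, the function names and the exact set of rules are assumptions, not our recorder's code.

```python
from typing import List

# Input types that should never leave the client.
SENSITIVE_INPUT_TYPES = {"password", "hidden"}

# HTML autocomplete hints for payment and password fields,
# e.g. "cc-number", "cc-csc", "current-password", "new-password".
SENSITIVE_AUTOCOMPLETE_PREFIXES = ("cc-",)
SENSITIVE_AUTOCOMPLETE_VALUES = {"current-password", "new-password"}


def should_record(field: dict) -> bool:
    """Return False for fields whose values must not be captured."""
    input_type = field.get("type", "text").lower()
    autocomplete = field.get("autocomplete", "").lower()
    if input_type in SENSITIVE_INPUT_TYPES:
        return False
    if autocomplete in SENSITIVE_AUTOCOMPLETE_VALUES:
        return False
    if autocomplete.startswith(SENSITIVE_AUTOCOMPLETE_PREFIXES):
        return False
    return True


def filter_events(events: List[dict]) -> List[dict]:
    """Drop events for sensitive fields before anything is sent."""
    return [event for event in events if should_record(event["field"])]
```

The important design choice is where this runs: on the client, before the network call. Data that never leaves the device can't be stolen from your servers.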
One of our artifacts at Meeshkan is videos of tests running on websites. These videos use input data generated by our ML algorithms, and if the algorithms learn a piece of data that is then marked private, there's a risk that a new video may be generated before the algorithm can retrain, causing it to display private information. That's why we have an extra layer of anonymization right before our artifacts are created.
For example, when we render videos, we have a conservative anonymization script written in Python that re-redacts sensitive fields right before a screenshot is taken. If your product generates artifacts like images, PDFs or videos, get in touch with the team that produces them and make sure there's some sort of sensible last-mile redaction in place for sensitive data!
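A last-mile redaction pass can be as simple as the sketch below. This is not our actual script: the patterns, the `[REDACTED]` placeholder and the function names are assumptions for illustration, and the regular expressions are deliberately rough. Being conservative means it is fine to redact too much.

```python
import re

PLACEHOLDER = "[REDACTED]"

# Rough patterns: an email address, and 13 to 19 digits optionally
# separated by spaces or dashes (the typical shape of a card number).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact(text: str, private_values=()) -> str:
    """Mask known-private values first, then anything matching the patterns."""
    for value in private_values:
        if value:  # skip empty strings, which would match everywhere
            text = text.replace(value, PLACEHOLDER)
    text = EMAIL_RE.sub(PLACEHOLDER, text)
    text = CARD_RE.sub(PLACEHOLDER, text)
    return text


def redact_fields(fields: dict, private_values=()) -> dict:
    """Redact every value about to be rendered into an artifact."""
    return {name: redact(value, private_values) for name, value in fields.items()}
```

Calling `redact_fields` on the values about to be typed into a page, just before the frame is captured, gives you a safety net that doesn't depend on upstream systems having caught up with the latest privacy flags.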
There are several benefits to open-sourcing large swaths of your stack:
- White-hat hackers will find zero-day bugs much faster in open source repositories.
- Open-source projects often benefit from generous free tiers, allowing you to use services like Snyk to scan your dependencies and liabilities for free.
- Engineers love open source, and there's a higher chance they'll write high-quality, safe code if they know it will be out there for the world to read.
At Meeshkan, even though we're a team of six, we have open-sourced hundreds of repositories and will continue to do so as part of our commitment to transparency and security.
Bots powered by AI can simulate lots of behaviors, including hacker behavior. At Meeshkan, we build AI to simulate real users interacting with online services, helping to find bugs and vulnerabilities.
Real hackers may steal and ransom your data, but their AI-powered homologues are usually nice (at least ours are) and will send you a polite-but-stern warning. Having your site continuously tested by artificial intelligence can help privacy violations surface and disappear before carbon-based entities spot them.
A lot of articles on privacy will talk about doomsday events where archived databases are dumped on the dark web. However, these articles very rarely talk about how hackers reach this data. It's not rocket science: hackers use brute force, guesswork, sleuthing, impersonation, and even wooing until they find a password or token. Using a password manager, a key management service, end-to-end anonymization and aggressive open sourcing drastically decreases the chances that your data will be compromised.