
A few years ago we didn't renew our subscription on time because we got the email over Christmas break, and iirc they deleted all of our data in less than two weeks. They were eventually able to manually restore it from backups, but they restored it incorrectly so there was a bunch of stuff broken. This whole thing isn't even remotely surprising to me.


You can sleep soundly: it seems like they back _everything_ up:

> Second, the script we used provided both the "mark for deletion" capability ... (where recoverability is desirable), and the "permanently delete" capability that is required to permanently remove data when required for compliance reasons. The script was executed with the wrong execution mode and the wrong list of IDs. The result was that sites for approximately 400 customers were improperly deleted.

> To recover from this incident, our global engineering team has implemented a methodical process for restoring our impacted customers.

[https://www.atlassian.com/engineering/april-2022-outage-upda...]

Anyone else find it disturbing that they are able to restore data that they deleted permanently for "compliance" reasons? If this is true, how were they ever compliant? I guess data is only permanently deleted when the engineering team is following their typical, non-methodical process...


No, I don't think that's disturbing. That's the point of backups - even when something is permanently and completely erased in the production database, it's still in the backup. Eventually it will get rotated out as the backups expire.

Going back and purging things from the backups as part of the delete process would be overdoing it to a ridiculous degree.


I think that depends on what you mean by compliance. Some regulations require you to irreversibly destroy data when they prescribe the destruction of that data.

That can mean as much as "you have to encrypt everything with a separate key, so that you can destroy the key for the given (say, personally identifiable) dataset, making the data irrecoverable."

I'm not saying that's the particular compliance reason they had here, or that the analysis you're giving is wrong, either. There is an interpretation where either of these ideas could be the correct one.


This is why regulations specify that data must be destroyed within a time period, typically 90 days. It gives enough time for backups to rotate out.
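A rough sketch of that arithmetic, in Python. The function names and the example numbers (a 30-day recovery window, 35 days of backup retention) are made up for illustration; only the 90-day figure comes from the comment above.

```python
from datetime import date, timedelta


def last_copy_gone_by(requested: date, recovery_days: int, backup_retention_days: int) -> date:
    """Date on which the last recoverable copy disappears.

    Assumes the live copy is removed when the recovery window ends, and that
    the newest backup still containing the data was taken on that same day.
    """
    live_delete = requested + timedelta(days=recovery_days)
    return live_delete + timedelta(days=backup_retention_days)


def is_compliant(requested: date, recovery_days: int,
                 backup_retention_days: int, deadline_days: int = 90) -> bool:
    """True if every copy is gone within the deadline counted from the request."""
    gone = last_copy_gone_by(requested, recovery_days, backup_retention_days)
    return gone <= requested + timedelta(days=deadline_days)


# 30 days of "oops" time plus 35 days of backups fits inside 90 days;
# 30 days plus 90 days of backups does not.
print(is_compliant(date(2022, 4, 1), 30, 35))
print(is_compliant(date(2022, 4, 1), 30, 90))
```

The point is that the recovery window and the backup retention add up, so both have to be budgeted against the same deadline.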

If this weren’t a concern, regulations would demand immediate deletion of data.


"permanently delete" strongly suggests to me that it was the "medical and financial data" kind of compliance. If data can be restored, it's not permanently deleted. But this was a statement from the CEO, so words can have arbitrary meaning :)


"permanently delete" does not mean the same thing as "immediately delete". deleting from the live database is the first step of a permanent deletion, as long as the data exists somewhere the deletion process is still in-progress.

there's a whole lot of people in here who are way too quick to assume that just because one part of a permanent deletion process was inadvertently triggered and then caught while they still had backups, their whole permanent deletion process is a lie.


https://ico.org.uk/for-organisations/guide-to-data-protectio...

You seem to be right-ish. While the GDPR in certain circumstances allows you to keep backups of data that should have been deleted, it seems like they are trying to discourage it in the future.

> ...It is, however, important to note that where data put beyond use is still held it might need to be provided in response to a court order. Therefore data controllers should work towards technical solutions to prevent deletion problems recurring in the future.


A better way to do this sort of thing is not an actual "delete", but a "cryptographic delete". The data should be encrypted, and you just delete the key. The data is then unrecoverable everywhere, including backups. Of course you probably don't want to just nuke the key, but disable it for some period of time, and then nuke it.
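A minimal sketch of the idea in Python, assuming one key per customer and a key store that is kept out of the data backups. `TenantVault` and its methods are hypothetical names, and the cipher here is a toy stand-in built from `hashlib` so the example runs with the standard library only; a real system would use an authenticated cipher such as AES-GCM.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustration only, NOT real crypto."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class TenantVault:
    """Hypothetical per-customer store: one key per customer, held apart from the data."""

    def __init__(self):
        self.keys = {}          # key store, assumed NOT to be part of data backups
        self.ciphertexts = {}   # what lands on disk and in every backup

    def put(self, customer: str, plaintext: bytes) -> None:
        key = self.keys.setdefault(customer, secrets.token_bytes(32))
        self.ciphertexts[customer] = _keystream_xor(key, plaintext)

    def get(self, customer: str) -> bytes:
        key = self.keys[customer]  # raises KeyError once the key is destroyed
        return _keystream_xor(key, self.ciphertexts[customer])

    def crypto_delete(self, customer: str) -> None:
        # Only the key is removed; the ciphertext can linger in any number of backups.
        del self.keys[customer]


vault = TenantVault()
vault.put("acme", b"160 KB of text")
backup = dict(vault.ciphertexts)  # a data backup taken before the delete
vault.crypto_delete("acme")
# backup["acme"] still exists, but nothing can turn it back into plaintext.
```

The backup never has to be touched, which is the whole appeal. The catch is that the key store now needs its own careful retention policy.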


i don't see how that really changes anything - your keys should be backed up just as much as, if not more than your data. and any process for deleting the encryption keys should allow for restoring from backups for some period of time just the same as your process for deleting data should allow restoring from backups for some period of time. either way, permanently rendering data as unrecoverable takes time.


As an example, if you are using Amazon's KMS for key management and you schedule a key for deletion, it gives you a waiting period (at least 7 days, 30 by default) to undo before permanently destroying the key. Or you can disable the key and destroy it later as your retention policy permits. Surely they have some kind of key backup, but KMS users have no access to those backups.
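To make the lifecycle concrete, here is a small local model in Python of a key with a deletion waiting period. This is not the AWS API; `ManagedKey`, `KeyState`, and the method names are invented for the sketch, and time is passed in explicitly so the behavior is easy to follow.

```python
from datetime import datetime, timedelta
from enum import Enum


class KeyState(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    PENDING_DELETION = "pending_deletion"
    DESTROYED = "destroyed"


class ManagedKey:
    """Hypothetical model of a key that is only destroyed after a waiting period."""

    def __init__(self, material: bytes):
        self.material = material
        self.state = KeyState.ENABLED
        self.destroy_at = None

    def tick(self, now: datetime) -> None:
        """Destroy the key material once the waiting period has elapsed."""
        if self.state is KeyState.PENDING_DELETION and now >= self.destroy_at:
            self.material = None
            self.state = KeyState.DESTROYED

    def disable(self, now: datetime) -> None:
        self.tick(now)
        if self.state is not KeyState.ENABLED:
            raise RuntimeError(f"cannot disable a key that is {self.state.value}")
        self.state = KeyState.DISABLED

    def schedule_deletion(self, now: datetime, waiting_days: int) -> None:
        self.tick(now)
        if self.state is KeyState.DESTROYED:
            raise RuntimeError("key is already destroyed")
        self.state = KeyState.PENDING_DELETION
        self.destroy_at = now + timedelta(days=waiting_days)

    def cancel_deletion(self, now: datetime) -> None:
        self.tick(now)
        if self.state is not KeyState.PENDING_DELETION:
            raise RuntimeError("nothing to cancel")
        # In this model the key comes back disabled and must be re-enabled on purpose.
        self.state = KeyState.DISABLED
        self.destroy_at = None
```

Cancelling on day 6 of a 7-day window gets the key back; on day 7 it is gone for good, and so is everything encrypted under it.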


> Going back and purging things from the backups as part of the delete process would be overdoing it to a ridiculous degree.

Also, modifying backups is a great way to inadvertently hose your backups.


Nope, it's not ridiculous. If you are only allowed to store data for x months, that's it.

It's your job to use techniques which allow you to do this, like using encryption on your backup and deleting the keys for it, for example.


Delitio> If you are only allowed to store data for x months, that's it.

Exactly. I'm not aware of any laws saying "you must delete this data immediately". More like "within X days or months". The permanently delete thing presumably skips some cooling-off period in the online database but not the backup, which seems perfectly appropriate, provided your backup retention is compliant.

Google has a nice page describing their deletion process. [1] It doesn't go into product-specific technical details/steps (like marked as deleted within the product, row deleted from Bigtable/Spanner, major compaction guaranteed to happen, backups guaranteed to be deleted or unusable) but it says this:

Google> We then begin a process designed to safely and completely delete the data from our storage systems. Safe deletion is important to protect our users and customers from accidental data loss. Complete deletion of data from our servers is equally important for users’ peace of mind. This process generally takes around 2 months from the time of deletion. This often includes up to a month-long recovery period in case the data was removed unintentionally.

This is a best practice.

Delitio> It's your job to use techniques which allow you to do this, like using encryption on your backup and deleting the keys for it, for example.

If they'd thrown away the encryption key immediately, this would have been much worse. Instead of "we're down for 2 weeks?!?" (already quite bad) it'd be "our data is gone forever?!?". You never want to delete anything too quickly for exactly this reason.

[1] https://policies.google.com/technologies/retention?hl=en-US


It's generally recognized that deleting data from a backup would violate the integrity of the backup, so allowances are made. Usually you have to make sure the data is deleted as part of the restore process. For example, from CCPA:

> If a business stores any personal information on archived or backup systems, it may delay compliance with the consumer's request to delete, with respect to data stored on the archived or backup system, until the archived or backup system relating to that data is restored to an active system or next accessed or used for a sale, disclosure, or commercial purpose.
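One way to picture "deleted as part of the restore process" is a ledger of deletion requests that lives outside the backups and gets replayed whenever a backup is restored. The Python sketch below is only an illustration of that idea; the `restore` function and the shape of the data are assumptions, not anything prescribed by the CCPA text.

```python
def restore(backup: dict, deletion_ledger: set) -> dict:
    """Restore a backup, skipping every record whose ID has a deletion request on file.

    The backup itself is never modified; the ledger is assumed to be stored
    separately so it survives even if the live database is lost.
    """
    return {rid: record for rid, record in backup.items() if rid not in deletion_ledger}


# Backup taken on Monday, deletion request for user-2 arrives on Tuesday.
monday_backup = {
    "user-1": {"email": "a@example.com"},
    "user-2": {"email": "b@example.com"},
    "user-3": {"email": "c@example.com"},
}
deletion_ledger = {"user-2"}

live = restore(monday_backup, deletion_ledger)
print(sorted(live))  # user-2 does not come back
```

The backup keeps its integrity, and the deleted record only ever exists in a copy that is never put back into use.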


Generally user data deletion happens in multiple phases for large companies that care about both compliance and user experience.

For example, if you delete an email or document on Google it moves to the "Trash" folder for 30 days.

When you manually empty the trash or the time window expires, most likely the next step would be a soft deletion for a few days where the data is still on hard drives but hidden from the application. Soft deletion is mainly protection against coding errors, since soft deletion is easy to undo if you've caused an incident but hard deletion (removing the data from disk) is not.

Then most likely a garbage collection process comes by a few days later and hard deletes the data from disk, leaving it only on tape backups

Finally, maybe a month or two later it disappears from the tape backups as they get rotated or otherwise disposed of

This addresses the needs of:

- Giving a good user experience (user "oops I made a mistake" undelete)

- Protecting against incidents due to coding errors (software engineer "oops I made a mistake" undelete)

- Making sure data disappears from both disk and backups within a certain time window, like maybe 30 or 60 days (comply with regulation and user expectations of data being cleared)
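The phases above can be sketched as a function of how long ago the item was trashed. This is a guess at the shape of such a pipeline in Python, not how Google actually implements it: the 30-day trash window is from the comment, while the 7-day soft-delete window and 60-day backup rotation are assumed values standing in for "a few days" and "a month or two". Manually emptying the trash early is ignored to keep it short.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Phase(Enum):
    TRASH = "in trash, visible to the user"
    SOFT_DELETED = "hidden from the app, still on disk"
    BACKUP_ONLY = "removed from disk, still on tape"
    GONE = "rotated out of backups"


TRASH_WINDOW = timedelta(days=30)       # the 30 days mentioned above
SOFT_DELETE_WINDOW = timedelta(days=7)  # assumed value for "a few days"
BACKUP_ROTATION = timedelta(days=60)    # assumed value for "a month or two"


def phase_at(trashed_at: datetime, now: datetime) -> Phase:
    """Which deletion phase an item is in, given when it was moved to the trash."""
    age = now - trashed_at
    if age < TRASH_WINDOW:
        return Phase.TRASH
    if age < TRASH_WINDOW + SOFT_DELETE_WINDOW:
        return Phase.SOFT_DELETED
    if age < TRASH_WINDOW + SOFT_DELETE_WINDOW + BACKUP_ROTATION:
        return Phase.BACKUP_ONLY
    return Phase.GONE


def who_can_restore(phase: Phase) -> Optional[str]:
    """Who is able to bring the data back in each phase (None once it is truly gone)."""
    return {
        Phase.TRASH: "user",
        Phase.SOFT_DELETED: "engineer",
        Phase.BACKUP_ONLY: "backup restore",
        Phase.GONE: None,
    }[phase]
```

With these numbers the data is unrecoverable 97 days after it was trashed, so the windows would need tightening to meet a 30 or 60 day target.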


Another option for clearing tape backups is to throw away the encryption keys, as https://www.youtube.com/watch?v=ejBncCrlAqc mentions


I asked the same question yesterday, and the responses were food for thought.

If you make backups, you are, almost by definition, unable to perform a full 'Compliance Delete' before the oldest backup in the set has expired.

Compliance-based deletion, if it is offered as a service, is almost always something time-based, like "we guarantee the data will be deleted 7 years from now". And then that deliberate deletion step is baked into the backup process.

So, i.m.o. at best they misrepresented the nature of the compliance deletion process. It never did what it was designed to do.


> Anyone else find it disturbing that they are able to restore data that they deleted permanently for "compliance" reasons?

An overarching theme with these things is “legitimate business need” and “no indefinitely retained customer data”. Having backups, system event logs, etc. are all legitimate business needs. Depending on the data type, that business need may last days or years, given things like financial and legal requirements.

You're conflating permanent, immediate, and irrevocable. These are usually handled in different aspects. Think of accounts having multiple states like active, suspended, closed, terminated, purged. Some examples:

Suspended: credentials/authnz immediately disabled, all data online, charges continue to accrue, can be restored in minutes.

Closed: credentials disabled, data online, processing stopped, charges stopped, may take manual intervention (hours) to return to active.

Terminated: creds & account irrevocably unavailable, online data deleted, offline data (backups) remains available.

Purged: all online and offline customer data irrevocably unavailable. This generally happens after a defined retention period for things like logs, backups, etc.

You can apply similar concepts to individual resources more granularly than the account.

Disclaimer: principal at AWS but the above is my own opinion/observation and does not represent my employer.
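For readers who think better in code, the account states listed above could be modeled roughly like this in Python. The state names and what data exists in each state follow the comment; the set of allowed transitions is an assumption made for the sketch (in particular, that nothing moves back out of terminated through the normal path).

```python
from enum import Enum


class AccountState(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    CLOSED = "closed"
    TERMINATED = "terminated"
    PURGED = "purged"


# (online data present, offline/backup data present) for each state
DATA_PRESENT = {
    AccountState.ACTIVE: (True, True),
    AccountState.SUSPENDED: (True, True),
    AccountState.CLOSED: (True, True),
    AccountState.TERMINATED: (False, True),
    AccountState.PURGED: (False, False),
}

# Assumed transitions: suspended and closed accounts can return to active,
# terminated accounts can only move on to purged.
ALLOWED = {
    AccountState.ACTIVE: {AccountState.SUSPENDED, AccountState.CLOSED},
    AccountState.SUSPENDED: {AccountState.ACTIVE, AccountState.CLOSED},
    AccountState.CLOSED: {AccountState.ACTIVE, AccountState.TERMINATED},
    AccountState.TERMINATED: {AccountState.PURGED},
    AccountState.PURGED: set(),
}


def transition(current: AccountState, target: AccountState) -> AccountState:
    """Return the new state, or raise if the move is not allowed in this model."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target
```

Seen this way, an accidental jump to terminated is recoverable only by an out-of-band restore from the offline copy, which has to happen before the purge step runs.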


One assumes the backups will rotate out and be discarded after a fixed time, so everything is "truly deleted" for compliance reasons.


Did you continue as their customer after that?


Nope. I exported our data after they restored the backup and then we cancelled less than a month later. Like I obviously understand suspending our logins, but why would you ever delete someone's data when it's literally only 160 KB of text? The whole thing made zero sense.


After I met my now-fiancée on OkCupid, I deactivated my profile, turned off notifications and forgot about it for a while. A while later, I thought it'd be nice to revisit the first messages we sent to each other, only to find that... OkCupid had deleted both of our accounts. They didn't give me any advance warning, either, because I turned off notifications, remember? :^)

I'm still kinda salty about it. I understand why big services can't retain data indefinitely, but like... it's just a few KB of text, and that text happens to have a lot of sentimental value. Besides, OkCupid knows that I deactivated my account because I am a success story -- why not hold onto those profiles a bit longer? Or better yet, how about emailing an archive of those messages immediately when you click the "I'm leaving because I'm in a happy relationship now" button? /rant


I kind of agree - a company I used to work at used free Slack for years, then HipChat (until Atlassian killed it - with good reason), then converted to paid Slack, and all of our chat history was still there - even the old stuff that gets hidden as part of the free plan.


With GDPR, privacy regulations and data breach regulations sweeping the globe, holding onto unnecessary data is a huge liability. Getting rid of data you no longer have clear consent to store, or which you're unlikely to have a clear business need to continue storing, is a sign of a good company these days.


Not if the customer doesn't ask for it. As long as the user has a profile, was aware that PII is stored and doesn't request deletion, GDPR won't ever force you to delete information. Otherwise GMail would have to start deleting old emails as well.


True, but likely not this kind of data.


Yes, this kind of data. Your OkCupid account has all kinds of information about who you associate with.


That's true, but it was stored there with your explicit consent. The GDPR is first and foremost concerned with data that is stored about you without your consent or with data that continues to be stored about you after your explicit request for deletion. Or incorrect data that you have requested to be removed. See the Wikipedia page on the GDPR or a bunch of articles that I wrote about this subject.

If they had obtained the data without you supplying it freely then that would have been an entirely different matter, especially if it was used in ways that you did not consent to. But since that does not appear to be the case here, the GDPR applies as it does to all data that is directly related to a data subject, and continuing to store it on behalf of the user(s) who supplied it is not a problem.

Note that the user here is disappointed that their data, which they consented to be kept, is no longer there. This is a pretty clear indication that as far as they are concerned their expectation was that, even with the GDPR up and running, such data would continue to be preserved as it is in almost every service that existed prior to May 2018.

It is precisely this kind of panicky thinking around the whole subject of the GDPR that gives these irrational responses: companies that suddenly no longer dare to mail you, so you have to log in to their portal (which is secured by your email address), and more of these totally weird constructs.

If they wanted to delete this data the better way would have been to positively contact the user (so that you know that they have received your message) to ask if their data should be deleted or not. That's good stewardship, just tossing it isn't.


I don't think people write code saying "if accountSize < 160kB { skipDelete() }" - THAT would make zero sense. So, the size is not relevant here. The process was likely to delete data after some event occurred, or lack of event occurred.


Someone somewhere got a promotion sooner because they lowered the slope of a line a little bit.


Or some overzealous engineer said hey guys let's delete all data 7 days after an account is canceled. This is called over optimizing.


Such a decision is just as likely to have come from the legal/compliance team as an engineer. Data you no longer have clear consent or a legitimate business need to store is a liability, and if you operate in Europe, potentially illegal to continue storing.


It’s amazing how much stupid shit we do to keep the legal guys happy while their bosses are busy engaging in <checks news headlines> tax evasion, graft, bribery, fraud, embezzlement, illegal dumping, sexual harassment, sexual assault, statutory rape, solicitation to commit murder, and my personal favorite and I’m sure yours too: human trafficking.

But sure, we can break all of our users to avoid the possibility of you having to write some legal briefs and us paying a small fine for keeping data 7 days instead of three.


> why would you ever delete someone's data when it's literally only 160 KB of text?

Compliance? The contract has expired, so there’s no legal basis for them to keep your data?


Seems like that could be addressed with some fine print in the initial agreements. "In the event that you stop paying us, we may keep your data for up to N days unless directed otherwise by you"--or similar.


Why would they bother?


Even with GDPR, you can easily keep that data around for a few months or a year unless the customer requests deletion.



