Delete Events and the Act of Forgetting

The delete_events feature makes it easy to delete individual events from your logs for GDPR compliance and privacy
PJ Hagerty
April 10, 2019

Logs are meant to record the running history of your system: the applications and infrastructure that deliver what you are building. Humio enables this observability by combining information from the past with what is happening now. In the modern world we want to retain as much information as possible, but the act of forgetting is just as important as remembering.

Recent research suggests that forgetting can be just as vital a function for our minds as remembering. When it comes to data, though, Humio understands the importance of remembering everything. Our highly efficient data compression and purpose-built time-series database make all relevant log data instantly searchable.

But there are rare instances when data must be excised from the log. That is why Humio recently introduced delete_events, which supports deleting individual events from the compressed segment files that store your logs. Say that, under the recently passed GDPR, you must delete all information about a particular user: Humio lets a company stay compliant by removing that data from the log. Or maybe an application has mistakenly written private information into the log stream, where other parties can view it. With delete_events, you can have that data excised from the log.

  Delete Events

The information you want to delete is probably not confined to the FIRST_NAME and LAST_NAME columns of a relational database. It is scattered all over the place: in log statements, request logs, text messages, and so on. It might be the person’s phone number, mentioned in the middle of a message, rather than their unique user ID. Humio makes it easy to match against everything, any text and any field, structured or unstructured, so you can track down every place the data appears.

Take a query like:

(/john/i AND /doe/i) OR OR "+1 290 112 218" OR user_id=718

Here we are looking for anything that contains both “John” and “Doe”, in any order and ignoring case, or anything with his email address or phone number, or any event whose user_id field holds his ID. A real-life query would probably include a lot more. Once you have found the events, just delete them, and poof! They are gone.
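When a query like this is sent through the REST API, it travels as a string inside a JSON body, so its double quotes must be escaped. Here is a minimal Bash sketch, assuming you assemble the body by hand: json_escape and build_delete_body are hypothetical helper names, the sample query is the one above without the email term, and the field names (queryString, startTime, endTime) are taken from the REST example later in this post.

```shell
#!/usr/bin/env bash
# Sketch only: json_escape and build_delete_body are hypothetical helpers, not Humio tools.

# Escape backslashes first, then double quotes, so the query is a valid JSON string value.
# Newlines and other control characters are NOT handled by this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Assemble a delete request body; start_ms and end_ms are epoch milliseconds.
build_delete_body() {
  local query="$1" start_ms="$2" end_ms="$3"
  printf '{"queryString": "%s", "startTime": %s, "endTime": %s}\n' \
    "$(json_escape "$query")" "$start_ms" "$end_ms"
}

# Example: the query from above (without the email term) over a sample time range.
build_delete_body '(/john/i AND /doe/i) OR "+1 290 112 218" OR user_id=718' \
  1551074900671 1551123730700
```

The quotes around the phone number come out as \" in the body, which is what a JSON parser expects. A dedicated JSON tool would also cover the control characters this sketch skips.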

To be clear, delete_events is not a way to save space or speed up searches. It’s a tool for exceptional cases, whether legal or technical. Humio’s delete mechanism rewrites the relevant parts of the segment files and wipes out the records of the deleted events. Depending on the segments that must be rewritten, this can be a non-trivial operation.

Only users who are authorized to initiate such changes can run delete_events. Here is an example that uses the REST API to delete all events that have a password field, within a time interval given in milliseconds since the Unix epoch:

curl -v "$REPOSITORY_NAME/deleteevents" \
 -H "Authorization: Bearer $TOKEN" \
 -H "Content-Type: application/json" \
 -d '{"queryString": "password=*", "startTime": 1551074900671, "endTime": 1551123730700}'
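The startTime and endTime values are Unix epoch timestamps in milliseconds. If you would rather start from a readable UTC time, a small sketch like the following can do the conversion. It assumes GNU coreutils date (the -d option and the %N format are GNU extensions, so BSD and macOS date behave differently), and to_epoch_ms is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: convert a UTC date-time string to epoch milliseconds.
# Assumes GNU coreutils date; to_epoch_ms is a hypothetical helper, not a Humio tool.
to_epoch_ms() {
  # -u parses and prints in UTC; %s is whole seconds and %3N is the
  # first three digits of the fractional part (milliseconds).
  date -u -d "$1" +%s%3N
}

start_ms="$(to_epoch_ms '2019-02-25 06:08:20.671')"
end_ms="$(to_epoch_ms '2019-02-25 19:42:10.700')"
echo "startTime=$start_ms endTime=$end_ms"
```

With GNU date, these two calls should reproduce the startTime (1551074900671) and endTime (1551123730700) from the example above, which cover 25 February 2019 from 06:08:20.671 to 19:42:10.700 UTC.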

If the delete request was scheduled successfully, the endpoint returns HTTP status code 201 (Created), and the response entity is the internal ID of the delete. You can use that ID to track the delete’s execution and confirm that it was carried out correctly.
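Because the ID arrives in the body of a 201 response, a script can check the status code before keeping the ID. Here is a minimal Bash sketch, assuming DELETE_URL holds the full deleteevents endpoint URL for your repository and TOKEN an authorized API token; schedule_delete and handle_delete_response are hypothetical helper names, and the response body is assumed to be the bare ID, as described above.

```shell
#!/usr/bin/env bash
# Sketch only: DELETE_URL and TOKEN are assumed to be set by the caller;
# schedule_delete and handle_delete_response are hypothetical helpers.

# Print the delete ID on 201 (Created); otherwise report the failure on stderr
# and return non-zero.
handle_delete_response() {
  local status="$1" body="$2"
  if [ "$status" = "201" ]; then
    printf '%s\n' "$body"
  else
    printf 'delete not scheduled (HTTP %s): %s\n' "$status" "$body" >&2
    return 1
  fi
}

# POST the JSON body given as $1. curl writes the response entity to a temp
# file and prints only the HTTP status code via --write-out (-w).
schedule_delete() {
  local body_file status rc
  body_file="$(mktemp)"
  status="$(curl -sS -o "$body_file" -w '%{http_code}' \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "$1" "$DELETE_URL")"
  handle_delete_response "$status" "$(cat "$body_file")"
  rc=$?
  rm -f "$body_file"
  return "$rc"
}

# Example usage (requires DELETE_URL and TOKEN):
#   delete_id="$(schedule_delete '{"queryString": "password=*", "startTime": 1551074900671, "endTime": 1551123730700}')" \
#     && echo "scheduled delete $delete_id"
```

Sending the body to a file and printing only the status code keeps the two values separate, so the script never has to pick the ID out of mixed output.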

Additionally, delete_events can be combined with Humio’s audit logging features to support GDPR compliance, or simply to purge logs of old, irrelevant, or unnecessary data.

In the GraphQL API, the corresponding mutation is named deleteEvents, and the list of pending deletes being processed in the background is available under that name as well. Just like that, Humio makes the very important task of forgetting easy to remember.

The delete_events feature is currently in beta but should be available to everyone in the near future. Read the docs to learn more.

Keep an eye out for more as the Humio team continues to deliver better performance and new features for observability and log management. Join our Humio Community Slack or tweet us @MeetHumio with any questions; we're here to help!