How We Implemented Audit in Our SaaS Django Platform

We needed a way to keep track of every change happening on our platform so that we could easily respond to support requests from our clients.

Context: what is Audit?

A while back, my team and I were working on a project that needed audit features: a way to trace what happened in the app, show it to end users, and use it for customer support. At the time we were receiving a lot of requests that required us to know what had changed on various objects in the database, so we needed a system to help with that. We also knew that when you build a web application, especially one with business value, it is important to let end users see who performed which action, what changed on a specific object in the database, and who made that change.

For example, if an invoice has been created, processed and validated, and we later need to display the full history of changes made to it, we need a system of record that holds that history, much like logs do. This is called maintaining an audit trail.

The basic principle

Regardless of the framework you are using, the principle remains the same: find a way to listen to everything that happens in the application and log it somewhere. That can be a file, a database table or anything else, as long as it keeps all the data you send to it.

With Django, the simplest way to do that is to connect to signals such as post_save or m2m_changed for all models at once (it is possible) and turn the data carried by each signal into events saved somewhere. Ideally this is done in a dedicated thread or asynchronously to avoid slowing down the application.

From there it is also possible to choose which types of events should be logged or which models should be tracked. You get the idea ;).
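
The principle above can be sketched without any package. This is only a minimal illustration, not what either package below does internally: a receiver with the signature Django passes to post_save handlers pushes a small event onto a queue, and a background thread drains the queue into a sink (here a plain list, standing in for a file or a table). The names audit_queue, record_save, start_writer, stop_writer and the event fields are all made up for this example.

```python
import queue
import threading
from datetime import datetime, timezone

# Hypothetical names for this sketch; nothing here comes from a package.
audit_queue = queue.Queue()
_STOP = object()  # sentinel telling the writer thread to exit


def record_save(sender, instance, created, **kwargs):
    """Receiver with the signature Django uses for post_save signals."""
    audit_queue.put({
        "model": sender.__name__,
        "pk": getattr(instance, "pk", None),
        "action": "create" if created else "update",
        "at": datetime.now(timezone.utc).isoformat(),
    })


def start_writer(sink):
    """Drain the queue in a background thread so saves are not slowed down."""
    def run():
        while True:
            event = audit_queue.get()
            if event is _STOP:
                break
            sink.append(event)  # replace with a file write, DB insert, etc.

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def stop_writer(thread):
    """Ask the writer to finish what is queued, then wait for it."""
    audit_queue.put(_STOP)
    thread.join()


# In a Django project, connecting without a sender listens to every model:
#     from django.db.models.signals import post_save
#     post_save.connect(record_save)
```

Filtering which models or event types get logged then comes down to an early return in record_save, for example based on sender.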

The available solutions in Django packages

There are several packages that achieve this, but I will showcase only 2 of them here because I have actually used both and they work differently internally.

The django-easy-audit package

github.com/soynatan/django-easy-audit

This package is installed via the command pip install django-easy-audit and added to the project's settings like this:

INSTALLED_APPS = [
    #...
    'easyaudit',
]

MIDDLEWARE = (
    #...
    'easyaudit.middleware.easyaudit.EasyAuditMiddleware',
)

It can watch many kinds of events in the project, such as logins, CRUD operations and HTTP requests, and saves them into a dedicated set of database tables (models): CRUDEvent, LoginEvent and RequestEvent.

A set of settings changes how it works or which models it tracks, such as:

  • DJANGO_EASY_AUDIT_WATCH_MODEL_EVENTS

  • DJANGO_EASY_AUDIT_WATCH_AUTH_EVENTS

  • DJANGO_EASY_AUDIT_WATCH_REQUEST_EVENTS

These settings enable or disable each type of event tracking.
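
As an illustration, a settings fragment for a project that wants to keep model and login events but drop per-request logging might look like this. The choice of values is an assumption made for the example, not what we ran in production.

```python
# settings.py (fragment): each flag turns one kind of tracking on or off.
DJANGO_EASY_AUDIT_WATCH_MODEL_EVENTS = True     # create/update/delete on models
DJANGO_EASY_AUDIT_WATCH_AUTH_EVENTS = True      # login-related events
DJANGO_EASY_AUDIT_WATCH_REQUEST_EVENTS = False  # skip per-request logging
```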

The django-simple-history package

This package provides the same result but behaves slightly differently: for each tracked model it creates a mirror table in the database and stores every change to an object in the mirror table of its model. It also provides an attribute that you add to each model to access the history of each instance. To add this package to the project, run pip install django-simple-history and set it up in the project like this:

INSTALLED_APPS = [
    # ...
    'simple_history',
]
MIDDLEWARE = [
    # ...
    'simple_history.middleware.HistoryRequestMiddleware',
]

Then add the history attribute to every model you need to track:

from django.db import models
from simple_history.models import HistoricalRecords

class SomeModel(models.Model):
    # ... your model fields ...
    history = HistoricalRecords()

Then create the migrations and apply them:

python manage.py makemigrations
python manage.py migrate

This adds the history attribute to the SomeModel class and creates app_historicalsomemodel, the table where all the changes made to the model are stored. It's well explained here: django-simple-history.readthedocs.io/en/lat...

This means the number of tables in the database will roughly double, at least for the models you wrote, if you want to track them all. To access the history (audit trail) of a specific instance, use someModel.history.all(), where someModel is an instance of SomeModel. It returns a QuerySet of historical records, one per version of the object since its creation, and you can use queryset filters to get whichever version you want.
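
To make that concrete, here is a small helper that walks an instance's history and produces one readable line per version. It is only a sketch: describe_history is a hypothetical name, and it assumes the instance's model declares history = HistoricalRecords() as shown earlier. It relies on the history_date, history_user and history_type fields that django-simple-history stores on each historical record.

```python
def describe_history(instance):
    """Return one line per stored version of `instance`.

    Assumes `instance` belongs to a model that declares
    `history = HistoricalRecords()`.
    """
    # history_type is "+" for creation, "~" for change, "-" for deletion.
    labels = {"+": "created", "~": "changed", "-": "deleted"}
    lines = []
    for version in instance.history.all():
        who = version.history_user or "unknown user"
        what = labels.get(version.history_type, version.history_type)
        lines.append(f"{version.history_date:%Y-%m-%d %H:%M} {what} by {who}")
    return lines
```

Because history is a regular manager, the usual queryset filters apply, for example someModel.history.filter(history_type="~") to keep only the updates.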

Summary, what to remember

We tested both and chose django-easy-audit for a few reasons. Here are some pros and cons of the 2 libraries.

django-easy-audit

Pros:

  • Very simple to install in the project

  • Adds only 3 tables to the database

  • Requires no changes to the models or the project

Cons:

  • It saves the data of all models in the same table, which makes the number of rows grow fast (in proportion to the number of models being monitored)

  • Provides no functions or utilities to browse an object's history easily

  • Provides only a very simple admin integration: a regular list of events containing JSON objects that the person using the admin has to process

django-simple-history

Pros:

  • Provides a simple API to navigate the model's history via .history.all()

  • Avoids storing too much data in the same table

  • Provides a good admin integration to navigate the object's history

Cons:

  • Creates a copy of each table, which can almost double the number of tables if you track all the data

  • In some cases migrations were not applied, or not applied correctly; the package seems to be the cause, since this stopped happening when we removed it from the project.

What I think

I don't like the idea of having too many tables in a database, so most of the time I will go with django-easy-audit. But I also think audits (a model's history) don't need to be stored in the database: since they are rarely used most of the time, it makes more sense to me to store them in another system such as a file, object storage or a stream processing system, just like you would do with logs and log files.

One improvement I think these packages could make is to send the generated data to an external system, such as a data stream, so that the database does not fill up with audit data and the data is retrieved only on demand.

Edit: after a comment by Tom Dyson, it turns out there is a way to achieve this in django-easy-audit: use the DJANGO_EASY_AUDIT_LOGGING_BACKEND setting and implement a class whose methods send the data to another logging system, as in this example:

  import logging

  class PythonLoggerBackend:
      logging.basicConfig()
      logger = logging.getLogger('your-kibana-logger')
      logger.setLevel(logging.DEBUG)

      def request(self, request_info):
          return request_info # if you don't need it

      def login(self, login_info):
          self.logger.info(msg='your message', extra=login_info)
          return login_info

      def crud(self, crud_info):
          self.logger.info(msg='your message', extra=crud_info)
          return crud_info

Learn more here: https://github.com/soynatan/django-easy-audit#settings
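
Following the same three-method shape as the example above, a backend that writes to a file instead of a logger could look like the sketch below. JsonLinesBackend, the audit.jsonl path and the dotted settings path are all hypothetical, and the sketch assumes each payload is a dict, as the extra= usage above suggests. Check the package documentation for the exact contract before relying on it.

```python
import json


class JsonLinesBackend:
    """Hypothetical backend: appends one JSON object per event to a file."""

    path = "audit.jsonl"  # assumed location; adjust to your deployment

    def _write(self, kind, info):
        # default=str keeps the sketch from failing on dates or other
        # values that json cannot serialize natively.
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps({"kind": kind, **info}, default=str) + "\n")
        return info

    def request(self, request_info):
        return self._write("request", request_info)

    def login(self, login_info):
        return self._write("login", login_info)

    def crud(self, crud_info):
        return self._write("crud", crud_info)


# settings.py would then point at it (the dotted path is an assumption):
# DJANGO_EASY_AUDIT_LOGGING_BACKEND = "myproject.audit_backends.JsonLinesBackend"
```

A file of JSON lines is easy to ship to object storage or feed into a stream later, which is the kind of setup described in the previous section.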

Resources

https://github.com/soynatan/django-easy-audit

https://django-simple-history.readthedocs.io/en/latest/index.html