How We Migrated a Joomla-Based Website to a Django Back-end with 90 GB of Data

Photo by Susan Q Yin on Unsplash

Our process for redesigning a Joomla site as a Django API back-end and migrating its database and files into it.

Context

A few years ago, while I was still running my Django consultancy agency, we worked with a client who had built a website over the years with Joomla, a Content Management System (CMS). He was running into several problems: adding new features was difficult, managing the content and editing the various web pages was a hassle and, most importantly, he could not receive payments in local currencies (XAF, XOF) through local payment processors (Mobile Money) in addition to PayPal. The installation was a large collection of plugins and patches applied over the years, with all the incompatibilities that come with that. It ran on a MySQL server, and everything was installed on a VPS, which also hosted all the uploaded files.

The website is an educational platform where visitors can create an account and access content (books, software, videos, etc.). The owner wanted more features, such as audio playback of PDF books, an improved full-text search engine capable of indexing the content of every PDF book uploaded to the platform, monthly subscriptions to access the content, and various others. We followed a series of steps to build a new back-end with Django and migrate the data (database and files) into it. We also chose to host it on Amazon Web Services. A separate team was working on a React-based front end, so we also had to produce many REST endpoints for it to consume.

Designing and Building the Django Back-end

The first step was to design a Django system that could accept the old data from Joomla and let us develop new features easily. We built the design around OOP concepts such as inheritance: many models (video, document, etc.) inherit from a main model that holds most of the attributes shared by all the content on the site. We then added more models for the new features (bookmarks, likes, billing plans, subscriptions, payment transactions, etc.).

from django.db import models


class Resource(models.Model):
    # Attributes shared by every type of content
    pass


class Video(Resource):
    # Video-specific attributes and methods
    pass


class Document(Resource):
    # Document-specific attributes and methods
    pass


class Software(Resource):
    # Software-specific attributes and methods
    pass


# ... more models

We also exported the existing data to CSV to identify every attribute each type of resource currently had on the platform, and we used those attributes as the basis for our models.
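As a rough illustration of that step, here is a minimal sketch of how CSV headers can be compared to decide which columns belong on the shared parent model and which belong on each child model. The column names and the `split_columns` helper are hypothetical; the real exports are not shown in this article.

```python
import csv
import io


def read_header(csv_text):
    """Return the column names found on the first row of a CSV export."""
    reader = csv.reader(io.StringIO(csv_text))
    return [name.strip() for name in next(reader, [])]


def split_columns(exports):
    """Split columns into those shared by every export and those specific to one.

    `exports` maps a resource type (e.g. "video") to the CSV text of its export.
    """
    headers = {kind: set(read_header(text)) for kind, text in exports.items()}
    shared = set.intersection(*headers.values()) if headers else set()
    specific = {kind: sorted(cols - shared) for kind, cols in headers.items()}
    return sorted(shared), specific


# Hypothetical exports, for illustration only.
exports = {
    "video": "id,title,created,duration\n1,Intro,2015-01-01,120\n",
    "document": "id,title,created,pages\n2,Guide,2016-03-04,48\n",
}
shared, specific = split_columns(exports)
print(shared)    # candidates for the parent model
print(specific)  # candidates for each child model
```

The shared columns are natural candidates for the parent model, and whatever is left over goes on the child that owns it.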

With the models in place, we relied heavily on the Django admin to avoid building a new management UI from scratch. We wrote ModelAdmin classes that shared behavior through inheritance, along with optimized querysets to fetch data efficiently inside the admin.

Each PDF document was attached to a model and needed to be indexed by the full-text search engine. We chose Elasticsearch and used Django's post_save signal (acting only on newly created objects) to process each uploaded PDF: the text is extracted with the PyPDF library and then sent to Elasticsearch, alongside the serialized object, as a simple Elasticsearch document.
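The indexing flow can be sketched roughly as follows. This is not our production code: the function names, the `path` attribute and the document shape are assumptions made for illustration. The PDF reader and the Elasticsearch client are passed in as plain callables, so the sketch runs without either library installed.

```python
from types import SimpleNamespace


def build_search_document(pk, title, page_texts):
    """Merge per-page text into one flat document ready for indexing."""
    pages = [text.strip() for text in page_texts if text and text.strip()]
    return {"id": pk, "title": title, "content": "\n".join(pages)}


def index_new_document(instance, created, extract_pages, send_to_index):
    """Body of a signal receiver: index only newly created documents.

    extract_pages(path) returns a list of page strings. With pypdf this
    could be [page.extract_text() for page in PdfReader(path).pages].
    send_to_index(doc) pushes the dict to the search cluster.
    """
    if not created:
        return None  # updates are ignored in this sketch
    doc = build_search_document(
        instance.pk, instance.title, extract_pages(instance.path)
    )
    send_to_index(doc)
    return doc


# Stand-ins for a saved model instance, the PDF reader and the search client.
sent = []
book = SimpleNamespace(pk=7, title="Guide", path="books/guide.pdf")
index_new_document(book, True, lambda path: ["Page one ", "", "Page two"], sent.append)
print(sent)
```

In a Django project the function would be called from a receiver connected to the signal for the document model, forwarding the `instance` and `created` arguments Django passes in.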

Database Data migration

Once we had settled how the data would be stored in the database and indexed, it was time to put tools in place to migrate the data from the old system and make sure it fit. Since we were able to extract the data as CSV, we used django-import-export in the admin UI to import each model's data from its CSV file.

We also wrote Django management commands to keep everything in sync with our settings, such as the static URL configuration. For example, some file paths were absolute, so we needed a script to normalize them. Other CSV files were not directly usable, so we wrote commands to transform their rows into proper objects and rebuild all the relationships.

We also needed to import the users. That import ran as a Django command that loads the CSV, creates each user and sets an unusable password, so users were prompted to set a new password the first time they logged in to the new site.
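The clean-up logic inside such commands can be sketched in plain Python. The old media root, the CSV column names and the choice to skip duplicate emails are all assumptions for illustration, not the exact rules we applied.

```python
import csv
import io
import posixpath


def normalize_path(raw_path, old_root):
    """Turn an absolute server path into a storage-relative one."""
    path = posixpath.normpath(raw_path.strip().replace("\\", "/"))
    root = posixpath.normpath(old_root)
    if path.startswith(root + "/"):
        path = path[len(root) + 1:]
    return path.lstrip("/")


def read_users(csv_text):
    """Yield cleaned user rows, skipping blank and duplicate emails."""
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        email = (row.get("email") or "").strip().lower()
        if not email or email in seen:
            continue
        seen.add(email)
        username = (row.get("username") or "").strip() or email
        # In the Django command, each row would become a User on which
        # set_unusable_password() is called before save().
        yield {"username": username, "email": email}


OLD_ROOT = "/var/www/joomla/media"  # hypothetical path on the old VPS
print(normalize_path("/var/www/joomla/media/books//a.pdf", OLD_ROOT))
print(list(read_users("username,email\nalice,Alice@Example.com\nbob,\n")))
```

Keeping these helpers free of Django imports makes them easy to test before running the import against the real database.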

Uploaded File Migration to AWS S3

At this point the data was in the database along with the file paths. We then had to copy the files from the VPS to an S3 bucket, in a few steps:

  1. Install the AWS CLI on the server.

  2. Start a copy with the aws s3 cp command to upload the files from the server to S3.

  3. Fix any missing file paths, either in the S3 bucket or in the database.
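The last step comes down to comparing the paths stored in the database with the keys actually present in the bucket. Here is a minimal sketch of that comparison; the `media` prefix and both helper names are hypothetical, and the list of bucket keys is assumed to have been fetched beforehand (for example by listing the bucket with boto3).

```python
def to_s3_key(file_path, prefix="media"):
    """Map a storage-relative file path to the key expected in the bucket."""
    relative = file_path.lstrip("/")
    return f"{prefix}/{relative}" if prefix else relative


def find_missing(db_paths, bucket_keys, prefix="media"):
    """Return the database paths whose expected key is absent from the bucket."""
    present = set(bucket_keys)
    return sorted(p for p in db_paths if to_s3_key(p, prefix) not in present)


# Hypothetical data: two paths in the database, one object in the bucket.
db_paths = ["books/a.pdf", "videos/b.mp4"]
bucket_keys = ["media/books/a.pdf"]
print(find_missing(db_paths, bucket_keys))
```

Each path reported this way either needs to be uploaded again or corrected in the database.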

Once the files were on S3 we could do whatever we wanted with them, and we even added a CloudFront distribution to serve them efficiently.

From a security standpoint, all the files are private, so we added functions that generate a pre-signed URL whenever a file is requested.
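A helper of that kind might look like the sketch below. It assumes a boto3 S3 client, whose `generate_presigned_url` method takes these arguments. The helper name, bucket name and expiry are illustrative. The `FakeS3Client` only mimics the method signature so the sketch runs without AWS credentials; the URL it returns is not a real signature.

```python
def presigned_download_url(s3_client, bucket, key, expires_in=300):
    """Ask S3 for a time-limited GET URL for a private object."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


class FakeS3Client:
    """Stand-in with the same method signature, for local runs only."""

    def generate_presigned_url(self, ClientMethod, Params=None, ExpiresIn=3600):
        host = f"{Params['Bucket']}.s3.amazonaws.com"
        return f"https://{host}/{Params['Key']}?X-Amz-Expires={ExpiresIn}"


# With real credentials, s3_client would be boto3.client("s3") instead.
url = presigned_download_url(FakeS3Client(), "my-bucket", "media/books/a.pdf", 60)
print(url)
```

A short expiry limits how long a leaked link stays usable, at the cost of generating a fresh URL for each request.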

Links & Resources

Here is a short list of links to the various tools we used in the process.

Optimizing Django Admin Queries

How to transfer data from a server to S3

Use Elasticsearch with Django

Django Import Export Documentation


Thank You for reading ;)