Data migrations are a convenient way to change the data in the database alongside changes to the schema. They work like regular schema migrations: Django keeps track of their dependencies, their order of execution, and whether a given data migration has already been applied.
A common use case for data migrations is introducing new non-nullable fields. Another is adding a field that stores a cached count of something: we create the new field and then populate it with the initial count.
In this post we are going to explore a simple example that you can very easily extend and modify for your needs.
Data Migrations
Let's suppose we have an app named blog, which is listed in our project's INSTALLED_APPS.
The blog app has the following model definition:
blog/models.py
from django.db import models


class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()

    def __str__(self):
        return self.title
The application is already using this Post model; it's already in production and there is plenty of data stored in the database.
id | title | date | content
1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […]
2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […]
3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […]
4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […]
Now let's say we want to introduce a new field named slug, which will be used to compose the new URLs of the blog. The slug field must be unique and not null.
Generally speaking, always add new fields either as null=True or with a default value. If we can't solve the problem with the default parameter, first create the field as null=True, then create a data migration for it. After that, we can create a new migration to set the field as null=False.
Here is how we can do it:
blog/models.py
from django.db import models


class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()
    slug = models.SlugField(null=True)

    def __str__(self):
        return self.title
Create the migration:
python manage.py makemigrations blog
Migrations for 'blog':
blog/migrations/0002_post_slug.py
- Add field slug to post
Apply it:
python manage.py migrate blog
Operations to perform:
Apply all migrations: blog
Running migrations:
Applying blog.0002_post_slug... OK
At this point, the database already has the slug column.
id | title | date | content | slug
1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […] | (null)
2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […] | (null)
3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […] | (null)
4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […] | (null)
Create an empty migration with the following command:
python manage.py makemigrations blog --empty
Migrations for 'blog':
blog/migrations/0003_auto_20170926_1105.py
Now open the file 0003_auto_20170926_1105.py, and it should have the following contents:
blog/migrations/0003_auto_20170926_1105.py
# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-09-26 11:05
from __future__ import unicode_literals

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0002_post_slug'),
    ]

    operations = [
    ]
Then, in this file, we can create a function that will be executed by the RunPython operation:
blog/migrations/0003_auto_20170926_1105.py
# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-09-26 11:05
from __future__ import unicode_literals

from django.db import migrations
from django.utils.text import slugify


def slugify_title(apps, schema_editor):
    '''
    We can't import the Post model directly as it may be a newer
    version than this migration expects. We use the historical version.
    '''
    Post = apps.get_model('blog', 'Post')
    for post in Post.objects.all():
        post.slug = slugify(post.title)
        post.save()


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0002_post_slug'),
    ]

    operations = [
        migrations.RunPython(slugify_title),
    ]
In the example above we are using the slugify utility function. It takes a string as a parameter and transforms it into a slug. Here are some examples:
from django.utils.text import slugify
slugify('Hello, World!')
'hello-world'
slugify('How to Extend the Django User Model')
'how-to-extend-the-django-user-model'
The function passed to the RunPython operation must accept two parameters: apps and schema_editor. RunPython supplies both when the migration runs. Also remember to load models with the apps.get_model('app_name', 'model_name') method instead of importing them directly.
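RunPython also accepts a second function, which is run when the migration is unapplied. If it is omitted, as in the file above, the migration is not reversible. Here is a minimal sketch of such a reverse function, assuming that resetting the slugs to NULL is an acceptable way to undo the change. The name clear_slugs is hypothetical and is not part of the original project:

```python
def clear_slugs(apps, schema_editor):
    """Hypothetical reverse step: reset every slug back to NULL."""
    # Look up the historical model, the same way slugify_title does.
    Post = apps.get_model('blog', 'Post')
    # A single UPDATE query; save() methods and signals are not called.
    Post.objects.update(slug=None)


# In blog/migrations/0003_auto_20170926_1105.py the operation would become:
#
#     operations = [
#         migrations.RunPython(slugify_title, clear_slugs),
#     ]
```

If there is nothing meaningful to undo, Django also provides migrations.RunPython.noop, which can be passed as the reverse function instead.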
Save the file and execute the migration as you would do with a regular model migration:
python manage.py migrate blog
Operations to perform:
Apply all migrations: blog
Running migrations:
Applying blog.0003_auto_20170926_1105... OK
Now if we check the database:
id | title | date | content | slug
1 | How to Render Django Form Manually | 2017-09-26 11:01:20.547000 | […] | how-to-render-django-form-manually
2 | How to Use Celery and RabbitMQ with Django | 2017-09-26 11:01:39.251000 | […] | how-to-use-celery-and-rabbitmq-with-django
3 | How to Setup Amazon S3 in a Django Project | 2017-09-26 11:01:49.669000 | […] | how-to-setup-amazon-s3-in-a-django-project
4 | How to Configure Mailgun To Send Emails in a Django Project | 2017-09-26 11:02:00.131000 | […] | how-to-configure-mailgun-to-send-emails-in-a-django-project
Every Post entry has a value, so we can safely switch the field from null=True to null=False. And since all the values are unique, we can also add the unique=True flag.
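With four rows it is easy to see that the slugs are unique, but on a real dataset it is worth checking before adding the constraint. Here is a small sketch of such a check. The helper name duplicate_slugs is made up for this example, and the shell usage in the comment assumes the blog app from this post:

```python
from collections import Counter


def duplicate_slugs(slugs):
    """Return the slugs that occur more than once, mapped to their counts."""
    counts = Counter(slugs)
    return {slug: n for slug, n in counts.items() if n > 1}


# In `python manage.py shell` the values could come from the database:
#
#     from blog.models import Post
#     duplicate_slugs(Post.objects.values_list('slug', flat=True))
#
# An empty dict means no two posts share a slug.
```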
Change the model:
blog/models.py
from django.db import models


class Post(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateTimeField(auto_now_add=True)
    content = models.TextField()
    slug = models.SlugField(null=False, unique=True)

    def __str__(self):
        return self.title
Create a new migration:
python manage.py makemigrations blog
This time you will see the following prompt:
You are trying to change the nullable field 'slug' on post to non-nullable without a default; we can't do that
(the database needs something to populate existing rows).
Please select a fix:
1) Provide a one-off default now (will be set on all existing rows with a null value for this column)
2) Ignore for now, and let me handle existing rows with NULL myself (e.g. because you added a RunPython or RunSQL
operation to handle NULL values in a previous data migration)
3) Quit, and let me add a default in models.py
Select an option:
Select option 2 by typing "2" in the terminal.
Migrations for 'blog':
blog/migrations/0004_auto_20170926_1422.py
- Alter field slug on post
Now we can safely apply the migration:
python manage.py migrate blog
Operations to perform:
Apply all migrations: blog
Running migrations:
Applying blog.0004_auto_20170926_1422... OK
Conclusions
Data migrations can be tricky. When creating a data migration for your project, always examine the production data first. The implementation of slugify_title I used in the example is a little naïve, because it could generate duplicate slugs in a large dataset: two posts with the same title would get the same slug. Always test data migrations in a staging environment first, to avoid breaking things in production.
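One possible way around the duplicate problem is to append a numeric suffix whenever a slug has already been used. The sketch below is only an illustration of that idea, not what the example project does, and the unique_slug helper is a name I made up here:

```python
def unique_slug(base, taken):
    """Return base, or base-2, base-3, ... if base is already in taken."""
    candidate = base
    suffix = 2
    while candidate in taken:
        candidate = '{}-{}'.format(base, suffix)
        suffix += 1
    # Remember the result so the next call will not reuse it.
    taken.add(candidate)
    return candidate


# Inside slugify_title, the loop could then look like this:
#
#     taken = set()
#     for post in Post.objects.all():
#         post.slug = unique_slug(slugify(post.title), taken)
#         post.save()
```

Keep in mind that SlugField defaults to max_length=50, so on databases that enforce the column length, long titles may also need a larger max_length or a truncated slug.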
It's also important to do it step by step, so you stay in control of the changes you are introducing. Note that here I created three migration files for a simple data migration.
As you can see, it's fairly easy to create this type of migration. It's also very flexible: you could, for example, load an external text file and insert its data into a new column.
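To give an idea of the external file case, here is a rough sketch. It assumes a hypothetical text file with one "id,slug" pair per line. The file format, the parse_slug_rows helper, and the SLUGS_FILE constant are all assumptions made for this illustration:

```python
import csv
import io


def parse_slug_rows(text):
    """Parse 'id,slug' lines into a {post_id: slug} dict, skipping blank lines."""
    reader = csv.reader(io.StringIO(text))
    return {int(row[0]): row[1] for row in reader if row}


# A RunPython function could then apply the mapping, for example:
#
#     def load_slugs(apps, schema_editor):
#         Post = apps.get_model('blog', 'Post')
#         with open(SLUGS_FILE) as f:  # hypothetical path constant
#             mapping = parse_slug_rows(f.read())
#         for post_id, slug in mapping.items():
#             Post.objects.filter(pk=post_id).update(slug=slug)
```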
The source code used in this blog post is available on GitHub: https://github.com/sibtc/data-migrations-example