Showing content from https://realpython.com/modeling-polymorphism-django-python/ below:

Modeling Polymorphism in Django With Python – Real Python

Modeling polymorphism in relational databases is a challenging task. In this article, we present several modeling techniques to represent polymorphic objects in a relational database using the Django object-relational mapping (ORM).

This intermediate-level tutorial is designed for readers who are already familiar with the fundamental design of Django.

What Is Polymorphism?

Polymorphism is the ability of an object to take on many forms. Common examples of polymorphic objects include event streams, different types of users, and products in an e-commerce website. A polymorphic model is used when a single entity requires different functionality or information.

In the examples above, all events are logged for future use, but they can contain different data. All users need to be able to log in, but they might have different profile structures. In every e-commerce website, a user wants to put different products in their shopping cart.

Why Is Modeling Polymorphism Challenging?

There are many ways to model polymorphism. Some approaches use standard features of the Django ORM, and some use special features of the Django ORM. The main challenges you’ll encounter when modeling polymorphic objects are the following:

To truly understand the challenges of modeling polymorphism, you are going to take a small bookstore from its first online website to a big online shop selling all sorts of products. Along the way, you’ll experience and analyze different approaches for modeling polymorphism using the Django ORM.

Note: To follow this tutorial, it is recommended that you use a PostgreSQL backend, Django 2.x, and Python 3.

It’s possible to follow along with other database backends as well. In places where features unique to PostgreSQL are used, an alternative will be presented for other databases.

Naive Implementation

You have a bookstore in a nice part of town right next to a coffee shop, and you want to start selling books online.

You sell only one type of product: books. In your online store, you want to show details about the books, like name and price. You want your users to browse around the website and collect many books, so you also need a cart. You eventually need to ship the books to the user, so you need to know the weight of each book to calculate the delivery fee.

Let’s create a simple model for your new bookstore:

To create a new book, you provide a name, price, and weight:

To create a cart, you first need to associate it with a user:

Then the user can start adding items to it:

Pro

Con

Sparse Model

With the success of your online bookstore, users started to ask if you also sell e-books. E-books are a great product for your online store, and you want to start selling them right away.

A physical book is different from an e-book:

To make your existing model support the additional information for selling e-books, you add some fields to the existing Book model:

First, you added a type field to indicate what type of book it is. Then, you added a URL field to store the download link of the e-book.

To add a physical book to your bookstore, do the following:

To add a new e-book, you do the following:

Your users can now add both books and e-books to the cart:

The virtual books are a big hit, and you decide to hire employees. The new employees are apparently not so tech savvy, and you start seeing weird things in the database:

That book apparently weighs 0 pounds and has a download link.

This e-book apparently weighs 100g and has no download link:

This doesn’t make any sense. You have a data integrity problem.

To overcome integrity problems, you add validations to the model:

You used Django’s built-in validation mechanism to enforce data integrity rules. clean() is only called automatically by Django forms. For objects that are not created by a Django form, you need to make sure to explicitly validate the object.

To keep the integrity of the Book model intact, you need to make a little change to the way you create books:

When creating objects using the default manager (Book.objects.create(...)), Django will create an object and immediately persist it to the database.

In your case, you want to validate the object before saving it to the database. You first create the object (Book(...)), validate it (book.full_clean()), and only then save it (book.save()).
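The create, validate, then save sequence can be sketched without Django. This is a minimal, framework-free illustration, not the article's model: the ValidationError class is a stand-in for django.core.exceptions.ValidationError, the "physical" and "virtual" type values and the create_book() helper are hypothetical, and a plain list plays the role of the database.

```python
from dataclasses import dataclass
from typing import List, Optional


class ValidationError(ValueError):
    """Stand-in for django.core.exceptions.ValidationError (assumption)."""


@dataclass
class Book:
    type: str  # hypothetical values: "physical" or "virtual"
    name: str
    price: int
    weight: int = 0
    download_link: Optional[str] = None

    def clean(self) -> None:
        """Reject field combinations that make no sense for the book type."""
        if self.type == "virtual":
            if self.weight != 0:
                raise ValidationError("A virtual book cannot have a weight.")
            if not self.download_link:
                raise ValidationError("A virtual book must have a download link.")
        elif self.type == "physical":
            if self.weight <= 0:
                raise ValidationError("A physical book must have a positive weight.")
            if self.download_link:
                raise ValidationError("A physical book cannot have a download link.")
        else:
            raise ValidationError(f"Unknown book type: {self.type!r}")


def create_book(saved: List[Book], **fields) -> Book:
    book = Book(**fields)  # build the object in memory only
    book.clean()           # validate first, mirroring book.full_clean()
    saved.append(book)     # "save" only once validation has passed
    return book
```

In a real Django model, the same rules would live in the model's clean() method, and full_clean() would also run the per-field validators before save() is called.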

Denormalization:

A sparse model is a product of denormalization. In a denormalization process, you inline attributes from multiple normalized models into a single table for better performance. A denormalized table will usually have a lot of nullable columns.

Denormalizing is often used in decision support systems such as data warehouses where read performance is most important. Unlike OLTP systems, data warehouses are usually not required to enforce data integrity rules, which makes denormalization ideal.

Pro

Cons

Use Case

The sparse model is ideal when you’re representing heterogeneous objects that share most attributes, and when new items are not added very often.

Semi-Structured Model

Your bookstore is now a huge success, and you are selling more and more books. You have books from different genres and publishers, e-books with different formats, books with odd shapes and sizes, and so on.

In the sparse model approach, you added fields for every new type of product. The model now has a lot of nullable fields, and new developers and employees are having trouble keeping up.

To address the clutter, you decide to keep only the common fields (name and price) on the model. You store the rest of the fields in a single JSONField:

JSONField:

In this example, you use PostgreSQL as a database backend. Django provides a built-in JSON field for PostgreSQL in django.contrib.postgres.fields.

For other databases, such as SQLite and MySQL, there are packages that provide similar functionality.

Your Book model is now clutter-free. Common attributes are modeled as fields. Attributes that are not common to all types of products are stored in the extra JSON field:

Clearing up the clutter is important, but it comes with a cost. The validation logic is a lot more complicated:

The benefit of using a proper field is that it validates the type. Both Django and the Django ORM can perform checks to make sure the right type is used for the field. When using a JSONField, you need to validate both the type and the value:
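To make the extra burden concrete, here is a minimal sketch of validating both the type and the value of entries in the extra dictionary. The function name validate_extra, the "physical" and "virtual" type values, and the weight and download_link keys are assumptions for illustration; in a Django model this logic would typically sit in clean() and raise ValidationError.

```python
from typing import Any, Dict


def validate_extra(book_type: str, extra: Dict[str, Any]) -> None:
    """Check the type AND the value of each type-specific attribute."""
    if book_type == "physical":
        weight = extra.get("weight")
        # bool is a subclass of int, so it has to be excluded explicitly.
        if isinstance(weight, bool) or not isinstance(weight, int):
            raise TypeError("weight must be an integer")
        if weight <= 0:
            raise ValueError("weight must be positive")
    elif book_type == "virtual":
        link = extra.get("download_link")
        if not isinstance(link, str):
            raise TypeError("download_link must be a string")
        if not link:
            raise ValueError("download_link must not be empty")
    else:
        raise ValueError(f"unknown book type: {book_type!r}")
```

With a regular IntegerField, the type check would come for free. Here, a value such as "200" (a string) passes through JSON serialization unnoticed unless the application rejects it.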

Another issue with using JSON is that not all databases have proper support for querying and indexing values in JSON fields.

In PostgreSQL for example, you can query all the books that weigh more than 100:

However, not all database vendors support that.

Another restriction imposed when using JSON is that you are unable to use database constraints such as not null, unique, and foreign keys. You will have to implement these constraints in the application.

This semi-structured approach resembles NoSQL architecture and has many of its advantages and disadvantages. The JSON field is a way to get around the strict schema of a relational database. This hybrid approach provides us with the flexibility to squash many object types into a single table while still maintaining some of the benefits of a relational, strictly and strongly typed database. For many common NoSQL use cases, this approach might actually be more suitable.

Pros

Cons

Use Case

A semi-structured model is ideal when you need to represent heterogeneous objects that don’t share many common attributes, and when new items are added often.

A classic use case for the semi-structured approach is storing events (like logs, analytics, and event stores). Most events have a timestamp, type and metadata like device, user agent, user, and so on. The data for each type is stored in a JSON field. For analytics and log events, it’s important to be able to add new types of events with minimal effort, so this approach is ideal.

Abstract Base Model

So far, you’ve worked around the problem of actually treating your products as heterogeneous. You worked under the assumption that the differences between the products are minimal, so it made sense to maintain them in the same model. This assumption can take you only so far.

Your little store is growing fast, and you want to start selling entirely different types of products, such as e-readers, pens, and notebooks.

A book and an e-book are both products. A product is defined using common attributes such as name and price. In an object-oriented environment, you could look at a Product as a base class or an interface. Every new type of product you add must implement the Product class and extend it with its own attributes.

Django offers the ability to create abstract base classes. Let’s define a Product abstract base class and add two models for Book and EBook:

Notice that both Book and EBook inherit from Product. The fields defined in the base class Product are inherited, so the derived models Book and EBook don’t need to repeat them.

To add new products, you use the derived classes:

You might have noticed that the Cart model is missing. You can try to create a Cart model with a ManyToMany field to Product:

If you try to reference a ManyToMany field to an abstract model, you will get the following error:

A foreign key constraint can only point to a concrete table. The abstract base model Product only exists in the code, so there is no products table in the database. The Django ORM will only create tables for the derived models Book and EBook.

Given that you can’t reference the abstract base class Product, you need to reference books and e-books directly:

You can now add both books and e-books to the cart:

This model is a bit more complicated now. Let’s query the total price of the items in the cart:

Because you have more than one type of book, you use Coalesce to fetch either the price of the book or the price of the e-book for each row.
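The effect of Coalesce can be illustrated in plain SQL using the standard library's sqlite3 module. This is a sketch under simplifying assumptions rather than the query Django generates: the table names are made up, and a single hypothetical link table (cart_item) is used in which every row references either a book or an e-book, never both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book  (id INTEGER PRIMARY KEY, name TEXT, price INTEGER, weight INTEGER);
    CREATE TABLE ebook (id INTEGER PRIMARY KEY, name TEXT, price INTEGER, download_link TEXT);
    -- Hypothetical link table: each row points at a book OR an e-book.
    CREATE TABLE cart_item (
        id       INTEGER PRIMARY KEY,
        cart_id  INTEGER NOT NULL,
        book_id  INTEGER REFERENCES book (id),
        ebook_id INTEGER REFERENCES ebook (id)
    );
    INSERT INTO book  VALUES (1, 'Python Tricks', 1000, 200);
    INSERT INTO ebook VALUES (1, 'Python Tricks (e-book)', 1500, 'https://example.com/download');
    INSERT INTO cart_item (cart_id, book_id, ebook_id) VALUES (1, 1, NULL);
    INSERT INTO cart_item (cart_id, book_id, ebook_id) VALUES (1, NULL, 1);
""")

# COALESCE returns the first non-NULL argument, so each row contributes
# the price of whichever product type it actually references.
(total_price,) = conn.execute(
    """
    SELECT SUM(COALESCE(book.price, ebook.price))
    FROM cart_item
    LEFT JOIN book  ON book.id  = cart_item.book_id
    LEFT JOIN ebook ON ebook.id = cart_item.ebook_id
    WHERE cart_item.cart_id = ?
    """,
    (1,),
).fetchone()
print(total_price)
```

Every new product type adds another nullable column, another join, and another argument to COALESCE, which is why this approach becomes harder to maintain as the catalog grows.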

Pro

Cons

Use Case

An abstract base model is a good choice when there are very few types of objects that require very distinct logic.

An intuitive example is modeling a payment process for your online shop. You want to accept payments with credit cards, PayPal, and store credit. Each payment method goes through a very different process that requires very distinct logic. Adding a new type of payment is not very common, and you don’t plan on adding new payment methods in the near future.

You create a payment process base class with derived classes for credit card payment process, PayPal payment process, and store credit payment process. For each of the derived classes, you implement the payment process in a very different way that cannot be easily shared. In this case, it might make sense to handle each payment process specifically.

Concrete Base Model

Django offers another way to implement inheritance in models. Instead of using an abstract base class that only exists in the code, you can make the base class concrete. “Concrete” means that the base class exists in the database as a table, unlike in the abstract base class solution, where the base class only exists in the code.

Using the abstract base model, you were unable to reference multiple types of products. You were forced to create a many-to-many relation for each type of product. This made it harder to perform tasks on the common fields such as getting the total price of all the items in the cart.

Using a concrete base class, Django will create a table in the database for the Product model. The Product model will have all the common fields you defined in the base model. Derived models such as Book and EBook will reference the Product table using a one-to-one field. To reference a product, you create a foreign key to the base model:

The only difference between this example and the previous one is that the Product model is not defined with abstract=True.

To create new products, you use derived Book and EBook models directly:

In the case of a concrete base class, it’s interesting to see what’s happening in the underlying database. Let’s look at the tables created by Django in the database:

The product table has two familiar fields: name and price. These are the common fields you defined in the Product model. Django also created an ID primary key for you.

In the constraints section, you see multiple tables that are referencing the product table. Two tables that stand out are concrete_base_model_book and concrete_base_model_ebook:

The Book model has only two fields:

Behind the scenes, Django created a base table for product. Then, for each derived model, Django created another table that includes the additional fields, and a field that acts both as a primary key and a foreign key to the product table.

Let’s take a look at a query generated by Django to fetch a single book. Here are the results of print(Book.objects.filter(pk=1).query):

To fetch a single book, Django joined concrete_base_model_product and concrete_base_model_book on the product_ptr_id field. The name and price are in the product table and the weight is in the book table.
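The table layout and the join described above can be reproduced in miniature with sqlite3. This is a hand-written sketch, not Django's generated DDL or SQL: the table names are shortened to product and book for readability, and only the product_ptr_id column name is taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base table: holds the fields common to every product.
    CREATE TABLE product (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price INTEGER NOT NULL
    );
    -- Derived table: its primary key doubles as a foreign key to product.
    CREATE TABLE book (
        product_ptr_id INTEGER PRIMARY KEY REFERENCES product (id),
        weight         INTEGER NOT NULL
    );
    INSERT INTO product VALUES (1, 'Python Tricks', 1000);
    INSERT INTO book    VALUES (1, 200);
""")

# Fetching one complete book always requires joining both tables.
row = conn.execute(
    """
    SELECT product.id, product.name, product.price, book.weight
    FROM book
    INNER JOIN product ON book.product_ptr_id = product.id
    WHERE book.product_ptr_id = ?
    """,
    (1,),
).fetchone()
print(row)
```

The join is the price paid for the shared base table: every read of a derived model touches two tables, and every insert writes two rows.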

Since all the products are managed in the Product table, you can now reference it in a foreign key from the Cart model:

Adding items to the cart is the same as before:

Working with common fields is also simple:
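As a rough illustration of why common fields are easy to work with here, the sketch below computes a cart total from the base table alone. It uses sqlite3 with hypothetical table names (product, cart_items) and is not the SQL that Django emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price INTEGER NOT NULL
    );
    -- Hypothetical many-to-many link table between carts and products.
    CREATE TABLE cart_items (
        cart_id    INTEGER NOT NULL,
        product_id INTEGER NOT NULL REFERENCES product (id)
    );
    INSERT INTO product VALUES (1, 'Python Tricks', 1000);
    INSERT INTO product VALUES (2, 'Python Tricks (e-book)', 1500);
    INSERT INTO cart_items VALUES (1, 1);
    INSERT INTO cart_items VALUES (1, 2);
""")

# The price lives on the base table, so the derived book and e-book
# tables never need to be joined to compute the total.
(total_price,) = conn.execute(
    """
    SELECT SUM(product.price)
    FROM cart_items
    INNER JOIN product ON product.id = cart_items.product_id
    WHERE cart_items.cart_id = ?
    """,
    (1,),
).fetchone()
print(total_price)
```

Compared with the abstract base model, no COALESCE and no per-type join is needed, regardless of how many product types exist.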

Migrating base classes in Django:

When a derived model is created, Django adds a bases attribute to the migration:

If in the future you remove or change the base class, Django might not be able to perform the migration automatically. You might get this error:

This is a known issue in Django (#23818, #23521, #26488). To work around it, you must edit the original migration manually and adjust the bases attribute.

Pros

Cons

Use Case

The concrete base model approach is useful when common fields in the base class are sufficient to satisfy most common queries.

For example, if you often need to query for the cart total price, show a list of items in the cart, or run ad hoc analytic queries on the cart model, you can benefit from having all the common attributes in a single database table.

Generic Foreign Key

Inheritance can sometimes be a nasty business. It forces you to create (possibly premature) abstractions, and it doesn’t always fit nicely into the ORM.

The main problem you have is referencing different products from the cart model. You first tried to squash all the product types into one model (sparse model, semi-structured model), and you got clutter. Then you tried splitting products into separate models and providing a unified interface using a concrete base model. You got a complicated schema and a lot of joins.

Django offers a special way of referencing any model in the project called GenericForeignKey. Generic foreign keys are part of the content types framework built into Django. The content types framework is used by Django itself to keep track of models. This is necessary for some core capabilities such as migrations and permissions.

To better understand what content types are and how they facilitate generic foreign keys, let’s look at the content type related to the Book model:

Each model has a unique identifier. If you want to reference a book with PK 54, you can say, “Get object with PK 54 in the model represented by content type 22.”
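The idea of addressing an object by a (content type, primary key) pair can be sketched with plain dictionaries. This is a conceptual illustration only, not how Django's ContentType model is implemented: the registry, the table stand-ins, the get_object() helper, and content type 23 are all made up, while content type 22 and PK 54 come from the example above.

```python
from typing import Any, Dict

# Content type id -> model name. 22 comes from the text; 23 is made up.
CONTENT_TYPES: Dict[int, str] = {22: "book", 23: "ebook"}

# Model name -> rows keyed by primary key (a stand-in for database tables).
# Both tables contain PK 54 on purpose: a primary key alone is ambiguous.
TABLES: Dict[str, Dict[int, Dict[str, Any]]] = {
    "book": {54: {"name": "Python Tricks", "price": 1000}},
    "ebook": {54: {"name": "Python Tricks (e-book)", "price": 1500}},
}


def get_object(content_type_id: int, object_id: int) -> Dict[str, Any]:
    """Resolve a generic reference: pick the table first, then the row."""
    model_name = CONTENT_TYPES[content_type_id]
    return TABLES[model_name][object_id]


print(get_object(22, 54)["name"])
```

Because the target table is only known after reading the content type value, the database cannot enforce this reference with a regular foreign key constraint.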

GenericForeignKey is implemented exactly like that. To create a generic foreign key, you define two fields:

To implement a many-to-many relation using GenericForeignKey, you need to manually create a model to connect carts with items.

The Cart model remains roughly similar to what you have seen so far:

Unlike previous Cart models, this Cart no longer includes a ManyToMany field. You are going to need to do that yourself.

To represent a single item in the cart, you need to reference both the cart and any product:

To add a new item in the Cart, you provide the content type and the primary key:

Adding an item to a cart is a common task. You can add a method on the cart to add any product to the cart:

Adding a new item to a cart is now much shorter:

Getting information about the items in the cart is also possible:

So far so good. Where’s the catch?

Let’s try to calculate the total price of the products in the cart:

Django tells us it isn’t possible to traverse the generic relation from the generic model to the referenced model. This is because Django has no way of knowing which table to join to. Remember, the Item model can point to any ContentType.

The error message does mention a GenericRelation. Using a GenericRelation, you can define a reverse relation from the referenced model to the Item model. For example, you can define a reverse relation from the Book model to items of books:

Using the reverse relation, you can answer questions like how many carts include a specific book:

The two statements are identical.

You still need to know the price of the entire cart. You already saw that fetching the price from each product table is impossible using the ORM. To do that, you have to iterate the items, fetch each item separately, and aggregate:
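The cost of this iterate-and-fetch pattern can be sketched with sqlite3. The schema is hypothetical: the item table stores the target table name directly in a content_type column instead of referencing Django's content types table, and the cart_total() helper is an illustration rather than the article's code.

```python
import sqlite3
from typing import Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book  (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE ebook (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE item (
        id           INTEGER PRIMARY KEY,
        cart_id      INTEGER NOT NULL,
        content_type TEXT NOT NULL,    -- which table the row points at
        object_id    INTEGER NOT NULL  -- primary key inside that table
    );
    INSERT INTO book  VALUES (54, 'Python Tricks', 1000);
    INSERT INTO ebook VALUES (7, 'Python Tricks (e-book)', 1500);
    INSERT INTO item (cart_id, content_type, object_id) VALUES (1, 'book', 54);
    INSERT INTO item (cart_id, content_type, object_id) VALUES (1, 'ebook', 7);
""")

ALLOWED_TABLES = {"book", "ebook"}  # never interpolate unchecked table names


def cart_total(cart_id: int) -> Tuple[int, int]:
    """Return (total price, number of queries executed)."""
    rows = conn.execute(
        "SELECT content_type, object_id FROM item WHERE cart_id = ?", (cart_id,)
    ).fetchall()
    total, queries = 0, 1
    for table, object_id in rows:
        if table not in ALLOWED_TABLES:
            raise ValueError(f"unexpected table: {table!r}")
        # One extra query per item: the table is only known at run time.
        (price,) = conn.execute(
            f"SELECT price FROM {table} WHERE id = ?", (object_id,)
        ).fetchone()
        total += price
        queries += 1
    return total, queries


print(cart_total(1))
```

The number of queries grows with the number of items in the cart, which is the classic N+1 pattern that a single join would normally avoid.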

This is one of the major disadvantages of generic foreign keys. The flexibility comes with a great performance cost. It’s very hard to optimize for performance using just the Django ORM.

Structural Subtyping

In the abstract and concrete base class approaches, you used nominal subtyping, which is based on a class hierarchy. Mypy is able to detect this form of relation between two classes and infer types from it.

In the generic relation approach, you used structural subtyping. Structural subtyping exists when a class implements all the methods and attributes of another class. This form of subtyping is very useful when you wish to avoid direct dependency between modules.

Mypy provides a way to utilize structural subtyping using Protocols.

You already identified a product entity with common methods and attributes. You can define a Protocol:
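A minimal sketch of such a protocol follows, assuming Python 3.8 or later, where Protocol is importable from the standard typing module. The Book and Pen dataclasses and the describe() function are hypothetical stand-ins for Django models and are used only to show which objects satisfy the protocol. The runtime_checkable decorator is added here so the structural check can also be demonstrated at run time; a static checker such as Mypy does not need it.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class Product(Protocol):
    """Anything with these attributes counts as a product."""

    pk: int
    name: str
    price: int

    def __str__(self) -> str:
        ...


@dataclass
class Book:  # does not inherit from Product
    pk: int
    name: str
    price: int
    weight: int


@dataclass
class Pen:  # lacks pk and price, so it is not a Product
    name: str


def describe(product: Product) -> str:
    return f"{product.name} costs {product.price}"


book = Book(pk=1, name="Python Tricks", price=1000, weight=200)
print(describe(book))
```

Book never mentions Product, yet it satisfies the protocol because it has all the required members. That is the difference between structural and nominal subtyping.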

Note: The annotated class attributes used in this definition rely on the variable annotation syntax introduced in Python 3.6. In earlier versions of Python, it isn’t possible to define a Protocol using this syntax. Class attributes such as pk and name can instead be defined using the @property decorator, but this will not work with Django models.

You can now use the Product protocol to add type information. For example, in add_item(), you accept an instance of a product and add it to the cart:

Running mypy on this function will not yield any warnings. Let’s say you change product.pk to product.id, which is not defined in the Product protocol:

You will get the following warning from Mypy:

Note: Protocol is specified in PEP 544 and has been part of the standard typing module since Python 3.8. On earlier versions of Python, it is available from the complementary typing_extensions package.

Pros

Cons

Use Case

Generic foreign keys are a great choice for pluggable modules or existing projects. The combination of GenericForeignKey and structural subtyping abstracts away any direct dependency between the modules.

In the bookstore example, the book and e-book models can exist in a separate app and new products can be added without changing the cart module. For existing projects, a Cart module can be added with minimal changes to existing code.

The patterns presented in this article play nicely together. Using a mixture of patterns, you can eliminate some of the disadvantages and optimize the schema for your use case.

For example, in the generic foreign key approach, you were unable to get the price of the entire cart quickly. You had to fetch each item separately and aggregate. You can address this specific concern by inlining the price of the product on the Item model (the sparse model approach). This will allow you to query only the Item model to get the total price very quickly.
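The mixed approach can be sketched as follows, again with sqlite3 and a hypothetical schema: the item table keeps its generic reference (content_type, object_id) but also carries a copy of the price, so the total comes from a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id           INTEGER PRIMARY KEY,
        cart_id      INTEGER NOT NULL,
        content_type TEXT NOT NULL,     -- generic reference: target table
        object_id    INTEGER NOT NULL,  -- generic reference: target row
        price        INTEGER NOT NULL   -- copied from the product when added
    );
    INSERT INTO item (cart_id, content_type, object_id, price) VALUES (1, 'book', 54, 1000);
    INSERT INTO item (cart_id, content_type, object_id, price) VALUES (1, 'ebook', 7, 1500);
    INSERT INTO item (cart_id, content_type, object_id, price) VALUES (2, 'book', 54, 1000);
""")

# One query against one table, no matter how many product types exist.
(total_price,) = conn.execute(
    "SELECT COALESCE(SUM(price), 0) FROM item WHERE cart_id = ?", (1,)
).fetchone()
print(total_price)
```

The trade-off is the usual one for denormalized data: the copied price has to be kept in sync by the application if a product's price changes after it was added to a cart, unless keeping the price at the time of adding is the desired behavior.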

Conclusion

In this article, you started with a small town bookstore and grew it to a big e-commerce website. You tackled different types of problems and adjusted your model to accommodate the changes. You learned that problems such as complex code and difficulty adding new programmers to the team are often symptoms of a larger problem. You learned how to identify these problems and solve them.

You now know how to plan and implement a polymorphic model using the Django ORM. You’re familiar with multiple approaches, and you understand their pros and cons. You’re able to analyze your use case and decide on the best course of action.

