Mastering Reporting in Django with PostgreSQL Views.

P

PostgreSQL views are a powerful tool for creating virtual tables that can be used for reporting purposes. This article will explore using PostgreSQL views and Django migrations to create and update views. We will also discuss optimization techniques for improving the performance of views and how to work with complex views that involve joining multiple tables and using subqueries. By the end of this article, you will have a solid understanding of how to leverage PostgreSQL views for reporting purposes in your Django projects.

Introduction to PostgreSQL views

What are the views in PostgreSQL?

Views in PostgreSQL are virtual tables based on a query's result. They allow you to encapsulate complex queries into a single object that can be treated like a table. Views provide a way to simplify the complexity of querying data by abstracting away the underlying query logic.

Views can be used to:

  • Simplify complex queries by breaking them down into smaller, more manageable pieces.
  • Hide sensitive or confidential data by only exposing specific columns or rows.
  • Provide a consistent interface for accessing data, even if the underlying table structure changes.

Views are handy when working with large and complex databases, as they allow you to organize and structure your data to make it easier to operate and understand.

Tip: When creating views, it’s essential to consider the performance implications. Views can introduce additional overhead, so optimizing them for the specific use case is necessary.

Advantages of using views

As a developer, one of the advantages of using views in PostgreSQL is the ability to abstract complex queries into a single, reusable object. This allows for cleaner, more maintainable code and improved performance.

Views also provide a way to simplify data access by presenting a logical representation of the underlying tables. This can be particularly useful when working with large and complex databases, as it allows us to focus on the specific data they need without understanding the underlying database structure. In my experience, these views are heavily used as a source for Microsoft Power BI or AWS QuickSight to visualize the data for dashboards.

Additionally, views can enhance security by limiting the data exposed to users. By granting access to views instead of tables, developers can control what data is visible and ensure that sensitive information is protected.

When working with views, it’s essential to remember that they are virtual tables and do not store any data. Instead, they provide a way to query and manipulate data from one or more tables in a structured and efficient manner.

Creating views in PostgreSQL

Creating views in PostgreSQL is a straightforward process. Views are virtual tables that are based on the result of a query. They allow us to encapsulate complex queries and reuse them like tables. To create a view, we use the CREATE VIEW statement followed by the view name and the query that defines the view.

Here’s an example of creating a view that shows the total sales for each product:

CREATE VIEW product_sales AS
SELECT product_id, SUM(quantity) AS total_sales
FROM sales
GROUP BY product_id;

This view can then be used in other queries or joined with other tables to retrieve the total sales for each product.

Using Django migrations with PostgreSQL views

Setting up Django project with PostgreSQL

Setting up a Django project with PostgreSQL is a straightforward process. Here are the steps to follow:

  1. Install PostgreSQL and create a new database for your project. Or use Docker to setup a PostgreSQL database
  2. Update the DATABASES setting in your Django project’s settings.py file to use PostgreSQL as the database backend.
  3. Install the psycopg2-binary package, which is the PostgreSQL adapter for Python.
  4. Run the Django migrations to create the necessary tables in the database.

My Docker Compose setup for this tutorial:

version: "3"

# external services to connect to
services:

  postgres:
    image: postgres:12
    container_name: tutorial_postgres
    restart: always
    volumes:
      - tutorial-postgres:/var/lib/postgresql/data
    ports:
      - "5439:5432"
    env_file: .env
    healthcheck:
      test: ["CMD-SHELL", "sh -c 'pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}'"]
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  tutorial-postgres:

Django Settings:


DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": os.environ.get("POSTGRES_DB"),
        "USER": os.environ.get("POSTGRES_USER"),
        "PASSWORD": os.environ.get("POSTGRES_PASSWORD"),
        "HOST": os.environ.get("POSTGRES_HOST"),
        "PORT": os.environ.get("POSTGRES_PORT"),
    }
}

For demonstration purposes I’ve created orders app, and added models:

from django.db import models
from django.conf import settings

class Customer(models.Model):
    user = models.OneToOneField(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE
    )
    address = models.TextField()

    def __str__(self):
        return self.user.username

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    product_name = models.CharField(max_length=100)
    quantity = models.IntegerField()
    price = models.DecimalField(max_digits=10, decimal_places=2)
    order_date = models.DateField()

    def __str__(self):
        return f"{self.product_name} - {self.quantity}"

class Invoice(models.Model):
    order = models.OneToOneField(Order, on_delete=models.CASCADE)
    invoice_date = models.DateField()
    total_amount = models.DecimalField(max_digits=10, decimal_places=2)

    def __str__(self):
        return f"Invoice for Order: {self.order.id}"

Once you have completed these steps, your Django project will be set up to work with PostgreSQL. You can now start creating and using PostgreSQL views for reporting purposes.

Creating a migration for a PostgreSQL view

When creating a migration for a view in Django, it is important to understand the steps involved. First, you need to create a new migration file using the makemigrations command. This command will generate a new migration file that includes the necessary SQL statements to create the view. For detailed explanation check the Django’s official documentation.

# command
# python manage.py makemigrations --empty <yourappname> --name <migration_name>

# example
python manage.py makemigrations --empty orders --name raw_sql_view

>> Migrations for 'orders':
>>   orders/migrations/0002_raw_sql_view.py

Next, you can define the view using the migrations.RunSQL operation in the migration file. This operation allows you to execute raw SQL statements to create the view.

Here’s an example:

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('orders', '0001_initial'),
    ]

    operations = [
        migrations.RunSQL(
            'CREATE VIEW my_simple_view AS SELECT * FROM orders_invoice',
            'DROP VIEW my_simple_view'
        ),
    ]

Note that you need to specify both the SQL statement to create the view and the SQL statement to drop the view in case the migration needs to be rolled back.

Once the migration file is created, you can apply it using the migrate command. This will execute the SQL statements in the migration file and create the view in the database.

Screenshot 2024-01-31 at 02.29.52.png

It is important to keep in mind that views are read-only, so you won’t be able to perform any write operations on them. If you need to update the view, you will need to create a new migration file that includes the necessary SQL statements to update the view.

Updating views with migrations

Updating views with migrations is a straightforward process in Django. Once a view has been created, any changes to the underlying tables or columns can be easily reflected in the view using migrations.

python manage.py makemigrations --empty orders --name update_raw_sql_view

>> Migrations for 'orders':
>>   orders/migrations/0003_update_raw_sql_view.py

To update a view with migrations, you can create a new migration file using the makemigrations command. In the migration file, you can use the RunSQL operation to execute the SQL statement that updates the view.

Here’s an example of how to update a view using migrations:

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ("orders", "0002_raw_sql_view"),
    ]

    operations = [
        migrations.RunSQL(
            """
            CREATE OR REPLACE VIEW my_simple_view AS
            SELECT 
                inv.id,
                inv.invoice_date,
                inv.total_amount,
                inv.order_id,
                ord.product_name,
                ord.quantity,
                ord.price,
                ord.order_date,
                usr.first_name,
                usr.last_name
            FROM 
                orders_invoice inv
            INNER JOIN 
                orders_order ord ON inv.order_id = ord.id
            INNER JOIN 
                orders_customer cust ON ord.customer_id = cust.id
            INNER JOIN 
                auth_user usr ON cust.user_id = usr.id;
            """,
            "DROP VIEW my_simple_view",
        ),
    ]

As you can see when we want to have anything meaningful to show, SQL queries became more complex. But what if we could take advantage of Django ORM, to create PostgreSQL views?

Working with complex views in Django migrations

Using Django ORM in migrations

As a developer working with PostgreSQL views in Django migrations, one common scenario is the need to join multiple tables in a view. This allows us to combine data from different tables into a single virtual table that can be queried efficiently.

Create a new migration:

python manage.py makemigrations --empty orders --name django_orm_sql_view

>> Migrations for 'orders':
>>   orders/migrations/0004_django_orm_sql_view.py

Example of a migration with Django ORM:


from django.db import migrations

def generate_complex_qs():
    from orders.models import Invoice
    from django.db.models import F, Value
    from django.db.models.functions import Concat

    qs = Invoice.objects.annotate(
        customer_full_name=Concat(
            F("order__customer__user__first_name"),
            Value(" "),
            F("order__customer__user__last_name"),
        ),
    ).values(
        "id",
        "invoice_date",
        "total_amount",
        "customer_full_name",
        "order__customer__address",
        "order__product_name",
        "order__quantity",
        "order__price",
        "order__order_date",
    )
    return qs

def convert_qs_to_sql(qs):
    from django.db import connections

    cursor = connections["default"].cursor()
    sql_query = cursor.mogrify(*qs.query.sql_with_params()).decode("utf-8", "ignore")
    return sql_query

VIEW_NAME = "invoices_report"
VIEW_SQL = convert_qs_to_sql(generate_complex_qs())
CREATE_POSTGRES_VIEW_SQL = f"""
CREATE OR REPLACE VIEW {VIEW_NAME}
 AS
 {VIEW_SQL}
"""
REVERT_POSTGRES_VIEW_SQL = f"""
DROP VIEW IF EXISTS {VIEW_NAME};
"""

class Migration(migrations.Migration):
    dependencies = [
        ("orders", "0003_update_raw_sql_view"),
    ]

    operations = [migrations.RunSQL(CREATE_POSTGRES_VIEW_SQL, REVERT_POSTGRES_VIEW_SQL)]

Again to create the you should run migrate. Afterwards you can check if view created:

Screenshot 2024-01-31 at 03.39.32.png

Using Django Models to access Postgres Views

Sometimes you’ll need to expose Postgres Views via API or use in your application. Thanks to Django Models you can do that easily, just defining managed=False in your models:

class InvoiceReport(models.Model):
    id = models.BigAutoField(primary_key=True)
    total_amount = models.DecimalField(max_digits=10, decimal_places=2)
    quantity = models.IntegerField()
    product_name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=10, decimal_places=2)
    order_date = models.DateField()
    invoice_date = models.DateField()
    customer_full_name = models.CharField()
    address = models.TextField()

    class Meta:
        managed = False
        db_table = "invoices_report"

    def __str__(self):
        return f"{self.customer_full_name} - {self.id}"

Django will require you to run makemigrations and migrate for your newly created InvoiceReport model, but under the hood it won’t try to re-create table or view:

./manage.py makemigrations

>> Migrations for 'orders':
>>   orders/migrations/0005_invoicereport.py
>>     - Create model InvoiceReport

./manage.py migrate       
>> Operations to perform:
>>   Apply all migrations: admin, auth, contenttypes, orders, sessions
>> Running migrations:
>>   Applying orders.0005_invoicereport... OK

Afterwards you can use InvoiceReport as usual Django Models:

./manage.py shell 
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from orders.models import * 
>>> 
>>> InvoiceReport.objects.all()
<QuerySet [<InvoiceReport: John Doe - 2>, <InvoiceReport: John Doe - 1>]>
>>> 
>>> for report in InvoiceReport.objects.all():
...     print(report.id, report.product_name, report.customer_full_name, report.total_amount)
... 
1 product #2 John Doe 10.10
2 product name #1 John Doe 15.98
>>>

Rolling back view migrations

Rolling back view migrations is a straightforward process in Django. When a view migration is rolled back, the corresponding view is dropped from the database.

To roll back a view migration, you can use the migrate command by passing the number or name of the previous migration. For detailed explanation check Django’s Documentation.

For example:

# python manage.py migrate <yourappname> <migration_name>

python manage.py migrate orders 0003_update_raw_sql_view

It’s important to note that rolling back a view migration will result in the loss of any data stored in the view. If you need to preserve the data in the view, you can create a new migration that recreates the view and then migrate to that new migration.

Tip: Before rolling back a view migration, make sure to backup any important data stored in the view to avoid data loss.

Optimizing PostgreSQL views for reporting purposes

Choosing the correct columns for the view

As a developer, when creating a PostgreSQL view for reporting purposes, it is important to carefully choose the columns that will be included in the view. Selecting the right columns can significantly impact the performance and usability of the view.

One approach is to include only the necessary columns that are relevant to the reporting requirements. This helps to minimize the amount of data retrieved from the database and improves query performance.

Additionally, consider including any calculated or derived columns that are commonly used in reporting. These columns can be pre-computed in the view, reducing the need for complex calculations in the reporting queries.

To summarize, when choosing the columns for a PostgreSQL view, focus on selecting the necessary columns and including any calculated or derived columns that are frequently used in reporting queries.

Here is a bulleted list of points to consider:

  • Include only necessary columns
  • Include calculated or derived columns
  • Minimize data retrieval
  • Optimize query performance
  • Pre-compute commonly used calculations

Tip: By carefully selecting the columns for the view, you can improve the efficiency and effectiveness of your reporting queries.

Choosing Materialized Views over Views

When dealing with large datasets, views can simplify your code, but they don't necessarily save you time during execution because their speed depends on the underlying query. This limitation becomes more evident with expensive queries and extensive datasets, presenting a drawback.

To address this concern for performance, a better alternative could be utilizing materialized views. These views allow you to store the query's results on disk in a temporary table, acting as a cache. Consequently, querying the materialized view becomes much faster.

However, one drawback of materialized views is that they don't automatically update when the data in the base tables changes. For instance, if a customer changes their address in the earlier example, we would see the updated information once we refresh the materialized view. This process involves rerunning the original query and caching the new results. An example demonstrating this situation will be presented in the next section.

Let’s create an example materializer view:

python manage.py makemigrations --empty orders --name django_orm_materialized_view

>> Migrations for 'orders':
>>   orders/migrations/0006_django_orm_materialized_view.py

Migration file:

from django.db import migrations

def generate_complex_qs():
    from orders.models import Invoice
    from django.db.models import F, Value
    from django.db.models.functions import Concat

    qs = Invoice.objects.annotate(
        customer_full_name=Concat(
            F("order__customer__user__first_name"),
            Value(" "),
            F("order__customer__user__last_name"),
        ),
    ).values(
        "id",
        "invoice_date",
        "total_amount",
        "customer_full_name",
        "order__customer__address",
        "order__product_name",
        "order__quantity",
        "order__price",
        "order__order_date",
    )
    return qs

def convert_qs_to_sql(qs):
    from django.db import connections

    cursor = connections["default"].cursor()
    sql_query = cursor.mogrify(*qs.query.sql_with_params()).decode("utf-8", "ignore")
    return sql_query

VIEW_NAME = "invoices_materialized_report"
VIEW_SQL = convert_qs_to_sql(generate_complex_qs())
CREATE_POSTGRES_VIEW_SQL = f"""
CREATE MATERIALIZED VIEW {VIEW_NAME}
 AS
 {VIEW_SQL};
CREATE UNIQUE INDEX ON {VIEW_NAME} (id);
"""
REVERT_POSTGRES_VIEW_SQL = f"""
DROP MATERIALIZED VIEW IF EXISTS {VIEW_NAME};
"""

class Migration(migrations.Migration):
    dependencies = [
        ("orders", "0005_invoicereport"),
    ]

    operations = [migrations.RunSQL(CREATE_POSTGRES_VIEW_SQL, REVERT_POSTGRES_VIEW_SQL)]

As you probably noticed, I created an INDEX on view. It’s required to be able to REFRESH the materialized views.

Result:

Screenshot 2024-01-31 at 04.34.43.png

Refreshing materialized views

Refreshing materialized views is an important step in ensuring that the data in the view is up to date. Materialized views are a great way to improve query performance by precomputing and storing the results of a query. However, the data in a materialized view can become stale over time as the underlying data changes. To refresh a materialized view in PostgreSQL, you can use the REFRESH MATERIALIZED VIEW command.

Here is an example of how to refresh a materialized view:

REFRESH MATERIALIZED VIEW my_materialized_view;

This command will recompute the data in the materialized view based on the underlying tables and update the view with the latest data. It’s important to note that refreshing a materialized view can be an expensive operation, especially if the view contains a large amount of data or complex calculations. Therefore, it’s recommended to schedule regular refreshes based on the frequency of data changes and the performance requirements of your application.

To schedule regular refreshes of a materialized view, you can use the PostgreSQL REFRESH MATERIALIZED VIEW CONCURRENTLY command. This command allows you to refresh the view without locking the view, allowing concurrent reads and writes to the view while the refresh is in progress.

REFRESH MATERIALIZED VIEW CONCURRENTLY my_materialized_view;

By using the CONCURRENTLY option, you can minimize the impact on the performance of your application while keeping the materialized view up to date.

Let’s implement a refresh mechanism into our Django Model, so we can refresh in our app whenever we need a fresh data:

class InvoiceMaterializedReport(models.Model):
    id = models.BigAutoField(primary_key=True)
    total_amount = models.DecimalField(max_digits=10, decimal_places=2)
    quantity = models.IntegerField()
    product_name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=10, decimal_places=2)
    order_date = models.DateField()
    invoice_date = models.DateField()
    customer_full_name = models.CharField()
    address = models.TextField()

    class Meta:
        managed = False
        db_table = "invoices_materialized_report"

    def __str__(self):
        return f"{self.customer_full_name} - {self.id}"

    @classmethod
    def refresh_view(cl):
        with connection.cursor() as cursor:
            cursor.execute(
                f"REFRESH MATERIALIZED VIEW CONCURRENTLY {cl._meta.db_table}"
            )

Usage example:

./manage.py shell

>>> from orders.models import * 
>>> InvoiceMaterializedReport.refresh_view()
>>>

Handling view dependencies

When working with complex views in Django migrations, it is important to handle view dependencies properly. This ensures that the views are created and updated in the correct order, avoiding any errors or inconsistencies.

One way to handle view dependencies is by using the depends_on attribute in the migration file. This attribute specifies the views that the current view depends on, ensuring that the dependencies are resolved before creating or updating the view.

For example, let’s say we have two views: view1 and view2. If view2 depends on view1, we can specify this dependency in the migration file for view2 using the depends_on attribute:

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('app_name', '0001_initial'),
        ('app_name', '0002_view1_migration'),
    ]

    operations = [
        migrations.RunSQL(
            'CREATE VIEW view2 AS SELECT * FROM view1',
            'DROP VIEW view2'
        ),
    ]

By specifying the dependency on view1 in the dependencies list, Django ensures that view1 is created or updated before creating or updating view2.

Handling view dependencies correctly is crucial for maintaining the integrity and consistency of the database schema.

Conclusion

In conclusion, PostgreSQL views provide a powerful tool for creating custom reports in Django applications. By leveraging the capabilities of Django migrations, developers can easily define and manage views that can be used for reporting purposes. Views allow for efficient data retrieval and can be customized to meet specific reporting requirements. With the ability to perform complex queries and join multiple tables, views offer a flexible solution for generating meaningful insights from the database. PostgreSQL views are a valuable addition to any Django project, enabling developers to create comprehensive reports and analyze data in a structured and efficient manner.

Key Takeaways

  • PostgreSQL views are virtual tables that can be used for reporting purposes.
  • You can create and update views in your PostgreSQL database using Django migrations.
  • Optimizing views involves choosing the correct columns, filtering and aggregating data, and using indexes for improved performance.
  • Complex views can be created by joining multiple tables and using subqueries.