PostgreSQL

PostgreSQL, often written as "Postgres" and pronounced "Poss-gres", is an open source relational database implementation frequently used by Python applications as a backend for data storage and retrieval.

PostgreSQL logo.

How does PostgreSQL fit within the Python stack?

PostgreSQL is the default database choice for many Python developers, including the Django team when testing the Django ORM. PostgreSQL is often viewed as more feature robust and stable when compared to MySQL, SQLServer and Oracle. All of those databases are reasonable choices. However, because PostgreSQL tends to be used by Python developers the drivers and example code for using the database tend to be better documented and contain fewer bugs for typical usage scenarios. If you try to use an Oracle database with Django, you'll see there is far less example code for that setup compared to PostgreSQL backend setups.

PostgreSQL is an implementation of the relational database concept. Learn more in the data chapter or view the table of contents for all topics.

Why is PostgreSQL a good database choice?

PostgreSQL's open source license allows developers to operate one or more databases without licensing cost in their applications. The open source license operating model is much less expensive compared to Oracle or other proprietary databases, especially as replication and sharding become necessary at large scale. In addition, because so many people ranging from independent developers to multinational organizations use PostgreSQL, it's often easier to find developers with PostgreSQL experience than other relational databases. There is also ancedotal evidence that PostgreSQL fixes bugs faster than MySQL, although to be fair there has not been a comprehensive study comparing how the two projects handle defect resolution.

The PostgreSQL core team also releases frequent updates that greatly enhance the database's capabilities. For example, in the PostgreSQL 9.4 release the jsonb type was added to enhance JavaScript Object Notation (JSON) storage capabilities so that in many cases a separate NoSQL database is not required in an application's architecture.

Connecting to PostgreSQL with Python

To work with relational databases in Python you need to use a database driver, which is also referred to as a database connector. The most common driver library for working with PostgreSQL is psycopg. There is a list of all drivers on the PostgreSQL wiki, including several libraries that are no longer maintained. If you're working with the asyncio Python stdlib module you should also take a look at the aiopg library which wraps psycopg2's asychronouos features together.

To abstract the connection between tables and objects, many Python developers use an object-relational mapper (ORM) with to turn relational data from PostgreSQL into objects that can be used in their Python application. For example, while PostgreSQL provides a relational database and psycopg is the common database connector, there are many ORMs that can be used with varying web frameworks, as shown in the table below.

Examples of how varying Python ORMs can work with PostgreSQL and the psycopg2 connector.

Learn more about Python ORMs on that dedicated topic page.

PostgreSQL data safety

If you're on Linux it's easy to get PostgreSQL installed using a package manager. However, once the database is installed and running your responsibility is just beginning. Before you go live with a production application, make sure to:

  1. Lock down access with a whitelist in the pg_hba.conf file
  2. Enable replication to another database that's preferrably on different infrastructure in a separate location
  3. Perform regular backups and test the restoration process
  4. Ensure your application prevents SQL injection attacks

When possible have someone qualified do a PostgreSQL security audit to identify the biggest risks to your database. Small applications and bootstrapped companies often cannot afford a full audit in the beginning but as an application grows over time it becomes a bigger target.

The data stored in your database is the lifeblood of your application. If you have ever accidentally dropped a production database or been the victim of malicious activity such as SQL injection attacks, you'll know it's far easier to recover when a bit of work has been performed beforehand on backups, replication and security measures.

Python-specific PostgreSQL resources

Many quickstarts and tutorials exist specifically for Django, Flask and other web application frameworks. The ones below are some of the best walkthroughs I've read.

o Postgres Joins and Django Querysets is a well done post with a specific example of how a standard Django ORM query can lead to degraded performance due when obtaining data from many related tables. The prefetch_related command and database performance monitoring tools can help analyze and alleviate some of the issues in these unoptimized queries.

General PostgreSQL resources

PostgreSQL tutorials not specific to Python are also really helpful for properly handling your data.

PostgreSQL monitoring and performance

Monitoring one or more PostgreSQL instances and trying to performance tune them is a rare skillset. Here are some resources to get you started if you have to handle these issues in your applications.

Do you want to learn more about data or web apps?

Tell me about standard relational databases.

What're these NoSQL data stores hipster developers keep talking about?

I want to learn how to code a Python web application using a framework.


Matt Makai 2012-2022