How and When use Identities over Serial Types in PostgreSQL?

When it comes to generating auto-incrementing primary keys in PostgreSQL, the choice between “identities” and “serial” types holds significant implications for your database design and performance. In this article, we’ll explore the key differences between these options and outline best practices to help you make informed decisions.

Defining Identities and Serial Types

Both “identities” and “serial” types serve the purpose of automatically generating unique primary key values. However, their implementation and feature sets vary.

🔔 Another way to create auto-incrementing values are plain sequences. In this articles we’ll not talk about this because in many cases you want to use sequences “behind” serials or identities without worry about direct handling of sequences. However, in some cases, sequences allows to implement some “not usual” numbering (eg. unique identifier for all the database records or for a set of tables).

Serial Types

“Serial” is a PostgreSQL-specific feature that creates an integer column linked to a sequence. It’s a shorthand for creating a sequence and an integer column. Serial data types aren’t true types, but merely a notational convenience for creating unique identifier columns (similar to the AUTO_INCREMENT property supported by MySQL, MariaDB, MSSQLServer and other databases).

The generated integer values are limited to the following integer data types.

Serial types are: smallserial, serial and bigserial.

Name Storage Size Description Range
smallserial 2 bytes small autoincrementing integer 1 to 32767
serial 4 bytes autoincrementing integer 1 to 2147483647
bigserial 8 bytes large autoincrementing integer 1 to 9223372036854775807

Declaring a table with a serial primary keys is similar to the following:

CREATE TABLE people (
    people_id bigserial primary key,
    first_name varchar,
    last_name varchar
)

which is equivalent to execute the following statements:

CREATE SEQUENCE people_people_id_seq AS bigint;
CREATE TABLE people (
    people_id bigint NOT NULL DEFAULT nextval('people_people_id_seq') primary key ,
    first_name varchar,
    last_name varchar    
);
ALTER SEQUENCE people_people_id_seq OWNED BY people.people_id;

As you can see, creating a bigserial field is a shortcut to create:

  • A bigint column and arrange for its default values to be assigned from a sequence generator.
  • A NOT NULL constraint applied to ensure that a null value cannot be inserted.
  • Mark the sequence as owned by the column, so that it will be dropped if the column or table is dropped.

More info about Serial Types are available in the PostgreSQL documentation.

Identities

  • Introduced in PostgreSQL 10, “identities” are SQL-standard compliant.
  • Separate the concept of identity generation from the data type, offering greater flexibility in choosing the data type of the primary key.
  • Allow fine-grained control over properties, such as defining whether updates are allowed, and can accommodate various constraints.
  • Identities are simpler to handle.

A table which uses an IDENTITY as auto-generated primary key is declared as follow:

CREATE TABLE people (
   people_id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
   first_name varchar,
   last_name varchar
)

or as follow

CREATE TABLE people (
   people_id bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
   first_name varchar,
   last_name varchar
)

About IDENTITY clause PostgreSQL documentation says:

IDENTITY clause creates the column as an identity column. It will have an implicit sequence attached to it and the column in new rows will automatically have values from the sequence assigned to it. Such a column is implicitly NOT NULL.

OK, but what’s the difference between ALWAYS and BY DEFAULT clauses?

The clauses ALWAYS and BY DEFAULT determine how explicitly user-specified values are handled in INSERT and UPDATE commands.

In an INSERT command, if ALWAYS is selected, a user-specified value is only accepted if the INSERT statement specifies OVERRIDING SYSTEM VALUE. If BY DEFAULT is selected, then the user-specified value takes precedence.

In an UPDATE command, if ALWAYS is selected, any update of the column to any value other than DEFAULT will be rejected. If BY DEFAULT is selected, the column can be updated normally. (There is no OVERRIDING clause for the UPDATE command.)

More info about identities are available in the PostgreSQL documentation

Best Practices

So, which is better and why?

✅ Clarity and Standards Compliance: Use “identities” for improved clarity in your schema. They explicitly convey the purpose of the column and align with SQL standards, enhancing readability and maintainability.

✅ Portability and Interoperability: “Identities” are more compatible with standard SQL practices and promote better integration when migrating databases or collaborating with developers familiar with standard conventions.

✅ Data Integrity and Maintenance: Utilize “identities” when the focus is on maintaining data integrity and enforcing constraints. The ability to control properties offers a comprehensive approach to handling updates and constraints. Also, since serial is not a real type, it cannot use in every statement where a reat field type can be used. You can specify serial as a column type when creating a table or when adding a column. But dropping serialness from an existing column or adding it to an existing column is not so simple.

✅ Long-Term Adaptability: If anticipating future changes in data requirements, opt for “identities” due to their inherent flexibility. This future-proofs your schema against potential changes.

🔔 WARNING! DUMB RULE AHEAD! 🔔

Use IDENTITY!

For new applications, identity columns should be used; you should always use an identity column, unless you have to support old PostgreSQL versions (remember, identity columns were introduced in PostgreSQL v10). This will make your code manageable, cleaner and more portable.

Conclusion

In the dynamic landscape of PostgreSQL database design, the choice between “identities” and “serial” types for auto-generated primary keys depends on the complexity of your schema, performance needs and long-term adaptability. “Identities” shine in scenarios that demand flexibility, clarity, and adherence to standards, while “serial” types cater to straightforward, performance-oriented situations. By aligning your choice with the intricacies of your project, you can forge a robust and optimized database schema that suits your unique requirements. Remember, the ultimate goal is to strike a balance between efficiency, clarity, and future-proofing as you navigate the realm of auto-generated primary keys in PostgreSQL.

Trainings

If you want to become proficient in PostgreSQL there are some very popular trainings provided by my company:

If your company needs a custom training, let us know.

Trainings are available remotely and on-site.

Some other interesting links:

Comments

comments powered by Disqus