How to pg_dump


Last updated: April 4, 2026

Quick Answer: pg_dump is a PostgreSQL command-line tool that creates a backup of a database by exporting its schema and data as SQL commands. Run 'pg_dump -U username -h localhost dbname > backup.sql' in your terminal to generate a backup file that can be restored later using psql.

Key Facts

What It Is

pg_dump is a command-line utility provided with PostgreSQL that creates logical backups of databases by generating SQL commands that can recreate the database structure and data. It exports the complete schema (tables, indexes, views, functions) and all data rows into a portable format that can be executed on any PostgreSQL installation. Unlike physical backups that copy raw database files, pg_dump creates human-readable SQL scripts or custom binary formats that are independent of the underlying system architecture. This makes pg_dump ideal for migrating databases between different servers, creating version-controlled backups, and sharing database snapshots with team members.

pg_dump has shipped with PostgreSQL since the project's earliest releases in the mid-1990s, giving database administrators and developers a fundamental tool for safeguarding their data. The utility has evolved significantly over PostgreSQL's history, gaining support for compression, parallel processing, and selective dumping of specific tables or schemas. Before logical dump tools existed, backing up a database often meant stopping the server and copying raw files, a risky and time-consuming process that pg_dump eliminated. Today, pg_dump is widely regarded as the standard logical backup method for PostgreSQL and is used by organizations running very large production databases.

pg_dump supports multiple output formats to suit different backup and migration needs, each with distinct advantages. The plain text SQL format (the default) produces human-readable .sql files that can be edited, version-controlled, and reviewed before restoration. The custom format (-Fc) compresses the backup by default, supports selective restoration with pg_restore, and handles large objects well. The directory format (-Fd) writes the backup as a directory of files, one per table, and is the only format that supports parallel dumps and restores of very large databases. The tar format (-Ft) creates a tar archive that can be unpacked with standard tools, though unlike the custom format it is not compressed.
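The four formats can be compared side by side. Assuming a local server and a database named mydb (names are illustrative), the invocations look like this:

```shell
# Plain SQL (default): human-readable, restore with psql
pg_dump -U postgres mydb > mydb.sql

# Custom format (-Fc): compressed, selectively restorable with pg_restore
pg_dump -U postgres -Fc -f mydb.dump mydb

# Directory format (-Fd): one file per table, supports parallel dump/restore
pg_dump -U postgres -Fd -f mydb_dir mydb

# Tar format (-Ft): a tar archive, unpackable with standard tools
pg_dump -U postgres -Ft -f mydb.tar mydb
```

All four require a running PostgreSQL server to connect to; only the output handling differs.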

How It Works

pg_dump works by connecting to a PostgreSQL database and reading its system catalogs, which contain metadata about all database objects, then generating equivalent CREATE statements to rebuild that structure. It also reads the data from each table and emits COPY commands (or INSERT statements, with the --inserts option) that reload the data during restoration. The utility takes ACCESS SHARE locks on the tables it dumps, which block concurrent schema changes but not ordinary reads and writes. It runs inside a single transaction at repeatable read isolation, so the backup represents a consistent snapshot of the database as of the moment the dump began.
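Concretely, a plain-format dump is just a script of such statements. For a hypothetical users table, the relevant portion might look like:

```sql
-- Structure: recreated from the system catalogs
CREATE TABLE public.users (
    id integer NOT NULL,
    email text NOT NULL
);

-- Data: reloaded via COPY (the default; --inserts emits INSERT statements instead)
COPY public.users (id, email) FROM stdin;
1	alice@example.com
2	bob@example.com
\.
```

A real dump also includes SET commands, ownership and privilege statements, and index and constraint definitions, which are deliberately placed after the data load for speed.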

A practical example demonstrates pg_dump's typical usage: a company running a PostgreSQL 14 database named 'ecommerce' would run 'pg_dump -U postgres -h db.example.com ecommerce > ecommerce_backup.sql' to create a backup file. If the database is large, they might use the compressed custom format with 'pg_dump -U postgres -h db.example.com -Fc ecommerce > ecommerce_backup.dump' to reduce the file size substantially. For nightly backups in a DevOps workflow, the command might include a timestamp: 'pg_dump -U backup_user -h production-db.internal mydb > mydb_$(date +%Y%m%d).sql'. A developer copying a production database locally for testing could run 'pg_dump -h prod-server.com -U read_only_user production_db | psql -h localhost development_db' to pipe the dump directly into restoration.
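The nightly-backup pattern above can be wrapped in a small script. Everything here (user, host, paths, database name) is illustrative, and the command is echoed rather than executed so it can be reviewed or logged first:

```shell
# Hypothetical nightly backup wrapper: builds a timestamped pg_dump command.
DB="mydb"
BACKUP_DIR="/var/backups/postgres"
STAMP=$(date +%Y%m%d)                         # e.g. 20260404
OUTFILE="${BACKUP_DIR}/${DB}_${STAMP}.dump"   # one file per day
CMD="pg_dump -U backup_user -h production-db.internal -Fc -f ${OUTFILE} ${DB}"
echo "$CMD"   # replace echo with eval "$CMD" (or call pg_dump directly) to execute
```

Using the custom format (-Fc) here, rather than plain SQL, keeps the nightly files compressed and selectively restorable.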

The step-by-step process begins with authenticating to the PostgreSQL server using connection parameters (hostname, port, username, database name), which prompts for a password unless a .pgpass file or the PGPASSWORD environment variable is configured. pg_dump then establishes a connection and writes SQL to either a file or standard output, allowing piping to compression utilities like gzip or directly to psql for restoration on another server. For large databases, the '--jobs' option (available since PostgreSQL 9.3) dumps multiple tables in parallel over several worker connections, dramatically reducing backup time; it requires the directory format (-Fd). Additional options like '--exclude-table' skip specific tables, '--data-only' exports only data without schema, and '--schema-only' exports only the structure without any data rows.
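Assuming a local server and a database named mydb (all names illustrative), the options above combine like this:

```shell
# Parallel dump: the directory format is required for --jobs
pg_dump -U postgres -Fd -f mydb_dir --jobs 4 mydb

# Schema only: structure without rows (useful for diffing or bootstrapping)
pg_dump -U postgres --schema-only mydb > schema.sql

# Data only: rows without CREATE statements
pg_dump -U postgres --data-only mydb > data.sql

# Skip a large table (hypothetical audit_log)
pg_dump -U postgres --exclude-table=audit_log mydb > mydb_no_audit.sql

# Pipe through gzip for a compressed plain-format dump
pg_dump -U postgres mydb | gzip > mydb.sql.gz
```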

Why It Matters

pg_dump is critical for business continuity and disaster recovery strategies: many data loss incidents could be mitigated with regular backups and restoration testing. Organizations in financial services, healthcare, and e-commerce rely on tools like pg_dump to help meet regulations such as SOX, HIPAA, and GDPR that impose backup and audit trail requirements. A single pg_dump backup file can be the difference between recovering from a ransomware attack in hours and losing years of business data permanently. Database administrators value pg_dump because it takes consistent backups without stopping the database, allowing continuous operations while protecting against data loss.

pg_dump enables critical DevOps and development workflows across industries where database portability is essential. Many engineering teams use pg_dump to synchronize development databases with production snapshots, ensuring engineers test against realistic data. In continuous integration pipelines, pg_dump backups are taken before deployments, allowing rollback if database migrations fail. Teams also use pg_dump to export and share anonymized datasets between environments, supporting A/B testing and analytics without touching live databases. Educational institutions use pg_dump to archive student records and course data for long-term retention and compliance with data protection laws.

PostgreSQL backup tooling continues to build on pg_dump's foundation, with options like WAL-G for continuous archiving, pg_basebackup for physical backups, and cloud-native solutions integrated with AWS RDS, Azure Database for PostgreSQL, and Google Cloud SQL. Logical replication and standby servers, available since PostgreSQL 10, provide alternative strategies, but pg_dump remains the preferred method for exporting subsets of data and creating portable backups. The industry is moving toward hybrid strategies that combine pg_dump with incremental backups and point-in-time recovery for comprehensive data protection.

Common Misconceptions

A widespread misconception is that pg_dump requires stopping the database server to create a safe backup; in fact, pg_dump creates consistent backups without interrupting database operations or requiring downtime. This myth persists from the early days of database backups, when physical file copying required exclusive access, but pg_dump's logical approach eliminates that requirement. PostgreSQL's MVCC (Multi-Version Concurrency Control) architecture allows pg_dump to read data while other connections write to the database, capturing a snapshot that is consistent as of the dump's start time. Production systems routinely run pg_dump during business hours; it adds some I/O and CPU load, but it does not block ordinary reads and writes.

Many developers believe pg_dump backups are immediately usable after creation, but this is only partially true: the backup must be restored with psql or pg_restore before the data is queryable. A plain-format dump is just a series of SQL commands, and the custom and directory formats are archives of the same content, not a functioning database that can be queried directly. A related misconception is that pg_dump creates automatic backups; it does not. It is only the tool, and administrators must schedule it via cron jobs, orchestration platforms, or backup software to run automatically. Without explicit scheduling, pg_dump must be run manually each time, leaving databases vulnerable to data loss between runs.
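Scheduling, as noted, is the administrator's responsibility. A minimal crontab entry might look like the following (paths and names are hypothetical, with the password supplied via ~/.pgpass rather than on the command line):

```shell
# m h dom mon dow  command  -- runs a compressed dump at 02:00 daily
# Note: % is special in crontab and must be escaped as \%
0 2 * * * pg_dump -U backup_user -Fc -f /var/backups/postgres/mydb_$(date +\%Y\%m\%d).dump mydb
```

Pairing a schedule like this with periodic test restores is what turns pg_dump into an actual backup strategy.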

People often assume that pg_dump is sufficient for point-in-time recovery, allowing them to restore a database to any moment in the past, but pg_dump only captures snapshots at whatever interval it is run. True point-in-time recovery requires Write-Ahead Log (WAL) archiving alongside base backups, using tools like pg_basebackup and WAL-G. Another misconception involves encryption: disk- or filesystem-level encryption is transparent to the server and therefore to pg_dump, while data encrypted at the application level (for example with pgcrypto) is dumped in its encrypted form. Finally, many assume pg_dump backups are secure against unauthorized access, but the resulting file contains table data in readable form, including any sensitive information, unless it is explicitly encrypted or protected with strict file permissions.

Related Questions

What is the difference between pg_dump and pg_basebackup?

pg_dump creates logical backups using SQL exports that are portable and human-readable but slower for large databases, while pg_basebackup creates physical copies of the raw data directory that are faster but tied to the same major version and architecture. pg_dump can back up a single database and supports selective restoration and compression, whereas pg_basebackup always copies the entire cluster. For most use cases, pg_dump is preferred for portable backups and migration, while pg_basebackup is used for setting up replication and as the basis for point-in-time recovery.

How do I restore a pg_dump backup?

For plain SQL format, use 'psql -U username -h localhost dbname < backup.sql' to restore into an existing (typically empty) database. For custom format backups created with -Fc, use 'pg_restore -U username -h localhost -d dbname backup.dump', or add the -C flag to have pg_restore create the database first. You can selectively restore specific tables with pg_restore's '--table' flag to recover only the data you need without recreating the entire database.
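A full restore round-trip under these assumptions (local server, database and file names illustrative) looks like:

```shell
# Plain SQL dump: create an empty target database, then feed the script to psql
createdb -U postgres mydb_restored
psql -U postgres -d mydb_restored < backup.sql

# Custom-format dump: pg_restore can create the database itself with -C
# (connect to an existing database such as postgres to issue the CREATE DATABASE)
pg_restore -U postgres -C -d postgres backup.dump

# Selective restore: only the users table from a custom-format dump
pg_restore -U postgres -d mydb_restored --table=users backup.dump
```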

How long does pg_dump take for a large database?

Backup time depends on database size, hardware, network speed, and compression settings; it can range from seconds for small databases to many hours for very large ones. Using parallel dumps with '--jobs N' (directory format only) can cut the time substantially on multi-core systems. Compression adds CPU overhead but can shrink the output dramatically, especially for text-heavy data, making it worthwhile for storage and transfer.

