Optimize backup performance for large PostgreSQL databases

Summary

Improves overall backup performance and reliability for larger PostgreSQL databases
while keeping the solution fully compatible with Railway environments.
Changes

    Uses PostgreSQL custom-format backups (pg_dump -Fc) for efficient storage
    Enables multipart, threaded uploads to Cloudflare R2 for faster and more reliable transfers
    Keeps backups restore-friendly, supporting parallel restores via pg_restore --jobs
    Updates documentation to accurately reflect performance behavior

Notes on Parallelism

    Parallel dumping (pg_dump --jobs) is intentionally not used: it requires the directory
    output format, which is a poor fit for ephemeral Railway containers
    Parallelism is supported at restore time using pg_restore --jobs with .dump files
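The restore-time parallelism described above can be sketched as a small command builder. This is a hypothetical helper for illustration, not code from this commit:

```python
def build_restore_cmd(dump_file: str, dbname: str, jobs: int = 4) -> list:
    """Build a pg_restore invocation that restores a custom-format
    dump with parallel workers (hypothetical helper)."""
    return [
        "pg_restore",
        "--jobs", str(jobs),   # parallel restore workers
        "--no-owner",          # match the flags used at dump time
        "--dbname", dbname,
        dump_file,
    ]

print(build_restore_cmd("backup_20251223.dump", "postgresql://localhost/app"))
```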

Testing

    Tested on Railway using the deps-update branch
    Verified backup creation, encryption, upload, and retention cleanup
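The retention cleanup that was verified can be modeled as a pure selection function. This is a simplified sketch of the sort-then-slice logic in main.py, run against hypothetical listing data:

```python
from datetime import datetime, timezone

def backups_to_delete(objects, max_backups):
    """Return keys of backups beyond the newest `max_backups`,
    mirroring the sort-then-slice logic in main.py."""
    newest_first = sorted(objects, key=lambda o: o["LastModified"], reverse=True)
    return [o["Key"] for o in newest_first[max_backups:]]

# Nine hypothetical daily backups, Dec 1 through Dec 9.
listing = [
    {"Key": f"backup_{d}.gz",
     "LastModified": datetime(2025, 12, d, tzinfo=timezone.utc)}
    for d in range(1, 10)
]
print(backups_to_delete(listing, max_backups=7))
# → ['backup_2.gz', 'backup_1.gz']  (the two oldest are deleted)
```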
Commit fa422f0980 by Aman, 2025-12-23 11:23:40 +08:00 (committed via GitHub)
GPG Key ID: B5690EEEBB952194
2 changed files with 114 additions and 82 deletions

README.md (117 changes)
@@ -1,74 +1,97 @@
# Postgres-to-R2 Backup
A lightweight automation service that creates scheduled PostgreSQL backups and securely uploads them to **Cloudflare R2 object storage**.
Designed specifically as a **Railway deployment template**, with built-in support for Docker and cron scheduling.
---
## ✨ Features
- 📦 **Automated Backups** — scheduled daily or hourly PostgreSQL backups
- 🔐 **Optional Encryption** — gzip compression or 7z encryption with password
- ☁️ **Cloudflare R2 Integration** — seamless S3-compatible uploads
- 🧹 **Retention Policy** — automatically delete old backups
- 🔗 **Flexible Database URLs** — supports private and public PostgreSQL URLs
- 🐳 **Docker Ready** — portable, lightweight container
- 🚀 **Railway Template First** — no fork required for normal usage
- ⚡ **Optimized Performance** — efficient custom-format dumps and multipart R2 uploads
---
## 🚀 Deployment on Railway (Recommended)
1. Click the **Deploy on Railway** button below
2. Railway will create a new project using the latest version of this repository
3. Add the required environment variables in the Railway dashboard
4. (Optional) Configure a cron job for your desired backup schedule
[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/template/e-ywUS?referralCode=nIQTyp&utm_medium=integration&utm_source=template&utm_campaign=generic)
---
## 🔧 Environment Variables
```env
DATABASE_URL= # PostgreSQL database URL (private)
DATABASE_PUBLIC_URL= # Public PostgreSQL URL (optional)
USE_PUBLIC_URL=false # Set true to use DATABASE_PUBLIC_URL
DUMP_FORMAT=dump # sql | plain | dump | custom | tar
FILENAME_PREFIX=backup # Backup filename prefix
MAX_BACKUPS=7 # Number of backups to retain
R2_ACCESS_KEY= # Cloudflare R2 access key
R2_SECRET_KEY= # Cloudflare R2 secret key
R2_BUCKET_NAME= # R2 bucket name
R2_ENDPOINT= # R2 endpoint URL
BACKUP_PASSWORD= # Optional: enables 7z encryption
BACKUP_TIME=00:00 # Daily backup time (UTC, HH:MM)
```
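`DUMP_FORMAT` maps to pg_dump's `-F` flag and an output file extension. The sketch below illustrates that mapping; only the `("c", "dump")` fallback is visible in main.py's diff, so the full table here is an assumption:

```python
# Assumed mapping; only the ("c", "dump") fallback is confirmed by main.py.
FORMAT_MAP = {
    "sql": ("p", "sql"),     # plain SQL script
    "plain": ("p", "sql"),
    "dump": ("c", "dump"),   # custom format, supports pg_restore --jobs
    "custom": ("c", "dump"),
    "tar": ("t", "tar"),
}

def resolve_format(value: str):
    """Unknown values fall back to the custom format, as in main.py."""
    return FORMAT_MAP.get(value.lower(), ("c", "dump"))

print(resolve_format("TAR"))      # → ('t', 'tar')
print(resolve_format("unknown"))  # → ('c', 'dump')
```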
---
## ⏰ Railway Cron Jobs
You can configure the backup schedule using **Railway Cron Jobs**:
1. Open your Railway project
2. Go to **Deployments → Cron**
3. Add a cron job targeting this service
### Common Cron Expressions
| Schedule | Cron Expression | Description |
|--------|----------------|------------|
| Hourly | `0 * * * *` | Every hour |
| Daily | `0 0 * * *` | Once per day (UTC midnight) |
| Twice Daily | `0 */12 * * *` | Every 12 hours |
| Weekly | `0 0 * * 0` | Every Sunday |
| Monthly | `0 0 1 * *` | First day of the month |
**Tips**
- All cron times are **UTC**
- Use https://crontab.guru to validate expressions
- Adjust `MAX_BACKUPS` to match your schedule
---
## 🛠 Development & Contributions
Fork this repository **only if you plan to**:
- Modify the backup logic
- Add features or integrations
- Submit pull requests
- Run locally for development
For normal usage, deploying via the **Railway template** is recommended.
---
## 📜 License
This project is open source under the MIT License.
You are free to use, modify, and distribute it with attribution.

main.py (79 changes)

@@ -12,7 +12,7 @@ import shutil
load_dotenv()
## ENV
DATABASE_URL = os.environ.get("DATABASE_URL")
DATABASE_PUBLIC_URL = os.environ.get("DATABASE_PUBLIC_URL")
@@ -31,36 +31,31 @@ BACKUP_TIME = os.environ.get("BACKUP_TIME", "00:00")
def log(msg):
    print(msg, flush=True)
## Validate BACKUP_TIME
try:
    hour, minute = BACKUP_TIME.split(":")
    if not (0 <= int(hour) <= 23 and 0 <= int(minute) <= 59):
        raise ValueError
except ValueError:
    log("[WARNING] Invalid BACKUP_TIME format. Using default: 00:00")
    BACKUP_TIME = "00:00"
def get_database_url():
    """Get the appropriate database URL based on configuration"""
    if USE_PUBLIC_URL:
        if not DATABASE_PUBLIC_URL:
            raise ValueError("[ERROR] DATABASE_PUBLIC_URL not set but USE_PUBLIC_URL=true!")
        return DATABASE_PUBLIC_URL
    if not DATABASE_URL:
        raise ValueError("[ERROR] DATABASE_URL not set!")
    return DATABASE_URL
def run_backup():
    """Main backup function that handles the entire backup process"""
    if shutil.which("pg_dump") is None:
        log("[ERROR] pg_dump not found. Install postgresql-client.")
        return
    database_url = get_database_url()
    url = urlparse(database_url)
    db_name = url.path[1:]
    log(f"[INFO] Using {'public' if USE_PUBLIC_URL else 'private'} database URL")
    format_map = {
@@ -72,27 +67,33 @@ def run_backup():
    }
    pg_format, ext = format_map.get(DUMP_FORMAT.lower(), ("c", "dump"))
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    backup_file = f"{FILENAME_PREFIX}_{timestamp}.{ext}"
    compressed_file = (
        f"{backup_file}.7z" if BACKUP_PASSWORD else f"{backup_file}.gz"
    )
    compressed_file_r2 = f"{BACKUP_PREFIX}{compressed_file}"
    ## Create backup
    try:
        log(f"[INFO] Creating backup {backup_file}")
        dump_cmd = [
            "pg_dump",
            f"--dbname={database_url}",
            "-F", pg_format,
            "--no-owner",
            "--no-acl",
            "-f", backup_file
        ]
        subprocess.run(dump_cmd, check=True)
        if BACKUP_PASSWORD:
            log("[INFO] Encrypting backup with 7z...")
            with py7zr.SevenZipFile(compressed_file, "w", password=BACKUP_PASSWORD) as archive:
                archive.write(backup_file)
            log("[SUCCESS] Backup encrypted successfully")
        else:
@@ -103,9 +104,6 @@ def run_backup():
    except subprocess.CalledProcessError as e:
        log(f"[ERROR] Backup creation failed: {e}")
        return
    finally:
        if os.path.exists(backup_file):
            os.remove(backup_file)
@@ -113,11 +111,11 @@ def run_backup():
    ## Upload to R2
    if os.path.exists(compressed_file):
        size = os.path.getsize(compressed_file)
        log(f"[INFO] Final backup size: {size / 1024 / 1024:.2f} MB")
        try:
            client = boto3.client(
                "s3",
                endpoint_url=R2_ENDPOINT,
                aws_access_key_id=R2_ACCESS_KEY,
                aws_secret_access_key=R2_SECRET_KEY
            )
@@ -139,16 +137,27 @@ def run_backup():
            log(f"[SUCCESS] Backup uploaded: {compressed_file_r2}")
            objects = client.list_objects_v2(
                Bucket=R2_BUCKET_NAME,
                Prefix=BACKUP_PREFIX
            )
            if "Contents" in objects:
                backups = sorted(
                    objects["Contents"],
                    key=lambda x: x["LastModified"],
                    reverse=True
                )
                for obj in backups[MAX_BACKUPS:]:
                    client.delete_object(
                        Bucket=R2_BUCKET_NAME,
                        Key=obj["Key"]
                    )
                    log(f"[INFO] Deleted old backup: {obj['Key']}")
        except Exception as e:
            log(f"[ERROR] R2 operation failed: {e}")
            return
        finally:
            if os.path.exists(compressed_file):
                os.remove(compressed_file)
@@ -156,11 +165,11 @@ def run_backup():
if __name__ == "__main__":
    log("[INFO] Starting backup scheduler...")
    log(f"[INFO] Scheduled backup time: {BACKUP_TIME} UTC")
    schedule.every().day.at(BACKUP_TIME).do(run_backup)
    run_backup()
    while True:
        schedule.run_pending()
        time.sleep(60)