Add parallel pg_dump support and document PG_DUMP_JOBS

This commit is contained in:
BigDaddyAman 2025-12-23 11:08:14 +08:00
parent 07e45aa13e
commit 367d366dec
2 changed files with 137 additions and 64 deletions

README.md

@@ -1,74 +1,120 @@
# Postgres-to-R2 Backup

A lightweight automation service that creates scheduled PostgreSQL backups and securely uploads them to **Cloudflare R2 object storage**.
Designed specifically as a **Railway deployment template**, with built-in support for Docker and cron scheduling.

---

## ✨ Features

- 📦 **Automated Backups** — scheduled daily or hourly PostgreSQL backups
- 🔐 **Optional Encryption** — gzip compression or 7z encryption with a password
- ☁️ **Cloudflare R2 Integration** — seamless S3-compatible uploads
- 🧹 **Retention Policy** — automatically delete old backups
- 🔗 **Flexible Database URLs** — supports private and public PostgreSQL URLs
- ⚡ **Optimized Performance** — parallel pg_dump and multipart R2 uploads
- 🐳 **Docker Ready** — portable, lightweight container
- 🚀 **Railway Template First** — no fork required for normal usage

---

## 🚀 Deployment on Railway (Recommended)

1. Click the **Deploy on Railway** button below
2. Railway creates a new project from the latest version of this repository
3. Add the required environment variables in the Railway dashboard
4. (Optional) Configure a cron job for your desired backup schedule

[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/template/e-ywUS?referralCode=nIQTyp&utm_medium=integration&utm_source=template&utm_campaign=generic)

---
## 🔧 Environment Variables
```env
DATABASE_URL= # PostgreSQL database URL (private)
DATABASE_PUBLIC_URL= # Public PostgreSQL URL (optional)
USE_PUBLIC_URL=false # Set true to use DATABASE_PUBLIC_URL
DUMP_FORMAT=dump # sql | plain | dump | custom | tar
FILENAME_PREFIX=backup # Backup filename prefix
MAX_BACKUPS=7 # Number of backups to retain
PG_DUMP_JOBS=1          # Optional: parallel pg_dump jobs (e.g. 2-4 for ~12 GB DBs)
R2_ACCESS_KEY= # Cloudflare R2 access key
R2_SECRET_KEY= # Cloudflare R2 secret key
R2_BUCKET_NAME= # R2 bucket name
R2_ENDPOINT= # R2 endpoint URL
BACKUP_PASSWORD= # Optional: enables 7z encryption
BACKUP_TIME=00:00 # Daily backup time (UTC, HH:MM)
```
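As a reference for how these variables are typically consumed, here is a minimal, illustrative parser. The `load_backup_config` helper is hypothetical; only the variable names and defaults come from the table above:

```python
import os

def load_backup_config(env=os.environ):
    # Illustrative sketch: parse the backup settings documented above.
    use_public = env.get("USE_PUBLIC_URL", "false").lower() == "true"
    return {
        # Prefer the public URL only when explicitly enabled.
        "database_url": env.get("DATABASE_PUBLIC_URL") if use_public
                        else env.get("DATABASE_URL"),
        "dump_format": env.get("DUMP_FORMAT", "dump"),
        "max_backups": int(env.get("MAX_BACKUPS", "7")),
        "pg_dump_jobs": int(env.get("PG_DUMP_JOBS", "1")),
        "backup_time": env.get("BACKUP_TIME", "00:00"),
    }

cfg = load_backup_config({"DATABASE_URL": "postgres://db", "PG_DUMP_JOBS": "4"})
print(cfg["pg_dump_jobs"])  # prints 4
```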
---
## ⚡ Performance Optimization (Optional)
For larger databases (≈12 GB), you can significantly speed up backups by enabling
parallel PostgreSQL dumps.
### Parallel pg_dump
Set the number of parallel jobs:
```env
PG_DUMP_JOBS=4
```
**Notes**

- `pg_dump` supports parallel dumps only with the **directory** format (`-Fd`); passing `--jobs` with other formats makes `pg_dump` fail
- Default is `1` (safe for all users)
- Recommended values: `2-4`
- Higher values can overload small database servers

This feature is **fully optional** and disabled by default.
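The mapping from `PG_DUMP_JOBS` to the actual `pg_dump` invocation can be sketched as follows. The `build_dump_cmd` helper is illustrative, not the service's exact code; it encodes the rule that `pg_dump` accepts `--jobs` only with the directory format:

```python
def build_dump_cmd(database_url, pg_format, out_path, jobs=1):
    # Base command, mirroring the flags this service passes to pg_dump.
    cmd = ["pg_dump", f"--dbname={database_url}", "-F", pg_format]
    # pg_dump only supports parallel dumps with the directory format (-Fd).
    if pg_format == "d" and jobs > 1:
        cmd.append(f"--jobs={jobs}")
    cmd += ["-f", out_path]
    return cmd

print(build_dump_cmd("postgres://db", "d", "backup_dir", jobs=4))
# ['pg_dump', '--dbname=postgres://db', '-F', 'd', '--jobs=4', '-f', 'backup_dir']
```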
---
## ⏰ Railway Cron Jobs

You can configure the backup schedule using **Railway Cron Jobs**:

1. Open your Railway project
2. Go to **Deployments → Cron**
3. Add a cron job targeting this service

### Common Cron Expressions

| Schedule | Cron Expression | Description |
|----------|-----------------|-------------|
| Hourly | `0 * * * *` | Every hour |
| Daily | `0 0 * * *` | Once per day (UTC midnight) |
| Twice Daily | `0 */12 * * *` | Every 12 hours |
| Weekly | `0 0 * * 0` | Every Sunday |
| Monthly | `0 0 1 * *` | First day of the month |

**Tips**

- All cron times are **UTC**
- Use [crontab.guru](https://crontab.guru) to validate expressions
- Adjust `MAX_BACKUPS` to match your schedule
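Since both `BACKUP_TIME` and cron run in UTC, the next daily run can be computed like this. This is a sketch of the scheduling rule only, not the service's actual scheduler:

```python
from datetime import datetime, timedelta, timezone

def next_run(backup_time, now):
    # Compute the next UTC run for a daily HH:MM schedule (BACKUP_TIME).
    hour, minute = map(int, backup_time.split(":"))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has passed; run tomorrow.
        candidate += timedelta(days=1)
    return candidate

now = datetime(2025, 12, 23, 3, 8, tzinfo=timezone.utc)
print(next_run("00:00", now))  # 2025-12-24 00:00:00+00:00
```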
---
## 🛠 Development & Contributions
Fork this repository **only if you plan to**:
- Modify the backup logic
- Add features or integrations
- Submit pull requests
- Run locally for development
For normal usage, deploying via the **Railway template** is recommended.
---
## 📜 License
This project is open source under the **MIT License**.
You are free to use, modify, and distribute it with attribution.

main.py

@@ -27,6 +27,7 @@ DUMP_FORMAT = os.environ.get("DUMP_FORMAT", "dump")
BACKUP_PASSWORD = os.environ.get("BACKUP_PASSWORD")
USE_PUBLIC_URL = os.environ.get("USE_PUBLIC_URL", "false").lower() == "true"
BACKUP_TIME = os.environ.get("BACKUP_TIME", "00:00")
PG_DUMP_JOBS = int(os.environ.get("PG_DUMP_JOBS", "1"))

def log(msg):
    print(msg, flush=True)
@@ -75,24 +76,38 @@ def run_backup():
    timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
    backup_file = f"{FILENAME_PREFIX}_{timestamp}.{ext}"

    compressed_file = (
        f"{backup_file}.7z" if BACKUP_PASSWORD else f"{backup_file}.gz"
    )
    compressed_file_r2 = f"{BACKUP_PREFIX}{compressed_file}"

    # --------------------------
    # Create backup
    # --------------------------
    try:
        log(f"[INFO] Creating backup {backup_file}")

        dump_cmd = [
            "pg_dump",
            f"--dbname={database_url}",
            "-F", pg_format,
            "--no-owner",
            "--no-acl",
            "-f", backup_file
        ]

        # pg_dump accepts --jobs only with the directory format (-Fd);
        # passing it for custom or tar dumps makes pg_dump exit with an error.
        if pg_format == "d" and PG_DUMP_JOBS > 1:
            dump_cmd.insert(-2, f"--jobs={PG_DUMP_JOBS}")
            log(f"[INFO] Using parallel pg_dump with {PG_DUMP_JOBS} jobs")

        subprocess.run(dump_cmd, check=True)

        if BACKUP_PASSWORD:
            log("[INFO] Encrypting backup with 7z...")
            with py7zr.SevenZipFile(
                compressed_file, "w", password=BACKUP_PASSWORD
            ) as archive:
                archive.write(backup_file)
            log("[SUCCESS] Backup encrypted successfully")
        else:
@@ -113,11 +128,11 @@ def run_backup():
    ## Upload to R2
    if os.path.exists(compressed_file):
        size = os.path.getsize(compressed_file)
        log(f"[INFO] Final backup size: {size / 1024 / 1024:.2f} MB")
        try:
            client = boto3.client(
                "s3",
                endpoint_url=R2_ENDPOINT,
                aws_access_key_id=R2_ACCESS_KEY,
                aws_secret_access_key=R2_SECRET_KEY
@@ -139,11 +154,23 @@ def run_backup():
            log(f"[SUCCESS] Backup uploaded: {compressed_file_r2}")

            objects = client.list_objects_v2(
                Bucket=R2_BUCKET_NAME,
                Prefix=BACKUP_PREFIX
            )

            if "Contents" in objects:
                backups = sorted(
                    objects["Contents"],
                    key=lambda x: x["LastModified"],
                    reverse=True
                )
                for obj in backups[MAX_BACKUPS:]:
                    client.delete_object(
                        Bucket=R2_BUCKET_NAME,
                        Key=obj["Key"]
                    )
                    log(f"[INFO] Deleted old backup: {obj['Key']}")
    except Exception as e: