Add parallel pg_dump support and document PG_DUMP_JOBS

BigDaddyAman 2025-12-23 11:08:14 +08:00
parent 07e45aa13e
commit 367d366dec
2 changed files with 137 additions and 64 deletions

140
README.md

@ -1,74 +1,120 @@
# Postgres-to-R2 Backup
A lightweight automation service that creates scheduled PostgreSQL backups and securely uploads them to **Cloudflare R2 object storage**.
Designed for **Railway deployments**, with built-in support for Docker and cron scheduling.
Designed specifically as a **Railway deployment template**, with built-in support for Docker and cron scheduling.
---
## ✨ Features
- 📦 **Automated Backups** — scheduled daily or hourly backups of your PostgreSQL database
- 🔐 **Optional Encryption** — compress with gzip or encrypt with 7z and password-protection
- ☁️ **Cloudflare R2 Integration** — seamless upload to your R2 bucket
- 🧹 **Retention Policy** — keep a fixed number of backups, auto-clean old ones
- 🔗 **Flexible Database URL** — supports both private and public PostgreSQL URLs
- 🐳 **Docker Ready** — lightweight container for portable deployment
- 📦 **Automated Backups** — scheduled daily or hourly PostgreSQL backups
- 🔐 **Optional Encryption** — gzip compression or 7z encryption with password
- ☁️ **Cloudflare R2 Integration** — seamless S3-compatible uploads
- 🧹 **Retention Policy** — automatically delete old backups
- 🔗 **Flexible Database URLs** — supports private and public PostgreSQL URLs
- ⚡ **Optimized Performance** — parallel pg_dump and multipart R2 uploads
- 🐳 **Docker Ready** — portable, lightweight container
- 🚀 **Railway Template First** — no fork required for normal usage
---
## 🚀 Deployment on Railway
## 🚀 Deployment on Railway (Recommended)
1. **Fork this repository**
2. **Create a new project** on [Railway](https://railway.app/)
3. **Add environment variables** in Railway dashboard:
```env
DATABASE_URL= # Your PostgreSQL database URL (private)
DATABASE_PUBLIC_URL= # Public database URL (optional)
USE_PUBLIC_URL=false # Set to true to use DATABASE_PUBLIC_URL
DUMP_FORMAT=dump # Options: sql, plain, dump, custom, tar
FILENAME_PREFIX=backup # Prefix for backup files
MAX_BACKUPS=7 # Number of backups to keep
R2_ACCESS_KEY= # Cloudflare R2 access key
R2_SECRET_KEY= # Cloudflare R2 secret key
R2_BUCKET_NAME= # R2 bucket name
R2_ENDPOINT= # R2 endpoint URL
BACKUP_PASSWORD= # Optional: password for 7z encryption
BACKUP_TIME=00:00 # Daily backup time in UTC (HH:MM format)
```
### Quick Deploy
Click the button below to deploy directly to Railway:
1. Click the **Deploy on Railway** button below
2. Railway will create a new project using the latest version of this repository
3. Add the required environment variables in the Railway dashboard
4. (Optional) Configure a cron job for your desired backup schedule
[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/template/e-ywUS?referralCode=nIQTyp&utm_medium=integration&utm_source=template&utm_campaign=generic)
---
## 🔧 Environment Variables
```env
DATABASE_URL= # PostgreSQL database URL (private)
DATABASE_PUBLIC_URL= # Public PostgreSQL URL (optional)
USE_PUBLIC_URL=false # Set true to use DATABASE_PUBLIC_URL
DUMP_FORMAT=dump # sql | plain | dump | custom | tar
FILENAME_PREFIX=backup # Backup filename prefix
MAX_BACKUPS=7 # Number of backups to retain
PG_DUMP_JOBS=1 # Optional: parallel pg_dump jobs (use 2-4 for 12GB DBs)
R2_ACCESS_KEY= # Cloudflare R2 access key
R2_SECRET_KEY= # Cloudflare R2 secret key
R2_BUCKET_NAME= # R2 bucket name
R2_ENDPOINT= # R2 endpoint URL
BACKUP_PASSWORD= # Optional: enables 7z encryption
BACKUP_TIME=00:00 # Daily backup time (UTC, HH:MM)
```
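Before a backup run, it can help to confirm that the variables above are actually set. The following is a minimal sketch, not the project's code: the helper name `missing_vars` is hypothetical, and the list of variables treated as required is an assumption, since the README does not say which ones are mandatory.

```python
import os
from collections.abc import Mapping

# Assumption: these R2 settings are required for an upload to succeed.
R2_VARS = ("R2_ACCESS_KEY", "R2_SECRET_KEY", "R2_BUCKET_NAME", "R2_ENDPOINT")


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    # Mirror the README: USE_PUBLIC_URL=true switches to DATABASE_PUBLIC_URL.
    use_public = env.get("USE_PUBLIC_URL", "false").lower() == "true"
    db_var = "DATABASE_PUBLIC_URL" if use_public else "DATABASE_URL"
    required = (db_var, *R2_VARS)
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print(f"[ERROR] Missing environment variables: {', '.join(missing)}")
    else:
        print("[INFO] All required environment variables are set")
```

Running the check at startup reports a misconfigured deployment immediately instead of at the first scheduled backup.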
---
## ⚡ Performance Optimization (Optional)
For larger databases (≈12 GB), you can significantly speed up backups by enabling
parallel PostgreSQL dumps.
### Parallel pg_dump
Set the number of parallel jobs:
```env
PG_DUMP_JOBS=4
```
**Notes**
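- The script passes `--jobs` only when `DUMP_FORMAT` is `dump`, `custom`, or `tar`
- pg_dump itself supports parallel jobs only with the directory output format (`-Fd`), so confirm that a dump succeeds before relying on this setting
- Default is `1` (safe for all users)
- Recommended values: `2-4`
- Higher values add load to the database server and may overwhelm small instances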
This feature is **fully optional** and disabled by default.
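As a rough illustration of what the setting controls, the sketch below builds a pg_dump argument list and adds `--jobs` only when parallelism is requested. It assumes the directory output format (`-Fd`), because the PostgreSQL documentation states that pg_dump can run parallel jobs only with that format. The helper name `build_dump_cmd`, the connection string, and the output path are hypothetical, and this is not the project's actual code.

```python
def build_dump_cmd(database_url: str, out_dir: str, jobs: int = 1) -> list[str]:
    """Return a pg_dump argument list (hypothetical helper, not from main.py)."""
    cmd = [
        "pg_dump",
        f"--dbname={database_url}",
        "-F", "d",      # directory format: the only one pg_dump writes in parallel
        "-f", out_dir,  # with -Fd the target is a directory, not a single file
    ]
    if jobs > 1:
        # Each job opens its own database connection, so keep the value modest.
        cmd.append(f"--jobs={jobs}")
    return cmd


if __name__ == "__main__":
    # Placeholder connection string for demonstration only.
    print(build_dump_cmd("postgresql://user:pass@host:5432/db", "backup_dir", jobs=4))
```

Because a directory-format dump produces several files, it would need to be archived (for example with tar or 7z) before upload as a single object.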
---
## ⏰ Railway Cron Jobs
You can configure the backup schedule using Railway's built-in cron jobs in the dashboard:
You can configure the backup schedule using **Railway Cron Jobs**:
1. Go to your project settings
2. Navigate to **Deployments** > **Cron**
3. Add a new cron job pointing to your service
1. Open your Railway project
2. Go to **Deployments → Cron**
3. Add a cron job targeting this service
Common cron expressions:
### Common Cron Expressions
| Schedule | Cron Expression | Description |
|----------|----------------|-------------|
| Hourly | `0 * * * *` | Run once every hour |
| Daily (midnight) | `0 0 * * *` | Run once per day at midnight |
| Twice Daily | `0 */12 * * *` | Run every 12 hours |
| Weekly | `0 0 * * 0` | Run once per week (Sunday) |
| Monthly | `0 0 1 * *` | Run once per month |
|--------|----------------|------------|
| Hourly | `0 * * * *` | Every hour |
| Daily | `0 0 * * *` | Once per day (UTC midnight) |
| Twice Daily | `0 */12 * * *` | Every 12 hours |
| Weekly | `0 0 * * 0` | Every Sunday |
| Monthly | `0 0 1 * *` | First day of the month |
Pro Tips:
- Use [crontab.guru](https://crontab.guru) to verify your cron expressions
- All times are in UTC
- Configure backup retention (`MAX_BACKUPS`) according to your schedule
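To pick a `MAX_BACKUPS` value for a given schedule, divide the history you want to keep by the interval between runs. The sketch below shows that arithmetic; the function name `backups_needed` is hypothetical and the seven-day target is only an example.

```python
import math
from datetime import timedelta


def backups_needed(interval: timedelta, keep_for: timedelta) -> int:
    """Number of backups to retain so that `keep_for` of history is covered."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # timedelta / timedelta yields a float ratio; round up to cover the window.
    return max(1, math.ceil(keep_for / interval))


if __name__ == "__main__":
    week = timedelta(days=7)
    print(backups_needed(timedelta(hours=1), week))   # hourly schedule -> 168
    print(backups_needed(timedelta(hours=12), week))  # twice daily -> 14
    print(backups_needed(timedelta(days=1), week))    # daily -> 7
```

With the default `MAX_BACKUPS=7`, an hourly schedule keeps only about seven hours of history, so the value should grow as the schedule becomes more frequent.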
````
**Tips**
- All cron times are **UTC**
- Use https://crontab.guru to validate expressions
- Adjust `MAX_BACKUPS` to match your schedule
📜 License
---
## 🛠 Development & Contributions
Fork this repository **only if you plan to**:
- Modify the backup logic
- Add features or integrations
- Submit pull requests
- Run locally for development
For normal usage, deploying via the **Railway template** is recommended.
---
## 📜 License
This project is open source under the **MIT License**.
This project is open source under the MIT License.
You are free to use, modify, and distribute it with attribution.

61
main.py

@ -27,6 +27,7 @@ DUMP_FORMAT = os.environ.get("DUMP_FORMAT", "dump")
BACKUP_PASSWORD = os.environ.get("BACKUP_PASSWORD")
USE_PUBLIC_URL = os.environ.get("USE_PUBLIC_URL", "false").lower() == "true"
BACKUP_TIME = os.environ.get("BACKUP_TIME", "00:00")
# Fall back to 1 when the variable is unset or empty, and never go below 1.
PG_DUMP_JOBS = max(1, int(os.environ.get("PG_DUMP_JOBS") or "1"))
def log(msg):
print(msg, flush=True)
@ -75,24 +76,38 @@ def run_backup():
timestamp = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
backup_file = f"{FILENAME_PREFIX}_{timestamp}.{ext}"
if BACKUP_PASSWORD:
compressed_file = f"{backup_file}.7z"
else:
compressed_file = f"{backup_file}.gz"
compressed_file = (
f"{backup_file}.7z" if BACKUP_PASSWORD else f"{backup_file}.gz"
)
compressed_file_r2 = f"{BACKUP_PREFIX}{compressed_file}"
##Create backup
# --------------------------
# Create backup
# --------------------------
try:
log(f"[INFO] Creating backup {backup_file}")
subprocess.run(
["pg_dump", f"--dbname={database_url}", "-F", pg_format, "-f", backup_file],
check=True
)
dump_cmd = [
"pg_dump",
f"--dbname={database_url}",
"-F", pg_format,
"--no-owner",
"--no-acl",
"-f", backup_file
]
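# NOTE: pg_dump accepts --jobs only with the directory format (-Fd). With the
# custom ("c") or tar ("t") formats it exits with an error, so this branch
# makes the backup fail whenever PG_DUMP_JOBS > 1 until -Fd is supported here.
if pg_format in ("c", "t") and PG_DUMP_JOBS > 1:
dump_cmd.insert(-2, f"--jobs={PG_DUMP_JOBS}")
log(f"[INFO] Using parallel pg_dump with {PG_DUMP_JOBS} jobs")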
subprocess.run(dump_cmd, check=True)
if BACKUP_PASSWORD:
log("[INFO] Encrypting backup with 7z...")
with py7zr.SevenZipFile(compressed_file, 'w', password=BACKUP_PASSWORD) as archive:
with py7zr.SevenZipFile(
compressed_file, "w", password=BACKUP_PASSWORD
) as archive:
archive.write(backup_file)
log("[SUCCESS] Backup encrypted successfully")
else:
@ -113,11 +128,11 @@ def run_backup():
## Upload to R2
if os.path.exists(compressed_file):
size = os.path.getsize(compressed_file)
log(f"[INFO] Final backup size: {size/1024/1024:.2f} MB")
log(f"[INFO] Final backup size: {size / 1024 / 1024:.2f} MB")
try:
client = boto3.client(
's3',
"s3",
endpoint_url=R2_ENDPOINT,
aws_access_key_id=R2_ACCESS_KEY,
aws_secret_access_key=R2_SECRET_KEY
@ -139,11 +154,23 @@ def run_backup():
log(f"[SUCCESS] Backup uploaded: {compressed_file_r2}")
objects = client.list_objects_v2(Bucket=R2_BUCKET_NAME, Prefix=BACKUP_PREFIX)
if 'Contents' in objects:
backups = sorted(objects['Contents'], key=lambda x: x['LastModified'], reverse=True)
objects = client.list_objects_v2(
Bucket=R2_BUCKET_NAME,
Prefix=BACKUP_PREFIX
)
if "Contents" in objects:
backups = sorted(
objects["Contents"],
key=lambda x: x["LastModified"],
reverse=True
)
for obj in backups[MAX_BACKUPS:]:
client.delete_object(Bucket=R2_BUCKET_NAME, Key=obj['Key'])
client.delete_object(
Bucket=R2_BUCKET_NAME,
Key=obj["Key"]
)
log(f"[INFO] Deleted old backup: {obj['Key']}")
except Exception as e: