Optimize backup performance for large PostgreSQL databases

Summary

Improves overall backup performance and reliability for larger PostgreSQL databases
while keeping the solution fully compatible with Railway environments.
Changes

    Uses PostgreSQL custom-format backups (pg_dump -Fc) for efficient storage
    Enables multipart, threaded uploads to Cloudflare R2 for faster and more reliable transfers
    Keeps backups restore-friendly, supporting parallel restores via pg_restore --jobs
    Updates documentation to accurately reflect performance behavior

Notes on Parallelism

    Parallel dumping (pg_dump --jobs) is intentionally not used: it requires the directory
    output format, which is a poor fit for ephemeral Railway containers
    Parallelism is supported at restore time using pg_restore --jobs with .dump files
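The restore-time parallelism described above can be sketched as a small command builder. This is a hypothetical helper for illustration, not code from this commit:

```python
def build_restore_cmd(dump_file: str, dbname: str, jobs: int = 4) -> list:
    """Build a pg_restore invocation that restores a custom-format
    dump with parallel workers (hypothetical helper)."""
    return [
        "pg_restore",
        "--jobs", str(jobs),   # parallel restore workers
        "--no-owner",          # match the flags used at dump time
        "--dbname", dbname,
        dump_file,
    ]

print(build_restore_cmd("backup_20251223.dump", "postgresql://localhost/app"))
```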

Testing

    Tested on Railway using the deps-update branch
    Verified backup creation, encryption, upload, and retention cleanup
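The retention cleanup that was verified can be modeled as a pure selection function. This is a simplified sketch of the sort-then-slice logic in main.py, run against hypothetical listing data:

```python
from datetime import datetime, timezone

def backups_to_delete(objects, max_backups):
    """Return keys of backups beyond the newest `max_backups`,
    mirroring the sort-then-slice logic in main.py."""
    newest_first = sorted(objects, key=lambda o: o["LastModified"], reverse=True)
    return [o["Key"] for o in newest_first[max_backups:]]

# Nine hypothetical daily backups, Dec 1 through Dec 9.
listing = [
    {"Key": f"backup_{d}.gz",
     "LastModified": datetime(2025, 12, d, tzinfo=timezone.utc)}
    for d in range(1, 10)
]
print(backups_to_delete(listing, max_backups=7))
# → ['backup_2.gz', 'backup_1.gz']  (the two oldest are deleted)
```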
Commit fa422f0980 by Aman, 2025-12-23 11:23:40 +08:00 (committed via GitHub)
GPG Key ID: B5690EEEBB952194
2 changed files with 114 additions and 82 deletions

README.md (117 changes)
@@ -1,74 +1,97 @@
# Postgres-to-R2 Backup
A lightweight automation service that creates scheduled PostgreSQL backups and securely uploads them to **Cloudflare R2 object storage**.
Designed specifically as a **Railway deployment template**, with built-in support for Docker and cron scheduling.
---
## ✨ Features
- 📦 **Automated Backups** — scheduled daily or hourly PostgreSQL backups
- 🔐 **Optional Encryption** — gzip compression or 7z encryption with password
- ☁️ **Cloudflare R2 Integration** — seamless S3-compatible uploads
- 🧹 **Retention Policy** — automatically delete old backups
- 🔗 **Flexible Database URLs** — supports private and public PostgreSQL URLs
- 🐳 **Docker Ready** — portable, lightweight container
- 🚀 **Railway Template First** — no fork required for normal usage
- ⚡ **Optimized Performance** — efficient custom-format dumps and multipart R2 uploads
---
## 🚀 Deployment on Railway (Recommended)
1. Click the **Deploy on Railway** button below
2. Railway will create a new project using the latest version of this repository
3. Add the required environment variables in the Railway dashboard
4. (Optional) Configure a cron job for your desired backup schedule
[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/template/e-ywUS?referralCode=nIQTyp&utm_medium=integration&utm_source=template&utm_campaign=generic)
---
## 🔧 Environment Variables
```env
DATABASE_URL= # PostgreSQL database URL (private)
DATABASE_PUBLIC_URL= # Public PostgreSQL URL (optional)
USE_PUBLIC_URL=false # Set true to use DATABASE_PUBLIC_URL
DUMP_FORMAT=dump # sql | plain | dump | custom | tar
FILENAME_PREFIX=backup # Backup filename prefix
MAX_BACKUPS=7 # Number of backups to retain
R2_ACCESS_KEY= # Cloudflare R2 access key
R2_SECRET_KEY= # Cloudflare R2 secret key
R2_BUCKET_NAME= # R2 bucket name
R2_ENDPOINT= # R2 endpoint URL
BACKUP_PASSWORD= # Optional: enables 7z encryption
BACKUP_TIME=00:00 # Daily backup time (UTC, HH:MM)
```
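`DUMP_FORMAT` maps to pg_dump's `-F` flag and an output file extension. The sketch below illustrates that mapping; only the `("c", "dump")` fallback is visible in main.py's diff, so the full table here is an assumption:

```python
# Assumed mapping; only the ("c", "dump") fallback is confirmed by main.py.
FORMAT_MAP = {
    "sql": ("p", "sql"),     # plain SQL script
    "plain": ("p", "sql"),
    "dump": ("c", "dump"),   # custom format, supports pg_restore --jobs
    "custom": ("c", "dump"),
    "tar": ("t", "tar"),
}

def resolve_format(value: str):
    """Unknown values fall back to the custom format, as in main.py."""
    return FORMAT_MAP.get(value.lower(), ("c", "dump"))

print(resolve_format("TAR"))      # → ('t', 'tar')
print(resolve_format("unknown"))  # → ('c', 'dump')
```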
---
## ⏰ Railway Cron Jobs
You can configure the backup schedule using **Railway Cron Jobs**:
1. Open your Railway project
2. Go to **Deployments → Cron**
3. Add a cron job targeting this service
### Common Cron Expressions
| Schedule | Cron Expression | Description |
|--------|----------------|------------|
| Hourly | `0 * * * *` | Every hour |
| Daily | `0 0 * * *` | Once per day (UTC midnight) |
| Twice Daily | `0 */12 * * *` | Every 12 hours |
| Weekly | `0 0 * * 0` | Every Sunday |
| Monthly | `0 0 1 * *` | First day of the month |
**Tips**
- All cron times are **UTC**
- Use https://crontab.guru to validate expressions
- Adjust `MAX_BACKUPS` to match your schedule
---
## 🛠 Development & Contributions
Fork this repository **only if you plan to**:
- Modify the backup logic
- Add features or integrations
- Submit pull requests
- Run locally for development
For normal usage, deploying via the **Railway template** is recommended.
---
## 📜 License
This project is open source under the MIT License.
You are free to use, modify, and distribute it with attribution.

main.py (79 changes)

@@ -12,7 +12,7 @@ import shutil
load_dotenv()
## ENV
DATABASE_URL = os.environ.get("DATABASE_URL")
DATABASE_PUBLIC_URL = os.environ.get("DATABASE_PUBLIC_URL")
@@ -31,36 +31,31 @@ BACKUP_TIME = os.environ.get("BACKUP_TIME", "00:00")
def log(msg):
    print(msg, flush=True)
## Validate BACKUP_TIME
try:
    hour, minute = BACKUP_TIME.split(":")
    if not (0 <= int(hour) <= 23 and 0 <= int(minute) <= 59):
        raise ValueError
except ValueError:
    log("[WARNING] Invalid BACKUP_TIME format. Using default: 00:00")
    BACKUP_TIME = "00:00"
def get_database_url():
    """Get the appropriate database URL based on configuration"""
    if USE_PUBLIC_URL:
        if not DATABASE_PUBLIC_URL:
            raise ValueError("[ERROR] DATABASE_PUBLIC_URL not set but USE_PUBLIC_URL=true!")
        return DATABASE_PUBLIC_URL
    if not DATABASE_URL:
        raise ValueError("[ERROR] DATABASE_URL not set!")
    return DATABASE_URL
def run_backup():
    """Main backup function that handles the entire backup process"""
    if shutil.which("pg_dump") is None:
        log("[ERROR] pg_dump not found. Install postgresql-client.")
        return
    database_url = get_database_url()
    url = urlparse(database_url)
    db_name = url.path[1:]
    log(f"[INFO] Using {'public' if USE_PUBLIC_URL else 'private'} database URL")
    format_map = {
@@ -72,27 +67,33 @@ def run_backup():
    }
    pg_format, ext = format_map.get(DUMP_FORMAT.lower(), ("c", "dump"))
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    backup_file = f"{FILENAME_PREFIX}_{timestamp}.{ext}"
    compressed_file = (
        f"{backup_file}.7z" if BACKUP_PASSWORD else f"{backup_file}.gz"
    )
    compressed_file_r2 = f"{BACKUP_PREFIX}{compressed_file}"
    ## Create backup
    try:
        log(f"[INFO] Creating backup {backup_file}")
        dump_cmd = [
            "pg_dump",
            f"--dbname={database_url}",
            "-F", pg_format,
            "--no-owner",
            "--no-acl",
            "-f", backup_file
        ]
        subprocess.run(dump_cmd, check=True)
        if BACKUP_PASSWORD:
            log("[INFO] Encrypting backup with 7z...")
            with py7zr.SevenZipFile(compressed_file, "w", password=BACKUP_PASSWORD) as archive:
                archive.write(backup_file)
            log("[SUCCESS] Backup encrypted successfully")
        else:
@@ -103,9 +104,6 @@ def run_backup():
    except subprocess.CalledProcessError as e:
        log(f"[ERROR] Backup creation failed: {e}")
        return
    finally:
        if os.path.exists(backup_file):
            os.remove(backup_file)
@@ -113,11 +111,11 @@ def run_backup():
    ## Upload to R2
    if os.path.exists(compressed_file):
        size = os.path.getsize(compressed_file)
        log(f"[INFO] Final backup size: {size / 1024 / 1024:.2f} MB")
        try:
            client = boto3.client(
                "s3",
                endpoint_url=R2_ENDPOINT,
                aws_access_key_id=R2_ACCESS_KEY,
                aws_secret_access_key=R2_SECRET_KEY
            )
@@ -139,16 +137,27 @@ def run_backup():
            log(f"[SUCCESS] Backup uploaded: {compressed_file_r2}")
            objects = client.list_objects_v2(
                Bucket=R2_BUCKET_NAME,
                Prefix=BACKUP_PREFIX
            )
            if "Contents" in objects:
                backups = sorted(
                    objects["Contents"],
                    key=lambda x: x["LastModified"],
                    reverse=True
                )
                for obj in backups[MAX_BACKUPS:]:
                    client.delete_object(
                        Bucket=R2_BUCKET_NAME,
                        Key=obj["Key"]
                    )
                    log(f"[INFO] Deleted old backup: {obj['Key']}")
        except Exception as e:
            log(f"[ERROR] R2 operation failed: {e}")
            return
        finally:
            if os.path.exists(compressed_file):
                os.remove(compressed_file)
@@ -156,11 +165,11 @@ def run_backup():
if __name__ == "__main__":
    log("[INFO] Starting backup scheduler...")
    log(f"[INFO] Scheduled backup time: {BACKUP_TIME} UTC")
    schedule.every().day.at(BACKUP_TIME).do(run_backup)
    run_backup()
    while True:
        schedule.run_pending()
        time.sleep(60)