# Optimize backup performance for large PostgreSQL databases

## Summary

Improves overall backup performance and reliability for large PostgreSQL databases while keeping the solution fully compatible with Railway environments.

## Changes

- Uses PostgreSQL custom-format backups (`pg_dump -Fc`) for efficient storage
- Enables multipart, threaded uploads to Cloudflare R2 for faster, more reliable transfers
- Keeps backups restore-friendly, supporting parallel restores via `pg_restore --jobs`
- Updates documentation to accurately reflect performance behavior
## Notes on Parallelism

- Parallel dumping (`pg_dump --jobs`) is intentionally not used: it requires the directory output format, which is not suitable for Railway containers
- Parallelism is instead supported at restore time, using `pg_restore --jobs` with `.dump` files
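The parallel restore described above can be sketched as a small command builder, mirroring the `subprocess` style used in `main.py`. The helper name and the `--no-owner` flag are illustrative choices, not something this PR prescribes:

```python
import subprocess

def build_restore_cmd(dump_file, database_url, jobs=4):
    """Build a pg_restore invocation that restores a custom-format (-Fc)
    dump with several parallel workers. Note that --jobs only works with
    the custom and directory formats, not plain-SQL dumps."""
    return [
        "pg_restore",
        f"--dbname={database_url}",
        f"--jobs={jobs}",
        "--no-owner",  # skip ownership changes (illustrative choice)
        dump_file,
    ]

# Example (not executed here; URL and filename are placeholders):
# subprocess.run(
#     build_restore_cmd("backup_20250101_000000.dump",
#                       "postgresql://user:pass@host:5432/db"),
#     check=True,
# )
```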
## Testing

- Tested on Railway using the `deps-update` branch
- Verified backup creation, encryption, upload, and retention cleanup
Commit: fa422f0980

## README.md (117 lines changed)
# Postgres-to-R2 Backup

A lightweight automation service that creates scheduled PostgreSQL backups and securely uploads them to **Cloudflare R2 object storage**.

Designed specifically as a **Railway deployment template**, with built-in support for Docker and cron scheduling.

---
## ✨ Features

- 📦 **Automated Backups** — scheduled daily or hourly PostgreSQL backups
- 🔐 **Optional Encryption** — gzip compression or 7z encryption with password
- ☁️ **Cloudflare R2 Integration** — seamless S3-compatible uploads
- 🧹 **Retention Policy** — automatically delete old backups
- 🔗 **Flexible Database URLs** — supports private and public PostgreSQL URLs
- 🐳 **Docker Ready** — portable, lightweight container
- 🚀 **Railway Template First** — no fork required for normal usage
- ⚡ **Optimized Performance** — efficient custom-format dumps and multipart R2 uploads

---
## 🚀 Deployment on Railway (Recommended)

1. Click the **Deploy on Railway** button below
2. Railway will create a new project using the latest version of this repository
3. Add the required environment variables in the Railway dashboard
4. (Optional) Configure a cron job for your desired backup schedule

### Quick Deploy

Click the button below to deploy directly to Railway:

[Deploy on Railway](https://railway.app/template/e-ywUS?referralCode=nIQTyp&utm_medium=integration&utm_source=template&utm_campaign=generic)

---
## 🔧 Environment Variables

```env
DATABASE_URL=            # PostgreSQL database URL (private)
DATABASE_PUBLIC_URL=     # Public PostgreSQL URL (optional)
USE_PUBLIC_URL=false     # Set true to use DATABASE_PUBLIC_URL

DUMP_FORMAT=dump         # sql | plain | dump | custom | tar
FILENAME_PREFIX=backup   # Backup filename prefix
MAX_BACKUPS=7            # Number of backups to retain

R2_ACCESS_KEY=           # Cloudflare R2 access key
R2_SECRET_KEY=           # Cloudflare R2 secret key
R2_BUCKET_NAME=          # R2 bucket name
R2_ENDPOINT=             # R2 endpoint URL

BACKUP_PASSWORD=         # Optional: enables 7z encryption
BACKUP_TIME=00:00        # Daily backup time (UTC, HH:MM)
```

---
## ⏰ Railway Cron Jobs

You can configure the backup schedule using **Railway Cron Jobs**:

1. Open your Railway project
2. Go to **Deployments → Cron**
3. Add a cron job targeting this service

### Common Cron Expressions

| Schedule | Cron Expression | Description |
|----------|-----------------|-------------|
| Hourly | `0 * * * *` | Every hour |
| Daily | `0 0 * * *` | Once per day (UTC midnight) |
| Twice Daily | `0 */12 * * *` | Every 12 hours |
| Weekly | `0 0 * * 0` | Every Sunday |
| Monthly | `0 0 1 * *` | First day of the month |

**Tips**

- All cron times are **UTC**
- Use [crontab.guru](https://crontab.guru) to validate expressions
- Adjust `MAX_BACKUPS` to match your schedule
---

## 🛠 Development & Contributions

Fork this repository **only if you plan to**:

- Modify the backup logic
- Add features or integrations
- Submit pull requests
- Run locally for development

For normal usage, deploying via the **Railway template** is recommended.

---

## 📜 License

This project is open source under the **MIT License**.

You are free to use, modify, and distribute it with attribution.
## main.py (63 lines changed)

@@ -12,7 +12,7 @@ import shutil

load_dotenv()

## ENV

DATABASE_URL = os.environ.get("DATABASE_URL")
DATABASE_PUBLIC_URL = os.environ.get("DATABASE_PUBLIC_URL")
@@ -31,17 +31,16 @@ BACKUP_TIME = os.environ.get("BACKUP_TIME", "00:00")

def log(msg):
    print(msg, flush=True)


## Validate BACKUP_TIME
try:
    hour, minute = BACKUP_TIME.split(":")
    if not (0 <= int(hour) <= 23 and 0 <= int(minute) <= 59):
        raise ValueError
except ValueError:
    log("[WARNING] Invalid BACKUP_TIME format. Using default: 00:00")
    BACKUP_TIME = "00:00"


def get_database_url():
    if USE_PUBLIC_URL:
        if not DATABASE_PUBLIC_URL:
            raise ValueError("[ERROR] DATABASE_PUBLIC_URL not set but USE_PUBLIC_URL=true!")
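The `BACKUP_TIME` validation in this hunk can be read as a small normalizing helper. This is a hypothetical refactor for illustration only; `main.py` performs the same check inline at module level:

```python
def normalize_backup_time(value, default="00:00"):
    """Return value if it is a valid 24-hour HH:MM time, else default.
    Mirrors the inline validation: bad splits or non-numeric parts raise
    ValueError, and out-of-range hours/minutes are rejected the same way."""
    try:
        hour, minute = value.split(":")
        if not (0 <= int(hour) <= 23 and 0 <= int(minute) <= 59):
            raise ValueError
    except ValueError:
        return default
    return value
```

Factoring the check out like this makes the fallback behavior directly testable, which the module-level try/except version is not.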
@@ -52,15 +51,11 @@ def get_database_url():

    return DATABASE_URL


def run_backup():
    if shutil.which("pg_dump") is None:
        log("[ERROR] pg_dump not found. Install postgresql-client.")
        return

    database_url = get_database_url()

    log(f"[INFO] Using {'public' if USE_PUBLIC_URL else 'private'} database URL")

    format_map = {
@@ -72,27 +67,33 @@ def run_backup():

    }
    pg_format, ext = format_map.get(DUMP_FORMAT.lower(), ("c", "dump"))

    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    backup_file = f"{FILENAME_PREFIX}_{timestamp}.{ext}"

    compressed_file = (
        f"{backup_file}.7z" if BACKUP_PASSWORD else f"{backup_file}.gz"
    )

    compressed_file_r2 = f"{BACKUP_PREFIX}{compressed_file}"

    ## Create backup
    try:
        log(f"[INFO] Creating backup {backup_file}")
        dump_cmd = [
            "pg_dump",
            f"--dbname={database_url}",
            "-F", pg_format,
            "--no-owner",
            "--no-acl",
            "-f", backup_file
        ]

        subprocess.run(dump_cmd, check=True)

        if BACKUP_PASSWORD:
            log("[INFO] Encrypting backup with 7z...")
            with py7zr.SevenZipFile(compressed_file, "w", password=BACKUP_PASSWORD) as archive:
                archive.write(backup_file)
            log("[SUCCESS] Backup encrypted successfully")
        else:
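The body of `format_map` is truncated in this hunk; only its closing brace and the `format_map.get(...)` fallback are visible. A plausible reconstruction of the mapping from `DUMP_FORMAT` to pg_dump's `-F` flag and a file extension follows — the actual entries in `main.py` are not shown in full here and may differ:

```python
def resolve_dump_format(dump_format):
    """Map a DUMP_FORMAT value to a (pg_dump -F flag, file extension)
    pair. Illustrative reconstruction of the truncated format_map."""
    format_map = {
        "sql": ("p", "sql"),     # plain SQL script
        "plain": ("p", "sql"),
        "dump": ("c", "dump"),   # custom format (compressed, pg_restore-able)
        "custom": ("c", "dump"),
        "tar": ("t", "tar"),     # tar archive
    }
    # Unknown values fall back to the custom format, matching the
    # visible format_map.get(..., ("c", "dump")) call in the diff.
    return format_map.get(dump_format.lower(), ("c", "dump"))
```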
@@ -103,9 +104,6 @@ def run_backup():

    except subprocess.CalledProcessError as e:
        log(f"[ERROR] Backup creation failed: {e}")
        return
    finally:
        if os.path.exists(backup_file):
            os.remove(backup_file)
@@ -117,7 +115,7 @@ def run_backup():

    try:
        client = boto3.client(
            "s3",
            endpoint_url=R2_ENDPOINT,
            aws_access_key_id=R2_ACCESS_KEY,
            aws_secret_access_key=R2_SECRET_KEY
@@ -139,16 +137,27 @@ def run_backup():

        log(f"[SUCCESS] Backup uploaded: {compressed_file_r2}")

        objects = client.list_objects_v2(
            Bucket=R2_BUCKET_NAME,
            Prefix=BACKUP_PREFIX
        )

        if "Contents" in objects:
            backups = sorted(
                objects["Contents"],
                key=lambda x: x["LastModified"],
                reverse=True
            )

            for obj in backups[MAX_BACKUPS:]:
                client.delete_object(
                    Bucket=R2_BUCKET_NAME,
                    Key=obj["Key"]
                )
                log(f"[INFO] Deleted old backup: {obj['Key']}")

    except Exception as e:
        log(f"[ERROR] R2 operation failed: {e}")
    finally:
        if os.path.exists(compressed_file):
            os.remove(compressed_file)
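The retention cleanup keeps the newest `MAX_BACKUPS` objects and deletes everything older. The core sort-and-slice logic can be shown in isolation (fake listing data below; in a real `list_objects_v2` response, `LastModified` values are datetimes, not ints):

```python
def select_backups_to_delete(objects, max_backups):
    """Given R2/S3 object dicts, return those beyond the newest max_backups."""
    backups = sorted(objects, key=lambda x: x["LastModified"], reverse=True)
    return backups[max_backups:]

# Example with fake listing data (LastModified simplified to ints):
listing = [
    {"Key": "backups/backup_1.gz", "LastModified": 1},
    {"Key": "backups/backup_3.gz", "LastModified": 3},
    {"Key": "backups/backup_2.gz", "LastModified": 2},
]
stale = select_backups_to_delete(listing, max_backups=2)
# stale holds only the oldest object, backups/backup_1.gz
```

Sorting newest-first and slicing past `max_backups` means the function is a no-op when there are fewer backups than the retention limit, which keeps the cleanup pass safe to run on every backup cycle.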