YouTube Transcript API
A lightweight FastAPI service that extracts YouTube video captions (no speech-to-text). No video or audio downloads — just clean, structured captions returned as JSON.
Built to be simple, stateless, and easy to deploy anywhere.
✨ Features
- Extract human or auto-generated captions
- No media downloads (captions only)
- Clean JSON output with timestamps
- Accepts standard, playlist, and radio-style YouTube URLs (only the single referenced video is processed)
- Docker friendly
- Built-in Swagger UI at /docs
🧪 API Usage
Endpoint
POST /transcript
Query Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | ✅ | YouTube video URL |
Supported URLs:
- https://www.youtube.com/watch?v=VIDEO_ID
- https://www.youtube.com/watch?v=VIDEO_ID&list=RD...
- https://youtu.be/VIDEO_ID
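As an illustration of how the three URL shapes reduce to a single video, here is a minimal Python sketch that pulls the video ID out of each form. The function name `extract_video_id` and the accepted host list are assumptions made for this example; the service's own parsing is not shown here and may differ.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch or youtu.be URL, or None if not recognized.

    Hypothetical helper for illustration, not the service's actual code.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Extra parameters such as list=RD... are ignored,
            # so playlist and radio URLs resolve to one video.
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None

    return None
```

All three supported forms yield the same ID, which matches the "single video only" behavior described in the feature list.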
Example (curl)
curl -X POST "http://localhost:8000/transcript?url=https://www.youtube.com/watch?v=PY9DcIMGxMs"
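When the video URL contains its own query string (for example `&list=RD...`), it should be percent-encoded before being passed as the `url` parameter, otherwise the extra parameters may be read as parameters of the API request itself. The Python sketch below shows one way to build and send the request with the standard library. The helper names `build_transcript_request` and `fetch_transcript` are hypothetical, and the base URL assumes the local Docker setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_transcript_request(video_url: str,
                             base_url: str = "http://localhost:8000") -> str:
    """Build the /transcript request URL with the video URL percent-encoded."""
    query = urlencode({"url": video_url})
    return f"{base_url}/transcript?{query}"


def fetch_transcript(video_url: str,
                     base_url: str = "http://localhost:8000",
                     timeout: float = 30.0) -> dict:
    """POST to /transcript and return the decoded JSON body.

    Requires the service to be running at base_url.
    """
    request = Request(build_transcript_request(video_url, base_url), method="POST")
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With curl, the same effect can be had by quoting the whole request URL and encoding the inner URL by hand.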
Example Response
{
"video": {
"id": "PY9DcIMGxMs",
"title": "Everything you think you know about addiction is wrong | TED",
"channel": "TED",
"duration": 882,
"url": "https://www.youtube.com/watch?v=PY9DcIMGxMs"
},
"captions": [
{
"start": 12.597,
"end": 14.338,
"text": "One of my earliest memories"
}
],
"language": "auto",
"source": "human"
}
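To show how the response shape above can be consumed, here is a small Python sketch that turns the `captions` array into timestamped text lines. The function names are assumptions for this example; only the `captions`, `start`, and `text` fields come from the sample response.

```python
from typing import List


def format_timestamp(seconds: float) -> str:
    """Format a caption offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def captions_to_lines(response: dict) -> List[str]:
    """Render each caption as '[MM:SS.mmm] text', in the order received."""
    return [
        f"[{format_timestamp(caption['start'])}] {caption['text']}"
        for caption in response["captions"]
    ]


sample = {
    "captions": [
        {"start": 12.597, "end": 14.338, "text": "One of my earliest memories"}
    ]
}
print("\n".join(captions_to_lines(sample)))
```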
📄 API Docs
Once running, open:
/docs
Swagger UI is enabled by default.
🐳 Run Locally with Docker
Build
docker build -t youtube-transcript-api .
Run
docker run -p 8000:8000 youtube-transcript-api
Then open:
http://localhost:8000/docs
⚙️ Environment Variables (Optional)
No environment variables are required.
| Variable | Default | Description |
|---|---|---|
| PORT | 8000 | Port to bind |
| REQUEST_TIMEOUT | 25 | yt-dlp execution timeout (seconds) |
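A minimal Python sketch of reading these two variables with the defaults from the table. The `load_settings` name and the returned dictionary keys are assumptions for illustration; the service's actual configuration code is not shown here.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read PORT and REQUEST_TIMEOUT, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        # Port to bind; defaults to 8000.
        "port": int(env.get("PORT", "8000")),
        # yt-dlp execution timeout in seconds; defaults to 25.
        "request_timeout": float(env.get("REQUEST_TIMEOUT", "25")),
    }
```

With Docker, either value can be overridden at run time, for example `docker run -e PORT=9000 -p 9000:9000 youtube-transcript-api`.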
🧠 Design Notes
- Uses yt-dlp only for metadata and captions
- No Redis, database, or background workers
- Fully stateless and container-friendly
- Designed to fail safely with clear error responses
⚠️ Notes on Reliability
This project depends on YouTube availability and yt-dlp behavior.
On cloud platforms, requests may occasionally fail due to:
- IP-based rate limiting
- YouTube bot detection
- regional consent or throttling
When this happens, the API returns a structured error instead of crashing.
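One way such "fail safely" behavior can be sketched in Python is to run the external command under a timeout and convert every failure mode into a plain dictionary. The function name `run_with_timeout` and the error fields (`ok`, `error`, `code`, `message`) are hypothetical; the API's real error format is not documented in this README.

```python
import subprocess
from typing import List


def run_with_timeout(cmd: List[str], timeout: float) -> dict:
    """Run a command and return a structured result instead of raising.

    The result shape is an assumption made for this example.
    """
    try:
        completed = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return {
            "ok": False,
            "error": {
                "code": "timeout",
                "message": f"command exceeded {timeout} seconds",
            },
        }
    except FileNotFoundError:
        return {
            "ok": False,
            "error": {"code": "not_found", "message": f"{cmd[0]} is not installed"},
        }

    if completed.returncode != 0:
        # Non-zero exit, e.g. rate limiting or bot detection upstream.
        return {
            "ok": False,
            "error": {"code": "command_failed", "message": completed.stderr.strip()},
        }
    return {"ok": True, "stdout": completed.stdout}
```

A request handler could map the `code` field to an HTTP status, so that callers can tell a timeout from an upstream failure and decide whether a retry is worthwhile.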
⚠️ Limitations
- Does not download audio or video
- Does not perform speech-to-text
- Captions must already exist on YouTube
- Shorts and embedded players are not a primary target
📜 License
MIT License
🙌 Credits
- FastAPI — https://fastapi.tiangolo.com/
- yt-dlp — https://github.com/yt-dlp/yt-dlp
✅ Status
- Docker tested
- Real-world URLs tested
- Cloud-friendly
- Ready for open-source use