youtube-transcript-api/README.md

177 lines
3.4 KiB
Markdown

![License](https://img.shields.io/badge/license-MIT-blue.svg)
![Python](https://img.shields.io/badge/python-3.13-blue)
![Framework](https://img.shields.io/badge/FastAPI-green)
![Docker](https://img.shields.io/badge/docker-supported-blue)
# YouTube Transcript API
A lightweight **FastAPI** service that extracts **YouTube video captions (no speech-to-text)**.
No video or audio downloads — just clean, structured captions returned as JSON.
Built to be simple, stateless, and easy to deploy anywhere.
---
## ✨ Features
* Extract **human or auto-generated captions**
* No media downloads (captions only)
* Clean JSON output with timestamps
* Accepts normal, playlist, and radio-style YouTube URLs (single video only)
* Docker friendly
* Built-in Swagger UI at `/docs`
---
## 🧪 API Usage
### Endpoint
```
POST /transcript
```
### Query Parameters
| Name | Type | Required | Description |
| ----- | ------ | -------- | ----------------- |
| `url` | string | ✅ | YouTube video URL |
Supported URLs:
* `https://www.youtube.com/watch?v=VIDEO_ID`
* `https://www.youtube.com/watch?v=VIDEO_ID&list=RD...`
* `https://youtu.be/VIDEO_ID`
---
### Example (curl)
```bash
curl -X POST "http://localhost:8000/transcript?url=https://www.youtube.com/watch?v=PY9DcIMGxMs"
```
---
### Example Response
```json
{
"video": {
"id": "PY9DcIMGxMs",
"title": "Everything you think you know about addiction is wrong | TED",
"channel": "TED",
"duration": 882,
"url": "https://www.youtube.com/watch?v=PY9DcIMGxMs"
},
"captions": [
{
"start": 12.597,
"end": 14.338,
"text": "One of my earliest memories"
}
],
"language": "auto",
"source": "human"
}
```
---
## 📄 API Docs
Once running, open:
```
/docs
```
Swagger UI is enabled by default.
---
## 🐳 Run Locally with Docker
### Build
```bash
docker build -t youtube-transcript-api .
```
### Run
```bash
docker run -p 8000:8000 youtube-transcript-api
```
Then open:
```
http://localhost:8000/docs
```
---
## ⚙️ Environment Variables (Optional)
No environment variables are required.
| Variable | Default | Description |
| ----------------- | ------- | ---------------------------------- |
| `PORT` | `8000` | Port to bind |
| `REQUEST_TIMEOUT` | `25` | yt-dlp execution timeout (seconds) |
---
## 🧠 Design Notes
* Uses `yt-dlp` **only for metadata and captions**
* No Redis, database, or background workers
* Fully stateless and container-friendly
* Designed to fail safely with clear error responses
---
## ⚠️ Notes on Reliability
This project depends on **YouTube availability and yt-dlp behavior**.
On cloud platforms, requests may occasionally fail due to:
* IP-based rate limiting
* YouTube bot detection
* regional consent or throttling
When this happens, the API returns a structured error instead of crashing.
---
## ⚠️ Limitations
* Does **not** download audio or video
* Does **not** perform speech-to-text
* Captions must already exist on YouTube
* Shorts and embedded players are not a primary target
---
## 📜 License
MIT License
---
## 🙌 Credits
* FastAPI — [https://fastapi.tiangolo.com/](https://fastapi.tiangolo.com/)
* yt-dlp — [https://github.com/yt-dlp/yt-dlp](https://github.com/yt-dlp/yt-dlp)
---
### ✅ Status
* Docker tested
* Real-world URLs tested
* Cloud-friendly
* Ready for open-source use