# YouTube Transcript API

A lightweight **FastAPI** service that extracts **YouTube video captions (no speech-to-text)**.
No video or audio downloads: just clean, structured captions returned as JSON.

Built to be simple, stateless, and easy to deploy anywhere.

---

## ✨ Features

* Extract **human or auto-generated captions**
* No media downloads (captions only)
* Clean JSON output with timestamps
* Accepts normal, playlist, and radio-style YouTube URLs (processes the single video only, not the whole playlist)
* Docker friendly
* Built-in Swagger UI at `/docs`

---

## 🧪 API Usage

### Endpoint

```
POST /transcript
```

### Query Parameters

| Name  | Type   | Required | Description       |
| ----- | ------ | -------- | ----------------- |
| `url` | string | ✅        | YouTube video URL |

Supported URLs:

* `https://www.youtube.com/watch?v=VIDEO_ID`
* `https://www.youtube.com/watch?v=VIDEO_ID&list=RD...`
* `https://youtu.be/VIDEO_ID`
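
The three URL forms above all name a single video. As a rough sketch of how a client might normalize them before calling the API, the hypothetical helper below (`extract_video_id` is not part of this project) pulls out the video ID with the standard library and ignores `list=` and any other extra parameters. How the service itself parses URLs is not documented here; it may simply pass the URL to yt-dlp.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for the URL forms listed above, else None.

    Hypothetical client-side helper, not this project's code.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in {"youtube.com", "www.youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            # parse_qs returns a list per key; "list" and others are ignored.
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None

    return None
```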

---

### Example (curl)

```bash
curl -X POST "http://localhost:8000/transcript?url=https://www.youtube.com/watch?v=PY9DcIMGxMs"
```
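
The same request can be made from Python. This is a minimal sketch using only the standard library; `build_transcript_request` and `fetch_transcript` are illustrative names, the base URL assumes the service is running locally on port 8000, and the 30-second client timeout is an arbitrary choice. Percent-encoding the `url` value matters for links that contain `&list=...`, which would otherwise be read as a separate query parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_transcript_request(base_url: str, video_url: str) -> Request:
    """Build POST /transcript?url=... with the video URL percent-encoded."""
    query = urlencode({"url": video_url})
    return Request(f"{base_url.rstrip('/')}/transcript?{query}", method="POST")


def fetch_transcript(base_url: str, video_url: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    request = build_transcript_request(base_url, video_url)
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (requires the service to be running):
# data = fetch_transcript("http://localhost:8000",
#                         "https://www.youtube.com/watch?v=PY9DcIMGxMs")
```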

---

### Example Response

```json
{
  "video": {
    "id": "PY9DcIMGxMs",
    "title": "Everything you think you know about addiction is wrong | TED",
    "channel": "TED",
    "duration": 882,
    "url": "https://www.youtube.com/watch?v=PY9DcIMGxMs"
  },
  "captions": [
    {
      "start": 12.597,
      "end": 14.338,
      "text": "One of my earliest memories"
    }
  ],
  "language": "auto",
  "source": "human"
}
```
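
A client will often want the captions as readable lines. The sketch below assumes a payload shaped like the example response above (`captions` entries with `start`, `end`, and `text`) and assumes the times are in seconds; `format_timestamp` and `captions_to_lines` are hypothetical helpers, not part of the API.

```python
from typing import List


def format_timestamp(seconds: float) -> str:
    """Render a time offset (assumed to be in seconds) as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def captions_to_lines(payload: dict) -> List[str]:
    """Turn each caption entry into '[start - end] text'."""
    return [
        f"[{format_timestamp(c['start'])} - {format_timestamp(c['end'])}] {c['text']}"
        for c in payload.get("captions", [])
    ]


sample = {
    "captions": [
        {"start": 12.597, "end": 14.338, "text": "One of my earliest memories"}
    ]
}
print("\n".join(captions_to_lines(sample)))
```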

---

## 📄 API Docs

Once the service is running, open:

```
/docs
```

Swagger UI is enabled by default.

---

## 🐳 Run Locally with Docker

### Build

```bash
docker build -t youtube-transcript-api .
```

### Run

```bash
docker run -p 8000:8000 youtube-transcript-api
```

Then open:

```
http://localhost:8000/docs
```

---

## ⚙️ Environment Variables (Optional)

No environment variables are required.

| Variable          | Default | Description                        |
| ----------------- | ------- | ---------------------------------- |
| `PORT`            | `8000`  | Port to bind                       |
| `REQUEST_TIMEOUT` | `25`    | yt-dlp execution timeout (seconds) |
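
For illustration, a service could read these variables as follows. This is a sketch, not the project's actual startup code: `load_settings` is a hypothetical helper, and only the variable names and defaults come from the table above.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read optional settings, falling back to the documented defaults."""
    return {
        # Port the HTTP server binds to.
        "port": int(env.get("PORT", "8000")),
        # Upper bound, in seconds, on a single yt-dlp run.
        "request_timeout": float(env.get("REQUEST_TIMEOUT", "25")),
    }
```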

---

## 🧠 Design Notes

* Uses `yt-dlp` **only for metadata and captions**
* No Redis, database, or background workers
* Fully stateless and container-friendly
* Designed to fail safely with clear error responses
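
To illustrate the first point, here is one way captions can be located with yt-dlp without downloading media. It is a sketch under assumptions, not this project's code: `pick_caption_track` and `fetch_info` are hypothetical, the `en` default language is arbitrary, and `"auto"` as a source label is assumed by analogy with the `"human"` value in the example response. It relies on yt-dlp's `skip_download` option and on the `subtitles` and `automatic_captions` keys of the info dict returned by `extract_info`.

```python
from typing import Optional, Tuple

# Ask yt-dlp for metadata only; no media is fetched.
YDL_OPTS = {"skip_download": True, "quiet": True}


def pick_caption_track(info: dict, lang: str = "en") -> Optional[Tuple[str, dict]]:
    """Prefer human-made subtitles, then auto-generated captions.

    `info` is a yt-dlp info dict; each language maps to a list of track
    dicts. Returns (source, first_track) or None if nothing matches.
    """
    for source, key in (("human", "subtitles"), ("auto", "automatic_captions")):
        tracks = (info.get(key) or {}).get(lang)
        if tracks:
            return source, tracks[0]
    return None


def fetch_info(url: str) -> dict:
    """Fetch the info dict for one video (requires the yt-dlp package)."""
    import yt_dlp  # imported lazily so pick_caption_track works without it

    with yt_dlp.YoutubeDL(YDL_OPTS) as ydl:
        return ydl.extract_info(url, download=False)
```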

---

## ⚠️ Notes on Reliability

This project depends on **YouTube availability and yt-dlp behavior**.

On cloud platforms, requests may occasionally fail due to:

* IP-based rate limiting
* YouTube bot detection
* Regional consent pages or throttling

When this happens, the API returns a structured error instead of crashing.
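
Because such failures can be transient, a client may want to retry with a growing delay. The helper below is a generic sketch: `with_retries` is a hypothetical name, and it assumes the client call raises an exception on failure (for example, when the HTTP library turns a non-2xx status into an exception). The error format itself is not specified here, so the sketch does not inspect it.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on any exception with exponential backoff.

    Waits base_delay, then 2x, 4x, ... between attempts and re-raises
    the last error once the attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```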

---

## ⚠️ Limitations

* Does **not** download audio or video
* Does **not** perform speech-to-text
* Captions must already exist on YouTube
* Shorts and embedded players are not a primary target

---

## 📜 License

MIT License

---

## 🙌 Credits

* FastAPI: [https://fastapi.tiangolo.com/](https://fastapi.tiangolo.com/)
* yt-dlp: [https://github.com/yt-dlp/yt-dlp](https://github.com/yt-dlp/yt-dlp)

---

### ✅ Status

* Docker tested
* Real-world URLs tested
* Cloud-friendly
* Ready for open-source use