YouTube Transcript API

A lightweight FastAPI service that extracts existing YouTube video captions and returns them as clean, structured JSON. It does not download video or audio, and it does not perform speech-to-text.

Built to be simple, stateless, and easy to deploy anywhere.


Features

  • Extract human or auto-generated captions
  • No media downloads (captions only)
  • Clean JSON output with timestamps
  • Accepts standard, playlist, and radio-style YouTube URLs (only the single video referenced by the URL is processed)
  • Docker friendly
  • Built-in Swagger UI at /docs

🧪 API Usage

Endpoint

POST /transcript

Query Parameters

Name   Type     Required   Description
url    string   yes        YouTube video URL

Supported URLs:

  • https://www.youtube.com/watch?v=VIDEO_ID
  • https://www.youtube.com/watch?v=VIDEO_ID&list=RD...
  • https://youtu.be/VIDEO_ID
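
The supported URL forms above all point at a single video ID. As a rough illustration of how such URLs could be normalized, here is a minimal sketch using only the standard library. The function name `extract_video_id` and the accepted host list are assumptions for this example, not the service's actual implementation.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts treated as "watch"-style YouTube hosts in this sketch (an assumption).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch or youtu.be URL, or None if not found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in WATCH_HOSTS:
        # Watch links carry the ID in the "v" parameter; "list" and other
        # playlist or radio parameters are simply ignored.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None
```

Ignoring the `list` parameter is what keeps playlist and radio-style links scoped to one video in this sketch.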

Example (curl)

curl -X POST "http://localhost:8000/transcript?url=https://www.youtube.com/watch?v=PY9DcIMGxMs"
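
Example (Python)

The same request can be made from Python. The sketch below uses only the standard library and percent-encodes the `url` parameter, which matters for links that contain `&list=...`. The helper names `build_transcript_request` and `fetch_transcript`, the default base URL, and the client-side timeout are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_transcript_request(video_url, base_url="http://localhost:8000"):
    """Build a POST request with the video URL encoded as a query parameter."""
    query = urlencode({"url": video_url})
    return Request(f"{base_url}/transcript?{query}", method="POST")


def fetch_transcript(video_url, base_url="http://localhost:8000", timeout=30):
    """Send the request and decode the JSON body (needs a running service)."""
    request = build_transcript_request(video_url, base_url)
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    data = fetch_transcript("https://www.youtube.com/watch?v=PY9DcIMGxMs")
    print(data["video"]["title"])
```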

Example Response

{
  "video": {
    "id": "PY9DcIMGxMs",
    "title": "Everything you think you know about addiction is wrong | TED",
    "channel": "TED",
    "duration": 882,
    "url": "https://www.youtube.com/watch?v=PY9DcIMGxMs"
  },
  "captions": [
    {
      "start": 12.597,
      "end": 14.338,
      "text": "One of my earliest memories"
    }
  ],
  "language": "auto",
  "source": "human"
}
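
Because each caption carries `start`, `end`, and `text`, the response is easy to convert to other formats on the client side. The following is a minimal sketch that turns the `captions` array into SRT text. It assumes the field names shown in the example response; `format_timestamp` and `captions_to_srt` are hypothetical helper names, not part of the API.

```python
def format_timestamp(seconds):
    """Convert seconds (float) to an SRT timestamp of the form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def captions_to_srt(captions):
    """Render a list of caption dicts as numbered SRT blocks."""
    blocks = []
    for index, caption in enumerate(captions, start=1):
        start = format_timestamp(caption["start"])
        end = format_timestamp(caption["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{caption['text']}")
    # Blocks are separated by a blank line; the output ends with a newline.
    return "\n\n".join(blocks) + "\n"
```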

📄 API Docs

Once running, open:

/docs

Swagger UI is enabled by default.


🐳 Run Locally with Docker

Build

docker build -t youtube-transcript-api .

Run

docker run -p 8000:8000 youtube-transcript-api

Then open:

http://localhost:8000/docs

⚙️ Environment Variables (Optional)

No environment variables are required. The following optional variables override the defaults:

Variable          Default   Description
PORT              8000      Port to bind
REQUEST_TIMEOUT   25        yt-dlp execution timeout (seconds)
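
For readers who want to see how these defaults might be applied, here is a small sketch that reads both variables with fallbacks. The `Settings` class and `load_settings` function are hypothetical names for this example. Only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int = 8000             # PORT: port to bind
    request_timeout: int = 25    # REQUEST_TIMEOUT: yt-dlp timeout in seconds


def load_settings(env=None):
    """Read optional overrides from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    return Settings(
        port=int(env.get("PORT", "8000")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "25")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.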

🧠 Design Notes

  • Uses yt-dlp only for metadata and captions
  • No Redis, database, or background workers
  • Fully stateless and container-friendly
  • Designed to fail safely with clear error responses
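
To make the "metadata and captions only" point concrete, the sketch below shows one way yt-dlp can be asked for video information without downloading media. It is an illustration under stated assumptions, not this project's actual code: the function names are hypothetical, the `"auto"` label for auto-generated captions is assumed, and the service may invoke yt-dlp differently (for example, as a subprocess).

```python
def build_ydl_options():
    """yt-dlp options that request captions but skip the media download."""
    return {
        "skip_download": True,      # never fetch audio or video
        "writesubtitles": True,     # include human-written captions
        "writeautomaticsub": True,  # include auto-generated captions
        "quiet": True,
    }


def fetch_info(url):
    """Return yt-dlp's info dict for a video (requires the yt-dlp package)."""
    import yt_dlp  # third-party; imported lazily so the helpers above stay importable

    with yt_dlp.YoutubeDL(build_ydl_options()) as ydl:
        return ydl.extract_info(url, download=False)


def pick_caption_tracks(info, language="en"):
    """Prefer human captions over auto-generated ones for the given language."""
    human = info.get("subtitles") or {}
    auto = info.get("automatic_captions") or {}
    if language in human:
        return "human", human[language]
    if language in auto:
        return "auto", auto[language]
    return None, []
```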

⚠️ Notes on Reliability

This project depends on YouTube availability and yt-dlp behavior.

On cloud platforms, requests may occasionally fail due to:

  • IP-based rate limiting
  • YouTube bot detection
  • Regional consent prompts or throttling

When this happens, the API returns a structured error instead of crashing.
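
Since these failures are often transient, a client may want to retry with a short backoff. The sketch below is one possible approach. The helper name `call_with_retries` and its parameters are assumptions for illustration, and it makes no assumption about the shape of the error body. It retries on `OSError`, which covers `urllib.error.URLError` and `urllib.error.HTTPError`.

```python
import time


def call_with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying on OSError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Retrying too aggressively can make rate limiting worse, so a small number of attempts is usually the safer choice.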


⚠️ Limitations

  • Does not download audio or video
  • Does not perform speech-to-text
  • Captions must already exist on YouTube
  • Shorts and embedded-player URLs are not primary targets

📜 License

MIT License


🙌 Credits


Status

  • Docker tested
  • Real-world URLs tested
  • Cloud-friendly
  • Ready for open-source use