Initial release: YouTube Transcript API v1.0.0

2026-02-02 01:52:38 +08:00 · 2026-02-02 01:52:38 +08:00 · 3adbc16dfb
commit 3adbc16dfb
20 changed files with 589 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1,5 @@
 .venv/
 __pycache__/
 *.pyc
 .env
 .DS_Store
--- a/76
+++ b/76
@ -0,0 +1,76 @@
 # Contributing
 Thanks for your interest in contributing! 🎉  
 This project aims to stay **simple, stable, and template-friendly**, so please read this first.
 ---
 ## 🧭 Project Principles
 - **Caption-only** (no audio/video downloads)
 - **Stateless** (no database required)
 - **Railway & Docker friendly**
 - **Minimal dependencies**
 - **Clear API contracts**
 Changes that break these principles are unlikely to be accepted.
 ---
 ## 🐛 Reporting Bugs
 Please include:
 - The YouTube URL used
 - Expected vs actual behavior
 - Logs or error messages
 - Whether captions were human or auto-generated
 Open an issue with a clear title and reproduction steps.
 ---
 ## ✨ Feature Requests
 Good feature requests:
 - Improve caption parsing / cleanup
 - Better validation or error messages
 - Performance or stability improvements
 - Optional flags that do NOT break defaults
 Please avoid:
 - Adding mandatory databases
 - Downloading media files
 - Authentication requirements
 ---
 ## 🧪 Development Setup
 ```bash
 python -m venv .venv
 source .venv/bin/activate  # Windows: .venv\Scripts\activate
 pip install -r requirements.txt
 uvicorn app.main:app --reload
 ```
 Open:
 ```
 http://localhost:8000/docs
 ```
 ---
 ## 🧹 Code Style
 - Python 3.12+ compatible
 - Use type hints
 - Keep functions small and readable
 - Avoid over-engineering
 ---
 ## 📜 License
 By contributing, you agree that your contributions will be licensed under the **MIT License**.
 Thank you for helping improve the project 🙌
--- a/19
+++ b/19
@ -0,0 +1,19 @@
 FROM python:3.13-slim
 ENV PYTHONUNBUFFERED=1
 ENV PIP_NO_CACHE_DIR=1
 RUN apt-get update && apt-get install -y \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*
 WORKDIR /app
 COPY requirements.txt .
 RUN pip install -r requirements.txt
 COPY . .
 EXPOSE 8000
 CMD ["sh", "-c", "uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8000}"]
--- a/21
+++ b/21
@ -0,0 +1,21 @@
 MIT License
 Copyright (c) 2026 Aman
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
--- a/README.md
+++ b/README.md
@ -0,0 +1,176 @@
 ![License](https://img.shields.io/badge/license-MIT-blue.svg)
 ![Python](https://img.shields.io/badge/python-3.13-blue)
 ![Framework](https://img.shields.io/badge/FastAPI-green)
 ![Docker](https://img.shields.io/badge/docker-supported-blue)
 # YouTube Transcript API
 A lightweight **FastAPI** service that extracts **YouTube video captions (no speech-to-text)**.
 No video or audio downloads — just clean, structured captions returned as JSON.
 Built to be simple, stateless, and easy to deploy anywhere.
 ---
 ## ✨ Features
 * Extract **human or auto-generated captions**
 * No media downloads (captions only)
 * Clean JSON output with timestamps
 * Accepts normal, playlist, and radio-style YouTube URLs (single video only)
 * Docker friendly
 * Built-in Swagger UI at `/docs`
 ---
 ## 🧪 API Usage
 ### Endpoint
 ```
 POST /transcript
 ```
 ### Query Parameters
 | Name  | Type   | Required | Description       |
 | ----- | ------ | -------- | ----------------- |
 | `url` | string | ✅        | YouTube video URL |
 Supported URLs:
 * `https://www.youtube.com/watch?v=VIDEO_ID`
 * `https://www.youtube.com/watch?v=VIDEO_ID&list=RD...`
 * `https://youtu.be/VIDEO_ID`
 ---
 ### Example (curl)
 ```bash
 curl -X POST "http://localhost:8000/transcript?url=https://www.youtube.com/watch?v=PY9DcIMGxMs"
 ```
 ---
 ### Example Response
 ```json
 {
  "video": {
    "id": "PY9DcIMGxMs",
    "title": "Everything you think you know about addiction is wrong | TED",
    "channel": "TED",
    "duration": 882,
    "url": "https://www.youtube.com/watch?v=PY9DcIMGxMs"
  },
  "captions": [
    {
      "start": 12.597,
      "end": 14.338,
      "text": "One of my earliest memories"
    }
  ],
  "language": "auto",
  "source": "human"
 }
 ```
 ---
 ## 📄 API Docs
 Once running, open:
 ```
 /docs
 ```
 Swagger UI is enabled by default.
 ---
 ## 🐳 Run Locally with Docker
 ### Build
 ```bash
 docker build -t youtube-transcript-api .
 ```
 ### Run
 ```bash
 docker run -p 8000:8000 youtube-transcript-api
 ```
 Then open:
 ```
 http://localhost:8000/docs
 ```
 ---
 ## ⚙️ Environment Variables (Optional)
 No environment variables are required.
 | Variable          | Default | Description                        |
 | ----------------- | ------- | ---------------------------------- |
 | `PORT`            | `8000`  | Port to bind                       |
 | `REQUEST_TIMEOUT` | `25`    | yt-dlp execution timeout (seconds) |
 ---
 ## 🧠 Design Notes
 * Uses `yt-dlp` **only for metadata and captions**
 * No Redis, database, or background workers
 * Fully stateless and container-friendly
 * Designed to fail safely with clear error responses
 ---
 ## ⚠️ Notes on Reliability
 This project depends on **YouTube availability and yt-dlp behavior**.
 On cloud platforms, requests may occasionally fail due to:
 * IP-based rate limiting
 * YouTube bot detection
 * regional consent or throttling
 When this happens, the API returns a structured error instead of crashing.
 ---
 ## ⚠️ Limitations
 * Does **not** download audio or video
 * Does **not** perform speech-to-text
 * Captions must already exist on YouTube
 * Shorts and embedded players are not a primary target
 ---
 ## 📜 License
 MIT License
 ---
 ## 🙌 Credits
 * FastAPI — [https://fastapi.tiangolo.com/](https://fastapi.tiangolo.com/)
 * yt-dlp — [https://github.com/yt-dlp/yt-dlp](https://github.com/yt-dlp/yt-dlp)
 ---
 ### ✅ Status
 * Docker tested
 * Real-world URLs tested
 * Cloud-friendly
 * Ready for open-source use
--- a/init.py
+++ b/init.py
--- a/app/init.py
+++ b/app/init.py
--- a/app/main.py
+++ b/app/main.py
@ -0,0 +1,19 @@
 from fastapi import FastAPI
 from routes.health import router as health_router
 from routes.transcript import router as transcript_router
 app = FastAPI(
    title="YouTube Transcript API",
    description="Caption-only YouTube transcript extraction (no downloads)",
    version="1.0.0",
 )
@app.get("/")
 def root():
    return {
        "name": "YouTube Transcript API",
        "docs": "/docs"
    }
 app.include_router(health_router)
 app.include_router(transcript_router)
--- a/core/config.py
+++ b/core/config.py
@ -0,0 +1,19 @@
 from pydantic_settings import BaseSettings
 class Settings(BaseSettings):
    app_env: str = "production"
    request_timeout: int = 25
    max_video_duration: int = 7200
    enable_redis: bool = False
    enable_postgres: bool = False
    enable_rate_limit: bool = False
    class Config:
        env_file = ".env"
        extra = "ignore"
 settings = Settings()
--- a/core/errors.py
+++ b/core/errors.py
@ -0,0 +1,21 @@
 from fastapi import HTTPException
 def bad_request(message: str, code: str = "BAD_REQUEST"):
    raise HTTPException(
        status_code=400,
        detail={
            "error": message,
            "code": code,
        },
    )
 def not_found(message: str, code: str = "NOT_FOUND"):
    raise HTTPException(
        status_code=404,
        detail={
            "error": message,
            "code": code,
        },
    )
--- a/requirements.txt
+++ b/requirements.txt
@ -0,0 +1,6 @@
 fastapi==0.128.0
 uvicorn==0.40.0
 pydantic==2.12.5
 pydantic-settings==2.12.0
 webvtt-py==0.5.1
 yt-dlp==2026.1.31
--- a/routes/health.py
+++ b/routes/health.py
@ -0,0 +1,8 @@
 from fastapi import APIRouter
 router = APIRouter()
@router.get("/health")
 def health():
    return {"status": "ok"}
--- a/routes/transcript.py
+++ b/routes/transcript.py
@ -0,0 +1,45 @@
 from fastapi import APIRouter, Query
 from utils.validators import validate_youtube_url
 from utils.filesystem import temp_dir
 from core.errors import not_found
 from services.ytdlp import extract_metadata_and_captions
 from services.captions import parse_vtt
 from services.metadata import normalize_metadata
 from schemas.transcript import TranscriptResponse
 router = APIRouter()
@router.post("/transcript", response_model=TranscriptResponse)
 def transcript(
    url: str = Query(..., description="YouTube video URL"),
 ):
    validate_youtube_url(url)
    with temp_dir() as tmp:
        metadata, caption_files = extract_metadata_and_captions(url, tmp)
        if not caption_files:
            not_found("No captions available for this video", "NO_CAPTIONS")
        human = [p for p in caption_files if "auto" not in p.name.lower()]
        auto = [p for p in caption_files if "auto" in p.name.lower()]
        if human:
            caption_path = human[0]
            source = "human"
        elif auto:
            caption_path = auto[0]
            source = "auto"
        else:
            not_found("No captions available", "NO_CAPTIONS")
        captions = parse_vtt(str(caption_path))
        video = normalize_metadata(metadata)
    return TranscriptResponse(
        video=video,
        captions=captions,
        language="auto",
        source=source,
    )
--- a/schemas/error.py
+++ b/schemas/error.py
@ -0,0 +1,6 @@
 from pydantic import BaseModel
 class ErrorResponse(BaseModel):
    error: str
    code: str
--- a/schemas/transcript.py
+++ b/schemas/transcript.py
@ -0,0 +1,23 @@
 from pydantic import BaseModel
 from typing import List
 from typing import List, Literal
 class CaptionSegment(BaseModel):
    start: float
    end: float
    text: str
 class VideoMetadata(BaseModel):
    id: str
    title: str
    channel: str
    duration: int
    url: str
 class TranscriptResponse(BaseModel):
    video: VideoMetadata
    captions: List[CaptionSegment]
    language: str
    source: Literal["human", "auto"]
--- a/services/captions.py
+++ b/services/captions.py
@ -0,0 +1,51 @@
 import webvtt
 from schemas.transcript import CaptionSegment
 from typing import List
 def parse_vtt(path: str):
    segments = []
    for caption in webvtt.read(path):
        segments.append(
            CaptionSegment(
                start=_to_seconds(caption.start),
                end=_to_seconds(caption.end),
                text=caption.text.strip(),
            )
        )
    return dedupe_segments(segments)
 def _to_seconds(ts: str) -> float:
    h, m, rest = ts.split(":")
    s, ms = rest.split(".")
    return (
        int(h) * 3600
        + int(m) * 60
        + int(s)
        + int(ms) / 1000
    )
 def dedupe_segments(segments: List[CaptionSegment]) -> List[CaptionSegment]:
    cleaned = []
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue
        if cleaned:
            prev = cleaned[-1]
            prev_text = prev.text.strip()
            if prev_text and prev_text in text:
                cleaned[-1] = seg
                continue
        cleaned.append(seg)
    return cleaned
--- a/services/metadata.py
+++ b/services/metadata.py
@ -0,0 +1,11 @@
 from schemas.transcript import VideoMetadata
 def normalize_metadata(raw: dict) -> VideoMetadata:
    return VideoMetadata(
        id=raw["id"],
        title=raw["title"],
        channel=raw.get("uploader", ""),
        duration=raw.get("duration", 0),
        url=raw["webpage_url"],
    )
--- a/services/ytdlp.py
+++ b/services/ytdlp.py
@ -0,0 +1,50 @@
 import json
 import subprocess
 from pathlib import Path
 from typing import Tuple, List
 from core.errors import bad_request
 from core.config import settings
 def extract_metadata_and_captions(
    url: str,
    workdir: str,
 ) -> Tuple[dict, List[Path]]:
    cmd = [
        "yt-dlp",
        "--skip-download",
        "--write-subs",
        "--write-auto-subs",
        "--sub-format", "vtt",
        "--no-playlist",
        "--print-json",
        "-o", f"{workdir}/%(id)s",
        url,
    ]
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            check=True,
            timeout=settings.request_timeout,
        )
    except subprocess.TimeoutExpired:
        bad_request("yt-dlp timed out", "TIMEOUT")
    except subprocess.CalledProcessError:
        bad_request("Failed to extract video data", "YTDLP_ERROR")
    lines = result.stdout.splitlines()
    if not lines:
        bad_request("No metadata returned from yt-dlp", "EMPTY_RESPONSE")
    try:
        metadata = json.loads(lines[0])
    except json.JSONDecodeError:
        bad_request("Invalid metadata returned from yt-dlp", "INVALID_METADATA")
    subtitle_files = list(Path(workdir).glob("*.vtt"))
    return metadata, subtitle_files
--- a/utils/filesystem.py
+++ b/utils/filesystem.py
@ -0,0 +1,5 @@
 from tempfile import TemporaryDirectory
 def temp_dir():
    return TemporaryDirectory()
--- a/utils/validators.py
+++ b/utils/validators.py
@ -0,0 +1,28 @@
 from urllib.parse import urlparse, parse_qs
 from core.errors import bad_request
 YOUTUBE_DOMAINS = ("youtube.com", "www.youtube.com", "youtu.be")
 def validate_youtube_url(url: str):
    try:
        parsed = urlparse(url)
    except Exception:
        bad_request("Invalid YouTube URL", "INVALID_URL")
    if parsed.netloc not in YOUTUBE_DOMAINS:
        bad_request("Invalid YouTube URL", "INVALID_URL")
    if parsed.netloc == "youtu.be":
        if not parsed.path.strip("/"):
            bad_request("Invalid YouTube video URL", "INVALID_URL")
        return
    if parsed.path == "/watch":
        qs = parse_qs(parsed.query)
        if "v" not in qs or not qs["v"][0]:
            bad_request("Invalid YouTube video URL", "INVALID_URL")
        return
    bad_request("Invalid YouTube video URL", "INVALID_URL")