video-use

Claude Code skill from browser-use that turns raw footage in a folder into final.mp4, using ElevenLabs Scribe for word-level audio timestamps.

Table of Contents

What It Is

A Claude Code skill from the browser-use team that turns video editing into a conversation. Point it at a folder of raw clips, describe what you want, get final.mp4. Handles filler-word removal, color grading, audio fades, subtitles, and overlay animations.

Language

Python (~75%). Uses ffmpeg for cutting and ElevenLabs Scribe for word-level transcript timestamps. Optional yt-dlp for pulling reference footage.

Install

git clone https://github.com/browser-use/video-use.git
ln -s "$(pwd)/video-use" ~/.claude/skills/video-use
pip install -r video-use/requirements.txt
export ELEVENLABS_API_KEY=...

Requires ffmpeg on PATH.

Value

The clever design choice is "text + on-demand visuals": instead of feeding raw frames to the LLM, it reasons over the audio transcript with timestamps, and only renders visual composites when needed. Keeps token usage low and lets FFmpeg do the heavy work.

Capabilities worth noting:

  • Removes filler words and dead space between takes
  • Adds 30ms audio fades at cut points to prevent artifacts
  • Burns in customizable subtitles
  • Generates overlays via Manim, Remotion, or PIL
  • Self-evaluates rendered output at cut boundaries before handing back

links

social