Project
SkyWatch
Nine camera feeds, one grid, automatic missile-streak detection.
- Role: Engineer
- Stack: Python 3, OpenCV, ffmpeg (AMF / D3D11VA / VideoToolbox), MOG2, Hungarian tracker
- Metrics: 9 parallel 1080p streams · ~10 FPS/stream · ~15 s glass-to-browser latency · sub-second detection-to-alert
During the war, the only way to see what was happening over the city was to pull up a YouTube Live stream from a cousin’s rooftop and watch. One stream, manual channel-hopping, no notification when something actually happened. The information already existed: dozens of public cameras were pointed at the sky, many of them live on YouTube 24/7. What was missing was any way to watch them in aggregate, or to be told when a streak or flash appeared in any one of them. The latency floor is whatever YouTube Live imposes (~15 s glass-to-browser); everything SkyWatch adds on top of that has to be as close to zero as possible. A 2 s detection delay is fine. A 20 s one is useless.
What it does
SkyWatch ingests nine public camera feeds — YouTube Live, RTSP, HLS — in parallel, displays them in a TV-mounted grid, and runs a computer-vision pipeline on each feed that looks for two things: streaks (fast-moving objects consistent with missile trails) and flashes (bright interception events). When a detection fires above threshold, the grid auto-focuses on that camera, an audio alert plays, and a clip is saved to disk with a JSON metadata sidecar.
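The clip-plus-sidecar step can be sketched minimally, assuming the clip has already been written and the sidecar shares its base name. The function names and JSON fields (`sidecar_path`, `sidecar_metadata`, `write_sidecar`, `camera_id`, `kind`, and so on) are hypothetical, because the actual schema isn't described here.

```python
import json
import time
from pathlib import Path


def sidecar_path(clip_path):
    """Place the metadata next to the clip: cam3_1700000000.mp4 -> cam3_1700000000.json."""
    return Path(clip_path).with_suffix(".json")


def sidecar_metadata(clip_path, camera_id, kind, confidence, detected_at=None):
    """Build the (hypothetical) metadata record for one saved detection clip."""
    return {
        "clip": Path(clip_path).name,
        "camera_id": camera_id,
        "kind": kind,                                # "streak" or "flash"
        "confidence": round(float(confidence), 3),
        "detected_at": time.time() if detected_at is None else detected_at,
    }


def write_sidecar(clip_path, **fields):
    """Serialize the metadata to disk next to the clip and return the sidecar's path."""
    path = sidecar_path(clip_path)
    meta = sidecar_metadata(clip_path, **fields)
    path.write_text(json.dumps(meta, indent=2))
    return path
```

Storing the metadata in a sidecar rather than in the filename keeps the MP4 playable in any player while still letting tooling scan the `.json` files.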
Detections from independent cameras within the same 3-second window are correlated to reduce false positives: a single-camera event needs 0.6 confidence to fire on its own, but the bar drops to 0.4 once a second camera reports a detection in the same window. Birds, clouds, and passing planes mostly don’t survive that filter.
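The confirmation rule can be sketched with fixed 3-second buckets, as described. `Detection`, `Correlator`, the shared clock, and the requirement that the corroborating camera itself clear 0.4 are all illustrative assumptions, not the project's code.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_S = 3.0          # correlation window, from the text
SOLO_THRESHOLD = 0.6    # confidence needed to alert from a single camera
PAIRED_THRESHOLD = 0.4  # lowered bar once a second camera agrees


@dataclass
class Detection:
    camera_id: str
    t: float            # detection time in seconds (a shared clock is assumed)
    confidence: float


class Correlator:
    """Hypothetical cross-camera correlator over fixed time buckets."""

    def __init__(self):
        self.buckets = defaultdict(list)   # bucket index -> detections in that bucket

    def submit(self, det):
        """Record a detection and return True if it should raise an alert."""
        bucket = int(det.t // WINDOW_S)
        # Forget buckets that are too old to matter; keep the previous one
        # so slightly late detections can still be matched.
        for old in [b for b in self.buckets if b < bucket - 1]:
            del self.buckets[old]
        peers = self.buckets[bucket]
        # Agreement must come from a *different* camera clearing the paired bar.
        agreed = any(p.camera_id != det.camera_id and p.confidence >= PAIRED_THRESHOLD
                     for p in peers)
        peers.append(det)
        if det.confidence >= SOLO_THRESHOLD:
            return True
        return det.confidence >= PAIRED_THRESHOLD and agreed
```

In this sketch only the second, corroborating detection raises the alert. Fixed buckets also have an edge effect: detections at 2.9 s and 3.1 s land in different windows. One workaround is to check the neighbouring bucket as well, at the cost of a wider effective window.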
Hard parts
Nine concurrent decoders, one machine. OpenCV’s VideoCapture was the first try and the wrong one — nine concurrent decoders contended for the global decoder state and dropped frames in bursts. Per-stream ffmpeg subprocesses with hardware decode (AMD AMF on Windows, VideoToolbox on macOS, D3D11VA as fallback) scaled cleanly to nine 1080p streams at ~10 FPS each. Each camera is its own thread spawning its own ffmpeg process; the Python side only handles the decoded frames.
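One camera's ingest path can be sketched under a few assumptions. First, ffmpeg is on `PATH`. Second, the source URL is something ffmpeg can open directly, such as an RTSP URL or an HLS manifest; a YouTube Live page has to be resolved to its manifest first. Third, the hardware decoder is chosen with ffmpeg's `-hwaccel` option (`d3d11va` or `videotoolbox`). How the AMF path was selected isn't described, so it is left out. `build_ffmpeg_cmd`, `read_frames`, `camera_worker`, and `start_camera` are hypothetical names.

```python
import subprocess
import threading


def build_ffmpeg_cmd(url, hwaccel, width=1920, height=1080, fps=10):
    """ffmpeg command that decodes one stream and writes raw BGR frames to stdout."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-hwaccel", hwaccel,                    # e.g. "d3d11va" or "videotoolbox"
        "-i", url,
        # Resample to a fixed rate and size so every frame is exactly width*height*3 bytes.
        "-vf", f"fps={fps},scale={width}:{height}",
        "-f", "rawvideo", "-pix_fmt", "bgr24",  # BGR matches OpenCV's channel order
        "pipe:1",
    ]


def read_frames(stream, width, height):
    """Yield one raw frame (bytes) at a time from a binary stream."""
    frame_size = width * height * 3
    while True:
        buf = stream.read(frame_size)
        if len(buf) < frame_size:               # EOF or a truncated final frame
            return
        yield buf


def camera_worker(url, hwaccel, on_frame, width=1920, height=1080):
    """Run one ffmpeg decoder and hand each frame to the Python-side pipeline."""
    proc = subprocess.Popen(build_ffmpeg_cmd(url, hwaccel, width, height),
                            stdin=subprocess.DEVNULL, stdout=subprocess.PIPE)
    try:
        for raw in read_frames(proc.stdout, width, height):
            # e.g. numpy.frombuffer(raw, numpy.uint8).reshape(height, width, 3)
            on_frame(raw)
    finally:
        proc.kill()
        proc.wait()


def start_camera(url, hwaccel, on_frame):
    """One thread per camera, each owning its own ffmpeg process."""
    t = threading.Thread(target=camera_worker, args=(url, hwaccel, on_frame), daemon=True)
    t.start()
    return t
```

The decode work happens inside each ffmpeg process. The Python thread only blocks on `read()` of the pipe, so nine of them do not contend for a shared decoder.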
False-positive budget. Streak signatures look like a lot of things: a jet contrail, a sunbeam cutting between clouds, a flock of birds in low light. A five-stage pipeline (ROI crop → MOG2 background subtraction → streak / flash filters → Hungarian multi-frame tracking → false-positive rejection) discards 50–90% of its input at each stage, so nothing heavy runs on frames that didn’t survive the light checks. Cross-camera temporal correlation was the single biggest FP killer: requiring two cameras to see the same event within a time-bucketed window beat any amount of per-stream threshold tuning.
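The early-exit structure can be sketched generically. Each stage either returns something to pass along or `None` to reject, and a counter records how many items each stage saw and dropped. The stage names mirror the list above, but the predicates and feature fields (`fg_fraction`, `elongation`, `track_len`, `speed_px_s`) are hypothetical stand-ins. The real stages transform frames into crops, foreground masks, and tracks rather than reading precomputed numbers.

```python
from collections import Counter


def run_cascade(item, stages, stats):
    """Run cheap-to-expensive stages in order; stop at the first rejection.

    Each stage returns the (possibly transformed) item to pass on, or None to reject.
    """
    for name, stage in stages:
        stats[f"{name}.in"] += 1
        item = stage(item)
        if item is None:
            stats[f"{name}.rejected"] += 1
            return None
    return item


# Hypothetical stand-ins for the five stages, operating on precomputed features.
STAGES = [
    ("roi", lambda c: c if c["in_roi"] else None),
    ("mog2", lambda c: c if c["fg_fraction"] > 0.001 else None),
    ("streak_flash", lambda c: c if c["elongation"] >= 4.0 else None),
    ("tracking", lambda c: c if c["track_len"] >= 3 else None),
    ("fp_reject", lambda c: c if c["speed_px_s"] >= 200.0 else None),
]


def rejection_rates(stats, stages):
    """Fraction of each stage's input that it rejected."""
    return {name: stats[f"{name}.rejected"] / stats[f"{name}.in"]
            for name, _ in stages if stats[f"{name}.in"]}
```

Ordering stages from cheapest to most expensive is what makes the per-stage rejection rate pay off. If each stage drops half its input or more, the tracker and the final rejection stage see only a small fraction of the frames.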
Reliability budget
Pre-detection context. Every camera holds a 10-second rolling ring buffer of raw frames. When a detection fires, the pre-buffer plus the post-detection frames are written to an MP4, so the operator sees the full context, not just the moment the alert went off. Without the pre-buffer the clips were useless: by the time the alert fired, the streak was already exiting the frame.
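Here is a minimal ring-buffer sketch using `collections.deque(maxlen=...)`, which drops the oldest frame automatically. The 10 s pre-buffer comes from the text. The 5 s post-detection length, the `PreBuffer` name, and the one-clip-at-a-time policy are assumptions. Encoding the returned frames to MP4 (for example with OpenCV's `cv2.VideoWriter`) is left out.

```python
from collections import deque


class PreBuffer:
    """Hypothetical per-camera buffer: rolling pre-detection frames plus post-detection capture."""

    def __init__(self, fps=10, pre_seconds=10, post_seconds=5):
        self.ring = deque(maxlen=int(fps * pre_seconds))  # oldest frame drops off automatically
        self.post_frames = max(1, int(fps * post_seconds))
        self.pending = None    # clip being assembled after a trigger
        self.remaining = 0     # post-detection frames still to collect

    def trigger(self):
        """Start a clip by snapshotting the pre-buffer. Ignored if a clip is already in progress."""
        if self.pending is None:
            self.pending = list(self.ring)
            self.remaining = self.post_frames

    def push(self, frame):
        """Add a frame; return the finished clip (list of frames) once it is complete."""
        self.ring.append(frame)
        if self.pending is None:
            return None
        self.pending.append(frame)
        self.remaining -= 1
        if self.remaining > 0:
            return None
        clip, self.pending = self.pending, None
        return clip
```

Suppose the buffer holds decoded 1080p BGR frames. Then 10 s at ~10 FPS is 100 frames of about 6.2 MB each, roughly 620 MB per camera before multiplying by nine.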
Day / night mode switching. Streak and flash signatures differ sharply between bright-sky and dark-sky conditions. The system reads average frame brightness and picks thresholds accordingly — no manual mode selection. The transition windows (dusk, dawn) are the adversarial cases; the FP rate spikes briefly and settles.
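Brightness-driven threshold selection can be sketched as follows. The hysteresis band is an addition not described in the text: the selector switches to night below one brightness level and back to day only above a higher one. It is one common way to keep the mode from flapping through dusk and dawn. Every name and number here (`Thresholds`, `ModeSelector`, the 70/90 cut-offs on a 0–255 scale, the threshold values) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_streak_length_px: int
    flash_delta: float  # brightness jump that counts as a flash


# Hypothetical values; the real thresholds are not published.
DAY = Thresholds(min_streak_length_px=40, flash_delta=60.0)
NIGHT = Thresholds(min_streak_length_px=25, flash_delta=25.0)


class ModeSelector:
    """Pick day or night thresholds from mean frame brightness, with a hysteresis band."""

    def __init__(self, to_night_below=70.0, to_day_above=90.0):
        self.to_night_below = to_night_below
        self.to_day_above = to_day_above
        self.mode = "day"

    def update(self, mean_brightness):
        """mean_brightness: average grayscale value of the frame, 0-255,
        e.g. cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).mean()."""
        if self.mode == "day" and mean_brightness < self.to_night_below:
            self.mode = "night"
        elif self.mode == "night" and mean_brightness > self.to_day_above:
            self.mode = "day"
        return DAY if self.mode == "day" else NIGHT
```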
Result
Built for one person’s apartment. Ended up useful enough to keep running.
It still runs on that apartment TV. The thing I’d point to isn’t the detector — it’s that nine hardware-decoded streams and a five-stage filter held up, unattended, on the nights it actually mattered.
Source private — built for personal/emergency use, not open-sourced.
What I’d do differently
Skip the ONNX classifier entirely. I built it as a confirmation layer running on the DirectML path, but in practice the pure CV pipeline was reliable enough that the model never paid for its startup cost. A tighter no-ML scope would have shipped a week sooner.