#pyannote_audio — Public Fediverse posts on home.social

----------------

🛠️ Tool: meetscribe — Local meeting capture, diarization and summaries
===================

meetscribe is a locally‑run meeting capture and transcription tool that records dual‑channel audio (user mic and remote system audio) at the OS level and produces diarized transcripts, time‑aligned text, AI‑generated summaries, and a polished PDF export. The project chains several open components to provide an end‑to‑end offline workflow for meetings.

Architecture and core components
• Audio capture: records mic and remote audio as separate channels via PipeWire or PulseAudio, with ffmpeg handling recording and file creation.
• ASR and alignment: uses WhisperX for batched inference with the openai/whisper-large-v3-turbo model, then aligns word‑level timestamps using a wav2vec2‑based alignment model.
• Speaker diarization: uses pyannote‑audio to assign speech segments to speakers; the dual‑channel signal enables automatic YOU/REMOTE labeling.
• Local LLM summaries: integrates with local LLM runtimes (Ollama) to extract key topics, action items, decisions, and follow‑ups without sending data to cloud services.
• Outputs and UX: produces multiple export formats (.txt, .srt, .json, .summary.md, and a professionally formatted PDF containing summary plus full transcript) and exposes both a small GTK3 always‑on widget for recording control and a command‑line interface for scripted workflows.
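
The post does not say how meetscribe turns the two channels into YOU/REMOTE labels. One plausible approach, shown here only as an assumption, compares the RMS energy of the mic and remote channels over each transcript segment. The function names, the per-channel sample lists, and the `ratio` threshold are all hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square energy of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def label_segment(mic, remote, start, end, sample_rate=16000, ratio=1.5):
    """Label a [start, end) segment (in seconds) as YOU, REMOTE, or UNKNOWN.

    mic and remote are hypothetical per-channel sample lists; ratio is an
    assumed dominance threshold, not a value taken from meetscribe.
    """
    lo, hi = int(start * sample_rate), int(end * sample_rate)
    mic_e, remote_e = rms(mic[lo:hi]), rms(remote[lo:hi])
    if mic_e > ratio * remote_e:
        return "YOU"
    if remote_e > ratio * mic_e:
        return "REMOTE"
    return "UNKNOWN"  # both channels silent, or overlapping speech
```

A segment where neither channel clearly dominates is left as UNKNOWN in this sketch; a real tool could fall back on the diarization result in that case.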

Operational details and requirements
• Platform: Linux with PipeWire or PulseAudio. The tool is designed to work with any meeting app that plays audio through the system (Zoom, Meet, Teams, Slack, Discord, etc.).
• Models and tokens: diarization requires a Hugging Face access token for the pyannote‑audio models; ASR relies on WhisperX and its model artifacts. Local LLM summarization is optional and requires a local LLM runtime and model.
• Hardware: GPU acceleration is supported and recommended (NVIDIA CUDA, 8 GB+ VRAM suggested) for faster inference; CPU mode is available but slower.
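
To make the PipeWire/PulseAudio requirement concrete, here is a minimal sketch of how two audio sources could be recorded into one stereo file with ffmpeg. It is not meetscribe's actual invocation, which the post does not show. The source names and the output path are hypothetical placeholders; the sketch assumes ffmpeg's `pulse` input device is available (PipeWire typically provides it through its PulseAudio compatibility layer).

```python
import shlex

def build_capture_command(mic_source, monitor_source, out_path):
    """Build an ffmpeg argument list that records two mono sources into one stereo file.

    Channel 0 carries the mic, channel 1 the remote (system) audio.
    """
    return [
        "ffmpeg",
        "-f", "pulse", "-ac", "1", "-i", mic_source,      # input 0: microphone
        "-f", "pulse", "-ac", "1", "-i", monitor_source,  # input 1: monitor of the output sink
        "-filter_complex", "[0:a][1:a]amerge=inputs=2[out]",  # merge two mono streams into stereo
        "-map", "[out]",
        out_path,
    ]

cmd = build_capture_command(
    "alsa_input.example-mic",            # hypothetical source name
    "alsa_output.example-sink.monitor",  # hypothetical monitor source name
    "meeting.wav",
)
print(shlex.join(cmd))
```

Real source names can be listed with `pactl list short sources`; passing the resulting list to `subprocess.run` would start the recording.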

Capabilities and limitations
• Capabilities: reliable dual‑channel capture, word‑level timestamps, speaker diarization with automatic YOU/REMOTE labels, offline LLM summaries, organized per‑session folders, and multi‑format exports including a professional PDF.
• Limitations: Linux‑centric; diarization depends on a Hugging Face access token; LLM summaries require a local LLM runtime and model artifacts. Performance and latency depend on local hardware.
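
As an illustration of the .srt export mentioned above, the following sketch renders speaker-labeled segments as SubRip text. The segment dictionary layout (`start`, `end`, `speaker`, `text`) and the bracketed speaker prefix are assumptions for this example, not meetscribe's documented format.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments):
    """Render segments (dicts with start, end, speaker, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"[{seg['speaker']}] {seg['text']}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)

segments = [  # hypothetical sample data
    {"start": 0.0, "end": 2.5, "speaker": "YOU", "text": "Shall we start?"},
    {"start": 2.5, "end": 5.0, "speaker": "REMOTE", "text": "Yes, go ahead."},
]
print(to_srt(segments))
```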

🔹 meetscribe #WhisperX #pyannote_audio #Ollama #PipeWire

🔗 Source: https://github.com/pretyflaco/meetscribe