home.social

#pyannote_audio — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #pyannote_audio, aggregated by home.social.

  1. ----------------

    🛠️ Tool: meetscribe — Local meeting capture, diarization and summaries
    ===================

    meetscribe is a locally‑run meeting capture and transcription tool that records dual‑channel audio (user mic and remote system audio) at the OS level and produces diarized transcripts, time‑aligned text, AI‑generated summaries, and a polished PDF export. The project chains several open components to provide an end‑to‑end offline workflow for meetings.

    Architecture and core components
    • Audio capture: records mic and remote audio as separate channels via PipeWire or PulseAudio; ffmpeg handles recording and file creation.
    • ASR and alignment: uses WhisperX for batched inference with the openai/whisper-large-v3-turbo model, then aligns word‑level timestamps with wav2vec2‑based forced alignment.
    • Speaker diarization: uses pyannote‑audio to assign speech segments to speakers; the dual‑channel signal enables automatic YOU/REMOTE labeling.
    • Local LLM summaries: integrates with local LLM runtimes (Ollama) to extract key topics, action items, decisions, and follow‑ups without sending data to cloud services.
    • Outputs and UX: produces multiple export formats (.txt, .srt, .json, .summary.md, and a professionally formatted PDF containing summary plus full transcript) and exposes both a small GTK3 always‑on widget for recording control and a command‑line interface for scripted workflows.
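
The post does not say how the dual‑channel signal is turned into YOU/REMOTE labels. One plausible approach, sketched here under that assumption, is to compare the loudness of the mic and remote channels over each diarized segment and label the segment by whichever channel is louder. The names `rms` and `label_segments`, and the plain-list sample format, are hypothetical and not taken from meetscribe.

```python
import math
from typing import List, Sequence, Tuple


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a run of samples (0.0 for an empty run)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def label_segments(
    segments: Sequence[Tuple[float, float]],
    mic: Sequence[float],
    remote: Sequence[float],
    sample_rate: int,
) -> List[str]:
    """Label each (start, end) segment, in seconds, as "YOU" or "REMOTE".

    Assumption: `mic` and `remote` are time-aligned mono channels sampled
    at `sample_rate`. A segment goes to whichever channel is louder.
    """
    labels = []
    for start, end in segments:
        lo = int(start * sample_rate)
        hi = int(end * sample_rate)
        mic_level = rms(mic[lo:hi])
        remote_level = rms(remote[lo:hi])
        labels.append("YOU" if mic_level >= remote_level else "REMOTE")
    return labels


if __name__ == "__main__":
    rate = 10  # toy sample rate so the example stays readable
    mic = [0.5] * rate + [0.0] * rate      # user speaks in the first second
    remote = [0.0] * rate + [0.5] * rate   # remote speaks in the second
    print(label_segments([(0.0, 1.0), (1.0, 2.0)], mic, remote, rate))
```

A real implementation would also need to handle crosstalk (remote audio leaking into the mic) and overlapping speech, which a simple loudness comparison does not address.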

    Operational details and requirements
    • Platform: Linux with PipeWire or PulseAudio. The tool is designed to work with any meeting app that plays audio through the system (Zoom, Meet, Teams, Slack, Discord, etc.).
    • Models and tokens: diarization requires a HuggingFace access token for the pyannote‑audio models; ASR relies on WhisperX and its model artifacts. Local LLM summarization is optional and requires a local LLM runtime and model.
    • Hardware: GPU acceleration is supported and recommended (NVIDIA CUDA, 8GB+ VRAM suggested) for faster inference; CPU mode is available but slower.
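
The requirements above amount to two runtime decisions: GPU or CPU, and whether a token is available for diarization. The following is a minimal sketch of that logic, not meetscribe's actual startup code. The function name `pick_runtime`, the use of the `HF_TOKEN` environment variable, and the `float16`/`int8` compute types are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def pick_runtime(env: Optional[Mapping[str, str]] = None) -> dict:
    """Choose inference settings from the local environment.

    Assumptions (not confirmed by the post): the token is read from the
    HF_TOKEN environment variable, and half precision is used on GPU
    with int8 as the CPU fallback.
    """
    env = os.environ if env is None else env

    # PyTorch is optional here: without it, fall back to CPU mode.
    try:
        import torch
        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False

    return {
        "device": "cuda" if has_cuda else "cpu",
        "compute_type": "float16" if has_cuda else "int8",
        # Without a token the diarization models cannot be downloaded.
        "diarization_enabled": bool(env.get("HF_TOKEN")),
    }


if __name__ == "__main__":
    print(pick_runtime())
```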

    Capabilities and limitations
    • Capabilities: reliable dual‑channel capture, word‑level timestamps, speaker diarization with automatic YOU/REMOTE labels, offline LLM summaries, organized per‑session folders, and multi‑format exports including a professional PDF.
    • Limitations: Linux‑centric; diarization depends on a HuggingFace model access token; LLM summaries require a local LLM runtime and model artifacts. Performance and latency depend on local hardware.
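
To illustrate how speaker‑labeled, time‑aligned segments map onto one of the listed export formats, here is a small sketch that renders segments as SubRip (.srt) text. The segment tuple layout and the `SPEAKER: text` line format are assumptions; the post does not show meetscribe's actual output.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str, str]  # (start_s, end_s, speaker, text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}"
        )
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    demo = [
        (0.0, 1.5, "YOU", "Can everyone hear me?"),
        (1.8, 3.2, "REMOTE", "Yes, loud and clear."),
    ]
    print(to_srt(demo), end="")
```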

    🔹 meetscribe #WhisperX #pyannote_audio #Ollama #PipeWire

    🔗 Source: github.com/pretyflaco/meetscri