AI Meeting Intelligence System: Automated Transcription, Speaker Diarization, and Retrieval-Augmented Summarization over Meeting Audio

Authors

  • Pranjal Bhangare, Student, Vishwakarma Institute of Information Technology
  • Anushka Gargelwar, Student, Vishwakarma Institute of Information Technology
  • Krishna Garg, Student, Vishwakarma Institute of Information Technology
  • Amit Bhande, Student, Vishwakarma Institute of Information Technology

DOI:

Keywords:

Automatic Speech Recognition, Speaker Diarization, Retrieval-Augmented Generation, Large Language Models, Meeting Summarization, Knowledge Graph, FastAPI, Next.js, ChromaDB, MongoDB, Whisper, pyannote.audio

Abstract

Meetings drive most organizational decision-making, yet the knowledge they produce rarely survives in usable form. Notes are scattered, action items go untracked, and recordings sit unwatched. This paper presents an AI Meeting Intelligence System that addresses this problem end to end. The system accepts a recorded audio or video file, runs transcription through a locally deployed Whisper model, applies WhisperX and pyannote.audio for speaker attribution, and passes the labeled transcript to a large language model, either Google Gemini 2.5 Flash or Groq-hosted Llama 3.3, to produce structured summaries, action item lists, and sentiment assessments. A Retrieval-Augmented Generation (RAG) module stores chunk-level embeddings in ChromaDB so users can query any past meeting in plain English. Each meeting also yields an entity-relationship graph, extracted by the LLM and rendered interactively in the browser via Cytoscape.js. The backend is FastAPI with asynchronous MongoDB persistence; the frontend is Next.js 14 in TypeScript. Evaluation on real recordings shows that a 30-minute meeting is fully processed in roughly four minutes on a standard laptop, with Word Error Rates competitive with published Whisper benchmarks.
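
The speaker attribution step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not call WhisperX or pyannote.audio. It assumes the transcriber yields timed text segments and the diarizer yields timed speaker turns, then labels each segment with the speaker whose turns overlap it the most. The names `Segment`, `Turn`, `label_segments`, and `to_transcript` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of audio (hypothetical shape of ASR output)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    """A diarized speaker turn (hypothetical shape of diarizer output)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="UNKNOWN"):
    """Assign each segment the speaker with the largest total overlap."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Segments that no turn covers fall back to a placeholder label.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg.text))
    return labeled


def to_transcript(labeled) -> str:
    """Render labeled segments as 'SPEAKER: text' lines for an LLM prompt."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in labeled)
```

A labeled transcript in this form is the kind of input the summarization and RAG stages would consume. Real pipelines typically align at the word level rather than the segment level, which this sketch does not attempt.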

Published

2026-04-30

How to Cite

[1]
Pranjal Bhangare, “AI Meeting Intelligence System: Automated Transcription, Speaker Diarization, and Retrieval-Augmented Summarization over Meeting Audio”, Int. J. Web Multidiscip. Stud., pp. 400-408, 2026-04-30. doi: .