Building Semantic Search on Podcast Transcripts: Audio Transcription with OpenAI Whisper and Storage in ApertureDB (Part 1)
The author hosts 20+ AI-focused podcasts and wanted an easy way to search and rediscover knowledge shared across episodes, rather than manually revisiting recordings.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · Upload podcast audio files
Podcast audio files are uploaded to a directory to begin the transcription pipeline.
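As a rough illustration of this stage, the sketch below scans a directory for audio files and passes each one to a transcription function. It is a minimal sketch, not the author's code: the helper names (find_audio_files, transcribe_directory, whisper_transcriber), the extension list, and the "base" model size are all assumptions. The Whisper calls used (whisper.load_model and model.transcribe, which returns a dict with a "text" key) come from the open-source openai-whisper package, which also needs ffmpeg installed.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Assumed set of audio extensions; adjust to the files actually uploaded.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def find_audio_files(directory: str) -> List[Path]:
    """Return the audio files in `directory`, sorted for a stable order."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def transcribe_directory(
    directory: str, transcribe: Callable[[str], str]
) -> Dict[str, str]:
    """Map each audio file name to the text returned by `transcribe`."""
    return {p.name: transcribe(str(p)) for p in find_audio_files(directory)}


def whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
    """Build a transcriber backed by openai-whisper (imported lazily)."""
    import whisper  # assumption: `pip install openai-whisper` plus ffmpeg

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(path)["text"]
```

A call such as transcribe_directory("podcasts", whisper_transcriber()) would return one transcript per file, where "podcasts" is a hypothetical directory name. Passing the transcriber in as a function keeps the directory scan testable without loading a model.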
Tools used
ApertureDB, Google Colab
Outcome
In Part 1 of the series, podcast episodes were transcribed with OpenAI Whisper and stored in ApertureDB. 23 episodes were ingested successfully, and most of the transcribed content was described as accurate.
Results
Volume: 23
Grounding & classification
Source type: technical build writeup
12 fields verified against source quotes, 1 dropped as unverifiable.
speech to text, knowledge base, builder submitted, metric backed, tools described, workflow described, software, technical build writeup, back office ops, document to record