AI Podcast Assistant

A comprehensive toolkit for generating detailed notes, summaries, and translations of podcast content using the Phi-4-Multimodal LLM with NVIDIA NIM Microservices.

Overview

This repository contains a Jupyter notebook that demonstrates a complete workflow for processing podcast audio:

Notes generation: Convert spoken content from podcasts into detailed text notes
Summarization: Generate concise summaries of the transcribed content
Translation: Translate both the notes and the summary into other languages

The implementation uses the Phi-4-Multimodal LLM (5.6B parameters) through NVIDIA NIM Microservices and splits long recordings into chunks, so long-form audio can be processed piece by piece.

Learn more about the model here.

Features

Long Audio Processing: Automatically chunks long audio files for processing
Detailed Notes Generation: Creates well-formatted, detailed notes from audio content
Summarization: Generates concise summaries capturing key points
Translation: Translates content to multiple languages while preserving formatting
File Export: Saves results as text files for easy sharing and reference
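
The chunking idea behind Long Audio Processing can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function names `chunk_bounds` and `split_audio` and the 30-second default chunk length are assumptions made for the example. The pydub calls used (`AudioSegment.from_file`, `len()` in milliseconds, and millisecond slicing) are standard pydub behavior.

```python
def chunk_bounds(total_ms, chunk_ms=30_000):
    """Return (start, end) millisecond pairs that cover [0, total_ms)."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]


def split_audio(path, chunk_ms=30_000):
    """Load an audio file with pydub and slice it into fixed-length chunks."""
    # Imported lazily so chunk_bounds can be used without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(path)
    # len() of an AudioSegment is its duration in milliseconds,
    # and slice indices are also in milliseconds.
    return [audio[start:end] for start, end in chunk_bounds(len(audio), chunk_ms)]
```

The last chunk is simply shorter than the others when the recording length is not a multiple of the chunk length.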

Requirements

Python 3.10+
Jupyter Notebook or JupyterLab
NVIDIA API Key (see Installation section for setup instructions)
Required Python packages:
- requests
- base64 (part of the Python standard library; no installation needed)
- pydub
- Pillow (PIL)
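
Before opening the notebook, the third-party packages can be checked with a short script. This is a hedged sketch: the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced here. `base64` is left out because it ships with Python.

```python
import importlib.util

# Import names can differ from install names (Pillow is imported as PIL).
REQUIRED = {"requests": "requests", "pydub": "pydub", "PIL": "Pillow"}


def missing_packages(required=REQUIRED):
    """Return the pip install names of packages that cannot be imported."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```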

Installation

Clone this repository:

git clone https://github.com/NVIDIA/GenerativeAIExamples.git
cd GenerativeAIExamples/community/ai-podcast-assistant

Set up your NVIDIA API key:
- Sign up for NVIDIA NIM Microservices
- Generate an API key
- Replace the placeholder in the notebook with your API key
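
As an alternative to pasting the key into the notebook, the key can be read from an environment variable so it never ends up in a saved or shared notebook. This is a sketch under stated assumptions: the variable name `NVIDIA_API_KEY` and the helpers `load_api_key` and `auth_headers` are illustrative and are not taken from the notebook.

```python
import os


def load_api_key(env_var="NVIDIA_API_KEY"):
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your NVIDIA API key."
        )
    return key


def auth_headers(api_key):
    """Build headers for a bearer-token authenticated JSON request."""
    return {"Authorization": f"Bearer {api_key}", "Accept": "application/json"}
```

Set the variable in the shell before starting Jupyter, then call `load_api_key()` in the cell where the placeholder would otherwise be replaced.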

Usage

Open the Jupyter notebook:

jupyter notebook ai-podcast-assistant-phi4-mulitmodal.ipynb

Update the podcast_audio_path variable with the path to your audio file.
Run the notebook cells sequentially to:
- Process the audio file
- Generate detailed notes
- Create a summary
- Translate the content (optional)
- Save results to text files
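
The order of these steps can be sketched in plain Python. This is an illustration only, not the notebook's implementation: `generate` is a hypothetical callable standing in for whatever function sends one request to the model and returns its text, and the prompts are placeholders. Audio is base64-encoded because binary data cannot be placed directly in a JSON request body.

```python
import base64


def encode_audio(raw_bytes):
    """Base64-encode raw audio bytes so they can be embedded in a JSON request."""
    return base64.b64encode(raw_bytes).decode("ascii")


def run_pipeline(audio_chunks, generate, target_language=None):
    """Run notes -> summary -> optional translation over a list of audio chunks.

    `generate(prompt, audio_b64=None)` is a hypothetical callable that sends
    one request to the model and returns its text response.
    """
    # One notes request per chunk, then join the partial notes in order.
    notes_parts = [
        generate("Write detailed bullet-point notes for this audio.",
                 audio_b64=encode_audio(chunk))
        for chunk in audio_chunks
    ]
    notes = "\n".join(notes_parts)

    # The summary is produced from the text notes, not from the audio.
    summary = generate("Summarize these notes in one concise paragraph:\n" + notes)

    results = {"notes": notes, "summary": summary}
    if target_language:  # translation is optional
        results["translation"] = generate(
            f"Translate the following into {target_language}, "
            f"preserving formatting:\n{notes}\n\n{summary}"
        )
    return results
```

Passing the model call in as a parameter keeps the sketch independent of any particular endpoint or request format.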

Example Output

The notebook generates:

Detailed Notes: Bullet-pointed notes capturing the main content of the podcast
Summary: A concise paragraph summarizing the key points
Translation: The notes and summary translated to your chosen language

All outputs are saved as text files for easy reference and sharing.
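
Saving the outputs can be sketched with the standard library alone. The helper name `save_results`, the `output` directory, and the one-file-per-result naming are assumptions for this example; the notebook may name its files differently.

```python
from pathlib import Path


def save_results(results, out_dir="output"):
    """Write each non-empty result to <out_dir>/<name>.txt and return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for name, text in results.items():
        if not text:
            continue  # skip steps that were not run, such as translation
        target = out_path / f"{name}.txt"
        # Explicit UTF-8 keeps non-English translations intact on every platform.
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```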

Model Details

The Phi-4-Multimodal LLM used in this project has the following specifications:

Parameters: 5.6B
Inputs: Text, Image, Audio
Context Length: 128K tokens
Training Data: 5T text tokens, 2.3M speech hours, 1.1T image-text tokens
Supported Languages: Multilingual text and audio (English, Chinese, German, French, etc.)

Acknowledgments

Microsoft for providing access to the Phi-4-Multimodal model
NVIDIA for providing access to the NIM microservices preview

Contact

For questions, feedback, or collaboration opportunities, please open an issue in this repository.
