Setting Up a Codex AI Agent: From Installation on Ubuntu to Commands in Telegram
The transcript covers setting up a Codex AI agent with voice message support in Telegram, starting from the limitation that a standard ChatGPT subscription does not include access to Whisper. The speaker walks through obtaining an OpenAI API key to enable Whisper transcription and debugging a file extension error with OGA voice files. By the end, the bot successfully receives and transcribes Telegram voice messages.
Summary
The video begins by highlighting a key limitation: a standard ChatGPT subscription does not support tools like Whisper, which is required for transcribing voice messages sent to a Telegram bot. The speaker explains that without Whisper, the bot simply cannot process voice input, and an additional tool must be configured to enable this functionality.
To resolve this, the speaker walks through the process of obtaining an OpenAI API key. This involves logging into the OpenAI platform at platform.openai.com, navigating to the settings and API keys section, creating a new secret API key tied to a specific project, and then sending that key to the agent so it can authenticate and use Whisper when voice messages are received.
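The video does not show how the agent stores the key once it receives it. As a minimal sketch, assuming the key is placed in the conventional `OPENAI_API_KEY` environment variable, the agent could check for it and prepare an authenticated request to OpenAI's transcription endpoint roughly like this. The helper names `load_api_key` and `build_transcription_request` are hypothetical, and the request is only described here, not sent.

```python
import os

# Public OpenAI speech-to-text endpoint; "whisper-1" is the hosted Whisper model name.
TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions"


def load_api_key(env=None):
    """Return the secret key from OPENAI_API_KEY, or raise if it is missing.

    Assumption: the key is kept in an environment variable; the video only
    shows the key being sent to the agent, not where it ends up.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; Whisper transcription is unavailable")
    return key


def build_transcription_request(api_key, model="whisper-1"):
    """Describe the HTTP request without sending it (no network access here)."""
    return {
        "url": TRANSCRIPTION_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "fields": {"model": model},
    }
```

Failing early when the key is missing makes the "no Whisper without an API key" limitation visible as a clear error instead of a silent failure on the first voice message.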
After providing the API key, the speaker tests the bot by sending a voice message. An error occurs, and the speaker investigates by sending the full error text to the agent and asking it to fix the problem. The agent diagnoses the issue: Telegram voice messages arrive with an .OGA file extension, but the OpenAI API rejects this suffix. Additionally, FFmpeg is not installed in the environment, so converting the audio is not an option. The agent proposes and implements a minimal fix: it saves the files with an .OGG extension without altering the binary container, which resolves the compatibility issue without requiring FFmpeg.
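The agent's actual patch is not shown in the summary. The following is only a sketch of the renaming idea described above, with hypothetical helper names (`as_ogg_path`, `save_voice_file`). It relies on the fact that .oga and .ogg both denote the Ogg container, so the bytes can be written unchanged and only the file name differs.

```python
from pathlib import Path


def as_ogg_path(path):
    """Return the same path with a .ogg suffix if it ends in .oga (case-insensitive)."""
    p = Path(path)
    if p.suffix.lower() == ".oga":
        return p.with_suffix(".ogg")
    return p


def save_voice_file(data, original_name, directory):
    """Write the downloaded bytes unchanged under a file name the API accepts."""
    target = Path(directory) / as_ogg_path(original_name).name
    target.write_bytes(data)  # no re-encoding, so FFmpeg is not needed
    return target
```

For example, `as_ogg_path("file_12.oga")` yields a path named `file_12.ogg`, while files with other extensions are left untouched.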
After the fix is applied, the speaker conducts a second test by sending another voice message asking the bot to confirm it can receive voice input by outputting a specific number sequence. The bot successfully transcribes the voice message and responds with the correct sequence, confirming that voice message support is now fully functional. The speaker briefly mentions that the same approach applies to video messages as well.
Key Insights
- The speaker explains that a standard ChatGPT subscription does not grant access to Whisper, meaning the bot cannot transcribe voice messages without an explicit OpenAI API key being provided to the agent.
- The speaker demonstrates that the OpenAI API key must be generated at platform.openai.com under Settings > API Keys, tied to a specific project, and then manually sent to the agent to unlock Whisper functionality.
- The agent self-diagnoses the voice message error, identifying that Telegram sends voice files with an .OGA extension which the OpenAI API rejects, and that FFmpeg is not installed in the environment.
- The agent proposes and applies a minimal fix, renaming the file extension from .OGA to .OGG without altering the binary container, rather than requiring FFmpeg installation. This resolves the transcription error.
- After the fix, the speaker confirms successful voice message transcription by having the bot respond with a specific number sequence dictated in a voice message, validating that Whisper integration is working end-to-end.