Integrate GPT 4o without TTS/STT #210

clemlesne · 2024-05-25T18:53:37Z

OpenAI's GPT-4o model supports text, image, and audio as both input and output. Its understanding is finer than the usual STT > model > TTS approach because the model has direct access to the user's behavior, emotions, etc.

Is there a way to use Communication Services to receive the raw audio stream, bypassing the STT step?

Qwatro55 · 2024-05-29T16:17:26Z

I'm also interested in this question.

agentverket · 2024-05-31T16:11:54Z

What about response time?
What about costs?
Can you stream data?

clemlesne · 2024-06-07T18:01:06Z

I know, I know :) OpenAI APIs are not yet available:

Plus, Communication Services APIs do not yet support raw audio streams.

If you have ideas, don't hesitate!

JunJD · 2024-07-22T04:58:27Z

m

clemlesne · 2024-09-26T18:23:46Z

Audio streaming is now available with Communication Services!

https://learn.microsoft.com/en-us/azure/communication-services/how-tos/call-automation/audio-streaming-quickstart?pivots=programming-language-python
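For anyone who wants to experiment, here is a minimal sketch of pulling raw PCM bytes out of one streamed message, so that the STT step can be skipped. It assumes each WebSocket message is a JSON document with a `kind` field and, for audio, an `audioData` object holding base64-encoded `data` and a `silent` flag. Those field names are my reading of the quickstart and should be checked against the linked page; `extract_pcm` is a hypothetical helper, not part of any SDK.

```python
import base64
import json
from typing import Optional


def extract_pcm(message: str) -> Optional[bytes]:
    """Return raw PCM bytes from one streamed message, or None if it carries no audio."""
    payload = json.loads(message)

    # Assumed schema (verify against the quickstart):
    # {"kind": "AudioData", "audioData": {"data": "<base64>", "silent": false}}
    if payload.get("kind") != "AudioData":
        return None  # e.g. a metadata message describing encoding and sample rate

    audio = payload.get("audioData", {})
    if audio.get("silent"):
        return None  # skip frames flagged as silence

    return base64.b64decode(audio["data"])
```

The returned bytes could then be buffered and forwarded to an audio-capable model once such an API is available, instead of going through STT first.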

clemlesne added the "enhancement" label (New feature or request) on May 25, 2024
