
Showing content from https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/5672 below:

🥰 Feature Description

🧐 Proposed Solution

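The realtime api is accessed over WebSocket.
The api has built-in concepts such as sessions and conversation. A session can be configured with modalities, instructions, voice, input_audio_format, output_audio_format, turn_detection, input_audio_transcription, tools, and so on, and function call is supported.
Audio is uploaded with input_audio_buffer.append and input_audio_buffer.commit, and response.create then starts generating the result (if turn_detection is enabled, these do not need to be called manually).
The client can send conversation.item.create to add context directly to the current conversation; for history records, status=completed must be set.
conversation.item.truncate supports interrupting input.
Listen for the response.audio.delta event to receive base64 audio data, and for response.text.delta to receive the text at the same time.
Listen for the response.output_item.added event to tell whether the output is a function call, and for response.function_call_arguments.delta to receive the function call arguments. Or should the function call information be read directly from response.done?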
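
The event flow above can be sketched as plain data handling, independent of any WebSocket library. This is a minimal sketch, not the project's implementation: the event type strings come from the list above, while the payload field names (session, audio, delta, item.name), the sample values ("alloy", "pcm16", "server_vad"), and the helper names buildAudioTurn, emptyTurn and applyServerEvent are assumptions made for illustration.

```typescript
// Client events from the list above. Field names other than "type" are assumptions.
type ClientEvent =
  | { type: "session.update"; session: Record<string, unknown> }
  | { type: "input_audio_buffer.append"; audio: string } // base64 audio chunk
  | { type: "input_audio_buffer.commit" }
  | { type: "response.create" };

// Illustrative session configuration; the values are examples only.
const sessionUpdate: ClientEvent = {
  type: "session.update",
  session: {
    modalities: ["text", "audio"],
    voice: "alloy",
    input_audio_format: "pcm16",
    output_audio_format: "pcm16",
    turn_detection: { type: "server_vad" },
  },
};

// Build the events for one audio turn. With turn_detection enabled the
// commit and response.create steps are skipped (manualTurn = false).
function buildAudioTurn(base64Chunks: string[], manualTurn: boolean): ClientEvent[] {
  const events: ClientEvent[] = base64Chunks.map(
    (audio): ClientEvent => ({ type: "input_audio_buffer.append", audio }),
  );
  if (manualTurn) {
    events.push({ type: "input_audio_buffer.commit" }, { type: "response.create" });
  }
  return events;
}

// Loose shape of the server events handled below (assumed fields).
interface ServerEvent {
  type: string;
  delta?: string;
  item?: { type?: string; name?: string };
}

// Everything collected for one response.
interface TurnState {
  audioChunks: string[]; // base64 chunks from response.audio.delta
  text: string; // accumulated response.text.delta
  functionName: string | null; // set when the output item is a function call
  functionArgs: string; // JSON text accumulated from argument deltas
}

function emptyTurn(): TurnState {
  return { audioChunks: [], text: "", functionName: null, functionArgs: "" };
}

// Pure reducer: returns a new state for each incoming server event.
function applyServerEvent(state: TurnState, event: ServerEvent): TurnState {
  switch (event.type) {
    case "response.audio.delta":
      return { ...state, audioChunks: [...state.audioChunks, event.delta ?? ""] };
    case "response.text.delta":
      return { ...state, text: state.text + (event.delta ?? "") };
    case "response.output_item.added":
      return event.item?.type === "function_call"
        ? { ...state, functionName: event.item?.name ?? null }
        : state;
    case "response.function_call_arguments.delta":
      return { ...state, functionArgs: state.functionArgs + (event.delta ?? "") };
    default:
      return state; // other events (e.g. response.done) are ignored in this sketch
  }
}
```

In a real client each ClientEvent would be sent as JSON.stringify(event) over the socket, and each incoming message would be parsed and passed through applyServerEvent. Keeping the reducer pure makes it easy to test without a live connection.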

Interaction

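A voice interaction page like the one in the OpenAI client may be added, calling the realtime api directly.
The current voice interaction UI defaults to full screen and can be shrunk to the size of the input box (taking the input box's place). The voice input UI and the chat history page are both kept. Keeping the chat history makes it possible to show intermediate results produced by plugins; for example, if a plugin generates an image mid-conversation, voice cannot describe it directly.
The results generated during a voice call (audio buffer), together with the text received at the same time, need to be persisted into sessions.
Voice calls support selecting the voice, format, detection mode, tools, and so on (these buttons need to be kept, or laid out again in the voice UI).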
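
One possible shape for the last two points is sketched below: mapping the user's choices to a session.update payload, and appending a finished voice turn to a stored session. Every type and function name here (VoiceSettings, StoredSession, toSessionUpdate, appendVoiceTurn) is hypothetical and not taken from the project's store. Using null to switch turn_detection off is also an assumption about the api.

```typescript
// Hypothetical settings chosen in the voice UI.
interface VoiceSettings {
  voice: string;
  audioFormat: string;
  serverVad: boolean; // detection mode: server-side detection on or off
  tools: Record<string, unknown>[];
}

// Map the UI settings to a session.update payload (field names assumed).
function toSessionUpdate(settings: VoiceSettings) {
  return {
    type: "session.update" as const,
    session: {
      voice: settings.voice,
      input_audio_format: settings.audioFormat,
      output_audio_format: settings.audioFormat,
      turn_detection: settings.serverVad ? { type: "server_vad" } : null,
      tools: settings.tools,
    },
  };
}

// Hypothetical persisted message: the text plus the raw base64 audio chunks.
interface StoredVoiceMessage {
  role: "user" | "assistant";
  content: string;
  audioChunks: string[];
}

interface StoredSession {
  messages: StoredVoiceMessage[];
}

// Append one finished voice turn without mutating the existing session.
function appendVoiceTurn(
  session: StoredSession,
  role: "user" | "assistant",
  content: string,
  audioChunks: string[],
): StoredSession {
  const message: StoredVoiceMessage = { role, content, audioChunks: [...audioChunks] };
  return { messages: [...session.messages, message] };
}
```

The chunks are stored as a list rather than joined into one string, because concatenating separately encoded base64 chunks does not always produce valid base64.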

Discussion

📝 Additional Information

Pricing
