ChatGPT-Next-Web [Bug] Repeated rendering of the answer causes multiple image requests

Posted by polhcujo 2 months ago in Other

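Bug Description

The answer is re-rendered repeatedly, which causes the same image to be requested many times.

Steps to Reproduce

Simulate a Markdown translation task that contains links.
Send the following content to the LLM and watch the browser console for a very large number of image requests to https://placehold.co/600x400.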

Please translate the following Markdown into Chinese

![https://placehold.co/600x400](https://placehold.co/600x400)

Hello GPT-4o
============

We’re announcing GPT-4o, our new flagship model that can reason across audio, vision, and text in real time.

All videos on this page are at 1x real time.

Guessing May 13th’s announcement.

GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to [human response time(opens in a new window)](https://www.pnas.org/doi/10.1073/pnas.0903616106) in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.

Model capabilities
------------------

Two GPT-4os interacting and singing.  

Interview prep.

Rock Paper Scissors.

Sarcasm.

Math with Sal and Imran Khan.

Two GPT-4os harmonizing.

Point and learn Spanish.

Meeting AI.

Real-time translation.

Lullaby.

Talking faster.

Happy Birthday.

Dog.

Dad jokes.

GPT-4o with Andy, from BeMyEyes in London.

Customer service proof of concept.

Prior to GPT-4o, you could use [Voice Mode](https://openai.com/index/chatgpt-can-now-see-hear-and-speak) to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. To achieve this, Voice Mode is a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in text and outputs text, and a third simple model converts that text back to audio. This process means that the main source of intelligence, GPT-4, loses a lot of information—it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion.

With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.

Explorations of capabilities
----------------------------

Select sample:

Visual Narratives - Robot Writer’s BlockVisual narratives - Sally the mailwomanPoster creation for the movie 'Detective'Character design - Geary the robotPoetic typography with iterative editing 1Poetic typography with iterative editing 2Commemorative coin design for GPT-4oPhoto to caricatureText to font3D object synthesisBrand placement - logo on coasterPoetic typographyMultiline rendering - robot textingMeeting notes with multiple speakersLecture summarizationVariable binding - cube stackingConcrete poetry

Model safety and limitations
----------------------------

GPT-4o has safety built-in by design across modalities, through techniques such as filtering training data and refining the model’s behavior through post-training. We have also created new safety systems to provide guardrails on voice outputs.  

We’ve evaluated GPT-4o according to our [Preparedness Framework](https://openai.com/preparedness) and in line with our [voluntary commitments](https://openai.com/index/moving-ai-governance-forward/). Our evaluations of cybersecurity, CBRN, persuasion, and model autonomy show that GPT-4o does not score above Medium risk in any of these categories. This assessment involved running a suite of automated and human evaluations throughout the model training process. We tested both pre-safety-mitigation and post-safety-mitigation versions of the model, using custom fine-tuning and prompts, to better elicit model capabilities.  

GPT-4o has also undergone extensive external red teaming with 70+ [external experts](https://openai.com/index/red-teaming-network) in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions in order to improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they’re discovered.  

We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies. We will share further details addressing the full range of GPT-4o’s modalities in the forthcoming system card.  

Through our testing and iteration with the model, we have observed several limitations that exist across all of the model’s modalities, a few of which are illustrated below.  

Examples of model limitations

We would love feedback to help identify tasks where GPT-4 Turbo still outperforms GPT-4o, so we can continue to improve the model. 

Model availability
------------------

GPT-4o is our latest step in pushing the boundaries of deep learning, this time in the direction of practical usability. We spent a lot of effort over the last two years working on efficiency improvements at every layer of the stack. As a first fruit of this research, we’re able to make a GPT-4 level model available much more broadly. GPT-4o’s capabilities will be rolled out iteratively (with extended red team access starting today). 

GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits. We'll roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks.

Developers can also now access GPT-4o in the API as a text and vision model. GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo. We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.

Authors
-------

[OpenAI](/news/?author=openai#results)

Expected Behavior

In the Network panel of Chrome DevTools, I observed that the image at https://placehold.co/600x400 was requested more than a hundred times.
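As a complement to reading the Network panel by eye, the count can also be tallied in code. The following is only a minimal sketch: `countRequestsByUrl` and `ResourceEntryLike` are hypothetical names introduced here, not part of ChatGPT-Next-Web. It assumes the browser's Resource Timing entries are available and that the entry buffer has not been cleared or filled up, so the number may differ from what the Network panel shows (for example, when repeats are served from cache).

```typescript
// Minimal shape of a resource timing entry. In browsers,
// PerformanceResourceTiming exposes `name` (the resolved URL)
// and `initiatorType` (e.g. "img" for image elements).
interface ResourceEntryLike {
  name: string;
  initiatorType?: string;
}

// Hypothetical helper: count how many entries exist for each URL.
function countRequestsByUrl(entries: ResourceEntryLike[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    counts.set(entry.name, (counts.get(entry.name) ?? 0) + 1);
  }
  return counts;
}

// Possible use in browser code, after the answer has finished streaming:
//   const entries = performance.getEntriesByType("resource") as PerformanceResourceTiming[];
//   console.log(countRequestsByUrl(entries).get("https://placehold.co/600x400"));
```

A count far above one for the placeholder URL would be consistent with the image being re-requested on each re-render of the answer.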

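Screenshots

Deployment Method

Docker
Vercel
Server

Desktop OS

OSX

Desktop Browser

Chrome

Desktop Browser Version

125.0.6422.142

Smartphone Device

No response

Smartphone OS

No response

Smartphone Browser

No response

Smartphone Browser Version

No response

Additional Logs

No response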

Source: https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/4827

2 Answers

c6ubokkw1#

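The Markdown syntax ![alt text](url) embeds an image, which is rendered in the chat dialog. That is why there are so many failed requests.
Replace it with a valid URL and you will see an image.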

2 months ago

5uzkadbs2#

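The Markdown syntax ![alt text](url) embeds an image, which is rendered in the chat dialog. That is why there are so many failed requests.
Replace it with a valid URL and you will see an image.
I was wrong: this is a valid image URL. Please ignore my comment.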