Visual ChatGPT: A temporary solution until GPT-4 launches

Microsoft researchers have launched Visual ChatGPT, which aims to combine the abilities of ChatGPT and visual foundation models to offer a better service before GPT-4 arrives.
ChatGPT launched a while ago and ushered in a new era for the generative AI industry, and many more AI tools followed in the wake of the chatbot's fame and success. Microsoft has taken important steps to improve generative AI tools, especially in the past few years. However, ChatGPT is a text-based language model, and it cannot generate images the way DALL-E 2 or Wombo Dream can. That has changed with the launch of Visual ChatGPT.
What is Visual ChatGPT?
ChatGPT is a text-only chatbot that cannot generate images or videos, something that is expected to change with GPT-4. Visual ChatGPT, however, lets you generate, modify or crop an image. It combines ChatGPT with visual foundation models (VFMs) such as Stable Diffusion, connecting the chatbot to a series of VFMs so that images can be sent and received during a chat.
In other words, Visual ChatGPT lets users generate images from text prompts. ChatGPT lacked what other AI tools such as Stable Diffusion offered, and now, in a way, it is complete.
"Instead of training a new multi-modal ChatGPT from scratch, we build Visual ChatGPT directly based on ChatGPT and incorporate a variety of VFMs," says Microsoft.

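To make the "connect ChatGPT to existing models" idea more concrete, here is a toy sketch of how such wiring can work in general: each visual model is registered as a named tool with a text description, images travel through the conversation as file names, and the language model's reply names the tool to call. Everything here (the `Tool` class, the tool names, the `Action: <tool> | <input>` reply format and the stub functions) is a hypothetical illustration, not Microsoft's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    """Hypothetical wrapper that exposes one visual foundation model to the chatbot."""
    name: str
    description: str            # text shown to the language model so it knows when to use the tool
    run: Callable[[str], str]   # takes a text input, returns a text result (e.g. an image file name)


# Stub "models": real ones would run Stable Diffusion, BLIP, etc. and write image files.
def fake_text_to_image(prompt: str) -> str:
    return "image/" + prompt.replace(" ", "_") + ".png"


def fake_image_caption(path: str) -> str:
    return f"a caption for {path}"


TOOLS: Dict[str, Tool] = {
    t.name: t
    for t in [
        Tool("Generate Image From Text", "Input: a text prompt. Output: an image file name.", fake_text_to_image),
        Tool("Get Image Caption", "Input: an image file name. Output: a caption.", fake_image_caption),
    ]
}


def tool_prompt() -> str:
    """Build the part of the system prompt that tells the language model which tools exist."""
    return "\n".join(f"{t.name}: {t.description}" for t in TOOLS.values())


def dispatch(llm_reply: str) -> str:
    """Run the tool named in a reply of the (assumed) form 'Action: <tool name> | <input>'."""
    if not llm_reply.startswith("Action:"):
        return llm_reply  # plain text answer, no tool needed
    name, _, tool_input = llm_reply[len("Action:"):].partition("|")
    tool = TOOLS.get(name.strip())
    if tool is None:
        raise KeyError(f"unknown tool: {name.strip()!r}")
    return tool.run(tool_input.strip())


print(tool_prompt())
print(dispatch("Action: Generate Image From Text | a red bicycle"))
```

The design choice worth noting is that images never enter the language model directly; only their file names and the tool descriptions do, which is what lets a text-only model orchestrate visual ones.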
GPU memory usage
The researchers have also published GPU memory usage figures on the official GitHub page. Visual ChatGPT is demanding: it needs a powerful GPU and plenty of compute, because each visual foundation model it loads takes up its own share of GPU memory. Below is the GPU memory usage of each visual foundation model:
Foundation Model | Memory Usage (MB) |
---|---|
ImageEditing | 6667 |
ImageCaption | 1755 |
T2I | 6677 |
canny2image | 5540 |
line2image | 6679 |
hed2image | 6679 |
scribble2image | 6679 |
pose2image | 6681 |
BLIPVQA | 2709 |
seg2image | 5540 |
depth2image | 6677 |
normal2image | 3974 |
InstructPix2Pix | 2795 |
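The table also lets you estimate what running every model at once would take. The sketch below adds up the figures and, as an illustration, spreads the models across GPUs with the first-fit-decreasing bin-packing heuristic. It assumes the per-model numbers simply add up (ignoring CUDA context overhead, peak memory during inference and fragmentation), and the 24 GB card size is a hypothetical example; the project's own way of assigning models to devices may differ.

```python
import math
from typing import Dict, List, Tuple

# Per-model GPU memory figures (MB), copied from the table above.
MEMORY_MB: Dict[str, int] = {
    "ImageEditing": 6667, "ImageCaption": 1755, "T2I": 6677, "canny2image": 5540,
    "line2image": 6679, "hed2image": 6679, "scribble2image": 6679, "pose2image": 6681,
    "BLIPVQA": 2709, "seg2image": 5540, "depth2image": 6677, "normal2image": 3974,
    "InstructPix2Pix": 2795,
}


def pack_first_fit_decreasing(sizes: Dict[str, int], capacity_mb: int) -> List[List[Tuple[str, int]]]:
    """Assign models to GPUs with the first-fit-decreasing heuristic.

    Models are taken largest first; each goes onto the first GPU that still has room,
    and a new GPU is opened when none fits.
    """
    gpus: List[List[Tuple[str, int]]] = []
    loads: List[int] = []
    for name, mb in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if mb > capacity_mb:
            raise ValueError(f"{name} ({mb} MB) does not fit on a single {capacity_mb} MB GPU")
        for i, load in enumerate(loads):
            if load + mb <= capacity_mb:
                gpus[i].append((name, mb))
                loads[i] += mb
                break
        else:  # no existing GPU had room
            gpus.append([(name, mb)])
            loads.append(mb)
    return gpus


total_mb = sum(MEMORY_MB.values())
capacity_mb = 24 * 1024  # hypothetical 24 GB card
plan = pack_first_fit_decreasing(MEMORY_MB, capacity_mb)

print(f"All models together: {total_mb} MB (~{total_mb / 1024:.1f} GB)")
print(f"Lower bound on GPUs: {math.ceil(total_mb / capacity_mb)}")
for i, gpu in enumerate(plan):
    print(f"GPU {i}: {sum(mb for _, mb in gpu)} MB -> {[name for name, _ in gpu]}")
```

With these figures the models need 69,052 MB (about 67.4 GB) in total, and the heuristic fits them onto three of the hypothetical 24 GB cards, which matches the lower bound of ceil(69,052 / 24,576) = 3.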
Capabilities
As mentioned, ChatGPT was trained to give users text-based answers and could not create images or videos. With Visual ChatGPT, users can:
- Send and receive not only text but also images.
- Pose complex visual questions or give visual editing instructions that require several AI models to collaborate over multiple steps.
- Provide feedback and ask for corrected results.
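The multi-step and feedback behaviour in the list above can be pictured with a toy session object: each step feeds the previous step's output image into the next tool, and a correction throws away the last result and redoes that step. The `ImageSession` class and the `fake_edit` stub are hypothetical and only illustrate the idea; they are not part of Visual ChatGPT.

```python
from typing import Callable, List


# Hypothetical stub tool: takes the current image file name plus an instruction
# and returns the file name of a new image. A real tool would run a visual model.
def fake_edit(image: str, instruction: str) -> str:
    stem = image.rsplit(".", 1)[0]
    return f"{stem}+{instruction.replace(' ', '_')}.png"


class ImageSession:
    """Toy chat session that chains image tools and lets the user correct the last step."""

    def __init__(self, first_image: str) -> None:
        self.history: List[str] = [first_image]  # every intermediate image, oldest first

    @property
    def current(self) -> str:
        return self.history[-1]

    def step(self, tool: Callable[[str, str], str], instruction: str) -> str:
        """Apply one tool to the latest image; its output becomes the next step's input."""
        self.history.append(tool(self.current, instruction))
        return self.current

    def revise(self, tool: Callable[[str, str], str], instruction: str) -> str:
        """User feedback: discard the last result and redo that step with a new instruction."""
        if len(self.history) < 2:
            raise ValueError("nothing to revise yet")
        self.history.pop()
        return self.step(tool, instruction)


session = ImageSession("image/cat.png")
session.step(fake_edit, "remove background")
session.step(fake_edit, "make it blue")
print(session.revise(fake_edit, "make it green"))
```

Keeping every intermediate image in the history is what makes a correction cheap: only the last step is re-run, while earlier results are reused.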
GPT-4 release date
Last week, the CTO of Microsoft Germany announced that GPT-4 would be released "next week." He made the statement on March 9, meaning the new model could launch in the coming days. Even if it does not launch, OpenAI is expected to at least introduce it to the community.
GPT-4 is expected to be a multimodal LLM that can create images and videos from text prompts, on top of GPT-3.5's text capabilities. For more information about Visual ChatGPT, you can check the official GitHub page.