GPT-4 is Coming Next Week – and It Will Be Multimodal, Says Microsoft Germany

Mar 10, 2023

Updated • Mar 10, 2023

Before you say "not another GPT story," take some time and read this one. I promise it's not about another collaboration with OpenAI’s ChatGPT, although I believe we’ve yet to see the last of those. The release of GPT-4 is in the pipeline, according to Andreas Braun, Microsoft Germany's CTO, who spoke at the AI kickoff event on the 9th of March 2023.

Speaking at the event, which was held in German, he said that GPT-4 would arrive next week and that it will be a multimodal model. Multimodal refers to using multiple modes or forms of communication, such as kinesthetic, tactile and auditory. In the context of technology, multimodal interfaces let users interact with a system through different modes of input and output, such as gesture, voice, touch, and text. Microsoft's fine-tuning of multimodality comes as no surprise after the release of Kosmos-1 at the beginning of March.

Large Language Models (LLMs) are considered game changers because they teach machines to understand natural language. With LLMs, machines can now understand statistics and content that previously only people could read and understand. The technology works in essentially any language: you can ask a question in one language and get an answer in another. Multimodality makes the model comprehensive.

Disruption and Losing Jobs

Some may worry that this new technology will take their jobs, especially given the massive layoffs at Microsoft earlier this year. Marianne Janik, CEO of Microsoft Germany, joined Braun in saying that AI isn’t here to replace jobs but to help people do repetitive tasks differently.

She recommends that companies form “competence centers” that can train employees and give clarity on using AI to ensure a smooth migration. She also mentioned that Microsoft doesn’t use any customer data to train its models.

Use Cases for GPT-4

Holger Kenn (Chief Technologist of Business Development AI and Emerging Technologies) and Clemens Siebler (Senior AI Specialist), both from Microsoft Germany, provided insight on AI’s practical uses.

Kenn explained that multimodal AI can translate text not only into images but also into music and video.
Siebler illustrated how this could be used in call centers: telephone calls could be recorded and converted with speech-to-text, so agents would no longer need to summarise or type up the content manually.
Siebler mentioned that AI won’t always be correct, so verifying all information is important. In terms of regulation, Microsoft Germany took a positive stance at the AI kickoff.