Meet Visual ChatGPT; Meituan CEO Joins ChatGPT Race; Tencent-Backed Self-Driving Firm Eyes $1B IPO

Weekly China AI News from Mar 6 to Mar 12

Mar 13, 2023

Welcome readers! This week, we’ll cover some new ChatGPT challengers in China backed by notable names. Microsoft Research Asia just released Visual ChatGPT capable of conversing and drawing, as conversational AI moves towards multimodality. We’ll also take a look at Momenta, backed by Tencent, as it makes its way to the public market.

Weekly News Roundup

Meet Chinese Startups Chasing ChatGPT Dreams

What’s new: With the rise of ChatGPT and large language models, many Chinese entrepreneurs are recognizing the potential for these technologies to drive significant growth in the digital economy. These entrepreneurs are launching new ventures with the goal of capitalizing on this imminent inflection point.

🫶🏻 Wang Xing, Meituan Co-founder and CEO, has announced his personal investment in the start-up “Beyond Lightyear” (光年之外), launched by his former roommate and Meituan Co-founder Wang Huiwen. Wang Xing will serve as a board director, and the investment will be part of the company’s Series A funding round. [LINK]

Wang Xing said that while he is excited about the enormous productivity that large AI models can create, he is also worried about their impact on the world.
Beyond Lightyear is a start-up that aims to build China’s ChatGPT. The company has raised $50 million, out of Wang Huiwen’s pocket, at an estimated valuation of $200 million.

📽 Li Yan, former Head of the Multimedia Understanding Team at Kuaishou, reportedly launched an AI company called “Yuanshi Technology” last year. The company’s main focus will be on developing large multimodal AI models. [LINK]

Li Yan is a veteran AI expert at Kuaishou and was a founding member of Kuaishou’s first deep learning department.

🤖 A ChatGPT-like Chinese website named Inspo quietly went live last week and garnered attention.

👍🏻 Inspo’s functionality is similar to ChatGPT’s: it responds to text prompts and generates human-like text across multiple rounds of conversation. Inspo’s answers are unexpectedly smooth and even comparable to ChatGPT’s in structure and word choice, leading some to speculate that Inspo uses GPT APIs.
👎🏻 Inspo is not as versatile as ChatGPT in tasks like coding or table generation.
Inspo reportedly underwent months of internal testing.
MINIMAX, the startup behind Inspo, was founded by Yan Junjie, the former Vice Head of Research and R&D Vice President at SenseTime. MINIMAX is also responsible for creating Glow, a popular AI chatbot app with millions of users.
You can play with Inspo here, although the website has since gone down.

MSRA Unveils Visual ChatGPT with the Ability to Converse, Draw, and Edit

What’s new: Microsoft Research Asia has developed a new AI system called Visual ChatGPT, which allows users to chat with an AI model using both text and images. The system can also perform image editing and drawing tasks based on natural language instructions.

How it works: Instead of a multimodal foundation model built from scratch, Visual ChatGPT works by connecting ChatGPT with a series of specialized Visual Foundation Models (VFMs), including Stable Diffusion, BLIP, pix2pix, and more. These models are designed for different visual tasks, such as drawing, editing, and captioning, among others.

Researchers proposed a Prompt Manager to connect ChatGPT and the VFMs, which has the following functions:

Clearly communicates the capabilities of each VFM to ChatGPT and specifies the input-output formats.
Translates diverse visual information, such as PNG images, depth images, and mask matrices, into a language format to help ChatGPT comprehend it.
Manages the histories, priorities, and conflicts of different VFMs.
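To make the three functions above more concrete, here is a minimal Python sketch of what a prompt manager could look like. It is not Microsoft's implementation: the `VisualTool` and `PromptManager` classes, the two fake tools, and the file names are all hypothetical, and the sketch only tracks history (it does not handle priorities or conflicts between tools).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VisualTool:
    """Hypothetical wrapper standing in for one Visual Foundation Model."""
    name: str
    description: str      # capability, as told to the language model
    input_format: str     # e.g. "image path"
    output_format: str    # e.g. "text caption"
    run: Callable[[str], str]


class PromptManager:
    """Toy prompt manager: describes tools, routes calls, keeps a history."""

    def __init__(self) -> None:
        self.tools: Dict[str, VisualTool] = {}
        self.history: List[str] = []

    def register(self, tool: VisualTool) -> None:
        self.tools[tool.name] = tool

    def system_prompt(self) -> str:
        # Function 1: tell the language model what each tool does
        # and which input and output formats it expects.
        lines = ["You can use the following tools:"]
        for tool in self.tools.values():
            lines.append(
                f"- {tool.name}: {tool.description} "
                f"Input: {tool.input_format}. Output: {tool.output_format}."
            )
        return "\n".join(lines)

    def call(self, tool_name: str, argument: str) -> str:
        # Function 2: visual data travels as text (here, a file name),
        # so a text-only model can refer to it.
        result = self.tools[tool_name].run(argument)
        # Function 3 (history only): record each step for later turns.
        self.history.append(f"{tool_name}({argument}) -> {result}")
        return result


# Fake tools for illustration; real VFMs would load and run models.
def fake_caption(image_path: str) -> str:
    return f"a caption for {image_path}"


def fake_edit(image_path: str) -> str:
    return image_path.replace(".png", "_edited.png")


manager = PromptManager()
manager.register(VisualTool("caption", "Describes an image.",
                            "image path", "text caption", fake_caption))
manager.register(VisualTool("edit", "Edits an image from an instruction.",
                            "image path", "image path", fake_edit))

edited = manager.call("edit", "cat.png")
caption = manager.call("caption", edited)
print(manager.system_prompt())
print(manager.history)
```

The key design idea illustrated here is that images are never handed to the language model directly. Tools exchange file names, and the language model only sees text descriptions of what happened.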

Limitations: Visual ChatGPT’s performance depends on the accuracy and effectiveness of the underlying VFMs. Its heavy reliance on prompt engineering and task decomposition can limit real-time capabilities. ChatGPT’s maximum token length and the security risks of plugging in external models also require careful consideration, especially for protecting sensitive data.

Tencent, GM-Backed Autonomous Driving Upstart Momenta Eyes $1 Billion IPO

What’s new: Momenta, a Chinese autonomous driving start-up, is reportedly considering an initial public offering (IPO) in either Hong Kong or the United States as early as this year. The company is looking to raise up to $1 billion in its IPO.

Who’s Momenta: Momenta was founded in 2016 in Beijing, and its core team includes alumni from Tsinghua University and Microsoft Research Asia. The company has raised funds from top automakers such as GM, Daimler, SAIC, Mercedes-Benz, and Toyota, as well as investment firms including NIO Capital, Shunwei Capital, and GGV Capital.

By November 2021, Momenta had completed seven rounds of fundraising, with participation from Tencent, Yunfeng Capital, Temasek, and Sequoia Capital, among others.

Momenta’s products: Momenta’s autonomous driving solutions include the production-level autonomous driving system (Mpilot) and the fully driverless RoboTaxi system (MSD).

The company focuses on developing its autonomous driving technology with data-driven algorithms, accumulating data to iterate on that technology.

Momenta’s autonomous driving system has won favor with major Chinese automakers, including SAIC, Geely, BYD, and Great Wall.

Trending Research

Parameter-efficient Fine-tuning of Large-scale Pre-trained Language Models

As the costs of fine-tuning and storing all the parameters of pre-trained language models become prohibitive, researchers from Tsinghua University and Beijing Academy of Artificial Intelligence proposed a new approach called delta-tuning, which optimizes a small portion of the model parameters while keeping the rest fixed. This approach can effectively stimulate large-scale models and drastically cut down computation and storage costs.
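The core idea, updating a small set of parameters while the rest stay frozen, can be illustrated with a toy example. The sketch below is not the paper's code: the three "pre-trained" weights, the single trainable bias, the data, and the learning rate are all made-up values chosen for illustration.

```python
# Frozen "pre-trained" weights: these are never updated below.
frozen_weights = [2.0, -1.0, 0.5]
original_weights = list(frozen_weights)

# The small trainable part (the "delta"): a single bias term.
bias = 0.0


def predict(x, bias):
    """Linear model: frozen weights plus the trainable bias."""
    return sum(w * xi for w, xi in zip(frozen_weights, x)) + bias


# Toy target task: each target equals the pre-trained output plus 3.0.
data = [
    ([1.0, 0.0, 2.0], 6.0),
    ([0.0, 1.0, 1.0], 2.5),
    ([2.0, 2.0, 0.0], 5.0),
]

learning_rate = 0.1
for _ in range(200):
    # Gradient of the mean squared error with respect to the bias only.
    grad = sum(2 * (predict(x, bias) - y) for x, y in data) / len(data)
    bias -= learning_rate * grad

trainable = 1
total = len(frozen_weights) + 1
print(f"learned bias: {bias:.4f}")
print(f"trained {trainable} of {total} parameters")
```

Only the bias has to be stored per task, while the frozen weights can be shared across every task. That sharing is where the storage savings described above would come from in a real model with billions of parameters.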

Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

Researchers from the Chinese Academy of Sciences proposed CaFo, a Cascade of Foundation models, to improve visual recognition in low-data situations. This involves cascading multiple pre-training paradigms, including CLIP’s language-contrastive knowledge, DINO’s vision-contrastive knowledge, DALL-E’s vision-generative knowledge, and GPT-3’s language-generative knowledge. The approach has shown state-of-the-art performance for few-shot classification.

Cones: Concept Neurons in Diffusion Models for Customized Generation

Researchers from the University of Science and Technology of China, Shanghai Jiao Tong University, Ant Group, and Alibaba investigated whether deep neural networks have concept neurons that respond to specific stimuli. The researchers found clusters of neurons that corresponded to a given subject and called them concept neurons. These neurons could generate multiple related concepts in a single image, and shutting them off could change the subject in the image.

Noteworthy Stories

A yacht party organizer has been outed as an artificial intelligence scammer after potential VIP customers noticed ‘freaky’ fingers in promotion photos. — SCMP
Chinese tech companies building their own versions of ChatGPT are trying to recruit artificial intelligence researchers from abroad. — SCMP
A booming illicit market for OpenAI’s chatbot shows the huge potential, and risks, for Chinese generative AI. — Wired
Proposals and recommendations in this year’s two sessions had a high frequency of terms related to cutting-edge technologies, including ChatGPT, humanoid robots, and autonomous driving. — TMTPost