Alibaba Loses Head of Self-Driving Unit; AI Creates Images from Text, and Vice Versa; Geely, Mobileye to Build Self-Driving EV for Consumers

China’s AI news in the week of January 9, 2022

Mastermind of Alibaba's Delivery Robot Departs for Startup

Three months ago, Recode China A.I. reported that Chinese e-commerce giant Alibaba had celebrated a milestone of one million packages delivered by the company's 200 homegrown driverless delivery robots. The company pledged to expand its fleet to 10,000 robots over the next three years.

However, Alibaba's high-flying autonomous driving plan is now clouded by the departure of Gang Wang, the mastermind behind the driverless robot, a vice president of Alibaba Group, and head of the Autonomous Driving Lab at Damo Academy, Alibaba's research wing.

The 40-year-old scientist is building a new startup specializing in cleaning robots and has already secured financing.

Wang joined Alibaba in 2017 as the chief scientist of Alibaba's newly formed A.I. Labs. In November 2017, he founded the Autonomous Driving Lab and led the development of the company's driverless robots.

Before joining Alibaba, Wang was a tenured associate professor at Nanyang Technological University, Singapore, where he was an associate director of the ROSE Lab. He obtained his Ph.D. in Electrical and Computer Engineering from the University of Illinois Urbana-Champaign under Fei-Fei Li, David Forsyth, and Derek Hoiem, and a Bachelor of Science in Electrical Engineering from the Harbin Institute of Technology. Wang also received MIT Technology Review's Innovators Under 35 award in 2017.

Li Cheng, Alibaba's CTO, will serve as interim head of the Autonomous Driving Lab at Damo Academy.

A.I. Can Now Create Images from Text, and Vice Versa

A.I. models have traditionally been designed to solve a single narrow task, such as recognizing an image or understanding a sentence, but not both. Humans, by contrast, process vision, language, speech, and other modalities with the same architecture: the neocortex. Thanks to Transformers, however, A.I. models across different areas are starting to look alike and consolidate.

A recent A.I. innovation from Baidu exemplifies this trend. The Chinese tech company introduced ERNIE-ViLG, a Transformer-based pre-training framework for both text-to-image and image-to-text generation. With 10 billion parameters, the model is the largest Chinese-language multimodal generative model.

Multimodal generation means translating content in one modality, such as vision, speech, or language, into another modality while keeping the semantics coherently aligned. A typical multimodal generation task is text-to-image synthesis: creating images from text captions. OpenAI's DALL·E, for example, is trained to generate images from text descriptions.

ERNIE-ViLG goes a step further by adding visual understanding: interpreting visual content and expressing that understanding in language. To handle both image synthesis and visual understanding within one architecture, the research team used techniques such as image quantization to represent each image as a sequence of discrete tokens, then fed these sequences into a Transformer for autoregressive text-to-image and image-to-text generation. They further proposed an end-to-end training method for text-to-image synthesis.
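To make the quantization step concrete, here is a minimal Python sketch of the general idea, not Baidu's implementation. It assumes a fixed, hand-picked codebook and pre-computed patch feature vectors (in practice both the encoder and the codebook are learned, typically by a VQ-VAE-style autoencoder). The function names, the separator ID, and the vocabulary offset are all hypothetical. Each patch is mapped to the index of its nearest codebook vector, image tokens are shifted past the text vocabulary so text and image share one token space, and the joined sequence is split into inputs and next-token targets for autoregressive training.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook vector closest to vec (squared L2 distance)."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize_image(patches: List[Vector], codebook: List[Vector]) -> List[int]:
    """Turn a list of patch feature vectors into discrete image token IDs."""
    return [nearest_code(p, codebook) for p in patches]


def build_sequence(text_ids: List[int], image_ids: List[int], direction: str,
                   text_vocab_size: int = 100, sep_id: int = 0) -> List[int]:
    """Join text and image tokens into one sequence (hypothetical layout).

    Image IDs are offset by text_vocab_size so the two modalities never collide.
    The segment order decides the task: text first for text-to-image,
    image first for image-to-text.
    """
    image_tokens = [text_vocab_size + i for i in image_ids]
    if direction == "text2image":
        return text_ids + [sep_id] + image_tokens
    if direction == "image2text":
        return image_tokens + [sep_id] + text_ids
    raise ValueError(f"unknown direction: {direction}")


def shift_for_autoregression(seq: List[int]) -> Tuple[List[int], List[int]]:
    """At each position the model sees seq[:t+1] and must predict seq[t+1]."""
    return seq[:-1], seq[1:]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]           # toy 3-entry codebook
    patches = [[0.1, 0.2], [0.9, 0.8], [0.1, 0.9], [0.95, 1.1]]  # toy patch features
    image_ids = quantize_image(patches, codebook)
    seq = build_sequence([5, 17, 42], image_ids, "text2image")
    print(image_ids)  # [0, 1, 2, 1]
    print(seq)        # [5, 17, 42, 0, 100, 101, 102, 101]
```

Reversing the order of the two segments is what lets a single autoregressive model cover both directions with the same weights.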

Experiments showed ERNIE-ViLG achieves state-of-the-art performance for both text-to-image and image-to-text tasks, obtaining an FID of 7.9 on MS-COCO for text-to-image synthesis and best results on COCO-CN and AIC-ICC for image captioning.
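For readers unfamiliar with the metric: FID (Fréchet Inception Distance) compares the statistics of Inception-v3 features extracted from real and generated images, and lower values mean the generated images are statistically closer to real ones. The standard definition is below, where $\mu_r, \Sigma_r$ are the mean and covariance of features from real images and $\mu_g, \Sigma_g$ those from generated images; the exact evaluation protocol behind the 7.9 figure (sample count, resolution) may differ in details.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The first term penalizes a shift in the average features, and the trace term penalizes differences in their spread and correlations.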

For Baidu, as a search engine, promising applications of ERNIE-ViLG could include more accurate visual search and image captioning. For example, users could point a camera at an object to get more context-rich search results, or type a poem to generate an aesthetic artwork.

Geely, Mobileye Team Up to Release a Self-Driving EV for Consumers

Consumer AV, a new term for hands-free self-driving vehicles aimed at mass consumers, made headlines at CES 2022. Mobileye, Intel's autonomous driving subsidiary, announced a deepened strategic technology partnership with Geely's electric vehicle brand Zeekr to develop a new all-electric consumer vehicle with Level 4 capabilities.

Zeekr is a premium electric mobility technology brand launched in 2021 by Geely, one of China's top automakers and the owner of Volvo, Polestar, and Lotus. Zeekr's first production vehicle, the Zeekr 001, has gained strong traction in China, with 3,796 deliveries in December, its third month of deliveries. The car, with a starting price of US$44,000, can accelerate from 0 to 60 mph in under 4 seconds and reach a top speed of 124 miles per hour.

The planned Zeekr vehicle will debut in 2024. It will be equipped with six Mobileye EyeQ5 systems-on-chip (SoCs), each delivering 2.4 TOPS of computing performance per watt. Together, the six SoCs will power a Level 4 self-driving system that runs two independent perception subsystems, one based on cameras and the other on radar and LiDAR, along with a crowdsourced mapping solution for autonomous vehicles and safety redundancy.
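Why run two independent perception subsystems? A toy Python sketch can show the intuition behind this kind of redundancy. It is purely illustrative and does not describe Mobileye's actual decision logic: the `ChannelReport` format, the `decide` function, and the three actions are all hypothetical. The sketch treats a hazard reported by either healthy channel as real, and falls back to a minimal-risk stop when redundancy is lost.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    PROCEED = "proceed"
    BRAKE = "brake"
    MINIMAL_RISK_STOP = "minimal_risk_stop"


@dataclass
class ChannelReport:
    """Hypothetical output of one perception channel (camera or radar/LiDAR)."""
    healthy: bool
    obstacle_ahead: bool


def decide(camera: Optional[ChannelReport],
           radar_lidar: Optional[ChannelReport]) -> Action:
    """Combine two independent channels conservatively (illustrative policy only)."""
    channels = [c for c in (camera, radar_lidar) if c is not None and c.healthy]
    if not channels:
        # Both channels unavailable: no perception left, so stop as safely as possible.
        return Action.MINIMAL_RISK_STOP
    if any(c.obstacle_ahead for c in channels):
        # Conservative OR: a hazard seen by any healthy channel is treated as real.
        return Action.BRAKE
    if len(channels) == 1:
        # Only one channel still healthy: redundancy is lost, so wind down safely.
        return Action.MINIMAL_RISK_STOP
    # Both channels healthy and both report a clear path.
    return Action.PROCEED


if __name__ == "__main__":
    clear = ChannelReport(healthy=True, obstacle_ahead=False)
    hazard = ChannelReport(healthy=True, obstacle_ahead=True)
    print(decide(clear, clear))   # Action.PROCEED
    print(decide(clear, hazard))  # Action.BRAKE
```

The OR rule trades occasional unnecessary braking for fewer missed hazards; a production system would weigh that trade-off far more carefully, for example with confidence scores and tracking over time.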

Much as Qualcomm's Snapdragon, MediaTek, and Apple's in-house chips competed in the smartphone era, Mobileye is now feeling pressure from NVIDIA and Tesla. NVIDIA Orin, which delivers 254 TOPS of computing performance, is winning favor with major electric vehicle makers such as NIO, Xpeng, Li Auto, and Baidu's JiDU for their next-generation premium models. Tesla's in-house chip is also supercharging its Full Self-Driving system, which navigates city streets and highways.

Mobileye also unveiled its next-generation SoC for autonomous vehicles, EyeQ Ultra, which is slated for a 2025 release. The chip delivers 176 TOPS of computing performance.

Investment News:

  • SenseTime shares have doubled on the Hong Kong Stock Exchange since the company's market debut in December, despite the investment ban imposed by the US government.
  • Artosyn Microelectronics, a Shanghai-based supplier of AI systems-on-chip (SoCs), has raised a total of RMB 500 million across its Series B and B+ funding rounds. Founded in 2011, the company provides neural processing units (NPUs) for drones, automobiles, security cameras, and other edge devices.
  •, a Shenzhen-based game AI exploration company, has raised US$100 million in a Series B round led by Sequoia China. Founded in 2019, the company provides AI solutions to game companies by integrating AI capabilities into game scenarios.



A weekly newsletter on emerging AI trends and technologies in China

Recode China AI
