Megvii Chief Scientist, ResNet Creator Dies; Baidu’s EV Arm Unveils a Self-Driving Concept Car; Alibaba Introduces CIPU to Power Data Centers

Weekly China AI News from June 6 to June 12

Jun 14, 2022

News of the Week

Megvii Chief Scientist, ResNet Creator Dies at 45

R.I.P.: Jian Sun, Chief Scientist of Megvii and Managing Director of Megvii Research, passed away at midnight on June 14, Megvii announced in a statement. Dr. Sun was a high-profile AI researcher, and his sudden death has shocked the AI community and is a huge loss for the global computer vision field.

Who is Jian Sun? Dr. Sun joined Megvii Technology, one of the world’s leading facial recognition technology developers, in 2016 as Chief Scientist and Managing Director of Research. Prior to that, he spent thirteen years at Microsoft Research and served as Principal Research Manager between 2015 and 2016. Dr. Sun received his bachelor’s, master’s, and Ph.D. degrees in electrical engineering from Xi’an Jiaotong University.

Dr. Sun was named to Technology Review magazine’s annual list of the world’s top innovators under the age of 35 in 2010 and received the Ho Leung Ho Lee Foundation Young Innovator Prize in 2019.

ResNet: Dr. Sun is also famous for co-creating ResNet, an artificial neural network (ANN) architecture that makes it possible to train networks with hundreds or even thousands of layers. In the paper Deep Residual Learning for Image Recognition, Dr. Sun and his co-authors built residual nets up to 152 layers deep, which achieved a 3.57% error on the ImageNet test set and won first place in the ILSVRC 2015 classification task. The paper won the Best Paper Award at CVPR 2016, and ResNet became one of the most widely used ANNs in computer vision.
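The core idea in the paper is the skip connection: a block outputs F(x) + x rather than F(x) alone, so its layers only have to learn the residual. Below is a minimal sketch in plain Python, not the authors' implementation. The helper names (`linear`, `residual_block`) and the toy 2x2 weights are made up for illustration, and matrix-vector products stand in for the convolutions used in real ResNets.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]


def linear(weights: Matrix, v: Vector) -> Vector:
    """Matrix-vector product; a stand-in for a conv layer (assumption)."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(v: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Compute relu(F(v) + v), where F is two stacked layers."""
    f = linear(w2, relu(linear(w1, v)))      # residual branch F(v)
    summed = [a + b for a, b in zip(f, v)]   # skip connection adds the input back
    return relu(summed)


# Toy weights: with F == 0 the block passes a non-negative input through unchanged.
zeros: Matrix = [[0.0, 0.0], [0.0, 0.0]]
identity: Matrix = [[1.0, 0.0], [0.0, 1.0]]

x: Vector = [1.0, 2.0]
passthrough = residual_block(x, zeros, zeros)
doubled = residual_block(x, identity, identity)
```

With all-zero weights the block simply returns its (non-negative) input, which hints at why stacking many such blocks does not have to make a network harder to train than a shallower one.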

More: Chinese media reported that people close to Dr. Sun were shocked because he had gone to work as normal the day before he died. A person familiar with the matter told Recode China AI that a sudden myocardial infarction was the cause of death, but this has not been confirmed by other media outlets or Megvii.

Baidu, Geely’s EV Startup Unveils First Self-Driving Robocar Concept ROBO-01

What’s new: JIDU, an intelligent electric vehicle startup backed by Baidu and Geely, unveiled its first concept robocar, ROBO-01. The concept car, or “a concept production car” as the company calls it, is 90 percent similar to the production model, which will make its debut later this year.

Futuristic design: The exterior and interior of ROBO-01 share a sleek, clean, iPhone-like styling with no lines, chrome, or decorations. The car features two butterfly-wing front doors, a pair of rear doors, and non-marking side windows. A giant, eye-popping screen that runs from the driver’s seat to the front passenger seat and a space-age adaptive zero-gravity seat epitomize the robocar’s futuristic design philosophy.

AI: The selling point of this robocar is its autonomous driving capability, provided by Baidu Apollo. ROBO-01’s autonomous driving system is equipped with “dual” Nvidia Orin X chips and 31 external sensors: 2 LiDARs, 5 millimeter-wave radars, 12 ultrasonic radars, and 12 cameras. The two liftable LiDARs, mounted surprisingly on the front hood, offer a 180-degree field of view (FOV). The press release also said JIDU’s system can handle three main driving scenarios: highways, urban roads, and parking.

Pricing? While few specifications on pricing, range, or battery capacity have been revealed, Baidu CEO Robin Li once said in an earnings conference call that the car will be priced above RMB200,000 (~$30,000). JIDU’s first production model is positioned as a mid-size SUV that will compete directly with the Tesla Model Y, said JIDU CEO Xia Yiping.

Alibaba Cloud Unveils New Cloud Infrastructure Processing Unit (CIPU) to Take on CPUs

What’s new: Alibaba Cloud announced a new cloud infrastructure system named Cloud Infrastructure Processing Unit (CIPU) to power its cloud-native data centers at its annual summit.

“The rapid increase in data volume and scale, together with higher demand for lower latency, call for the creation of a new tech infrastructure,” said Alibaba Cloud Intelligence President Jeff Zhang in a speech during the summit.

Main features: The press release said each bare metal system can run 2,000 containers and uses sandbox container technology to isolate containers more securely. CIPU can handle 3 million storage I/O operations per second and network I/O at 50 million packets per second. Supported by RDMA technology, CIPU can reduce network latency to as low as five microseconds and reach a maximum bandwidth of 200 Gbps.

Other examples: Alibaba’s CIPU is a natural creation in response to increased computational needs on the cloud. U.S. chip giants like NVIDIA and Intel also launched similar products in the past few years, such as NVIDIA’s data processing units and Intel’s infrastructure processing units.

Papers and Projects

Vision GNN: An Image is Worth Graph of Nodes

Researchers from the Chinese Academy of Sciences and Huawei propose representing an image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks. The authors report that extensive experiments on image recognition and object detection tasks demonstrate the superiority of the ViG architecture. They say PyTorch and MindSpore code will be released.
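The summary above does not say how the graph is built. One plausible construction, and as far as we understand the one ViG uses, treats image patches as nodes and links each patch to its nearest neighbors in feature space. Here is a toy sketch in plain Python under that assumption; `knn_graph` and the 1-D patch features are hypothetical and not taken from the released code.

```python
from typing import List, Tuple


def knn_graph(features: List[List[float]], k: int) -> List[Tuple[int, int]]:
    """Return directed edges (i, j) linking each node i to its k nearest nodes j.

    Each entry in `features` is the feature vector of one image patch (assumption).
    Distance is squared Euclidean; ties are broken by the lower node index.
    """
    edges: List[Tuple[int, int]] = []
    for i, fi in enumerate(features):
        # Distance from node i to every other node, sorted ascending.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(fi, fj)), j)
            for j, fj in enumerate(features)
            if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


# Four "patches" with 1-D features: two similar pairs far apart from each other.
patch_features = [[0.0], [1.0], [10.0], [11.0]]
edges = knn_graph(patch_features, k=1)
```

In this toy case each patch connects only to its look-alike, so the graph groups similar regions regardless of where they sit in the image. A GNN would then pass messages along these edges.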

Zero and R2D2: A Large-scale Chinese Cross-modal Benchmark and A Vision-Language Framework

Researchers from Qihoo 360 AI Research and Tsinghua University build a large-scale Chinese cross-modal benchmark called Zero so that the research community can fairly compare vision-language pre-training (VLP) models. They release two pre-training datasets and five fine-tuning datasets for downstream tasks. Alongside the benchmark, they propose a novel pre-Ranking + Ranking pre-training framework for cross-modal learning, named R2D2. On zero-shot tasks on Flickr30k-CN, COCO-CN, and MUGE, R2D2 pre-trained on a 250-million-sample dataset improves mean recall by 4.7%, 5.4%, and 6.3%, respectively, over the state of the art. The authors say the datasets, models, and code have been released.

Siamese Image Modeling for Self-Supervised Vision Representation Learning

Researchers from SenseTime and Tsinghua University propose Siamese Image Modeling (SIM), which predicts the dense representations of an augmented view based on another masked view of the same image with different augmentations. Their method uses a Siamese network with two branches. The online branch encodes the first view and predicts the second view’s representation according to the relative positions between the two views. The target branch produces the target by encoding the second view. In this way, SIM achieves linear probing performance comparable to instance discrimination (ID) and dense prediction performance comparable to masked image modeling (MIM). The authors also demonstrate that decent linear probing results can be obtained without a global loss.

Rising Startups

Insilico Medicine, a clinical-stage, end-to-end AI-driven drug discovery company, has completed a $60 million Series D financing from a syndicate of global investors. Founded in 2014, the Hong Kong- and New York-based company is building an end-to-end AI-driven drug discovery pipeline.

Hai Robotics, an autonomous case-handling robotics company, has raised over $100 million in its Series D funding round. Founded in 2016, the Shenzhen-based startup says it developed the first autonomous case-handling robotics system to be put into commercial use.

Motovis, an autonomous driving startup, has raised several hundred million RMB in its Series C funding round. Founded in 2015, the Shanghai-based firm develops AI software embedded on automotive chips to provide autonomous driving products and solutions.