Prominent Chinese AI Lab Caught Plagiarizing in Big Model Paper; Microsoft Research Asia Halts Internship Hiring from US-Banned Universities; Beijing Announces New RISC-V Chip Institute

Weekly China AI News from April 11 to April 17

Dear readers, sorry for missing last week’s issue due to personal reasons. This week, we will talk about the blockbuster plagiarism case involving BAAI and its long-awaited Big Model paper. Also, a social media post that exposed MSRA’s rejection of interns from the Beijing University of Posts and Telecommunications went viral. Finally, if you are a RISC-V enthusiast, you may have heard of Xiangshan. Now the China-grown open-source chip project has found its home.

News of the Week

A 200-page academic paper on the technologies and applications of large AI models, written by a number of top AI organizations in China, has been accused of copying from a dozen other papers.

Nicholas Carlini, a Google Brain researcher, first spotted the plagiarism after his co-author noticed that some of the text in the paper A Roadmap for Big Model seemed familiar. Carlini then used a duplicate-detection tool to identify all repeated sequences and wrote a blog post about his findings. His post, A Case of Plagiarism in Machine Learning Research, was quickly echoed by Katherine Lee, another Google Brain researcher, who found text copied from her own previous papers and posted about it on Twitter.
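For readers curious how repeated sequences can be found automatically, here is a minimal sketch of one common approach: collecting word n-grams from two texts and reporting the ones they share. This is not Carlini's tool, whose internals are not described here. The function names and the default window of 8 words are assumptions chosen for illustration.

```python
import re


def word_ngrams(text, n):
    """Return the set of n-word sequences in text, lowercased and stripped of punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_sequences(candidate, source, n=8):
    """Return the n-word sequences that appear in both texts (hypothetical helper)."""
    return sorted(word_ngrams(candidate, n) & word_ngrams(source, n))


if __name__ == "__main__":
    a = "The quick brown fox jumps over the lazy dog"
    b = "A quick brown fox jumps high"
    # Print every 4-word sequence the two sentences have in common.
    print(shared_sequences(a, b, n=4))
```

A longer window reduces false alarms from common phrases, while a shorter one catches lightly edited copies. Any real check would still need a human to judge whether a match is a quotation, a properly cited passage, or plagiarism.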

One day after Lee’s post gained traction on social media, the Beijing Academy of Artificial Intelligence (BAAI), the leading research institute that organized and produced the paper, issued an apology and initiated an independent review by third-party experts to further assess the issue and determine accountability.

Plagiarism is a serious and dishonest offense, yet it often occurs in academia without being noticed. So why has this paper drawn so much attention?

About the paper: A Roadmap for Big Model is a grand project that systematically reviews the overall progress of large AI models, also known as foundation models (a term coined by Stanford University researchers), and aims to guide follow-up research. Released in late March, the paper was authored by over 100 researchers from 19 organizations, including elite universities like Tsinghua University, AI labs from tech giants like Tencent and Huawei, and the Chinese Academy of Sciences.

About BAAI: BAAI was established in November 2018 in Beijing, China, as a non-profit research institute. The lab grabbed headlines last year when it released Wudao, China’s first large-scale multimodal AI model, which has 1.75 trillion parameters and is capable of generating text and predicting protein structure.

Three types of plagiarism: While the Big Model paper was found to have plagiarized from multiple papers, not all instances are direct duplicates.

  • Duplicates of multiple paragraphs: The paragraphs in Section 2.3.1 of Article 2 are clear duplicates.
  • Duplicates without references: A paragraph in Section 8.3.1 of Article 8 introduces LXMERT using the same description as the original paper, without a reference to it.
  • Duplicates with a simple citation: A paragraph in Section 2.4.3 of Article 2 copies text but includes a citation. As Carlini later noted in an update, this could be attributed to “some junior authors meant well and thought that a citation was enough to copy text.”

Implications: The reputation of BAAI and Chinese academia will take a hit following the plagiarism. Relevant social media posts are flooded with negative comments like “Roadmap to big plagiarism.” An academic reputation isn’t built in a day, but it can collapse in one.

One of the paper’s authors, who spoke on condition of anonymity, said he was given only one week to contribute his segment. His access to the paper was revoked one week after his submission. “It looks like the organizer only knitted segments from different authors together without further proofreading and examination.”

Microsoft Research Asia (MSRA) has reportedly stopped hiring interns from Chinese universities that were added to the U.S. entity sanction list. Multiple social media posts and media outlets have confirmed the news.

Bad news: Students from the Seven Sons of National Defence, the Beijing University of Posts and Telecommunications, and the other US-restricted universities are barred from MSRA’s internship programs. 18 Chinese universities are on the “Entity Sanctions List.”

Good news: MSRA’s full-time positions are still open to graduates of these universities. The restriction applies only to enrolled students, not graduates.

Why it matters: This is another event that represents the downfall of US-China tech collaborations amid a worsening geopolitical conflict. Founded in 1998, MSRA is one of the leading research labs and a talent hub for Chinese computer scientists. MSRA fosters China’s AI ecosystem and has trained thousands of talented Chinese researchers who play major roles in Chinese tech companies and research organizations.

Last year, Recode China AI introduced an open-source, high-performance RISC-V processor project developed by the Chinese Academy of Sciences named XiangShan (香山). Now the project has a new home: Beijing Open-Source Chip Research Institute.

What’s new: Yungang Bao, Deputy Director of the Institute of Computing Technology of the Chinese Academy of Sciences, announced the opening of the new chip institute on social media.

“The research institute aims to build an open-source chip technology system and accelerate the development of the open-source chip ecosystem … The research institute will focus on the ‘Xiangshan’ open-source, high-performance RISC-V processor core.”

About Xiangshan: The first-gen micro-architecture of XiangShan, Yanqihu (雁栖湖), achieves a maximum clock frequency of 1.3 GHz and a SPEC2006 score of 7/GHz.

The second-gen XiangShan, also known as Nanhu (南湖), is still under development on the master branch, with an estimated SPEC2006 score of 20 at a maximum clock frequency of 2 GHz, or about 10/GHz. That is close to the 11.08/GHz of Intel’s i9-10900K. For comparison, SiFive announced its P550 with a SPEC2006 score of 8.65/GHz.
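The per-GHz figures can be converted to and from absolute scores with simple arithmetic. The sketch below assumes that the Nanhu estimate is an absolute SPEC2006 score of 20 at 2 GHz and that the Yanqihu figure of 7/GHz applies at its 1.3 GHz maximum clock. Both readings are assumptions, and the helper names are made up for illustration.

```python
def score_per_ghz(score, clock_ghz):
    """Normalize an absolute benchmark score by clock frequency."""
    return score / clock_ghz


def absolute_score(per_ghz, clock_ghz):
    """Recover the absolute score from a per-GHz figure at a given clock."""
    return per_ghz * clock_ghz


if __name__ == "__main__":
    # Yanqihu: 7/GHz at 1.3 GHz (assumed to be the clock the figure applies to).
    print("Yanqihu absolute score:", round(absolute_score(7.0, 1.3), 2))
    # Nanhu: estimated score of 20 at 2 GHz (assumed reading of the estimate).
    print("Nanhu score per GHz:", score_per_ghz(20.0, 2.0))
```

Normalizing by frequency is what makes it reasonable to compare a 2 GHz research core against commercial chips that run at much higher clocks.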

Papers & Projects

This CVPR’22 paper, Learning to Answer Questions in Dynamic Audio-Visual Scenarios, studies the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos. Researchers from Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, and University of Rochester introduced a large-scale MUSIC-AVQA dataset, which contains more than 45K question-answer pairs covering 33 question templates that span different modalities and question types. They also developed several baselines and created a spatio-temporal grounded audio-visual network for the AVQA problem. More about this project on GitHub.

Using AI for human motion tracking and capture could lower the barrier to entry and reduce costs. This CVPR’22 paper, Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation, proposes capturing arm and hand dynamics simultaneously by leveraging arm-hand correlations. Researchers from the NetEase Games AI Lab designed a spatial-temporal parallel transformer model that makes full use of arm-hand correlation as well as inter-frame information, which enhances the robustness of the prediction.

A recent Nature paper, Cortical ensembles orchestrate social competition through hypothalamic outputs, revealed how the brain represents social rank and guides behaviour on the basis of this representation. Researchers from Shanghai Jiao Tong University and a number of top institutes developed a social competition assay in which mice compete for rewards, as well as a computer vision tool (AlphaTracker) to track multiple, unmarked animals. A hidden Markov model combined with generalized linear models was able to decode social competition behaviour from mPFC ensemble activity. Population dynamics in the mPFC predicted social rank and competitive success.

Rising Startups

Adaps Photonics, a manufacturer of high-performance photoelectric sensor chips, has raised hundreds of millions of yuan in its Series C funding round. Founded in 2018, the Shenzhen-based company develops 3D dToF sensing for mobile phones, lidars and other high-performance depth sensing systems.

Wuzhi Intelligence, an AI-powered SOAR startup, has raised almost RMB 100 million in its Pre-Series A funding round. SOAR stands for Security Orchestration, Automation, and Response. It allows companies to collect threat-related data from a range of sources and automate their responses to threats. Founded in 2019, the Shanghai-based company has developed its flagship SOAR product, HoneyGuide.



