🥸 AI Scam Robs Man of Millions; Baidu Previews ERNIE-Powered Search; Chinese Evaluation of Foundation Models

Weekly China AI News from May 22 to May 28

May 29, 2023

In this week’s issue, I’ll delve into a recent AI scam case that took 4.3 million yuan out of a man’s pocket in ten minutes. Mandopop singer Stefanie Sun reacts for the first time to her surprising AI-induced viral resurgence. What happens when Baidu Search merges with an LLM? Additionally, Chinese researchers designed a Chinese evaluation suite for foundation models, in which only GPT-4 exceeds 60% accuracy.

Man Deceived out of Millions by Scammers Leveraging Generative AI

What’s new: Voice-cloning AI scam calls, once a novel concern, are now overshadowed by an emerging threat in the digital world — deepfake face-swapping technology being used in video call scams. The severity of this threat was spotlighted in a recent case unveiled by the Chinese police, where a victim was scammed out of 4.3 million yuan within just ten minutes.

What’s going on: Roughly one month ago, Mr. Guo, the representative of a tech company in Fuzhou, received an unexpected WeChat video call. The face on the screen was familiar, an old “friend” who appeared anxious.

This “friend” explained his trouble: one of his friends was bidding on a contract and urgently needed a bid bond of 4.3 million yuan. He said he could not route the money through a corporate account himself, so he asked Mr. Guo to lend his company account as the conduit for this transaction.

The “friend” asked Mr. Guo for his bank account number. Shortly after, he shared a screenshot of a bank transfer slip, assuring Mr. Guo that the money had been deposited into Mr. Guo’s account.

Trusting his old friend, Mr. Guo felt no need to cross-check this transaction. Within minutes, he made two separate transfers and then messaged his real friend: “The job’s done.”

However, his friend responded with a question mark.

Panic set in. Mr. Guo quickly realized he was a victim of a scam. He called the police and the bank, and managed to recover over 3.3 million yuan within ten minutes.

How it works: The scam relies on two generative AI technologies, voice cloning and face swapping. According to a Chinese media outlet, it costs a mere 2 to 10 yuan (roughly $0.30 to $1.40) to animate a 2D photo into a video. A real-time video face-swapping model built for large live-streaming platforms can be purchased for 35,000 yuan (~$5,000).

For voice cloning, Microsoft’s new model, VALL-E, needs only a three-second audio sample to clone a voice, and can even mimic the background noise of the recording environment.

If combined with a virtual camera, any video resource can be used in a real-time video call.

The Beijing Anti-Fraud Center and the Wuhan Network Fraud Center have issued warnings about the seriousness of new scams, stating that “the success rate of these scams is close to 100%”.

Implications: The AI scam has stirred a heated discussion on social media, potentially reinforcing regulators’ strict measures to restrain the exploitation of generative AI technologies.

AI Revives Mandopop Singer’s Voice, Star Reacts to Viral Resurgence

What’s new: Over the past few weeks, Stefanie Sun, a Mandopop singer who has been out of the spotlight for years, has unexpectedly gone viral on Bilibili, the Chinese equivalent of YouTube. Her revival is largely due to an AI model that has cloned her voice and applied it to hundreds of Mandopop classics, including Jay Chou’s 发如雪 (“Hair Like Snow”). Read the full story from TechCrunch for more.

Last week, Sun shared her first reaction to this surreal twist of events, a response I found to be particularly well-articulated. Here’s a look:

As my AI voice takes on a life of its own while I despair over my overhanging stomach and my children’s every damn thing, I can’t help but want to write something about it.
My fans have officially switched sides and accepted that I am indeed 冷门歌手 [a niche singer] while my AI persona is the current hot property. I mean really, how do you fight with someone who is putting out new albums in the time span of minutes.
Whether it is ChatGPT or AI or whatever name you want to call it, this “thing” is now capable of mimicking and/or conjuring, unique and complicated content by processing a gazillion chunks of information while piecing and putting together in a most coherent manner the task being asked at hand. Wait a minute, isn’t that what humans do? The very task that we have always convinced ourselves; that the formation of thought or opinion is not replicable by robots, the very idea that this is beyond their league, is now the looming thing that will threaten thousands of human conjured jobs. Legal, medical, accountancy, and currently, singing a song.
You will protest, well I can tell the difference, there is no emotion or variance in tone/breath or whatever technical jargon you can come up with. Sorry to say, I suspect that this would be a very short term response.
Ironically, in no time at all, no human will be able to rise above that. No human will be able to have access to this amount of information AND make the right calls OR make the right mistakes (ok mayyyybe I’m jumping ahead). This new technology will be able to churn out what exactly EVERYTHING EVERYONE needs. As indie or as warped or as psychotic as you can get, there’s probably a unique content that could be created just for you. You are not special you are already predictable and also unfortunately malleable.
At this point, I feel like a popcorn eater with the best seat in the theatre. (Sidenote: Quite possibly in this case no tech is able to predict what it’s like to be me, except when this is published then ok it’s free for all). It’s like watching that movie that changed alot of our lives Everything Everywhere All At Once, except in this case, I don’t think it will be the idea of love that will save the day.
In this boundless sea of existence, where anything is possible, where nothing matters, I think it will be purity of thought, that being exactly who you are will be enough.
With this I fare thee well.

Baidu Teases New Search Feature Powered by ERNIE

What’s new: Last week at a Baidu event, a Baidu executive introduced a new chat feature called the “AI Companion” (AI伙伴), built into Baidu Search and the Baidu app. In the demo, the AI Companion icon sits in the upper right corner of the Baidu app. Tapping it opens a standalone chat interface akin to Bing Chat.

How it works: Mirroring ERNIE Bot, the AI Companion can understand language prompts, and generate text responses, images, and audio. It also supports voice commands. Below are additional capabilities.

The AI Companion can craft personalized travel plans and offer a list of hotels along with real-time pricing for reservations.
The AI Companion can access a web link and summarize its content in mere seconds.
As per the preview video, the AI Companion can handle tasks such as checking weather updates, scheduling appointments, showing map navigation to any location, analyzing documents, and captioning images, akin to GPT-4.

AI BOT: Alongside this, Baidu Search has also teased AI BOT, a tool that provides virtual AI avatars to creators, businesses, institutions, and brands with the aim of enhancing their content creation and service efficiency.

One more thing: At ZGC Forum 2023, CEO Robin Li asserted that Baidu will soon roll out ERNIE 3.5, the newest LLM that powers ERNIE Bot and all other Baidu products. Li underscored Baidu’s ambition to be the first to rebuild all of its applications using foundation models.

Weekly News Roundup

⚯ On May 19, the Beijing Science and Technology Commission, along with other Beijing authorities, jointly launched the “Beijing Artificial General Intelligence Industry Innovation Partner Program.” The first batch includes 39 member units across five categories — computing partners, data partners, model partners, application partners, and investment partners. Baidu is one of the seven model partners, with Alibaba Cloud and DAMO Academy selected as computing and model partners, respectively.

💪🏻 At the 7th World Intelligence Congress, the National Supercomputing Center in Tianjin unveiled the latest generation Tianhe supercomputer, a new Chinese exascale supercomputer. The center also launched the “Tianhe E-class Intelligent Computing Open Innovation Platform” and the “Tianhe Tianyuan Large Language Model.”

📖 If you are tired of “all-including buffet-like mega machine learning conferences,” you should pay attention to the Conference on Parsimony and Learning (CPAL), an annual research conference focused on addressing the parsimonious, low-dimensional structures that prevail in machine learning, signal processing, optimization, and beyond. The inaugural conference will be hosted by HKU Data Science Institute.

🇨🇳 Elon Musk said the US has the most advanced AI, but China is close behind. The gap is on the order of 12 months. Musk believes AGI will be cracked in this decade (though, considering his track record with predictions on autonomous driving…)

Trending Research

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

Affiliations: Shanghai Jiao Tong University, Tsinghua University, University of Edinburgh
C-EVAL, the first extensive Chinese evaluation suite, has been launched to assess the advanced knowledge and reasoning abilities of large language models (LLMs) in a Chinese context. Covering 52 disciplines across four difficulty levels, it includes C-EVAL HARD, a subset of particularly challenging subjects. Initial results show only GPT-4 exceeding 60% accuracy, suggesting significant room for LLMs to improve. C-EVAL aims to highlight key strengths and weaknesses of foundation models, spurring their evolution for Chinese users.
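For readers wondering how an accuracy figure like “60%” is typically computed on a multiple-choice benchmark, here is a minimal sketch. It is not C-EVAL’s official scoring code: the function names, the record layout, and the toy data are all hypothetical, and averaging per discipline rather than per question is an assumption made for illustration.

```python
from collections import defaultdict


def accuracy_by_discipline(records):
    """Return {discipline: accuracy} for (discipline, gold, predicted) records.

    The record layout is a hypothetical one chosen for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for discipline, gold, predicted in records:
        total[discipline] += 1
        if predicted == gold:
            correct[discipline] += 1
    return {d: correct[d] / total[d] for d in total}


def macro_average(per_discipline):
    """Average the per-discipline accuracies, weighting each discipline equally."""
    return sum(per_discipline.values()) / len(per_discipline)


# Made-up answers for two disciplines; not real benchmark data.
records = [
    ("law", "A", "A"),
    ("law", "C", "B"),
    ("physics", "D", "D"),
    ("physics", "B", "B"),
]

scores = accuracy_by_discipline(records)
overall = macro_average(scores)
print(scores, overall)
```

Weighting each discipline equally keeps a subject with many questions from dominating the headline number. A per-question average would be an equally reasonable choice.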

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

Affiliations: Tsinghua University, SenseTime Research, Centre for Artificial Intelligence and Robotics, HKISI, CAS, University of Science and Technology of China, the Chinese University of Hong Kong, Shanghai Artificial Intelligence Laboratory, Institute of Automation, Chinese Academy of Sciences (CASIA)
The Ghost in the Minecraft (GITM) framework combines LLMs with text-based knowledge to create adaptable agents in Minecraft. These agents can handle multiple tasks in complex environments, significantly outperforming previous methods, and only require a single CPU node for training. GITM highlights the potential of using LLMs for complex, open-world applications.

VideoLLM: Modeling Video Sequence with Large Language Models

Affiliations: Nanjing University, Shanghai AI Laboratory
VideoLLM, a new framework, harnesses LLMs’ abilities to comprehend and analyze growing video data. It translates diverse inputs into a unified token sequence for an LLM decoder, providing a comprehensive solution for diverse video understanding tasks. Extensive testing confirms LLMs’ effective application to video tasks. The framework’s code is publicly available.

ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

Affiliations: Tsinghua University, Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, ShengShu, Pazhou Laboratory
This work introduces Variational Score Distillation (VSD), a framework that treats the 3D parameter as a random variable to address the over-saturation, over-smoothing, and low-diversity problems of Score Distillation Sampling (SDS). The new system, called ProlificDreamer, improves the diversity and quality of generated samples and adds design improvements such as the distillation time schedule and density initialization. It produces high-resolution, high-fidelity NeRF with intricate effects, and meshes initialized from the NeRF and fine-tuned with VSD are highly detailed and photorealistic. ProlificDreamer stands as an upgrade to traditional SDS approaches in text-to-3D generation.