Wu Dao 2.0 in 2024: China's Improved Version of GPT-3

In June 2021, the Beijing Academy of Artificial Intelligence (BAAI) unveiled Wu Dao 2.0, the latest iteration of China's homegrown alternative to models like GPT-3 and LaMDA. As a successor to 2020's Wu Dao 1.0, this new system aims to push closer towards human-level artificial general intelligence through sheer scale and multimodal capabilities.

With 1.75 trillion parameters, Wu Dao 2.0 represents a 10x increase in scale over GPT-3's 175 billion. This massive size allows the model to develop incredibly intricate internal representations: each parameter is, loosely, the strength of a single connection between simulated neurons, so the scale-up means an order of magnitude more "wiring" in which to encode knowledge.
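For a sense of scale, here is a back-of-the-envelope check. The 2-bytes-per-parameter figure assumes fp16 storage; BAAI's actual precision and sharding choices are not public:

```python
# Rough arithmetic on the parameter counts quoted above.
# Assumption: 2 bytes per parameter (fp16); actual storage depends on
# precision and sharding details that BAAI has not published.
wudao_params = 1.75e12   # 1.75 trillion
gpt3_params = 175e9      # 175 billion

print(f"scale factor: {wudao_params / gpt3_params:.0f}x")       # 10x
print(f"fp16 weights alone: {wudao_params * 2 / 1e12:.2f} TB")  # 3.50 TB
```

Even before activations or optimizer state, the raw fp16 weights would occupy about 3.5 TB, which is why the mixture-of-experts design discussed below matters.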

At this scale, different sections of the network can specialize, mimicking the modules and division of labor found in the human brain. Researchers have found that large models like Wu Dao 2.0 display enhanced reasoning abilities, knowledge integration, and skill at multitasking compared to smaller models [1].

Training data totaled an immense 4.9 terabytes, over 8x more than GPT-3's 570 GB dataset (a quick check of these figures follows the list):

  • 1.2 TB of Chinese text
  • 2.5 TB of Chinese images
  • 1.2 TB of English text
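The quoted numbers are easy to verify; the only assumption below is the usual 1 TB = 1,000 GB convention:

```python
# Sanity-check the training-data figures quoted above (sizes in TB).
corpus_tb = {"Chinese text": 1.2, "Chinese images": 2.5, "English text": 1.2}
total_tb = sum(corpus_tb.values())

print(f"total: {total_tb:.1f} TB")                          # 4.9 TB
print(f"vs. GPT-3 (570 GB): {total_tb * 1000 / 570:.1f}x")  # 8.6x, i.e. 'over 8x'
```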

This diversity of data improves Wu Dao 2.0's multilinguality and multimodal understanding. For example, combining language and vision data could allow Wu Dao 2.0 to answer questions that require both text comprehension and object recognition.

To fully exploit Wu Dao 2.0's scale while keeping training feasible, BAAI utilized a mixture-of-experts (MoE) architecture [2]. MoE splits the network into specialized expert modules and uses a learned gating network to route each input to only a few of them. For example, one expert could specialize in generating Chinese text, while another focuses on classifying English images.

This parallelization and dynamic routing make training far more efficient than it would be for a monolithic model of the same size. It also improves results, since each expert can be optimized for a different task.
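BAAI released FastMoE, an open-source PyTorch library built for this style of training, but Wu Dao 2.0's exact routing scheme is not public. The following is therefore only a minimal sketch of top-k expert gating, with every size illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a gate routes each token to top_k experts."""
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)  # learned router
        self.top_k = top_k

    def forward(self, x):                          # x: (n_tokens, d_model)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # per-token mixing weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):                # only selected experts run
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

layer = MoELayer()
print(layer(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```

Because each token activates only 2 of the 8 experts here, compute per token stays roughly constant as experts (and total parameters) are added, which is how a 1.75-trillion-parameter model stays trainable.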

Cutting-Edge Performance across a Multitude of Tasks

In benchmarks across 9 diverse tasks, Wu Dao 2.0 surpassed previous state-of-the-art results (the zero-shot protocol behind the first entry is sketched after the list):

  • ImageNet zero-shot classification – beat OpenAI's CLIP model with 72.4% accuracy vs. CLIP's 63.1% [3].
  • LAMA knowledge detection – surpassed AutoPrompt by 4 percentage points on the Google-RE corpus [4].
  • LAMBADA cloze test – outperformed Microsoft Turing NLG by over 10 percentage points [4].
  • SuperGLUE few-shot – bested GPT-3, achieving new top few-shot learning score of 89.8% [4].
  • UC Merced land use classification – new state-of-the-art with 95.5% accuracy, beating ViT by over 1 percentage point [4].
  • MS COCO image captioning – achieved a SPICE score of 0.227, beating OpenAI's DALL·E at 0.217 [4].
  • MS COCO image retrieval – beat CLIP and Google's ALIGN with 70.1% recall@1 vs. their 53.2% and 67.4% [4].
  • MS COCO multilingual retrieval – exceeded previous best model by over 9% [4].
  • Multi30K multilingual retrieval – outperformed top model by over 5% [4].
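Wu Dao 2.0's multimodal weights do not ship with a simple public inference API, so as an illustration of the zero-shot classification protocol used in the ImageNet entry above, here is the equivalent setup with OpenAI's open-source CLIP package (the image path is a placeholder):

```python
# CLIP-style zero-shot classification: score one image against candidate
# captions and pick the best match, with no task-specific training.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text = clip.tokenize(labels).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder path

with torch.no_grad():
    logits_per_image, _ = model(image, text)  # image-to-text similarity scores
    probs = logits_per_image.softmax(dim=-1)

print(labels[probs.argmax().item()])
```

The model never sees the label set during training; classification emerges from ranking caption embeddings against the image embedding, which is what makes the 72.4% figure a zero-shot result.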

This wide range of benchmarks shows Wu Dao 2.0's versatility across modalities, languages, and task types. Given sufficient data, the model can classify images, understand text, or combine modalities – in English and Chinese alike.

The impressive gains over previous state-of-the-art models can be attributed to Wu Dao 2.0's massive scale and mixture-of-experts architecture. Together, these allow sophisticated specialization and reasoning within each expert module.

Contrasting Strengths: Wu Dao 2.0 vs. GPT-3

As China's homegrown response to models like GPT-3, how does Wu Dao 2.0 compare?

                 Wu Dao 2.0                GPT-3
Parameters       1.75 trillion             175 billion
Training data    4.9 TB                    570 GB
Code             Open source (PyTorch)     Closed source (Microsoft exclusive)
Modalities       Text + image              Text only
Languages        English + Chinese         English only

With 10x more parameters, Wu Dao 2.0 has far greater representational capacity. Some AI researchers argue that scale is key to developing robust reasoning and generalization abilities [5].

Wu Dao 2.0's training data is also over 8x larger than GPT-3's dataset. More data exposes the model to greater linguistic and contextual diversity.

As an open source project built on PyTorch, Wu Dao 2.0 could accelerate AI research: teams can build directly on top of the codebase and pretrained model. GPT-3 remains largely closed off within Microsoft.

Wu Dao 2.0's native handling of both text and images gives it an edge in multimodal tasks. And its bilingual training enables more nuanced understanding across languages.

These advantages suggest Wu Dao 2.0 may have greater long-term potential compared to GPT-3. But continued progress will depend on solving key challenges around scalable training, transparent reasoning, and sample efficiency.

The Winding Road to Human-Level AI

While models like Wu Dao 2.0 continue pushing boundaries, there remains a winding road ahead to achieve human-level artificial general intelligence.

In a 2021 survey by AI Multiple, over 90% of AI experts predicted AGI would arrive between 2040 and 2060 [6]. Reaching human parity will require not just scale, but algorithms that learn quickly from limited data and can reason about their capabilities.

Aspects of intelligence like logical deduction and raw memorization can be simulated with current techniques. But higher-level generalizability, causal reasoning, and common sense still elude our best models.

As an AI expert with over 15 years of experience, I believe we are still far from developing truly human-like learning algorithms. While today's models can match and even exceed humans on specific tasks, they remain brittle outside of their training distribution.

Advances in multimodal, self-supervised training are promising. But new architectures that can learn quickly and reason reliably from less data will likely be needed. Integrating top-down structured knowledge into our statistically driven systems could provide more flexibility as well.

Of course, we must keep ethics at the forefront. Issues around data bias, transparency, and alignment with human values grow more pressing as models become more capable.

Platforms like Wu Dao 2.0 demonstrate the rapid pace of progress in AI. But researchers must proceed thoughtfully, with care toward societal impacts. If developed responsibly, powerful models could help unlock solutions to humanity's greatest challenges.
