[One-page summary] MetaFormer Is Actually What You Need for Vision (CVPR 2022) by Yu et al.

Paper_review[short]
Elune001 2024. 1. 16. 00:58

● Summary: The performance of a transformer comes from its overall architecture, not from the attention module used as the token mixer.

 

● Approach highlight

  • MetaFormer: the general structure of the transformer block plays a bigger role in performance than the specific choice of token mixer.

The entire architecture of the transformer, not the token mixer, is the key to performance.
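The block structure the paper abstracts as "MetaFormer" can be sketched as follows. This is a minimal NumPy sketch under my own simplifications (ReLU instead of GELU, no learned norm parameters), not the authors' implementation; the point is that `token_mixer` is a pluggable slot that attention, a spatial MLP, or pooling can all fill.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token over the channel dimension (last axis).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def channel_mlp(x, w1, w2):
    # Two-layer per-token MLP (ReLU here for brevity; the paper uses GELU).
    return np.maximum(x @ w1, 0.0) @ w2

def metaformer_block(x, token_mixer, w1, w2):
    # x: (num_tokens, channels). token_mixer is any function mapping
    # (num_tokens, channels) -> (num_tokens, channels): attention,
    # spatial MLP, or pooling all fit this slot.
    x = x + token_mixer(layer_norm(x))        # sub-block 1: token mixing + residual
    x = x + channel_mlp(layer_norm(x), w1, w2)  # sub-block 2: channel MLP + residual
    return x
```

For example, passing `lambda t: t` as the token mixer gives the degenerate "identity mixer" baseline; swapping in attention or pooling changes only that one argument.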

  • PoolFormer: demonstrates that the MetaFormer structure, not the token mixer, drives the transformer's performance, by replacing the token mixer with a simple pooling layer and validating that performance holds.
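The pooling token mixer itself is extremely simple. Below is a NumPy sketch of the idea: stride-1 average pooling with same-size output, minus the identity. The subtraction is there because the surrounding MetaFormer block adds the input back through its residual connection, so the net effect is pure pooling. Note this is an illustrative re-implementation with edge padding, not the authors' exact code (the official version uses `nn.AvgPool2d` with `count_include_pad=False`).

```python
import numpy as np

def pooling_token_mixer(x, pool_size=3):
    # x: (H, W, C) feature map. Stride-1 average pooling with same-size
    # output, minus the identity (the block's residual adds x back).
    pad = pool_size // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    H, W, _ = x.shape
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + pool_size, j:j + pool_size].mean(axis=(0, 1))
    return out - x
```

A useful sanity check: on a constant feature map the pooled output equals the input, so the mixer returns all zeros and the block reduces to its channel MLP alone.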

● Main Results:

 

● Discussion

  • The reason why the proposed method (PoolFormer) does not work on NLP tasks.