To read
- An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)
- Masked Autoencoders Are Scalable Vision Learners (MAE)
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- Joint Detection and Identification Feature Learning for Person Search
- Norm-Aware Embedding for Efficient Person Search
- Exploring Visual Context for Weakly Supervised Person Search
- Domain Adaptive Person Search
Currently reading
Sequential End-to-end Network for Efficient Person Search Tongji University
A Survey on Multiview Clustering IEEE 2021
Read
ResNet CVPR-2016-Kaiming He
Attention Is All You Need Transformer Google
Vision Transformer ICLR-2021-Google-Brain Team
MAE CVPR-2022-Kaiming He
Swin Transformer ICCV-2021-Microsoft Research Asia
Joint Detection and Identification Feature Learning for Person Search The Chinese University of Hong Kong 2017