The proceedings of MMM2021, the 27th International Conference on MultiMedia Modeling, held in Prague, Czech Republic, have been published by Springer as LNCS 12573. The published papers are listed below.

Published papers

  1. Crossed-Time Delay Neural Network for Speaker Recognition
    Liang Chen, Yanchun Liang, Xiaoshu Shi, You Zhou, Chunguo Wu
    Pages 1-10
  2. An Asymmetric Two-Sided Penalty Term for CT-GAN
    Huan Zhao, Yu Wang, Tingting Li, Yuqing Zhao
    Pages 11-23
  3. Fast Discrete Matrix Factorization Hashing for Large-Scale Cross-Modal Retrieval
    Huan Zhao, Xiaolin She, Song Wang, Kaili Ma
    Pages 24-36
  4. Fast Optimal Transport Artistic Style Transfer
    Ting Qiu, Bingbing Ni, Ziang Liu, Xuanhong Chen
    Pages 37-49
  5. Stacked Sparse Autoencoder for Audio Object Coding
    Yulin Wu, Ruimin Hu, Xiaochen Wang, Chenhao Hu, Gang Li
    Pages 50-61
  6. A Collaborative Multi-modal Fusion Method Based on Random Variational Information Bottleneck for Gesture Recognition
    Yang Gu, Yajie Li, Yiqiang Chen, Jiwei Wang, Jianfei Shen
    Pages 62-74
  7. Frame Aggregation and Multi-modal Fusion Framework for Video-Based Person Recognition
    Fangtao Li, Wenzhe Wang, Zihe Liu, Haoran Wang, Chenghao Yan, Bin Wu
    Pages 75-86
  8. An Adaptive Face-Iris Multimodal Identification System Based on Quality Assessment Network
    Zhengding Luo, Qinghua Gu, Guoxiong Su, Yuesheng Zhu, Zhiqiang Bai
    Pages 87-98
  9. Thermal Face Recognition Based on Multi-scale Image Synthesis
    Wei-Ta Chu, Ping-Shen Huang
    Pages 99-110
  10. Contrastive Learning in Frequency Domain for Non-I.I.D. Image Classification
    Huan Shao, Zhaoquan Yuan, Xiao Peng, Xiao Wu
    Pages 111-122
  11. Group Activity Recognition by Exploiting Position Distribution and Appearance Relation
    Duoxuan Pei, Annan Li, Yunhong Wang
    Pages 123-135
  12. Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization
    Fan Zhang, Meng Li, Guisheng Zhai, Yizhao Liu
    Pages 136-147
  13. Dense Attention-Guided Network for Boundary-Aware Salient Object Detection
    Zhe Zhang, Junhui Ma, Panpan Xu, Wencheng Wang
    Pages 148-161
  14. Generative Image Inpainting by Hybrid Contextual Attention Network
    Zhijiao Xiao, Donglun Li
    Pages 162-173
  15. Atypical Lyrics Completion Considering Musical Audio Signals
    Kento Watanabe, Masataka Goto
    Pages 174-186
  16. Improving Supervised Cross-modal Retrieval with Semantic Graph Embedding
    Changting Feng, Dagang Li, Jingwei Zheng
    Pages 187-199
  17. Confidence-Based Global Attention Guided Network for Image Inpainting
    Zhilin Huang, Chujun Qin, Lei Li, Ruixin Liu, Yuesheng Zhu
    Pages 200-212
  18. Multi-task Deep Learning for No-Reference Screen Content Image Quality Assessment
    Rui Gao, Ziqing Huang, Shiguang Liu
    Pages 213-226
  19. Language Person Search with Pair-Based Weighting Loss
    Peng Zhang, Deqiang Ouyang, Chunlin Jiang, Jie Shao
    Pages 227-239
  20. DeepFusion: Deep Ensembles for Domain Independent System Fusion
    Mihai Gabriel Constantin, Liviu-Daniel Ştefan, Bogdan Ionescu
    Pages 240-252
  21. Illuminate Low-Light Image via Coarse-to-fine Multi-level Network
    Yansheng Qiu, Jun Chen, Xiao Wang, Kui Jang
    Pages 253-264
  22. MM-Net: Learning Adaptive Meta-metric for Few-Shot Biometric Recognition
    Qinghua Gu, Zhengding Luo, Wanyu Zhao, Yuesheng Zhu
    Pages 265-277
  23. A Sentiment Similarity-Oriented Attention Model with Multi-task Learning for Text-Based Emotion Recognition
    Yahui Fu, Lili Guo, Longbiao Wang, Zhilei Liu, Jiaxing Liu, Jianwu Dang
    Pages 278-289
  24. Locating Visual Explanations for Video Question Answering
    Xuanwei Chen, Rui Liu, Xiaomeng Song, Yahong Han
    Pages 290-302
  25. Global Cognition and Local Perception Network for Blind Image Deblurring
    Chuanfa Zhang, Wei Zhang, Feiyu Chen, Yiting Cheng, Shuyong Gao, Wenqiang Zhang
    Pages 303-314
  26. Multi-grained Fusion for Conditional Image Retrieval
    Yating Liu, Yan Lu
    Pages 315-327
  27. A Hybrid Music Recommendation Algorithm Based on Attention Mechanism
    Weite Feng, Tong Li, Haiyang Yu, Zhen Yang
    Pages 328-339
  28. Few-Shot Learning with Unlabeled Outlier Exposure
    Haojie Wang, Jieya Lian, Shengwu Xiong
    Pages 340-351
  29. Fine-Grained Video Deblurring with Event Camera
    Limeng Zhang, Hongguang Zhang, Chenyang Zhu, Shasha Guo, Jihua Chen, Lei Wang
    Pages 352-364
  30. Discriminative and Selective Pseudo-Labeling for Domain Adaptation
    Fei Wang, Youdong Ding, Huan Liang, Jing Wen
    Pages 365-377
  31. Multi-level Gate Feature Aggregation with Spatially Adaptive Batch-Instance Normalization for Semantic Image Synthesis
    Jia Long, Hongtao Lu
    Pages 378-390
  32. Robust Multispectral Pedestrian Detection via Uncertainty-Aware Cross-Modal Learning
    Sungjune Park, Jung Uk Kim, Yeon Gyun Kim, Sang-Keun Moon, Yong Man Ro
    Pages 391-402
  33. Time-Dependent Body Gesture Representation for Video Emotion Recognition
    Jie Wei, Xinyu Yang, Yizhuo Dong
    Pages 403-416
  34. MusiCoder: A Universal Music-Acoustic Encoder Based on Transformer
    Yilun Zhao, Jia Guo
    Pages 417-429
  35. DANet: Deformable Alignment Network for Video Inpainting
    Xutong Lu, Jianfu Zhang
    Pages 430-442
  36. Deep Centralized Cross-modal Retrieval
    Zhenyu Wen, Aimin Feng
    Pages 443-455
  37. Shot Boundary Detection Through Multi-stage Deep Convolution Neural Network
    Tingting Wang, Na Feng, Junqing Yu, Yunfeng He, Yangliu Hu, Yi-Ping Phoebe Chen
    Pages 456-468
  38. Towards Optimal Multirate Encoding for HTTP Adaptive Streaming
    Hadi Amirpour, Ekrem Çetinkaya, Christian Timmerer, Mohammad Ghanbari
    Pages 469-480
  39. Fast Mode Decision Algorithm for Intra Encoding of the 3rd Generation Audio Video Coding Standard
    Shengyuan Wu, Zhenyu Wang, Yangang Cai, Ronggang Wang
    Pages 481-492
  40. Graph Structure Reasoning Network for Face Alignment and Reconstruction
    Xing Wang, Xinyu Li, Suping Wu
    Pages 493-505
  41. Game Input with Delay – A Model of the Time Distribution for Selecting a Moving Target with a Mouse
    Shengmei Liu, Mark Claypool
    Pages 506-518
  42. Unsupervised Temporal Attention Summarization Model for User Created Videos
    Min Hu, Ruimin Hu, Xiaocheng Wang, Rui Sheng
    Pages 519-530
  43. Learning from the Negativity: Deep Negative Correlation Meta-Learning for Adversarial Image Classification
    Wenbo Zheng, Lan Yan, Fei-Yue Wang, Chao Gou
    Pages 531-540
  44. Learning 3D-Craft Generation with Predictive Action Neural Network
    Ze-yu Liu, Jian-wei Liu, Xin Zuo, Weimin Li
    Pages 541-553
  45. Unsupervised Multi-shot Person Re-identification via Dynamic Bi-directional Normalized Sparse Representation
    Xiaobao Li, Wen Wang, Qingyong Li, Lijun Guo
    Pages 554-566
  46. Classifier Belief Optimization for Visual Categorization
    Gang Yang, Xirong Li
    Pages 567-579
  47. Fine-Grained Generation for Zero-Shot Learning
    Weimin Sun, Jieping Xu, Gang Yang
    Pages 580-591
  48. Fine-Grained Image-Text Retrieval via Complementary Feature Learning
    Min Zheng, Yantao Jia, Huajie Jiang
    Pages 592-604
  49. Considering Human Perception and Memory in Interactive Multimedia Retrieval Evaluations
    Luca Rossetto, Werner Bailer, Abraham Bernstein
    Pages 605-616
  50. Learning Multi-level Interaction Relations and Feature Representations for Group Activity Recognition
    Lihua Lu, Yao Lu, Shunzhou Wang
    Pages 617-628
  51. A Structured Feature Learning Model for Clothing Keypoints Localization
    Ruhan He, Yuyi Su, Tao Peng, Jia Chen, Zili Zhang, Xinrong Hu
    Pages 629-640
  52. Automatic Pose Quality Assessment for Adaptive Human Pose Refinement
    Gang Chu, Chi Xie, Shuang Liang
    Pages 641-652
  53. Deep Attributed Network Embedding with Community Information
    Li Xue, Wenbin Yao, Yamei Xia, Xiaoyong Li
    Pages 653-665
  54. An Acceleration Framework for Super-Resolution Network via Region Difficulty Self-adaption
    Zhenfang Guo, Yuyao Ye, Yang Zhao, Ronggang Wang
    Pages 666-677
  55. Spatial Gradient Guided Learning and Semantic Relation Transfer for Facial Landmark Detection
    Jian Wang, Yaoyi Li, Hongtao Lu
    Pages 678-690
  56. DVRCNN: Dark Video Post-processing Method for VVC
    Donghui Feng, Yiwei Zhang, Chen Zhu, Han Zhang, Li Song
    Pages 691-703
  57. An Efficient Image Transmission Pipeline for Multimedia Services
    Zeyu Wang
    Pages 704-715
  58. Gaussian Mixture Model Based Semi-supervised Sparse Representation for Face Recognition
    Xinxin Shan, Ying Wen
    Pages 716-727

Published papers, second volume

  1. MSCANet: Adaptive Multi-scale Context Aggregation Network for Congested Crowd Counting
    Yani Zhang, Huailin Zhao, Fangbo Zhou, Qing Zhang, Yanjiao Shi, Lanjun Liang
    Pages 1-12
  2. Tropical Cyclones Tracking Based on Satellite Cloud Images: Database and Comprehensive Study
    Cheng Huang, Sixian Chan, Cong Bai, Weilong Ding, Jinglin Zhang
    Pages 13-25
  3. Image Registration Improved by Generative Adversarial Networks
    Shiyan Jiang, Ci Wang, Chang Huang
    Pages 26-35
  4. Deep 3D Modeling of Human Bodies from Freehand Sketching
    Kaizhi Yang, Jintao Lu, Siyu Hu, Xuejin Chen
    Pages 36-48
  5. Two-Stage Real-Time Multi-object Tracking with Candidate Selection
    Fan Wang, Lei Luo, En Zhu
    Pages 49-61
  6. Tell as You Imagine: Sentence Imageability-Aware Image Captioning
    Kazuki Umemura, Marc A. Kastner, Ichiro Ide, Yasutomo Kawanishi, Takatsugu Hirayama, Keisuke Doman et al.
    Pages 62-73
  7. Deep Face Swapping via Cross-Identity Adversarial Training
    Shuhui Yang, Han Xue, Jun Ling, Li Song, Rong Xie
    Pages 74-86
  8. Res2-Unet: An Enhanced Network for Generalized Nuclear Segmentation in Pathological Images
    Shuai Zhao, Xuanya Li, Zhineng Chen, Chang Liu, Changgen Peng
    Pages 87-98
  9. Automatic Diagnosis of Glaucoma on Color Fundus Images Using Adaptive Mask Deep Network
    Gang Yang, Fan Li, Dayong Ding, Jun Wu, Jie Xu
    Pages 99-110
  10. Initialize with Mask: For More Efficient Federated Learning
    Zirui Zhu, Lifeng Sun
    Pages 111-120
  11. Unsupervised Gaze: Exploration of Geometric Constraints for 3D Gaze Estimation
    Yawen Lu, Yuxing Wang, Yuan Xin, Di Wu, Guoyu Lu
    Pages 121-133
  12. Median-Pooling Grad-CAM: An Efficient Inference Level Visual Explanation for CNN Networks in Remote Sensing Image Classification
    Wei Song, Shuyuan Dai, Dongmei Huang, Jinling Song, Liotta Antonio
    Pages 134-146
  13. Multi-granularity Recurrent Attention Graph Neural Network for Few-Shot Learning
    Xu Zhang, Youjia Zhang, Zuyu Zhang
    Pages 147-158
  14. EEG Emotion Recognition Based on Channel Attention for E-Healthcare Applications
    Xu Zhang, Tianzhi Du, Zuyu Zhang
    Pages 159-169
  15. The MovieWall: A New Interface for Browsing Large Video Collections
    Marij Nefkens, Wolfgang Hürst
    Pages 170-182
  16. Keystroke Dynamics as Part of Lifelogging
    Alan F. Smeaton, Naveen Garaga Krishnamurthy, Amruth Hebbasuru Suryanarayana
    Pages 183-195
  17. HTAD: A Home-Tasks Activities Dataset with Wrist-Accelerometer and Audio Features
    Enrique Garcia-Ceja, Vajira Thambawita, Steven A. Hicks, Debesh Jha, Petter Jakobsen, Hugo L. Hammer et al.
    Pages 196-205
  18. MNR-Air: An Economic and Dynamic Crowdsourcing Mechanism to Collect Personal Lifelog and Surrounding Environment Dataset. A Case Study in Ho Chi Minh City, Vietnam
    Dang-Hieu Nguyen, Tan-Loc Nguyen-Tai, Minh-Tam Nguyen, Thanh-Binh Nguyen, Minh-Son Dao
    Pages 206-217
  19. Kvasir-Instrument: Diagnostic and Therapeutic Tool Segmentation Dataset in Gastrointestinal Endoscopy
    Debesh Jha, Sharib Ali, Krister Emanuelsen, Steven A. Hicks, Vajira Thambawita, Enrique Garcia-Ceja et al.
    Pages 218-229
  20. CatMeows: A Publicly-Available Dataset of Cat Vocalizations
    Luca A. Ludovico, Stavros Ntalampiras, Giorgio Presti, Simona Cannas, Monica Battini, Silvana Mattiello
    Pages 230-243
  21. Search and Explore Strategies for Interactive Analysis of Real-Life Image Collections with Unknown and Unique Categories
    Floris Gisolf, Zeno Geradts, Marcel Worring
    Pages 244-255
  22. Graph-Based Indexing and Retrieval of Lifelog Data
    Manh-Duy Nguyen, Binh T. Nguyen, Cathal Gurrin
    Pages 256-267
  23. On Fusion of Learned and Designed Features for Video Data Analytics
    Marek Dobranský, Tomáš Skopal
    Pages 268-280
  24. XQM: Interactive Learning on Mobile Phones
    Alexandra M. Bagi, Kim I. Schild, Omar Shahbaz Khan, Jan Zahálka, Björn Þór Jónsson
    Pages 281-293
  25. A Multimodal Tensor-Based Late Fusion Approach for Satellite Image Search in Sentinel 2 Images
    Ilias Gialampoukidis, Anastasia Moumtzidou, Marios Bakratsas, Stefanos Vrochidis, Ioannis Kompatsiaris
    Pages 294-306
  26. Canopy Height Estimation from Spaceborne Imagery Using Convolutional Encoder-Decoder
    Leonidas Alagialoglou, Ioannis Manakos, Marco Heurich, Jaroslav Červenka, Anastasios Delopoulos
    Pages 307-317
  27. Implementation of a Random Forest Classifier to Examine Wildfire Predictive Modelling in Greece Using Diachronically Collected Fire Occurrence and Fire Mapping Data
    Alexis Apostolakis, Stella Girtsou, Charalampos Kontoes, Ioannis Papoutsis, Michalis Tsoutsos
    Pages 318-329
  28. Mobile eHealth Platform for Home Monitoring of Bipolar Disorder
    Joan Codina-Filbà, Sergio Escalera, Joan Escudero, Coen Antens, Pau Buch-Cardona, Mireia Farrús
    Pages 330-341
  29. Multimodal Sensor Data Analysis for Detection of Risk Situations of Fragile People in @home Environments
    Thinhinane Yebda, Jenny Benois-Pineau, Marion Pech, Hélène Amieva, Laura Middleton, Max Bergelt
    Pages 342-353
  30. Towards the Development of a Trustworthy Chatbot for Mental Health Applications
    Matthias Kraus, Philip Seldschopf, Wolfgang Minker
    Pages 354-366
  31. Fusion of Multimodal Sensor Data for Effective Human Action Recognition in the Service of Medical Platforms
    Panagiotis Giannakeris, Athina Tsanousa, Thanasis Mavropoulos, Georgios Meditskos, Konstantinos Ioannidis, Stefanos Vrochidis et al.
    Pages 367-378
  32. SpotifyGraph: Visualisation of User’s Preferences in Music
    Pavel Gajdusek, Ladislav Peska
    Pages 379-384
  33. A System for Interactive Multimedia Retrieval Evaluations
    Luca Rossetto, Ralph Gasser, Loris Sauter, Abraham Bernstein, Heiko Schuldt
    Pages 385-390
  34. SQL-Like Interpretable Interactive Video Search
    Jiaxin Wu, Phuong Anh Nguyen, Zhixin Ma, Chong-Wah Ngo
    Pages 391-397
  35. VERGE in VBS 2021
    Stelios Andreadis, Anastasia Moumtzidou, Konstantinos Gkountakos, Nick Pantelidis, Konstantinos Apostolidis, Damianos Galanopoulos et al.
    Pages 398-404
  36. NoShot Video Browser at VBS2021
    Christof Karisch, Andreas Leibetseder, Klaus Schoeffmann
    Pages 405-409
  37. Exquisitor at the Video Browser Showdown 2021: Relationships Between Semantic Classifiers
    Omar Shahbaz Khan, Björn Þór Jónsson, Mathias Larsen, Liam Poulsen, Dennis C. Koelma, Stevan Rudinac et al.
    Pages 410-416
  38. VideoGraph – Towards Using Knowledge Graphs for Interactive Video Retrieval
    Luca Rossetto, Matthias Baumgartner, Narges Ashena, Florian Ruosch, Romana Pernisch, Lucien Heitz et al.
    Pages 417-422
  39. IVIST: Interactive Video Search Tool in VBS 2021
    Yoonho Lee, Heeju Choi, Sungjune Park, Yong Man Ro
    Pages 423-428
  40. Video Search with Collage Queries
    Jakub Lokoč, Jana Bátoryová, Dominik Smrž, Marek Dobranský
    Pages 429-434
  41. Towards Explainable Interactive Multi-modal Video Retrieval with Vitrivr
    Silvan Heller, Ralph Gasser, Cristina Illi, Maurizio Pasquinelli, Loris Sauter, Florian Spiess et al.
    Pages 435-440
  42. Competitive Interactive Video Retrieval in Virtual Reality with vitrivr-VR
    Florian Spiess, Ralph Gasser, Silvan Heller, Luca Rossetto, Loris Sauter, Heiko Schuldt
    Pages 441-447
  43. An Interactive Video Search Tool: A Case Study Using the V3C1 Dataset
    Abdullah Alfarrarjeh, Jungwon Yoon, Seon Ho Kim, Amani Abu Jabal, Akarsh Nagaraj, Chinmayee Siddaramaiah
    Pages 448-454
  44. Less is More – diveXplore 5.0 at VBS 2021
    Andreas Leibetseder, Klaus Schoeffmann
    Pages 455-460
  45. SOMHunter V2 at Video Browser Showdown 2021
    Patrik Veselý, František Mejzlík, Jakub Lokoč
    Pages 461-466
  46. W2VV++ BERT Model at VBS 2021
    Ladislav Peška, Gregor Kovalčík, Tomáš Souček, Vít Škrhák, Jakub Lokoč
    Pages 467-472
  47. VISIONE at Video Browser Showdown 2021
    Giuseppe Amato, Paolo Bolettieri, Fabrizio Falchi, Claudio Gennaro, Nicola Messina, Lucia Vadicamo et al.
    Pages 473-478
  48. IVOS – The ITEC Interactive Video Object Search System at VBS2021
    Anja Ressmann, Klaus Schoeffmann
    Pages 479-483
  49. Video Search with Sub-Image Keyword Transfer Using Existing Image Archives
    Nico Hezel, Konstantin Schall, Klaus Jung, Kai Uwe Barthel
    Pages 484-489
  50. A VR Interface for Browsing Visual Spaces at VBS2021
    Ly-Duyen Tran, Manh-Duy Nguyen, Thao-Nhu Nguyen, Graham Healy, Annalina Caputo, Binh T. Nguyen et al.
    Pages 490-495