The proceedings of the 27th International Conference, MMM 2021, held in Prague, Czech Republic, have been published by Springer as LNCS 12573 and are accessible through the following links:
Published papers
Crossed-Time Delay Neural Network for Speaker Recognition
Liang Chen, Yanchun Liang, Xiaoshu Shi, You Zhou, Chunguo Wu
Pages 1-10
An Asymmetric Two-Sided Penalty Term for CT-GAN
Huan Zhao, Yu Wang, Tingting Li, Yuqing Zhao
Pages 11-23
Fast Discrete Matrix Factorization Hashing for Large-Scale Cross-Modal Retrieval
Huan Zhao, Xiaolin She, Song Wang, Kaili Ma
Pages 24-36
Fast Optimal Transport Artistic Style Transfer
Ting Qiu, Bingbing Ni, Ziang Liu, Xuanhong Chen
Pages 37-49
Stacked Sparse Autoencoder for Audio Object Coding
Yulin Wu, Ruimin Hu, Xiaochen Wang, Chenhao Hu, Gang Li
Pages 50-61
A Collaborative Multi-modal Fusion Method Based on Random Variational Information Bottleneck for Gesture Recognition
Yang Gu, Yajie Li, Yiqiang Chen, Jiwei Wang, Jianfei Shen
Pages 62-74
Frame Aggregation and Multi-modal Fusion Framework for Video-Based Person Recognition
Fangtao Li, Wenzhe Wang, Zihe Liu, Haoran Wang, Chenghao Yan, Bin Wu
Pages 75-86
An Adaptive Face-Iris Multimodal Identification System Based on Quality Assessment Network
Zhengding Luo, Qinghua Gu, Guoxiong Su, Yuesheng Zhu, Zhiqiang Bai
Pages 87-98
Thermal Face Recognition Based on Multi-scale Image Synthesis
Wei-Ta Chu, Ping-Shen Huang
Pages 99-110
Contrastive Learning in Frequency Domain for Non-I.I.D. Image Classification
Huan Shao, Zhaoquan Yuan, Xiao Peng, Xiao Wu
Pages 111-122
Group Activity Recognition by Exploiting Position Distribution and Appearance Relation
Duoxuan Pei, Annan Li, Yunhong Wang
Pages 123-135
Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization
Fan Zhang, Meng Li, Guisheng Zhai, Yizhao Liu
Pages 136-147
Dense Attention-Guided Network for Boundary-Aware Salient Object Detection
Zhe Zhang, Junhui Ma, Panpan Xu, Wencheng Wang
Pages 148-161
Generative Image Inpainting by Hybrid Contextual Attention Network
Zhijiao Xiao, Donglun Li
Pages 162-173
Atypical Lyrics Completion Considering Musical Audio Signals
Kento Watanabe, Masataka Goto
Pages 174-186
Improving Supervised Cross-modal Retrieval with Semantic Graph Embedding
Changting Feng, Dagang Li, Jingwei Zheng
Pages 187-199
Confidence-Based Global Attention Guided Network for Image Inpainting
Zhilin Huang, Chujun Qin, Lei Li, Ruixin Liu, Yuesheng Zhu
Pages 200-212
Multi-task Deep Learning for No-Reference Screen Content Image Quality Assessment
Rui Gao, Ziqing Huang, Shiguang Liu
Pages 213-226
Language Person Search with Pair-Based Weighting Loss
Peng Zhang, Deqiang Ouyang, Chunlin Jiang, Jie Shao
Pages 227-239
DeepFusion: Deep Ensembles for Domain Independent System Fusion
Mihai Gabriel Constantin, Liviu-Daniel Ştefan, Bogdan Ionescu
Pages 240-252
Illuminate Low-Light Image via Coarse-to-fine Multi-level Network
Yansheng Qiu, Jun Chen, Xiao Wang, Kui Jang
Pages 253-264
MM-Net: Learning Adaptive Meta-metric for Few-Shot Biometric Recognition
Qinghua Gu, Zhengding Luo, Wanyu Zhao, Yuesheng Zhu
Pages 265-277
A Sentiment Similarity-Oriented Attention Model with Multi-task Learning for Text-Based Emotion Recognition
Yahui Fu, Lili Guo, Longbiao Wang, Zhilei Liu, Jiaxing Liu, Jianwu Dang
Pages 278-289
Locating Visual Explanations for Video Question Answering
Xuanwei Chen, Rui Liu, Xiaomeng Song, Yahong Han
Pages 290-302
Global Cognition and Local Perception Network for Blind Image Deblurring
Chuanfa Zhang, Wei Zhang, Feiyu Chen, Yiting Cheng, Shuyong Gao, Wenqiang Zhang
Pages 303-314
Multi-grained Fusion for Conditional Image Retrieval
Yating Liu, Yan Lu
Pages 315-327
A Hybrid Music Recommendation Algorithm Based on Attention Mechanism
Weite Feng, Tong Li, Haiyang Yu, Zhen Yang
Pages 328-339
Few-Shot Learning with Unlabeled Outlier Exposure
Haojie Wang, Jieya Lian, Shengwu Xiong
Pages 340-351
Fine-Grained Video Deblurring with Event Camera
Limeng Zhang, Hongguang Zhang, Chenyang Zhu, Shasha Guo, Jihua Chen, Lei Wang
Pages 352-364
Discriminative and Selective Pseudo-Labeling for Domain Adaptation
Fei Wang, Youdong Ding, Huan Liang, Jing Wen
Pages 365-377
Multi-level Gate Feature Aggregation with Spatially Adaptive Batch-Instance Normalization for Semantic Image Synthesis
Jia Long, Hongtao Lu
Pages 378-390
Robust Multispectral Pedestrian Detection via Uncertainty-Aware Cross-Modal Learning
Sungjune Park, Jung Uk Kim, Yeon Gyun Kim, Sang-Keun Moon, Yong Man Ro
Pages 391-402
Time-Dependent Body Gesture Representation for Video Emotion Recognition
Jie Wei, Xinyu Yang, Yizhuo Dong
Pages 403-416
MusiCoder: A Universal Music-Acoustic Encoder Based on Transformer
Yilun Zhao, Jia Guo
Pages 417-429
DANet: Deformable Alignment Network for Video Inpainting
Xutong Lu, Jianfu Zhang
Pages 430-442
Deep Centralized Cross-modal Retrieval
Zhenyu Wen, Aimin Feng
Pages 443-455
Shot Boundary Detection Through Multi-stage Deep Convolution Neural Network
Tingting Wang, Na Feng, Junqing Yu, Yunfeng He, Yangliu Hu, Yi-Ping Phoebe Chen
Pages 456-468
Towards Optimal Multirate Encoding for HTTP Adaptive Streaming
Hadi Amirpour, Ekrem Çetinkaya, Christian Timmerer, Mohammad Ghanbari
Pages 469-480
Fast Mode Decision Algorithm for Intra Encoding of the 3rd Generation Audio Video Coding Standard
Shengyuan Wu, Zhenyu Wang, Yangang Cai, Ronggang Wang
Pages 481-492
Graph Structure Reasoning Network for Face Alignment and Reconstruction
Xing Wang, Xinyu Li, Suping Wu
Pages 493-505
Game Input with Delay – A Model of the Time Distribution for Selecting a Moving Target with a Mouse
Shengmei Liu, Mark Claypool
Pages 506-518
Unsupervised Temporal Attention Summarization Model for User Created Videos
Min Hu, Ruimin Hu, Xiaocheng Wang, Rui Sheng
Pages 519-530
Learning from the Negativity: Deep Negative Correlation Meta-Learning for Adversarial Image Classification
Wenbo Zheng, Lan Yan, Fei-Yue Wang, Chao Gou
Pages 531-540
Learning 3D-Craft Generation with Predictive Action Neural Network
Ze-yu Liu, Jian-wei Liu, Xin Zuo, Weimin Li
Pages 541-553
Unsupervised Multi-shot Person Re-identification via Dynamic Bi-directional Normalized Sparse Representation
Xiaobao Li, Wen Wang, Qingyong Li, Lijun Guo
Pages 554-566
Classifier Belief Optimization for Visual Categorization
Gang Yang, Xirong Li
Pages 567-579
Fine-Grained Generation for Zero-Shot Learning
Weimin Sun, Jieping Xu, Gang Yang
Pages 580-591
Fine-Grained Image-Text Retrieval via Complementary Feature Learning
Min Zheng, Yantao Jia, Huajie Jiang
Pages 592-604
Considering Human Perception and Memory in Interactive Multimedia Retrieval Evaluations
Luca Rossetto, Werner Bailer, Abraham Bernstein
Pages 605-616
Learning Multi-level Interaction Relations and Feature Representations for Group Activity Recognition
Lihua Lu, Yao Lu, Shunzhou Wang
Pages 617-628
A Structured Feature Learning Model for Clothing Keypoints Localization
Ruhan He, Yuyi Su, Tao Peng, Jia Chen, Zili Zhang, Xinrong Hu
Pages 629-640
Automatic Pose Quality Assessment for Adaptive Human Pose Refinement
Gang Chu, Chi Xie, Shuang Liang
Pages 641-652
Deep Attributed Network Embedding with Community Information
Li Xue, Wenbin Yao, Yamei Xia, Xiaoyong Li
Pages 653-665
An Acceleration Framework for Super-Resolution Network via Region Difficulty Self-adaption
Zhenfang Guo, Yuyao Ye, Yang Zhao, Ronggang Wang
Pages 666-677
Spatial Gradient Guided Learning and Semantic Relation Transfer for Facial Landmark Detection
Jian Wang, Yaoyi Li, Hongtao Lu
Pages 678-690
DVRCNN: Dark Video Post-processing Method for VVC
Donghui Feng, Yiwei Zhang, Chen Zhu, Han Zhang, Li Song
Pages 691-703
An Efficient Image Transmission Pipeline for Multimedia Services
Zeyu Wang
Pages 704-715
Gaussian Mixture Model Based Semi-supervised Sparse Representation for Face Recognition
Xinxin Shan, Ying Wen
Pages 716-727
MSCANet: Adaptive Multi-scale Context Aggregation Network for Congested Crowd Counting
Yani Zhang, Huailin Zhao, Fangbo Zhou, Qing Zhang, Yanjiao Shi, Lanjun Liang
Pages 1-12
Tropical Cyclones Tracking Based on Satellite Cloud Images: Database and Comprehensive Study
Cheng Huang, Sixian Chan, Cong Bai, Weilong Ding, Jinglin Zhang
Pages 13-25
Image Registration Improved by Generative Adversarial Networks
Shiyan Jiang, Ci Wang, Chang Huang
Pages 26-35
Deep 3D Modeling of Human Bodies from Freehand Sketching
Kaizhi Yang, Jintao Lu, Siyu Hu, Xuejin Chen
Pages 36-48
Two-Stage Real-Time Multi-object Tracking with Candidate Selection
Fan Wang, Lei Luo, En Zhu
Pages 49-61
Tell as You Imagine: Sentence Imageability-Aware Image Captioning
Kazuki Umemura, Marc A. Kastner, Ichiro Ide, Yasutomo Kawanishi, Takatsugu Hirayama, Keisuke Doman et al.
Pages 62-73
Deep Face Swapping via Cross-Identity Adversarial Training
Shuhui Yang, Han Xue, Jun Ling, Li Song, Rong Xie
Pages 74-86
Res2-Unet: An Enhanced Network for Generalized Nuclear Segmentation in Pathological Images
Shuai Zhao, Xuanya Li, Zhineng Chen, Chang Liu, Changgen Peng
Pages 87-98
Automatic Diagnosis of Glaucoma on Color Fundus Images Using Adaptive Mask Deep Network
Gang Yang, Fan Li, Dayong Ding, Jun Wu, Jie Xu
Pages 99-110
Initialize with Mask: For More Efficient Federated Learning
Zirui Zhu, Lifeng Sun
Pages 111-120
Unsupervised Gaze: Exploration of Geometric Constraints for 3D Gaze Estimation
Yawen Lu, Yuxing Wang, Yuan Xin, Di Wu, Guoyu Lu
Pages 121-133
Median-Pooling Grad-CAM: An Efficient Inference Level Visual Explanation for CNN Networks in Remote Sensing Image Classification
Wei Song, Shuyuan Dai, Dongmei Huang, Jinling Song, Liotta Antonio
Pages 134-146
Multi-granularity Recurrent Attention Graph Neural Network for Few-Shot Learning
Xu Zhang, Youjia Zhang, Zuyu Zhang
Pages 147-158
EEG Emotion Recognition Based on Channel Attention for E-Healthcare Applications
Xu Zhang, Tianzhi Du, Zuyu Zhang
Pages 159-169
The MovieWall: A New Interface for Browsing Large Video Collections
Marij Nefkens, Wolfgang Hürst
Pages 170-182
Keystroke Dynamics as Part of Lifelogging
Alan F. Smeaton, Naveen Garaga Krishnamurthy, Amruth Hebbasuru Suryanarayana
Pages 183-195
HTAD: A Home-Tasks Activities Dataset with Wrist-Accelerometer and Audio Features
Enrique Garcia-Ceja, Vajira Thambawita, Steven A. Hicks, Debesh Jha, Petter Jakobsen, Hugo L. Hammer et al.
Pages 196-205
MNR-Air: An Economic and Dynamic Crowdsourcing Mechanism to Collect Personal Lifelog and Surrounding Environment Dataset. A Case Study in Ho Chi Minh City, Vietnam
Dang-Hieu Nguyen, Tan-Loc Nguyen-Tai, Minh-Tam Nguyen, Thanh-Binh Nguyen, Minh-Son Dao
Pages 206-217
Kvasir-Instrument: Diagnostic and Therapeutic Tool Segmentation Dataset in Gastrointestinal Endoscopy
Debesh Jha, Sharib Ali, Krister Emanuelsen, Steven A. Hicks, Vajira Thambawita, Enrique Garcia-Ceja et al.
Pages 218-229
CatMeows: A Publicly-Available Dataset of Cat Vocalizations
Luca A. Ludovico, Stavros Ntalampiras, Giorgio Presti, Simona Cannas, Monica Battini, Silvana Mattiello
Pages 230-243
Search and Explore Strategies for Interactive Analysis of Real-Life Image Collections with Unknown and Unique Categories
Floris Gisolf, Zeno Geradts, Marcel Worring
Pages 244-255
Graph-Based Indexing and Retrieval of Lifelog Data
Manh-Duy Nguyen, Binh T. Nguyen, Cathal Gurrin
Pages 256-267
On Fusion of Learned and Designed Features for Video Data Analytics
Marek Dobranský, Tomáš Skopal
Pages 268-280
XQM: Interactive Learning on Mobile Phones
Alexandra M. Bagi, Kim I. Schild, Omar Shahbaz Khan, Jan Zahálka, Björn Þór Jónsson
Pages 281-293
A Multimodal Tensor-Based Late Fusion Approach for Satellite Image Search in Sentinel 2 Images
Ilias Gialampoukidis, Anastasia Moumtzidou, Marios Bakratsas, Stefanos Vrochidis, Ioannis Kompatsiaris
Pages 294-306
Canopy Height Estimation from Spaceborne Imagery Using Convolutional Encoder-Decoder
Leonidas Alagialoglou, Ioannis Manakos, Marco Heurich, Jaroslav Červenka, Anastasios Delopoulos
Pages 307-317
Implementation of a Random Forest Classifier to Examine Wildfire Predictive Modelling in Greece Using Diachronically Collected Fire Occurrence and Fire Mapping Data
Alexis Apostolakis, Stella Girtsou, Charalampos Kontoes, Ioannis Papoutsis, Michalis Tsoutsos
Pages 318-329
Mobile eHealth Platform for Home Monitoring of Bipolar Disorder
Joan Codina-Filbà, Sergio Escalera, Joan Escudero, Coen Antens, Pau Buch-Cardona, Mireia Farrús
Pages 330-341
Multimodal Sensor Data Analysis for Detection of Risk Situations of Fragile People in @home Environments
Thinhinane Yebda, Jenny Benois-Pineau, Marion Pech, Hélène Amieva, Laura Middleton, Max Bergelt
Pages 342-353
Towards the Development of a Trustworthy Chatbot for Mental Health Applications
Matthias Kraus, Philip Seldschopf, Wolfgang Minker
Pages 354-366
Fusion of Multimodal Sensor Data for Effective Human Action Recognition in the Service of Medical Platforms
Panagiotis Giannakeris, Athina Tsanousa, Thanasis Mavropoulos, Georgios Meditskos, Konstantinos Ioannidis, Stefanos Vrochidis et al.
Pages 367-378
SpotifyGraph: Visualisation of User’s Preferences in Music
Pavel Gajdusek, Ladislav Peska
Pages 379-384
A System for Interactive Multimedia Retrieval Evaluations
Luca Rossetto, Ralph Gasser, Loris Sauter, Abraham Bernstein, Heiko Schuldt
Pages 385-390
SQL-Like Interpretable Interactive Video Search
Jiaxin Wu, Phuong Anh Nguyen, Zhixin Ma, Chong-Wah Ngo
Pages 391-397
VERGE in VBS 2021
Stelios Andreadis, Anastasia Moumtzidou, Konstantinos Gkountakos, Nick Pantelidis, Konstantinos Apostolidis, Damianos Galanopoulos et al.
Pages 398-404
NoShot Video Browser at VBS2021
Christof Karisch, Andreas Leibetseder, Klaus Schoeffmann
Pages 405-409
Exquisitor at the Video Browser Showdown 2021: Relationships Between Semantic Classifiers
Omar Shahbaz Khan, Björn Þór Jónsson, Mathias Larsen, Liam Poulsen, Dennis C. Koelma, Stevan Rudinac et al.
Pages 410-416
VideoGraph – Towards Using Knowledge Graphs for Interactive Video Retrieval
Luca Rossetto, Matthias Baumgartner, Narges Ashena, Florian Ruosch, Romana Pernisch, Lucien Heitz et al.
Pages 417-422
IVIST: Interactive Video Search Tool in VBS 2021
Yoonho Lee, Heeju Choi, Sungjune Park, Yong Man Ro
Pages 423-428
Video Search with Collage Queries
Jakub Lokoč, Jana Bátoryová, Dominik Smrž, Marek Dobranský
Pages 429-434
Towards Explainable Interactive Multi-modal Video Retrieval with Vitrivr
Silvan Heller, Ralph Gasser, Cristina Illi, Maurizio Pasquinelli, Loris Sauter, Florian Spiess et al.
Pages 435-440
Competitive Interactive Video Retrieval in Virtual Reality with vitrivr-VR
Florian Spiess, Ralph Gasser, Silvan Heller, Luca Rossetto, Loris Sauter, Heiko Schuldt
Pages 441-447
An Interactive Video Search Tool: A Case Study Using the V3C1 Dataset
Abdullah Alfarrarjeh, Jungwon Yoon, Seon Ho Kim, Amani Abu Jabal, Akarsh Nagaraj, Chinmayee Siddaramaiah
Pages 448-454
Less is More – diveXplore 5.0 at VBS 2021
Andreas Leibetseder, Klaus Schoeffmann
Pages 455-460
SOMHunter V2 at Video Browser Showdown 2021
Patrik Veselý, František Mejzlík, Jakub Lokoč
Pages 461-466
W2VV++ BERT Model at VBS 2021
Ladislav Peška, Gregor Kovalčík, Tomáš Souček, Vít Škrhák, Jakub Lokoč
Pages 467-472
VISIONE at Video Browser Showdown 2021
Giuseppe Amato, Paolo Bolettieri, Fabrizio Falchi, Claudio Gennaro, Nicola Messina, Lucia Vadicamo et al.
Pages 473-478
IVOS – The ITEC Interactive Video Object Search System at VBS2021
Anja Ressmann, Klaus Schoeffmann
Pages 479-483
Video Search with Sub-Image Keyword Transfer Using Existing Image Archives
Nico Hezel, Konstantin Schall, Klaus Jung, Kai Uwe Barthel
Pages 484-489
A VR Interface for Browsing Visual Spaces at VBS2021
Ly-Duyen Tran, Manh-Duy Nguyen, Thao-Nhu Nguyen, Graham Healy, Annalina Caputo, Binh T. Nguyen et al.
Pages 490-495