LipPrint: Using Lip Movements as a Silent Password
Abstract
Authentication and confidentiality are critical in many systems, and passwords are widely used to protect them. One way to enter a password is by voice; the drawback of a spoken password is that it can easily be overheard. The general aim of this study was to base the password on lip movement rather than sound. To this end, a visual speech recognition (VSR) system was proposed that recognizes speech from lip movement and uses it as a silent password. The proposed system preprocesses the captured video with the proposed algorithms, then extracts features and classifies spoken digits with a modified VGG16. Fairness, robustness, and privacy are also considered in the design. The study focuses on Arabic digits because Arabic visual speech recognition remains under-researched. The reported results show remarkable performance, with a validation accuracy of 96.74%. It is concluded that the proposed system, which analyses lip movement instead of sound, increases security by preventing others from overhearing the spoken password.
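The abstract outlines a pipeline that preprocesses the captured video, then extracts features from the lip region and classifies the spoken digit with a modified VGG16. As a rough illustration only, the following is a minimal Keras sketch of one way such a classifier could be assembled; the ten-class output, input size, frozen ImageNet base, and dense head are assumptions for the sketch, not the paper's actual modifications.

```python
# Minimal sketch of a VGG16-based digit classifier for lip-region frames.
# Hypothetical configuration: the paper's exact modifications to VGG16,
# input size, and training setup are not specified in the abstract.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_DIGITS = 10              # Arabic digits 0-9 (assumption)
INPUT_SHAPE = (224, 224, 3)  # VGG16's default input size (assumption)

# Load the VGG16 convolutional base pretrained on ImageNet, without
# its original fully connected head.
base = VGG16(weights="imagenet", include_top=False, input_shape=INPUT_SHAPE)
base.trainable = False       # freeze the base; train only the new head

# Replace the head with a small classifier over the ten digit classes.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_DIGITS, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

In a full VSR pipeline, per-frame predictions or features would still have to be aggregated across the video before a digit is decided; that temporal step is omitted from this per-frame sketch.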
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.