Evaluating End-to-End ASR for Qur'an Recitation Using Whisper in Low-Resource Settings


Authors

  • Abdullah Azzam, Universitas Islam Negeri Sunan Gunung Djati, Bandung, Indonesia
  • Ichsan Taufik, Universitas Islam Negeri Sunan Gunung Djati, Bandung, Indonesia
  • Aldy Rialdy Atmadja, Universitas Islam Negeri Sunan Gunung Djati, Bandung, Indonesia

DOI:

https://doi.org/10.47065/bulletincsr.v5i4.561

Keywords:

End-to-End ASR; Qur'an Recitation; Whisper Model; Low-Resource Speech Recognition; Character Error Rate

Abstract

This study investigates End-to-End Automatic Speech Recognition (E2E ASR) for Qur'an recitation under low-resource conditions using the Whisper model. It follows the CRISP-DM methodology, beginning with the definition of the research gap and the preparation of a curated dataset of 200 verses from Juz 30. These verses were chosen for their short and consistent structure, which allows efficient experimentation. Audio and transcription pairs were verified and cleaned to ensure alignment and quality. Modeling was carried out with Whisper in Google Colaboratory, leveraging its pre-trained architecture to reduce training time and computing cost. Transcription accuracy was evaluated with the Character Error Rate (CER) metric. Whisper achieved an average CER of 0.142, corresponding to a transcription accuracy of about 85%. However, the average processing time per verse was 11 seconds, almost double the time a human takes to recite it. Although Whisper provides strong accuracy for Arabic transcription, its runtime efficiency remains a challenge for real-time applications. This research contributes a reproducible pipeline, a validated dataset, and performance benchmarks for future studies of Qur'anic ASR under computational constraints.
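The abstract reports results as a Character Error Rate, so a small worked example of the metric may help. The sketch below computes CER as the character-level Levenshtein distance between a reference and a hypothesis transcription, divided by the reference length, and averages it over a set of verses. The function names (`normalize`, `edit_distance`, `cer`, `average_cer`) and the normalization step (Unicode NFC plus whitespace collapsing) are assumptions made for illustration; the abstract does not say how the authors normalized text or handled diacritics.

```python
import unicodedata


def normalize(text: str) -> str:
    """Assumed normalization: Unicode NFC and collapsed whitespace.

    The abstract does not state the authors' normalization; diacritics are
    kept here, which is a choice made only for this sketch.
    """
    return " ".join(unicodedata.normalize("NFC", text).split())


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance using a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """CER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def average_cer(pairs) -> float:
    """Mean of per-verse CER over (reference, hypothesis) pairs."""
    scores = [cer(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy strings, not data from the study.
    print(cer("kitten", "sitting"))
    print(average_cer([("abcd", "abed"), ("abc", "abc")]))
```

Under this definition, an accuracy-style figure is simply 1 − CER, so an average CER of 0.142 gives 0.858, in line with the roughly 85% stated above. Averaging per-verse scores weights every verse equally; pooling all edits over all reference characters is a common alternative and can give a slightly different number.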





ARTICLE HISTORY

Published: 2025-06-30


How to Cite

Abdullah Azzam, Ichsan Taufik, & Aldy Rialdy Atmadja. (2025). Evaluating End-to-End ASR for Qur’an Recitation Using Whisper in Low-Resource Settings. Bulletin of Computer Science Research, 5(4), 778-787. https://doi.org/10.47065/bulletincsr.v5i4.561
