Evaluating End-to-End ASR for Qur'an Recitation Using Whisper in Low Resource Settings
DOI:
https://doi.org/10.47065/bulletincsr.v5i4.561
Keywords:
End-to-End ASR; Qur'an Recitation; Whisper Model; Low-Resource Speech Recognition; Character Error Rate
Abstract
This study investigates End-to-End Automatic Speech Recognition (E2E ASR) for Qur'an recitation under low-resource conditions using the Whisper model. It follows the CRISP-DM methodology, starting with defining the research gap and preparing a curated dataset of 200 verses from Juz 30. These verses were chosen for their short and consistent structure, which allows efficient experimentation. Audio and transcription pairs were verified and cleaned to ensure alignment and quality. Modeling was done with Whisper in Google Colaboratory, leveraging its pre-trained architecture to reduce training time and computing costs. Evaluation used the Character Error Rate (CER) metric to measure transcription accuracy. The results show that Whisper achieved an average CER of 0.142, corresponding to a character-level transcription accuracy of about 85%. However, the average processing time per verse was 11 seconds, almost double the time a human reciter needs to read the same verse. Although Whisper provides strong accuracy for Arabic transcription, its runtime efficiency remains a challenge for real-time applications. This research contributes a reproducible pipeline, a validated dataset, and performance benchmarks for future studies of Qur'anic ASR under computational constraints.
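The CER figure in the abstract can be illustrated with a minimal sketch. It assumes the common definition of CER as character-level edit distance divided by the reference length, averaged per verse. The abstract does not state the exact normalization, text preprocessing (for example, handling of diacritics), or tooling the authors used, so the function names and the toy strings below are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def mean_cer(pairs) -> float:
    """Average per-utterance CER over (reference, hypothesis) pairs."""
    scores = [cer(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)


# Toy placeholder strings, not data from the study.
pairs = [("kitten", "sitting"), ("abcd", "abcd")]
average = mean_cer(pairs)

# Assumption: "accuracy" is read as 1 - CER, which matches the abstract's
# figures (1 - 0.142 = 0.858, i.e. about 85%).
reported_accuracy = 1 - 0.142
```

Under this reading, a CER of 0.142 means roughly 14 character edits per 100 reference characters. Note that CER can exceed 1 when the hypothesis is much longer than the reference, so 1 - CER is only an approximate accuracy.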
Copyright (c) 2025 Abdullah Azzam, Ichsan Taufik, Aldy Rialdy Atmadja

This work is licensed under a Creative Commons Attribution 4.0 International License.