Implementation of LangChain and Large Language Models in Automatic Question Generation for Computer Assisted Tests

Novri Rahman; Nazruddin Safaat Harahap; Muhammad Affandes; Pizaini Pizaini

doi:10.47065/bulletincsr.v5i4.558

Authors

Novri Rahman, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Nazruddin Safaat Harahap, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Muhammad Affandes, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Pizaini Pizaini, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia

DOI:

https://doi.org/10.47065/bulletincsr.v5i4.558

Keywords:

Automatic Question Generation; Computer Assisted Test; Large Language Models; LangChain; GPT-4o

Abstract

Advances in Artificial Intelligence (AI), particularly Large Language Models (LLMs), open new opportunities for transforming educational assessment. This study implements the LangChain framework integrated with an LLM in an Automatic Question Generation (AQG) system within a Computer Assisted Test (CAT) platform, using eleventh-grade Biology subject matter as a case study. The methodology comprises data collection from PDF-based instructional materials, text embedding with Facebook AI Similarity Search (FAISS) serving as the knowledge base, and automatic question generation with the GPT-4o model. The system uses a microservices architecture whose frontend and backend services are built with the Next.js, FastAPI, and Express.js frameworks. It was evaluated with a User Acceptance Test (UAT) and the DeepEval framework. The UAT showed a teacher satisfaction rate of 92.7% and a positive student response of 67.5%. The DeepEval assessment reported average scores of 3.69% for hallucination, 97.44% for contextual precision, 83.30% for contextual relevancy, 70.63% for answer relevancy, and 92.47% for prompt alignment. These findings indicate that integrating LangChain with an LLM is effective for generating contextually accurate and relevant questions, although answer relevancy still needs improvement. The study is expected to provide an efficient solution for digital educational assessment and to contribute to future developments in educational AI.
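
The retrieve-then-generate flow summarized in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the bag-of-words `embed` function stands in for a real embedding model, `ToyVectorStore` stands in for a FAISS index, and `build_prompt` only assembles the text that would be sent to a model such as GPT-4o. All names, the sample chunks, and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a FAISS index: keeps chunks and their vectors."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.vectors = [embed(c) for c in self.chunks]

    def search(self, query: str, k: int = 2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:k]]


def build_prompt(topic: str, context_chunks, n_questions: int = 3) -> str:
    """Assemble a grounded question-generation prompt (hypothetical wording)."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        f"Using only the context below, write {n_questions} multiple-choice "
        f"questions about '{topic}' for eleventh-grade Biology.\n"
        f"Context:\n{context}"
    )


# Hypothetical chunks, as if extracted from PDF teaching material.
chunks = [
    "The cell membrane controls which substances enter and leave the cell.",
    "Mitochondria produce ATP through cellular respiration.",
    "The human digestive system breaks food down into nutrients.",
]
store = ToyVectorStore(chunks)
top = store.search("How do mitochondria produce ATP?", k=1)
prompt = build_prompt("cellular respiration", top)
print(prompt)
```

In a real deployment the prompt would be passed to the LLM and the retrieved chunks would come from the vector index built over the course material. Restricting the prompt to retrieved context is the usual way such pipelines try to keep generated questions grounded in the source text.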


