Large Language Models for JSON-Based Function Call Planning from Indonesian Natural Language: A Restaurant Search Chatbot Case Study

Authors

  • Mohammad Mauludin Institut Sains dan Teknologi Terpadu Surabaya
  • Joan Santoso Institut Sains dan Teknologi Terpadu Surabaya
  • Hartarto Junaedi Institut Sains dan Teknologi Terpadu Surabaya

DOI:

https://doi.org/10.32664/smatika.v16i01.2216

Keywords:

Large Language Models, Function Call Planning, JSON Generation, Indonesian Natural Language, Conversational Agents

Abstract

Large Language Models are increasingly adopted as planning components that translate natural language into structured representations for tool invocation, enabling executable interaction with backend systems through JSON-based function calling. However, empirical studies focusing on Indonesian natural language remain limited. This paper presents a restaurant search chatbot case study that investigates JSON-based function call planning from Indonesian user queries, with emphasis on the upstream planning task rather than conversational response generation. A synthetic dataset of 33,470 Indonesian restaurant search queries paired with ground-truth JSON plans was constructed based on a predefined tool set and database schema. A pretrained Mistral 7B model was adapted through supervised fine-tuning with parameter-efficient adaptation and evaluated using multiple metrics measuring JSON structural validity, tool sequence correctness, and parameter accuracy at different granularities. The results show strong performance: a JSON structure validity rate of 0.97, tool sequence exact match accuracy of 0.92, column-level accuracy of 0.97, and value-level accuracy of 0.94. More stringent evaluation at the session level reveals remaining challenges in composing all parameters correctly within a single planning instance. Overall, the findings demonstrate that, with carefully designed datasets and strict supervision, Large Language Models can reliably perform structured JSON-based function call planning from Indonesian natural language, providing a practical foundation for extending this approach to other structured application domains where execution correctness is critical.
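To make the evaluation setup concrete, the sketch below pairs a hypothetical Indonesian query with an illustrative JSON plan and computes the four metric families named in the abstract (JSON structural validity, tool sequence exact match, column-level accuracy, value-level accuracy). The tool name, parameter fields, and scoring details are assumptions for illustration only; the paper's actual tool set, schema, and metric definitions are not reproduced on this page.

```python
import json

# Hypothetical query/plan pair; field names are illustrative, not the paper's schema.
query = "Cari restoran sushi di Surabaya yang buka sekarang"  # "Find sushi restaurants in Surabaya that are open now"
gold_plan = [
    {"tool": "search_restaurants",
     "parameters": {"cuisine": "sushi", "city": "Surabaya", "open_now": True}},
]
# Raw model output as a string, with one wrong parameter value (open_now).
predicted_raw = ('[{"tool": "search_restaurants", "parameters": '
                 '{"cuisine": "sushi", "city": "Surabaya", "open_now": false}}]')

def evaluate(pred_raw, gold):
    """One plausible reading of the abstract's four metric families."""
    try:
        pred = json.loads(pred_raw)          # JSON structural validity
    except json.JSONDecodeError:
        return {"valid_json": False}
    # Tool sequence exact match: same tools in the same order.
    seq_match = [s["tool"] for s in pred] == [s["tool"] for s in gold]
    # Flatten parameters to (call index, column name) -> value.
    gold_p = {(i, k): v for i, s in enumerate(gold) for k, v in s["parameters"].items()}
    pred_p = {(i, k): v for i, s in enumerate(pred) for k, v in s.get("parameters", {}).items()}
    col_acc = len(gold_p.keys() & pred_p.keys()) / len(gold_p)               # column-level
    val_acc = sum(pred_p.get(k) == v for k, v in gold_p.items()) / len(gold_p)  # value-level
    return {"valid_json": True, "tool_seq_exact": seq_match,
            "column_acc": col_acc, "value_acc": val_acc}

print(evaluate(predicted_raw, gold_plan))
```

In this reading, the example scores 1.0 at the column level (all parameter names present) but only 2/3 at the value level, mirroring the paper's observation that composing every parameter correctly in one planning instance is the harder, session-level requirement.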

References

[1] C. Qu et al., “Tool Learning with Large Language Models: A Survey,” Front. Comput. Sci., vol. 19, no. 8, pp. 1–33, Nov. 2024, doi: 10.1007/s11704-024-40678-2.

[2] C. Raffel et al., “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer,” J. Mach. Learn. Res., vol. 21, pp. 1–67, Oct. 2019, Accessed: Dec. 22, 2025. [Online]. Available: https://arxiv.org/abs/1910.10683v4

[3] E. J. Hu et al., “LoRA: Low-Rank Adaptation of Large Language Models,” Jun. 2021, Accessed: Dec. 22, 2025. [Online]. Available: https://arxiv.org/abs/2106.09685v2

[4] J. Lian, Y. Lei, X. Huang, J. Yao, W. Xu, and X. Xie, “RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems,” WWW 2024 Companion - Companion Proceedings of the ACM Web Conference, vol. 1, pp. 1031–1034, Mar. 2024, doi: 10.1145/3589335.3651242.

[5] J. Wei et al., “Finetuned Language Models Are Zero-Shot Learners,” ICLR 2022 - 10th International Conference on Learning Representations, Sep. 2021, Accessed: Dec. 22, 2025. [Online]. Available: https://arxiv.org/abs/2109.01652v5

[6] S. AlFahryan and S. Suryayusra, “Pengembangan Website Chatbot Untuk Kampus Bina Darma,” SMATIKA JURNAL, vol. 13, no. 02, pp. 304–317, Dec. 2023, doi: 10.32664/SMATIKA.V13I02.930.

[7] K. Christakopoulou, F. Radlinski, and K. Hofmann, “Towards conversational recommender systems,” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 13-17-August-2016, pp. 815–824, Aug. 2016, doi: 10.1145/2939672.2939746.

[8] L. Ouyang et al., “Training language models to follow instructions with human feedback,” Adv. Neural Inf. Process. Syst., vol. 35, 2022.

[9] L. Wang et al., “A Survey on Large Language Model based Autonomous Agents,” Front. Comput. Sci., vol. 18, no. 6, pp. 1–42, Mar. 2025, doi: 10.1007/s11704-024-40231-1.

[10] OpenAI et al., “GPT-4 Technical Report,” Mar. 2023, Accessed: Dec. 21, 2025. [Online]. Available: https://arxiv.org/abs/2303.08774v6

[11] S. Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models,” 11th International Conference on Learning Representations, ICLR 2023, Oct. 2022, Accessed: Dec. 21, 2025. [Online]. Available: https://arxiv.org/abs/2210.03629v3

[12] T. Auliarachman and A. Purwarianti, “Coreference Resolution System for Indonesian Text with Mention Pair Method and Singleton Exclusion using Convolutional Neural Network,” Proceedings - 2019 International Conference on Advanced Informatics: Concepts, Theory, and Applications, ICAICTA 2019, Sep. 2020, doi: 10.1109/ICAICTA.2019.8904261.

[13] T. B. Brown et al., “Language Models are Few-Shot Learners,” Adv. Neural Inf. Process. Syst., vol. 2020-December, May 2020, Accessed: Dec. 21, 2025. [Online]. Available: https://arxiv.org/abs/2005.14165v4

[14] T. Schick et al., “Toolformer: Language Models Can Teach Themselves to Use Tools,” Adv. Neural Inf. Process. Syst., vol. 36, Feb. 2023, Accessed: Dec. 21, 2025. [Online]. Available: https://arxiv.org/abs/2302.04761v1

[15] A. Abdullah, Jumadi, and D. Firdaus, “Implementasi Algoritma Bidirectional Encoder Representations From Transformer Pada Speech To Text Untuk Notulensi Rapat,” SMATIKA JURNAL, vol. 15, no. 02, pp. 423–431, Dec. 2025, doi: 10.32664/SMATIKA.V15I02.1725.

[16] Y. Lu et al., “Learning to Generate Structured Output with Schema Reinforcement Learning,” Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 4905–4918, 2025, doi: 10.18653/V1/2025.ACL-LONG.243.

[17] Y. Qin et al., “ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs,” 12th International Conference on Learning Representations, ICLR 2024, Jul. 2023, Accessed: Dec. 22, 2025. [Online]. Available: https://arxiv.org/abs/2307.16789v2

[18] M. A. Albany, I. Taufik, and I. Budiman, “Implementasi Algoritma BERT untuk Question and Answer System Terkait Hadist dalam Bentuk Virtual Youtuber,” SMATIKA JURNAL, vol. 15, no. 02, pp. 408–422, Dec. 2025, doi: 10.32664/SMATIKA.V15I02.1704.

[19] Y. Wang et al., “Self-Instruct: Aligning Language Models with Self-Generated Instructions,” Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 13484–13508, Dec. 2022, doi: 10.18653/v1/2023.acl-long.754.

[20] H. Suhendar et al., “Analisis Sentimen Hasil Transkripsi Audio Berbahasa Indonesia Menggunakan T5 (Text-to-Text Transfer Transformer),” SMATIKA JURNAL, vol. 15, no. 01, pp. 115–125, Jun. 2025, doi: 10.32664/SMATIKA.V15I01.1521.

Published

2026-03-13