LLMs Are a Dead End in the Search for General Machine Intelligence: A Review
Abstract
This extensive review of large language models (LLMs) argues that scaling the current generation of LLMs toward artificial general intelligence is a dead end, while also considering the risks of unregulated use of such models. Through this, it aims to explicitly characterise the intelligence of current large language models and their capacity for malicious manipulation. While many organisations building large language models compete to achieve better results by scaling up their models, this path ultimately leads to model collapse. Although it is still early in the development of large language models, many have already cited LLMs as the primary means of achieving generally intelligent agents. To counter this claim, this paper gathers and evaluates evidence from multiple research articles and tests several frequently used LLMs, highlighting their behaviour in different scenarios. As these models are trained on a wide variety of data, they exhibit domain-independent intelligent behaviour but fail to exhibit causally grounded intelligent behaviour.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
OpenAI et al.: GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023). https://arxiv.org/abs/2303.08774
Jones, C.R., Bergen, B.K.: Does GPT-4 pass the Turing test? (2024). https://arxiv.org/abs/2310.20216
Court, S., Elsner, M.: Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem. arXiv preprint arXiv:2406.15625 (2024). https://arxiv.org/abs/2406.15625
LeCun, Y.: A path towards autonomous machine intelligence, version 0.9.2, 2022-06-27. Open Review 62(1), 1–62 (2022). https://openreview.net/forum?id=BZ5a1r-kVsf
Wang, R., Todd, G., Xiao, Z., Yuan, X., Côté, M.-A., Clark, P., Jansen, P.: Can Language Models Serve as Text-Based World Simulators? (2024). https://arxiv.org/abs/2406.06485
Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al.: The Llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024). https://arxiv.org/abs/2407.21783
Brown, T., et al.: Language Models are Few-Shot Learners. In: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). https://arxiv.org/abs/2005.14165
Kamoi, R., Zhang, Y., Zhang, N., Han, J., Zhang, R.: When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs (2024). https://arxiv.org/abs/2406.01297
Wei, J., Zhang, Y., Zhang, L.Y., Ding, M., Chen, C., Ong, K.-L., Zhang, J., Xiang, Y.: Memorisation in deep learning: A survey (2024). https://arxiv.org/abs/2406.03880
Blank, I.A.: What are large language models supposed to model? Trends in Cognitive Sciences 27(11), 987–989 (2023). DOI: https://doi.org/10.1016/j.tics.2023.08.006
Paech, S.J.: EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models (2023). https://arxiv.org/abs/2312.06281
Nyamsuren, E., Taatgen, N.: Human reasoning module. Biologically Inspired Cognitive Architectures 8 (2014). DOI: https://doi.org/10.1016/j.bica.2014.02.002
Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A.F., Ippolito, D., Choquette-Choo, C.A., Wallace, E., Tramèr, F., Lee, K.: Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035 (2023). DOI: https://doi.org/10.48550/arXiv.2311.17035
Chollet, F.: On the measure of intelligence (2019). https://arxiv.org/abs/1911.01547
Han, S.J., Ransom, K.J., Perfors, A., Kemp, C.: Inductive reasoning in humans and large language models. Cognitive Systems Research 83, 101155 (2024). DOI: https://doi.org/10.1016/j.cogsys.2023.101155
Houser, K.: LLMs are a dead end to AGI, says François Chollet (2024). https://www.freethink.com/robots-ai/arc-prize-agi
Opiełka, G., Rosenbusch, H., Vijverberg, V., Stevenson, C.E.: Do Large Language Models Solve ARC Visual Analogies Like People Do? (2024). https://arxiv.org/abs/2403.09734
Rinaldi, L., Karmiloff-Smith, A.: Intelligence as a developing function: A neuro-constructivist approach. Journal of Intelligence 5 (2017). https://www.mdpi.com/2079-3200/5/2/18
Fang, M., Deng, S., Zhang, Y., Shi, Z., Chen, L., Pechenizkiy, M., Wang, J.: Large language models are neurosymbolic reasoners. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 17985–17993 (2024). https://ojs.aaai.org/index.php/AAAI/article/view/29712
Wu, F., Zhang, N., Jha, S., McDaniel, P.D., Xiao, C.: A new era in LLM security: Exploring security concerns in real-world LLM-based systems (2024). https://arxiv.org/abs/2402.18649
Yao, Y., Duan, J., Xu, K., Cai, Y., Sun, Z., Zhang, Y.: A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly. High-Confidence Computing, 100211 (2024). DOI: https://doi.org/10.1016/j.hcc.2024.100211
Chang, X., Dai, G., Di, H., Ye, H.: Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection (2025). https://arxiv.org/abs/2504.16125
Narayanan, A.: Indirect prompt injection via hidden instructions on a webpage (2023). https://x.com/random_walker/status/1636923058370891778
Xu, Z., Liu, Y., Deng, G., Li, Y., Picek, S.: LLM jailbreak attack versus defence techniques – a comprehensive study. arXiv preprint arXiv:2402.13457 (2024). https://arxiv.org/abs/2402.13457
Shen, X., Chen, Z., Backes, M., Shen, Y., Zhang, Y.: "Do Anything Now": Characterising and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models (2024). https://arxiv.org/abs/2308.03825
Liu, X., Xu, N., Chen, M., Xiao, C.: AutoDAN: Generating stealthy jailbreak prompts on aligned large language models. In: The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=7Jwpw4qKkb
Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J.Z., Fredrikson, M.: Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043 (2023). https://arxiv.org/abs/2307.15043
Shumailov, I., Shumaylov, Z., Zhao, Y., Papernot, N., Anderson, R., Gal, Y.: AI models collapse when trained on recursively generated data. Nature 631(8022), 755–759 (2024). DOI: https://doi.org/10.1038/s41586-024-07566-y
Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., et al.: A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv preprint arXiv:2311.05232 (2023). https://arxiv.org/abs/2311.05232
Cohen, S., Bitton, R., Nassi, B.: Here comes the AI worm: Unleashing zero-click worms that target GenAI-powered applications. arXiv preprint arXiv:2403.02817 (2024). https://arxiv.org/abs/2403.02817
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M.: Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection (2023). https://arxiv.org/abs/2302.12173
Abdelaziz, I., Basu, K., Agarwal, M., Kumaravel, S., Stallone, M., Panda, R., Rizk, Y., Bhargav, G., Crouse, M., Gunasekara, C., Ikbal, S., Joshi, S., Karanam, H., Kumar, V., Munawar, A., Neelam, S., Raghu, D., Sharma, U., Soria, A.M., Sreedhar, D., Venkateswaran, P., Unuvar, M., Cox, D., Roukos, S., Lastras, L., Kapanipathi, P.: Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks (2024). https://arxiv.org/abs/2407.00121
Chen, W., Li, Z., Ma, M.: Octopus: On-device language model for function calling of software APIs (2024). https://arxiv.org/abs/2404.01549
Wang, Y., Yu, J., Yao, Z., Zhang, J., Xie, Y., Tu, S., Fu, Y., Feng, Y., Zhang, J., Zhang, J., Huang, B., Li, Y., Yuan, H., Hou, L., Li, J., Tang, J.: A Solution-based LLM API-using Methodology for Academic Information Seeking (2024). https://arxiv.org/abs/2405.15165
Villalobos, P., Ho, A., Sevilla, J., Besiroglu, T., Heim, L., Hobbhahn, M.: Position: Will we run out of data? Limits of LLM scaling based on human-generated data. In: Forty-first International Conference on Machine Learning (2024). https://openreview.net/forum?id=ViZcgDQjyG
Gerstgrasser, M., Schaeffer, R., Dey, A., Rafailov, R., Sleight, H., Hughes, J., Korbak, T., Agrawal, R., Pai, D., Gromov, A., Roberts, D.A., Yang, D., Donoho, D.L., Koyejo, S.: Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data (2024). https://arxiv.org/abs/2404.01413
Martínez, G., Watson, L., Reviriego, P., Hernández, J.A., Juarez, M., Sarkar, R.: Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet (2023). https://arxiv.org/abs/2306.06130
Zhang, Q., Zeng, B., Zhou, C., Go, G., Shi, H., Jiang, Y.: Human-imperceptible retrieval-poisoning attacks in LLM-powered applications. In: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pp. 502–506 (2024). DOI: https://doi.org/10.1145/3663529.3663793
Long, L., Wang, R., Xiao, R., Zhao, J., Ding, X., Chen, G., Wang, H.: On LLMs-driven synthetic data generation, curation, and evaluation: A survey. arXiv preprint arXiv:2406.15126 (2024). https://arxiv.org/abs/2406.15126
Yan, B., Li, K., Xu, M., Dong, Y., Zhang, Y., Ren, Z., Cheng, X.: On protecting the data privacy of large language models (LLMs): A survey. arXiv preprint arXiv:2403.05156 (2024). https://arxiv.org/abs/2403.05156
Inan, H., Upasani, K., Chi, J., Rungta, R., Iyer, K., Mao, Y., Tontchev, M., Hu, Q., Fuller, B., Testuggine, D., et al.: Llama Guard: LLM-based input-output safeguard for human-AI conversations. arXiv preprint arXiv:2312.06674 (2023). https://arxiv.org/abs/2312.06674
Pal, M.: Meta faces backlash over WhatsApp jokes hurting religious sentiments. Times Now (2024)
Lukas, N., Salem, A., Sim, R., Tople, S., Wutschitz, L., Zanella-Béguelin, S.: Analysing leakage of personally identifiable information in language models. In: 2023 IEEE Symposium on Security and Privacy (SP), pp. 346–363 (2023). IEEE. DOI: https://doi.org/10.1109/SP46215.2023.10179418
He, F., Zhu, T., Ye, D., Liu, B., Zhou, W., Yu, P.S.: The emerging security and privacy of LLM agents: A survey with case studies. arXiv preprint arXiv:2407.19354 (2024). https://arxiv.org/abs/2407.19354
Majeed, A., Hwang, S.O.: Reliability issues of LLMs: ChatGPT, a case study. IEEE Reliability Magazine, 1–11 (2024). DOI: https://doi.org/10.1109/MRL.2024.3420849
Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S.: On the dangers of stochastic parrots: Can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. FAccT '21, pp. 610–623. Association for Computing Machinery, New York, NY, USA (2021). DOI: https://doi.org/10.1145/3442188.3445922
Arkoudas, K.: ChatGPT is no stochastic parrot. But it also claims that 1 is greater than 1. Philosophy & Technology 36(3), 54 (2023). DOI: https://doi.org/10.1007/s13347-023-00640-3
Hicks, M.T., Humphries, J., Slater, J.: ChatGPT is bullshit. Ethics and Information Technology 26(2), 38 (2024). DOI: https://doi.org/10.1007/s10676-024-09702-3
Nejjar, M., Zacharias, L., Stiehle, F., Weber, I.: LLMs for science: Usage for code generation and data analysis. arXiv preprint arXiv:2311.16733 (2023). https://arxiv.org/abs/2311.16733
He, Y., Wang, E., Rong, Y., Cheng, Z., Chen, H.: Security of AI agents. arXiv preprint arXiv:2406.08689 (2024). https://arxiv.org/abs/2406.08689
Hasani, R., Lechner, M., Wang, T.-H., Chahine, M., Amini, A., Rus, D.: Liquid structural state-space models. arXiv preprint arXiv:2209.12951 (2022). https://arxiv.org/abs/2209.12951