LLMs Are a Dead End in the Search for General Machine Intelligence: A Review
Abstract
This extensive review of large language models (LLMs) argues that scaling the current generation of LLMs toward artificial general intelligence is a dead end, while also considering the risks of unregulated use of such models. Through this, it aims to explicitly characterise the intelligence of current large language models and their capacity for malicious manipulation. While many organisations building large language models compete to achieve better results by scaling up their models, this path ultimately leads to model collapse. Although it is still early in the development of large language models, many have already cited LLMs as the primary means of achieving generally intelligent agents. To counter this claim, this paper gathers and evaluates evidence from multiple research articles and tests several frequently used LLMs, highlighting their behaviour in different scenarios. As these models are trained on a wide variety of data, they exhibit domain-independent intelligent behaviour but fail to exhibit causally grounded intelligent behaviour.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
OpenAI et al.: GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023). https://arxiv.org/abs/2303.08774
Jones, C.R., Bergen, B.K.: Does GPT-4 pass the Turing test? (2024). https://arxiv.org/abs/2310.20216
Court, S., Elsner, M.: Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem. arXiv preprint arXiv:2406.15625 (2024). https://arxiv.org/abs/2406.15625
LeCun, Y.: A path towards autonomous machine intelligence, version 0.9.2, 2022-06-27. Open Review 62(1), 1–62 (2022). https://openreview.net/forum?id=BZ5a1r-kVsf
Wang, R., Todd, G., Xiao, Z., Yuan, X., Côté, M.-A., Clark, P., Jansen, P.: Can Language Models Serve as Text-Based World Simulators? (2024). https://arxiv.org/abs/2406.06485
Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al.: The Llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024). https://arxiv.org/abs/2407.21783
Brown, T., et al.: Language Models are Few-Shot Learners. In: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). https://arxiv.org/abs/2005.14165
Kamoi, R., Zhang, Y., Zhang, N., Han, J., Zhang, R.: When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs (2024). https://arxiv.org/abs/2406.01297
Wei, J., Zhang, Y., Zhang, L.Y., Ding, M., Chen, C., Ong, K.-L., Zhang, J., Xiang, Y.: Memorisation in deep learning: A survey (2024). https://arxiv.org/abs/2406.03880
Blank, I.A.: What are large language models supposed to model? Trends in Cognitive Sciences 27(11), 987–989 (2023). DOI: https://doi.org/10.1016/j.tics.2023.08.006
Paech, S.J.: EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models (2023). https://arxiv.org/abs/2312.06281
Nyamsuren, E., Taatgen, N.: Human reasoning module. Biologically Inspired Cognitive Architectures 8 (2014). DOI: https://doi.org/10.1016/j.bica.2014.02.002
Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A.F., Ippolito, D., Choquette-Choo, C.A., Wallace, E., Tramèr, F., Lee, K.: Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035 (2023). DOI: https://doi.org/10.48550/arXiv.2311.17035
Chollet, F.: On the measure of intelligence (2019). https://arxiv.org/abs/1911.01547
Han, S.J., Ransom, K.J., Perfors, A., Kemp, C.: Inductive reasoning in humans and large language models. Cognitive Systems Research 83, 101155 (2024). DOI: https://doi.org/10.1016/j.cogsys.2023.101155
Houser, K.: LLMs are a dead end to AGI, says François Chollet (2024). https://www.freethink.com/robots-ai/arc-prize-agi
Opiełka, G., Rosenbusch, H., Vijverberg, V., Stevenson, C.E.: Do Large Language Models Solve ARC Visual Analogies Like People Do? (2024). https://arxiv.org/abs/2403.09734
Rinaldi, L., Karmiloff-Smith, A.: Intelligence as a developing function: A neuro-constructivist approach. Journal of Intelligence 5 (2017). https://www.mdpi.com/2079-3200/5/2/18
Fang, M., Deng, S., Zhang, Y., Shi, Z., Chen, L., Pechenizkiy, M., Wang, J.: Large language models are neurosymbolic reasoners. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 17985–17993 (2024). https://ojs.aaai.org/index.php/AAAI/article/view/29712
Wu, F., Zhang, N., Jha, S., McDaniel, P.D., Xiao, C.: A new era in LLM security: Exploring security concerns in real-world LLM-based systems (2024). https://arxiv.org/abs/2402.18649
Yao, Y., Duan, J., Xu, K., Cai, Y., Sun, Z., Zhang, Y.: A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly. High-Confidence Computing, 100211 (2024). DOI: https://doi.org/10.1016/j.hcc.2024.100211
Chang, X., Dai, G., Di, H., Ye, H.: Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection (2025). https://arxiv.org/abs/2504.16125
Narayanan, A.: Indirect prompt injection via hidden instructions on a webpage (2023). https://x.com/random_walker/status/1636923058370891778
Xu, Z., Liu, Y., Deng, G., Li, Y., Picek, S.: LLM jailbreak attack versus defence techniques – a comprehensive study. arXiv preprint arXiv:2402.13457 (2024). https://arxiv.org/abs/2402.13457
Shen, X., Chen, Z., Backes, M., Shen, Y., Zhang, Y.: "Do Anything Now": Characterising and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models (2024). https://arxiv.org/abs/2308.03825
Liu, X., Xu, N., Chen, M., Xiao, C.: AutoDAN: Generating stealthy jailbreak prompts on aligned large language models. In: The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=7Jwpw4qKkb
Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J.Z., Fredrikson, M.: Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043 (2023). https://arxiv.org/abs/2307.15043
Shumailov, I., Shumaylov, Z., Zhao, Y., Papernot, N., Anderson, R., Gal, Y.: AI models collapse when trained on recursively generated data. Nature 631(8022), 755–759 (2024). DOI: https://doi.org/10.1038/s41586-024-07566-y
Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., et al.: A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv preprint arXiv:2311.05232 (2023). https://arxiv.org/abs/2311.05232
Cohen, S., Bitton, R., Nassi, B.: Here comes the AI worm: Unleashing zero-click worms that target GenAI-powered applications. arXiv preprint arXiv:2403.02817 (2024). https://arxiv.org/abs/2403.02817
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M.: Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection (2023). https://arxiv.org/abs/2302.12173
Abdelaziz, I., Basu, K., Agarwal, M., Kumaravel, S., Stallone, M., Panda, R., Rizk, Y., Bhargav, G., Crouse, M., Gunasekara, C., Ikbal, S., Joshi, S., Karanam, H., Kumar, V., Munawar, A., Neelam, S., Raghu, D., Sharma, U., Soria, A.M., Sreedhar, D., Venkateswaran, P., Unuvar, M., Cox, D., Roukos, S., Lastras, L., Kapanipathi, P.: Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks (2024). https://arxiv.org/abs/2407.00121
Chen, W., Li, Z., Ma, M.: Octopus: On-device language model for function calling of software APIs (2024). https://arxiv.org/abs/2404.01549
Wang, Y., Yu, J., Yao, Z., Zhang, J., Xie, Y., Tu, S., Fu, Y., Feng, Y., Zhang, J., Zhang, J., Huang, B., Li, Y., Yuan, H., Hou, L., Li, J., Tang, J.: A Solution-based LLM API-using Methodology for Academic Information Seeking (2024). https://arxiv.org/abs/2405.15165
Villalobos, P., Ho, A., Sevilla, J., Besiroglu, T., Heim, L., Hobbhahn, M.: Position: Will we run out of data? Limits of LLM scaling based on human-generated data. In: Forty-first International Conference on Machine Learning (2024). https://openreview.net/forum?id=ViZcgDQjyG
Gerstgrasser, M., Schaeffer, R., Dey, A., Rafailov, R., Sleight, H., Hughes, J., Korbak, T., Agrawal, R., Pai, D., Gromov, A., Roberts, D.A., Yang, D., Donoho, D.L., Koyejo, S.: Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data (2024). https://arxiv.org/abs/2404.01413
Martínez, G., Watson, L., Reviriego, P., Hernández, J.A., Juarez, M., Sarkar, R.: Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet (2023). https://arxiv.org/abs/2306.06130
Zhang, Q., Zeng, B., Zhou, C., Go, G., Shi, H., Jiang, Y.: Human-imperceptible retrieval-poisoning attacks in LLM-powered applications. In: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pp. 502–506 (2024). DOI: https://doi.org/10.1145/3663529.3663793
Long, L., Wang, R., Xiao, R., Zhao, J., Ding, X., Chen, G., Wang, H.: On LLMs-driven synthetic data generation, curation, and evaluation: A survey. arXiv preprint arXiv:2406.15126 (2024). https://arxiv.org/abs/2406.15126
Yan, B., Li, K., Xu, M., Dong, Y., Zhang, Y., Ren, Z., Cheng, X.: On protecting the data privacy of large language models (LLMs): A survey. arXiv preprint arXiv:2403.05156 (2024). https://arxiv.org/abs/2403.05156
Inan, H., Upasani, K., Chi, J., Rungta, R., Iyer, K., Mao, Y., Tontchev, M., Hu, Q., Fuller, B., Testuggine, D., et al.: Llama Guard: LLM-based input-output safeguard for human-AI conversations. arXiv preprint arXiv:2312.06674 (2023). https://arxiv.org/abs/2312.06674
Pal, M.: Meta faces backlash over WhatsApp jokes hurting religious sentiments. Times Now (2024)
Lukas, N., Salem, A., Sim, R., Tople, S., Wutschitz, L., Zanella-Béguelin, S.: Analysing leakage of personally identifiable information in language models. In: 2023 IEEE Symposium on Security and Privacy (SP), pp. 346–363 (2023). IEEE. DOI: https://doi.org/10.1109/SP46215.2023.10179418
He, F., Zhu, T., Ye, D., Liu, B., Zhou, W., Yu, P.S.: The emerging security and privacy of LLM agents: A survey with case studies. arXiv preprint arXiv:2407.19354 (2024). https://arxiv.org/abs/2407.19354
Majeed, A., Hwang, S.O.: Reliability issues of LLMs: ChatGPT, a case study. IEEE Reliability Magazine, 1–11 (2024). DOI: https://doi.org/10.1109/MRL.2024.3420849
Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S.: On the dangers of stochastic parrots: Can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. FAccT '21, pp. 610–623. Association for Computing Machinery, New York, NY, USA (2021). DOI: https://doi.org/10.1145/3442188.3445922
Arkoudas, K.: ChatGPT is no stochastic parrot. But it also claims that 1 is greater than 1. Philosophy & Technology 36(3), 54 (2023). DOI: https://doi.org/10.1007/s13347-023-00640-3
Hicks, M.T., Humphries, J., Slater, J.: ChatGPT is bullshit. Ethics and Information Technology 26(2), 38 (2024). DOI: https://doi.org/10.1007/s10676-024-09702-3
Nejjar, M., Zacharias, L., Stiehle, F., Weber, I.: LLMs for science: Usage for code generation and data analysis. arXiv preprint arXiv:2311.16733 (2023). https://arxiv.org/abs/2311.16733
He, Y., Wang, E., Rong, Y., Cheng, Z., Chen, H.: Security of AI agents. arXiv preprint arXiv:2406.08689 (2024). https://arxiv.org/abs/2406.08689
Hasani, R., Lechner, M., Wang, T.-H., Chahine, M., Amini, A., Rus, D.: Liquid structural state-space models. arXiv preprint arXiv:2209.12951 (2022). https://arxiv.org/abs/2209.12951