"Limitless" Language Models: Exploring the AI Horizon in the Era of Nothing but Blue Skies

Photo: S.E.A. Aquarium

This year, I had the opportunity to participate in EMNLP23, held in Singapore, where I presented two papers. Numerous compelling discussions and topics took centre stage, predominantly revolving around Large Language Models (LLMs). A year after the official launch of ChatGPT, LLMs have steadily captivated the interest of AI scientists. It seems that the field of AI is entering a new era, marked by a distinct research agenda diverging from the path the AI community has followed in recent years. At EMNLP23, I came across numerous inspiring works by NLP experts demonstrating that LLMs are poised to tackle a wide range of real-world problems. On the inaugural day of my attendance, as I scrolled through the EMNLP 2023 updates on Twitter, I chanced upon Google's official unveiling of Gemini AI. The newly released generative model claims to surpass human experts on the Massive Multitask Language Understanding (MMLU) benchmark and showcases impressive multi-modal capabilities, as highlighted in Google's advertisement. For a moment I thought: This sounds exciting! It seems that AI scientists have made it! We are now stepping into the Era of Nothing but Blue Skies in the AI field!

Wait a minute! Are we truly in the realm of reality, or are we stepping into an era of technological hallucination? Lately, numerous voices in the AI community have expressed concerns about the problems of hallucination (i.e., the generated output is factually incorrect) and memorisation (i.e., the model reproduces specific examples from the training data verbatim). Indeed, LLMs such as ChatGPT present a two-fold issue of balancing creativity against reproducibility. As Andrej Karpathy tweeted this week: "In some sense, hallucination is all LLMs do. They are dream machines. Hallucination is not a bug, it is LLM's greatest feature. All that said, I realize that what people *actually* mean is they don't want an LLM Assistant (a product like ChatGPT etc.) to hallucinate. An LLM Assistant is a lot more complex system than just the LLM itself, even if one is at the heart of it." In essence, I think Karpathy's point is that the LLM itself does not have a "hallucination problem", since hallucination is its key feature; the research focus should instead be on addressing this issue in LLM Assistants. The memorisation problem, on the other hand, often arises because the models have a large number of parameters and can overfit the training data. A pertinent study on memorisation appeared in the theme track "Large Language Models and the Future of NLP", titled Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU. Researchers should work on developing strategies to mitigate memorisation issues and enhance the generalisation ability of LLMs.
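To make the notion of verbatim memorisation concrete, here is a minimal sketch of an n-gram overlap check: it flags generated text that reproduces long spans of a training corpus word for word. The corpus, the generated sample, and the span length below are toy placeholders of my own, not drawn from any particular study.

```python
# Minimal sketch of a verbatim-memorisation check: flag generated text that
# reproduces long n-gram spans from the training corpus word for word.
# The corpus and the generated sample below are toy placeholders.

def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def memorised_spans(generated: str, corpus: list[str], n: int = 8):
    """Return generated n-grams that also occur verbatim in the corpus."""
    corpus_ngrams = set()
    for doc in corpus:
        corpus_ngrams |= ngrams(doc.split(), n)
    return ngrams(generated.split(), n) & corpus_ngrams

if __name__ == "__main__":
    corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]
    sample = "as expected the quick brown fox jumps over the lazy dog near home"
    print(memorised_spans(sample, corpus, n=8))
```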

Nevertheless, the research potential extends beyond the hallucination and memorisation effects of LLMs. Researchers also need to tackle other significant challenges, such as reproducibility, explainability, evaluation, security, and training efficiency. An interesting discussion unfolded during the workshop The Big Picture: Crafting a Research Narrative (BigPicture), co-located with EMNLP23: Have we truly made it? After two decades, we find ourselves still using backpropagation to train models rather than employing quantum algorithms to accelerate the learning process. The truth is that the quantum research field is still evolving, and it is essential to stay updated on the latest advancements in both quantum computing and AI.

This year at EMNLP23, many researchers showed a preference for Chain-of-Thought (CoT) prompting (prompting the model to produce a logical sequence of reasoning steps that leads to a particular decision or output) as a method to address the challenges of explainability and interpretability. While a CoT can contribute to the explainability and interpretability of a model, it is not necessarily sufficient on its own; explainability and interpretability in the context of LLMs often require more comprehensive methods. In his engaging keynote speech, Christopher D. Manning emphasized that we need to approach LLMs much as developers approach bugs in code: actively engaging with specific components and employing creativity to identify errors, as exemplified by the work on Backpack Language Models by John Hewitt et al.
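To make the contrast concrete, here is a minimal sketch of zero-shot CoT prompting next to a direct prompt. The `generate` function is a hypothetical placeholder for any LLM completion call (an API client or a local model), not a specific library.

```python
# Minimal sketch of Chain-of-Thought (CoT) prompting versus a direct prompt.
# `generate` is a hypothetical stand-in for any LLM completion call;
# replace it with your model or API client of choice.

def generate(prompt: str) -> str:
    """Placeholder LLM call; swap in a real model or API client here."""
    return "<model output>"

def direct_prompt(question: str) -> str:
    # Ask for the answer only, with no visible reasoning.
    return generate(f"Question: {question}\nAnswer:")

def cot_prompt(question: str) -> str:
    # Ask the model to lay out intermediate reasoning steps
    # before committing to a final answer (zero-shot CoT).
    return generate(
        f"Question: {question}\n"
        "Let's think step by step, then state the final answer."
    )

if __name__ == "__main__":
    question = "A train leaves at 9:40 and arrives at 11:05. How long is the trip?"
    print(cot_prompt(question))
```

The reasoning trace that a CoT prompt elicits can serve as a partial explanation of the model's output, but, as noted above, it is not a substitute for deeper interpretability methods.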

Ensuring the robustness and reliability of LLMs also involves addressing security concerns, particularly adversarial attacks. At the EMNLP awards ceremony, I had the pleasure of attending the presentation of the best paper in the theme track, delivered by Sander Schulhoff. Both his work and the paper's title, Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition, are truly captivating.

To conclude, it seems to me that the rise of LLMs signifies the start of a new era, bringing along a rejuvenated research agenda. The current developments in AI are truly exhilarating, particularly when we reflect on the fact that, as recently as 2016, the predominant tools of the NLP and AI community were LSTM and CNN architectures. Numerous research directions are now open for exploration, offering ample opportunities to progress toward the outcomes envisioned since the introduction of LLMs.