Friday, November 3, 2023

The Limitations of Generative AI LLMs: A Personal Example


In today's digital realm, generative AI large language models (LLMs) like ChatGPT by OpenAI and Google Bard have become focal points of discussion. These models, equipped with the ability to craft human-like text, offer a myriad of applications. Yet, understanding their constraints remains paramount.

It's crucial to grasp that these models don't genuinely "think." They generate text by mimicking human language patterns from vast datasets, not from comprehension or consciousness. Contrary to popular belief, LLMs don't actively "search" the internet for real-time answers. They've been trained on extensive data, but they don’t browse the web live. Faced with unfamiliar topics, they make educated guesses based on previous patterns.

Let's turn to a personal experience. I'm an avid supporter of Fresno State Football, and my YouTube channel boasts four decades of game footage and highlights. Leveraging AI, I've crafted game recaps and summaries for each video description. An observable trend is that the AI's accuracy correlates with a game's media coverage: the more widespread the reporting, the more accurate the AI summary. Even so, my familiarity with these games lets me catch the occasional inaccuracy.

A case in point is a recap I requested from Google Bard for the 1986 NCAA Football game between Fresno State and the University of Nevada-Las Vegas (UNLV). While the game had national coverage, it didn't draw high ratings and was likely missed by many viewers and journalists, especially outside the Pacific time zone. Bard's recap of this game contained marked inaccuracies.

For example, in its first paragraph, Bard inaccurately labeled the game's conference as the Big West Conference. In 1986, both schools were part of the Pacific Coast Athletic Association (PCAA), which was not rebranded as the Big West until 1988. Furthermore, in its third paragraph, Bard mistakenly identified Jeff Tedford as Fresno State's quarterback for that game, even though he had left the position after 1982. Other errors involved UNLV's Ickey Woods and Charles Williams. Woods was mischaracterized as the quarterback when he actually played running back, and Williams, who didn't begin playing for UNLV until 2017, was incorrectly placed in the 1986 account. A notable tidbit is that both Woods and Williams hail from Fresno.

These oversights illuminate AI's tendency to piece together plausible, quasi-relevant information when faced with data gaps. The details about the conference, Jeff Tedford, Ickey Woods, and Charles Williams are all "semi-in-the-ballpark" information. It's as if you threw a ball for a dog, and it earnestly brought back a stick: close, but not quite right.

The underlying message here is the imperative of scrutinizing AI outputs. LLMs, while powerful, can occasionally deliver out-of-context or misleading information. Critical assessment of AI responses is as essential as vetting any unfamiliar source.

Generative AI LLMs, revolutionary as they are, come with their set of challenges. Approaching their outputs with discernment is vital, especially in education. Teachers should rigorously vet AI-derived content, and students must be taught to assess the reliability of AI-generated information. In doing so, we foster a balanced approach, benefiting from AI while upholding the veracity of the information at hand.

1 comment:

  1. Very interesting post, Adam. Not many people have the encyclopedic knowledge about Fresno State football that you do. I don't know how teachers are going to find the time to vet AI-derived content. Most teachers are only using LLMs because they don't have the time to write clearly or think a complex issue through themselves. I find the personal examples fascinating.