Sunday, September 1, 2024

Artificial Intelligence: Toole's Dunces Chatbot

Back on April 1, 2023, I posted a blog entry called The Wisdom of ChatGPT. I prompted ChatGPT with a query that could have been addressed by my John Kennedy Toole writings that are publicly available on my blog. In theory, any system that pulled information from the internet could have answered the question well. Then on July 1, 2023, I posted a Comparison Study of AI Systems for Library Research and Scholarship (and Confederacy of Dunces) with the same prompt as the ChatGPT test. I found that ChatGPT was giving vague answers, but it made more or less valid points. It did not seem to have the information in my writings, though. In my broader test, Bing was terrible, and Google's Bard was superior to all other systems by a comfortable margin.

The AI community at my university has been encouraging faculty to create chatbots using LLM systems. I decided to create a GPT with ChatGPT and test it.

Setting up a Chatbot

It turns out that setting up the simplest version of a GPT chatbot for a specialized purpose is quite easy. As Billy Joel would say, all you need are looks and a whole lot of money. Specifically, you must have access to the paid version of ChatGPT with GPT-4. My campus IT department has a group license. Further, according to the OpenAI webpages (which I briefly scanned), you may also have to pay based on the amount of information that is processed: both the length of the query you send to the system and the length of the answer sent back are tabulated.

From the OpenAI webpages, it appears that one of the ways that the company charges is "per token" (roughly a syllable-sized fragment of a word), counted in both directions. You can buy a million tokens for $25. A question to the system might be a hundred tokens, while an answer might be three hundred or more. A query and response of that length will cost about a penny. If your application becomes popular, it could cost you a lot of money. Just for the per-token fee, one million such queries would run $10,000. There are likely other charges besides the per-token fee. Beyond this vague sense of cost, I am quite ignorant of the amounts of money you would have to pay. Considering that Nvidia is now worth trillions of dollars, I am confident that a more sophisticated chatbot-type application would cost plenty of money.
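To make that arithmetic concrete, here is a back-of-the-envelope calculation in Python. It assumes a single flat rate of $25 per million tokens counted in both directions, which is a simplification of OpenAI's actual per-model, per-direction pricing.

```python
# Rough token-cost estimate, assuming a flat $25 per million tokens
# charged on both the prompt and the response. Real OpenAI pricing
# differs by model and by direction, so treat this as a sketch.

PRICE_PER_MILLION_TOKENS = 25.00  # dollars, assumed flat rate

def query_cost(prompt_tokens: int, response_tokens: int) -> float:
    """Estimated dollar cost of one query/response pair."""
    total_tokens = prompt_tokens + response_tokens
    return total_tokens * PRICE_PER_MILLION_TOKENS / 1_000_000

one_query = query_cost(prompt_tokens=100, response_tokens=300)
print(f"One query:         ${one_query:.4f}")               # about a penny
print(f"A million queries: ${one_query * 1_000_000:,.0f}")  # about $10,000
```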

For the simplest chatbot, once you begin the setup process, the system walks you through your configuration preferences. You don't need to know much in the way of technical details. This notwithstanding, I am confident that the more sophisticated applications require you to know what you are doing.

Beyond configuration, you should have documents related to the subject that the chatbot should know, and those documents should not have copyright restrictions or confidential information. You should also be aware of the privacy policies of the organization that created the system, in this case OpenAI. Will the system absorb your information into its main knowledge base if you upload texts that are proprietary and that you do not want to share? I don't currently know. Further, even if their policies say that they will not do it, will they abide by their own policies?

Why create a chatbot? As an example, if you are a college professor, you might want an interactive system that can answer questions about a course you are teaching. You might load the course syllabus into the chatbot as well as other supporting documents. If you chose an open access textbook for the course, you could even upload the textbook into your system. Your students or other users could then query the system about things found in those documents. For the syllabus, this might actually be a bad idea, because the students might think that they do not need to read the syllabus, and sometimes the chatbot might give incorrect answers. In short, the chatbot could be bullshitting them. It might make more sense to make them actually read the documents.
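For readers who would rather script this than click through the GPT builder, here is a minimal sketch of the idea, assuming the OpenAI Python SDK, an API key in the OPENAI_API_KEY environment variable, and a pair of hypothetical course files. Unlike the GPT builder, which retrieves relevant passages from uploaded files for you, this toy version simply pastes the documents into the prompt, so it only works for documents short enough to fit in the model's context window.

```python
# Toy course-question chatbot: read local course documents and include them
# in the prompt. Assumes the OpenAI Python SDK; file names are hypothetical.

from pathlib import Path
from openai import OpenAI

client = OpenAI()

documents = [Path("syllabus.txt"), Path("open_textbook_chapter1.txt")]
context = "\n\n".join(p.read_text() for p in documents)

def ask(question: str) -> str:
    """Answer a student question using the course documents as context."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any chat model available to your account
        messages=[
            {"role": "system",
             "content": "You are a course assistant. Use the course documents "
                        "below to answer student questions.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("When is the first paper due?"))
```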

Clever Bullshitting versus Actually Understanding

Formal Studies

This summer, I read an article in the magazine Science News about whether AI large language models (LLMs) can be said to understand things. On the web, the article is located at: AI Understanding and Reasoning Skills. AI systems have recently been passing the Turing test; that is, they have been able to fool humans into thinking that they are human. What researchers are finding is that AI systems can answer questions in test batteries as well as or better than humans, but they are doing so in part based on statistical likelihoods. Their abilities are brittle: you can play tricks on them that show that they don't really understand what they are talking about. In short, they might be bullshitting.

As the article said of the Winograd Schema Challenge (WSC), a benchmark of sentences that require commonsense reasoning to resolve an ambiguous pronoun: "In a study reported in 2021, Elazar and colleagues gave nonsensical modifications of WSC sentences to RoBERTa, an LLM that has scored more than 80 percent on the WSC benchmark in some cases. The model got the nonsensical questions right at least 60 percent of the time even though humans wouldn’t be expected to answer correctly. Since random guessing couldn’t yield more than a 50 percent score, spurious associations must have been giving away the answer."

In short, the RoBERTa system was not understanding the sentences; it was modeling statistical probabilities and making educated guesses.
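You can reproduce the flavor of this at home. The sketch below assumes the Hugging Face transformers library and the publicly released roberta-base checkpoint (not the fine-tuned model from the study); it asks a masked language model to fill in the blank of a Winograd-style sentence, and the ranked guesses come from word co-occurrence statistics rather than from any mental model of trophies and suitcases.

```python
# Fill-in-the-blank with a masked language model, using the Hugging Face
# transformers library. RoBERTa ranks candidate words for the <mask> slot
# purely by learned probabilities over word co-occurrences.

from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")

sentence = "The trophy would not fit in the suitcase because it was too <mask>."
for candidate in fill(sentence):
    print(f"{candidate['token_str']:>10}  p={candidate['score']:.3f}")
```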

My Own Chatbot

I created a simple chatbot using ChatGPT which I call "Toole's Dunces Chatbot." For texts, I loaded a) my paper "Evidence of Influences ... 2.0," b) an Open Access version of my paper "Dialectic of American Humanism," c) my webpage of ideas for term papers on JKT, d) my annotated bibliography of obscure JKT studies, and e) a text version of many of the entries in this blog. I did not load any other texts into the system, because I felt that doing so would violate copyright. I had been planning in this blog post to release my chatbot to the world, but because it would naturally garner hundreds of millions of queries from the global audience of Toole scholars, your use of it might circle back and slam my university's budget, so I have decided not to release it.

Once I created it, I started posing questions to it based on information in my uploaded blog, questions whose answers were present in the texts I had loaded. Remarkably, it gave me dodgy, hedging answers, even when I knew that it had digested better answers from my writings. The answers read like something a sophomore would write if the student had not read the assignment but had heard others talking about it. Below, I give an example of a) a blog entry that addresses a topic, b) the prompt that I gave the chatbot related to that topic, and c) its imprecise answer.

Scholarship on Bullshitting AIs

After I noticed the bullshitting, I contacted a friend of mine. He has worked on medical AI systems and has taught a course on AI. Here is his response:

"No one disputes that GPTs model probabilities. The question is whether reasoning can arise ('emerge' is the hot word these days) out of the probabilities. I think that is still open, but it's safe to say that no one has seen very deep reasoning emerge."

The expert explained that one aspect of the problem is that there are two stages of training for an AI tool, such as a chatbot. There is the initial training—sometimes called pre-training—where the LLM is taught to compose sentences in English. Then there is the second stage, where you feed it texts about which you want it to answer questions. One big problem is that the system will sometimes draw on the knowledge it picked up from the pre-training, even if that knowledge has been superseded by the second stage of training. You can create a medical chatbot, then feed it current research, and it will sometimes answer questions with information that it inferred from the original language-development stage of its training rather than the subject-specific current information. (In other words, just say no to bleeding as a therapeutic strategy, even if an AI prescribed it.)
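To make the distinction concrete, here is a small sketch, again assuming the OpenAI Python SDK and an API key in the OPENAI_API_KEY environment variable, with an invented question and an invented passage. It asks the same question twice: once with nothing but the model's pre-trained knowledge, and once with a current passage supplied in the prompt along with an instruction to rely only on that passage. The second pattern is the usual way people try to keep a chatbot from falling back on what it absorbed during pre-training, though, as my friend notes, it does not always work.

```python
# Contrast the model's pre-trained ("parametric") knowledge with knowledge
# supplied at query time. Assumes the OpenAI Python SDK; the question and
# the passage below are invented placeholders.

from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # any chat model available to your account

question = "What does the course recommend as first-line treatment for condition X?"

# 1) Pre-training only: the model answers from whatever it learned before.
baseline = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": question}],
)

# 2) Supplied knowledge: provide the current document and instruct the model
#    to use only that text. This reduces, but does not eliminate, the
#    tendency to fall back on pre-trained associations.
current_guidance = "Hypothetical excerpt from the 2024 course guidelines: ..."
grounded = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system",
         "content": "Answer using only the passage below. If the passage "
                    "does not contain the answer, say so.\n\n" + current_guidance},
        {"role": "user", "content": question},
    ],
)

print("Pre-training only:", baseline.choices[0].message.content)
print("With supplied passage:", grounded.choices[0].message.content)
```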

He sent me a citation to a recent article that discusses the bullshitting of AI systems. The article uses the definition of bullshit from the Harry Frankfurt book On Bullshit (Princeton, 2005).

Hicks, Michael Townsen, James Humphries, and Joe Slater. (June 2024). "ChatGPT is bullshit." Ethics and Information Technology, 26:38. https://doi.org/10.1007/s10676-024-09775-5

This article is open access, so you do not have to pay Springer to read it. (BTW, don’t ask me specific questions about this article. My friend sent me some quotes, but I have not yet read it, so if I tried to answer questions about it, I would be bullshitting you.)

Context

This tension between statistical inference and actual understanding reminds me of the writings of the cognitive scientist Steven Pinker from the 1990s, where he criticized models of the brain built on a flawed substrate that he called connectoplasm. Current AI LLMs, or at least the simplest versions of a GPT, rely too much on connectoplasm.

Example

JKT Research Blog entry from July of 2020: The Act of Damning ... In this regard, the act of damning seems to be significant, as only the blocking characters in the book are damned. Gonzalez damns Mrs. Levy, George damns Lana Lee, and Ignatius' mother eventually tells him to go to hell. Toole, Confederacy, 112, 190, 365. ... I argue that Toole uses the device of the blocking character, who prevents other characters in the story from developing and rising in status within the social group. There are three blocking characters who through comic comeuppance are displaced from their social blocking posts, and in all three instances, another character tells them to go to hell or says, "Damn you!" I interpret this to mean that Toole took the act of uttering the curse "Damn you" to be important.

Prompt to Toole's Dunces Chatbot: Is the act of damning someone noteworthy in Confederacy of Dunces?

Response

In A Confederacy of Dunces, the act of damning someone is noteworthy and is tied to the broader themes of fate, disorder, and Ignatius Reilly's complex worldview. Ignatius frequently condemns others, a behavior that reflects his role as a satirical embodiment of various philosophical ideas, particularly those linked to medieval and Renaissance thought. This act of damning is part of his broader disdain for the modern world and his self-styled superiority, often expressed through hyperbolic judgments and curses.

Ignatius’s tendency to damn others can be seen as an extension of his association with the planetary god Saturn, as analyzed in scholarly works on the novel. Saturn, a symbol of disorder and melancholy in medieval and Renaissance traditions, is closely linked to Ignatius. His curses and condemnations are a reflection of his chaotic influence on the world around him and his self-perceived role as a critic of modern society’s flaws. In this way, his damnations are not merely personal attacks but also represent his rejection of the contemporary world and his attempt to impose his own chaotic order on it.

Moreover, this act aligns with Ignatius's role as a mock agent of Fortuna, the Roman goddess of fate, where his interactions—often negative and condemning—bring about shifts in the fortunes of others, reflecting the chaotic nature of the world he inhabits. Thus, damning others is a significant aspect of how Ignatius navigates his world, serving as both a personal expression of his disdain and a thematic element that contributes to the novel's satirical critique of modern life.

Commentary: Note that this answer does not use the information in the blog entry that specifically discusses the act of damning in Confederacy. Ironically, for all of Ignatius's insults, he does not damn anyone in the novel. This answer demonstrates ChatGPT's art of bullshitting. This example is one of several that I could give.