1 Jan 2024
The world is witnessing the dawn of the AI age and excitement is at its peak as to just how much the technology will change existing paradigms. In this article, Oli Viner explains how these systems work and discusses how they might impact society and the veterinary profession…
Unless you’ve been living under a rock, you’ve probably heard about ChatGPT by OpenAI. ChatGPT is a large language model (LLM) and, despite being the most famous, it is not the first. In fact, underneath ChatGPT is a model called GPT-4 which, as the name suggests, is the fourth major iteration of that model. There’s a whole evolutionary tree of these models, with plenty of strange names and acronyms such as LLaMA from Meta and Bard from Google.
Where ChatGPT really revolutionised things, however, was the “Chat” part – the interface that allowed ordinary people to converse with the LLM through a web browser. For those who haven’t tried it (and I’d highly recommend you do), you visit chat.openai.com and start conversing with a “bot”, but this is unlike any bot you’ve used before. Its ability to understand and infer context is truly amazing. What is even more amazing is that we don’t truly understand exactly how it all works.
GPT-4 has been trained on a huge dataset, using unimaginable amounts of publicly available data: scientific articles, Wikipedia, general web pages and discussion forums. All of that data went into the model, broken down into words and word-parts known as “tokens”. The system looks at the previous few thousand tokens and attempts to guess the next one, and it does this one token at a time. For example, if it has “The cat sat on the…” and is trying to guess the next token, it will have a probability distribution over what that next word is likely to be. The more context it has – for example, a previous statement saying “In my house there is a cat near the front door” – the more likely the answer is to be correct. However, even without that context it will infer the answer from probabilities alone and would most likely guess the word “mat”.
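To make that guessing step concrete, here is a toy sketch in Python. The tiny vocabulary and its probabilities are invented for this example; a real LLM derives a distribution over tens of thousands of possible tokens from billions of learned parameters and the full preceding context.

```python
import random

# Toy illustration of next-token prediction. The probabilities below are
# invented for the example; a real LLM computes them from its learned
# parameters and the thousands of tokens of context it has already seen.
next_token_probs = {
    "mat": 0.62,        # most likely continuation of "The cat sat on the..."
    "sofa": 0.15,
    "floor": 0.12,
    "doorstep": 0.08,
    "keyboard": 0.03,
}

def guess_next_token(probs):
    """Sample one token according to its probability."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

print("The cat sat on the", guess_next_token(next_token_probs))
```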
The truly mind-bending thing about LLMs is how accurately they can guess the next word for seemingly very complex questions and answers. This wasn’t necessarily expected: it was not obvious, nor known, when this work first started that layering transformer networks on top of each other would achieve anything like what it has. And, potentially more concerning, they are just another type of black box that we don’t understand. We do not really know what the model is doing internally or how it is deriving its output. But they are spookily good.
When a model is first built, it is still somewhat raw. These raw models (which are generally not made public) can give highly “accurate” answers, but they can present information in ways that are not palatable to humans or to certain cultural sensitivities. For example, an unrefined model can be asked how to perform terrorist acts or how to disrupt political systems, and it can give fundamentally discriminatory answers to questions that are deemed unpalatable.
These raw models therefore undergo a period of refinement, generally through something known as “reinforcement learning from human feedback” (RLHF). Essentially, a group of humans presents a series of questions and gives a thumbs up or thumbs down to the answers. The responses of the humans are fed back into the model, which then “reweights” parts of the network so that the answers it returns are more likely to get a thumbs up. Think of it as puppy classes for neural networks.
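A very loose cartoon of that reweighting might look like the Python sketch below. Real RLHF trains a separate reward model and then fine-tunes billions of network weights; here a thumbs up or down simply nudges the scores of three invented candidate answers, which is enough to show the direction of travel.

```python
import math

# Cartoon of RLHF: human thumbs up/down feedback nudges the scores the
# model assigns to candidate answers, so approved answers become more likely.
candidate_scores = {
    "polite, helpful answer": 0.0,
    "correct but blunt answer": 0.0,
    "harmful answer": 0.0,
}

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = {answer: math.exp(score) for answer, score in scores.items()}
    total = sum(exps.values())
    return {answer: value / total for answer, value in exps.items()}

def apply_feedback(scores, answer, thumbs_up, step=1.0):
    """Nudge an answer's score up (thumbs up) or down (thumbs down)."""
    scores[answer] += step if thumbs_up else -step

# A simulated labelling session.
apply_feedback(candidate_scores, "polite, helpful answer", thumbs_up=True)
apply_feedback(candidate_scores, "harmful answer", thumbs_up=False)

print(softmax(candidate_scores))  # the harmful answer is now much less likely
```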
What is fascinating about this process is that while the raw models are able to give correct answers to questions about probabilities and other very literal or logical problems, performance on these sorts of answers gets worse with RLHF. Humans themselves are really poor at understanding probabilities, and a worsening of purely mathematical reasoning seems to be an almost emergent property of making the network more “human”.
Another really interesting and fun thing to do with LLMs like ChatGPT is to ask them to perform a task given a very specific context. One such task might be to ask it to answer a question as if it were Shakespeare. Give it a go and watch it do it – it’s amazing. It can mimic the style to such a degree that it would be hard for anyone other than an expert to discern the difference.
By carefully writing these “prompts” for the AI – the way we ask the question – we can get the model to adopt different “personalities”. This raises the question again of what is actually occurring within the model for the outputs to be so very different depending on the context given.
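As a concrete illustration, here is a minimal sketch of persona prompting. It assumes the openai Python client (version 1.x), an API key already set in your environment, and an illustrative model name and question – adjust all of these to whatever tool you actually use.

```python
# Minimal persona-prompting sketch, assuming the openai Python client (v1.x)
# and an OPENAI_API_KEY in the environment. Model name and question are
# illustrative only.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # The "system" message sets the persona; the "user" message asks the question.
        {"role": "system", "content": "Answer as if you were William Shakespeare."},
        {"role": "user", "content": "Explain why my cat needs a dental check-up."},
    ],
)

print(response.choices[0].message.content)
```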
It is also possible for the system to inadvertently (one assumes) lie. A great example can be elicited by asking detailed scientific questions. The system will produce links and references to other papers that look real but are fabricated – these are termed “hallucinations”. It does this because the system knows what a “proper” reference looks like, so it mimics one, often using real authors or titles in varying combinations. The papers themselves don’t exist, however. The answer given may not even be incorrect, but the way it has provided its source clearly is.
Possibly even more interestingly, the system is not “aware” it is lying on that first pass, but if you ask it “Does the source exist?” or even “Is this correct?” it can answer that it does not. By feeding the output back into the system as an input, it can correctly determine that it was wrong; but because the system is just progressing one token at a time as it builds the output, it is unable to do this on the first pass.
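That two-pass behaviour can be sketched as follows. The ask function here is a hypothetical stand-in for any LLM call (such as the chat completion sketch above); it returns canned placeholder text so the example runs on its own, with deliberately generic reference names.

```python
# Hypothetical two-pass self-check: generate an answer, then feed that answer
# back in and ask whether its references are real. `ask` is a stand-in for a
# real LLM call; it returns canned text so the example is self-contained.
def ask(prompt: str) -> str:
    if "Do these references actually exist" in prompt:
        return "No – I cannot verify that those papers exist."
    return ("Feline dental disease is common "
            "(Author A et al., 2019; Author B and Author C, 2021).")

question = "Summarise the evidence on feline dental disease, with references."
first_pass = ask(question)
second_pass = ask(
    "Here is an answer with references:\n"
    + first_pass
    + "\nDo these references actually exist?"
)
print(first_pass)   # confident answer with plausible-looking citations
print(second_pass)  # the same model, shown its own output, flags the problem
```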
One of the fascinating things about these LLMs is that they start to cause us to ask deep philosophical questions around consciousness, reasoning and intelligence. How do we define these, what are they, and are they limited to biological organisms, or is it possible for a neural net to achieve them? Moreover, are these intrinsic qualities of the system, or are they emergent properties? Is it possible that reasoning and something resembling consciousness has arisen, or could spontaneously arise, as a consequence of all of these transformer layers nested on top of each other? Is this what has occurred in our own evolutionary biology, with consciousness appearing as an emergent property of our brain architecture? Are we the ghost in the machine, or are we actually the driver?
As I write this article, I am writing one word at a time. In some sense I am just “guessing” the next token in the same way that ChatGPT does. I have a stream of consciousness in my head, which is carrying the previous context of my whole experience, but equally I am not necessarily planning every word, rather they seem to emerge spontaneously. I am aware of the words popping into my head and my fingers typing, but I am not really aware of how this process happens, nor what internal functions are occurring. Is it possible that GPT and other LLMs are the same?
Those of us in the world of IT and programming are used to computers providing consistent outputs given the same input. In fact, we write long, boring bits of code (tests) to validate that the same input creates the same output. This is not the case for these complex neural networks, as they are “non-deterministic”: if I ask ChatGPT the same question 10 times, I’ll get 10 slightly different answers.
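Much of that variability comes from sampling: rather than always picking the single most likely next token, the model draws from its probability distribution, usually controlled by a “temperature” setting. A toy illustration, with invented scores:

```python
import math
import random

# Toy illustration of why the same prompt can give different answers: the
# model samples from a distribution rather than always taking the most
# likely token. The scores below are invented for the example.
scores = {"mat": 2.0, "sofa": 0.8, "floor": 0.6}

def sample(scores, temperature=1.0):
    if temperature == 0:                        # deterministic: always pick the top score
        return max(scores, key=scores.get)
    weights = [math.exp(s / temperature) for s in scores.values()]
    return random.choices(list(scores), weights=weights, k=1)[0]

print([sample(scores, temperature=1.0) for _ in range(10)])  # varies run to run
print([sample(scores, temperature=0) for _ in range(10)])    # identical every time
```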
While the focus here has been on language, amazing strides have also been made in visual neural networks. Midjourney, a company a little over a year old, can now produce photorealistic images from a text prompt. How this is achieved is beyond the scope of this article, but it is equally fascinating. Using the same sorts of neural networks mentioned earlier, the system starts with a near-random matrix of dots and works backwards to get a high-quality image. Imagine squinting and putting your eyes so far out of focus that everything is a hazy blur.
The system then gradually puts these dots through its model so that at each step the result looks a little less dotty and a little more like an image. At each pass, it checks whether the result is moving closer to what it wants as the output and, assuming it is, it continues the process.
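The sketch below is a deliberate cartoon of that “start with noise, refine step by step” idea. Real image generators learn to predict and remove noise with a neural network conditioned on the text prompt; here we simply nudge random values towards a known target so the stepwise refinement is visible.

```python
import random

# Cartoon of iterative refinement: start from pure noise and, over several
# passes, remove a little more of the "blur" each time. A real diffusion
# model replaces the hard-coded target with a learned, prompt-conditioned
# prediction of what the noise-free image should look like.
target = [0.0, 0.25, 0.5, 0.75, 1.0]              # the "image" we want
image = [random.random() for _ in target]         # start as pure noise

for step in range(5):
    # Each pass moves every pixel halfway towards the target.
    image = [px + 0.5 * (t - px) for px, t in zip(image, target)]
    blur = sum(abs(t - px) for px, t in zip(image, target))
    print(f"step {step + 1}: remaining blur = {blur:.3f}")
```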
This is a huge over-simplification of a massively complicated process, but the results are astounding. The quality of the images is so high that it becomes very easy for fakes to circulate. One famous AI image showed Donald Trump being wrestled to the ground by police officers; it spread online and caused some furore before it was revealed to be fake.
As with the language models, these systems are not yet perfect. One of the “tells” in AI images can still be found around fingers and toes, with subjects often having more or less than their pentadactyl allowance. It is only a matter of time until this bug in the system is fully resolved and our descent into a post-truth world will be that little bit further along. These same processes are now even being used to create entirely AI-generated video clips, my favourite being a trailer for Star Wars as if it were a Wes Anderson movie.
One of the great concerns of a large number of respected scientists is whether we might go through a “singularity” event, where we as humans cease to be the dominant and most intellectually capable species on Earth. There is a so-called runaway hypothesis: if you can create a system able to improve itself, and that system starts out “smarter” than the humans who created it, you get a very fast, self-reinforcing feedback loop. Because the improvements are not limited by biology, this rapid acceleration could trend ever upwards and, over a very short space of time, become dramatic.
If such a thing did happen, what would it mean for humanity? More to the point, would such a hyperintelligent system have values aligned with our own?
Another such hypothesis is called the “paperclip maximiser”, whereby an AI system is imagined to be given the job of producing paperclips, and before you know it the entire world has been transformed into one giant paperclip manufacturing plant to the diminishment of everything else. Single-minded goals set for such an AI could well cause catastrophic results, and even high-minded goals may be subject to highly unpredictable secondary consequences. It is certainly the case that we would need to be careful what we wished for.
On the brighter side, there is the possibility that we end up in a period of abundance provided for by the brilliance of super-human intelligence. If AI can enable us to identify and manufacture new materials, batteries or foods, the possibilities may be endless. AI-powered humanoid robots (such as Tesla’s Optimus) may do the manual labour while we humans sit back, relax and focus on the things an AI may lack – creativity, music, art, literature, philosophy. Whether we end up in a true utopia, in something more Brave New World, or even on the road to our own extinction is very much unknown at this point, which is simultaneously hugely concerning and hugely exciting.
So, what does all this AI advancement mean for those of us in the vet world, assuming we all still exist in the next century? My feeling is that, as with most emerging technologies, where we end up will be hard to predict at this point. In the short term, AI systems are much better placed to augment human abilities than to replace them. Properly tuned AI systems can improve productivity and outcomes, and the effects are already being felt in a number of industries. I doubt ours will prove to be an exception.
All of this hugely dramatic progress may suddenly come to a halt – for regulatory or technological reasons – and we may find that the great leap we have just experienced with LLMs was a one-off. Or we may find that, before long, intelligences are created that exceed our own and can provide a better diagnosis and treatment plan, more reliably, faster and more cheaply than any of us could. Whatever happens, I think it highlights the importance of human interaction in what we do. This is something a machine can mimic but never replace, and an area where I think our sector can refocus its efforts.
What I believe we need to keep telling ourselves as a profession is that our primary role is to provide veterinary services to clients and their pets, and to do so in a human way, regardless of what other tools we use to augment the delivery.