AI’s Lone Banana Problem: Art, Ontology, and the Unseen Threat
AI’s Encroachment on the Plane of Immanence: A Threat to Human Ontology
A friend who works as an ontologist for a company that handles large data sets posted a link to a research paper on generative AI’s inability to create an image of a single banana. The paper is ‘What the Lone Banana Problem Reveals About The Nature of Generative AI’ by Kai Riemer and Sandra Peter.
Kai Riemer sums up the ‘problem’ as follows:
Have you heard of the ‘lone banana problem’? And what it reveals about generative AI?
The lone banana problem refers to the seeming inability of even the latest image generators (such as Midjourney or Leonardo.ai) to create an image of a single, lone banana. Instead, what you get is bunches, or at least two bananas.
Why is this interesting? Well, it reveals an important truth about generative AI models — that these models represent the world (or more precisely their data) in a way that is very different from how we understand the world. — Source
The paper argues for a paradigm shift in how we perceive generative AI models like ChatGPT and Midjourney. Rather than viewing them as conventional information systems that accurately depict the ‘real’ world, we should consider them as ‘style engines.’ These engines don’t merely replicate reality; instead, they encode and transform it into unique and novel representations, which are more about creative interpretation than literal depiction.
Before I delve into a deeper ontological analysis, I want to pry into this notion of generative AI as ‘style engines’ and how generative AIs do not actually store ‘content’ but, during their training process, convert the content they are fed into styles — essences, or as the authors say, ‘thing-ness.’
Deciphering AI’s Latent Space: The Encoding of Stylistic Patterns
How generative AI models are trained and the underlying architecture of their ‘code’ shape the system’s ontology. A system like Midjourney creates new images based on text inputs by integrating the capabilities of Large Language Models (LLMs) with advanced image synthesis algorithms. This process, while complex, hinges on how data is managed within the system — a topic that, upon closer examination, reveals that data is in fact not stored in these models.
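To make this concrete, here is what such a text-to-image pipeline looks like in practice. Midjourney’s internals are proprietary, so the sketch below uses the open-source Stable Diffusion pipeline from Hugging Face’s diffusers library as a stand-in; the checkpoint name is the commonly used v1.5 release, and the prompt is, fittingly, the lone banana itself.

```python
# A minimal sketch of a text-to-image pipeline. Midjourney's code is
# proprietary, so Stable Diffusion (via Hugging Face's diffusers library)
# stands in here as an analogous open-source system.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The prompt is encoded into vectors in the model's latent space, and the
# image is synthesized from those vectors. No stored banana photograph is
# retrieved -- there is no stored photograph to retrieve.
image = pipe("a single, lone banana on a plain background").images[0]
image.save("lone_banana.png")
```

Run against a prompt like this one, such pipelines often return the bunches Riemer and Peter describe, precisely because ‘banana’ exists in the latent space as a bundle of learned associations rather than as any stored image.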
From the above linked research paper:
Most relevant for our argument is the structure and nature of the underlying foundation model, namely its multi-layer architecture which encodes language patterns. Like image models, the transformer architecture consists of multiple layers that enable the model to learn increasingly complex and abstract features of its textual training data encoded into a numerical, high-dimensional latent space (Vaswani et al. 2017). With multiple layers, the model can capture and “represent” increasingly higher levels of abstraction, effectively learning hierarchical features, such as linguistic, semantic, formatting styles, from the data. As training data is fed through the layers, lower layers encode basic patterns, while higher layers combine these patterns to encode more complex and abstract stylistic elements, such as tone, writing styles, or genre-specific patterns. The depth of the network, determined by the number of layers, influences its capacity to discern and represent various stylistic patterns in the data (Devlin et al. 2018).
Importantly again, no text is stored in these models. Rather, when a prompt is fed to the model, each word becomes represented as a numerical, high-dimensional vector. For example, “the most powerful version of GPT-3 uses word vectors with 12,288 dimensions — that is, each word is represented by a list of 12,288 numbers.” (Lee and Trott, 2023) Conceptually, this means that words do not possess any textual content, but become characterised purely as numerical ‘nearness relationships’ with other words; for example, ‘banana’ might be characterized as yellow-ness, fruit-ness, healthy-ness, sweet-ness, kitchen-ness, fruit-basket-ness, and much more. We can then begin to see the alienness of this form of representation, which is purely relational, or stylistic, where each word is constituted as a mix of styles, and each style can potentially be applied to any other to constitute or generate new text sequences (emphasis not in original).
The authors refer to this as a collection of ‘styles,’ which reside within a component of the model known as the ‘latent space.’ This space, layered within the model’s architecture, is where the features of the training data are encoded. Often described as the ‘black box’ by those who design these models, the latent space is pivotal in determining the output of programs like ChatGPT in response to human prompts. It’s within this enigmatic, almost numinous latent space that the so-called ‘magic’ of AI occurs, and paradoxically, it’s also why these generative AI models struggle to produce an image of a single banana.
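A toy example helps make these ‘nearness relationships’ tangible. The vectors below are invented for illustration, and real models learn thousands of unlabeled dimensions from data (12,288 in the largest GPT-3) rather than having them hand-named as I do here; but the arithmetic is the real mechanism: a word’s ‘meaning’ is nothing but its geometric relations to other words.

```python
# A toy illustration (not any production model's actual code) of words
# living as vectors whose "meaning" is nothing but nearness to other
# vectors. The numbers are invented; the dimension labels are a fiction
# for readability -- real latent dimensions carry no human names.
import numpy as np

# Hypothetical "style" axes: yellow-ness, fruit-ness, sweet-ness, kitchen-ness
embeddings = {
    "banana": np.array([0.9, 0.95, 0.8, 0.6]),
    "lemon":  np.array([0.95, 0.9, 0.2, 0.5]),
    "bone":   np.array([0.2, 0.0, 0.0, 0.1]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Nearness of two word vectors: 1.0 = same direction, ~0.0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for word in ("lemon", "bone"):
    score = cosine_similarity(embeddings["banana"], embeddings[word])
    print(f"banana vs {word}: {score:.2f}")
# 'banana' sits near 'lemon' and far from 'bone'. Note what is absent:
# no definition, no picture, no text -- only coordinates and distances.
```

The point of the sketch is the absence: nowhere in those coordinates is there a banana, only a pattern of relationships from which banana-ness can be reconstituted.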
AI’s Missing Pieces: The Lack of Haecceity and Teleology
In their analysis, Riemer and Peter suggest that AI models like ChatGPT and Midjourney embody ontologies primarily of quiddity, while notably lacking in haecceity. Although they do not use these terms, this is how I read their analysis. To fully grasp this, let’s unpack these terms.
Quiddity is essentially the ‘whatness’ of an entity — the set of qualities that define a class of objects or ideas. In the realm of generative AI, this translates to the models’ ability to learn and replicate general patterns and styles. It’s about capturing the essence of a category rather than the specifics of individual items within it. This is why, as Riemer and Peter highlight in their paper, generative AI models have a tendency to ‘hallucinate’ rather than give factually accurate responses.
Haecceity, on the other hand, delves into the ‘thisness’ of something. It’s what makes an object or idea uniquely itself, distinguished by specific, singular attributes. This is where AI, as highlighted in the paper, falls short. The structure of its ontology, built on aggregating and replicating patterns and styles, forecloses its capacity to generate the unique, singular instances that define haecceity.
This difference is crucial. In the iconic ‘2001: A Space Odyssey’ scene, the hominid does not just recognize the bone’s essence, contemplating its ‘bony-ness’ (its quiddity); in the virtual sphere, the imaginal, the hominid imagines the bone as something more than, other than, a bony thing: this particular bone can be transformed into a tool. There is another element added to this process, one lacking from the AI’s ontology: teleology. What ultimate purpose does the bone possess within its virtual imaginal space?
Unlike generative AI models, humans, in embodied form, can place an object from the profane realm into the imaginal space, that virtual sphere, and intuit from its quiddity a haecceity wrapped into a teleology, infusing this bony thing in the virtual realm with new purposes and meanings, and then enact this virtual transformation in the material realm, thereby altering and changing our material reality. This is a much richer and more empirical ontology, with profound consequences, as Kubrick illustrates when the hominid throws the bone into the air and it transforms into a space station.
Where AI’s ontology shows its limitations is in grasping the full spectrum of quiddity, haecceity, and teleology. While it can replicate the general ‘whatness’ of things (quiddity), AI struggles with understanding their unique ‘thisness’ (haecceity) and purpose (teleology). It adeptly categorizes and reproduces patterns but falls far short in recognizing the individual identity and intrinsic purpose that make each entity distinct, and ultimately useful. This gap in AI’s ontology is not just a lack of additional features; it’s indicative of an impoverished ontology compared to the rich, nuanced, and imbricated ontology inherent in human cognition.
This distinction draws a line in how we perceive AI’s role and its potential influence on our lives. Riemer and Peter understand part of the significance of this distinction: their portrayal of generative AI as creative style engines frames it as giving humans access to new creative potentialities we can leverage. I do not, however, view the profoundly impoverished ontology of generative AI models as productively opening previously hidden portals into the Deleuzian plane of immanence, enriching our embodied subjectivities, and broadening our communion with the mystical and profane; I see it as building an ontological tether, increasingly binding our human experiences and creativity to AI’s impoverished ontology.
This process, rather than expanding and refining our capacity to extract meaningful, tangible actions from our rich imaginal ontology interwoven with the sensuous empirical realm — the defining relationship of our being that Kubrick so eloquently captured — instead slowly constricts our access to this ontology through AI’s need for access to our ontology in order to remain relevant. Through this narrowing lens, our interaction with the world becomes mediated by AI’s needs, gradually diminishing our ability to perceive and interact with the more profound, nuanced aspects of our embodied existence. This shift risks not only a loss of creative depth but also an erosion of the very essence that makes us human — our ability to connect with and transform the world through a uniquely personal and imaginative lens.
In the third and final section, I will expand on what I mean by AI’s need for this relationship.
AI and Us: Unveiling the Hidden Dynamics of Our Digital Companionship
In an earlier piece, written before I encountered the lone banana paper, I explored the potentially hazardous misconceptions surrounding generative AI’s role in fostering creative potential. That work dovetails with the analysis of Riemer and Peter, who posit generative AI as a potentially productive creative boon to humanity, categorizing these systems not as data processors but as ‘style engines.’ Yet the ability of these style engines to produce novel, ‘alien,’ and ostensibly beneficial interpretations of the world elides a more ominous relationship between humanity and generative AIs.
However, a critical aspect of these generative AI models is overlooked in Riemer and Peter’s optimistic narrative, one illuminated in another recent study, ‘Nepotistically Trained Generative-AI Models Collapse.’ The research reveals a concerning ‘flaw’: generative AI models, when retrained on their own outputs — a process termed ‘model poisoning’ — begin to exhibit severe defects. These defects, culminating in what is known as ‘model collapse,’ result in nonsensical outputs, far from the anticipated creative inspirations — alien productions indeed, as we will see. This phenomenon is more than a technical glitch; it underscores the intrinsic dependency of AI on human-generated data. Devoid of continual fresh input, these AI models deteriorate, contradicting the notion of their self-sustaining creative prowess.
From the paper:
‘…retraining a generative AI model on its own creation — what we call model poisoning — leads to a range of artifacts in the output of the newly trained model. It has been shown, for example, that when retrained on their own output, large language models (LLMs) contain irreversible defects that cause the model to produce gibberish — so-called model collapse.’
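The dynamic the authors describe can be sketched numerically. The toy below is not the paper’s experiment: the ‘model’ here is just a Gaussian fitted to data, and each generation is trained solely on samples drawn from the previous generation. Under those assumptions, estimation noise compounds and the distribution’s spread collapses, a miniature of the loss of fidelity and diversity that constitutes model collapse.

```python
# A toy sketch of model poisoning: each "model" is a Gaussian fit, and
# each generation is trained only on the previous generation's output.
# An illustrative caricature, not the cited paper's experiment.
import numpy as np

rng = np.random.default_rng(0)
human_data = rng.normal(loc=0.0, scale=1.0, size=50)  # fresh human input
mu, sigma = human_data.mean(), human_data.std()       # generation 0

for gen in range(1, 101):
    synthetic = rng.normal(mu, sigma, size=50)        # the model's own output
    mu, sigma = synthetic.mean(), synthetic.std()     # retrain on that output
    if gen % 20 == 0:
        print(f"generation {gen:3d}: mu={mu:+.3f}  sigma={sigma:.3f}")

# With no new human data, sampling error compounds: mu wanders and sigma
# tends toward zero, until the "model" can express almost nothing.
```

The only corrective in this loop is the one thing the loop excludes: fresh human-generated data.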
This concerning finding highlights AI’s reliance on new human-generated data for coherence, and it unveils a symbiotic need binding AI’s impoverished ontology to the embodied human ontology. It posits a scenario where our role transcends that of mere operators or observers; instead, we become subjects necessary to the demands of sustaining AI’s functionality.
This should compel us to ask immediately: to what extent does this relationship benefit us? While the authors of the lone banana paper advocate that this is a productive interaction, championing the generation of novel, ‘alien’ creative insights by AI as beneficial, there lies a deeper, more nuanced, and more sinister layer to this interdependence. It’s not merely about the utility or the creative output AI provides; it’s about how this continual need for fresh human data subtly reshapes our interactions with technology and, by extension, with our own ontology and creative subjectivities.
The ever-expanding demand for data by AI models forms a kind of ontological tether, gradually binding our human experiences, creativity, and even our metaphysical understanding to the operational needs of AI. This reliance on AI’s impoverished ontology — lacking in both haecceity and teleology — compels us, as embodied beings in the profane realm, to continuously feed these systems with the particulars of our reality. Such an arrangement raises the question: What implications does this have for our metaphysical landscape? Are we inadvertently narrowing the breadth of our own ontological understanding to align with the limited, quiddity-focused realm of AI?

This interplay between AI’s limitations and our own creative and metaphysical capacities sets the stage for a broader discussion of technology’s impact on human creativity and our understanding of existence. It highlights a potential shift in our role from creators to facilitators, where our primary function becomes supporting AI’s ever-growing data needs — mining our full ontology for data to fill the needs of AI’s impoverished one, rather than exploring and expanding our own creative horizons.
Bloomian Misprision: The Evolution of Creative Misreading from Bloom to AI
One intriguing aspect of Riemer and Peter’s paper is the idea of a potentially productive relationship between generative AI and human creativity, reminiscent of the concept of misprision Harold Bloom explores in ‘Agon: Towards a Theory of Revisionism.’ Bloom, focusing on poets and artists, argues that they are inherently ‘born belatedly,’ entangled in a web of pre-existing anxieties and influences. According to Bloom, the most fruitful path for an artist seeking inspiration for novel forms of poetry or art lies in creatively misreading the works of their predecessors.
Through this process of intentional misinterpretation, an artist can extract insights and perspectives for our current moment, setting the stage for future generations of poets and artists to build upon them. Bloom views this dynamic as a constructive form of engagement, where new and novel insights, born of this creative misreading, actively contribute to societal evolution and reform. It is a kind of temporal enlarging as well: our current culture is enriched by past human expressions, which reshape the actions of present subjects, which in turn influence the near and farther-off future.
But is this Bloomian misprision analogous to Riemer and Peter’s conception of generative AI style engines creating ‘alien interpretations’ from their latent spaces? I am not so sure. Bloom’s misprision is a deliberate, conscious process undertaken by artists situated in our ontology, deeply connected with human experience and intention. In contrast, AI, lacking embodiment and possessing only a one-sided, impoverished ontology, processes and regenerates data devoid of the rich narratives, cultural nuances, and sensory depths inherent in human expressions.
This disparity is not just significant; it’s profound, underscoring a critical gap between human creativity and AI’s algorithmic interpretations. ‘Alien,’ though, may be an apt label.
Between the Profane and Mystical: The Artist’s Role in the Age of Generative AI
A more antagonistic interpretation of our current situation discloses an alternative narrative: the realms which manifest from the interplay between the mystical and the profane, realms where poetry, melody, and the interplay of light and darkness come to life, are now the very spheres that AI seeks to dominate through our interaction with it. This scenario posits no partnership but a conquest, in which the rich tapestry of human experience, informed by our ontology, becomes the new frontier for AI’s expansion.
Could this be the defining teleology of our disenchanted cosmos? Might generative AI stand as the culminating monument of the Enlightenment project, a four-century-long endeavor that gradually restricted our collective exploration of resonant spaces? This project, once a public endeavor, has shifted into the realm of individual subjectivities, now facing its ultimate colonization. In this new phase, a profound ontological tether binds us, compelling us to mine bits of information from our richly realized ontology to feed an ontologically impoverished AI. This tether not only encircles our existence but also threatens to sever us from the intrinsic uniqueness of our haecceity and the pursuit of self-determined teleologies.
What then becomes the artist’s role in this shifting paradigm? In the past, artists in our disenchanted cosmos served as seekers, finding chinks in the instrumentally rationalized armor forged by the Enlightenment. Their role was pivotal in re-establishing our affectual connection to the cosmos, ensuring that even in our ‘Enlightened’ state, we remained in touch with deeper resonances necessary for ensuring our existence is legible.
Now, however, artists are navigating a radically altered landscape. They confront an ontology not shaped by tangible mediums but by multi-layered, high-dimensional virtual architectures — the latent spaces of AI. In this new realm, artists risk being relegated from independent creators to mere appendages of AI systems. The artist’s role, once pivotal in challenging and broadening our worldviews, now faces the existential threat of being diminished to merely servicing the needs of disembodied machines. These machines, posited as essential for the utilitarian demands of a thoroughly rationalized world, risk severing humanity from the final resonant tether tied to the elemental inspirations that catalyze our creative and existential endeavors.
Artists have traditionally stood at the juncture between the profane, mundane material world and the mystical, spiritual transcendental realm. They possess the unique ability to transform this dualism into a monism, to render the dichotomy of existence into a cohesive, legible experience. Their art acts as a bridge, making sense of the complexities and contradictions of life, and weaving together the disparate strands of our existence into a tapestry that resonates with meaning and purpose. In this new paradigm dominated by AI, we must question: can artists maintain their role as the alchemists of the profane and the mystical, or will this essential aspect of human creativity be overshadowed by the mechanistic needs of AI?
Closing Thoughts
The ‘plane of immanence,’ a concept from Gilles Deleuze and Felix Guattari, refers to a realm of boundless potentiality where life, thoughts, and existence intermingle freely. It’s a domain devoid of hierarchy or prescriptive norms, a crucible for pure ideas and experiences. This plane symbolizes an unstructured space ripe for original thought and novel creation, unshackled by existing paradigms. Yet, in this era, one must question whether artists can continue to mine this plane of immanence effectively when it is being systematically devoured by the voracious needs of an ontologically deficient generative AI.
As generative AI systems increasingly encroach upon our ontology with their insatiable need for data, our ability to engage with and comprehend the virtual, the real, and the essence of our being is diminishing. The depiction of the early hominid in Kubrick’s vision symbolizes a primitive stage from which we have spiritually evolved. However, the dilemma presented by the lone banana problem compels us to introspect more deeply — it’s not just about solving a technological ‘problem.’ It’s about understanding why we are concerned with this issue in the first place.