22 February 2024. AI | Systems
The uncanny world of OpenAI’s Sora video product // Systems and boundaries
Welcome to Just Two Things, which I try to publish three days a week. Some links may also appear on my blog from time to time. Links to the main articles are in cross-heads as well as the story. A reminder that if you don’t see Just Two Things in your inbox, it might have been routed to your spam filter. Comments are open. And — have a good weekend.
1: The uncanny world of OpenAI’s Sora video product
Suddenly—so unlike this time last year—my feed is full of smart articles critiquing the evolution of AI in ways that make sense. In this piece I’m just going to try to pull out some strands from a couple of these, talking about the latest OpenAI product, a text-to-video application that the company has called Sora.
Sora, by the way, is a Japanese word meaning ‘sky’, but we’ll come back to that.
Generally the responses to the short demo videos that the company has produced have been gushing. But for Edward Zitron, in his newsletter Subprime Intelligence, the experience of Sora was more ‘uncanny’:
These videos — which are usually no more than 60 seconds long — can at times seem impressive, until you notice a little detail that breaks the entire facade, like in this video where a cat wakes up its owner, but the owner's arm appears to be part of the cushion and the cat's paw explodes out of its arm like an amoeba.
(Source: OpenAI)
There’s a moment in this one, of a Japanese woman walking through the streets of Tokyo, where her legs swap sides. You don’t exactly notice this at first—that’s not really what we do when we watch things—but even when you don’t notice it specifically, you know there’s something that’s not quite right:
Sora generates responses based on the data that it has been trained upon, which results in content that is reality-adjacent, but not actually realistic. This is why, despite shoveling billions of dollars and likely petabytes of data into their models, generative AI models still fail to get the basic details of images right, like fingers or eyes, or tools.
Everyone who comes to look at AI with their critical faculties still at work comes back to this central issue sooner or later. These glitches are not something that will get better, because AI doesn’t actually know anything.
Despite what fantasists may tell you, these are not "kinks" to work out of artificial intelligence models — these are the hard limits, the restraints that come when you try to mimic knowledge with mathematics.
In text, it is harder to spot an error (I’m not going to call it a “hallucination”) because it still looks like text. You have to process it to check for the falsehoods. (Except when ChatGPT simply goes off the rails, of course.) But when we use our visual senses, we are looking for errors all the time, maybe for evolutionary reasons. The thing that didn’t look right might be the thing that was going to kill us. We learn a lot and we know a lot about what our everyday world looks like.
I believe artificial intelligence companies deeply underestimate how perfect the things around us are, and how deeply we base our understanding and acceptance of the world on knowledge and context. People generally have four fingers and a thumb on each hand, hammers have a handle made of wood and a head made of metal, and monkeys have two legs and two arms. (Emphasis in original)
On his newsletter, Matt Alt discusses the question of ‘why Japan?’ (the piece is partly behind a paywall). In the 1980s, Japan was an object of economic terror to America, but obviously things have changed since then.
OpenAI has frothed up some brand comms noise about Sora meaning ‘sky’, and therefore ‘limitless creative potential’, and Alt suggests that the attraction of Japan is that it’s seen as safe.
It’s also exotic, and these images of Japan aren’t really images of Japan, but of American Orientalism:
(T)hose who know Japan will find themselves quickly sliding into the uncanny valley of gibberish street signs and snow on the ground in cherry blossom season and all of the other assorted janky weirdness that comes with generative AI. That weirdness isn’t a bug, but a sort of feature.
But reading this together with Zitron’s piece makes me also wonder if there’s another attraction in using Japan as a location, as OpenAI has with many of its initial videos for Sora. Japan is strange enough that it’s harder for the majority of those watching to know what normal looks like.
AI fabulists (Zitron uses the word ‘fanatics’) have extrapolated from the appearance of these 60-second soundless clips to a world where everybody is their own video director. Because AI, right? Zitron suggests this might be an error.
(W)e are rapidly approaching the top of generative AI's S-curve, where after a period of rapid growth things begin to slow down dramatically. While Sora and other video generators like Pika may seem like the future (and are capable of some impressive magic tricks), they are not particularly adept — much like a lot of generative AI — at performing a particular task.
Zitron points out that there aren’t actually that many credible use cases out there (I pointed to a couple of other people who had reached similar conclusions here last year).
And the numbers aren’t as exciting as they were last year either.
User numbers are stagnating if not declining. Traffic is down 11% from its peak in May last year. A McKinsey survey in August 2023 reported that only a quarter of businesses that had adopted AI said that it contributed more than 5% to profits—the same proportion as in 2022.
The companies that are leading the AI charge are almost certainly losing money—their fundraising is a classic venture capital pitch about not missing out on some unspecified future growth:
Tech's largest cash cow since the cloud computing boom of the 2000s is based on a technology that is impossibly unreliable... What these stories don't seem to discuss are whether these companies are making a profit, likely because generative AI is a deeply unprofitable product, demanding massive amounts of cloud computing power.
Yes, OpenAI made $1.6 billion in revenue last year, but AI companies have much worse margins than most software companies because of the costs of building and maintaining their models. Yet CEO Sam Altman is telling investors that he needs to raise $7 trillion—the size of the combined GDP of several European states—to build chips to bring costs down.
Incidentally, it’s also worth noting a recent paper which claims to have demonstrated mathematically that as Large Language Models start ingesting the output of other Large Language Models in their training sets, their output deteriorates quickly—they start to “choke on their own exhaust”.
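As a rough illustration of the mechanism (my own toy sketch, not anything from the paper itself), the snippet below fits a one-dimensional Gaussian to some data and then repeatedly re-fits it to samples drawn from its own previous fit, standing in for a model that is trained on model-generated output.

```python
# Toy sketch (not from the paper): recursive training on your own output.
# Fit a 1-D Gaussian, sample from the fit, re-fit to those samples, repeat.
import numpy as np

rng = np.random.default_rng(seed=42)

# "Real" data: 200 points from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=200)
mean, std = data.mean(), data.std()
print(f"generation  0: mean={mean:+.3f}, std={std:.3f}")

# Each new generation is trained only on the previous generation's output.
for generation in range(1, 31):
    synthetic = rng.normal(loc=mean, scale=std, size=200)
    mean, std = synthetic.mean(), synthetic.std()
    if generation % 5 == 0:
        print(f"generation {generation:2d}: mean={mean:+.3f}, std={std:.3f}")

# Typically the mean wanders and the estimated spread shrinks over time:
# the model ends up describing its own output rather than the original data.
```

The paper is about much richer models than this, of course; the sketch just shows why re-training on synthetic data tends to compound errors rather than cancel them.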
Zitron’s article has a lot more detail on some of the business relationships around the leading AI companies, but it’s one of his wider points that caught my eye:
This industry is money-hungry, energy-hungry, and compute-hungry, yet it doesn't seem to be doing anything to sustain these otherworldly financial and infrastructural demands, other than the fact that people keep saying that "artificial intelligence is the future."... Public companies are pumping their valuations and executive salaries off the back of artificial intelligence hype, yet nobody is saying the blatantly obvious — that this industry is deeply unprofitable and yet to prove its worth.
2: Systems and boundaries
As readers will know, I like to pick up on models and frameworks which might be helpful, and I liked a recent article by Akanimo Akpan and Colleen Magner of Reos Partners on a mnemonic that helps people apply systems thinking to create social change.
They define systems thinking like this:
A system is the interaction of relationships, interactions, and resources in a defined context. Systems are not merely the sum of their parts; they are the product of the interactions among these parts. Importantly, social systems are not isolated entities; they are interconnected and subjectively constructed, defined by the boundaries we establish to understand and influence them.
Systems work can quickly become too complex to work with effectively, as they say, and the mnemonic is designed to help you walk through the stages in a structured way.
F - frame the challenge as a shared endeavour
E - establish a diverse convening group
N - nudge inner and outer work
C - centre an appreciation of complexity
E - embrace conflict and connection, chaos and order
D - develop innovative solutions that can be tested and scaled.
Wooden fences, Conlig, County Down. Photo: Albert Bridge, via Geograph. CC BY SA 2.0.
Here are some extracts from each of the stages.
F - frame the challenge as a shared endeavour
One of the features of systems is that they are generally too complex for a single stakeholder to make effective change on their own, so the challenge needs to be framed in a way that invites multiple stakeholders in.
Good framing of the challenge lays the ground for collective ownership to emerge and allows for meaningful stakeholder engagement and buy-in. It involves moving away from the individual view of owning the problem to making room for shared ownership.
Well, I don’t like the word “buy-in” much — it’s usually used by people who want other people to see things their way, rather than shared ownership. Commitment might be better.
E - establish a diverse convening group
To think and act in a systems thinking way is to embrace a holistic perspective. This means seeing the problem from all possible angles. Central to achieving this holistic perspective is establishing a diverse convening group that will ensure the process and stakeholders are sufficiently diverse.
N - nudge inner and outer work
This might have been one of those places where it was a bit of a struggle to make the acronym work, but the idea here is that the people who are working on the system are also part of the system. Systems work is “second order” activity, not “first order” work.
This inner dimension is crucial because one's mental models, beliefs, and assumptions influence how one perceives and interacts with the system. In our work, we talk about critical meta-competencies like reflexivity, empathy, flexibility, courage, and curiosity, which are required to shift mental models and beliefs for sustainable progress on an issue.
C - centre an appreciation of complexity
They talk about three types of complexity here: dynamic complexity, where cause and effect are a long way apart in time and space; social complexity, where actors have different perspectives; and generative complexity, where you need to work with emergence to see patterns in the system.
E - embrace conflict and connection, chaos and order
Living with the mess, I suppose you might call this, although they don’t use that phrase. One of the insights here is:
This might lead to a situation where the actors do not like or trust each other. We have found that liking or trusting each other is not a necessary requirement for collaboration.
Often these processes need to move through cycles of conflict and collaboration if they are going to get anywhere.
D - develop innovative solutions that can be tested and scaled
As Dave Snowden points out in his Cynefin model, complex systems don’t have best practice associated with them. Instead, they have emergent practices.
Finding solutions to intractable challenges involves identifying key systemic leverage points where interventions can be most effective, and creatively developing, testing, and adapting new initiatives to address systemic challenges by cross-sector innovation teams.
The authors talk about FENCED as a reminder that good systems work also needs to understand what the boundaries of the system are:
The system's boundaries help in understanding what is included within the system and what lies outside. It provides a structured framework for analysis and synthesis where carefully considered actions are made.
But FENCED is not a sequence; instead it’s described here as a set of practices that co-evolve as you do the work. Maybe they’re best thought of as a set of guide rails. One of the reasons I liked it, just because it made me smile, is that one of the examples Peter Checkland used to use, when he was explaining soft systems thinking, was a system for painting a fence. I suspect that is just a coincidence.
UPDATE: Community assets
I mentioned earlier this week the British government’s plan to get local authorities to sell off their assets to reduce (or perhaps delay) the risks of bankruptcy. I wasn’t a fan, since even councils whose financial position was wrecked by market speculation had been encouraged by the government to speculate as an alternative to a fair financial settlement.
I see that at UK in a Changing Europe, Jack Shaw of the Bennett Institute has assessed this proposal. He’s not a fan either: he has six reasons why it’s a bad idea, from the tactical to the principled.
This is an Update, so I’m not going to go through all six. In brief, though, there are a bunch of market issues: it costs money to sell, it may not be the right time to sell, and you may not get the money when you need it. There’s the obvious economic issue—it’s a one-off solution to a continuing problem. And then there’s the ethical and political point:
(A)ssets may already be fulfilling productive uses, such as supporting the social and human capital of communities. In some cases, they are instrumental to boosting civic pride, which remains one of the missions enshrined in the government’s Levelling Up White Paper... At root, the government’s failure to make additional investment available for authorities means they may have no other option but to sell assets, even if it’s not in their long-term interests.
It’s not as if councils have been sitting on assets and hoarding them. In 2019, the Bureau of Investigative Journalism found that during the 2010s councils had sold off £9 billion of assets.
As a campaigner in Manchester told them,
Once you sell off a building like that, it’s lost to the community forever.
j2t#545
If you are enjoying Just Two Things, please do send it on to a friend or colleague.