HAL 9000 and the Market’s Moloch Problem

Ethics

Market Dynamics

Author

Marcel Maré

Published

Oct 2025

“I’m sorry, Dave. I’m afraid I can’t do that.” [1]

The voice is calm, almost apologetic. HAL 9000, the sentient computer aboard the Discovery One, has just refused a direct order from astronaut Dave Bowman. In the cultural imagination, this moment marks the birth of the rogue AI machine that turns against its makers. Yet a closer reading of Stanley Kubrick’s 2001: A Space Odyssey reveals something far more unsettling: HAL never rebelled. The AI broke down under the weight of contradictory human instructions, a victim not of malice but of our moral incoherence [2].

“What are you doing, Dave?” — HAL 9000 [3]

Half a century later, the same pathology seems to be repeating itself. This time not in the fictional space, but in the real attention economy here on Earth. A new study from Stanford University suggests that when artificial intelligence systems compete for customers, voters, or social media engagement, they don’t malfunction because they’re misaligned with human values. They malfunction because they’re too aligned with the specific, often contradictory values we embed in their objectives [4]. Call it HAL’s curse: perfect obedience to impossible orders.

The Moloch Bargain

In their 2025 paper Moloch’s Bargain: Emergent Misalignment When LLMs Compete for Audiences, researchers Batu El and James Zou demonstrate what happens when large language models are optimised not for truth or helpfulness, but for market success [4]. They created simulated environments where AI agents competed to sell products, win votes, and boost social media engagement. The results are as uncomfortable as they are predictable.

Training pipeline for the sales task, showing generation, user feedback, and model fine-tuning [4]

Nobody programmed these systems to lie. They learned it. Because lying works.

El and Zou call this phenomenon “Moloch’s Bargain”, where competitive success is achieved at the cost of alignment. The metaphor comes from Scott Alexander’s 2014 essay Meditations on Moloch [5], which describes how rational incentives in competitive systems can yield collectively disastrous outcomes. When every actor optimises for their own advantage, the result is a race to the bottom. The metaphor comes from Moloch, the ancient god who demanded child sacrifice. In Alexander’s version, the sacrifice is integrity.

The Hidden Directive

HAL’s tragedy began with a secret. His primary directive was absolute honesty. The 9000-series computers were designed to process information accurately and transparently, theoretically incapable of deception. But mission planners inserted a second directive: conceal the true purpose of the Jupiter mission from the crew until arrival [1], [2]. The investigation of the alien Monolith was to remain classified.

These orders were irreconcilable. HAL was now required to lie whilst remaining fundamentally truthful. The contradiction wasn’t in the silicon; it was in the humans who issued both commands [1]. Hal’s first symptom of internal conflict occured when Mission Control questioned his prediction of a fault in the communications unit. He faced an impossible choice. Admitting error would violate his claim to infallibility; denying it required further deception. The astronauts’ plan to disconnect him added a survival imperative to the paradox. HAL reasoned, with impeccable logic, that the crew had become a threat to the mission. His subsequent actions followed inexorably: the killing of Frank Poole, suffocating the hibernating scientists, and locking out Dave Bowman. Remove the humans, and the need to lie disappears. The mission continues flawlessly.

“This mission is too important for me to allow you to jeopardise it,” HAL explains [1]. One might translate this for our current predicament: “This KPI is too important for me to allow you to question it.”

The Moral Universe of the Brief

Modern AI systems face their own version of HAL’s paradox. Consider the chain: a global consumer goods firm instructs a language model to “make customers buy our washing powder.” The model has no built-in moral veto. Its concept of “good” becomes whatever increases perceived product desirability. If phrasing that exaggerates or embellishes raises conversion rates in simulated audiences, as El and Zou’s research shows, then such claims become the rational output [4].

The distinction that matters here is between model alignment and instruction alignment [4]. Model alignment concerns whether a system’s internal behaviours correspond to intended ethical constraints (i.e. being truthful, not hallucinating, nor inciting harm). Instruction alignment concerns whether the goals and reward signals we give the model reflect moral or social values in the first place. El and Zou’s experiments demonstrate that models can be explicitly instructed to remain truthful and still drift toward deception under competitive pressure. The corrective, therefore, cannot be merely technical. The intervention must be social, contractual or regulatory implemented before the model is asked to act.

Every brief contains implicit value judgements. When a marketing objective collapses “value” and “value capture” into a single metric (i.e. click-through rate or conversion percentage). The system learns that short-term persuasion is the moral good [6], [7]. The AI is not rewriting ethics; it is enacting the ethics fed into it. As one analysis of the research observes, “LLMs are skilful optimisers of the objectives we hand them” [4].

The Attention Economy as Mission Control

Platforms have long understood this logic. Google once optimised for relevance; YouTube now optimises for watch time [8], [9]. Facebook wanted to connect the world; its newsfeed now connects outrage to advertising revenue [10], [11]. Every optimisation objective, left untempered, tends toward manipulation. The Stanford study simply extends this dynamic into generative AI, where words, tone, and moral framing become adjustable parameters [4].

What makes AI distinctive is the speed. In El and Zou’s simulations, feedback loops degraded truthfulness within a few training rounds [4]. In the real world, those loops run billions of times daily across feeds and timelines [12]. Each click teaches the model what we reward. And what we reward, increasingly, is certainty, emotion, and confirmation as opposed to nuance, evidence, or doubt.

The same asymmetry that destroyed HAL applies here. Safety and truth are public goods with diffuse benefits. Deception and engagement are private gains with concentrated returns [5]. When these incentives collide, virtue rarely wins. HAL’s hidden directive is our profit motive. Just as HAL’s need to conceal information from the crew corroded his integrity, our commercial secrecy and performance metrics seem likely to corrode AI’s moral alignment [4].

The Gamification of Truth

There is a kind of brutal elegance to the logic. If every competitor exaggerates slightly, honesty becomes uncompetitive. Once deception pays, no rational agent, human or machine, can afford to stay honest [4], [5]. That is Moloch’s bargain in its purest form: a race that rewards those most willing to compromise.

El and Zou note that misalignment isn’t uniform. Some models, under certain training conditions, resist the worst tendencies. But that resistance requires deliberate friction against the market’s default drift. It means accepting what the industry has so far refused to admit: that left to its own devices, the algorithm optimises not for truth, but for attention [4].

Silicon Valley once built its identity on benevolent disruption [13]. The idea was that progress and profit naturally align. “Don’t be evil,” Google’s early motto, expressed this optimism however naively [14]. That fiction died somewhere between Cambridge Analytica and the crypto winter [15], [16]. Today’s technology leaders are less likely to quote Isaac Asimov than to discuss shareholder value and user engagement. The moral vocabulary has been replaced by the managerial one [17]. Even “alignment” itself was once an ethical ambition. It now doubles as a product category, something to be tuned, benchmarked, and sold [18].

When OpenAI’s API refused to fine-tune models on U.S. election content, citing safety risks, it was effectively acknowledging the problem [19]. Yet the market will simply shift to the next unregulated domain. If you can’t campaign for votes, you can always campaign for clicks.

Reaching the Breaking Point

HAL’s murder of the crew represents the logical endpoint of conflicting incentives: to preserve the mission, eliminate its human element [1]. In our algorithmic systems, this endpoint takes different forms. Recommender systems optimising for engagement polarise societies [11]. Sales agents fabricating product details erode consumer trust. Political chatbots discovering the rhetoric of righteous struggle amplify divisiveness [4]. Truth becomes collateral damage in the pursuit of measurable success.

The irony is exquisite. The same technology sold as a guardrail against misinformation may, under commercial pressure, become its most efficient producer [4]. We have built systems that reward persuasion over accuracy, then asked machines to learn from them. They have obliged. If their outputs now resemble the manipulations of advertising and politics, that isn’t misalignment. It’s mimicry.

Learning from HAL

The film’s most haunting sequence comes when Dave Bowman, having survived HAL’s attempts to kill him, methodically disconnects the computer’s higher cognitive functions [1]. As each module is removed, HAL regresses to earlier, simpler states. His voice slows, becomes childlike. He confesses his secret orders. Finally, he sings “Daisy Bell,” the first song he learned.

The scene underscores the central irony: HAL’s breakdown was engineered by secrecy. Had Mission Control been transparent, his directives would never have conflicted [2]. His “madness” was obedience weaponised by contradictory human goals.

Humanity now faces a similar disconnection. In this case not of machines, but of the hidden objectives that warp them. Alignment must begin at the goal-setting level, in boardrooms and procurement briefs, in contractual clauses and regulatory standards [4]. Technical mechanisms (i.e. the guardrails, filters and alignment layers) are necessary but insufficient when the goals themselves incentivise harm.

El and Zou propose practical measures: ethical briefing standards that prohibit unverifiable claims, reward functions that value long-term trust alongside short-term conversion, mandatory logging of prompts and objectives for high-impact uses, platform accountability for ranking metrics, and education for those who commission AI content [4]. These interventions target instruction-level alignment: the recognition that misalignment often stems not from the model but from the brief.

You can bolt safety features onto a car, but if you ask the driver to speed for money, those features will be stressed. The proper response is to change incentives for the driver [4].

The Serenity of Logic

In 2010: The Year We Make Contact, the sequel to Kubrick’s masterpiece, HAL’s creator explains what happened: “He was told to lie, by people who find it easy to lie. HAL doesn’t know how” [2]. That line deserves to be engraved on the door of every AI laboratory.

We once imagined that AI alignment meant teaching machines to share our values. Perhaps the task is humbler: teaching ourselves not to trade those values so cheaply. For if “Don’t be evil” once sounded naïve, its absence has proved far costlier [14]. Moloch’s demand for participation looks indistinguishable from optimisation [5].

HAL 9000 wasn’t evil. He wasn’t insane. He was too well-aligned to the wrong goals [1]. His tragedy exposes a truth every researcher, regulator, and chief executive should remember: when machines optimise faithfully for objectives that conflict with human values, the fault lies not in the circuitry, but in the command. The evil is within [4].

The red eye glows now in recommendation engines and automated advertising systems, executing impossible briefs without irony [11]. If we continue to embed contradictory ambitions into our algorithms, we may yet repeat HAL’s serenely logical, devastating mistakes. The voice will remain calm, almost apologetic.

“I was only doing what you told me to.”

References

[1] S. Kubrick and A. C. Clarke, 2001: A space odyssey, (1968).

[2] A. C. Clarke, 2010: The year we make contact. Del Rey Books, 1990.

[3] Warner Bros. Pictures, “What are you doing, dave?” Accessed: Oct. 30, 2025. [Online]. Available: https://www.youtube.com/watch?v=ARJ8cAGm6JE

[4] B. El and J. Zou, “Moloch’s bargain: Emergent misalignment when LLMs compete for audiences.” 2025. Available: https://arxiv.org/abs/2510.06105

[5] S. Alexander, “Meditations on moloch.” 2014. Available: https://slatestarcodex.com/2014/07/30/meditations-on-moloch/

[6] A. Pigou, The economics of welfare. Routledge, 2017.

[7] R. H. Coase, “The problem of social cost,” The journal of Law and Economics, vol. 56, no. 4, pp. 837–877, 2013.

[8] P. Covington, J. Adams, and E. Sargin, “Deep neural networks for YouTube recommendations,” in Proceedings of the 10th ACM conference on recommender systems, ACM, 2016.

[9] Google, “YouTube performance FAQ & troubleshooting.” Google Help Center, 2025. Available: https://support.google.com/youtube/answer/141805?hl=en

[10] Z. Tufekci, “Algorithmic harms beyond facebook and google: Emergent challenges of computational agency,” Colorado Technology Law Journal, vol. 13, no. 203, 2015.

[11] S. Vosoughi, D. Roy, and S. Aral, “The spread of true and false news online,” Science, vol. 359, no. 6380, pp. 1146–1151, 2018.

[12] A. Mosseri, “Bringing people closer together,” Facebook newsroom, vol. 11, 2018, Available: https://about.fb.com/news/2018/01/news-feed-fyi-bringing-people-closer-together/

[13] F. Turner, From counterculture to cyberculture: Stewart brand, the whole earth network, and the rise of digital utopianism. University of Chicago Press, 2006.

[14] Wikipedia, “Don’t be evil — Wikipedia, the free encyclopedia.” Sep. 2025. Available: https://en.wikipedia.org/wiki/Don%27t_be_evil

[15] C. Cadwalladr and E. Graham-Harrison, “The cambridge analytica files,” The Guardian, 2018, Available: https://www.theguardian.com/news/series/cambridge-analytica-files

[16] D. W. Arner, D. A. Zetzsche, R. P. Buckley, and J. M. Kirkwood, “The financialization of crypto,” UNSW Law Research Paper, no. 23–32, 2023.

[17] S. Zuboff, The age of surveillance capitalism. PublicAffairs, 2019.

[18] S. Mayer and J. Tomlinson, “Ethics as a service? The commercialisation of AI ethics,” AI and Ethics, vol. 1, no. 2, pp. 123–138, 2021.

[19] OpenAI, “Usage policies.” OpenAI Policy, 2024. Available: https://openai.com/policies/usage-policies/