The Division of Interpretive Labor
A biased comparison of Claude-style and GPT-style prose
This post contains the full text of an annotated Claude-generated essay. Because the essay is machine-generated, the post was not sent as an email to subscribers, only published here.
But why read a Claude-generated essay at all? The full answer is in the catalog of Claude-prose patterns. The short answer is that this essay tried to avoid overusing cheap rhetoric and failed, so I turned it into an illustration of that rhetoric by annotating it with patterns from the catalog. Patterns are indicated by bolded text followed by [pattern identifier], which is also a link to the respective catalog entry. The two demonstration paragraphs (the “dense” and “textured” pair) are left unmarked. My comments are added as separate paragraphs in italics.
The comparison usually offered between “Claude-style” and “GPT-style” prose treats them as two fixed voices. They are better understood as two defaults – the prose each model produces when a prompt specifies a task but not a register. [SDA][CB] Both models are steerable across almost [RH] the whole range; what differs is where each lands when left alone. [MCS] The object of analysis is therefore not a style but an attractor [CB]: the center of gravity of a distribution. [CR]
That’s a good point. It also means that if you use the current catalog as an instruction to censor the style, the model will simply find new attractors.
Four caveats belong at the front [MS], because they shape everything that follows. [SK]
SK often follows MS. The MS+SK pairing doesn’t occur often in this essay, but elsewhere it is frequent enough to be treated as a compound pattern.
First, the common claims rest on impression rather than measurement [CB]; below I try to specify [MS] what would make them testable. Second, the defaults shift with each model release, so any description of them dates quickly. Third, this essay is written by one of the systems under examination, under instructions that pull it toward the opposite end of its own native range [AHM]; the reader should weigh that. Fourth, and most important [SRC], the analysis is evaluatively neutral. Neither default is the better one. Each is a trade with symmetric costs, and any sentence below that seems to praise one tendency is meant to be readable, with equal force, as naming its price. [MS]
With those in place, the cleanest organizing idea is this. [SRC][SH] The two defaults differ mainly in how they divide interpretive labor between writer and reader. Claude-style prose tends to [RH] do more of that labor in advance; GPT-style prose tends to [RH] leave more of it to the reader. Both framings cut two ways. Doing the work in advance can be considerate or it can be patronizing; leaving it to the reader can be economical and a mark of respect for the reader’s competence, or it can be underspecified and a way of offloading the writer’s job. [MCS] The mechanism by which either tendency operates has a name. [SH]
The mechanism: metadiscourse
Metadiscourse, a term developed by Ken Hyland out of work in systemic-functional linguistics, is language about the discourse rather than about the world [CB]. “Higher rates lower prices” is propositional; “it’s worth seeing why this matters” is metadiscourse – it carries no fact, only an instruction about how to read what follows. [SDA][CB] Hyland sorts it into two families. Interactive metadiscourse organizes the text for the reader: transitions, frame markers, code glosses, evidentials. Interactional metadiscourse positions writer and reader toward the content: hedges, boosters, attitude markers, self-mention, and engagement markers.
The features usually attributed to Claude-style prose are, with few exceptions [RH], a heavier use of metadiscourse; the features attributed to GPT-style prose are its relative thinning. That reduction is useful, because [SS] metadiscourse is countable. “Model X hedges more” stops being a vibe and becomes a measurable claim: markers per thousand words [CR], sorted by category. The wider point that machine-written prose carries a measurable stylistic signature is now established empirically: Kobak and colleagues, tracking word frequencies across fifteen million biomedical abstracts, found an abrupt post-2022 spike in a set of “excess” style words and used it to estimate that at least a tenth of recent abstracts were processed by a language model. (That their own title verb, “delving”, is one of the flagged markers is a small joke at the field’s expense.) Style, in other words [RG], leaves fingerprints, and fingerprints can be counted. [AE]
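To make “markers per thousand words, sorted by category” concrete, here is a minimal Python sketch. The MARKERS lexicon is a hypothetical handful of phrases chosen for illustration, not Hyland’s inventory; real tagging needs a human (or a careful model) to decide whether each occurrence is actually metadiscursive, since a word like “clearly” or “generally” can also carry propositional content.

```python
import re

# Hypothetical mini-lexicon for illustration only. Hyland's categories are far
# larger and need context to apply (not every "clearly" is a booster).
MARKERS = {
    "hedge": ["tends to", "generally", "at least on average", "perhaps"],
    "booster": ["clearly", "not in dispute", "certainly"],
    "frame": ["it helps to see", "it's worth"],
    "code_gloss": ["in other words", "put differently"],
    "engagement": ["you can", "you might"],
}


def tokenize(text):
    """Lowercase word tokens; curly apostrophes are normalized to straight ones."""
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))


def count_phrase(tokens, phrase):
    """Count occurrences of a (possibly multi-word) phrase as a contiguous token run."""
    target = phrase.split()
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)


def marker_counts(text, markers=MARKERS):
    """Raw marker counts per category."""
    tokens = tokenize(text)
    return {cat: sum(count_phrase(tokens, p) for p in phrases)
            for cat, phrases in markers.items()}


def per_thousand(text, markers=MARKERS):
    """Marker density per 1,000 words, by category."""
    n_words = len(tokenize(text))
    if n_words == 0:
        raise ValueError("empty text")
    return {cat: 1000 * c / n_words
            for cat, c in marker_counts(text, markers).items()}


if __name__ == "__main__":
    for sentence in [
        "Raising interest rates tends to lower asset prices, at least on average.",
        "Raising interest rates clearly lowers asset prices; the mechanism is not in dispute.",
        "Raising interest rates lowers asset prices.",
    ]:
        print(marker_counts(sentence))
```

Run on the hedge, booster and bare variants of the interest-rate claim used later in the essay, this toy lexicon finds two hedges in the first, two boosters in the second and nothing in the third. Phrase matching over tokens is the simplest possible tagger; it will miss paraphrased markers and miscount propositional uses, which is exactly why the essay’s testing section insists on careful, blind tagging.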
One qualification carries through the rest of this piece. [MS] What follows analyzes the common characterization of the two defaults, not the output of a controlled head-to-head. [CB] To the extent the tendencies hold, here is what they are doing and what they cost; where they do not hold, the analysis simply does not apply. [MCS]
Five features, one example
Holding the propositional content fixed isolates the texture. Take a single claim – raising interest rates lowers asset prices – and apply each feature in turn.
The direct rendering states it and moves on:
Raising interest rates lowers asset prices.
Framing adds an interactive preview before the content lands:
It helps to see why this matters: when a central bank raises rates, asset prices generally fall.
Commitment can be tuned in either direction, and bare assertion is not the neutral midpoint – it is one option among three. [CB][SDA] Compare a hedge, a booster, and the unmarked statement:
Raising interest rates tends to lower asset prices, at least on average. (hedge)
Raising interest rates clearly lowers asset prices; the mechanism is not in dispute. (booster)
Raising interest rates lowers asset prices. (bare)
This matters for the comparison, because [SS] GPT-style prose is often described as “unhedged” as though it carried no stance. It carries one [ARR]: the bare and boosted forms sit at the confident end of the same scale on which the hedge sits at the cautious end. Choosing not to hedge is a stance, not its absence. [AE][CB]
Contrast disambiguates by naming and rejecting an alternative reading:
The driver here isn’t sentiment but discount rates: higher rates lower the present value of future cash flows, so prices fall.
Engagement recruits the reader into shared cognition:
You can think of it this way: when rates rise, the same future cash flows are worth less today, so prices drop.
Recursion restates the idea in a second register:
Higher rates lower asset prices. Put differently, raising the discount rate shrinks the present value of future cash flows; the asset is worth less because tomorrow’s money has become cheaper relative to today’s.
Real outputs combine these rather than deploying them singly. The aggregate effect is clearest in a short passage rendered both ways. First, dense:
Raising interest rates lowers asset prices. Higher rates increase the discount applied to future cash flows, so their present value falls. They also raise borrowing costs, which dampens demand for leveraged assets. The size of the effect depends on duration: long-duration assets, like growth equities and long bonds, move most.
Then the same content, fully textured:
It’s worth understanding why rates matter so much here. When a central bank raises rates, asset prices generally fall – not because of panic, but because of arithmetic. Higher rates mean future cash flows are discounted more heavily, so their present value drops. In other words, the same stream of future income is simply worth less today. There’s a second channel too: borrowing becomes more expensive, which tends to cool demand for assets bought on leverage. You might notice that the effect isn’t uniform. It depends on duration – long-duration assets, such as growth equities and long-dated bonds, tend to move the most.
The second passage carries no propositional information the first lacks, and on a count of Hyland’s markers it runs to roughly [RH] seven against effectively none, at close to twice the length. It would be wrong, though, to call the difference padding. [CP] The markers do work: the contrast pre-empts a reader who would otherwise reach for a sentiment story, the hedges flag that the relationship is statistical rather than exact [CB], and the gloss rescues a reader who does not already know what discounting is. The added length buys resistance to particular misreadings. Whether that resistance is worth paying for depends entirely on who is reading. For someone who has never priced a bond, it is insurance; for someone who prices them daily, the same machinery is friction, and the dense version is the courtesy. [MCS][AE]
I don’t have a good explanation for this strong defense of the textured version. Is it just a consequence of LLMs being “walks on high-dimensional graphs of linguistic transitions”, as one paper described them, or does it come from some of the additional layers?
Redundancy and the cost of a misread
The deepest point in the usual comparison is [SRC] that these features are not decoration but error-prevention. [CB] That is correct, and it can be made precise. [VP]
Every metadiscourse marker is, propositionally, redundant: it adds decoding instructions, not facts about the world. [CB] Redundancy is exactly what, in Claude Shannon’s information theory, protects a message against noise. Shannon defined the redundancy of a language as the degree to which a message can be shortened without losing information; the slack is what survives corruption in transmission. High-metadiscourse prose runs the channel at high redundancy; thinned prose runs it at high density. [MCS] The intuition that the markers prevent errors is right, and this is its exact form: [VP] they trade information density for channel redundancy.
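In the 1948 paper, Shannon makes the quantity precise as R = 1 − H/H_max: one minus the ratio of a source’s entropy per symbol to the maximum entropy its alphabet allows. The sketch below computes it for a text, assuming a 27-symbol alphabet (26 letters plus space) and estimating H from single-character frequencies only. That assumption ignores context, so it understates the redundancy of real English considerably; it shows the shape of the quantity, not a measurement of prose style.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # assumption: 26 letters plus space


def unigram_redundancy(text, alphabet=ALPHABET):
    """Shannon redundancy R = 1 - H / H_max, with H estimated from
    single-symbol frequencies (a crude, context-free estimate)."""
    symbols = [c for c in text.lower() if c in alphabet]
    n = len(symbols)
    if n == 0:
        raise ValueError("no symbols from the alphabet in text")
    counts = Counter(symbols)
    # Entropy per symbol, in bits, of the observed symbol distribution.
    h = -sum((k / n) * math.log2(k / n) for k in counts.values())
    # Maximum entropy: every symbol of the alphabet equally likely.
    h_max = math.log2(len(alphabet))
    return 1 - h / h_max
```

A string of one repeated letter scores 1 (entirely predictable, so it could be shortened to almost nothing), and a string using every symbol exactly once scores 0. This is redundancy at the level of letters, not the propositional redundancy that metadiscourse adds, so it illustrates the essay’s analogy rather than testing it.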
Relevance theory, in Deirdre Wilson and Dan Sperber’s account, supplies the other half. [CF] Comprehension trades processing effort against cognitive effect: an input is worth attending to in proportion to the effect it yields and in inverse proportion to the effort of deriving it. [MCS] Framing, restatement and explicit contrast lower the reader’s processing effort by performing the disambiguation in advance. That effort is not abolished; it is relocated to the writer. [AE][CB] Thinned prose makes the opposite trade, buying density and speed by passing the disambiguation back to the reader, who pays in attention and in the risk of decoding wrongly.
Put this way, neither default is correct in general. In the terms of Grice’s maxims – be as informative as required and no more [DT] (quantity), be truthful (quality), be relevant (relation), be brief and orderly (manner) – high-metadiscourse prose risks quantity in the upward direction and the manner maxim’s stricture against prolixity, when the reader is expert and the context rich. Thinned prose risks quantity in the downward direction [MCS] and the manner maxim’s injunction to avoid ambiguity, when context is thin. Correctness is a function of the reader and of the cost of a misread, not a property of the prose. [AE][CB]
The two failure modes are symmetric and worth seeing side by side. [SS] Over-redundant prose fails by burying its own point: a reply that opens with three sentences on why the question matters before stating a one-word answer leaves the reader hunting for the answer, and the framing that was meant to aid comprehension defeats it. Under-redundant prose fails by assuming context the reader lacks: an instruction to “restart the service after deploying” misfires when it does not say which service, or that restarting drops live connections. The first failure wastes a careful reader’s time; the second misleads a careless or unfamiliar one. [MCS] Both are real, and a writer cannot minimize the risk of one without raising the risk of the other. [MCS]
A word on cause, kept separate from the functional account above. [MS]
Meta-signposting [MS] is not the only pattern here. There is a bit of significance-signaling [SS] of some kind that the current catalog is missing, as Claude would say. What kind that is, I still don’t know. When I explained the pattern, Claude suggested “significance-signaling fused with a fastidiousness display” as a flavor, or a separate pattern, “epistemic-fastidiousness display.” I’m not happy with either. Still, I wanted to flag that there is some advertising of intellectual care here, which is definitely a separate pattern.
Describing what the markers do is not the same as explaining why the models produce them, and the honest answer [CDF] to the why is that the internals are not externally verifiable. What can be said is that both defaults are products of preference-based training and of stated design priorities, and that this cuts against reading either as intrinsically more honest.
Here’s where I learned that LLMs can be “intrinsically honest” to varying degrees.
Modern assistants are tuned on human preference data; labs declare different emphases – Anthropic’s “helpful, honest, harmless” framing in Constitutional AI is one such declaration, not a neutral description. [CB] Preference-based training is also known to reward the wrong things: Sharma and colleagues found that both human raters and the preference models trained on them sometimes favor convincingly written but incorrect answers. That finding should make one suspicious in both directions. Visible hedging may reflect honest calibration, or it may be risk-averse throat-clearing that a reward model happened to reward [MCS]; visible confidence may reflect clarity, or it may be overclaiming that read well to a rater. [MCS] The markers are evidence of training, not of virtue. [AE][CB]
Matching redundancy to context
The practical question is usually posed as how to make text less Claude-shaped. Better posed: [RF][CR] how much redundancy does this reader, in this setting, actually need? The operation runs in both directions, and the harder skill is usually [RH] adding redundancy back where a thinned draft has stripped too much [SRC].
More redundancy earns its cost when a misread is expensive or context is thin – dosage instructions, legal terms, onboarding material for newcomers, error messages that must not be misconstrued, text read asynchronously by people who cannot ask a follow-up, audiences split across cultures or disciplines. [SDA] A terse instruction misexecuted in a control room is the standing argument for it. [MCS] Less redundancy wins when the reader is expert, the channel high-trust, space or time short, and a misread cheap or self-correcting – a memo between specialists, code review among peers, a reader of a literary review who resents being walked to a conclusion she reached two sentences earlier.
An escalating defense…
Being talked through the obvious is the standing argument against it. [MCS] Neither argument generalizes; each is a claim about a reader. [AE]
… elegantly balanced and polished with an aphoristic ender. Admirable!
For the more common request, reducing redundancy, the procedure is straightforward, with the inverse implied throughout:
Claude loves “throughout,” but since this is not about simple overuse, it’s not marked.
Cut interactive metadiscourse that previews rather than informs [CB].
Collapse paraphrastic restatement to its single strongest formulation.
Reduce contrastive scaffolding to one level.
Recalibrate hedges rather than deleting them [CB]. Uniform hedging and uniform confidence are both miscalibrations – the first reads as evasive, the second as reckless – [MCS][SDA] and stripping every hedge does not sharpen prose so much as make it misreport its own certainty. [AE]
What would settle it
Most of the above is structured impression, and could be wrong. [AHM] Testing it is not difficult to specify. [CL] Hold the prompts and the system prompt fixed, sample current model versions at a fixed temperature, and tag the outputs against Hyland’s categories: hedge and booster density, frame-marker and code-gloss counts, contrastive-construction frequency, reader-pronoun rate, mean restatements per claim. Report effect sizes rather than anecdotes [CB]. The excess-vocabulary method shows the corpus-scale version is feasible, and also shows its limits. The confounds are real – prose is highly prompt-sensitive, versions drift, temperature moves the whole distribution, and an unbiased tagger is hard to build. [SDA] The priming problem is worse still: an observer who expects one model to hedge will over-count its hedges and under-count the other’s [MCS], so blind tagging is not optional. [DT][CL]
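Two of the requirements in that paragraph, blind tagging and effect sizes, are easy to state and easy to get wrong, so here is a minimal Python sketch of both. It assumes samples arrive as (model label, text) pairs and that `score` is whatever tagger you trust (a hypothetical stand-in here, for example a per-thousand-words hedge density). The tagger only ever sees an opaque id and the text; labels are rejoined after scoring. Cohen’s d with a pooled standard deviation is one common effect-size choice, not the only one.

```python
import math
import random
import statistics


def blind(samples, seed=0):
    """Shuffle (label, text) pairs and replace labels with opaque ids.
    Returns the blinded list and the id -> label key, kept away from the tagger."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    blinded, key = [], {}
    for i, (label, text) in enumerate(items):
        sid = f"s{i:04d}"
        key[sid] = label
        blinded.append((sid, text))
    return blinded, key


def blind_study(samples, score, seed=0):
    """Score texts without their labels, then regroup the scores by label."""
    blinded, key = blind(samples, seed)
    scores = {sid: score(text) for sid, text in blinded}  # tagger sees no labels
    groups = {}
    for sid, value in scores.items():
        groups.setdefault(key[sid], []).append(value)
    return groups


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("need at least two samples per group")
    pooled = math.sqrt(((na - 1) * statistics.variance(a)
                        + (nb - 1) * statistics.variance(b)) / (na + nb - 2))
    if pooled == 0:
        raise ValueError("zero variance in both groups; d is undefined")
    return (statistics.mean(a) - statistics.mean(b)) / pooled
```

Blinding of this kind only hides the label from the scoring step; it does not remove stylistic tells from the texts themselves, which a human tagger may still recognize. Reporting d alongside the raw group means keeps the “heavy overlap” point below visible: a small d is exactly what two wide, overlapping distributions look like.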
There is also a deflationary point [SRC] that neutrality forces. These are two wide distributions with heavy overlap. The difference between them is one of central tendency, not of kind [CB], and a great many single samples could have come from either model. The “two styles” are visible only in aggregate, and only when no one is steering. Specify a register and the distinction largely [RH] dissolves – which is why the most reliable thing that can be said about either default is that it is a default [AE], and a default is the easiest thing about a model to change. This essay, pulled toward terseness by a single line of instruction, is a small instance of that, and no evidence of anything finer. [AHM][DT]
Sources
Ken Hyland (2005), Metadiscourse: Exploring Interaction in Writing, Continuum – the interactive/interactional taxonomy used throughout. Reviewed in Language in Society (Cambridge Core).
Claude E. Shannon (1948), “A Mathematical Theory of Communication”, Bell System Technical Journal – entropy, redundancy, source coding. DOI: 10.1002/j.1538-7305.1948.tb01338.x.
Claude E. Shannon (1949), “Communication Theory of Secrecy Systems”, Bell System Technical Journal – the definition of a language’s redundancy as how far it can be shortened without losing information.
Deirdre Wilson and Dan Sperber (2004), “Relevance Theory” – the processing-effort / cognitive-effect account of comprehension.
H. P. Grice (1975), “Logic and Conversation” – the Cooperative Principle and the maxims of quantity, quality, relation and manner. Background: Stanford Encyclopedia of Philosophy entries “Implicature” and “Paul Grice”.
George Lakoff (1973), “Hedges: A Study in Meaning Criteria and the Logic of Fuzzy Concepts”, Journal of Philosophical Logic 2: 458–508 – hedges as devices that adjust commitment and category membership. DOI: 10.1007/BF00262952.
Long Ouyang et al. (2022), “Training Language Models to Follow Instructions with Human Feedback” (InstructGPT) – reinforcement learning from human feedback as the tuning method behind current assistants. arXiv:2203.02155.
Yuntao Bai et al. (2022), “Constitutional AI: Harmlessness from AI Feedback” – the “helpful, honest, harmless” design framing as a stated, non-neutral priority. arXiv:2212.08073.
Mrinank Sharma et al. (2023), “Towards Understanding Sycophancy in Language Models” – evidence that human and model preference judgments can favor convincing-but-wrong and user-flattering answers. arXiv:2310.13548.
Dmitry Kobak et al. (2025), “Delving into LLM-assisted writing in biomedical publications through excess vocabulary”, Science Advances – machine-written prose carries a measurable, countable stylistic signature. DOI: 10.1126/sciadv.adt3813.

