The substrate
Methodology
What the cohort pages are, how they’re built, and where we draw the line between observation and interpretation.
The method
From one brand to a cohort
Every published cohort is the same pipeline run thousands of times. Nothing is hand-curated.
What this is
BrandGap.AI runs brand-positioning analyses for individual companies. Each analysis examines a brand alongside its competitors and produces a structured read of how that brand presents itself in market — its archetype, its positioning, its tone, its claims, its competitive territory.
The substrate is what happens when you do that across every major industry, again and again. Patterns emerge. Cohorts form. You can see, in aggregate, how brands in DTC describe themselves — or what tone B2B fintech defaults to, or where the under-claimed territory sits in healthcare.
The cohort pages are a public window onto that substrate. They’re not industry research and they’re not market reports. They’re a record of how brands describe themselves, aggregated to the point where the noise drops out and the pattern shows up.
How a cohort is built
A user submits a brand and up to a handful of competitors. Our system analyzes the public-facing surface of each: websites, social, and the messages a brand chooses to put in front of the world. An AI pass analyzes each brand against a consistent rubric: dominant archetype, positioning on a two-axis map, tone profile across five dimensions, claimed differentiators.
Every brand analyzed becomes part of an industry-wide cohort. The classification is structured: free-text industry input from the user gets mapped onto a controlled taxonomy so “Fintech & Banking,” “FinTech,” and “Financial Technology” all land in the same bucket.
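The mapping from free-text input to a controlled taxonomy can be pictured with a small sketch. The alias table, the slug `fintech-banking`, the `unclassified` fallback, and the function names are illustrative assumptions; the production classifier is not described here.

```python
import re

# Hypothetical alias table: normalized free-text label -> canonical slug.
# The slug and the aliases are assumptions made for illustration only.
ALIASES = {
    "fintech and banking": "fintech-banking",
    "fintech": "fintech-banking",
    "financial technology": "fintech-banking",
}


def normalize(label: str) -> str:
    """Lowercase, spell out '&', and collapse punctuation and whitespace."""
    text = label.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def canonical_industry(label: str) -> str:
    """Return the canonical slug, or 'unclassified' when no alias matches."""
    return ALIASES.get(normalize(label), "unclassified")
```

With this table, the three spellings quoted above resolve to the same slug, and an unrecognized label falls through to the fallback rather than creating a new bucket.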
Aggregation runs the same maths across every cohort: archetype distribution, quadrant counts, average tone scores, common phrases, under-claimed territories. Nothing is hand-curated. The same code that produces the DTC cohort produces the agency cohort.
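As a rough illustration of what "the same maths across every cohort" could look like, the sketch below computes three of the aggregates from a list of per-brand results. The record layout (`archetype`, `quadrant`, `tone`) and the function name are assumptions, not the actual pipeline.

```python
from collections import Counter
from statistics import mean


def aggregate(analyses):
    """Summarize a cohort from per-brand analysis dicts.

    Each dict is assumed to carry 'archetype' (str), 'quadrant' (str),
    and 'tone' (dict of dimension -> score on the 1-10 scale).
    """
    n = len(analyses)
    archetypes = Counter(a["archetype"] for a in analyses)
    quadrants = Counter(a["quadrant"] for a in analyses)
    dimensions = sorted({d for a in analyses for d in a["tone"]})
    return {
        "sample_size": n,
        # Share of the cohort landing on each archetype.
        "archetype_share": {k: v / n for k, v in archetypes.items()},
        # Raw brand count per positioning quadrant.
        "quadrant_counts": dict(quadrants),
        # Mean score per tone dimension across the cohort.
        "tone_average": {
            d: mean(a["tone"][d] for a in analyses) for d in dimensions
        },
    }
```

Because the function takes only the list of analyses, the same call serves any cohort; nothing in it is specific to an industry.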
The AI pass
Each brand is analyzed by a frontier large language model against a structured prompt. The prompt asks for archetype, positioning coordinates, tone scores across five dimensions, common messages, and common differentiators, and the model returns the result as structured JSON. The same prompt runs against every brand. Different brands produce different answers; the framework producing those answers is stable.
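A structured response of this kind can be checked mechanically before it enters a cohort. The sketch below is one way to do that; the field names (`archetype`, `positioning`, `tone`, `messages`, `differentiators`) and the function name are assumptions based on the list above, not the real schema.

```python
import json

# The five tone dimensions named in this methodology.
TONE_DIMENSIONS = ("warmth", "confidence", "formality", "innovation", "premium")

# Hypothetical top-level field names for one brand's result.
REQUIRED_FIELDS = {"archetype", "positioning", "tone", "messages", "differentiators"}


def parse_analysis(raw: str) -> dict:
    """Parse one model response and check it against the expected shape."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    tone = data["tone"]
    for dim in TONE_DIMENSIONS:
        score = tone.get(dim)
        # Every dimension must be present and sit on the 1-10 scale.
        if not isinstance(score, (int, float)) or not 1 <= score <= 10:
            raise ValueError(f"{dim} score {score!r} is not on the 1-10 scale")
    return data
```

Rejecting malformed responses at this point keeps a single bad output from skewing cohort averages later.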
The prompt is use-case-aware. A brand analyzed under the B2B framework is scored against a different rubric than a brand analyzed under Consumer or Employer. The use case shifts what counts as “premium” tone, what archetypes feel contextually appropriate, and what positioning territory looks rational for the category. The framework respects that B2B buyers and B2C buyers respond to different signals; both signals are real.
Cohorts are tagged with the methodology snapshot they were computed under. When the methodology is updated, new cohort snapshots are produced and the previous snapshots remain on record. Every cohort page shows the date of its last computation. This is how we keep the substrate honest as the model and the rubric evolve.
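One way to picture snapshot tagging is an append-only log in which each record carries the methodology version it was computed under. The class names, field names, and version labels below are hypothetical; the sketch only shows the idea that older snapshots are kept rather than overwritten.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CohortSnapshot:
    cohort: str
    methodology_version: str  # hypothetical label, e.g. "v1"
    computed_on: date
    sample_size: int


class SnapshotLog:
    """Append-only record of cohort snapshots."""

    def __init__(self):
        self._snapshots = []

    def record(self, snapshot: CohortSnapshot) -> None:
        # Snapshots are only ever added, never replaced or deleted.
        self._snapshots.append(snapshot)

    def history(self, cohort: str) -> list:
        """All snapshots for a cohort, oldest first."""
        matches = [s for s in self._snapshots if s.cohort == cohort]
        return sorted(matches, key=lambda s: s.computed_on)

    def latest(self, cohort: str) -> CohortSnapshot:
        """The most recently computed snapshot, whose date a page would show."""
        return self.history(cohort)[-1]
```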
When a cohort becomes public
A cohort is published as a public page once it passes a minimum sample threshold. Below that, the maths still runs — the cohort exists internally and powers comparisons inside individual reports — but we don’t put it on a permanent URL.
The threshold isn’t magic. It’s the point at which one or two analyses stop moving the archetype percentages, and tone averages stop swinging on small samples. Below it, the distributions are too sensitive to outliers to publish as a credibility surface. Above it, the pattern is the pattern.
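The sensitivity argument can be made concrete with a little arithmetic. If an archetype holds k of n analyses, adding one more analysis moves its share from k/n to either (k+1)/(n+1) or k/(n+1), so the share changes by at most 1/(n+1). The sketch below computes that bound; the function names are illustrative and the threshold value passed in is a placeholder, since the actual figure is not stated here.

```python
def max_share_shift(sample_size: int) -> float:
    """Largest change in any archetype share caused by one added analysis.

    A share moves from k/n to (k+1)/(n+1) or to k/(n+1); both changes
    are bounded by 1/(n+1).
    """
    return 1 / (sample_size + 1)


def is_publishable(sample_size: int, threshold: int) -> bool:
    """A cohort gets a public page once it reaches the minimum sample size."""
    return sample_size >= threshold
```

At 9 analyses a single new brand can move a share by up to 10 percentage points; at 199 the same brand moves it by at most half a point.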
Sample size is shown on every cohort page, every published finding, and every in-report callout — so readers can weigh claims against the evidence behind them. Larger cohorts are more reliable than smaller ones. The same aggregation runs every time; the patterns get sharper as the sample grows.
What the four numbers mean
Archetype distribution
For each brand analyzed, we identify the dominant Jungian archetype it expresses. The framework draws on Carl Jung’s original work on archetypes and the contemporary brand adaptation articulated by Carol Pearson and Margaret Mark in The Hero and the Outlaw (2001). The twelve archetypes are Caregiver, Creator, Everyman, Explorer, Hero, Innocent, Jester, Lover, Magician, Rebel (sometimes called Outlaw), Ruler, and Sage.
The cohort distribution shows what proportion of brands in that industry land on each archetype. A cohort heavily weighted on Caregiver tells you something specific: brands in this space are competing on care, not innovation or rebellion.
The system is reproducible by design. The same user analyzing the same brand with the same brief gets the same archetype call, the same positioning coordinates, and the same tone scores, because the system fingerprints inputs and serves the cached result within a twelve-month window. Two different users analyzing the same brand with different briefs get fresh analyses; the framework producing those calls is the same.
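Fingerprinting and caching of this kind might look roughly like the sketch below. The choice of SHA-256, the fields that go into the key (user, brand, brief), the 365-day approximation of twelve months, and all names are assumptions; the text above only states that inputs are fingerprinted and results are served from cache within the window.

```python
import hashlib
import json
from datetime import datetime, timedelta

# "Twelve-month window", approximated here as 365 days (assumption).
CACHE_WINDOW = timedelta(days=365)


def fingerprint(user_id: str, brand: str, brief: str) -> str:
    """Stable hash of the inputs that define one analysis request."""
    payload = json.dumps(
        {"user": user_id, "brand": brand.strip().lower(), "brief": brief},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AnalysisCache:
    """In-memory cache keyed by input fingerprint."""

    def __init__(self):
        self._store = {}

    def put(self, key: str, result: dict, now: datetime) -> None:
        self._store[key] = (result, now)

    def get(self, key: str, now: datetime):
        """Return the cached result, or None if absent or past the window."""
        entry = self._store.get(key)
        if entry is None:
            return None
        result, stored_at = entry
        if now - stored_at > CACHE_WINDOW:
            return None
        return result
```

Identical inputs hash to the same key and get the stored result back; changing the brief changes the key, which forces a fresh analysis.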
This framework is not the only possible one. Other models exist, and a different framework would draw different lines. We use it because it offers the most usable category language in brand work.
Positioning distribution
Every analysis plots brands on a two-axis positioning map. The axes shift by use case — because what counts as distinctive positioning is different for a B2B SaaS platform, a consumer DTC brand, an employer-brand campaign, and a product launch. There are four frameworks:
- B2B — Enterprise ↔ Agile on the horizontal axis (heavyweight, complex, legacy at one end; lightweight, modern, fast at the other), Accessible ↔ Premium on the vertical (SME-friendly, low barrier vs enterprise-priced, high-touch).
- B2C / Consumer — Traditional ↔ Innovative (heritage, familiar, classic vs modern, disruptive, trend-forward), Accessible ↔ Premium (mass market, democratic vs aspirational, luxury).
- B2T / Employer brand — Corporate ↔ Human (formal, hierarchical, structured vs personal, flat, authentic), Collaborative ↔ Performance (team-first, support-oriented vs high-achievement, ambitious, results-driven).
- Product launch — Functional ↔ Emotional (features, specs, performance vs story, identity, feeling), Niche ↔ Mass (specialist, targeted vs broad appeal, mainstream).
When a brand is analyzed without a use case selected, the system defaults to the B2B framework. The framework determines what counts as “premium”, what archetypes feel contextually appropriate, and what positioning territory looks rational for the category.
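The four frameworks and the B2B default can be written down as a lookup table. In the sketch below the axis labels come from the list above, while the dictionary keys, the coordinate convention (each axis centered on zero, with positive values pointing at the second-named pole), and the function names are assumptions.

```python
# Axis poles per use case, taken from the four frameworks listed above.
FRAMEWORKS = {
    "b2b": {"horizontal": ("Enterprise", "Agile"),
            "vertical": ("Accessible", "Premium")},
    "b2c": {"horizontal": ("Traditional", "Innovative"),
            "vertical": ("Accessible", "Premium")},
    "employer": {"horizontal": ("Corporate", "Human"),
                 "vertical": ("Collaborative", "Performance")},
    "product_launch": {"horizontal": ("Functional", "Emotional"),
                       "vertical": ("Niche", "Mass")},
}

DEFAULT_USE_CASE = "b2b"  # used when no use case is selected


def framework_for(use_case=None) -> dict:
    """Return the axis definitions, falling back to the B2B framework."""
    return FRAMEWORKS[use_case or DEFAULT_USE_CASE]


def quadrant(x: float, y: float, use_case=None) -> str:
    """Name the quadrant for a point, assuming axes centered on zero."""
    axes = framework_for(use_case)
    horizontal = axes["horizontal"][1] if x >= 0 else axes["horizontal"][0]
    vertical = axes["vertical"][1] if y >= 0 else axes["vertical"][0]
    return f"{horizontal} / {vertical}"
```

The same coordinates therefore read differently under different use cases: a point in the upper left is "Enterprise / Premium" for B2B and "Corporate / Performance" for an employer brand.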
The quadrant counts on each cohort page show where brands cluster — and where they don’t. The under-claimed quadrants are the ones worth paying attention to. A brand operating in an under-claimed corner of its category has structural distinctiveness available to it, whether or not the rest of the brand is doing the work to claim it.
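Spotting under-claimed quadrants is a counting exercise once every quadrant, including empty ones, is represented. The sketch below assumes quadrants are stored as (horizontal, vertical) pairs and uses a 10% share cut-off purely for illustration; this methodology does not define a numeric cut-off for "under-claimed".

```python
from collections import Counter
from itertools import product


def quadrant_counts(labels, horizontal, vertical) -> dict:
    """Count brands per quadrant, including quadrants nobody occupies."""
    counts = Counter(labels)
    return {(h, v): counts.get((h, v), 0) for h, v in product(horizontal, vertical)}


def under_claimed(counts: dict, max_share: float = 0.10) -> list:
    """Quadrants holding at most max_share of the cohort (illustrative cut-off)."""
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(q for q, c in counts.items() if c / total <= max_share)
```

Filling in zero counts matters here: an empty quadrant never appears in the raw labels, yet it is the clearest case of unclaimed territory.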
Tone profile
Every brand is scored across five tone dimensions on a 1–10 scale. Each dimension has explicit anchor descriptions at 1, 3, 5, 7, and 10, so two brands that score “7 on warmth” are using the same definition of what a 7 means. The same rubric runs against every brand. The interpretation of where on the scale a brand should sit shifts by use case — described at the end of this section — but the anchor definitions themselves are stable.
Warmth — how human and approachable the voice is.
- 1. Institutional. Zero personal pronouns. Passive voice. “Solutions are delivered.”
- 3. Formal but not cold. Occasional “we” but no “you”. Professional distance.
- 5. Balanced. Mix of professional and friendly. Uses “you” occasionally.
- 7. Warm and inclusive. Regular “you”. Conversational without being casual.
- 10. Feels like a friend. Contractions throughout. Casual, highly personal.
Confidence — how assertive and certain the voice is.
- 1. Heavily hedged. “May”, “might”, “could”, “we think”. Constant qualifiers.
- 3. Measured. Makes claims but always softens them.
- 5. Clear and direct. Occasional qualifier. Neither bold nor timid.
- 7. Bold declarative statements. Minimal qualifiers. Owns its position.
- 10. Absolute certainty. No hedging. Direct commands. “The best.” “We are.”
Formality — how formal vs casual the register is.
- 1. Extremely casual. Slang, emoji, sentence fragments, Gen-Z voice.
- 3. Casual. Contractions, informal vocabulary, relaxed grammar.
- 5. Neutral. Professional but not stiff. Standard business English.
- 7. Formal. Complete sentences, no contractions, professional vocabulary.
- 10. Highly formal. Legal or academic register. No colloquialisms. Third-person references.
Innovation — how modern and forward-looking the positioning is.
- 1. Traditional. “Trusted since”, heritage language, conservative imagery.
- 3. Established. Modern enough but leans on experience and reliability.
- 5. Contemporary. Current and relevant, not actively pushing disruption.
- 7. Progressive. Future-focused, technology-forward, signals change.
- 10. Disruptive. Actively challenges the status quo. “Revolution.” Cutting-edge.
Premium — how upscale and exclusive the brand feels.
- 1. Budget or mass-market. Price prominence. “Affordable.” “Everyone can.”
- 3. Mid-market. Quality claims but accessible. Broad audience language.
- 5. Quality-focused but not exclusive. Professional, not luxury.
- 7. Premium. Craft language. Quality signals. Selective about who it’s for.
- 10. Luxury or exclusive. Scarcity signals. Aspirational. Price never mentioned.
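Because anchors exist only at 1, 3, 5, 7, and 10, a score such as 8 is read relative to the anchors on either side of it. The sketch below shows that lookup for the Warmth dimension, using shortened versions of the anchor descriptions above; the function name and data layout are assumptions.

```python
from bisect import bisect_right

# Shortened Warmth anchors from the rubric above.
WARMTH_ANCHORS = {
    1: "Institutional",
    3: "Formal but not cold",
    5: "Balanced",
    7: "Warm and inclusive",
    10: "Feels like a friend",
}


def bracketing_anchors(score, anchors) -> list:
    """Return the anchor a score lands on, or the two anchors it sits between."""
    if not 1 <= score <= 10:
        raise ValueError("score must be on the 1-10 scale")
    if score in anchors:
        return [(score, anchors[score])]
    points = sorted(anchors)
    i = bisect_right(points, score)  # index of the first anchor above the score
    lower, upper = points[i - 1], points[i]
    return [(lower, anchors[lower]), (upper, anchors[upper])]
```

A warmth score of 8, for instance, sits between "Warm and inclusive" (7) and "Feels like a friend" (10).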
The five tone scores are independent of the two-axis positioning map described above. A brand can score high on Premium tone (its copy reads aspirational) while sitting on the Accessible side of the positioning map (its pricing and access model are democratic). The two are different measurements: positioning is what the brand IS in its category; tone is how the brand SOUNDS in its copy.
The interpretation of where a brand should sit shifts by use case. A warmth score of 5 reads neutral for B2B but cold for an employer brand, where 7 is the floor. A confidence score of 8 reads bold for B2C but expected for a product launch. The framework respects that what looks distinctive in one category looks generic in another, and what feels rational in one buyer’s context feels off in another’s. The cohort averages on each cohort page show what the industry is actually doing on each dimension — the baseline against which any individual brand becomes interesting or unremarkable.
A cohort with high average confidence isn’t neutral: it’s a category-wide claim about authority. Brands scoring below the average are either under-confident or deliberately challenging the convention. Either reading is useful; the data shows what the category is doing.
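Reading a brand against its cohort baseline comes down to a per-dimension difference. The sketch below is a minimal version of that comparison; the function name and the example scores are made up for illustration.

```python
def tone_gaps(brand_tone: dict, cohort_average: dict) -> dict:
    """Brand score minus cohort average for each dimension both sides share."""
    return {
        dim: round(brand_tone[dim] - cohort_average[dim], 2)
        for dim in brand_tone
        if dim in cohort_average
    }
```

A negative gap on confidence means the brand sounds less assertive than its category. Whether that is a weakness or a deliberate stance is the interpretive step the numbers alone do not settle.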
Common messages and differentiators
These are the phrases that recur across the cohort: the words brands keep reaching for. They are extracted from the marketing surface of each analyzed brand and counted across the cohort. The phrases that appear most often are the category’s shared vocabulary. They’re also, usually, the most over-claimed territory.
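Counting recurring phrases can be sketched as below. One assumption worth flagging: each brand is counted at most once per phrase, so a single brand repeating a slogan does not inflate the total. The input layout, the lowercase normalization, and the function name are likewise illustrative.

```python
from collections import Counter


def common_phrases(brand_phrases: dict, top_n: int = 3) -> list:
    """Rank phrases by how many brands in the cohort use them.

    brand_phrases maps a brand name to the list of phrases extracted
    from its marketing surface.
    """
    counts = Counter()
    for phrases in brand_phrases.values():
        # A set ensures each brand contributes once per distinct phrase.
        counts.update({p.strip().lower() for p in phrases})
    return counts.most_common(top_n)
```

Called on a cohort of three brands, it returns the phrases shared by the most brands first, which is the shared vocabulary described above.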
Honest limits
What we don’t claim
Several things, named explicitly:
- Industry classification has a precision ceiling. When a user types “Fintech & Banking,” the classifier picks a canonical slug based on that string alone. It can’t reliably distinguish a B2B fintech from a consumer fintech without more signal. Where a cohort name says “B2B-leaning” or “Consumer-leaning,” that’s us being honest about the lean rather than overclaiming a clean split.
- Archetype detection isn’t perfect. A small percentage of brands resist clear archetype assignment and land in an “Unknown” bucket. We surface that figure on cohort pages rather than hide it.
- The substrate reflects what brands say, not necessarily what they are. We analyze public-facing surface. If a brand’s positioning is misaligned with its internal reality, the substrate sees the positioning. That’s the point. We’re mapping the claim space, not the underlying truth.
- Larger cohorts are more reliable than smaller ones. More analyses make for a stronger pattern. The sample size on every page tells you which you’re looking at.
These aren’t bugs. They’re the conditions under which the substrate is honest about itself. A brand-positioning dataset that doesn’t acknowledge its own measurement limits isn’t research — it’s marketing.
Versioning and freshness
The pipeline is stable, the aggregation rules are stable, and the cohort thresholds are stable. What will evolve: how we describe nuance on the cohort pages themselves, whether we add new measurement dimensions, and how we handle ambiguous classification cases as the substrate grows.
Cohorts recompute on a regular cadence. Every cohort page shows the date of its last computation in the header. New analyses are added continuously; cohorts cross the publication threshold as their sample sizes grow.
If a claim on a cohort page surprises you, weigh it against the sample size and computation date shown on that page. Both the data and the analysis are open to question.
Findings library
Where the cohort sample is large enough, we publish a finding: a short essay reading the cohort, naming what stands out, and saying what it means for someone analyzing their brand against it. The findings library sits alongside the methodology — each finding is anchored to a specific cohort, a specific sample size, and the data behind it.
See the findings library for everything currently published.