Intro
The American political landscape is extremely polarized, and its electoral systems seem good at nothing except selecting unpopular leaders. These problems are entwined, but I believe the root issues can be addressed with a mere information system upgrade. “The Politics Industry” (2020), co-written by Katherine Gehl and Michael Porter, lays out a compelling case that our primary systems are to blame. Essentially, party interests (donors, party wings, etc.) want different candidates than the wider electorate wants. Certainly, other dynamics, such as our media landscape, feed into and out of this, but the primary system is load-bearing within them. If you’re skeptical or curious, see my review and summary of the book here.
To fix this, we could reform the primary elections directly (the book suggests open primaries with ranked-choice voting), but parties’ resistance to early efforts has indicated it won’t be an easy fight. Thus, for pragmatic reasons, I’m proposing a parallel project to upgrade the information environment so that existing party primaries select more popular candidates. No party approval nor new laws are needed; just straightforward information technology.
Should we?
The idea that parties should select more popular candidates regained steam recently. Matthew Yglesias champions popularism as a critique of the Democratic party and sums it up as “You should encourage candidates to embrace popular progressive causes and allow them to make tactical retreats from fights where conservatives have public opinion on their side.” I believe this is just as true on the right, and doubly so if they want to stay competitive against opponents acting this way.
Not everyone agrees with Yglesias, and their complaints make just as much sense from the left or the right, so it’s useful to consider them. For example, David Dayen argues that if you make promises you do not enact, you reap voter fatigue and incumbent losses. I believe that’s what the kids call a “skill issue,” however, as you can address popular concerns without overpromising and underdelivering. Show empathy to the aggrieved, remind people there are limits to what the office can or should do, and make directional progress that doesn’t trigger partisan backlash. Leadership, basically.

Another complaint is that this move is only ever a smokescreen. Adam Johnson argues that the party establishment is using this to selectively push policies they wanted to do anyway while ignoring the (populist) ones they do not. He sarcastically remarks, “Under this line of reasoning, we can’t really be too upset at any right-wing turns by Democrats because they are simply Responding To The Market.” This is a clear example of how party wings do not like popularism and will fight it by claiming moves to the center are not actually aligned with the people’s will. Still, his complaint correctly points out that people love to grasp threads of data that confirm the narrative or plan they already favored. The solution isn’t to abandon data; it’s to do the data work much better and out in the open, so its conclusions are less open to interpretation.
But how?
What we need is common knowledge about the will of the people. If everyone knows what everyone wants, it’s difficult for special interests to put their fingers on the scale. We need a polling regime so rich that, before a primary, we can see projections for all hypothetical matchups of campaign platforms for a given race. As a primary process is underway, we also need a system to track how well each candidate performs with primary and general election voters (of all parties) to predict how each possible candidate pairing would fare. Such predictions must be public so that parties who favor less popular (but party-aligned or donor-aligned) candidates take heat after an election loss. More importantly, when credible matchup predictions are public, it shifts the game-theoretic winning strategy toward picking popular candidates. Any party that refuses that strategy will keep losing until it reconsiders.
For the first part, we need to generate campaign platforms for each party that will maximize the chance of victory. Mostly, they do so by being popular with that party’s base and swing voters, but they could also do so by flipping voters from the opposing party or at least being non-offensive enough that some stay home. Economists might call this the Pareto frontier in the victory landscape. Platforms on that frontier could promise historic blowouts, but only in a big-if-true way, so it’s important that the process and its data be open source.
This creates a context where, before the primaries, everyone knows each party’s likely paths to victory in the general election. Not perfectly, of course, but well enough to open the door to more popular candidates running, and in a way that special interests have less leverage to dissuade them. Candidates still need to walk through that door and demonstrate they have the credibility and charisma to follow one of those paths. That is where the second aspect of the system would come in, integrating rapid polling of each candidate into the broader predictive model to simulate general election outcomes. The idea is to influence primary voters such that most select the candidate who is most popular with the whole electorate and not just the one they individually like. If the election isn’t projected to be close, they still won’t, but in many elections, it will be in their best interest.
This new information environment changes how America selects political representatives, yielding victors larger mandates (and less opposing vitriol), both of which should ease polarization and dissolve gridlock.
Such poll-driven politics has its critics. Timothy Noah argues that it can yield stagnant and morally compromised representatives and that moral leadership is required for good outcomes. I am sympathetic, but mostly, I disagree with the idea that leadership can only live in the heads of politicians. Today, information moves sufficiently fast that everyone can be involved in the leadership process. Politicians, or anyone else, can pitch ideas to the public and then read how they are received. As it becomes increasingly practical to have citizens in the loop, moral leadership can be undertaken by anyone with a megaphone, and it becomes something we no longer want our representatives to do unilaterally.
But specifically, how?
The polling system we want has these properties:
- Coverage - accounts for everyone in the country.
- Accuracy - a 5% margin of error or so.
- Freshness - before a primary, it has to be current with events and the national conversation.
- Fidelity - it needs to model how voting groups in each state will turn out and vote for candidates with hypothetical campaign platforms.
- Trust - everyone must believe the projections.
I can imagine ways to make it even better, but this is the minimum needed to create the information environment that shifts the game theory.
Layers
Practical, continual modeling of the whole country will necessitate a clever approach. I propose four layers of data with transform functions between them, where each aggregates continuous data from micro-clusters of citizens for practicality and anonymity.
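To make the layering concrete, here is a minimal sketch of the layers and the transforms between them. This is Python, every field name is a placeholder rather than a finalized schema, and it only shows the shape of the pipeline.

```python
from dataclasses import dataclass

@dataclass
class MicroCluster:
    cluster_id: str
    size: int                         # estimated number of citizens in the cluster
    demographics: dict[str, str]      # layer 1: zip, age bracket, sex, race, income bracket, ...
    psychographics: dict[str, float]  # layer 2: e.g. moral-foundation scores in [0, 1]
    issue_stances: dict[str, float]   # layer 3: issue id -> stance in [-1, +1]

def infer_psychographics(demographics: dict[str, str]) -> dict[str, float]:
    """Layer 1 -> layer 2 transform, fit from interviews of sampled cluster members."""
    raise NotImplementedError

def infer_issue_stances(psychographics: dict[str, float]) -> dict[str, float]:
    """Layer 2 -> layer 3 transform, updated continuously from polling and media channels."""
    raise NotImplementedError

def project_votes(cluster: MicroCluster, matchup: tuple[str, str]) -> dict[str, float]:
    """Layer 3 -> layer 4 transform: expected turnout and vote split for a platform matchup."""
    raise NotImplementedError
```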
1. Demographics
For pragmatic reasons, we start with the demographics of the US census: zip code, age, sex, race, household composition, and income. As this is an academic project, we can get access to each state’s voter databases and enrich the census data. We can then break the nation into micro-clusters grouped by common attributes.
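As a sketch of how micro-clusters could be formed, the snippet below buckets a toy census-plus-voter-file extract and treats every unique combination of attributes as a cluster. The column names, brackets, and pandas approach are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical census + voter-file extract; column names are illustrative.
records = pd.DataFrame({
    "zip": ["33101", "33101", "73301"],
    "age": [45, 47, 29],
    "sex": ["M", "F", "M"],
    "race": ["white", "white", "hispanic"],
    "household": ["married_2kids", "married_2kids", "single"],
    "income": [75_000, 82_000, 41_000],
})

# Bucket continuous fields so clusters stay coarse enough to preserve anonymity.
records["age_bracket"] = pd.cut(records["age"], bins=[17, 29, 44, 64, 120],
                                labels=["18-29", "30-44", "45-64", "65+"])
records["income_bracket"] = pd.cut(records["income"], bins=[0, 50_000, 100_000, float("inf")],
                                   labels=["<50k", "50-100k", "100k+"])

# A micro-cluster is every unique combination of bucketed attributes,
# plus its estimated population size.
clusters = (records
            .groupby(["zip", "age_bracket", "sex", "race", "household", "income_bracket"],
                     observed=True)
            .size()
            .reset_index(name="size"))
print(clusters)
```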
2. Psychographics
The second layer holds stable low-level beliefs and psychological attributes. These attributes change extremely slowly in adults, so measuring them once yields informational returns over a long period. Finding and validating the set to track isn’t trivial, but some work has been done already. One such schema is Haidt’s Moral Foundations Theory, which identified five axes: harm/care, fairness/reciprocity, ingroup/loyalty, authority/respect, and purity/sanctity. Another is Schwartz’s Value Theory, which enumerates ten: self-direction, stimulation, hedonism, achievement, power, security, conformity, tradition, benevolence, and universalism. If these are insufficient, we can endeavor to use machine learning techniques to find and name the relevant dimensions in our datasets.
The only practical way I can see to collect psychographics is to interview representative samples of each demographic for their beliefs and use that to build the mapping. With this mapping, we can derive belief distributions for given demographics and regions from the census data. Can we assume every married, college-educated, white 45-year-old male with two kids in Miami who makes 75k/yr has the same fundamental worldview? No, but the spread of worldviews that the cluster exhibits is narrower than the general population’s, and so we can make better probabilistic inferences than we could without it.
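A hedged sketch of that mapping: interview a sample from each cluster, then store the distribution (not just the mean) of each psychographic dimension per cluster, since the whole point is probabilistic inference. The cluster ids, dimensions, and numbers below are made up.

```python
import pandas as pd

# Hypothetical interview results: each row is one interviewed person, tagged with
# their demographic cluster and measured psychographic scores in [0, 1].
interviews = pd.DataFrame({
    "cluster_id": ["miami_45_m_married", "miami_45_m_married", "austin_29_m_single"],
    "authority_respect": [0.7, 0.5, 0.3],
    "fairness_reciprocity": [0.6, 0.8, 0.9],
})

# The layer-1 -> layer-2 transform is the per-cluster distribution of each dimension:
# keep the mean *and* the spread.
psychographic_map = interviews.groupby("cluster_id").agg(["mean", "std"])

# Clusters with too few interviews could fall back to a coarser parent cluster
# (e.g. drop the zip code) or to the national distribution.
print(psychographic_map)
```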
3. Issue Stances
The third layer holds issue stances, which is how someone feels towards a policy action the government might take. For example, you might believe it’s good to promote domestic re-industrialization, good to threaten tariffs on Chinese goods, and bad to cut funding for schools.
Issue stances change relatively rapidly, so they must be continually monitored. We need to build and continuously update a function mapping psychographics (from layer two) into the issue stances at this level. We also need to obtain data from multiple channels in parallel.
One channel is traditional polling techniques, which simultaneously ask people questions about their beliefs and issue stances. This method is likely too expensive to rely on, but it can help fill in gaps in the demographic profile that the other methods can’t.
The most robust and cost-effective channel would be a website where voters continually update their stances. I realize how improbable that sounds, but I have outlined a sketch of how this might come about in my post citizens-lobby, which, briefly, is a site similar to Change.org where activists drive politically motivated voters to show support on critical issues. While only motivated and engaged voters would spend time there, it’s a rich information channel that should inform the stances of psychographic clusters.
The third channel is an analysis of media, and I don’t know if it will work. If it does work, it’s extremely low cost, so it should be considered. The idea is we have AI transcribe and parse the semantic content of all popular political podcasts, videos, blogs, and opinion columns. We note the demographics and psychographics of the speaker, along with any issue stance they say out loud. We attempt to keep a model of their ‘reach’ into the voting population, which we can do either by looking at their audience demographics or by noticing historical correlations between their stated stances and those of voters. In plain terms, if Rogan utters a political stance, we can assume his mostly young male audience is slightly more likely to hold that belief. This would need years of validation, but it has the extremely useful property of predicting future changes in voter stances instead of catching them six months later.
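One way this channel could feed the model, sketched under strong assumptions: every transcribed statement nudges the modeled stance of each cluster toward the speaker’s position, in proportion to the speaker’s reach into that cluster. The reach figures, the persuasion constant, and the update rule below are placeholders that would have to be fit against years of polling data.

```python
# Speaker reach: fraction of the cluster that regularly hears this speaker (made up).
reach = {("rogan", "young_male_no_degree"): 0.30,
         ("rogan", "retired_suburban_female"): 0.02}

# Current modeled stance of each cluster on an issue, in [-1, +1] (made up).
stances = {"young_male_no_degree": {"tariffs_china": 0.10},
           "retired_suburban_female": {"tariffs_china": 0.25}}

# How much a single on-air statement is assumed to move listeners; a parameter
# to be fit against historical polling, not guessed.
PERSUASION = 0.05

def ingest_statement(speaker: str, issue: str, direction: float) -> None:
    """Nudge each cluster's stance toward the speaker's stated direction,
    proportional to reach. direction is +1 (for) or -1 (against)."""
    for (spk, cluster), r in reach.items():
        if spk != speaker or issue not in stances[cluster]:
            continue
        old = stances[cluster][issue]
        stances[cluster][issue] = old + r * PERSUASION * (direction - old)

ingest_statement("rogan", "tariffs_china", +1.0)
print(stances)
```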
And that just gets you the stances of the population on each issue; it doesn’t define the issues themselves, which is also quite tricky. Over time, new ideas enter the Overton window while others become irrelevant in the face of technological shifts. As narratives shift, once-separate issues may seem to combine (woke or anti-woke each becoming bundles of beliefs). Defining and redefining issues might need a messy human process, but as long as all parties have a chance to be involved, the results should be accepted.
Notably, this layer by itself is a considerable public good, even if the broader four-layer strategy never manages to predict elections.
4. Votes
Finally, to project hypothetical election outcomes, we must generate a distribution of votes from each cluster for a hypothetical matchup of two campaigns.
First, we generate the campaign platforms for which we create matchups. We’ll need to use heuristics initially because there are a combinatorially explosive number of such platforms. One general class of heuristics to find them is to enumerate all possible positions above some popularity threshold from the issue stances layer, consider the sum of distributions of how each cluster feels on each, and then run an optimization algorithm to find the most popular possible platforms. This optimization process would need to model the turnout for party voters, swing votes earned, and the opposition voters who are indifferent enough to stay home.
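A toy version of that heuristic, with made-up issues, support numbers, and weights: filter to positions above a popularity threshold, then search over small platforms for the highest victory score. A real system would need a far smarter search over thousands of positions.

```python
from itertools import combinations

# Hypothetical layer-3 output: net support for each position within one party's
# base, among swing voters, and among the opposing base, all in [-1, +1].
positions = {
    "child_tax_credit":   {"base": 0.6, "swing": 0.5, "opp": 0.1},
    "tariffs_china":      {"base": 0.4, "swing": 0.2, "opp": 0.3},
    "cut_school_funding": {"base": -0.2, "swing": -0.5, "opp": 0.1},
    "drug_price_caps":    {"base": 0.7, "swing": 0.6, "opp": 0.4},
}

# Keep only positions above a popularity threshold with the party's own base.
viable = {k: v for k, v in positions.items() if v["base"] > 0.2}

def score(platform: tuple[str, ...]) -> float:
    """Toy victory score: base turnout, swing persuasion, and how indifferent
    (non-angry) the opposition is. The weights are invented."""
    base = sum(viable[p]["base"] for p in platform)
    swing = sum(viable[p]["swing"] for p in platform)
    opp = sum(viable[p]["opp"] for p in platform)
    return 1.0 * base + 1.5 * swing + 0.5 * opp

# Brute-force small platforms; a real search would need beam search, integer
# programming, or similar.
best = max((c for n in (2, 3) for c in combinations(viable, n)), key=score)
print(best, round(score(best), 2))
```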
Next, we must project every voter’s stance on every platform matchup. For example, how a 40-year-old middle-class male registered D might vote for an incumbent D with platform X versus a challenger R with platform Y. Of course, the issue stances could be used if we add up the positive and negative feelings towards each issue in each platform and pick the highest number. This is simple and easy to inspect, but I fear it isn’t accurate enough to capture how people actually decide to vote. The main weakness is that this system won’t understand the intensity with which people vote on select issues. In aggregate, someone might have polled as mildly positive on ten issues and intensely negative on one. How do we determine how they vote for a platform that takes all eleven positions? It may work in the aggregate, but that seems risky to assume without testing it through several elections.
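The intensity problem is easy to see in a toy example: the purely additive rule and an intensity-weighted rule disagree about the same voter. The numbers are invented for illustration.

```python
# A made-up voter who is mildly positive on ten issues of platform X and
# intensely negative on one.
stances = {f"issue_{i}": +0.2 for i in range(10)}   # mild support
stances["issue_10"] = -0.9                          # the deal-breaker
intensity = {k: (3.0 if k == "issue_10" else 1.0) for k in stances}

platform_x = list(stances)          # platform X takes all eleven positions

additive = sum(stances[i] for i in platform_x)
weighted = sum(intensity[i] * stances[i] for i in platform_x)

print(additive)   # +1.1 -> the additive rule says this voter leans toward platform X
print(weighted)   # -0.7 -> intensity weighting flips the prediction
```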
A different and perhaps better way to do it is to run real polls asking people about hypothetical matchups and use that to build a mapping from all three lower layers to outcomes in this one. In practice, this is an ML process trained to take as input a given voting group, their psychographics, issue stances, a platform matchup, and the election context, and to output a matchup stance. The design and execution of this could skew the outcome, and I would assume each party would want to run it themselves and keep most of the produced data private. Providing the tools for non-insiders to run their own version of the analysis could provide a check on this secrecy impulse. Nate Silver might control the narrative with his run-through of the tooling if parties don’t release theirs.
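A rough sketch of what that supervised setup might look like, assuming scikit-learn, synthetic data, and invented features; real rows would come from matchup polls joined to the three lower layers.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Each training row is one polled respondent facing one hypothetical matchup:
# [cluster psychographics..., stance-vs-platform-X score, stance-vs-platform-Y score,
#  incumbency flag, national mood index]; the label is which platform they chose.
# All of this is synthetic; real features come from the three lower layers.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 2] - X[:, 3] + 0.3 * X[:, 5] + rng.normal(scale=0.5, size=500)) > 0

model = GradientBoostingClassifier().fit(X, y)

# At prediction time, feed every micro-cluster through the model for a proposed
# matchup and aggregate the predicted probabilities by cluster size and turnout.
new_matchup_features = rng.normal(size=(1, 6))
print(model.predict_proba(new_matchup_features))
```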
Either way, this model must consider the dynamics of the election. For example, for the presidential election, a Monte Carlo simulation of the Electoral College would be a good choice. This tooling should also be released.
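For the presidential case, a minimal Monte Carlo sketch of the Electoral College could look like the following, assuming the layers above already yield a per-state win probability for one platform. The probabilities are placeholders, and a real model would also simulate correlated national error rather than treating states as independent.

```python
import random

# state: (electoral votes, P(platform A wins the state)); probabilities are placeholders.
states = {
    "PA": (19, 0.52), "GA": (16, 0.49), "MI": (15, 0.55), "NC": (16, 0.48),
    "WI": (10, 0.51), "AZ": (11, 0.47), "NV": (6, 0.50),
}
A_SAFE, B_SAFE = 226, 219   # electoral votes assumed not in play

def simulate_once() -> bool:
    """One simulated election: does platform A reach 270 electoral votes?"""
    a_total = A_SAFE + sum(ev for ev, p in states.values() if random.random() < p)
    return a_total >= 270

trials = 100_000
wins = sum(simulate_once() for _ in range(trials))
print(f"Platform A wins {wins / trials:.1%} of simulations")
```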
Validation
This layered approach is complicated compared to the naive approach of asking a representative sample in small polls. People will be skeptical. Continually validating its predictions with more surveys will be important.
Another way to validate the approach is to analyze the previous election retroactively. We would use the current data on beliefs and issues, which hopefully are still relevant, and then analyze the platforms on which candidates actually ran and see how the prediction lined up with reality.
And as this is being built, we can stick to analyzing a single state. Building the first two layers for a state is just as time-consuming as doing things nationally, but layer three is easier. We can even omit layer four if we analyze voter initiatives instead of candidates. Initiatives are more straightforward as there is a simpler mapping from political stances to them. Initiatives also offer more total data points to validate (a voter in San Francisco had around 20 different initiatives on their ballot in 2024), which helps us confirm things faster.
Cost
What would it take? Let’s look at several related areas.
- I eyeballed the polls listed on Wikipedia for 2024 to total around 400k polled people.
- I saw a Quora answer indicating it costs around $25/response to run polls.
- Thus, the 2024 election saw roughly 10 million dollars of public polling. Campaigns likely spent more, but it’s hard to know.
- The 2020 US census cost an estimated 13.7 billion dollars to run.
- OpenSecrets reports that the 2024 election had 4.5 billion in outside spending. The candidates spent about 1.5 billion, for roughly 6 billion in total.
- The US market research industry seems to gross around 30-40 billion annually.
If we polled everyone who voted (roughly 150 million people in 2024), that’s nearly 4 billion dollars for each polling run. Enormous but conceivable. A sweeping campaign finance reform law could redirect the money that currently funds campaigns to this effort instead. Or the census apparatus could be utilized.
More realistically, we need to reduce costs. Again, I suggest the layered approach. The demographics layer is free. Let’s say belief polling costs 10 million every four years. Issue stance polling costs another 5 million yearly for direct polling and 10 million yearly for media analysis. The vote layer might cost another 2.5 million in computing and expertise a year, so we’re at 20 million yearly. Much better.
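The arithmetic behind that yearly figure, for inspection:

```python
# Back-of-envelope annual budget under the layered approach (figures from the text).
belief_polling = 10_000_000 / 4      # psychographics refreshed every four years
issue_polling  = 5_000_000           # direct issue-stance polling, per year
media_analysis = 10_000_000          # transcription + modeling, per year
vote_layer     = 2_500_000           # compute and expertise, per year

annual_total = belief_polling + issue_polling + media_analysis + vote_layer
print(f"${annual_total:,.0f} per year")   # -> $20,000,000 per year
```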
A motivated organization might come up with other creative approaches. What if it created a free, excellent tax prep website, and the only catch was you had to fill out an anonymous survey at the end? That ensures all surveys are linked to real SSN / demographics and gets us a lot of reach! (Perhaps we suggest Intuit just fund the polling, so this approach isn’t, erm, necessary?). Another tack might be to see if there are anonymized but still valuable data byproducts to sell to AI or marketing research companies. The media analysis proposed in layer three may also yield information campaigns that advertisers pay for, which could fund the broader effort.
Skepticism
How do you ensure the projected platforms are not too big or too ambitious? What if the system outputs campaign platforms full of politically impractical positions, and parties feel running with them will overpromise, underdeliver, and damage faith in the party? This is fine. The parties won’t just take a platform blindly and run with it; candidates will use it as inspiration and do some needed editing and customization.
How does fundraising factor in? Large donors are historically essential factors in general election success, and this system would ignore that reality. It’s a concern and I favor campaign finance reform for that reason. Still, I believe the electoral gains of this approach would overcome the setback of reduced funding.
Doesn’t the presidential election have other important factors, such as the economy? Or candidates acting as broader referendums on culture war issues? Okay, yes, this is a big problem. If people who like big trucks feel scolded by and resentful towards climate-focused leftists, a model tracking their stances on climate, Middle East policy, and offshore drilling would miss the issue driving their behavior. Possibly, such errors are uncorrelated and cancel out. Probably they don’t. But, again, the point of this system is to guide parties into running on better platforms, not to predict elections perfectly. Furthermore, we can adjust the issues layer to include stances like “How good is the current president on the economy?” or “Would your economic interests improve if a small-government fiscal conservative were elected?”. The sitting president’s approval rating with different clusters might also be factored in as a way to get at these questions indirectly.
What if both parties run nearly identical platforms but diverge on vibes and messaging? This is only bad if the system isn’t correctly modeling electorate preferences, and if it isn’t doing so, I hope an alternative effort springs up, takes the open-source data, and offers better paths to victory more in line with voters.
How vulnerable is this to preference falsification? I’m not worried about it in layers 1 or 2. However, any polls conducted to get issue stances in layer 3 are vulnerable. Preference falsification is correlated and thus unlikely to get averaged out. Careful and anonymous polling may be the only guard.
Is there really a platform that can win 70% of the popular vote in 2028? My thesis depends on this. I should partially validate it by considering counterfactual platforms that either 2024 presidential candidate could have run on to shift their vote totals considerably.
This would take several cycles of being correct before anyone trusts it enough to take it as common knowledge. True. Still, it helps directionally when we map demographics to stances. With that, political strategists can take the data and use whatever process and presentation they feel will effectively sway their party’s strategy. Constituents can point at this data anytime an elected politician makes a move most people disapprove of. All of this should notably improve our democracy and be worth the cost of admission alone.
Don’t political parties already do some of this? Yes, but they mostly use it to decide where to spend funds and perhaps which messages to pick. They do not seem to use it to pick more popular candidates.
Status
As of 2025, this idea is quite alive for me. I am currently working on refining how each layer will work. If you’ve read to the end, I am looking for skepticism, riffs, existing efforts, collaborators, and everything in between.