Using Points to Rate Different Kinds of Evidence
There’s a lot of discussion on the EA Forum and LessWrong about epistemics, evidence, and updating.
I don’t know of many attempts at formalizing our thinking here into concrete tables or equations. Here is one (very rough and simplistic) attempt. I’d be excited to see much better versions.
Equation
Initial Points
Scientific Evidence
- 20 - A simple math proof proves X
- 8 - A published scientific study in Economics supporting X
- 6 - A published scientific study in Psychology supporting X
Market Prediction
- 14 - Popular stock markets strongly suggest X
- 11 - Prediction markets claim X, with 20 equivalent hours of research
- 10 - A poll shows that 90% of LessWrong users believe X
- 6 - Prediction markets claim X, with one equivalent hour of research
Expert Opinion
- 8 - An esteemed academic believes X, where it’s directly in their line of work
- 6 - The author has strong emotions about X
Reasoning
- 6 - There's a (20-100 node) numeric model that shows X
- 5 - A reasonable analogy between X and something clearly good/bad
- 4 - A long-standing proverb
Personal Accounts
- 5 - The author claims a long personal history that demonstrates X
- 3 - Someone in the world has strong emotions about X
- 2 - A clever remark, meme, or tweet
- 2.3 - An insanely clever remark, meme, or tweet
- 0 - Believing X is claimed to be personally beneficial
Tradition / Use
- 12 - Top businesses act as if X
- 8 - A long-standing social tradition about X
- 5 - A single statistic about X
Point Modifiers
Is this similar to existing evidence?
Subtract the overlap with existing evidence from the points this evidence would otherwise add. This will likely remove most of its value.
Is it convenient for the source to believe or say X?
-10% to -90%
Is there a lot of money or effort put behind spreading this evidence? For example, as an advertising campaign?
+5% to +40%
How credible is the author or source?
-100% to +30%
Do we suspect the source is goodharting on this scale?
-20%
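To make the modifiers concrete, here is a minimal sketch of one way they could be combined. The post does not specify how modifiers stack, so this assumes the similarity overlap is subtracted first and the percentage modifiers are then applied multiplicatively. The names `Evidence` and `adjusted_points` are hypothetical, and the specific modifier values in the example are picked from the ranges above purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Evidence:
    description: str
    initial_points: float
    # Points already covered by similar existing evidence (assumed unit: points).
    overlap_points: float = 0.0
    # Percentage modifiers as fractions, e.g. -0.5 for -50%, 0.2 for +20%.
    modifiers: Sequence[float] = ()


def adjusted_points(evidence: Evidence) -> float:
    """Subtract overlap, then apply each modifier multiplicatively.

    The multiplicative stacking is an assumption, not something the
    point table itself prescribes.
    """
    points = max(0.0, evidence.initial_points - evidence.overlap_points)
    for modifier in evidence.modifiers:
        points *= 1.0 + modifier
    # A -100% credibility modifier zeroes the evidence; never go negative.
    return max(0.0, points)


# A published Economics study (8 points) that is convenient for its
# source to claim (-50%) but comes from a fairly credible author (+20%).
study = Evidence(
    "Published study in Economics supporting X",
    initial_points=8,
    modifiers=(-0.5, 0.2),
)
print(round(adjusted_points(study), 2))
```

Additive stacking would be an equally defensible choice. The multiplicative version is used here only because it lets a -100% credibility modifier wipe out the evidence regardless of the other adjustments.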
Points, In Practice
Evidence Points, as outlined, are not trying to mimic mathematical bits of information or another clean existing unit. I attempted to find a compromise between accuracy and ease of use.
Meta
Using an Equation for Discussion
The equation above is rough, but at least it’s (somewhat) precise and upfront. It packs in a lot of information, and any part of it can easily be argued against.
I think such explicitness could help with epistemic conversations.
Compare:
“Smart people should generally use their inside view, instead of the outside view” vs. “My recommended points scores for inside-view style evidence, and my point scores for outside-view style evidence, are all listed below.”
“Using many arguments is better than one big argument” vs. “I’ve adjusted my point table function to produce higher values when multiple types of evidence are provided. You can see it returns values 30% higher than what others have provided for certain scenarios.”
“It’s really important to respect top [intellectuals|scientists|EAs]” vs. “My score for respected [intellectuals|scientists|EAs] is 2 points higher than the current accepted average.”
“Chesterton’s Fence is something to pay a lot of attention to” vs. “See my score table for the points I assign to various kinds of traditional practices.”
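As an illustration of the second comparison, a "point table function" that rewards multiple types of evidence might look something like the sketch below. Everything here is an assumption: the function name `total_points`, the `diversity_bonus` parameter, and the rule of adding a fixed percentage per additional evidence category are all hypothetical choices, not something the table above defines.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def total_points(
    items: Iterable[Tuple[str, float]],
    diversity_bonus: float = 0.1,
) -> float:
    """Sum (category, points) pairs, boosting the total when the
    evidence spans several categories.

    diversity_bonus is the assumed fractional boost applied for each
    category beyond the first.
    """
    by_category = defaultdict(float)
    for category, points in items:
        by_category[category] += points
    base = sum(by_category.values())
    extra_categories = max(0, len(by_category) - 1)
    return base * (1.0 + diversity_bonus * extra_categories)


# The same 24 points, first from one category, then spread over three.
one_kind = [("Scientific Evidence", 8)] * 3
three_kinds = [
    ("Scientific Evidence", 8),
    ("Expert Opinion", 8),
    ("Tradition / Use", 8),
]
print(total_points(one_kind) < total_points(three_kinds))
```

Two people who disagree about "many arguments versus one big argument" could then argue directly about the value of `diversity_bonus` rather than about the slogan.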
In a better world, different academic schools of thought could have their own neatly listed tables/functions. In an even better world, all of these functions would be forecasts of future evaluators.
Presumptions
This sort of point system makes some presumptions that might be worth flagging. For example, it claims that even really poor evidence is evidence.
I often see people throwing out low-information evidence as completely worthless. I think this take is misguided. The antidote to a poor argument isn’t to mistakenly claim that the argument is entirely worthless - it’s to provide a better argument.
Agreeing on an evidence-weighing algorithm before direct discussions
In classical debate, after choosing a side, debaters will talk up the importance of the sorts of evidence that they might happen to have and dismiss the sorts of evidence their opponents bring up.
This becomes particularly gnarly when a group (like a political interest) goes through a long list of heated discussions on different topics. In each, they’re likely to gerrymander their effective evidence points rankings in order to most benefit their argument.
An obvious epistemic improvement would be for parties to declare consistent epistemologies upfront. An even better state might involve parties agreeing on some shared aggregation of their epistemic preferences. Aggregate epistemics, not policy beliefs.
In some worlds, intellectuals would spend most of their time improving epistemic processes and revealing unbiased evidence. A book about tax reform might be rewarded based on how many total points of evidence it brings up, regardless of which side of the debate those points are on.
Isn’t this too complicated and speculative?
As long-time readers will know, I’m a big fan of attempting to measure highly speculative concepts. My guess is that explicit, speculative models are often preferable to standard text discussions. There’s a potential danger that some people might over-trust these numbers because they are numbers. However, the alternative to modeling is often “lots of blog posts with different undefined ontologies and tons of misunderstanding,” so I think this is often a reasonable tradeoff.
One great thing about models is that you can improve them. As we get more evidence and opinions, I’m hopeful that models will eventually emerge that wind up being pretty okay. If you kill mediocre attempts, you likely eventually kill decent ones, too.
Future Work
This is basic for now, but I think it illustrates a worthy goal. Potential future work would include:
- If you’re reading this, post your own list! It would be good to get thoughts there.
- Organized surveys in which different groups assign points to these things.
- Instead of specific points, use probability densities for each. Even better would be functions - for example, if survey data is used, there could be a function that takes in the number and quality of the respondents and outputs a corresponding point value.
- Use ML to come up with the algorithm. Its algorithm might be very complicated, but it could be helpful even if it were a black box.
- Assign points to an extensive list of concrete examples of evidence. For example, “How many points of evidence do you think Tweet X provides for claim Y?”
- Have people forecast what experts will think, maybe using something like Squiggle.
- As you have functions that people (generally) trust, use automatic evaluations on evidence. On websites, display these points wherever evidence is presented. Reward people/analysts as a function of the points they have discovered.
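The idea of replacing fixed points with functions can be sketched for the survey case mentioned in the list above. This is only one possible shape: the function name `survey_points`, the 0-to-1 `quality` scale, the `max_points` ceiling, and the saturating curve with its `half_saturation` parameter are all assumptions made for illustration.

```python
def survey_points(
    n_respondents: int,
    quality: float,
    max_points: float = 10.0,
    half_saturation: float = 100.0,
) -> float:
    """Map survey size and respondent quality to evidence points.

    quality is an assumed 0-to-1 rating of the respondents.
    The size term n / (n + half_saturation) rises quickly at first and
    then flattens, so the 1,000th respondent adds far less than the 10th.
    """
    if n_respondents < 0:
        raise ValueError("n_respondents must be non-negative")
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be between 0 and 1")
    size_term = n_respondents / (n_respondents + half_saturation)
    return max_points * quality * size_term


# With the defaults, 100 top-quality respondents earn half the ceiling.
print(survey_points(100, quality=1.0))
```

A version using probability densities could return a distribution over points instead of a single number; the single-valued form is kept here for simplicity.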
Thanks to Nuño Sempere for discussion on this topic. You can also see a short attempt by him on something similar here.
Afterword: Quick Attempts by LLMs
I fed this post to Claude and ChatGPT to have them provide simple takes.
Claude
Scientific Evidence
- Meta-analysis of multiple randomized controlled trials (n>10,000) - 10 points
- Large randomized controlled trial (n>1,000) - 9 points
- Observational or cohort study (n>100,000) - 8 points
- Smaller randomized trial (n<1,000) - 7 points
- Data from international space agency - 7 points
- Government census data - 6 points
- Survey by reputable organization (n>1,000) - 6 points
- Computational simulation model - 5 points
- Observational study (n<100,000) - 5 points
- Survey (n<1,000) - 4 points
Expert Opinion
- Consensus of leading experts in field - 8 points
- Opinion of renowned thought leader in field - 7 points
- Tenured academic in field - 6 points
- Industry practitioner with 10+ years experience - 5 points
Reasoning
- Simple logical deduction from accepted premises - 5 points
- Complex logical reasoning with multiple steps - 4 points
- Common sense heuristic - 3 points
- Speculative theory - 2 points
Records
- Government records - 6 points
- Historical documents - 5 points
- Personal diary/journal - 3 points
- Unverified personal account - 2 points
- Anonymous internet claim - 1 point
Culture
- Widespread social tradition - 5 points
- Popular opinion or "conventional wisdom" - 4 points
- Commentary in major newspaper - 3 points
- Remark from comedian/entertainer - 2 points
- Meme or viral tweet - 1 point
GPT-4
Experimental Evidence (Evidence from controlled experiments or scientific studies)
- Randomized Controlled Trial (RCT) results: 25 points
- Meta-analysis of multiple RCTs: 23 points
- Laboratory experiment results: 20 points
- Observational study with a large sample size: 15 points
- Case-control study: 12 points
Empirical Evidence (Evidence based on observation or experience)
- Longitudinal study: 14 points
- Cross-sectional study: 12 points
- Field observations by non-experts: 9 points
- Informal surveys: 7 points
- Anecdotal observations with potential biases: 5 points
Expert Testimony and Opinion
- Consensus of experts in the field: 20 points
- Peer-reviewed publication: 19 points
- Opinion of a recognized expert in the field: 16 points
- Opinion of a professional not specifically in the field: 9 points
- Opinion of a well-read layperson: 6 points
Anecdotal and Personal Accounts
- Detailed case study: 10 points
- Personal testimonies verified by third-party: 8 points
- Unverified personal accounts: 5 points
- Hearsay or second-hand accounts: 4 points
- Popular myths or urban legends: 3 points
Historical and Archival Evidence
- Primary historical documents (e.g., treaties, letters): 18 points
- Secondary sources or historical analysis: 15 points
- Oral histories or traditions: 9 points
- Conspiracy theories based on historical events: 3 points
Logical and Theoretical Evidence
- Proven mathematical or logical theorem: 22 points
- Philosophical argument based on accepted premises: 13 points
- Hypothetical scenarios or thought experiments: 9 points
- Personal reasoning without validation: 6 points
Public Opinion and Mass Media
- Large-scale, representative opinion poll: 14 points
- News reports from lesser-known outlets: 8 points
- Opinion pieces or editorials: 7 points
- Rumors or unverified news: 4 points
- Social media trends or viral content: 5 points
- Individual's post or comment on social media: 2 points
Miscellaneous
- Personal intuition or gut feeling: 4 points
- Popular sayings or proverbs: 6 points
- Unverified quotes: 5 points
- Popular memes or jokes: 3 points
This system offers a broader range of evidence types, especially those under 10 points, to capture the nuances of everyday information sources.