Evaluation Consent Policies
Epistemic Status: Early idea
A common challenge in nonprofit and project evaluation is the tension between social norms and honest assessment. We've seen effective altruists reluctant to publicly rate certain projects for fear of upsetting someone.
One potential tool is something like an "Evaluation Consent Policy."
For example, for a certain charitable project I produce, I'd explicitly consent to anyone online, including friends and enemies, candidly reviewing it to their heart's content. They'd be free to use methods like LLMs to do so.
Such a policy can give limited consent. For example:
- You can't break laws when doing this evaluation
- You can't lie/cheat/steal to get information for this evaluation
- Consent is only provided for under 3 years
- Consent is only provided starting in 5 years
- Consent is "contagious" or has a "share-alike provision". Any writing that takes advantage of this policy, must itself have a consent policy that's at least as permissive. If someone writes a really bad evaluation, they agree that you and others are correspondingly allowed to critique this evaluation.
- The evaluation must score less than 6/10 when Claude is run on it with a prompt roughly asking, "Is this piece written in a way that's unnecessarily inflammatory?" (A rough sketch of such a check appears after this list.)
- Consent can be limited to a certain group of people. Perhaps you reject certain inflammatory journalists, for example. (Though these might be the people least likely to care about getting your permission anyway.)
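To make the "unnecessarily inflammatory" check a bit more concrete, here's a minimal Python sketch using the Anthropic SDK. The prompt wording, model name, 0-10 scale, and the `inflammatory_score` / `satisfies_consent_policy` helpers are all hypothetical placeholders, not part of any real policy.

```python
# A minimal sketch of the "unnecessarily inflammatory" check from the list above.
# Assumes the Anthropic Python SDK is installed and ANTHROPIC_API_KEY is set.
import re

from anthropic import Anthropic

client = Anthropic()

SCORING_PROMPT = (
    "On a scale of 0 to 10, is the following piece written in a way that's "
    "unnecessarily inflammatory? Reply with a single integer only.\n\n{evaluation}"
)


def inflammatory_score(evaluation_text: str) -> int:
    """Ask Claude for a 0-10 'unnecessarily inflammatory' rating (hypothetical rubric)."""
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=10,
        messages=[
            {"role": "user", "content": SCORING_PROMPT.format(evaluation=evaluation_text)}
        ],
    )
    reply = message.content[0].text
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError(f"Could not parse a score from: {reply!r}")
    return int(match.group())


def satisfies_consent_policy(evaluation_text: str, max_score: int = 6) -> bool:
    """The policy term above: the evaluation must score less than 6/10."""
    return inflammatory_score(evaluation_text) < max_score
```

A real policy would presumably need to pin down the exact prompt and model version, so that authors and evaluators are scoring against the same rubric.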
This would work a lot like Creative Commons or software licenses. However, it would cover different territory, and (at this point at least) it wouldn't be based on legal enforcement.
Potential Uses
I'm considering asking a few organizations to provide certain consent for several of their projects. One potential outcome is a public dataset: a limited but varied list of projects that are marked as explicitly open for public analysis. Perhaps various AI agents could evaluate these projects at different times, and then we could track how much different agents agree with one another. There are clearly a lot of important details to figure out regarding how this would work, but having a list of available, useful, and relevant examples seems like a decent starting point.
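As a toy illustration of that dataset and the agreement tracking, here's a small Python sketch. All project names, agent names, and scores are invented, and the agreement measure (pairwise correlation of overall scores) is just one simple option among many.

```python
# Toy sketch: a tiny dataset of opted-in projects with per-agent 0-10 scores,
# plus a simple pairwise-agreement measure across agents. All values are made up.
from itertools import combinations
from statistics import correlation  # Pearson correlation; requires Python 3.10+

evaluations = {
    "project-a": {"agent-1": 7, "agent-2": 6, "agent-3": 8},
    "project-b": {"agent-1": 4, "agent-2": 5, "agent-3": 3},
    "project-c": {"agent-1": 9, "agent-2": 8, "agent-3": 9},
}

agents = sorted({agent for scores in evaluations.values() for agent in scores})

# Pairwise agreement: how correlated are two agents' scores across all projects?
for a, b in combinations(agents, 2):
    scores_a = [evaluations[project][a] for project in evaluations]
    scores_b = [evaluations[project][b] for project in evaluations]
    print(f"{a} vs {b}: r = {correlation(scores_a, scores_b):.2f}")
```

Even something this simple, tracked over time, could say something about how stable and agent-independent these evaluations turn out to be.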
Potential Criticisms
"Why do we need this? People are already allowed to critique anything they want."
While this is technically true, I think such critiques would frequently break social norms. There are a lot of cases where people would get upset if their projects received any negative critique, even if it came alongside positive points. This policy would act as a signal that the owners might be particularly okay with critique. I think we live in a society that's far from maximally candid, and it's often difficult to tell where candidness would be accepted, so explicit communication could be useful.
"But couldn't people who sign such a policy just attack evaluators anyway?"
I don't think an explicit policy here would be a silver bullet, but I think it would help. I expect that a boss known for being cruel wouldn't be trusted if they provided such a policy, but I imagine many other groups would be. Ideally there could be some common knowledge about which people and organizations fail to properly honor their policies. I don't think this would work that well for Open Philanthropy (in the sense that effective altruists might expect OP not to complain publicly, but to later decline to fund the writer's future projects), but it could work for many smaller orgs that have much less behind-the-scenes power over public evaluators and writers.