I think this is a fairly thoughtful document, thanks for writing it, and for sharing it here.
For current projects (changing disclosure levels): Anyone can propose changing the disclosure level of a project.
- Secret → Private: To move a project from secret to private, all members of the project and the appointed infohazard coordinator must agree.
- Private → Public: Before making public any information, all members of the project must agree. Also, members must consult external trusted sources and get a strong majority of approval.
My first impression about this is worrying.
Broadly, I'd also add another point:
Broadly I don't really know what to do about secrecy and find it very costly personally, so don't take any of my points too strongly.
Another thought:
infohazard coordinator
On first pass, I didn't pick up whether this is always Connor or can be different people in Conjecture. Anyway, I think whoever it is should consider it an active responsibility to be very responsive to anyone's requests or queries. The default thing that happens when there's a person with massive power over a project who isn't in constant contact with the project is that they slow everything down. Like if it were my job, I might be like...
Okay, I really don't know, there's a bunch of factors. I don't know if the infohazard coordinator is actually on the team of the project they're coordinator for, I don't know how many projects they're coordinator for, and I don't know how fast requests need to be answered. Nonetheless, here's the kind of rule-set I can imagine making sense.
My guess is without this sort of ruleset, without the infohazard coordinator taking on the responsibility to respond extremely quickly whenever they're blocking a research team, at some point folks will be asking "Why did Project X not get finished 2 months faster?" and the answer will be "Well it was too costly for us to get in-sync with the infohazard coordinator about who we could share on this project because they were always busy with other projects, so we ended up not sharing our work with Alice, Bob, or Charlie until much later than we otherwise would have, and each time we did we got a big speed up."
This post benefited from feedback and comments from the whole Conjecture team, as well as others including Steve Byrnes, Paul Christiano, Leo Gao, Evan Hubinger, Daniel Kokotajlo, Vanessa Kosoy, John Wentworth, Eliezer Yudkowsky. Many others also kindly shared their feedback and thoughts on it formally or informally, and we are thankful for everyone's help on this work.
Much has been written on this forum about infohazards, such as information that accelerates AGI timelines, though very few posts attempt to operationalize that discussion into a policy that can be followed by organizations and individuals. This post makes a stab at implementation.
Below we share Conjecture’s internal infohazard policy as well as some considerations that we took into account while drafting it. Our goal with sharing this on this forum is threefold:
Note that at the current level of implementation, mutual trust relies mostly on the consequence of “if you leak agreed-upon secrets your reputation is forever tarnished.” But since alignment is a small field, this seems to carry sufficient weight at current scale.
Overview and Motivation
“Infohazard” is underspecified and has been used to mean both “information that directly harms the hearer such that you would rather not hear it” and “information that increases the likelihood of collective destruction if it spreads or falls into the wrong hands.”
At Conjecture, the infohazards that we care about are those that accelerate AGI timelines, i.e., that advance the capabilities of companies, teams, or people without restraint. Due to the nature of alignment work at Conjecture, it is assured that some employees will work on projects that are infohazardous in nature, as insights about how to increase the capabilities of AI systems can arise while investigating alignment research directions. We have implemented a policy to create norms that can protect this kind of information from spreading.
The TL;DR of the policy is: Mark all internal projects as explicitly secret, private, or public. Only share secret projects with selected individuals; only share private projects with selected groups; share public projects with anyone, but use discretion. When in doubt consult the project leader or the “appointed infohazard coordinator”.
We need an internal policy like this because trust does not scale: the more people who are involved in a secret, the harder it is to keep. If there is a probability of 99% / 95% / 90% that any one person keeps all Conjecture-related infohazard secrets, then (treating each person as independent) the probability that all 30 people do so drops to 74% / 21% / 4%. This implies that if you share secrets with everyone in the company, they will leak out.
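The arithmetic behind these figures can be checked with a short sketch. It assumes each person keeps the secret independently with the same probability, which is a simplification; the function name `prob_all_keep_secret` is ours and not part of the policy.

```python
def prob_all_keep_secret(p_individual: float, n_people: int) -> float:
    """Probability that every one of n_people keeps the secret.

    Assumes each person is independent and equally reliable, so the
    joint probability is simply p_individual raised to the n-th power.
    """
    return p_individual ** n_people


# The three per-person reliabilities quoted in the text, for 30 people.
for p in (0.99, 0.95, 0.90):
    print(f"per-person {p:.0%} -> all 30 keep it: {prob_all_keep_secret(p, 30):.0%}")
```

Rounded to whole percentages this reproduces the 74% / 21% / 4% quoted above. Correlated failures would change the exact numbers, but not the direction of the effect.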
Our policy leans conservative because leaking infohazardous information could lead to catastrophic outcomes. In general, reducing the spread of infohazards means more than just keeping them away from companies or people that could understand and deploy them. It means keeping them away from anyone, since sharing information with someone increases the opportunities it has to spread.
Considerations
An infohazard policy needs to strike the right balance between what should and what should not be disclosed, and to whom. The following are a number of high level considerations that we took into account when writing our policy:
In other words, we need to balance many different considerations, not merely whether “it is an infohazard or not”.
The Policy
(Verbatim from Conjecture’s internal document.)
Introduction
This document is for Conjecture and includes employees, interns, and collaborators. Note that this policy is not retroactive; any past discussions on this subject have been informal.
This policy applies only to direct infohazards related to AGI Capabilities. To be completely clear: this is about infohazards, not PR hazards, reputational hazards, etc.; and this is about AGI capabilities.
Examples of presumptive infohazards:
1-3 are obvious. 4-5 are dangerous because they attract more attention to ideas that increase average negative externality. If in the future we want to hide more types of information that are not covered by the current policy, we should explicitly extend the scope of what is hidden.
Siloing of information and projects is important even within Conjecture. Generally any individual team member working on secret projects may disclose to others that they are working on secret projects, but nothing more.
The default mantra is “need to know”. Does this person need to know X? If not, don’t say anything. Ideally, no one that does not need to know should know how many secret projects exist, which projects people work on, and what any of those projects are about.
While one should not proactively offer that they are keeping a secret, we should strive for meta-honesty. This means that when asked directly we should be transparent that we are observing an infohazard policy that hides things, and explain why we are doing so.
Rules
There are three levels of disclosure that we will apply.
We will consider these levels of disclosure for the following types of information:
Each project that is secret or private must have an access document associated with it that lists who knows about the secret and any whitelisted information. This document is a minor infosecurity hazard, but is important for coordination.
An appointed infohazard coordinator has access to all secrets and private projects. For Conjecture, this person is Connor, and the succession chain goes Connor → Gabe → Sid → Adam. When collaborating with other organizations on a secret or private project, each organization’s appointed coordinator has access to the project. This clause ensures there is a dedicated person to discuss infohazards with, help set standards, and resolve ambiguity when questions arise. A second benefit of the coordinator is strategy: whoever is driving Conjecture should have a map of what we are working on and what we are intentionally not working on.
Leaking infohazardous information is a major breach of trust not just at Conjecture but in the alignment community as a whole. Intentional violation of the policy will result in immediate dismissal from the company. This applies to senior leadership as well. Mistakes are different from intentional leaking of infohazards.
More details on the levels of disclosure are below, and additional detail on consequences and the process for discerning if leaked information was shared intentionally or not is discussed in “Processes”.
Secret
Private
Public
Processes
1. Assigning Disclosure Levels
For new projects: Whenever a new project is spun up, the appointed infohazard coordinator and the project lead will work together to assess if the content of the project is infohazardous and if it should be assigned as secret, private, or public. Each conversation will include:
(1) what information the project covers
(2) in what forms the information about the project already exists, e.g., written, repo, AF post, etc.
(3) who knows about the project, and who should know about the project
(4) proposed disclosure level
If the project is determined to be secret or private, an access document must be created that lists who knows about the project and any whitelisted information. Any information about the project that currently exists in written form must be moved to and saved in a repository or project folder with permissions limited to those on the access document list.
Anyone can ask the appointed infohazard coordinator to start a project as a secret. The default is to accept. At Conjecture, the burden of proof is on Connor if he wants to refuse, and he must raise an objection that proves that the matter is complicated enough to not accept immediately, and might change in the future. In general, any new technical or conceptual project that seems like it could conceivably lead to capabilities progress should be created as secret by default.
(We will return to this clause after some months of trialing this policy to write better guidelines for deciding what status to assign projects).
For current projects (changing disclosure levels): Anyone can propose changing the disclosure level of a project.
When collaborating with another organization, there should be one or more individuals that both parties agree are trusted to adjudicate on the matter.
Here, the burden of proof is on the individual proposing the change, and they should discuss the matter directly with the project leader or the appointed infohazard coordinator. If the coordinator (and in most cases the project lead) agree, follow the process in “for new projects” above.
Additionally, if the project was private and if this is feasible, check in with everyone that currently has access to the information to inform them that the disclosure level is changing to secret, and have them read the infohazard policy. Each person must be added to the list of people who know about the project. If these individuals will no longer be working on the project, they should still be noted as knowing about the project, but in a separate list.
2. Sharing Information
Each project must have an access document associated with it that lists who knows about the information and what information is whitelisted to discuss more freely. This list will be kept in a folder or git repository that only members of the secret or private project have access to.
Secret information can only be shared with the individuals who are written on the access list. Anyone in a secret project may propose adding someone new to the secret. First discuss adding the individual with the project leader, and then inform all current members and give them a chance to object. If someone within the team objects, the issue is escalated to the appointed infohazard coordinator, who has the final word. If the team is in unanimous agreement, the coordinator gets a final veto (it is understood that the coordinator is supposed to only use this veto if they have private information as to why adding this person would be a bad idea).
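As an illustration only, the decision flow for adding someone to a secret project can be sketched as below. The `AccessDocument` class, the `propose_addition` function, and the reason strings are hypothetical names invented for this sketch; the policy itself describes a human process, not software. The sketch assumes the proposal has already been discussed with the project leader.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class AccessDocument:
    """Hypothetical stand-in for a secret project's access list."""
    project: str
    members: Set[str] = field(default_factory=set)


def propose_addition(doc: AccessDocument, candidate: str,
                     objections: Set[str],
                     coordinator_approves: bool) -> Tuple[bool, str]:
    """Return (admitted, reason) for adding `candidate` to the secret.

    objections: current members who objected after being informed.
    coordinator_approves: the coordinator's decision. On escalation it
    is the final word; with a unanimous team it means "no veto".
    """
    unknown = objections - doc.members
    if unknown:
        raise ValueError(f"objections from non-members: {sorted(unknown)}")

    if objections:
        # Any objection escalates to the coordinator, who decides.
        reason = "escalated: coordinator has the final word"
    else:
        # A unanimous team can still be overridden by a coordinator veto.
        reason = "unanimous: subject to coordinator veto"

    if coordinator_approves:
        doc.members.add(candidate)
    return coordinator_approves, reason
```

In both branches the coordinator's decision settles the outcome. What differs is the expectation: with a unanimous team, the veto is meant to be used only when the coordinator has private information against the addition.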
Private information can only be shared with members of groups who are written on the access list. Before sharing private information with person X, first check if the private piece of information has already been shared with someone from the same group as X. Then, discuss general infohazard considerations with X and acknowledge which select groups have access to this information. Then, notify others at Conjecture that you have shared the information with X. In case of doubt, ask first.
Public information can be talked about with anyone freely, though please be reasonable.
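The eligibility side of these three rules can be summarized in a small sketch. It covers only the "who is on the list" check, not the accompanying steps (discussing infohazard considerations, notifying others, using discretion). The names `Level` and `may_share` and the argument layout are assumptions made for illustration.

```python
from enum import Enum
from typing import Set


class Level(Enum):
    SECRET = "secret"
    PRIVATE = "private"
    PUBLIC = "public"


def may_share(level: Level, person: str, person_groups: Set[str],
              listed_individuals: Set[str], listed_groups: Set[str]) -> bool:
    """Eligibility check only; it does not replace the process above."""
    if level is Level.SECRET:
        # Secret: the individual must be named on the access list.
        return person in listed_individuals
    if level is Level.PRIVATE:
        # Private: the person must belong to at least one listed group.
        return bool(person_groups & listed_groups)
    # Public: anyone is eligible, though discretion still applies.
    return True
```

A `True` result means only that sharing is not ruled out by the access list; in case of doubt, the policy's instruction to ask first still applies.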
For all secret and private projects, by default information sharing should happen verbally and should be kept out of writing (in messages or documents) when possible.
3. Policy Violation Process
We ask present and future employees and interns to sign nondisclosure agreements that reiterate this infohazard policy. Intentional violation of the policy will result in immediate dismissal from the company. The verdict of whether the sharing was intentional or not will be determined by the appointed infohazard coordinator but be transparent to all members privy to the secret (i.e., at Conjecture, Connor may unilaterally decide, but has his reputation and trust at stake in the process).
C-suite members of Conjecture are not above this policy. This is imperative because so much of this policy relies on the trust of senior leaders. As mentioned above, the chain of succession on who knows infohazards goes Connor → Gabe → Sid → Adam; though actual succession planning is outside the scope of this document. If it is Connor who is in question for intentionally leaking an infohazard, Gabe will adjudicate the process with transparency available to members of the group privy to the secret. Because of the severity of this kind of decision, we may opt to bring in external review to the process and lean on the list of “Trusted Sources” above.
Mistakes are different from intentional sharing of infohazards. We will have particular lenience during the first few months that this policy is active as we explore how it is to live with. We want to ensure that we create as robust a policy as possible, and encourage employees to share mistakes as quickly as possible such that we can revise this policy to be more watertight. Therefore, unless the sharing of infohazardous information is particularly egregious, nobody will be fired for raising a concern in good faith.
4. Information Security and Storage
[Details of Conjecture’s infosecurity processes are - for infosecurity reasons - excluded here.]
5. Quarterly Policy Review
We will review this policy as part of our quarterly review cycle. The policy will be discussed by all of Conjecture in a team meeting, and employees will be given the opportunity to talk about what has gone well and what has not gone well. In particular, the emphasis will be on clarifying places where the policy is not clear or introduces contradictions, and adding additional rules that promote safety.
The quarterly review will also be an opportunity for Project Leaders to review access documents to ensure lists of individuals and whitelisted information for each project are up-to-date and useful.
This policy will always be available for employees at Conjecture to view and make suggestions on, and the quarterly review cycle will be an opportunity to review all of these comments and make changes as needed.
Additional Considerations
The information below is not policy, but is saved alongside Conjecture’s internal policy for employee consideration.
Example Scenarios
It is difficult to keep secrets and few people have experience keeping large parts of their working life private. Because of this, we anticipate some infohazardous information will leak due to mistakes. The following examples are common situations where infohazardous information could leak; we include potential responses to illustrate how an employee could respond.
Potential response: Consider discussing the matter in private with the project lead or appointed infohazard coordinator. If it is unknown whether information could potentially be infohazardous, it is safer to assume risk. A secret project could be spun off from the public project to investigate how infohazardous it is. If the experimental direction is safe, it could be updated to be public. If the experimental direction is infohazardous, it could stay secret. If the experimental direction is sufficiently dangerous, the formerly public project could be made secret by following the process in “Assigning Disclosure Levels” in the policy.
Potential response: Ultimately, a policy should be practical. Sharing information makes people more effective at doing alignment research. There is always a small probability that things can go wrong, but if you feel that an idea has low P(experiments result in capabilities boost) while also being additive to alignment, you can discuss it without treating it as secret. That said, if you have any doubt as to whether this is the case or not in a particular situation, see scenario (1).
Potential response: Mention the public projects. You may mention the fact that there are private and secret projects that we do not discuss, even if you are not part of any. If the individual is a member of one of the groups, you may mention the private projects the group the person belongs to is privy to.
Potential response: The fact that this is a time limited event should not change anything. One must go through the process, and the process takes time. This is a feature and not a bug. Concretely, this means you do not discuss that secret project or the ideas related to the project with that person. Feel free to learn more about how far that person is in their idea though.
Potential response: Mention this to the project lead and appointed infohazard coordinator as soon as possible before returning to the people, and discuss what to do with them. Because these situations are highly context dependent it is best to treat each on a case by case basis rather than establishing one general rule for mistakes.
Potential response: This depends on how good you are with words. If you confidently know you are good enough to hold this conversation without spilling beans, go. Else, if you have any doubt, mention this to your project lead and the appointed infohazard coordinator.
Best Practices
The following are a number of miscellaneous recommendations and best practices on infohazard hygiene. Employees should review these and consider if their current approach is in line with these recommendations.
Psychological Safety
Working on a secret project and not being able to talk about what you’re doing and thinking about can take an emotional toll. The nature of Conjecture (startup, generally young, mostly immigrants) means that for most employees, coworkers provide the majority of socialization, and a large aspect of socialization with coworkers is talking about projects and ideas.
On one hand, the difficulty of secret-keeping should be embraced. The fact that it takes an emotional toll is not coincidence, and is well aligned with reality. Mitigations against this may make things worse, and we should default towards not employing people if they have difficulty holding secrets.
On the other hand, we do not currently have the bandwidth to be perfectly selective as to who we hire and assign to secret projects. And we can’t rely on people self-reporting that they'll be incapable of holding a secret before being hired or assigned to a project. Most people don't have a good counterfactual model of themselves.
Therefore psychological safety is not just a concern for the emotional well-being of employees but also for the robustness of this policy. Someone who is feeling stressed or isolated is more likely to breach secrecy. Emotional dynamics are just as real a factor in the likelihood that secrets get shared as the number of people who know the secret. In both cases we assume human fallibility. If we only ever hired infallible people, there would be no reason to have internally siloed projects.
Potential risk factors that amplify the likelihood that an infohazard is revealed:
As such, we will consider taking some possible solutions into account with our approach to infohazardous projects such as not assigning people only to siloed projects, siloing projects between collaborators who are used to being very open with each other, or adding a trusted emotional support person to project siloes who knows only high-level and not implementation details. Note that Conjecture will not guarantee following any of these steps, and therefore this is not policy but rather general considerations.
In general, employees reading this policy should understand that mental health and psychological safety are taken seriously at Conjecture, and that if they ever have any concerns about this, they should raise them with senior management or whomever else they are comfortable speaking with. Rachel and Chris have both volunteered as confidants if individuals would prefer to express concerns to someone besides Connor, Gabe, or Sid.
An additional emotional consideration is that it should cost zero social capital to have and keep something secret. This is very much not the default without a written policy, where it often costs people social capital and additional effort to keep something secret. The goal at Conjecture is for this not to be the case, and for anyone to be able to comfortably keep things secret by default without institutional or cultural pushback. We also intend for this policy to reduce overhead (the need to figure out bespoke solutions for how to handle each new secret) and stress (the psychological burden of keeping a secret). Having access to a secret is by no means a sign of social status. In that vein, a junior engineer might have access to things that a senior engineer does not.