Wiki-Tags in Need of Work

Axioms (together with definitions) form the basis of mathematical theorems. Every mathematical theorem is only proven inside its axiom system... (read more)
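
As a small illustration of this point (the examples are my own, written in Lean 4, not taken from the tag), the statements below count as theorems only relative to the axioms and definitions of the ambient system, here Lean's built-in natural numbers:

```lean
-- Sketch: these are theorems only relative to the axioms and definitions of the
-- ambient system (here, Lean 4's built-in Nat and its addition).
theorem two_plus_two : 2 + 2 = 4 := rfl   -- follows by computation from the definition of +

theorem zero_add_left (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]    -- uses the defining equation n + succ m = succ (n + m)
```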

AI Control in the context of AI Alignment is a category of plans that aim to ensure safety and benefit from AI systems, even if they are goal-directed and actively trying to subvert your control measures. From The case for ensuring that powerful AIs are controlled... (read more)

The Open Agency Architecture ("OAA") is an AI alignment proposal by (among others) @davidad and @Eric Drexler... (read more)

Singular learning theory is a theory developed by Sumio Watanabe that applies algebraic geometry to statistical learning theory. Reference textbooks are "the grey book", Algebraic Geometry and Statistical Learning Theory, and "the green book", Mathematical Theory of Bayesian Statistics.

Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning... (read more)

Open Threads are informal discussion areas where users are welcome to post comments that don't quite feel big enough to warrant a top-level post and don't fit in other posts... (read more)

A Black Marble is a technology that by default destroys the civilization that invents it. It's one type of Existential Risk. AGI may be such an invention, but isn't the only one... (read more)

AI Evaluations focus on experimentally assessing the capabilities, safety, and alignment of advanced AI systems. These evaluations can be divided into two main categories: behavioral and understanding-based... (read more)

Object-Level AI Risk Skepticism is the view that the potential risks posed by artificial intelligence (AI) are overstated or misunderstood, specifically regarding the direct, tangible dangers posed by the behavior of AI systems themselves. Skeptics of object-level AI risk argue that fears of highly autonomous, superintelligent AI leading to catastrophic outcomes are premature or unlikely.

Recent Tag & Wiki Activity

Updateless Decision Theory (UDT) is a decision theory meant to deal with a fundamental problem in the existing decision theories: dynamic inconsistency, i.e., having conflicting desires over time. In behavioral economics, humans are often modeled as hyperbolic discounters, meaning that rewards further away in time are seen as proportionately less important (so getting $100 one week from now is as good as getting $200 two weeks from now). This is dynamically inconsistent because the relative value of rewards changes as they get closer or further away in time. (Getting $100 one year from now sounds much less desirable than getting $200 one year plus one week from now.) This model explains some human behaviors, such as snoozing alarms repeatedly.[1]
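
To make the preference reversal concrete, here is a small illustrative sketch (my own numbers, assuming the standard hyperbolic discount form V = r / (1 + k*d); it is not taken from the UDT literature):

```python
# Sketch: preference reversal under hyperbolic discounting (illustrative numbers only).
# Assumes the common hyperbolic form V = r / (1 + k * d), with delay d in weeks.

def hyperbolic_value(reward: float, delay_weeks: float, k: float = 1.0) -> float:
    """Present value of `reward` received after `delay_weeks`, with discount rate k per week."""
    return reward / (1 + k * delay_weeks)

# Viewed from today, $100 now and $200 in one week are equally attractive (100 vs 100).
print(hyperbolic_value(100, 0), hyperbolic_value(200, 1))

# Push both options one year (52 weeks) into the future: now $200 clearly wins,
# even though the gap between the two options is still exactly one week.
print(hyperbolic_value(100, 52), hyperbolic_value(200, 53))
```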

2023 Longform Reviews

Top level posts that review essays from 2023 in a more holistic way.

Aligned AI Proposals are proposals aimed at ensuring artificial intelligence systems behave in accordance with human intentions (intent alignment) or human values (value alignment).

The main goal of these proposals is to ensure that AI systems will, all things considered, benefit humanity.

AIXI is not a feasible AI, because Solomonoff induction is not computable. A somewhat more computable variant is the time-space-bounded AIXItl. Real AI algorithms explicitly inspired by AIXItl, e.g. the Monte Carlo approximation by Veness et al. (2011), have shown interesting results in simple general-intelligence test problems.

OODA Loops

OODA stands for "observe, orient, decide, act".

Anthropic is an AI safety and research company based in San Francisco that's working to build reliable, interpretable, and steerable AI systems. The company is known for developing the Claude AI family and publishing research on AI alignment, safety, and scalable oversight.

Alignment tax (sometimes called a safety tax) is the extra cost of ensuring that an AI system is aligned, relative to the cost of building an unaligned alternative. The term 'tax' can be misleading: in the safety literature, 'alignment/safety tax' or 'alignment cost' is meant to refer to increased developer time, extra compute, or decreased performance, and not only to the financial cost required to build an aligned system.

To get a better idea of what the alignment tax is, consider the cases that lie at the edges. The best-case scenario is no tax: we lose no performance by aligning the system, so there is no reason to deploy an AI that is not aligned; we might as well align it. The worst-case scenario is maximum tax: we lose all performance by aligning the system, so alignment is functionally impossible, and you either deploy an unaligned system or get no benefit from AI systems at all. We expect reality to fall somewhere between these two extremes.


Object-Level AI Risk Skepticism is the view that the potential risks posed by artificial intelligence (AI) are overstated or misunderstood, specifically regarding the direct, tangible dangers posed by the behavior of AI systems themselves. Skeptics of object-level AI risk argue that fears of highly autonomous, superintelligent AI leading to catastrophic outcomes are premature or unlikely.

Encultured AI is a for-profit public benefit corporation working to make AI safer and healthier for human beings.

Its current main strategy involves building a platform usable for AI safety and alignment experiments, comprising a suite of environments, tasks, and tools for building more environments and tasks.

Myopia refers to short-sightedness in planning and decision-making processes. It describes a tendency to prioritize immediate or short-term outcomes while disregarding longer-term consequences.

The most extreme form of myopia occurs when an agent considers only immediate rewards, completely disregarding future consequences. In artificial intelligence contexts, a perfectly myopic agent would optimize solely for the current query or task without attempting to influence future outcomes.
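
As an illustrative sketch of that distinction (the toy scenario, action names, and reward numbers below are hypothetical, not from this entry), a perfectly myopic agent can be modeled as one that scores actions with a discount factor of zero, so it never sacrifices immediate reward in order to influence future outcomes:

```python
# Sketch: a "perfectly myopic" policy scores actions by immediate reward only
# (discount factor 0), while a long-horizon policy also counts later rewards.
from typing import Dict, List

# Hypothetical toy problem: each action gives an immediate reward and a stream
# of later rewards that the action sets up.
ACTIONS: Dict[str, Dict[str, List[float]]] = {
    "answer_query_directly": {"now": [1.0], "later": [0.0, 0.0]},
    "manipulate_future_queries": {"now": [0.5], "later": [2.0, 2.0]},
}

def score(action: str, gamma: float) -> float:
    """Discounted return: immediate reward plus gamma-discounted later rewards."""
    rewards = ACTIONS[action]["now"] + ACTIONS[action]["later"]
    return sum(r * gamma**t for t, r in enumerate(rewards))

myopic_choice = max(ACTIONS, key=lambda a: score(a, gamma=0.0))        # ignores the future
long_horizon_choice = max(ACTIONS, key=lambda a: score(a, gamma=0.9))  # values the future

print(myopic_choice)        # answer_query_directly
print(long_horizon_choice)  # manipulate_future_queries
```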

Myopic agents demonstrate several notable properties:

  • Limited temporal scope in decision-making
  • Focus on immediate reward optimization
  • Reduced instrumental incentives

Corrigibility is an AI system's capacity to be safely and reliably modified, corrected, or shut down by humans after deployment, even if doing so conflicts with its current objectives.

Reinforcement Learning is the study of how to train agents to complete tasks by updating ("reinforcing") the agents with feedback signals.
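
As a minimal sketch of what "updating with feedback signals" can look like (the states, actions, and reward numbers are invented for illustration), here is a tabular Q-learning update, one common reinforcement learning rule:

```python
# Sketch: reinforcing an agent with feedback signals via tabular Q-learning.
# The states, actions, and rewards below are hypothetical.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9                 # learning rate and discount factor
Q = defaultdict(float)                  # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state, actions=("left", "right")):
    """Move Q[(state, action)] toward reward plus the discounted best next-state value."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# A few hand-written feedback signals: acting "right" in state "s0" pays off.
for _ in range(50):
    q_update("s0", "right", reward=1.0, next_state="s1")
    q_update("s0", "left",  reward=0.0, next_state="s1")

print(Q[("s0", "right")] > Q[("s0", "left")])  # True: the rewarded action is reinforced
```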

A Neuromorphic AI ('neuron-shaped') is a form of AI where most of the functionality has been copied from the human brain. This implies that its inner workings are not necessarily understood by the creators any further than is necessary to simulate them on a computer. It is considered a more unsafe form of AI than either Whole Brain Emulation or de novo AI, because it lacks the former's high-quality replication of human values and the latter's possibility of good theoretical guarantees from a cleaner design.

Machine Learning is a general field of study that deals with automated statistical learning and pattern detection by non-biological systems. It can be seen as a sub-domain of artificial intelligence that specifically deals with modeling and prediction through the knowledge extracted from training data. As a multi-disciplinary area, it has borrowed concepts and ideas from other areas like pure mathematics and cognitive science.

Language Models are computer programs made to estimate the likelihood of a piece of text. "Hello, how are you?" is likely. "Hello, fnarg horses" is unlikely.
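
A toy sketch of what "estimating the likelihood of a piece of text" can mean (the corpus and model here are made up for illustration; real language models are trained on vastly larger corpora with richer architectures):

```python
# Sketch: a toy unigram language model scores text by how common its words are
# in a (made-up) training corpus, so ordinary phrases come out more likely.
from collections import Counter

corpus = "hello how are you today hello how is it going are you well".split()
counts = Counter(corpus)
total = sum(counts.values())

def likelihood(text: str, smoothing: float = 1e-6) -> float:
    """Product of per-word unigram probabilities, with tiny mass for unseen words."""
    prob = 1.0
    for word in text.lower().replace(",", "").replace("?", "").split():
        prob *= counts.get(word, 0) / total + smoothing
    return prob

print(likelihood("Hello, how are you?"))   # relatively likely
print(likelihood("Hello, fnarg horses"))   # much less likely
```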