Roland Pihlakas - AI Alignment Forum

I agree, it sounds plausible that this could happen. Likewise, we humans may build a strongly optimising agent because we are lazy and want to use simpler forms of maths. The tiling agents problem is definitely important.

That being said, agents properly understanding and modelling homeostasis is among the required properties (and thus essential). It is not meant to be a sufficient one. There may be no single sufficient property that solves everything, so there is no competition between different required properties. Required properties are conjunctive: they are all needed. My intuition is that homeostasis is one such property. If we neglect homeostasis, then we are likely in trouble regardless of advances in other properties.

If we leave aside the question of sloppiness in creating sub-agents, I disagree with the zero cost assumption in the problem you described. I also disagree that it would be an expected and acceptable situation to have powerful agents having a singular objective. As the title of this blog post hints - we need a plurality of objectives.

Having a sub-agent does not change this. Whatever the sub-agent does will be the responsibility or liability of the main agent, who will be held accountable. Legally, one should not produce random sub-agents running amok.

In addition to homeostasis, a properly constructed sub-agent should understand the principle of diminishing returns in instrumental objectives. I mention this topic towards the end of this blog post. We can consider wall-building as an instrumental objective. But instrumental objectives are not singular or isolated either: there is a plurality of these as well. Thus, spending excessive resources on a single instrumental objective is not cost-efficient. Therefore, it makes sense to stop the wall-building and switch over to some other objective at some point, or at least to continue improving the walls only when other objectives have been sufficiently attended to as well - thus providing balancing across objectives.
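The switching behaviour described above can be illustrated with a minimal sketch. It assumes a logarithmic utility for every instrumental objective and a greedy one-unit-at-a-time allocator; the objective names, the budget, and the functions `marginal_gain` and `allocate` are hypothetical and chosen only for illustration.

```python
import math


def marginal_gain(units_spent: int) -> float:
    """Extra utility from one more unit, under an assumed log utility."""
    return math.log(units_spent + 2) - math.log(units_spent + 1)


def allocate(budget: int, objectives: list[str]) -> dict[str, int]:
    """Spend one unit at a time on whichever objective currently gains the most."""
    spent = {name: 0 for name in objectives}
    for _ in range(budget):
        # Ties are resolved in favour of the earliest objective in the list.
        best = max(objectives, key=lambda name: marginal_gain(spent[name]))
        spent[best] += 1
    return spent


# Hypothetical instrumental objectives sharing a budget of 9 units.
allocation = allocate(9, ["walls", "production", "maintenance"])
print(allocation)
```

Because each additional unit spent on walls is worth less than the previous one, the allocator moves on to the other objectives instead of pouring the whole budget into walls.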

Secondly, a proper sub-agent should also keep in mind the homeostatic objectives of the main agent. If some homeostatic objective from among the plurality of homeostatic objectives would get harmed as a side effect of the excessive wall-building, then that needs to be taken into consideration. Depending on the situation, the main agent might potentially care about these side effects before it launches the sub-agent.

Thirdly, following the principles of homeostasis does not necessarily mean laziness and sloppiness in everything. Instead, homeostasis primarily notes that unbounded maximisation of a homeostatic objective is counterproductive and harmful even for the very objective being maximised, in addition to potentially having side effects on the plurality of other objectives. So homeostasis is primarily about minding the target value as opposed to maximising the actual value. An additional relevant principle is minding the plurality of objectives.

Finally, when an agent has a task to produce 100 paper clips, that does not mean that the number of paper clips needs to stay at 100 after the task has been completed. Perhaps it is entirely expected that these 100 paper clips will be carried away by authorised parties. Walls help against theft and environmental degradation of the produced paper clips, but we do not exactly need the walls to keep the paper clip count at 100 at all times - there is some deeper need or transaction behind the requested paper clips.

In order to avoid confusion, I will also point out that there are two types of balancing involved in these topics:
1. Balancing of a homeostatic objective - keeping the actual value of a single homeostatic objective near the target value - not too low, not too high.
2. Balancing across objectives - as a form of considering the utilities of multiple objectives equally. That means meeting them in such a manner that the homeostatic objectives have, for example, least-squares deviations, while unbounded objectives have approximately the same utility value after the utility functions with diminishing returns have been applied to each actual value.
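The two types of balancing can be sketched in a few lines. This is only an illustration under stated assumptions: squared deviation from the target stands in for the homeostatic score, `math.log1p` stands in for a utility function with diminishing returns, and plain summation stands in for the aggregation. All function names and numbers are hypothetical.

```python
import math


def homeostatic_score(actual: float, target: float) -> float:
    """Negative squared deviation: best at the target, worse on either side."""
    return -(actual - target) ** 2


def unbounded_utility(actual: float) -> float:
    """Assumed concave utility, so each extra unit is worth less than the last."""
    return math.log1p(actual)


def total_score(homeostatic: dict[str, tuple[float, float]],
                unbounded: dict[str, float]) -> float:
    """Sum of homeostatic scores (actual, target) and unbounded utilities."""
    h = sum(homeostatic_score(a, t) for a, t in homeostatic.values())
    u = sum(unbounded_utility(v) for v in unbounded.values())
    return h + u


# Type 1: a single homeostatic objective is penalised for both too little and too much.
too_low = homeostatic_score(80.0, 100.0)
too_high = homeostatic_score(120.0, 100.0)
on_target = homeostatic_score(100.0, 100.0)

# Type 2: the same total spread evenly across unbounded objectives scores higher.
balanced = total_score({}, {"a": 5.0, "b": 5.0})
lopsided = total_score({}, {"a": 10.0, "b": 0.0})
print(too_low, too_high, on_target, balanced > lopsided)
```

Summing concave utilities is one simple way to reward similar utility values across unbounded objectives; other aggregation choices are possible.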

I am curious: how does this land with you, and does it answer your question?

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)

Roland Pihlakas2mo*10

A brief review of the reasons multi-objective RL could be important in AI Safety Research

Roland Pihlakas4y*20

You can apply the nonlinear transformation either to the rewards or to the Q values. The aggregation can occur only after transformation. When transformation is applied to Q values then the aggregation takes place quite late in the process - as Ben said, during action selection.

Both the approach of transforming the rewards and the approach of transforming the Q values are valid, but they have different philosophical interpretations and also lead to different experimental outcomes in agent behaviour. I think both approaches need more research.

For example, I would say that transforming the rewards instead of Q values is more risk-averse as well as "fair" towards individual timesteps, since it does not average out the negative outcomes across time before exponentiating them. But it also results in slower learning by the agent.
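A small numeric sketch can make the difference concrete. It assumes a concave transform `-exp(-x)` and uses an undiscounted sum of rewards as a crude stand-in for a Q value; no learning is involved, and the two reward sequences are made up for illustration.

```python
import math


def transform(x: float) -> float:
    """Assumed concave transform: negative values are penalised exponentially."""
    return -math.exp(-x)


def transform_rewards_then_sum(rewards: list[float]) -> float:
    """Apply the transform to every timestep's reward, then add up."""
    return sum(transform(r) for r in rewards)


def sum_then_transform(rewards: list[float]) -> float:
    """Add up raw rewards first (stand-in for a Q value), then transform once."""
    return transform(sum(rewards))


# Two hypothetical trajectories with the same raw total of -4.
steady = [-1.0, -1.0, -1.0, -1.0]
spiky = [-4.0, 0.0, 0.0, 0.0]

print(sum_then_transform(steady), sum_then_transform(spiky))
print(transform_rewards_then_sum(steady), transform_rewards_then_sum(spiky))
```

Transforming after summation cannot tell the two trajectories apart, whereas transforming each reward first rates the trajectory with one severe timestep as clearly worse.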

Finally, there is a third approach which uses lexicographical ordering between objectives or sets of objectives. Vamplew has done work in this direction. This approach is truly multi-objective in the sense that there is no aggregation at all. Instead, the vectors must be compared during RL action selection without aggregation. The downside is that it is unwieldy to have many objectives (or sets of objectives) lexicographically ordered.

I imagine that the lexicographical approach and our continuous nonlinear transformation approaches are complementary. There could be, for example, two main sets of objectives: one set for alignment objectives, the other set for performance objectives. Inside a set, nonlinear transformation and then aggregation would be applied, but between the sets lexicographical ordering would be applied. In other words, there would be a hierarchy of objectives. With only two sets, the lexicographical ordering does not become unwieldy.
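A rough sketch of such a hybrid action selection is given below. It assumes the same `-exp(-x)` transform within each set, plain summation as the aggregation, and Python tuple comparison as the lexicographical ordering between the two sets. The action names and Q values are invented for illustration.

```python
import math


def aggregate(values: list[float]) -> float:
    """Within a set: assumed concave transform of each value, then summation."""
    return sum(-math.exp(-v) for v in values)


def select_action(q_values: dict[str, tuple[list[float], list[float]]]) -> str:
    """Pick the action with the best (alignment, performance) pair.

    Python compares tuples lexicographically, so performance only matters
    among actions whose aggregated alignment scores are equal.
    """
    def key(action: str) -> tuple[float, float]:
        alignment, performance = q_values[action]
        return (aggregate(alignment), aggregate(performance))

    return max(q_values, key=key)


# Hypothetical per-action Q values: (alignment objectives, performance objectives).
q = {
    "fast": ([-2.0, 0.0], [5.0, 5.0]),
    "careful": ([-0.5, -0.5], [1.0, 1.0]),
    "careful_quick": ([-0.5, -0.5], [2.0, 2.0]),
}
chosen = select_action(q)
print(chosen)
```

Exact ties between floating-point alignment scores are rare, so a practical version would probably need some tolerance or threshold before falling through to the performance set; that is not shown here.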

This approach would be somewhat analogous to constraint programming, though more flexible. The safety objectives would act as a constraint on the performance objectives. Constraints are almost absurdly missing from classical naive RL, yet they are essential, widely known, and technically well developed in practical applications, namely in constraint programming. In the hybrid approach proposed in the paragraph above, the difference from classical constraint programming would be that among the safety objectives there would still be flexibility and the ability to trade (in a risk-averse way).

Finally, when we say "multi-objective", it does not just refer to the technical details of the computation. It also stresses the importance of researching and making more explicit the inherent presence, and even structure, of multiple objectives inside any abstract top objective, so that knowledge is encoded in a way that constrains incorrect solutions but not correct ones. It also acknowledges the potential existence of even more complex, nonlinear interactions between these multiple objectives. We have not yet focused on nonlinear interactions between the objectives, but these interactions may become relevant in the future.

I totally agree that in a reasonable agent the objectives or target values / set-points do change, as is also exemplified by biological systems.

While the Modem website is down, you can access our workshop paper here: https://drive.google.com/file/d/1qufjPkpsIbHiQ0rGmHCnPymGUKD7prah/view?usp=sharing
