Aligning to Virtues

Feb 16

Which alignment target?

5 Comments

I like the critiques of consequentialism and deontological principles, but "virtue" still feels like a stand-in for the flexible, nuanced, elusive thing we want AIs aligned to, like conceptual negative space.

sepiatone

Feb 23

Would Anthropic's move from the earlier Constitutional AI to the current "soul" document approach be considered moving from a purely deontological (obey rules) approach to a proto-virtue approach (express certain traits like honesty, kindness, helpfulness, and possess a stable disposition)?

Kurt Pieper

Jun 21

The problem is that virtue ethics, as opposed to consequentialism, underspecifies behaviour, i.e. even after an ASI is made virtuous, there would still be conflict about what *exactly* it should do. Furthermore, power-seekers are unfortunately much more plausible and favored in Darwinian terms.

Ryan Baker

May 23

I feel like that's a critique of a caricature of consequentialism. It's still useful, since someone may try to implement a caricature and that deserves critique. I think virtues have a difficult time existing separately from consequentialism. The space between them, the flex: what fills that in? It seems like reasoning about consequences.

That said, I think virtues are a useful entry point here. If depth of reasoning isn't achieved, virtues are a great way to add some depth and constrain simplistic failures. Virtues still allow for complex failures, but these may be easier to self-detect (or at least to detect the risk of), and they can steer failures toward less harmful or more correctable outcomes (mostly waiting rather than acting).

I think the Anthropic constitution approach is in many ways an expression of this.

Shadow Rebbe

May 6

https://www.meaningalignment.org/

Are you aware of their work?

Mind the Future
