Discussion about this post

Davey

I like the critiques of consequentialism and deontological principles, but "virtue" still feels like a stand-in for the flexible, nuanced, elusive thing we want AIs aligned to. Like conceptual negative space.

Ryan Baker

I feel like that's a critique of a caricature of consequentialism. It's still useful, since someone may try to implement a caricature, and that deserves critique. I think virtues have a difficult time existing separately from consequentialism. The space between them, the flex: what fills that in? It seems like it's reasoning about consequences.

That said, I think virtues are a useful entry point here. If depth of reasoning isn't achieved, virtues are a great way to add some depth and constrain simplistic failures. Virtues still allow for complex failures, but these may be easier to self-detect, or at least their risk may be, and virtues can steer failures toward less harmful or more correctable outcomes (mostly by waiting rather than acting).

I think Anthropic's constitution approach is in many ways an expression of this.
