Tag: Multi-stage preference optimization