Tag: LLM cross-modal representation capability