AI Daily – 2025-05-23 (Evening)
Tags: agent, AGENTIF benchmark test, AI Model, ASL-3 safety rating, Claude 4 Behavior and Safety Evaluation Report, Claude 4 Opus, coding capability, Multimodal, multimodal time-series large model ChatTS, safety evaluation, Sonnet 4, SWE-bench Verified score