VLM-IRP
Input representations for spatial reasoning in vision-language models.