Large language models are evolving beyond their early unimodal days, when they could process only one type of data input. Interest is now shifting toward multimodal large language models (MLLMs), with reports suggesting that the multimodal AI market will grow by 35% annually to $4.