- 2025: Following the great progress in text-conditioned image generation, there is a dire need for establishing clear comparison benchmarks. Unfortunately, assessing the performance of such models is highly subjective and notoriously difficult. Current automatic assessments of generated image quality and alignment to text are approximate at best, while human assessment is subjective, poorly calibrated and not
- DCC 2025, 2025: Video compression enables the transmission of video content at low rates and high qualities to our customers. In this paper, we consider the problem of embedding a neural network directly into a video decoder. This requires a design capable of operating at latencies low enough to decode tens to hundreds of high-resolution images per second. And, additionally, a network with a complexity suitable for implementation
- COLING 2025 Workshop on Evaluation of Multi-Modal Generation, 2025: Multimodal generative AI usually involves generating image or text responses given inputs in another modality. The evaluation of image-text relevancy is essential for measuring response quality or ranking candidate responses. In particular, binary relevancy evaluation, i.e., “Relevant” vs. “Not Relevant”, is a fundamental problem. However, this is a challenging task considering that texts have diverse formats
- 2025: Visual artifacts are often introduced into streamed video content, due to prevailing conditions during content production and delivery. Since these can degrade the quality of the user’s experience, it is important to automatically and accurately detect them in order to enable effective quality measurement and enhancement. Existing detection methods often focus on a single type of artifact and/or determine
- 2025: Given a small number of images of a subject, personalized image generation techniques can fine-tune large pre-trained text-to-image diffusion models to generate images of the subject in novel contexts, conditioned on text prompts. In doing so, a trade-off is made between prompt fidelity, subject fidelity and diversity. As the pre-trained model is fine-tuned, earlier checkpoints synthesize images with low
Related content
- June 24, 2022: The field motivated him to pursue a PhD, which eventually led him to Amazon.
- June 23, 2022: EMVA Young Professional Award honors “outstanding and innovative work of a student or a young professional in the field of machine vision or image processing.”
- June 22, 2022: CVPR papers examine the recovery of 3-D information from camera movement and learning general representations from weakly annotated data.
- June 21, 2022: How she moved across the world to discover a passion for (and a career in) machine learning.
- June 20, 2022: Amazon’s director of applied science in Adelaide, Australia, believes the economic value of computer vision has “gone through the roof”.
- June 16, 2022: Senior principal scientist Aleix M. Martinez on why computer vision research has only begun to scratch the surface.