Publications

Generating diverse and informative natural language fashion feedback

Gil Sadeh, Lior Fritz, Gabi Shalev, Eduard Oks

CVPR 2019 Workshop on Language and Vision

2019

Recent advances in multi-modal vision and language tasks enable a new set of applications. In this paper, we consider the task of generating natural language fashion feedback on outfit images. We collect a unique dataset, which contains outfit images and corresponding positive and constructive fashion feedback. We treat each feedback type separately, and train deep generative encoder-decoder models with

Computer vision

Estimating uncertainty in instance segmentation using dropout sampling

Doug Morrison, Anton Milan, Nontas Antonakos

CVPR 2019 Robotic Vision Probabilistic Object Detection Challenge

2019

Vision is an integral part of many robotic systems, and especially so when a robot must interact with its environment. In such cases, decisions made based on erroneous visual detections can have disastrous consequences. Hence, being able to accurately measure the uncertainty associated with visual information is highly important for making informed decisions. However, this uncertainty is often not captured

Computer vision

Clothing recognition in the wild using the Amazon catalog

Fabian Caba Heilbron, Bojan Pepik, Zohar Barzelay, Michael Donoser

ICCV 2019 Workshop on Computer Vision for Fashion, Art and Design

2019

The emergence of online influencers, the explosion of video content, and the massive amount of movie collections have served as an advertising vehicle for the fashion industry. This trend has created the need for automated methods that recognize people’s outfit in such image and video collections. However, existing computer vision solutions for fashion recognition require an enormous amount of labeled data

Computer vision

On the accuracy of video quality measurement techniques

Deepthi Nandakumar, Hai Wei, Yongjun Wu, Avisar Ten-Ami

MMSP 2019

2019

With the massive growth of Internet video streaming, it is critical to accurately measure video quality subjectively and objectively, especially for HD and UHD video, which is bandwidth-intensive. We summarize the creation of a database of 200 clips, with 20 unique sources tested across a variety of devices. By classifying the test videos into two distinct quality regions, SD and HD, we show that the high correlation

Computer vision

Bag of tricks for image classification with convolutional neural networks

Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

CVPR 2019

2019

Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods. In the literature, however, most refinements are either briefly mentioned as implementation details or only visible in source code. In this paper, we will examine a collection of such refinements and empirically evaluate their

Computer vision

Balancing specialization, generalization, and compression for detection and tracking

Dotan Kaufman, Koby Bibas, Michael Chertok, Eran Borenstein, Tal Hassner

BMVC 2019

2019

We propose a method for specializing deep detectors and trackers to restricted settings. Our approach is designed with the following goals in mind: (a) Improving accuracy in restricted domains; (b) preventing overfitting to new domains and forgetting of generalized capabilities; (c) aggressive model compression and acceleration. To this end, we propose a novel loss that balances compression and acceleration

Computer vision

Learning event sequence embedding for dense event-based deep stereo

Stepan Tulyakov, François Fleuret, Martin Kiefel, Peter Gehler, Michael Hirsch

ICCV 2019

2019

Today, a frame-based camera is the sensor of choice for machine vision applications. However, these cameras, originally developed for acquisition of static images rather than for sensing of dynamic uncontrolled visual environments, suffer from high power consumption, data rate, latency and low dynamic range. An event-based image sensor addresses these drawbacks by mimicking a biological retina. Instead

Machine learning

Scalable automated system for benchmarking user experience with smart devices

Zongyi Liu

ICCE 2019

2019

In this paper, we present an automated scalable system that measures user experience on smart devices such as TVs, tablets and smart phones. The system consists of three parts: (i) a robot with a mobile arm to perform touches and clicks on a tested device such as a tablet or a phone, and sensors to capture the video signals, (ii) a signal capturing process records the input video in real time, controlled

Computer vision

OCGAN: One-class novelty detection using GANs with constrained latent representations

Pramuditha Perera, Ramesh Nallapati, Bing Xiang

CVPR 2019

2019

We present a novel model called OCGAN for the classical problem of one-class novelty detection, where, given a set of examples from a particular class, the goal is to determine if a query example is from the same class. Our solution is based on learning latent representations of in-class examples using a denoising auto-encoder network. The key contribution of our work is our proposal to explicitly constrain

Computer vision

Co-occurrent features in semantic segmentation

Hang Zhang, Han Zhang, Chenguang Wang, Junyuan Xie

CVPR 2019

2019

Recent work has achieved great success in utilizing global contextual information for semantic segmentation, including increasing the receptive field and aggregating pyramid feature representations. In this paper, we go beyond global context and explore the fine-grained representation using co-occurrent features by introducing the Co-occurrent Feature Model, which predicts the distribution of co-occurrent features

Computer vision

d-SNE: Domain adaptation using stochastic neighborhood embedding

Xiang Xu, Xiong Zhou, Ragav Venkatesan, Gurumurthy Swaminathan, Orchid Majumder

CVPR 2019

2019

Deep neural networks often require copious amounts of labeled data to train their scads of parameters. Training larger and deeper networks is hard without appropriate regularization, particularly while using a small dataset. Laterally, collecting well-annotated data is expensive, time-consuming and often infeasible. A popular way to regularize these networks is to simply train the network with more data from an alternate representative dataset...

Computer vision

Domain-adaptive pedestrian detection in thermal images

Tiantong Guo, Cong Phuoc Huynh, Mashhour Solh

CVPR 2019

2019

This paper presents an approach to pedestrian detection in thermal infrared (thermal) images with limited annotations. The key idea is to adapt the abundance of color images associated with bounding box annotations to the thermal domain for training the pedestrian detector...

Computer vision

Scalable logo recognition using proxies

István Fehérvári, Srikar Appalaraju

WACV 2019

2019

Logo recognition is the task of identifying and classifying logos. It is a challenging problem because there is no clear definition of a logo, logos and brands vary enormously, and re-training to cover every variation is impractical. In this paper, we formulate logo recognition as a few-shot object detection problem. The two main components in our pipeline are universal logo detector

Computer vision

Dynamics and periodicity based multirate fast transient-sound detection

Jun Yang, Philip Hilmes

EUSIPCO 2018

2018

This paper proposes an efficient real-time multirate fast transient-sound detection algorithm on the basis of emerging microphone array configuration intended for multimedia signal processing application systems such as digital smart home. The proposed detection algorithm first extracts the dynamics and periodicity features, then trains the model parameters of these features on Amazon machine learning platform

Conversational AI

CRAFT: Complementary recommendation by adversarial feature transform

Cong Phuoc Huynh, Arridhana Ciptadi, Ambrish Tyagi, Amit Agrawal

ECCV 2018

2018

We propose a framework that harnesses visual cues in an unsupervised manner to learn the co-occurrence distribution of items in real-world images for complementary recommendation. Our model learns a non-linear transformation between the two manifolds of source and target item categories (e.g., tops and bottoms in outfits). Given a large dataset of images containing instances of co-occurring items, we train

Computer vision
