Learning attribute-driven disentangled representations for interactive fashion retrieval
2021
Interactive retrieval for online fashion shopping allows users to refine image retrieval results through feedback. A common problem in interactive retrieval is that a specific user interaction (e.g., changing the color of a T-shirt) inadvertently causes other aspects to change (e.g., the retrieved item has a different sleeve type than the query). This happens because existing methods learn visual representations that are semantically entangled in the embedding space, which limits the controllability of the retrieved results. We propose to leverage the semantics of visual attributes to train convolutional networks that learn an attribute-specific subspace for each attribute, yielding disentangled representations. Operations such as swapping out a particular attribute value for another then affect only the attribute at hand and leave the others untouched. We show that our model can be tailored to different retrieval tasks while maintaining its disentanglement property. We obtain state-of-the-art performance on three interactive fashion retrieval tasks: attribute manipulation retrieval, conditional similarity retrieval, and outfit complementary item retrieval. Code and models are publicly available.
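The core idea described in the abstract, encoding an image into per-attribute subspaces so that swapping one attribute's block leaves the others untouched, could be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' implementation: the PyTorch framing, the ResNet-18 backbone, the subspace dimension, and names such as `AttributeDisentangledEncoder` and `manipulate_attribute` are all hypothetical.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class AttributeDisentangledEncoder(nn.Module):
    """Hypothetical sketch: a shared CNN backbone followed by one projection
    head per attribute, so each attribute occupies its own subspace of the
    overall embedding."""

    def __init__(self, attribute_names, subspace_dim=64):
        super().__init__()
        # Shared convolutional backbone (ResNet-18 chosen here as an assumption).
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()
        self.backbone = backbone
        # One linear projection per attribute (e.g., color, sleeve type).
        self.heads = nn.ModuleDict(
            {name: nn.Linear(feat_dim, subspace_dim) for name in attribute_names}
        )

    def forward(self, images):
        feats = self.backbone(images)
        # Return a dict of attribute-specific embeddings; concatenating them
        # yields the full representation, but each block can be edited alone.
        return {name: head(feats) for name, head in self.heads.items()}


def manipulate_attribute(query_emb, prototype_emb, attribute):
    """Swap only the targeted attribute's subspace, leaving others untouched."""
    edited = dict(query_emb)
    edited[attribute] = prototype_emb[attribute]
    return edited


# Example usage (illustrative only): replace the "color" subspace of a query
# embedding with that of an image exhibiting the desired color.
encoder = AttributeDisentangledEncoder(["color", "sleeve_type", "neckline"])
query = encoder(torch.randn(1, 3, 224, 224))
target = encoder(torch.randn(1, 3, 224, 224))
edited_query = manipulate_attribute(query, target, "color")
```

Retrieval with the edited embedding would then rank gallery items by similarity in each subspace, so only the manipulated attribute drives the change in results.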