Few-shot learning techniques rely on the generalizations of a base model trained on a large training set in order to transfer-learn to more specialized tasks. Such techniques extend unsupervised learning models by allowing a general model to be fine-tuned to a domain of interest using a relatively small number of training samples. This is especially important for applications where data may be non-homogeneous and difficult to label, such as product catalog data, which vary drastically from category to category and require domain expertise to label. In this paper, we explore the application of a few-shot learning model to infer structured product attribute values from unstructured descriptive text. We experiment with the Task-Aware Representation of Sentences (TARS) model [3] applied to catalog data, and further extend the model to support training on negative examples. We demonstrate that with a few labeled examples, we can train a specialized model for each attribute of a product category and obtain results of comparable quality to current state-of-the-art techniques. Results show that with only 40 training examples per product attribute, the model, when cross-trained over attributes, produces improved recall and comparable precision relative to an existing baseline model that relies on tens of thousands of examples per product attribute. To further demonstrate its generalization power, we conduct experiments using synthetic training data extracted automatically from search queries. This completely eliminates the need for manually labeled examples and further leverages customer behavioral signals to prioritize model training according to what customers deem important.
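As a rough illustration of the TARS formulation the abstract describes, the sketch below shows how attribute extraction can be reframed as binary text-pair classification: each candidate attribute value is paired with the product text, and the model learns to answer "does this label apply?". The helper, separator token, and label set here are hypothetical, not the paper's implementation; the negative pairs correspond to the negative-example training the paper adds.

```python
# TARS-style reframing of attribute extraction (illustrative sketch).
# Each candidate value becomes one binary example: (value + text) -> bool.
# Values that do not apply yield negative training pairs.

def make_tars_pairs(text, true_values, candidate_values, sep=" [SEP] "):
    """Return (input, label) pairs for one product description.

    text             -- unstructured product text
    true_values      -- attribute values known to be correct
    candidate_values -- full label set for this attribute (hypothetical)
    """
    pairs = []
    for value in candidate_values:
        label = value in true_values          # True -> positive pair
        pairs.append((value + sep + text, label))
    return pairs


pairs = make_tars_pairs(
    "Soft long-sleeve cotton tee, machine washable",
    true_values={"cotton"},
    candidate_values=["cotton", "polyester", "leather"],
)
# One positive pair for "cotton", negative pairs for the other values.
```

Because every attribute shares this single "label-aware" input format, a few labeled examples per attribute suffice to fine-tune one underlying model across attributes, which is what enables the cross-training reported above.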
Enhancement and analysis of TARS few-shot learning model for product attribute extraction from unstructured texts
2021