Recommendation systems have developed beyond simple matrix factorization to focus on two important sources of information: the temporal order of events (Hidasi et al., 2015) and side (e.g., spatial) information encoded in user and item features (Rendle, 2012). However, state-of-the-art temporal modeling is often limited by model capacity for long user histories. In addition, metadata are rarely used in generic