The Internet is one of the largest-scale distributed systems, made up of multiple networks that digitally connect billions of users. Traffic Engineering (TE) is a core problem in networking: it is responsible for routing packets across networks to provide the best user experience while ensuring a secure, stable, well-utilized, and cost-efficient network. The time-varying graph nature of the network, along with unexpected topology and traffic changes, makes TE in large operational networks challenging. This paper provides a holistic sketch of the different viewpoints taken by researchers and practitioners to formulate and solve the TE problem. We first cover the systems view, where researchers have defined the problem and used creative heuristic protocols to manage large networks. We then focus on theoretical formulations from optimization and control theory that aim to provide optimal and stable networks. These formulations provide clear definitions and provable properties for designing TE. We devise a taxonomy of existing studies according to how such theoretical problems are being solved today. Finally, we present the AI/ML and heuristic perspective for near-optimal TE. Owing to the dynamism and large scale of networks, especially planet-scale cloud networks, these theoretical methods need real-time data and greedy approximations to solve TE. Recent progress in AI/ML provides encouraging tools here: time-series and graph-based AI/ML models can be used to detect and predict the network state, and control algorithms such as Reinforcement Learning (RL) can be powerful tools for solving TE. We survey these approaches and propose promising directions for AI/ML in network control and TE.