Confidence estimation for Speech Emotion Recognition (SER) is instrumental in improving the reliability of downstream applications. In this work, we propose (1) a novel confidence metric for SER based on the relationship between emotion primitives (arousal, valence, and dominance; AVD) and emotion categories (ECs), (2) EmoConfidNet, a DNN trained alongside the EC recognizer to predict the