We introduce two models for high-precision sound event detection that leverage transfer learning. The sound events we detect include “speech”, “music”, and “chime”. Both models build on a CNN backbone pre-trained on AudioSet for audio classification. To obtain high-precision detection results, the first model employs transposed convolutional layers as the detection head, while the second model uses Feature