In this paper, we address the problem of extreme head pose estimation from intensity images in a monocular setup. We introduce a novel fusion pipeline that integrates, within a dedicated Kalman filter, the pose estimated by a tracking scheme in the prediction stage and the pose estimated by a detection scheme in the correction stage. To that end, the measurement covariance of the Kalman filter is updated in every frame. The tracking scheme is performed using a set of keypoints extracted in the head region along with a simple 3D geometric model. The detection scheme, on the other hand, relies on the alignment of facial landmarks in each frame combined with 3D features extracted from a head mesh. In each scheme, the head pose is estimated by minimizing the reprojection error of the 3D-2D correspondences. By combining both frameworks, we extend the applicability of head pose estimation from facial landmarks to cases where these features are no longer visible. We compare the proposed method to related approaches and show that it achieves state-of-the-art performance. We also demonstrate that our approach handles extreme head rotations and (self-)occlusions while remaining suitable for real-time applications.
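The fusion described above can be illustrated with a minimal sketch (not the authors' implementation): a linear Kalman filter whose prediction stage is driven by the tracking-based pose, whose correction stage consumes the detection-based pose as a measurement, and whose measurement covariance `R` is supplied anew for every frame. The 6-DoF state layout (3 rotation + 3 translation components) and the identity measurement model are illustrative assumptions.

```python
import numpy as np

class PoseFusionKF:
    """Minimal sketch of the tracking/detection fusion in a Kalman filter.

    Assumptions (not from the paper): 6-DoF pose state (3 rotation,
    3 translation), identity transition and measurement models.
    """

    def __init__(self, dim=6, q=1e-4):
        self.dim = dim
        self.x = np.zeros(dim)       # fused pose estimate
        self.P = np.eye(dim)         # state covariance
        self.Q = q * np.eye(dim)     # process noise covariance

    def predict(self, tracked_pose):
        # Prediction stage: the pose from the keypoint-tracking scheme
        # drives the state forward; uncertainty grows by Q.
        self.x = np.asarray(tracked_pose, dtype=float)
        self.P = self.P + self.Q
        return self.x

    def correct(self, detected_pose, R):
        # Correction stage: the pose from landmark-based detection is the
        # measurement; R is re-estimated in every frame (e.g. from
        # landmark alignment confidence).
        K = self.P @ np.linalg.inv(self.P + R)        # Kalman gain (H = I)
        self.x = self.x + K @ (np.asarray(detected_pose, dtype=float) - self.x)
        self.P = (np.eye(self.dim) - K) @ self.P
        return self.x

kf = PoseFusionKF()
kf.predict(np.array([0.10, 0.0, 0.0, 0.0, 0.0, 1.00]))
fused = kf.correct(np.array([0.12, 0.0, 0.0, 0.0, 0.0, 1.02]),
                   R=1e-2 * np.eye(6))
```

With a small `R` (a confident detection), the fused pose is pulled close to the measurement; a large `R` (e.g. occluded landmarks) leaves the tracked prediction nearly untouched, which mirrors how the per-frame covariance update arbitrates between the two schemes.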