We introduce a novel fusion framework for real-time head pose estimation based on a tailored Kalman Filter. The approach estimates the pose from intensity images in monocular video and is robust to extreme head rotations and varying illumination while running in real time. Our framework feeds the head pose computed by a keypoint-based tracking scheme into the prediction step of the Kalman Filter and the head pose computed by a facial-landmark-based detection scheme into the correction step. The tracking scheme estimates the head pose from 2D keypoints tracked across two consecutive frames in the head region and their 3D projections onto a simple geometric model. In contrast, the detection scheme estimates the head pose from 2D facial landmarks detected in each frame and their 3D correspondences obtained through triangulation. In both schemes, the head pose is obtained by minimizing the reprojection error of the 3D-2D correspondences. In each iteration, we update the state transition matrix of the filter and subsequently the estimated covariance. We evaluated our approach on a publicly available dataset and compared it with related state-of-the-art methods. Our approach achieves comparable performance in terms of mean average error while operating in real time. Furthermore, we tested our method on our own dataset to evaluate its performance under large head rotations. The results remain good even when facial landmarks are partially occluded.
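
To illustrate the division of labour between the two schemes, the following is a minimal sketch, not the formulation used in this work. It assumes the pose is reduced to three independent Euler angles (yaw, pitch, roll, in degrees), so the covariance is diagonal, and it applies the tracker's inter-frame rotation as an additive increment in the prediction step rather than through an updated state transition matrix. The class name `PoseFusionFilter`, the noise values `process_var` and `measurement_var`, and the choice to skip the correction when no detection is available are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PoseFusionFilter:
    """Per-axis scalar Kalman filter over (yaw, pitch, roll) in degrees.

    Hypothetical simplification: the three angles are treated as
    independent, so one variance per angle replaces a full covariance.
    """
    angles: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    variances: List[float] = field(default_factory=lambda: [1.0, 1.0, 1.0])
    process_var: float = 0.5      # assumed noise added by the tracker per frame
    measurement_var: float = 4.0  # assumed noise of the landmark-based pose

    def predict(self, tracked_delta: List[float]) -> None:
        """Prediction: apply the inter-frame rotation from the keypoint tracker."""
        self.angles = [a + d for a, d in zip(self.angles, tracked_delta)]
        self.variances = [p + self.process_var for p in self.variances]

    def correct(self, detected_pose: List[float]) -> None:
        """Correction: blend in the absolute pose from the landmark detector."""
        for i, z in enumerate(detected_pose):
            gain = self.variances[i] / (self.variances[i] + self.measurement_var)
            self.angles[i] += gain * (z - self.angles[i])
            self.variances[i] *= 1.0 - gain

    def step(self, tracked_delta: List[float],
             detected_pose: Optional[List[float]] = None) -> List[float]:
        """Run one frame: always predict, correct only if a detection exists."""
        self.predict(tracked_delta)
        if detected_pose is not None:  # assumption: skip correction without landmarks
            self.correct(detected_pose)
        return list(self.angles)


if __name__ == "__main__":
    fused = PoseFusionFilter()
    # Frame 1: tracker reports +2 degrees of yaw, detector reports 3 degrees.
    print(fused.step([2.0, 0.0, 0.0], [3.0, 0.0, 0.0]))
    # Frame 2: no landmark detection, so the filter relies on tracking alone.
    print(fused.step([1.0, 0.0, 0.0]))
```

In this sketch the fused estimate lies between the tracked and the detected pose, weighted by their assumed variances, and the variance grows on frames where only the tracker contributes.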