This paper presents a novel approach to the head pose estimation (HPE) problem in demanding real-world applications. We propose a framework that combines the detection of facial landmarks with the tracking of salient features within the head region. Specifically, rigid facial landmarks are detected in a given face image while, at the same time, salient features are detected within the head region. The 3D coordinates of both sets of features are obtained by intersecting their viewing rays with a simple geometric head model (e.g., a cylinder or an ellipsoid). We then formulate HPE as a perspective-n-point problem, which we solve separately for each feature set by minimizing the reprojection error between the 3D features and their corresponding facial or salient features in the next face image. The resulting head pose estimates are then combined using a Kalman filter, which lets us exploit the high accuracy of facial landmarks while handling extreme head poses through the salient features. Results are comparable to those reported in the related literature, with the added advantage of robustness in real-world situations that may not be covered by the evaluated datasets.
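
To make two of the steps above concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a pinhole camera at the origin looking along +Z, a cylindrical head model whose axis is parallel to the image Y axis, and a scalar Kalman-style update that fuses a single pose angle. The function names `backproject_to_cylinder` and `fuse_estimates`, and all parameter names and numeric values, are hypothetical and chosen only for illustration.

```python
import math


def backproject_to_cylinder(u, v, f, cx, cy, radius, center_z):
    """Intersect the viewing ray of pixel (u, v) with a cylinder head model.

    Assumptions (illustrative only): pinhole camera at the origin looking
    along +Z, focal length f in pixels, principal point (cx, cy), and a
    cylinder of the given radius whose axis is parallel to Y and passes
    through (0, 0, center_z).
    Returns the nearest intersection (X, Y, Z), or None if the ray misses.
    """
    # Direction of the viewing ray through the pixel (not normalized).
    dx, dy, dz = (u - cx) / f, (v - cy) / f, 1.0
    # Ray points are t * (dx, dy, dz); the cylinder surface satisfies
    # X^2 + (Z - center_z)^2 = radius^2, which is quadratic in t.
    a = dx * dx + dz * dz
    b = -2.0 * dz * center_z
    c = center_z * center_z - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray does not hit the head model
    t = (-b - math.sqrt(disc)) / (2.0 * a)  # smaller root = visible surface
    if t <= 0.0:
        return None  # intersection lies behind the camera
    return (t * dx, t * dy, t * dz)


def fuse_estimates(angle_a, var_a, angle_b, var_b):
    """Scalar Kalman-style update fusing two estimates of one pose angle.

    Estimate A acts as the prior and estimate B as the measurement.
    Angle wrap-around is ignored in this sketch.
    """
    gain = var_a / (var_a + var_b)
    angle = angle_a + gain * (angle_b - angle_a)
    var = (1.0 - gain) * var_a
    return angle, var


if __name__ == "__main__":
    # A pixel at the principal point hits the front of the cylinder.
    print(backproject_to_cylinder(320, 240, 500.0, 320, 240, 80.0, 500.0))
    # Landmark-based yaw (low variance) fused with salient-feature yaw.
    print(fuse_estimates(10.0, 1.0, 20.0, 4.0))
```

In this sketch the estimate with the lower variance dominates the fused result, which mirrors the idea of relying on facial landmarks when they are accurate and falling back on salient features otherwise. A full pipeline would estimate the complete six-degree-of-freedom pose from the 2D-3D correspondences with a perspective-n-point solver before any fusion takes place.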