The reduction of a general dense and square matrix to Hessenberg form is a well known first step in many standard eigenvalue solvers. Although parallel algorithms exist, the Hessenberg reduction is still one of the bottlenecks in state-of-the-art software for the distributed QR algorithm. We propose a new NUMA-aware algorithm that fits the context of the QR algorithm and evaluate the tunability of its algorithmic parameters. The proposed algorithm can be faster than LAPACK and ScaLAPACK for small problem sizes. In addition, evaluating the algorithmic parameters shows that there is potential for auto-tuning some of the parameters.