Linear systems that are narrow banded and strictly diagonally dominant by rows can be solved in parallel using a variety of methods, including incomplete block cyclic reduction. We show how to accelerate the algorithm by approximating its very first step. We derive tight estimates for the forward error and explain why our procedure is suitable for linear systems obtained by discretizing some common parabolic PDEs. An improved ScaLAPACK-style algorithm is presented together with strong scalability results.