Grinding, i.e., reducing the particle size of mined ore, is often the bottleneck of the mineral concentration process. Thus, even small improvements may lead to large increases in profit. The goal of the grinding circuit is twofold: to maximize the throughput of ore, and to minimize the resulting particle size of the ground ore while keeping it within some acceptable range. In this work we study the control of a two-stage grinding circuit using reinforcement learning. To this end, we present a solution for integrating industrial simulation models into the reinforcement learning framework OpenAI Gym. We compare an existing PID controller, based on vast domain knowledge and years of hand-tuning, with the black-box reinforcement learning algorithm Proximal Policy Optimization on a calibrated grinding circuit simulation model. The comparison shows that it is possible to control the grinding circuit using reinforcement learning. In addition, in contrast to the existing PID control, the reinforcement learning algorithm is able to optimize an abstract control goal: maximizing profit, as defined by a profit function given by our industrial collaborator. In some operating cases the algorithm is able to control the plant more efficiently than the existing control.
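To make the Gym integration concrete, the sketch below shows one plausible way to wrap an external grinding-circuit simulator as a Gym environment, so that an off-the-shelf PPO implementation can be trained against it. This is a minimal illustration, not the paper's implementation: the `simulator` object, its `reset`/`step` methods, the space dimensions, and the `_profit` reward are all hypothetical stand-ins for the calibrated industrial model and the collaborator's profit function.

```python
import gym
import numpy as np
from gym import spaces


class GrindingCircuitEnv(gym.Env):
    """Minimal Gym wrapper around an external grinding-circuit simulator.

    `simulator` and its reset()/step() methods are hypothetical stand-ins
    for a calibrated industrial simulation model.
    """

    def __init__(self, simulator, episode_length=1000):
        super().__init__()
        self.simulator = simulator
        self.episode_length = episode_length
        self.t = 0
        # Example spaces: continuous control setpoints in, plant measurements out.
        self.action_space = spaces.Box(low=-1.0, high=1.0,
                                       shape=(3,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(8,), dtype=np.float32)

    def reset(self):
        self.t = 0
        obs = self.simulator.reset()  # hypothetical simulator API
        return np.asarray(obs, dtype=np.float32)

    def step(self, action):
        # Advance the simulation one control interval with the chosen setpoints.
        obs, throughput, particle_size = self.simulator.step(action)
        # Reward is a profit function of throughput and product size (assumed form).
        reward = self._profit(throughput, particle_size)
        self.t += 1
        done = self.t >= self.episode_length
        return np.asarray(obs, dtype=np.float32), reward, done, {}

    def _profit(self, throughput, particle_size):
        # Placeholder profit: reward throughput, penalize off-spec particle size.
        target_size = 0.1  # assumed acceptable product size, arbitrary units
        return throughput - abs(particle_size - target_size)
```

With this structure, the agent-simulator interface reduces to the standard `reset`/`step` loop, which is what lets generic algorithms such as PPO be applied to the plant model without modification.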