In response to the growing emphasis on sustainability in federated learning (FL), this research introduces Carbon-Conscious Federated Reinforcement Learning (CCFRL), a dynamic, dual-objective optimization framework. By leveraging Reinforcement Learning (RL), CCFRL continuously adapts client allocation and resource usage in real time, optimizing both carbon efficiency and model performance. Existing static or greedy methods that prioritize short-term carbon constraints often either degrade model performance by excluding high-quality but energy-intensive clients or fail to balance carbon emissions against long-term efficiency. CCFRL addresses these limitations with a more sustainable strategy, balancing immediate resource needs against long-term sustainability and ensuring that energy consumption and carbon emissions are minimized without compromising model quality, even on non-IID (non-independent and identically distributed) and large-scale datasets. We overcome the shortcomings of existing methods by integrating richer state representations, adaptive exploration-exploitation transitions, and stagnation detection based on t-tests, enabling CCFRL to better manage real-world data heterogeneity and complex, non-linear datasets. Extensive experiments demonstrate that CCFRL significantly reduces both energy consumption and carbon emissions while maintaining or enhancing model performance. With up to a 61.78% improvement in energy conservation and a 64.23% reduction in carbon emissions, CCFRL demonstrates the viability of aligning resource management with sustainability goals, paving the way for a more environmentally responsible future in cloud computing.
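The stagnation detection via t-tests mentioned above could be realized in several ways; a minimal sketch of one plausible interpretation follows, comparing the two most recent windows of per-round training losses with a Welch-style two-sample t statistic. The window size, the use of loss as the monitored metric, and the critical value `t_crit ≈ 2.306` (two-tailed, alpha = 0.05, df ≈ 8) are illustrative assumptions, not details taken from the paper:

```python
import statistics

def is_stagnating(losses, window=5, t_crit=2.306):
    """Flag training stagnation by testing whether the most recent
    window of losses differs significantly from the preceding window.

    Returns True when the Welch-style t statistic falls below t_crit,
    i.e. we cannot reject 'equal means' and progress has plateaued.
    Assumed parameters: window=5 rounds, t_crit for df ~ 8, alpha=0.05.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to decide

    prev = losses[-2 * window:-window]
    recent = losses[-window:]

    m1, m2 = statistics.mean(prev), statistics.mean(recent)
    v1, v2 = statistics.variance(prev), statistics.variance(recent)

    # Standard error of the difference in means (equal window sizes).
    se = ((v1 + v2) / window) ** 0.5
    if se == 0:
        return True  # identical windows: clearly no progress

    t = abs(m1 - m2) / se
    return t < t_crit

# Steadily decreasing losses: still improving, no stagnation flagged.
print(is_stagnating([1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.25, 0.2, 0.18, 0.15]))  # False
# Flat losses: plateau detected.
print(is_stagnating([0.30, 0.31, 0.30, 0.29, 0.30, 0.30, 0.31, 0.30, 0.30, 0.29]))  # True
```

In a CCFRL-style controller, a True result could trigger the adaptive exploration-exploitation transition, e.g. by increasing exploration or reallocating clients when the model stops improving.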