In recent years, there has been growing interest in developing robots capable of explaining their behavior, thereby improving their acceptance by the humans with whom they share their environment. Proposed software designs typically build on recent advances in conversational systems based on deep learning. However, beyond the ability to formulate explanations, the robot also needs an internal episodic memory in which it stores information from its continuous stream of experiences. Most previous proposals are designed to handle short streams of episodic data (several minutes long). With the aim of managing longer experiences, we propose in this work a high-level episodic memory in which relevant events are abstracted to natural-language concepts. The proposed framework is closely linked to a software architecture in which explanations, whether externalized or not, are shaped internally in a collaborative process involving the task-oriented software agents that make up the architecture. The core of this process is a runtime knowledge model, employed as a working memory whose evolution captures the causal events stored in the episodic memory. We present several use cases that illustrate how the suggested framework allows an autonomous robot to generate correct and relevant explanations of its actions and behavior.
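The idea of abstracting relevant events into natural-language concepts, and later retrieving the events that preceded an action to support an explanation, can be sketched as follows. This is a minimal illustration only; the class names, event fields, and the simple "events before first occurrence" retrieval rule are hypothetical stand-ins, not the implementation described in the paper:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    # Hypothetical event record: when it happened, which software agent
    # produced it, and a natural-language concept abstracting the raw data.
    timestamp: datetime
    agent: str
    concept: str


class EpisodicMemory:
    """Minimal high-level episodic memory: stores abstracted events and
    answers a simple "why" query by returning the concepts recorded
    before a given action (a crude stand-in for causal retrieval)."""

    def __init__(self):
        self.events: list[Event] = []

    def record(self, agent: str, concept: str) -> None:
        self.events.append(Event(datetime.now(), agent, concept))

    def explain(self, concept: str) -> list[str]:
        # Return the concepts recorded before the first occurrence
        # of `concept`; empty list if the concept was never recorded.
        for i, ev in enumerate(self.events):
            if ev.concept == concept:
                return [e.concept for e in self.events[:i]]
        return []


mem = EpisodicMemory()
mem.record("navigation", "battery low")
mem.record("planner", "abort delivery task")
mem.record("navigation", "go to charging station")
print(mem.explain("go to charging station"))
# prints: ['battery low', 'abort delivery task']
```

Because the stored concepts are already in natural language, a downstream conversational component could turn such a retrieved event list directly into an explanation sentence, which is the kind of collaboration between agents that the abstract alludes to.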