Gata6 and Hnf1ba are the major downstream effectors of retinoic acid for the specification of zebrafish pancreas

Retinoic acid (RA) is a key signaling molecule required for the specification of the pancreatic field within the endodermal germ layer. Still, the gene regulatory cascade triggered by RA in endoderm remains poorly characterized. In this study, we investigated the gene regulatory network induced by RA signaling in zebrafish endodermal cells by a combination of RNA-seq, RAR ChIP-seq and ATAC-seq experiments. By analysing the effect of RA and BMS439 on the transcriptome and on the chromatin accessibility of endodermal cells, we identified a large set of genes and regulatory regions regulated by RA signaling. Localization of RAR binding sites in the zebrafish genome by ChIP-seq highlighted the putative direct RAR target genes. Among them, Hnf1ba and Gata6, two known pancreatic regulatory factors activated by RA treatment, play a crucial role in opening chromatin at many genomic loci as revealed by the strong enrichment of their sequence binding motifs in RA-induced nucleosome-free regions. Furthermore, comparison of RAR ChIP-seq data obtained in zebrafish and in mice highlights the evolutionary-conserved direct targets, comprising the well-known Cyp26a or Hox genes but also Hnf1b and Gata6. Some RAR binding sites are located in highly conserved noncoding regions revealing the strong evolutionary constraint to maintain the function of such regulatory sequences. Among them, we identify a novel RA-induced enhancer located far upstream from the Hoxb Locus. In conclusion, our data reveal the central role of HNF1ba and Gata6 as pioneer transcription factors for the RA-dependent specification of the pancreatic field and highlight the RAR sites conserved from fish to mammals.


Introduction
The endoderm is the embryonic germ layer that gives rise, among other organs, to the pancreas.
During gastrulation and early somitogenesis, dynamic movements bring the endoderm into proximity with mesodermal tissues that will secrete different factors to pattern the endoderm, including FGF, Wnt, RA, and BMP ligands. Retinoic acid (RA) is important for the anterior-posterior patterning of the endoderm (Bayha et al., 2009;Martín et al., 2005;Stafford et al., 2004). It is notably required for the specification of the dorsal pancreatic bud in mice, chick, Xenopus and zebrafish embryos (Bayha et al., 2009;Molotkov et al., 2005;Stafford and Prince, 2002;Zeynali and Dixon, 1998), and used in protocols for generating insulin secreting beta-like cells in vitro from embryonic stem cells (Schiesser and Wells, 2014). The role of RA has been previously studied mostly in the anterior-posterior regionalization of the nervous system (Diez del Corral et al., 2014;Maden, 2007Maden, , 2002Rhinn and Dollé, 2012;Schubert et al., 2006). RA signaling acts through the binding of intracellular receptor and regulates the expression of various genes including the Hox family to regionalize the forebrain, midbrain, hindbrain, and spinal cord during embryogenesis (Schubert et al., 2006). The precise role of RA in the induction of pancreas is still largely unclear. Previous studies in chick and Xenopus embryos showed that RA alone is not sufficient to induce ectopic pancreatic cells like in zebrafish, however inhibition of RA signaling at earlier stages of development prevents pancreas formation (Bayha et al., 2009;Zeynali and Dixon, 1998).
RA is synthesized from vitamin A in a two-step reaction that involve the action of RDH10 and RALDH2 enzymes (Das et al., 2014;Feng et al., 2010;Kam et al., 2012;Kedishvili, 2016) Sandell et al., 2007. The levels of endogenous RA are also tightly controlled through its degradation catalysed by CYP26A and DHRS3 enzymes (Kinkel et al., 2009;Kam et al., 2012;Feng et al. 2010).
In vertebrates, CYP26A, DHRS3 and RALDH2 are directly regulated by RA itself, creating an autoregulatory loop that keeps RA levels within a normal range in the embryo (Feng et al., 2010;Kam et al., 2012). RA binds the Retinoic Acid Receptors (RAR) that are forming heterodimers with the Retinoic X Receptors (RXR). These RAR/RXR heterodimers regulate gene expression through their binding to "Retinoic Acid Response Elements" (RAREs) located in promoters or enhancers.
These data uncovered thousands of RAR/RXR binding sites in the murine genome, some located near RA-induced genes like the Hox, Cyp26, Dhrs3 or Rara genes (Das et al., 2014;Tanoury et al., 2013).
However, no data is available on the effects that RA might have on chromatin accessibility and whether RAR binding sites correlates with active enhancers.
Two zebrafish mutants demonstrated the requirement of RA for pancreas development. The neckless and non-fin mutants, containing a mutation in the raldh2 enzyme (Begemann et al., 2001;Grandel et al., 2002), lack pancreatic progenitor cells, while the endodermal cells are still present, indicating that RA action is not required in endoderm formation but rather in pancreas specification. Furthermore, wild type embryos treated with BMS493, a molecule that binds the RARs blocking their activities, show the same phenotype as the raldh2 mutants confirming that the lack of pancreas is due to a lack of RAR activity (and not to a secondary role of Raldh2). Besides, addition of RA into the growing media of raldh2 mutants at different time points showed that RA acts at the end of gastrulation to induce pancreas (Stafford et al., 2004). Furthermore, treatment of wild type zebrafish embryos with RA during gastrulation is sufficient to induce ectopic pancreatic endocrine cells towards the anterior part of the endoderm (Stafford and Prince, 2002). Cell transplantation studies have shown that RA synthesized in the anterior paraxial mesoderm act directly on the foregut endoderm to promote pancreas development (Stafford et al., 2006). Although many RAR direct target genes are known, the exact role of RA in the endoderm is still unclear and the gene regulatory network induced by RA in the endoderm is still poorly defined. Experiments performed on Xenopus explants have recently indicated the involvement of Hnf1b and Fzd4 (Gere-Becker et al., 2018); however, many other regulatory genes induced by RA and mediating its effect are still to be identified.
In this study, we used a combination of RNA-seq, ChIP-seq and ATAC-seq experiments to investigate the gene regulatory network induced by RA in zebrafish endodermal cells. We identified by RNA-seq experiments all genes regulated by RA and BMS493 treatments. Direct targets of RA were also identified by performing RAR ChIP-seq assays. Then, ATAC-seq experiments (Buenrostro et al., 2015) allowed us to identify all chromatin regions whose accessibility is modified by RA signaling. By integrating all these transcriptomic and genomic data, we build a model describing the action of RA in endodermal cells. Furthermore, comparison of our RAR sites detected in the zebrafish genome with those of the murine genome highlight the RAR target genes which have been conserved from fish to mammals and which are probably crucial in RA action.

Sample preparation and cell purification
Fish were maintained in accordance with the national guidelines and all animal experiments described herein were approved by the ethical committee of the University of Liege (protocol number 1980). Endodermal and non-endodermal cells were obtained using the transgenic line Tg(sox17:GFP).
Embryos from this line were incubated in DMSO, 1 μM RA or 1 μM BMS943 from 8 cells stage (1.25 hpf) to 3 somites stage (11 hpf) and grown at 23ºC. Embryos were dechorionated and then deyolked and dissociated in FACS solution. Finally, cells were sorted by FACS Aria II based on the GFP expression. Sorting was performed as a single run in purity mode.

RNA-seq library preparation and data analysis
cDNA was obtained using the SMARTer protocol with minor modifications. Briefly, cells were pelleted and lysed in Lysis Buffer by freezing in liquid nitrogen and stored at -80ºC. Synthesis of cDNA was performed directly on the lysed cells and the cDNA was amplified and purified before assessing its quality by the bioanalyzer (2100 high sensitivity DNA assay, Agilent Technologies).
150 pg. of cDNA were used as input to prepare the libraries using the Nextera XT DNA kit (Illumina). 75 bp single-end sequences were obtained using the NextSeq500 Illumina Sequencer with coverage of 20 million reads per library.
Raw reads were aligned to the zebrafish genome (Zv9, Ensembl genome version 79, ensembl.org) using STAR (Dobin et al., 2013). Genes with at least 1 read in at least 2 samples were kept for further analysis. Normalization and differential expression analysis were performed using DESeq2 (Love et al., 2014). Genes were considered differentially expressed with FDR < 0.01 (False Discovery Rate).

Synthesis of RARaa-myc
The whole zebrafish RARaa coding sequence was cloned into the pCS2MT vector that contains a myc-tag and mRNA was produced from this new construct by in vitro transcription (mMESSAGE mMACHINE sp6 transcription kit, Invitrogen). Validation of the tagged RARaa mRNA was done by western-blot using the ChIP-graded anti-myc antibody (ab9132, Abcam) in the cytoplasm and nuclei fractions of injected and non-injected embryos.

ChIP-seq library preparation and data analysis
Zebrafish fertilized eggs were injected with the myc-tagged RARaa mRNA at a concentration of 72 ng/µl and incubated until the bud stage was reached. RA acid was added to the media at a final concentration of 1 μM and embryos were grown for an extra hour. Around 2000 injected embryos were fixed for ChIP-seq analysis. A Diagenode Bioruptor sonicator was used for sonicating the chromatin. Dynal Protein A magnetic beads (Diagenode) and ChIP-graded anti-myc antibody (ab9132, Abcam) were used to precipitate the chromatin. Libraries were prepared with the NEBNext Ultra II DNA Library Prep kit (Bioke). 42 bp pair-end sequences were obtained using the NextSeq500 Illumina Sequencer with coverage of 60 million reads per library.
Raw reads were mapped to the zebrafish genome (Zv9) using bowtie2. Enriched peaks were called using MACS2. Motif enrichment analysis were performed using HOMER adding a motif length from 6 to 18 bp, to ensure the inclusion of the different RARE. Annotation of peaks was done using ChIPseeker.

ATAC-seq library preparation and data analysis
After isolation by FACS, cells were pelleted and ATAC-Seq libraries were prepared as described elsewhere (Buenrostro et al., 2015). Libraries were sequenced 42 bp paired end using the NextSeq500 Illumina Sequencer with coverage of 40 million reads per library.
Quality assessment of ATAC-seq libraries was performed with ATACseqQC (Ou et al., 2018) and ChIPpeakAnno (Zhu et al., 2010) was used to create the density and heatmap plots. Enriched peaks for ATAC-seq datasets were called using MACS2. Peaks showing different signal among ATAC-seq samples were identified with DiffBind (Stark and Brown, n.d.). Briefly, the density of mapped reads was calculated for each of the 156604 regions obtained by merging the peaks obtained for each of the 12 individual samples and differential analysis was performed in these regions to determined which are the regions with different open chromatin in each condition. Motif enrichment analysis for each set of treatment specific peaks were performed using HOMER (Heinz et al., 2010). Annotation of peaks was done using the ChIPseeker (Yu et al., 2015).

Data integration and visualization
To study the correlation of ATAC-seq/ChIP-seq and RNA-seq a student's t-test was performed to compare the read counts in RNA-seq of each gene and the transcription starting site (TSS) accessibility in ATAC-seq/ChIP-seq. Correlation of flanking treatment/cell specific and common ATAC-seq elements, as well as, ChIP-seq peaks and log2 fold change from RNA-seq was analysed by Spearman's rank correlation.

Retinoic acid affects the transcriptome of zebrafish endodermal cells
Previous studies have shown that treating zebrafish embryos with 1 μM RA during gastrulation (from 5 to 10 hpf) induces ectopic pancreatic cells toward the anterior part of the embryo (Stafford and Prince, 2002). Inversely, blocking the RA action during this period, e.g. by adding the RA antagonist BMS493, blocks pancreas formation. In order to determine the effect of RA on the transcriptome of zebrafish endodermal cells, transgenic Tg(sox17:GFP) zebrafish embryos were treated during gastrulation either with 1µM RA, with 1µM BMS493 (RA-antagonist) or with DMSO (used as control) and endodermal GFP+ cells were next selected by FACS from embryos both at 3-somites (3-S) and 8-somites (8-S) stages (11-hpf and 13.5-hpf, respectively). Non-endodermal (GFP-) cells were also selected from the DMSO-treated control embryos in order to identify genes displaying endodermal enriched expression. RNA-seq was performed on all these FACS-isolated cells prepared in triplicates (24 samples in total) and transcriptomes were analysed using the bioinformatic pipelines as described in Methods. Principal component analysis of all RNA-seq data presented in Figure 1A indicates i) a tight clustering of all triplicate samples confirming a high reproducibility, ii) a strong difference between the transcriptome of endodermal and non-endodermal cells (discrimination along the first axis of the PCA plot), iii) relative similar transcriptomes of cells isolated at 3-S and 8-S stages, and iv) the clustering of BMS493 samples near DMSO samples indicates a much slighter effect of BMS493 treatments compared to the RA treatments. These conclusions were further confirmed by the differential gene expression analyses between the different conditions : more differentially expressed genes were identified between endodermal and non-endodermal cells (1370 and 1410 differentially expressed genes at 3-S and 8-S stages, respectively) than between endodermal cells treated with RA versus DMSO (756 and 514 RA-regulated genes at 3-S and 8-S stages respectively) or with BMS493 versus DMSO (32 and 71 BMS493-regulated genes at 3-S and 8-S stages respectively). We found that there is a large overlap among the sets of genes having an endodermal-enriched expression at both 3-S and at 8-S stages (Supplementary Figure S1A; list of genes given in Supplementary Table S1 and S2) and these sets include all known endodermal markers including sox17, gata6 and foxa1/2/3, validating the accurate sorting of endodermal cells. RAregulated genes consist of a large set of up-and down-regulated genes (Supplementary Table S3 and S4), many of them being regulated at both 3-S and 8-S stages (Supplementary Figure S1B). These RA-regulated genes contain many known RAR-direct targets such as cyp26b1/a1, dhrs3a and several hox genes, validating the efficiency of RA-response. Interestingly, BMS493 treatment led mostly to a down-regulation of genes: at 3-S stage, all the 32 BMS493-regulated genes were repressed, and, at 8-S stage, 68 genes were repressed while only 3 genes were up-regulated by the BMS493 treatment (Supplementary Tables S5 and S6). A large overlap is also observed between the genes downregulated by BMS493 at 3-S and at 8-S stage (Supplementary Figure S1C). As expected, a large proportion of genes down-regulated by BMS493 were up-regulated by RA treatments, this observation being evident mostly at 3-S stage (see Figures 1B and C). Tables 1 and 2 show the genes upregulated by RA and down regulated by BMS493 as well as those enriched in the endoderm at 3and 8-somites stage respectively (highlighted in green). gata6, insm1a and ascl1b are the only 3 known pancreatic regulatory genes which are regulated by RA and BMS493 (Table 1). Other pancreatic transcription factors, like mnx1, insm1b, hnf1ba, nr5a2, jag2 or neuroD1, were induced by the RA-treatment but were not significantly repressed by BMS493, and inversely, other pancreatic regulators are inhibited by BMS493 but not significantly induced by RA like pdx1, rfx6 and myt1b (Supplementary Tables S3 and S6). We can assume that the induction of pancreatic fate by RA is mediated, at least in part, by the direct or indirect regulation of these pancreatic regulatory factors.
In conclusion, the analysis of all these RNA-seq data highlight all the genes with an enriched expression in zebrafish endodermal cells and which are regulated by the RA signaling pathways; this gene set should comprise regulators involved in the AP patterning of the endoderm and in the specification of the pancreatic field.

Genes directly regulated by RA have RAR binding sites near them
To further identify the genes directly regulated by RAR and determine if some pancreatic regulatory genes are direct targets of RA signaling, we performed ChIP-seq experiment. Since no ChIP grade RAR antibodies have been described for zebrafish to date, a tagged RARaa was expressed in zebrafish gastrulae by injecting zebrafish fertilized eggs with a mRNA coding for the zebrafish RARaa protein fused to a myc-tag at its C-terminal end. RARaa was chosen as the RNA-seq data indicate that it is the most highly expressed RA receptor in zebrafish endodermal cells. Injection of this mRNA did not disturb the development of embryos. Chromatin was prepared from about 2000 injected zebrafish embryos at 10hpf (end of gastrulation) and immunoprecipitation was performed with ChIP grade Myc antibody. Comparison of reads obtained with myc-RAR ChIP and input control led to the identification of 4858 RAR peaks. In order to identify bona fide RAR binding sites showing strong affinity, we selected all peaks with a height score above 50 (Supplementary Table S7). By choosing such criteria, 2848 RAR robust binding sites were identified. As shown in Figure 2A, a majority of these sites are located near or within genes: 8% in gene promoters (i.e. 1kb upstream gene TSS), 30% in upstream sequence (from 1 to 10 Kb), 22% in introns and 4% in exons while only 33% are intergenic. Sequence analysis of all RAR peaks revealed that the highest represented motif corresponds to the DR5 RAR/RXR consensus binding sequence (present in 39% of identified RAR peaks) ( Figure 2B). Furthermore, several RAR peaks were observed around many genes known to be RAR direct target genes such as cyp26a1, dhrs3 or the hoxb1a-hoxb4a genomic region ( Figure 2C and data not shown). All these observations confirm the accuracy of the ChIP-seq data. Interestingly, many identified zebrafish RAR sites are located in evolutionary conserved genomic sequences as shown by the fish PhastCons track of Figure 2C (and see below).
The RAR ChIP-seq peaks which were located less than 250 Kb from a gene TSS were assigned with the name of that gene and when several genes were lying in vicinity of a RAR site, the closest gene was considered as the putative RAR-regulated genes. Using that strategy, of the 2848 RAR sites, 2144 were linked to a gene. From this gene set, 94 were up-regulated by RA at 3-S stage and 61 were down-regulated ( Figure 3B, Supplementary Tables S8 and S9). If we analyse the correlation of RA gene regulation and presence of RAR sites by plotting the fold change in gene expression induced by RA according to the number of RAR sites near the gene ( Figure 3A), we observe that RAR acts mainly as a transcriptional activator. This analysis supports the classical model where the RAR/RXR heterodimer recruits co-activators upon RA ligand binding. Interestingly, a large part of the genes (i.e. 18 genes out of 32) down regulated by BMS493 treatment at 3S stage contain RAR sites ( Figure   3B). Among the genes up-regulated by RA and harbouring a RAR site, we identified the pancreatic regulatory genes hnf1ba/b, gata6, insm1b, jag2 and mnx1 suggesting that these genes are direct target of RA.

RA affects the chromatin structure in endodermal cells
To determine whether the chromatin accessibility of endodermal cells is modified by RA or BMS493 treatment, we performed ATAC-seq experiments, a very sensitive technique allowing the identification of open chromatin and nucleosome-free regions (Buenrostro et al., 2013;Quillien et al., 2017). Furthermore, as most nucleosome-free regions corresponds to regulatory sequences, ATAC-seq can locate the enhancers or promoters whose accessibility is modified by the RA and BMS493 treatments. By this way, we can also assess whether the identified RAR binding sites are located in nucleosome-free regions and become more accessible after RA treatment. This assay can also identify enhancer regions, not bound by RAR, but which become more accessible upon RA treatment, thereby identifying regulatory regions indirectly activated by RA. The sequence analysis of such indirect RA targets can give clues on the transcription factors mediating the RA effect. Like for the RNA-seq analyses, zebrafish embryos were treated with RA, BMS493 and DMSO (control) during gastrulation and about 10000 endodermal cells were selected by the FACS at 3-somites stage (11hpf). Non-endodermal cells from control DMSO-treated embryos were also analysed in parallel.
Cell preparations and ATAC-seq were done in triplicates for each condition and analysed as described in Methods. To verify the accuracy of the ATAC-seq data, we performed several quality controls analyses. First, for all samples, the analysis of the ATAC-seq fragment size distribution reveals the expected pattern with abundant short (<150 bp) fragments corresponding to nucleosome-free regions and larger fragments of about 200 and 400 bp corresponding to mono-and bi-nucleosome regions, respectively (Supplementary Figure S2A). Secondly, as reported by Buenrostro and collaborators, genome mapping of the nucleosome-free fragments shows a clear enrichment in promoter regions immediately upstream of transcriptional start sites (TSSs) while mono-nucleosomes are depleted from TSSs and rather map just downstream of the TSSs in a periodic manner (Supplementary Figure   S2B). Thirdly, we verified that the ATAC-seq fragments highlight zebrafish regulatory regions by comparing them with regions harbouring the histone modifications H3K4me3 and H3K27ac associated to promoters and enhancers which were identified from whole zebrafish embryos at 10 hpf (end of gastrulation, "bud stage") (Paik et al., 2013). Heat-maps of ATAC-seq reads from all samples show an obvious enrichment at loci harbouring these two histone modifications ( Figure 4A and Supplementary Figure S3). As regulatory regions often display sequence conservation, we also compared our ATAC-seq reads to a collection of annotated zebrafish evolutionary-conserved noncoding elements (zCNEs) identified using multiple fish and tetrapod genomes (Hiller et al., 2013).
Heat-maps of ATAC-seq reads from each sample also showed a strong correlation with zCNEs ( Figure 4A, Supplementary Figure S3). These observations confirm that regions identified by ATACseq exhibited hallmarks of active regulatory elements. We next verified the reproducibility of ATACseq peaks by a principle component analysis ( Figure 4B). The results show that i) triplicate samples are tightly clustered, and ii) endodermal and non-endodermal (NE) cell clusters are separated along the PC2 axis, representing 3% of the variability among data, while RA-treated cluster is separated from the DMSO-and BMS-cell cluster along the PC3 axis representing only 1% of variability. So, as observed for the RNA-seq data, stronger differences are observed between GFP+ and GFP-cells compared to the differences between RA-treated and DMSO-treated endodermal cells. The samples corresponding to the BMS493-treated cells and DMSO-treated cells were not segregated, suggesting no significant effect of BMS493 on the nucleosome distribution. These conclusions were supported by the differential peak intensities analyses as described hereafter.
From the 12 samples analysed altogether, 156604 nucleosome-free regions were identified in the zebrafish genome. The density of reads was next calculated for all identified peaks in each cell sample in order to identify the elements which become accessible upon treatment (RA, BMS493 versus DMSO) or according to cell identity (GFP+ versus GFP-) with an FDR<0.05. This analysis revealed more differences in chromatin accessibility between cell types than between treatments. Indeed, among the 156604 identified ATAC-seq peaks, 22696 display significant differences in intensities between endodermal and non-endodermal cells : 9722 and 12974 regions are more accessible in endodermal and non-endodermal cells, respectively (Supplementary Tables S10 and S11), while only 1989 display differences between RA and DMSO treatments (1240 peaks have significant more reads upon RA treatment and 749 peaks have less reads on RA treatment, Supplementary Tables S12 and S13). No significant differences could be identified between the BMS493 samples and the DMSO controls. Interestingly, sequence analysis of all endodermal-specific ATAC-seq regions revealed a highly significant enrichment of the binding sites motif for the Gata, Fox and Sox protein families ( Figure 4B), which can be explained by the action of the well-known endodermal regulatory factors Gata4/5/6, Foxa1/2/3 and Sox32/17. Regulatory regions opened specifically in endodermal cells were often identified near endoderm and pancreatic regulatory genes, such as in the foxa2, nkx6.1, hnf4g, sox17 or mnx1 loci (Supplementary Figure S4 and data not shown).
Annotation of the RA-induced ATAC-seq elements to the closest gene revealed that they are often located near RA-upregulated genes identified above by RNA-seq. We found that there is a significant correlation between the level of RA-induced gene expression and the number of RA-induced ATAC-seq elements ( Figure 4E). Interestingly, about 10% (136 regions) of RA-induced ATAC-seq peaks contain RAR binding sites as identified above by ChIP-seq and 101 regions (~74% ) possess the DR5 motif recognized by the RAR/RXR complex ( Figure 5D). This is the case for RA-induced peaks in the dhrs3a, cyp26a1/b1 and insm1b genes (Supplementary Figure S5 and data not shown). In contrast, about 90% of RA-induced ATAC-seq peaks do not harbour RAR binding sites although they are usually found near RA-upregulated genes (Supplementary Table S14). Such genes are presumably indirect targets of RA signaling. For example, the pancreatic regulatory genes pdx1 and insm1a, both regulated by RA signaling but not harbouring RAR binding site in their genomic loci, have both one RA-induced ATAC-seq peak located about 50 Kb upstream their TSS (see Supplementary Figure   S6). Motif analysis of all RA-induced elements showed a significant enrichment for Gata and Hnf1b binding motifs, present in ~38 and ~13% of the RA-induced elements respectively ( Figure 4B). To identify which GATA and Hnf1b proteins are opening these chromatin regions, we searched in the list of the direct RAR-target genes (Supplementary Table S8) those coding for members of these transcription factor families. hnf1ba and hnf1bb are both induced by RA, however the level of expression of hnf1ba is about 200-fold higher than hnf1bb in endodermal cells. Furthermore, hnf1ba contains three RAR binding sites located in nucleosome-free and evolutionary-conserved regions ( Figure 5), while hnf1bb has only one RAR site of weaker affinity (data not shown). As for the Gata family, gata6 is strongly induced by RA and has a high affinity RAR binding site in a nucleosomefree region located about 30 kb upstream from its TSS ( Figure 5). Thus, all these analyses strongly suggest that Gata6 and Hnf1ba are the major effector of the RA signaling in zebrafish. As these two transcription factors are known to be crucial for pancreas development (Carrasco et al., 2012;Xuan et al., 2012), it indicates that they are the key effectors of RA action to induce the pancreatic field.

Some RAR genomic binding sites are well conserved among vertebrate species.
As many zebrafish RAR sites are located in evolutionary conserved genomic regions ( Figure 2C and 5), we next determined the proportion of RAR sites which are also found in mammalian genomes.
Chatagnon et al. published the chromatin binding of both RAR and RXR proteins in the murine F9 embryonal carcinoma cells during differentiation into primitive endodermal cells after induction with RA (Chatagnon et al., 2015). Comparison of our datasets with the one obtained by Chatagnon et al. showed that, among the 2144 zebrafish genes harbouring a RAR site, 722 have also a RAR site in the murine orthologous gene. This list of conserved RAR-bound genes comprises notably the cyp26a1/b1, dhrs3a, several hox genes, raraa/b as well as the pancreatic gata6 and hnf1ba/b genes.
We next determined whether the RAR binding sites are themselves conserved. For that, we determine which RAR sites are located in highly conserved non coding elements (HCNEs) identified through the comparison of multiple vertebrate genomes (Engström et al., 2008). This analysis revealed that 235 HCNEs regions carry RAR binding sites, of which 116 are zebrafish and 146 murine RAR binding sites. Interestingly, in 27 of them, the location and the binding motifs of these RAR sites are conserved within these HCNE (see Figure 6 and data not shown) revealing that the organization of the enhancer has been maintained throughout vertebrate evolution. This confirms that the RA regulation of a subset of genes have been well conserved throughout vertebrate's evolution. Among these well conserved RAR sites, we have identified an extremely well conserved enhancer located in the last intron of skap1 gene (Figure 4) which is located far downstream from the hoxba locus and which could act at long distance to regulate not only skap1 gene but also several Hoxba genes.

Discussion
It is known that RA is required for the proper development of the pancreas as well as for the anteriorposterior regionalization of the endoderm (Chen et al., 2004;Molotkov et al., 2005), but the mechanism by which it acts on the endoderm to set up the anterior-posterior axis and to induce pancreas are not well known. Here, we have deciphered the genome-wide transcriptional mechanisms of RA action in the endodermal cells during specification of the pancreas. To this end, we have determined the transcriptome and the nucleosome-free regions of endodermal cells collected from embryos that were treated with DSMO, RA and BMS493, and the binding sites of RARaa in whole embryos.
RA affected much more the transcriptome of endodermal cells than BMS493 treatment; furthermore, while the genes affected by RA were either up or down regulated, the ones affected by BMS493 were mostly down regulated. The fact that much fewer genes are affected by the loss of function (BMS493 treatment) compared to the gain of function (RA treatment) could be explained by the number of endodermal cells affected by these 2 treatments: RA treatment acts on a large proportion of the endoderm (anterior endoderm) while the BMS493 treatment acts on a small set of endodermal cells (posterior foregut) where the endogenous RA is active. The observation that most of the BMS493affected genes are down-regulated strongly suggests that the RARs act as transcriptional activators.
It is thus possible that the genes down regulated by RA corresponds to indirect targets. 23 genes were up-regulated by RA and down-regulated by BMS493 at 3 somites stages and 27 at 8 somites stages.
Among them, some were already known to be regulated by RA and involved in its synthesis or signalling, such as cyp26b1, cyp26a1, dhrs3a, and raraa. Besides, we found some that are expressed during pancreas development, such as gata6, hnf1ba, and nr5a2.
When the location of the genomic RARaa binding sites is analysed and compared to the list of RAregulated genes, it is obvious that only a minority of RA-regulated genes have RARaa binding sites around their loci; we can propose that a large part of these genes are the actual direct RARaa target genes, the other genes probably corresponding to indirect RAR targets. Also, it is interesting to notice that the majority of RARaa binding sites could not be linked to nearby RA-regulated genes (see Figure   3B). This observation raises the question of the function of such RARaa binding sites. Several hypotheses can be proposed about their putative role: 1) some of these sites could be functional but the regulation of the nearby genes by RA did not reach the statistical threshold (FC>2 fold, FDR<0.01); 2) some of these RAR sites are involved in the regulation of genes located at much longer distances (>250 kb); 3) some of these RAR sites are involved in the RA-regulation of nearby genes but in other tissues or at later developmental stages; or 4) some of these RAR are not functional and do not participate in the transcriptional regulation by RA.
Comparison of the zebrafish RARaa binding sites with the murine RAR ones showed that sequence where the RARs bind is conserved among species. Besides, many orthologous genes have binding sites for the RAR in both species, indicating that the regulation of those genes by RA is conserved among species. However, when we compared the binding sites location by using the HCNE, we could not detect a high overlap. Only a small proportion of genes have their RAR binding sites within HCNE. Furthermore, when RAR binding sites are found near the same orthologous murine and zebrafish gene, these RAR sites are not often located at the same place near this gene, being located upstream in one species and in introns or downstream in the other. Thus, this suggests high plasticity or remodelling during vertebrate evolution allowing relocation of RARE around genes. These analyses suggest that although genes regulated by RA are conserved between both species, the binding site for this receptor is often not located in the same conserved regions, only 23 genes harbouring their RARE in the same HCNE in mouse and zebrafish.
Moreover, by comparing chromatin accessibility upon RA and BMS493 treatment, we characterized regions of the chromatin that are either induced or blocked by RA. However, we could not detect any region of the chromatin affected by BMS493. The regions modified by RA are mainly located in introns, regions downstream of the TSS and distal intergenic regions of the chromatin, which indicates that they might correspond to enhancers. Furthermore, the correlation of the regions induced by RA with gene expression showed that genes up-regulated by RA have several regions induced by RA around them. Although, these regions generally do not correspond to their TSS, supporting again the hypothesis that the regions induced by RA are mainly enhancers. Indeed, we have identified the motifs enriched in these regions and they correspond to the binding sites of three known TFs: RAR:RXR, Gata6, and Hnf1b. Interestingly, the genes coding for these TFs are also up-regulated by RA and two of them, Gata6 and Hnf1b, are involved in pancreas development (Carrasco et al., 2012;Decker et al., 2006;Gere-Becker et al., 2018, p.;Lancman et al., 2013).
Finally, by integrating all these data, a model can be proposed were Gata6 and Hnf1ba TFs are the key mediators of RA action in endodermal cells to induce pancreas. Previous studies in different species have shown that both of them are involved in pancreas development (Gere-Becker et al., 2018;Holtzinger and Evans, 2005;Lancman et al., 2013). A zebrafish mutant line where the hnf1ba gene is inactive has been identified and display severe defects in pancreas development (Sun and Hopkins, 2001). Concerning gata6, no zebrafish mutant line has been described so far but a loss of function experiments has been reported using morpholinos and indicate gata4/6 involvement in the liver as well as exocrine pancreas development, although the phenotype analyses were quite limited (Holtzinger and Evans, 2005). Our data suggest that both factors, Gata6 and Hnf1b, should act as pioneer factors by opening the chromatin near target genes, as we confirmed for the RAR (data not shown). The pioneer activity of Gata6 has been previously reported during mouse liver development (Iwafuchi-Doi and Zaret, 2014;Zaret and Carroll, 2011), however, Hnf1b activity as a pioneer factor has not yet been reported. Furthermore, hnf1ba has been identified as a key mediator of RA action in the zebrafish hindbrain (Ghosh et al., 2018), indicating that at least one part of RA action seems very similar in the endoderm and hindbrain. This is also confirmed by the GO analysis performed for the 355 genes up-regulated by RA, where some of the terms were associated with hindbrain, brain and head development (Data not shown).

Declarations
All animal experiments were conducted according to national guidelines and were approved by the ethical committee of the University of Liège (protocol numbers 1980). The raw datasets generated in the current study will be deposited on ENA (in progress). The authors declare that they have no competing interests.