Genomics projects, including genome sequencing, transcriptomics, genome-wide association mapping and epigenetics assays, produce vast quantities of data. Extracting the required information from such complex datasets is a significant challenge, and even where software tools exist, they are often not intuitive or designed for non-specialist users. This dissertation details how I have applied design principles from the field of Human-Computer Interaction (HCI) to the development of intuitive web-based bioinformatics resources for exploring genomics data. In the first part of the thesis I detail the development of a specialised genomics resource that enables non-specialists who lack bioinformatics skills to access, explore and extract new knowledge from a variety of genomics data types. These tools were developed in collaboration with wet lab biologists and bioinformaticians who represent typical end-users. The tools have been integrated within the PlantGenIE (Plant Genome Integrative Explorer) web resource, which has been established as a platform for exploring Populus, conifer, Eucalyptus and Arabidopsis genomics data. Although our ability to collect, store and manage data is increasing rapidly owing to advances in technology and science, our ability to understand it remains constant. To help address this, in the second part of this dissertation I focus on enhancing the usability of tools based on HCI and User Experience (UX) practices. To achieve this, I utilised visualisation techniques and design principles in the design process to improve the PlantGenIE User Interface (UI), and applied usability methods to evaluate the UX of PlantGenIE tools. The results were then used to inform adaptations and fine-tuning of those tools. I show that using these research methods and practices within the development life cycle represents a framework for designing usable bioinformatics tools.
Wider-scale use of these methods by future designers and developers will enable the creation of more usable bioinformatics resources.
Visualization - the process of translating data into visual forms - is increasingly important in science as data grows rapidly in volume and complexity. A common challenge faced by many biologists is how to benefit from this data deluge without being overwhelmed by it. Here, our main interest is in the visualization of genomes, sequence alignments, phylogenies and systems biology data. Bringing together new technologies, including design theory, and applying them to these areas of biology should improve usability and user interaction.
The main goal of this paper is to apply design principles to the development of bioinformatics resources, evaluate those resources using different usability methods, and recommend steps for designing usable tools.
There are an ever-increasing number of genomes being sequenced, many of which have associated RNA sequencing and other genomics data. The availability of user-friendly web-accessible mining tools ensures that these data repositories provide maximum benefit to the community. However, there are relatively few options available for setting up such standalone frameworks. We developed the Genome Integrative Explorer System (GenIE-Sys) to set up web resources to enable search, visualization and exploration of genomics data typically generated by a genome project.
GenIE-Sys is implemented in PHP, JavaScript and Python and is freely available under the GNU GPL v3 license. All source code is available at the GenIE-Sys website (https://geniesys.org) or GitHub (http://github.com/plantgenie/geniesys.git). Documentation is available at http://geniesys.readthedocs.io.
2020-04-27: Registered as accepted in The Plant Journal, ISSN 0960-7412, EISSN 1365-313X.
Despite tremendous advancements in high-throughput sequencing, the vast majority of tree genomes, and those of forest trees in particular, remain elusive. Although primary databases store genetic resources for just over 2,000 forest tree species, these are largely focused on sequence storage, basic genome assemblies, and functional assignment through existing pipelines. The tree databases reviewed here serve as secondary repositories for community data. They vary in their focal species, the data they curate, and the analytics provided, but they are united in moving toward a goal of centralizing both data access and analysis. They provide frameworks to view and update annotations for complex genomes, interrogate systems-level expression profiles, curate data for comparative genomics, and perform real-time analysis with genotype and phenotype data. The organism databases of today are no longer simply catalogs or containers of genetic information. These repositories represent integrated cyberinfrastructure that supports cross-site queries and analysis in web-based environments. These resources are striving to integrate across diverse experimental designs, sequence types, and related measures through ontologies, community standards, and web services. Efficient, simple, and robust platforms that enhance the data generated by the research community contribute to improving forest health and productivity.