Spoken language is central to many human interactions and is the medium through which activities and events are studied across many disciplines in the humanities and social sciences. It is also an object of active study in its own right. Yet, central as spoken language is to humanity, regulations aimed at mitigating privacy concerns also constrain the opportunities for collaboration on spoken materials at a national or larger scale.
The Visible Speech (VISP) platform is a web-based research infrastructure at Humlab, Umeå University, designed to handle audio recordings of speech in compliance with the national implementation of the GDPR and with applicable security requirements. VISP provides a centralized environment for research in any discipline in which recordings of spoken language constitute the primary material, meeting both researchers’ needs for efficient workflows and legislators’ demands for secure data management.
One of VISP's primary advantages is its ability to facilitate research on audio recordings that now constitute personally identifiable information (PII) under the application of the GDPR in Sweden. These recordings may contain sensitive content or have been made in sensitive contexts, which classifies them as sensitive PII under national legislation. Sensitive content may relate to, for instance, the ethnicity or religious beliefs of the speaker; sensitive contexts include recordings made in a healthcare setting or in a setting where a person’s membership of a trade union is divulged. While the challenges of conducting larger research efforts on these types of materials are currently aggravated by the local implementation of the GDPR in Sweden, it is not yet clear to what extent upcoming AI regulation will, in effect, extend identical or similar constraints to research in other EU countries.
The VISP platform offers a unified environment for storage, controlled access, direct work with the recordings, and reproducible speech signal processing. VISP builds on earlier research efforts [1,2] and includes a comprehensive set of speech and voice analysis procedures within one framework. National research groups can thus collectively store interviews or other spoken language recordings, have automatic transcription or other speech processing performed, and access the results for complementary manual annotation or analysis, simultaneously and securely. Additionally, VISP facilitates the digital archiving of projects through a uniform, documented, and transparent directory structure, which reduces the barriers to making data available in line with the FAIR principles. Research projects dealing with sensitive personal data in the form of audio recordings require review by the Swedish Ethical Review Authority and may subsequently take advantage of the VISP facilities.
A significant feature of VISP is its integration with the Swedish Academic Identity Federation (SWAMID, connected to eduGAIN), which gives researchers across Sweden secure, federated logins. Through this national federated login, researchers can access project data and collaborate on processing the material in ways that were not previously possible. VISP also supports projects by lowering the threshold to digital signal processing and audio analysis of the collected recordings: researchers can perform hands-on processing and analysis without the risk of disseminating sensitive audio. Because the tools for direct manipulation and examination of audio data are provided within the platform, all stages of data handling, from collection to analysis, take place in a secure environment, which maintains the integrity and confidentiality of sensitive information.
The work conducted within VISP is part of SweCLARIN, the Swedish node of the European Research Infrastructure Consortium (ERIC) CLARIN. SweCLARIN aims to develop and provide national and European infrastructure for speech and text-based e-science, offering extensive digitized materials and advanced language technology tools. By combining the technologies developed by CLARIN ERIC partners [1] with stringent security protocols and federated login, VISP enables efficient and secure research on audio recordings of speech. The VISP components are available for download, for setting up local instances, and for modification. The framework therefore has the potential to become a valuable tool for researchers, facilitating collaboration and data processing within the digital humanities on both a national and a larger scale.
References:
[1] R. Winkelmann, J. Harrington, K. Jänsch, EMU-SDMS: Advanced speech database management and analysis in R, Comput. Speech Lang. 45 (2017) 392–410.
[2] R. Winkelmann, J. Harrington, EMU-SDMS: R centric semi-automatic speech database processing and analysis, in: S. Calhoun, P. Escudero, M. Tabain, P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019, Australasian Speech Science and Technology Association Inc., Canberra, Australia, pp. 1317–1321.
9th Conference on Digital Humanities in the Nordic and Baltic Countries (DHNB 2025), Tartu, Estonia, March 5-7, 2025