Umeå University's logo

umu.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Through the coding-lens: community detection and beyond
Umeå University, Faculty of Science and Technology, Department of Physics.ORCID iD: 0000-0001-7881-2496
2022 (English)Doctoral thesis, comprehensive summary (Other academic)Alternative title
Nätverksklustring, nätverkscentralitet och länkprediktion ur ett kodningsperspektiv (Swedish)
Abstract [en]

We live in a highly-connected world and find networks wherever we look: social networks, public transport networks, telecommunication networks, financial networks, and more. These networks can be immensely complex, comprising potentially millions or even billions of inter-connected objects. Answering questions such as how to control disease spreading in contact networks, how to optimise public transport networks, or how to diversify investment portfolios requires understanding each network's function and working principles.

Network scientists analyse the structure of networks in search of communities: groups of objects that form clusters and are more connected to each other than the rest. Communities form the building blocks of networks, corresponding to their sub-systems, and allow us to represent networks with coarse-grained models. Analysing communities and their interactions helps us unravel how networks function.

In this thesis, we use the so-called map equation framework, an information-theoretic community-detection approach. The map equation follows the minimum description length principle and assumes complete data in networks with one node type. We challenge these assumptions and adapt the map equation for community detection in networks with two node types and incomplete networks where some data is missing. We move beyond detecting communities and derive approaches for how, based on communities, we can identify influential objects in networks, and predict links that do not (yet) exist.

Abstract [sv]

Vi lever i en värld som blir mer och mer sammanlänkad. Vart vi än tittar hittar vi nätverk: sociala nätverk, kollektivtrafiknätverk, telekommunikationsnätverk, finansiella nätverk och så vidare. Dessa nätverk kan vara oerhört komplexa och omfatta potentiellt miljoner eller till och med miljarder sammankopplade objekt. För att kunna besvara frågor som: hur kontrollerar vi sjukdomsspridning i kontaktnät, hur optimerar vi kollektivtrafiksnätverk eller hur diversifierar vi investeringsportföljer, krävs det att vi förstår varje nätverks funktion och principer.

Nätverksforskare analyserar strukturen i nätverk i jakt på kluster: grupper av objekt som är mer kopplade till varandra än till resten av nätverket. Kluster utgör byggstenarna, eller delsystemen, i nätverken och låter oss representera dessa med förenklade modeller. Att analysera kluster och deras interaktioner hjälper oss att ta reda på hur nätverk fungerar.

I denna avhandling vidareutvecklar vi den så kallade kartekvationen, en informationsteoretisk klusterdetekteringsmetod. Kartekvationen följer principen om minsta beskrivningslängd och förutsätter fullständiga data i nätverk som bara består av en typ av noder. Vi utmanar dessa antaganden och anpassar kartekvationen för klusterdetektering i nätverk som består av två typer av noder och ofullständiga nätverk där viss data saknas. Vi dyker också djupare in i kluster och härleder lösningar för hur vi, baserat på kluster, kan identifiera inflytelserika objekt i nätverk och förutsäga länkar som (ännu) inte existerar.

Abstract [de]

Wir leben in einer hochgradig vernetzten Welt und finden Netzwerke wo auch immer wir hinschauen: soziale Netzwerke, öffentliche Verkehrsnetze, Telekommunikationsnetze, Finanznetzwerke und mehr. Diese Netzwerke können immens komplex sein und potenziell Millionen oder sogar Milliarden miteinander verbundener Objekten umfassen. Um beantworten zu können, wie wir die Ausbreitung von Krankheiten in Kontaktnetzwerken kontrollieren, öffentliche Verkehrsnetze optimieren oder Anlageportfolios diversifizieren können, müssen wir die Funktionsweise und Arbeitsprinzipien dieser Netzwerke verstehen.

Netzwerkwissenschaftler analysieren die Struktur von Netzwerken auf der Suche nach Communities: Gruppen von Objekten, die Cluster bilden und stärker miteinander verbunden sind als mit dem Rest. Communities repräsentieren die Bausteine von Netzwerken, entsprechen ihren Subsystemen und erlauben es uns, Netzwerke mit vereinfachten Modellen darzustellen. Communities und ihre Interaktionen untereinander zu verstehen hilft uns dabei, zu enträtseln, wie Netzwerke funktionieren.

In dieser Doktorarbeit verwenden wir die sogenannte Kartengleichung, ein informationstheoretischer Ansatz zur Community-Erkennung. Die Kartengleichung folgt dem Prinzip der minimalen Beschreibungslänge und nimmt an, dass Netzwerke einen Knotentypen haben und ihre zugrundeliegenden Daten vollständig sind. Wir stellen diese Annahmen infrage und passen die Kartengleichung zur Community-Erkennung in Netzwerken mit zwei Knotentypen und unvollständigen Daten an. Darüber hinaus leiten wir Ansätze ab, die, basierend auf Communities, einflussreiche Objekte in Netzwerken identifizieren und (noch) nicht existierende Verbindungen zwischen Objekten vorhersagen.

Place, publisher, year, edition, pages
Umeå: Umeå University , 2022. , p. 103
Keywords [en]
community detection, map equation, Huffman coding, network centrality, link prediction
National Category
Computer Sciences Other Computer and Information Science Computational Mathematics
Identifiers
URN: urn:nbn:se:umu:diva-199631ISBN: 978-91-7855-828-5 (electronic)ISBN: 978-91-7855-827-8 (print)OAI: oai:DiVA.org:umu-199631DiVA, id: diva2:1699181
Public defence
2022-10-21, NAT.D.470, Naturvetarhuset, Umeå, 09:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)Available from: 2022-09-30 Created: 2022-09-27 Last updated: 2022-09-27Bibliographically approved
List of papers
1. Mapping flows on bipartite networks
Open this publication in new window or tab >>Mapping flows on bipartite networks
2020 (English)In: Physical review. E, ISSN 2470-0045, E-ISSN 2470-0053, Vol. 102, no 5, article id 052305Article in journal (Refereed) Published
Abstract [en]

Mapping network flows provides insight into the organization of networks, but even though many real networks are bipartite, no method for mapping flows takes advantage of the bipartite structure. What do we miss by discarding this information and how can we use it to understand the structure of bipartite networks better? The map equation models network flows with a random walk and exploits the information-theoretic duality between compression and finding regularities to detect communities in networks. However, it does not use the fact that random walks in bipartite networks alternate between node types, information worth 1 bit. To make some or all of this information available to the map equation, we developed a coding scheme that remembers node types at different rates. We explored the community landscape of bipartite real-world networks from no node-type information to full node-type information and found that using node types at a higher rate generally leads to deeper community hierarchies and a higher resolution. The corresponding compression of network flows exceeds the amount of extra information provided. Consequently, taking advantage of the bipartite structure increases the resolution and reveals more network regularities.

Place, publisher, year, edition, pages
American Physical Society, 2020
National Category
Physical Sciences
Identifiers
urn:nbn:se:umu:diva-177167 (URN)10.1103/PhysRevE.102.052305 (DOI)000588430400004 ()2-s2.0-85096120887 (Scopus ID)
Available from: 2020-12-08 Created: 2020-12-08 Last updated: 2022-09-27Bibliographically approved
2. Mapping flows on weighted and directed networks with incomplete observations
Open this publication in new window or tab >>Mapping flows on weighted and directed networks with incomplete observations
2021 (English)In: Journal of Complex Networks, ISSN 2051-1310, E-ISSN 2051-1329, Vol. 9, no 6, article id cnab044Article in journal (Refereed) Published
Abstract [en]

Detecting significant community structure in networks with incomplete observations is challenging because the evidence for specific solutions fades away with missing data. For example, recent research shows that flow-based community detection methods can highlight spurious communities in sparse undirected and unweighted networks with missing links. Current Bayesian approaches developed to overcome this problem do not work for incomplete observations in weighted and directed networks that describe network flows. To overcome this gap, we extend the idea behind the Bayesian estimate of the map equation for unweighted and undirected networks to enable more robust community detection in weighted and directed networks. We derive an empirical Bayes estimate of the transitions rates that can incorporate metadata information and show how an efficient implementation in the community-detection method Infomap provides more reliable communities even with a significant fraction of data missing.

Place, publisher, year, edition, pages
Oxford University Press, 2021
Keywords
community detection, directed and weighted networks, incomplete data, the map equation
National Category
Other Physics Topics
Identifiers
urn:nbn:se:umu:diva-194470 (URN)10.1093/comnet/cnab044 (DOI)000797304300006 ()2-s2.0-85128774619 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)Knut and Alice Wallenberg FoundationSwedish Research Council, 2016-00796
Note

Errata: "Correction to “Mapping flows on weighted and directed networks with incomplete observations”, Journal of Complex Networks, Volume 10, Issue 2, April 2022, cnac010, https://doi.org/10.1093/comnet/cnac010"

Available from: 2022-05-06 Created: 2022-05-06 Last updated: 2022-12-08Bibliographically approved
3. Map equation centrality: community-aware centrality based on the map equation
Open this publication in new window or tab >>Map equation centrality: community-aware centrality based on the map equation
2022 (English)In: Applied Network Science, E-ISSN 2364-8228, Vol. 7, no 1, article id 56Article in journal (Refereed) Published
Abstract [en]

To measure node importance, network scientists employ centrality scores that typically take a microscopic or macroscopic perspective, relying on node features or global network structure. However, traditional centrality measures such as degree centrality, betweenness centrality, or PageRank neglect the community structure found in real-world networks. To study node importance based on network flows from a mesoscopic perspective, we analytically derive a community-aware information-theoretic centrality score based on network flow and the coding principles behind the map equation: map equation centrality. Map equation centrality measures how much further we can compress the network's modular description by not coding for random walker transitions to the respective node, using an adapted coding scheme and determining node importance from a network flow-based point of view. The information-theoretic centrality measure can be determined from a node's local network context alone because changes to the coding scheme only affect other nodes in the same module. Map equation centrality is agnostic to the chosen network flow model and allows researchers to select the model that best reflects the dynamics of the process under study. Applied to synthetic networks, we highlight how our approach enables a more fine-grained differentiation between nodes than node-local or network-global measures. Predicting influential nodes for two different dynamical processes on real-world networks with traditional and other community-aware centrality measures, we find that activating nodes based on map equation centrality scores tends to create the largest cascades in a linear threshold model.

Place, publisher, year, edition, pages
Springer, 2022
Keywords
Community-aware, Centrality, Map equation, Random walk, Hufman coding
National Category
Computational Mathematics Other Computer and Information Science
Identifiers
urn:nbn:se:umu:diva-199603 (URN)10.1007/s41109-022-00477-9 (DOI)000841239800002 ()2-s2.0-85136094020 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)Swedish Research Council, 2016-00796
Available from: 2022-09-22 Created: 2022-09-22 Last updated: 2022-09-30Bibliographically approved
4. Similarity-based Link Prediction from Modular Compression of Network Flows
Open this publication in new window or tab >>Similarity-based Link Prediction from Modular Compression of Network Flows
(English)Manuscript (preprint) (Other academic)
Abstract [en]

Node similarity scores constitute a foundation for machine learning in graphs. Besides clustering, node classification, and anomaly detection, they are a basis for link prediction with critical applications in biological systems, information networks, and recommender systems. Recent works on link prediction use vector space embeddings to calculate node similarities. While these methods can provide good performance in undirected networks, they have several disadvantages: limited interpretability, problem-specific hyperparameter tuning, manual model fitting through dimensionality reduction, and poor performance of symmetric similarities in directed link prediction. To address these issues, we propose MapSim, a novel information-theoretic approach to assess node similarities based on modular compression of network flows. Different from vector space embeddings, MapSim represents nodes in a discrete, non-metric space of communities and yields asymmetric similarities suitable to predict directed and undirected links in an unsupervised fashion. The resulting similarities can be explained based on a network's hierarchical modular organisation, facilitating interpretability. MapSim naturally accounts for Occam's razor, leading to parsimonious representations of clusters at multiple scales. Addressing unsupervised link prediction, we compare MapSim to popular embedding-based algorithms across 47 data sets of networks from a few hundred to hundreds of thousands of nodes and millions of links. Our analysis shows that MapSim's average performance across all networks is more than 7% higher than its closest competitor, outperforming all embedding methods in 14 of the 47 networks, and a more than 33% better worst-case performance. Our method demonstrates the potential of compression-based approaches in graph representation learning, with promising applications in other graph learning tasks.

Keywords
machine learning, network analysis, graph learning, representation learning, link prediction, minimum description length
National Category
Computer Sciences Other Computer and Information Science
Identifiers
urn:nbn:se:umu:diva-199604 (URN)
Available from: 2022-09-22 Created: 2022-09-22 Last updated: 2022-09-27

Open Access in DiVA

fulltext(1288 kB)301 downloads
File information
File name FULLTEXT01.pdfFile size 1288 kBChecksum SHA-512
70e636d8ffd33239c3b859d59ba70a4710cd1ade7e8fac6fb313e14389e2703b622d955eda2ae59775fc7a987247d869dac0acc36f145f770277a10be95955b2
Type fulltextMimetype application/pdf
spikblad(84 kB)76 downloads
File information
File name SPIKBLAD01.pdfFile size 84 kBChecksum SHA-512
93d05b8864422be201f7309484bfe6f820a3889abc5f09447979ccb326fd2f807db8c7125816c7a0ec442a5cbf9e6b1c4b39e6814ff61a8e49ff0cc60a6ca59e
Type spikbladMimetype application/pdf

Authority records

Blöcker, Christopher

Search in DiVA

By author/editor
Blöcker, Christopher
By organisation
Department of Physics
Computer SciencesOther Computer and Information ScienceComputational Mathematics

Search outside of DiVA

GoogleGoogle Scholar
Total: 301 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 1049 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf