Intra-chromosomal structural communities determined by multiscale community mining using graph wavelets

to appear in BMC Bioinformatics


Multi-scale community mining using graph wavelets

Hi-C data can be represented as graphs where nodes represent DNA loci and the edges connect interacting loci, allowing us to reformulate the question of finding structural domains as a question of finding communities in the DNA interaction network.
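
As an illustration of this representation, the following minimal Python sketch turns a binned contact matrix into a weighted edge list. The function name `contact_matrix_to_edges`, the `min_count` threshold, and the toy matrix are assumptions made for this example; they are not taken from the MSCD toolbox or from the publication.

```python
import numpy as np

def contact_matrix_to_edges(contacts, min_count=1):
    """Return weighted edges (i, j, w) for pairs of bins i < j.

    `contacts` is a square matrix of interaction counts between genomic bins
    (loci). `min_count` is a hypothetical threshold below which two loci are
    treated as non-interacting.
    """
    contacts = np.asarray(contacts, dtype=float)
    if contacts.ndim != 2 or contacts.shape[0] != contacts.shape[1]:
        raise ValueError("contact matrix must be square")
    # Symmetrize, since an interaction between loci i and j is undirected.
    sym = (contacts + contacts.T) / 2.0
    # Keep only the upper triangle (i < j) so each pair appears once
    # and self-interactions on the diagonal are ignored.
    rows, cols = np.triu_indices_from(sym, k=1)
    weights = sym[rows, cols]
    keep = weights >= min_count
    return [(int(i), int(j), float(w))
            for i, j, w in zip(rows[keep], cols[keep], weights[keep])]

# Toy 4-bin contact matrix (made-up counts, for illustration only).
toy = np.array([[0, 5, 1, 0],
                [5, 0, 4, 0],
                [1, 4, 0, 2],
                [0, 0, 2, 0]])
edges = contact_matrix_to_edges(toy, min_count=2)
```

Each tuple in `edges` is one edge of the interaction network: two locus indices and the weight of their interaction.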

We used the multi-scale community detection (MSCD) algorithm based on spectral graph wavelets, which was previously described and benchmarked against other multi-scale community mining methods from the literature (Tremblay and Borgnat, 2014).
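
To give a rough idea of what a spectral graph wavelet is, here is a minimal Python sketch that builds the wavelets of a small graph at one scale from the eigendecomposition of its Laplacian. It is not the MSCD implementation: the use of the combinatorial Laplacian, the band-pass kernel g(x) = x exp(-x), and the function name `graph_wavelets` are assumptions chosen for illustration, and the actual algorithm may use different choices.

```python
import numpy as np

def graph_wavelets(adjacency, scale, kernel=None):
    """Return a matrix whose column a is the wavelet centered on node a.

    Wavelets are defined in the graph Fourier domain as
    U g(scale * Lambda) U^T, where L = U Lambda U^T is the graph Laplacian.
    """
    A = np.asarray(adjacency, dtype=float)
    # Combinatorial Laplacian L = D - A (an assumption for this sketch).
    L = np.diag(A.sum(axis=1)) - A
    # eigh is appropriate because L is symmetric; eigenvalues are ascending.
    eigvals, eigvecs = np.linalg.eigh(L)
    if kernel is None:
        # Illustrative band-pass kernel with g(0) = 0.
        kernel = lambda x: x * np.exp(-x)
    return eigvecs @ np.diag(kernel(scale * eigvals)) @ eigvecs.T

# Toy path graph on 4 nodes: 0 - 1 - 2 - 3.
toy = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
W = graph_wavelets(toy, scale=2.0)
```

Changing `scale` changes how far each wavelet spreads from its center node, which is what makes wavelets usable as a probe of the network at several scales.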

The purpose of detecting communities at different scales using graph wavelets rather than, say, cutting a hierarchical clustering at different levels is to fit the data as closely as possible. Cutting a hierarchical clustering imposes a hierarchical structure on the set of communities obtained at the different scales (cutting levels). When using wavelets, we do not assume beforehand that the data have a hierarchical structure: a community at a coarse scale does not necessarily have to contain communities found at a finer scale.
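
The contrast above can be made concrete with a small Python sketch that tests whether a fine-scale partition is nested inside a coarse-scale one. The function name `is_nested` and the toy label vectors are hypothetical and serve only to illustrate the point; they are not results from the publication.

```python
def is_nested(fine, coarse):
    """True if every community of `fine` lies within one community of `coarse`.

    `fine` and `coarse` are sequences of community labels, one per node,
    given in the same node order.
    """
    if len(fine) != len(coarse):
        raise ValueError("partitions must label the same nodes")
    parent = {}
    for f, c in zip(fine, coarse):
        # Record the first coarse community seen for this fine community;
        # a later mismatch means the fine community is split at the coarse scale.
        if parent.setdefault(f, c) != c:
            return False
    return True

# Toy labels for six loci (made up for illustration).
fine = [0, 0, 1, 1, 2, 2]
coarse_hierarchical = [0, 0, 0, 0, 1, 1]  # what cutting a dendrogram would give
coarse_free = [0, 0, 0, 1, 1, 1]          # splits fine community 1 in two

nested_a = is_nested(fine, coarse_hierarchical)
nested_b = is_nested(fine, coarse_free)
```

Partitions obtained by cutting one hierarchical clustering at two levels always pass this test, whereas partitions found independently at each scale, as with wavelets, may not.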

Source code

The latest version of the MSCD MATLAB toolbox is available on Nicolas Tremblay's website.

The version of the MSCD code used for the publication, along with a wrapper MATLAB script that reproduces the full analysis pipeline described in the publication, is available for download. See the file 'README.txt' in the ZIP archive.

Download of structural communities (BED-like format)
GM06990, 100 kb, hg18
K562, 100 kb, hg18
H1 hESC, 100 kb, hg18
IMR90, 200 kb, hg18
IMR90, 100 kb, hg18
IMR90, 40 kb, hg18
HeLaS3, 100 kb, hg19, G1mid
HeLaS3, 100 kb, hg19, M phase