Introduction & Concepts

Beyond the persistent hairball visualization:

Rapid technological advances in modern biology have enabled biologists to conduct massively parallel experiments, which generate abundant data about interacting protein pairs, correlatively expressed gene pairs, etc.

"A hair-ball of yeast network of 3025 nodes and 6888 edges"
(created by Cytoscape 2.8.1, using yeastHighQuality.sif)

Networks are typically used to represent these binary-relationship datasets (nodes = elements, edges = connections) so that they can be interpreted visually and mined for useful biological information. However, these representations often appear as "hairballs", with a large number of extremely tangled edges, and cannot be visually interpreted. Therefore, an interactive, multi-scale navigation method for large and complicated biological networks is desperately needed!

Have you ever used an online mapping service for geographical maps, e.g., Google Maps? Such services provide appropriately abstracted views at any magnification and let the user interactively investigate regions of interest by zooming in and out and panning.

Analogously, "NaviClusterCS" is an interactive, multi-scale navigation tool for large biological networks, developed as a Cytoscape plugin. Please proceed to Download & Launch to start using NaviClusterCS now!

How does NaviClusterCS work?

(Figure: overall chart)

NaviClusterCS automatically and rapidly abstracts any portion of interest in a large network to an immediately interpretable extent, using two clustering algorithms that work in sequence:

  1. Ultrafast graph clustering--abstracts networks of about 100,000 nodes in a second by iteratively grouping densely connected portions
  2. Biological-property-based clustering--takes advantage of biological information often provided for biological entities (e.g., Gene Ontology terms).

After the network has passed through the two clustering components, NaviClusterCS creates an abstracted Cytoscape network view in which researchers can flexibly choose nodes/clusters. These processes can be completed in a few seconds on a typical PC with a ~2 GHz CPU and ~1 GB of memory for datasets with ~100,000 nodes.

Ultrafast Graph Clustering:

First, NaviClusterCS abstracts the whole network using the ultrafast graph clustering component, which is based on the Louvain algorithm. It detects topologically dense, connected regions, which may correspond to biologically meaningful clusters, such as protein complexes. It rapidly identifies clusters in huge networks of about 100,000 nodes within a few seconds.
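To make "topologically dense, connected regions" concrete, here is a minimal Python sketch of the local-moving phase of a Louvain-style method. It is not NaviClusterCS's implementation: the function names are hypothetical, the modularity score is recomputed from scratch for every trial move (the real algorithm uses an incremental gain formula), and the second Louvain phase, which collapses communities into super-nodes and repeats, is omitted.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # summed degree of the community's members
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


def local_moving(edges):
    """Greedily move each node to the neighboring community that raises Q most."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    community = {node: node for node in neighbors}  # start from singletons
    best_q = modularity(edges, community)
    improved = True
    while improved:
        improved = False
        for node in sorted(neighbors):
            home = community[node]
            best_target = home
            candidates = {community[n] for n in neighbors[node]} - {home}
            for target in sorted(candidates):
                community[node] = target  # trial move
                q = modularity(edges, community)
                if q > best_q + 1e-12:    # accept strict improvements only
                    best_q, best_target = q, target
            community[node] = best_target
            if best_target != home:
                improved = True
    return community, best_q
```

On a toy graph of two triangles joined by a single edge, `local_moving([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])` places each triangle in its own community.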

Property-Based Clustering:

Second, in case the abstraction is insufficient because of the characteristics of the biological network, the property-based clustering component further abstracts the cluttered visualization to an extent sufficient for visual interpretation.

This component automatically groups clusters with similar biological properties by using property information, such as Gene Ontology (GO) terms, that is often assigned to biological entities. The new clusters from the property-based clustering are used in the next component instead of those generated by the ultrafast graph clustering component, thereby reducing the number of clusters on the screen.
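The idea of grouping clusters by shared properties can be illustrated with a small Python sketch. The page does not say which similarity measure or merging strategy NaviClusterCS uses, so the choices here are assumptions made purely for illustration: Jaccard similarity between sets of property labels, and greedy merging of the most similar pair. The function names, the parameter `max_clusters`, and the labels in the usage note are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two label sets (assumed measure, not from the source)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_by_property(cluster_terms, max_clusters):
    """Merge the most similar pair of clusters until at most max_clusters remain.

    cluster_terms maps a cluster name to its set of property labels.
    Returns a dict mapping tuples of merged cluster names to pooled label sets.
    """
    groups = {(name,): set(terms) for name, terms in cluster_terms.items()}
    while len(groups) > max(max_clusters, 1):
        a, b = max(combinations(sorted(groups), 2),
                   key=lambda pair: jaccard(groups[pair[0]], groups[pair[1]]))
        groups[a + b] = groups.pop(a) | groups.pop(b)  # pool the labels
    return groups
```

For example, with `{"c1": {"t1", "t2"}, "c2": {"t1", "t2", "t3"}, "c3": {"t9"}, "c4": {"t8", "t9"}}` and `max_clusters=2`, the sketch merges `c1` with `c2` and `c3` with `c4`, since those pairs share the most labels.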

"An abstracted Cytoscape network view"


An Abstracted Cytoscape Network View:

Third, NaviClusterCS displays an abstracted Cytoscape network view composed of the resultant clusters/nodes, along with meta-edges and property edges. A meta-edge represents the number of edges that exist between members of two clusters, and a property edge represents the similarity between the properties of two clusters.
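The meta-edge definition above (the number of original edges running between members of two clusters) can be sketched in a few lines of Python. The function name and the input layout are hypothetical; this illustrates the counting only and is not the plugin's code.

```python
from collections import Counter


def meta_edges(edges, cluster_of):
    """Count original edges between each pair of distinct clusters.

    edges is an iterable of (u, v) node pairs; cluster_of maps node -> cluster.
    Returns a Counter keyed by a sorted (cluster_a, cluster_b) tuple.
    """
    counts = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:  # edges inside one cluster do not form a meta-edge
            counts[tuple(sorted((cu, cv)))] += 1
    return counts
```

Given edges `a-b`, `a-c`, `b-d`, `c-d` with `a` and `b` in cluster `X` and `c` and `d` in cluster `Y`, the sketch yields one meta-edge between `X` and `Y` with a count of 2.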

If the number of clusters is less than the number preferred by the user (specified through #Clusters in the Property-Based Clustering Panel), the biggest cluster is recursively split until the number of clusters equals #Clusters, or until splitting one more cluster would make the number of clusters larger than #Clusters.
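The stopping rule above can be sketched in Python under two assumptions that the page does not confirm: clusters are modelled as nested lists (a cluster is a list of sub-clusters or nodes), and "splitting" a cluster means replacing it with its direct children. The sketch also reads the rule as "stop before a split would overshoot #Clusters". The names `leaf_count`, `expand_to_target`, and `target` are hypothetical.

```python
def leaf_count(cluster):
    """Number of individual nodes contained in a (possibly nested) cluster."""
    if not isinstance(cluster, list):
        return 1
    return sum(leaf_count(child) for child in cluster)


def expand_to_target(clusters, target):
    """Split the biggest cluster repeatedly until `target` clusters are shown,
    or until one more split would exceed `target`."""
    clusters = list(clusters)
    while len(clusters) < target:
        splittable = [c for c in clusters if isinstance(c, list) and len(c) > 1]
        if not splittable:
            break  # only single nodes remain; nothing left to split
        biggest = max(splittable, key=leaf_count)
        if len(clusters) - 1 + len(biggest) > target:
            break  # splitting would overshoot the requested count
        clusters.remove(biggest)
        clusters.extend(biggest)  # replace the cluster by its children
    return clusters
```

With two top-level clusters `[[1, 2], [3, 4], [5]]` and `[6, 7]`, a target of 4 splits the first cluster into its three children, while a target of 3 leaves both clusters intact because the split would produce four.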

While showing the abstracted network view, NaviClusterCS allows researchers to interactively zoom, move laterally beyond cluster boundaries, focus on an arbitrary set of clusters/nodes, etc. Any subset of the entire network of particular interest to the researcher can be fed into the clustering components again and the abstracted view of that subset is displayed.

Beginners, please read the step-by-step tutorials in How to Use.

If you want to see the explanations of each component in NaviClusterCS, please proceed to User Interface.

Examples of navigating some datasets can be found in Examples.