Untangling Hierarchical Organization of Large and Complicated Protein Networks

In this example, using the sample dataset, we demonstrate how the zooming and searching functions can be combined to lead researchers to a protein of interest, in this case Cla4. Cla4 is a p21-activated protein kinase that acts as an effector of Cdc42. It has been implicated in many important biological processes, such as cell polarization, cytokinesis, and exit from mitosis.
  1. Start from the most abstract view after loading the sample dataset.
  1. Type the name of the protein, Cla4, in the search textbox and click the Search button.
    Cla4 then appears in the panel below.
  1. Click the Highlight Selected Nodes button to highlight the cluster that contains the protein, which is the protein amino acid phosphorylation cluster.
  1. Double-click on the protein amino acid phosphorylation cluster (or select the cluster and press the Zoom In button). The members of the cluster will then be visualized.

In this view, the cluster containing Cla4 (the protein amino acid phosphorylation cluster) is highlighted automatically.

  1. Double-click on the protein amino acid phosphorylation cluster. The members of the cluster are then shown as expected. Again, the cluster containing Cla4 is highlighted.
  1. Double-click on the protein amino acid phosphorylation [establishment of cell polarity] cluster. Its members are then shown as in the figure on the right. Many clusters labeled with processes related to protein amino acid phosphorylation, such as pseudohyphal growth and establishment of cell polarity, appear at this level.
  1. Double-click on the establishment of cell polarity cluster. Its members are then shown as in the figure on the right. Some clusters at this level are characterized by actin filaments or budding processes, which are related to cell polarization.
  1. Finally, double-click on the regulation of exit from mitosis cluster to display the most specific view, which illustrates the relationships of Cla4 with nine other proteins.
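
The search-then-zoom walkthrough above amounts to finding the chain of nested clusters that leads to a protein. The following is a minimal sketch of that idea, not NaviCluster's actual implementation: the `Cluster` class, the `path_to` function, and the placeholder member `ProteinX` are all hypothetical, and the toy hierarchy only reuses the cluster labels mentioned in the steps.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Cluster:
    """Hypothetical node of a cluster hierarchy (not a NaviCluster class)."""
    label: str
    children: List["Cluster"] = field(default_factory=list)
    members: Set[str] = field(default_factory=set)  # proteins held directly


def path_to(cluster: Cluster, protein: str) -> Optional[List[str]]:
    """Return the cluster labels from `cluster` down to the cluster that
    directly holds `protein`, or None if the protein is not found."""
    if protein in cluster.members:
        return [cluster.label]
    for child in cluster.children:
        sub_path = path_to(child, protein)
        if sub_path is not None:
            return [cluster.label] + sub_path
    return None


# Toy hierarchy built from the labels in the steps above.
# "ProteinX" is a placeholder, not a protein from the sample dataset.
leaf = Cluster("regulation of exit from mitosis", members={"Cla4", "ProteinX"})
polarity = Cluster("establishment of cell polarity", children=[leaf])
mixed = Cluster(
    "protein amino acid phosphorylation [establishment of cell polarity]",
    children=[polarity],
)
root = Cluster("protein amino acid phosphorylation", children=[mixed])

cla4_path = path_to(root, "Cla4")
for depth, label in enumerate(cla4_path):
    print("  " * depth + label)
```

Each label in the returned path corresponds to one cluster that would be highlighted and then double-clicked on the way down to the most specific view.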

Discovering Knowledge about Proteins of Interest (Nas6, Rpn14, and Hsm3) from a Large Network Intuitively and Effectively

This example shows how the re-centering function can be used to intuitively and effectively retrieve knowledge on proteins of interest, in this case Nas6, Rpn14, and Hsm3, from large and complicated networks. Recent studies have reported that these three proteins, previously known to bind to regulatory particles of proteasomes, actually function as chaperones, assisting in the regulatory particle assembly in yeast.
  1. Start from the most abstract view after loading the sample dataset.
  1. Click the Make Custom Graph View button to create a view that consists only of Nas6, Rpn14, and Hsm3.
  1. Select the proteins Nas6, Rpn14, and Hsm3 from the middle-left panel of the dialog. Instead of scrolling through the long list, you can click the panel and start typing a name; names that start with the characters you have typed appear immediately.
    After finishing the selection, press the OK button.
  1. A view composed of only Nas6, Rpn14, and Hsm3 is constructed and displayed on the screen.
  1. Select all three proteins by dragging the mouse to draw a rectangle that covers them. Alternatively, click on any one of the proteins, then hold down the SHIFT key and click on the other two.
    If all three proteins are selected correctly, their borders are highlighted in red, as shown in the figure on the right.
  1. To see the surrounding proteins interacting with these three proteins, you can use the re-centering function, which is implemented as the Re-Center panel on the left of the NaviCluster window.
    After the selection in the previous step,
    • Adjust the number of hops to one.
    • Click the Run button.
  1. A view re-centered on the three proteins with their interacting proteins is then created as shown on the right.
    Here, the fundamental roles of these three proteins as proteasome-related proteins are delineated explicitly, as they interact with many proteins involved in the ubiquitin-dependent protein catabolic process.
  1. To investigate the relationships of the three proteins with proteins that are farther away from them, you can adjust the threshold value (the number of hops) of the re-centering function.
    • First, keep the three proteins selected. If you have deselected them, select them again.
    • Adjust the number of hops to two.
    • Click the Run button.
  1. A view re-centered on the three proteins, but in a broader context, is then created. In this view, you can see clusters of more general processes such as DNA repair, cellular response to heat, and protein folding.
    In fact, the relationships with the protein folding and cellular response to heat clusters are consistent with the recently reported roles of the three proteins as chaperones (proteins denatured by heat are rescued by chaperones, which help fold and assemble proteins).
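
Conceptually, re-centering with a given number of hops can be thought of as collecting every node within that many edges of the selected nodes. The sketch below illustrates this with a multi-source breadth-first search. It is an assumption about the general idea, not NaviCluster's code: `build_adjacency` and `k_hop_neighborhood` are hypothetical names, and the edges and the neighbors `A` to `E` are invented placeholders rather than interactions from the sample dataset.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def k_hop_neighborhood(adjacency, seeds, hops):
    """Return all nodes within `hops` edges of any seed (seeds included)."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


# Hypothetical toy network: only the three seed names come from the text.
toy_edges = [
    ("Nas6", "A"), ("Rpn14", "A"), ("Hsm3", "B"),
    ("A", "C"), ("B", "D"), ("D", "E"),
]
toy_adjacency = build_adjacency(toy_edges)
seeds = ["Nas6", "Rpn14", "Hsm3"]

one_hop = k_hop_neighborhood(toy_adjacency, seeds, hops=1)
two_hop = k_hop_neighborhood(toy_adjacency, seeds, hops=2)
print(sorted(one_hop))
print(sorted(two_hop))
```

In this toy network, raising the hop count from one to two adds `C` and `D` but not `E`, which mirrors how a larger hop count yields a broader view around the same selection.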

Navigating an Arabidopsis Gene Co-Expression Dataset

In this example, a subset of the Arabidopsis gene co-expression data obtained from the ATTED-II database is navigated. The whole network, consisting of 22003 nodes and 45376 edges, is visualized from two aspects: biological process and cellular component. Then, starting from WEE1 (AT1G02970), a gene controlling cell cycle arrest and a DNA replication checkpoint, some parts of the network are examined hierarchically. In addition, the network is re-centered on the genes E2F3 (AT2G36010) and DEL3 (AT3G01330), which are members of the E2F transcription factor family, to reveal their functions.

Node File: atted-coexp-3highest.node

Edge File: atted-coexp-3highest.edge

  1. After the dataset is loaded, the most abstract view based on the biological process namespace is visualized.
    • As can be seen, the largest cluster is annotated with protein amino acid phosphorylation. The second largest is the cluster labeled reg. transcript., which stands for regulation of transcription, DNA-dependent.
  1. Then, adjust the namespace weights so that the cellular component namespace gets 10 and the others get zero, and click the Re-Cluster button. An abstract view of the network from the cellular component aspect is then created.
    • The largest cluster is the one involved with chloroplast. The second largest is the nucleus cluster.
  1. Re-cluster the network again, focusing only on biological process (set its weight to the maximum and the others to zero). Then, search for the gene WEE1 (AT1G02970). The cluster containing the gene, the regulation of transcription, DNA-dependent cluster, is then highlighted.
  1. Double-click the highlighted cluster repeatedly to get successively deeper views (from 1342 -> 547 -> 426 -> 55 -> 4). The clusters containing the gene of interest are those labeled with regulation of transcription, response to auxin stimulation, and DNA replication checkpoint. These reveal that the gene AT1G02970 functions in processes involved with DNA replication and transcription.
  1. The deepest view shows the relationships of the gene AT1G02970 with three other genes related to the DNA replication checkpoint.
  1. Assume that the genes E2F3 (AT2G36010) and DEL3 (AT3G01330) are now of interest. Click the Make Custom Graph View button to create a view that consists only of these two genes.
  1. A view composed of only AT2G36010 and AT3G01330 is constructed and displayed on the screen.
  1. Select both genes and run the re-centering function with the number of hops set to two. The result is shown in the figure on the right. As can be seen, they are co-expressed with many genes in phosphorylation, differentiation, and cell division-related processes.
  1. Run the re-centering function on the two genes again, this time with the number of hops set to three, to get a broader view of their relationships. As expected, the resulting clusters are annotated with broader information, such as DNA repair and ubiquitin-dependent protein catabolic process.