Step-by-step Example

In this example, we demonstrate how to use BiVisu to analyze gene expression data using an artificial dataset. The dataset is generated using the following code. It contains four biclusters with the following specifications:

Bicluster  Type              Size   Rows    Columns
1          Constant rows     40x7   1-40    1-7
2          Constant rows     25x10  41-65   6-15
3          Constant columns  35x8   61-95   13-20
4          Additive model    40x8   96-135  20-27
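As a quick sanity check on the specifications, the short Python sketch below encodes the table and verifies that each stated size matches its row and column ranges. It also counts the cells shared by each pair of biclusters. The dictionary layout and the helper names (`SPECS`, `span`, `shared`, `shared_cells`) are illustrative assumptions and are not part of BiVisu or of the authors' dataset generator.

```python
# Bicluster specifications copied from the table (1-based, inclusive ranges).
# The layout and helper names are illustrative only, not part of BiVisu.
SPECS = {
    1: {"type": "constant rows",    "size": (40, 7),  "rows": (1, 40),   "cols": (1, 7)},
    2: {"type": "constant rows",    "size": (25, 10), "rows": (41, 65),  "cols": (6, 15)},
    3: {"type": "constant columns", "size": (35, 8),  "rows": (61, 95),  "cols": (13, 20)},
    4: {"type": "additive model",   "size": (40, 8),  "rows": (96, 135), "cols": (20, 27)},
}


def span(bounds):
    """Number of indices in an inclusive (low, high) range."""
    low, high = bounds
    return high - low + 1


def shared(a, b):
    """Number of indices common to two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def shared_cells(p, q):
    """Number of matrix cells that belong to both biclusters."""
    return shared(p["rows"], q["rows"]) * shared(p["cols"], q["cols"])


# Every stated size should equal (number of rows, number of columns).
for spec in SPECS.values():
    assert (span(spec["rows"]), span(spec["cols"])) == spec["size"]

# Pairwise overlap in cells, keyed by (i, j) with i < j.
overlaps = {
    (i, j): shared_cells(SPECS[i], SPECS[j])
    for i in SPECS for j in SPECS if i < j
}
print({pair: n for pair, n in overlaps.items() if n > 0})
```

Under these ranges, only biclusters 2 and 3 share cells (rows 61-65 and columns 13-15).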

1. Startup and data loading

Download "" and "Art_Noise0d2_Set1.txt" (inside the zip file of artificial datasets for additive models) from the download section of the main page. Unzip the files and put the data file "Art_Noise0d2_Set1.txt" in the same directory as the unzipped files. Start Matlab and change the path to the BiVisu directory. Execute BiVisu using GUIDE in Matlab. To load the artificial dataset, select "File" -> "Load". In the pop-up dialog box, select the file "Art_Noise0d2_Set1.txt" and then click "Open". In the "Pre-processing" menu, select "none" so that no pre-processing is performed.

2. Biclustering for additive models

To detect biclusters that follow an additive model, select "Run" -> "Additive Model". In the configuration dialog box, assign the following values to the parameters:

Minimum % of rows: 10.5
Noise threshold: 1
Minimum no. of columns: 5
Maximum overlapping allowed: 30

Click "OK" to start biclustering. When biclustering is finished, the main window changes to the one shown in Fig. 1.

Fig. 1. Display after biclustering.

3. Display of results

In BiVisu, biclustering results are presented as statistical quantities and PC plots, as shown in Fig. 1. The statistical quantities are given in the left panel and are divided into two parts: overall information and individual information. The overall information includes the total number of detected biclusters and the overall averages of the mean squared residual score (MSRS) and the average correlation value (ACV). The individual information includes the size, index, MSRS, ACV, and row (gene) and column (condition) labels of the currently selected bicluster. In particular, MSRS and ACV give users objective measures for evaluating the homogeneity of the detected biclusters. Since the input file provides no gene or condition labels, internal indices are displayed in the label text boxes.
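To make the MSRS measure concrete, the sketch below computes the mean squared residue in its commonly used form (the average of the squared residues a_ij - rowmean_i - colmean_j + overallmean over the bicluster). It assumes that BiVisu reports this standard quantity; the function name `msrs` and the toy data are hypothetical. ACV is not covered here.

```python
def msrs(block):
    """Mean squared residue of a bicluster given as a list of equal-length rows."""
    n_rows, n_cols = len(block), len(block[0])
    row_means = [sum(row) / n_cols for row in block]
    col_means = [sum(row[j] for row in block) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i, row in enumerate(block):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall_mean
            total += residue * residue
    return total / (n_rows * n_cols)


# A perfect additive bicluster: each entry is a row effect plus a column effect.
additive = [[r + c for c in (0.0, 1.5, -2.0, 4.0)] for r in (1.0, 3.0, 7.0)]

# The same bicluster with one perturbed entry.
noisy = [row[:] for row in additive]
noisy[0][0] += 1.0

print(msrs(additive))  # close to 0 for a perfect additive pattern
print(msrs(noisy))     # strictly larger once the pattern is disturbed
```

A lower score indicates a more homogeneous bicluster under the additive model.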

In this example, four biclusters are detected, and they are identical to those embedded in the dataset. Initially, the information and PC plot of the first bicluster are displayed. The biclusters are ordered from the largest to the smallest. To move to the next bicluster, press "Next" below the PC plot. To move to the previous bicluster, press "Prev". To jump to a particular bicluster (say, the third one), select its index (3 in this example) in the drop-down menu as shown in Fig. 2.

Fig. 2. Drop down menu for selecting a bicluster.

By default, expression values are displayed in the PC plot, as in Fig. 1 and Fig. 2. To visualize the coherence of genes, the plot can be switched to the difference matrix shown in Fig. 3 by selecting "Tools" -> "Plot" -> "Difference Matrix".

Fig. 3. The PC plot of difference matrix.

In BiVisu, a single gene or a set of genes in a bicluster can be highlighted in the PC plot by selecting the corresponding label(s) in the left panel. For example, clicking on the gene label 'R4' and then clicking on the gene label 'R5' with the 'Alt' key held down displays genes 'R4' and 'R5' in red, as shown in Fig. 4.

Fig. 4. An example of highlighting the representations of a set of genes in the PC plot.

The difference matrix allows coherence in additive models to be visualized by taking the element-wise difference between each column and a reference column. Any column of the bicluster can be selected as the reference column through the drop-down menu shown in Fig. 5. Note that the scale of the PC plots can be changed for closer examination using the zoom-in and zoom-out functions in the toolbar.

Fig. 5. Drop down menu for selecting reference column in the PC plot of difference matrix.
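The difference matrix described above can be sketched in a few lines of Python. This is a minimal illustration that follows the definition in the text (each column minus a reference column); the function name `difference_matrix` and the toy bicluster are hypothetical and are not taken from BiVisu.

```python
def difference_matrix(block, ref):
    """Subtract the reference column from every column, row by row."""
    return [[value - row[ref] for value in row] for row in block]


# Toy additive bicluster: row effects (1, 2, 5) plus column effects (0, 2, -1, 5).
block = [
    [1.0, 3.0, 0.0, 6.0],
    [2.0, 4.0, 1.0, 7.0],
    [5.0, 7.0, 4.0, 10.0],
]

diff = difference_matrix(block, ref=0)
for row in diff:
    print(row)  # every row prints [0.0, 2.0, -1.0, 5.0]
```

For a perfectly additive bicluster, all rows of the difference matrix coincide, so the lines in the PC plot fall on top of each other. Choosing another reference column shifts the common line but does not separate the rows.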

One use of PC plots is to evaluate the homogeneity of the detected biclusters. Unlike MSRS and ACV, PC plots visualize the behaviour of each gene in each condition. If the noise threshold, which is the coherence control parameter of the biclustering algorithm, is set too large (say, 1.8 in this example), some lines deviate strongly from the main group of lines in the PC plot of the difference matrix, as shown in Fig. 6. Such a large deviation suggests that the noise threshold should be reduced.

Fig. 6. Results with the PC plot of difference matrix obtained using larger noise threshold 1.8. Large deviation between some lines and the main group of lines exists.
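The visual check in Fig. 6 can be approximated numerically. The sketch below flags rows whose difference-matrix values lie far from the per-column median of the bicluster. It is only an illustration of the idea: the function name `deviating_rows`, the use of the median as the "main group", and the tolerance `tol` are assumptions, and `tol` is not claimed to be equivalent to BiVisu's noise threshold.

```python
from statistics import median


def deviating_rows(block, ref, tol):
    """Return indices of rows that stray more than tol from the main group.

    The main group is approximated by the per-column median of the
    difference matrix (an assumption made for this sketch).
    """
    diff = [[value - row[ref] for value in row] for row in block]
    n_cols = len(block[0])
    centre = [median(d[j] for d in diff) for j in range(n_cols)]

    flagged = []
    for i, d in enumerate(diff):
        worst = max(abs(d[j] - centre[j]) for j in range(n_cols))
        if worst > tol:
            flagged.append(i)
    return flagged


# Three coherent rows and one row whose last entry is off by about 1.6.
block = [
    [1.0, 3.0, 0.0],
    [2.0, 4.0, 1.0],
    [5.0, 7.0, 4.0],
    [3.0, 5.0, 3.6],
]

print(deviating_rows(block, ref=0, tol=1.0))  # the last row is flagged
print(deviating_rows(block, ref=0, tol=2.0))  # nothing is flagged
```

A row that is flagged at a small tolerance but accepted at a larger one corresponds to a line that only joins the bicluster when the coherence requirement is loosened.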

To compare the coherence of genes in a bicluster with the incoherence of genes outside it, PC plots for all genes over the conditions of the bicluster can be displayed by selecting "Show all rows" in the "Display" menu. A separate window pops up, as shown in Fig. 7. The green lines represent genes in the bicluster, while the blue lines represent the other genes. For multiplicative models, BiVisu provides a plot of the ratio matrix, which serves a similar function to the difference matrix for additive models.

Fig. 7. A separate window displaying genes in a bicluster and genes not in the bicluster. The PC plot of difference matrix is selected.
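By analogy with the difference matrix, a ratio matrix can be sketched as the element-wise ratio between each column and a reference column. The text does not spell out how BiVisu computes it, so this definition, the function name `ratio_matrix`, and the toy data are assumptions made for illustration. The reference column must not contain zeros.

```python
def ratio_matrix(block, ref):
    """Divide every column by the reference column, row by row."""
    return [[value / row[ref] for value in row] for row in block]


# Toy multiplicative bicluster: row factors (1, 2, 4) times column factors (1, 0.5, 3).
block = [
    [1.0, 0.5, 3.0],
    [2.0, 1.0, 6.0],
    [4.0, 2.0, 12.0],
]

ratios = ratio_matrix(block, ref=0)
for row in ratios:
    print(row)  # every row prints [1.0, 0.5, 3.0]
```

As with the difference matrix in the additive case, identical rows here indicate a coherent multiplicative pattern.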

Besides PC plots, a heat map of the biclusters, one of the most popular visualization techniques, can be drawn through "Display" -> "Heat map". In the pop-up window shown in Fig. 8(a), the data are re-ordered so that the selected bicluster is displayed in the top-left corner. The bicluster is highlighted by a cyan boundary. In some cases, the contrast can be improved by clicking the "Clip outliners" button, which removes the outliers at the lower and upper ends of the data range, as illustrated in Fig. 8(b).

Fig. 8. (a) A heat map of biclustering results. (b) The heat map after clipping the outliers by clicking the "Clip outliners" button.
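The two heat map operations described above, re-ordering and clipping, can be sketched as follows. The sketch uses 0-based indices, and the function names `reorder_for_heatmap` and `clip` are hypothetical. How BiVisu chooses the clipping limits is not stated in the text, so the limits are passed in explicitly here as an assumption.

```python
def reorder_for_heatmap(data, rows, cols):
    """Move the bicluster's rows and columns to the front of the matrix.

    Rows and columns outside the bicluster keep their original relative order.
    """
    row_set, col_set = set(rows), set(cols)
    row_order = list(rows) + [i for i in range(len(data)) if i not in row_set]
    col_order = list(cols) + [j for j in range(len(data[0])) if j not in col_set]
    return [[data[i][j] for j in col_order] for i in row_order]


def clip(data, low, high):
    """Limit every value to the interval [low, high] to improve colour contrast."""
    return [[min(max(value, low), high) for value in row] for row in data]


# 4x4 toy matrix whose entry at (i, j) is 10*i + j.
data = [[10 * i + j for j in range(4)] for i in range(4)]

# Hypothetical bicluster on rows 2-3 and columns 1 and 3 (0-based).
reordered = reorder_for_heatmap(data, rows=[2, 3], cols=[1, 3])
top_left = [row[:2] for row in reordered[:2]]
print(top_left)  # [[21, 23], [31, 33]]

clipped = clip(data, low=5, high=30)
print(clipped[0][0], clipped[3][3])  # 5 30
```

After re-ordering, the bicluster occupies the top-left block, which is the region outlined in the heat map.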

4. Other functionalities

In BiVisu, biclusters can be selected subject to a minimum number of rows, a minimum number of columns, a maximum number of biclusters and a maximum allowed overlap. This functionality is accessed from "Tools" -> "Filter". Fig. 9 shows the results after selection with the following parameters:

Minimum no. of rows: 30
Minimum no. of columns: 5
Maximum no. of biclusters: 4
Maximum % of overlapping allowed: 30

After filtering, only three biclusters remain. Bicluster 2 is filtered out because it has fewer than 30 rows.

Fig. 9. Display after bicluster selection.
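The effect of the filter can be reproduced on the four embedded biclusters with a short sketch. The selection order (largest first) and the overlap measure (shared cells as a percentage of the smaller bicluster) are assumptions made for illustration, since the text does not define them. The names `make`, `overlap_percent` and `filter_biclusters` are hypothetical.

```python
def make(ident, row_range, col_range):
    """Build a bicluster record from 1-based inclusive row and column ranges."""
    return {
        "id": ident,
        "rows": set(range(row_range[0], row_range[1] + 1)),
        "cols": set(range(col_range[0], col_range[1] + 1)),
    }


def area(b):
    return len(b["rows"]) * len(b["cols"])


def overlap_percent(a, b):
    """Shared cells as a percentage of the smaller bicluster (assumed definition)."""
    shared = len(a["rows"] & b["rows"]) * len(a["cols"] & b["cols"])
    return 100.0 * shared / min(area(a), area(b))


def filter_biclusters(biclusters, min_rows, min_cols, max_count, max_overlap):
    """Keep biclusters that satisfy the size, count and overlap limits."""
    kept = []
    for b in sorted(biclusters, key=area, reverse=True):  # largest first
        if len(kept) >= max_count:
            break
        if len(b["rows"]) < min_rows or len(b["cols"]) < min_cols:
            continue
        if any(overlap_percent(b, k) > max_overlap for k in kept):
            continue
        kept.append(b)
    return kept


# The four biclusters embedded in the artificial dataset.
embedded = [
    make(1, (1, 40), (1, 7)),
    make(2, (41, 65), (6, 15)),
    make(3, (61, 95), (13, 20)),
    make(4, (96, 135), (20, 27)),
]

kept = filter_biclusters(embedded, min_rows=30, min_cols=5, max_count=4, max_overlap=30)
print(sorted(b["id"] for b in kept))  # [1, 3, 4]
```

With these settings, the 25-row bicluster is the only one removed, which agrees with the three biclusters that remain after filtering.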

The biclustering results can be exported to text files by selecting "Export" in the "File" menu. A dialog box is displayed for selecting the output types. There are two output types: information on each bicluster, and information on the biclusters together with certain important statistical properties. Check both boxes to export all output types, as illustrated in Fig. 10. The two files shown in Fig. 11 are then generated. Note that two extra lines for column labels (condition names) and row labels (gene names) are included if the labels are provided in the input data. As no labels are given in the input file in this example, these two lines are omitted.

Fig. 10. A dialog box for output type selection.

Fig. 11. Output files: (a) information of all detected biclusters and (b) information of biclusters with certain important statistical properties.

BiVisu can also display the parameters used for biclustering and filtering. Selecting "Configuration" in the "Display" menu shows this information, as in Fig. 12.

Fig. 12. Dialog box showing information about parameters of biclustering and bicluster selection.