Absolute CU value as a measure of overall data homogeneity
Five COG families were selected. Every combination of three families out of the five was taken, giving a total of 5C3 = 10 data sets. In clustering, edges are distinguished according to whether they connect two nodes in different clusters (out-edges) or two nodes in the same cluster.
Homogeneity denotes the overall degree of mutual similarity between clusters of a data set. It is quantitatively defined as
H = Number of out-edges / Total number of edges
This value varies between 0 and 1. A value of zero corresponds to the case where all edges lie within clusters and none bridge across clusters. The homogeneity values of the ten data sets span a range of approximately 0.8, which is very wide.
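The definition of H can be illustrated with a minimal sketch. It assumes an undirected graph given as a list of node pairs and a mapping from each node to its cluster label. The function name `homogeneity`, the argument names, and the toy graph are hypothetical and are not taken from the study's data.

```python
def homogeneity(edges, cluster_of):
    """Return H = (number of out-edges) / (total number of edges).

    edges      -- iterable of (u, v) node pairs (assumed undirected, listed once)
    cluster_of -- dict mapping each node to its cluster label
    """
    edges = list(edges)
    if not edges:
        raise ValueError("H is undefined for a graph with no edges")
    # An out-edge joins two nodes that belong to different clusters.
    out_edges = sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])
    return out_edges / len(edges)


# Toy example (hypothetical data): two clusters of two nodes each.
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}
edges = [("a", "b"), ("c", "d"), ("b", "c"), ("a", "d")]

print(homogeneity(edges, cluster_of))  # 2 out-edges of 4 edges -> 0.5
```

In this toy graph two of the four edges bridge the clusters, so H = 0.5. A graph whose edges all lie within clusters gives H = 0.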
Perfect clustering results were generated for each data set and the corresponding CU values were measured. Figure 1 indicates that the CU value decreases linearly as the homogeneity increases. The maximum value is 3.5 and the minimum is approximately -0.5.
Table 1. Five COG family profiles
| COG | Nodes | Description |
| 0001 | 35 | [H] COG0001 Glutamate-1-semialdehyde aminotransferase |
| 0160 | 79 | [E] COG0160 PLP-dependent aminotransferases |
| 0161 | 49 | [H] COG0161 Adenosylmethionine-8-amino-7-oxononanoate aminotransferase |
| 0331 | 38 | [I] COG0331 (acyl-carrier-protein) S-malonyltransferase |
| 0523 | 28 | [R] COG0523 Putative GTPases (G3E family) |
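The ten three-family data sets can be enumerated directly from the five families in Table 1. The sketch below is illustrative only: the variable names are hypothetical, and the total node count of a data set is taken to be the sum of its families' node counts, which assumes that no node is shared between families.

```python
from itertools import combinations

# Node counts per COG family, copied from Table 1.
families = {"0001": 35, "0160": 79, "0161": 49, "0331": 38, "0523": 28}

# All unordered choices of three families out of five (5C3 = 10).
data_sets = list(combinations(sorted(families), 3))

for i, triple in enumerate(data_sets, start=1):
    # Assumption: families do not share nodes, so counts simply add up.
    total_nodes = sum(families[cog] for cog in triple)
    print(f"({i}) COG {', '.join(triple)}: {total_nodes} nodes")
```

Sorting the family identifiers before combining them yields the data sets in lexicographic order, starting with COG 0001, 0160, 0161 and ending with COG 0161, 0331, 0523.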

Figure 1. Linear relationship between CU value and data homogeneity
Plots of QI vs. CU


(1) COG 0001, 0160, 0161 (2) COG 0001, 0160, 0331
(3) COG 0001, 0160, 0523 (4) COG 0001, 0161, 0331
(5) COG 0001, 0161, 0523 (6) COG 0001, 0331, 0523
(7) COG 0160, 0161, 0331 (8) COG 0160, 0161, 0523
(9) COG 0160, 0331, 0523 (10) COG 0161, 0331, 0523
Figure 2. Linear correlation for each data set