11 Search models

Network models can be searched to find the one that best describes the data, as determined by the network model assumptions, the network constraints, the network score and the data itself. The search starts from a given network and, on completion, that network is updated to the best fit network found. (If a network is not set then a default network is used.) Any constraints on the search must be set when the network is set up, see section 6.

11.1 Options

The options are as follows:

Option | Description | Default
-search-models | do a task to search network models |
-search-models-name name | label the task with a name | Task-n
-search-models-network-name network | the name of the network to start the search from | previous network (or the default model given by a node for each data variable and no edges if there is no previous network)
-search-models-file search.dat | record the network models and scores in the search path to file search.dat |
-search-models-random-restarts n | do another n searches starting from a random network | 0
-search-models-jitter-restarts m | after the initial search and every random restart search do another m searches jittered from the recently found network | 0

11.2 Greedy search

The greedy search algorithm is the default algorithm for searching through network models, and is currently the only search algorithm.

Number of random restarts for the greedy algorithm

The greedy algorithm can be run a further number of times, each time from a random starting network. The number of random restarts is set by using the option -search-models-random-restarts.

Number of jitter restarts for the greedy algorithm

Once the greedy algorithm has converged on a final best fit network, the algorithm can be restarted from a network obtained by slightly modifying the best fit network, which is called jittering. This may help the algorithm to avoid getting stuck in a local maximum whilst still retaining more or less the same network. The number of times the search should be jittered is set by using the option -search-models-jitter-restarts.

Random restarts and jittered restarts can be used together. If there are n random restarts and m jittered restarts then there will be (n + 1) times m jittered searches in total: m after the initial search and m after each of the n random restarts.
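
The way the two kinds of restart combine can be illustrated with a minimal Python sketch. It is not BayesNetty's implementation: all function names are hypothetical, a short list of integers stands in for the space of networks, and it assumes that each jittered search starts from the best network found so far in the current run.

```python
import random

def greedy(start, score, neighbours):
    """Hill-climb: move to the best-scoring neighbour until none improves."""
    current = start
    while True:
        best = max(neighbours(current), key=score, default=current)
        if score(best) <= score(current):
            return current
        current = best

def search_with_restarts(start, score, neighbours, random_state, jitter,
                         n_random=0, n_jitter=0):
    """Initial search plus n_random random restarts, each followed by
    n_jitter jittered searches (assumed schedule, for illustration)."""
    counts = {"searches": 0, "jitter_searches": 0}
    best = None
    starts = [start] + [random_state() for _ in range(n_random)]
    for s in starts:
        found = greedy(s, score, neighbours)
        counts["searches"] += 1
        for _ in range(n_jitter):
            candidate = greedy(jitter(found), score, neighbours)
            counts["jitter_searches"] += 1
            if score(candidate) > score(found):
                found = candidate
        if best is None or score(found) > score(best):
            best = found
    return best, counts

# Toy stand-in for the space of networks: positions on a bumpy landscape.
LANDSCAPE = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 0]
rng = random.Random(1)

def score(x):
    return LANDSCAPE[x]

def neighbours(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < len(LANDSCAPE)]

def random_state():
    return rng.randrange(len(LANDSCAPE))

def jitter(x):
    # Small random move, clipped to the valid range.
    return min(max(x + rng.randint(-3, 3), 0), len(LANDSCAPE) - 1)

best, counts = search_with_restarts(0, score, neighbours, random_state,
                                    jitter, n_random=2, n_jitter=3)
print(counts)  # {'searches': 3, 'jitter_searches': 9}
```

With n = 2 and m = 3 the sketch performs n + 1 = 3 searches from the initial and random starting points and (n + 1) times m = 9 jittered searches.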

11.3 Example

As an example of searching through network models, the parameter file paras-search.txt, which can be found in example.zip, searches through network models starting from the default model, given by a node for each data variable and no edges.

#input continuous data
-input-data
-input-data-file example-cts.dat
-input-data-cts

#input discrete data
-input-data
-input-data-file example-discrete.dat
-input-data-discrete

#input SNP data as discrete data
-input-data
-input-data-file example.bed
-input-data-discrete-snp

#search network models
-search-models

This can be executed as usual:

./bayesnetty paras-search.txt

and will produce output similar to the following:

BayesNetty: Bayesian Network software, v1.00
--------------------------------------------------
Copyright 2015-present Richard Howey, GNU General Public License, v3
Institute of Genetic Medicine, Newcastle University

Random seed: 1551700554
--------------------------------------------------
Task name: Task-1
Loading data
Continuous data file: example-cts.dat
Number of ID columns: 2
Including (all) 2 variables in analysis
Each variable has 1500 data entries
Missing value: not set
--------------------------------------------------
--------------------------------------------------
Task name: Task-2
Loading data
Discrete data file: example-discrete.dat
Number of ID columns: 2
Including the 1 and only variable in analysis
Each variable has 1500 data entries
Missing value: NA
--------------------------------------------------
--------------------------------------------------
Task name: Task-3
Loading data
SNP binary data file: example.bed
SNP data treated as discrete data
Total number of SNPs: 2
Total number of subjects: 1500
Number of ID columns: 2
Including (all) 2 variables in analysis
Each variable has 1500 data entries
--------------------------------------------------
--------------------------------------------------
Task name: Task-4
Searching network models
--------------------------------------------------
Loading defaultNetwork network
Network type: bnlearn
Network score type: BIC
Total number of nodes: 5 (Discrete: 3 | Factor: 0 | Continuous: 2)
Total number of edges: 0
Network Structure: [express][pheno][mood][rs1][rs2]
Total data at each node: 1495
Missing data at each node: 5
--------------------------------------------------
Network: defaultNetwork
Search: Greedy
Random restarts: 0
Random jitter restarts: 0
Network Structure: [mood][rs1][rs2][express|rs1:rs2][pheno|express:mood]
Network score type: BIC
Network score = -8213.45
--------------------------------------------------

Run time: less than one second

The output above shows the three data inputs followed by the default network, which consists of a node for each data variable and no edges. The network with the highest network score found by the search is then shown, together with its score.
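
The network structure strings in the output can be turned into a list of edges with a few lines of Python. This is a small sketch rather than part of BayesNetty: the function name parse_structure is hypothetical, and it assumes the bnlearn convention in which each bracketed term reads node|parent1:parent2.

```python
import re

def parse_structure(structure):
    """Return (nodes, edges) from a string such as "[a][b|a:c]".

    Each bracketed term is read as node|parent1:parent2 (assumed convention).
    Edges are returned as (parent, child) pairs.
    """
    nodes, edges = [], []
    pattern = r"\[([^\]|]+)(?:\|([^\]]*))?\]"
    for node, parents in re.findall(pattern, structure):
        nodes.append(node)
        # A node without parents yields an empty string, which is skipped.
        edges.extend((p, node) for p in parents.split(":") if p)
    return nodes, edges

nodes, edges = parse_structure(
    "[mood][rs1][rs2][express|rs1:rs2][pheno|express:mood]")
for parent, child in edges:
    print(f"{parent} -> {child}")
# rs1 -> express
# rs2 -> express
# express -> pheno
# mood -> pheno
```

Under this reading, the best fit network in the example has four edges, whereas the starting network [express][pheno][mood][rs1][rs2] has none.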