Find the Topological Ordering of parameters, which are assumed to come from a linear SEM (also known as SCM) where it is assumed that the independent noise terms have equal error variance.

top_order(X, method = "TD", ...)

Arguments

X

A matrix containing the oberved variables.

method

The estimation method. Possible choises are "TD", "BU", "HTD" and "HBU".

...

Terms passed to the method specific estimation steps.

If the method "HBU" is specified it is possible to specify a tuning parameter M to be used in the organic lasso.

If the method "HTD" is specified it is posible to specify a max.degree and a search. The max.degree specifies the assumed maximal in-degree in the true causal graph. The parameter search may be set to either "full", "B&B" or "OMP" indicating how the algorithem should search for max.degree number of parameters to produce the lowest conditional variance.

The "full" search method simply looks at all possible subsets of the current ansestreal set of size max.degree, the "B&B" method searches via a bound and branch method, and lastly "OMP" uses orthogonal matching pursuit (forward selection) to find max.degree variables from the ansestreal set.

Value

A vector of length equal to the number of parameters with the estimated topological ordering is returned.

Details

The function top_order estimated the topological order of the causal graph. The method is based on the theory and pseudo code from Chen, Drton and Wang (2018). The 4 methods "TD" (Top Down), "BU" (Bottom Up), "HTD" (High dimensional Top Down) and "HBU" (High dimensional Bottom Up) are implemented in this function.

The "TD" and "BU" methods only work when the number of observations n is larger then the number of parameters p. Moreover the consistency results shown in the preprint only hold if \(n > p^2 log(p)\).

The "HTD" and "HBU" both require extra assumption either on the maximal in-degree or on the maximal size of markov blankets in the graph respectivly. Hence these methods both has parameters that allows the user to tune and use different sub procedures that fit the assumptions on the graph.

The "HBU" needs a tuning parameter M which is used in the organic lasso to estimate the conditional variance: $$\hat{\sigma}^2_{j,\Theta} = \min((1/n)||X_j - X_{V \ (\Theta \cup \{j\})}\beta||_2^2 + 2\lambda||\beta||_1^2)$$ where \(\lambda = (2M(1/n)log(p))^{1/2}\).

The "HTD" needs both a max.degree equal to the assumed maximal in-degree in the graph and a search parameter. One of three different search procedures "full", "B&B" or "OMP" must be used. All three procedures seek to find a subset of variables of size max.degree that minimize the conditional variance of the current variable in question given this subset.

The "full" search method simply goes through all poissible subsets of size max.degree. The "B&B" searched by bound and branch methods. Lastly the "OMP" procedure findes max.degree variables via forward selection. -

References

Chen, W., Drton, M. & Wang, Y. S. (2018). On Causal Discovery with Equal Variance Assumption. arXiv preprint arXiv:1807.03419.

See also

The wrapper function graph_estcombines top_order and the model selection function graph_from_top into one function that estimates the causal graph from the data.

Examples

n <- 1000 p <- 5 sigma <- 1 # an example where all nodes in the graph are root nodes X <- matrix(rnorm(n * p, 0, sigma), ncol = p) order <- top_order(X) # an example where pa(X_i) = X_j for all j < i for (i in seq_len(p)[-1]) { X[ ,i] <- X[ ,i-1] + X[ ,i] } top_order(X, method = "TD")
#> [1] 1 2 3 4 5
top_order(X, method = "BU")
#> [1] 1 2 3 4 5
top_order(X, method = "HTD", max.degree = 2L, search = "B&B")
#> [1] 1 2 3 4 5
top_order(X, method = "HBU")
#> [1] 1 3 4 2 5