Approaches for shortest hyperpath and minimum-hyperedge factories in directed hypergraphs
Overview
Signaling pathways and metabolic networks are cornerstones of molecular and cellular biology. Hypergraphs, hyperpaths, and factories provide a precise model for these networks and pathways but have been underutilized, due to their theoretical intractability and the lack of practical algorithms. The best previously-available method to find a shortest hyperpath, which is NP-complete, can only find acyclic hyperpaths, even though cycles are abundant in real signaling pathways. No available methods find a factory using the fewest reactions, and no approach to find hyperpaths or factories takes into account negative regulation.Methods
We give here source code for several new algorithms that find pathways in metabolic and signaling networks.- A new formulation of the minimum-hyperedge factory problem as a mixed-integer linear program (MILP), which includes two new orders of negative regulation that take into account direct and indirect regulation (see more here).
- A new algorithm to find exact shortest hyperpaths which for the first time may contain cycles (see more here).
- An efficient heuristic for finding shortest hyperpaths that is optimal on over 99% of instances from two standard signaling databases (see more here), along with a new hyperpath enumeration algorithm, which exhaustively explores all possible s,t-hyperpaths.
- We also provide the datasets and the hypergraphs constructed from them, for all datasets referenced in our papers.
Minimum-hyperedge factory
A factory in a metabolic network is a collection of reactions that produce a target substance starting from a set of source substances, properly taking into account the stoichiometries of intermediate metabolites in reactions. The reactions in the factory may form cycles, and effectively can proceed simultaneously. Our approach for finding minimum-hyperedge factories solves an MILP that minimizes the number of hyperedges, while ensuring flux constraints on intermediate metabolites are satisfied (either accumulation, where intermediate metabolites cannot be consumed, or conservation, where they also cannot be produced in excess). We also incorporate first-order negative regulation -- where a factory cannot create any metabolite that negatively regulates a reaction that is also in that factory -- as constraints in our MILP. For second-order negative regulation, we also exclude factories where molecules that negatively regulate a reaction in the factory can be created from the sources the factory uses.
Performance with negative regulation
In comprehensive experiments on all instances from several metabolic and signaling networks, we found that our MILP-based approach is remarkably fast in practice, with a median running time of only a few seconds even including first-order and second-order negative regulation.
Optimal shortest hyperpaths in directed hypergraphs
We formulate Shortest Hyperpaths as an integer linear program (ILP) based on a characterization of the problem in terms of superpaths. We solve this ILP with a cutting plane algorithm that starts with an initial set of constraints that are a subset of those in the full ILP, and iteratively adds violated constraints until a hyperpath is returned, which will always be provably optimal.
Integer linear program
The variables of our ILP encode the hyperedges in a superpath \(F\). For every hyperedge \(e \in E\), there is a variable \(x_e\), which takes on integer values \(x_e \in \{0,1\}\). An assignment of values to these variables encodes a superpath \(F\) by \(x_e =1\) iff \(e \in F\). We represent the collection of all variables in the ILP by a vector \(x = (x_e) e \in E\), where \(x \in \{0,1\}^{|E|}\). The constraints consist of one constraint for each \(s,t\)-cut, encoded by ensuring the indicator variables for edges crossing the cut sum to a value greater than 1.
Cutting plane algorithm
For a hypergraph of \(n\) vertices and \(m\) hyperedges, the integer linear program given above has \(m\) variables and \(\Theta (2^n)\) constraints (corresponding to the number of \(s,t\)-cuts). Thus for a large hypergraph, we cannot even feasibly write down the corresponding ILP, due to its exponentially-many constraints. Nevertheless, we can actually compute optimal solutions to this full ILP in practice, even for very large inputs, using an approach known as a cutting plane algorithm. A cutting plane algorithm computes over a subset of the constraints of the full integer linear program, and solves a series of less-constrained problems, stopping once it detects it has a solution to the full ILP.
Hyperpath heuristic
Our fast heuristic often finds a provably optimal hyperpath on real signaling pathways. Based on Dijkstra's algorithm for shortest paths in ordinary graphs, our hyperpath heuristic finds a possibly-cyclic hyperpath in under a minute, even on large cell-signaling hypergraphs with nearly 10,000 hyperedges.
Hyperpath enumeration algorithm
Our hyperpath enumeration algorithm maintains a queue of hyperpath problem instances consisting of insets and outsets. An outset for a problem instance is a set of hyperedges that are not allowed to be in the solution. An inset is a set of hyperedges that cannot be placed in the outset of child problem instances spawned from the current instance. Together, these sets ensure that the solution to a problem instance is ever computed twice.
Hyperpath heuristic performance
On datasets from two common pathway databases, Reactome and NCI-PID, our hyperpath heuristic (in less than a minute of computation) finds a demonstrably optimal solution on 99.5% of all possible source, target instances. Furthermore, these experiments reveal the best prior methods, which were limited to acyclic hyperpaths, failed to find any solution for many instances where all source-target hyperpaths are in fact cyclic. For all such cyclic biological instances, our heuristic hyperpath is optimal.
Source code
The minimum-hyperedge factory with negative regulation (Odinn) source code is available on GitHub.The optimal hyperpaths (Mmunin) source code is available on GitHub.
The heuristic (Hhugin) and hyperpath enumeration source code is available on GitHub.
Terms of Use
The hyperpaths source code is free for noncommercial use, and comes with neither warranty nor guarantee. The hyperpaths source code cannot be redistributed in any form without consent of the authors. If you wish to use the hyperpaths source code for commercial purpose, you must first obtain the permission from all authors. All noteworthy uses of the hyperpaths source code should cite the related paper.
Citation
Noteworthy uses of the minimum-hyperedge factory source code should cite the following publication:
Spencer Krieger and John Kececioglu, “Computing optimal factories in metabolic networks with negative regulation.” Proceedings of the 28th International Conference on Intelligent Systems for Molecular Biology (ISMB), Bioinformatics 38, 2022. link
Noteworthy uses of the optimal hyperpaths source code should cite the following publications
- Spencer Krieger. Algorithmic inference of cellular reaction pathways and protein secondary structure. Department of Computer Science, The University of Arizona, July 2022. link
- Spencer Krieger and John Kececioglu. “Computing shortest hyperpaths for pathway inference in cellular reaction networks.” Proceedings of the 27th International Conference on Research in Computational Molecular Biology (RECOMB), Springer LBNI 13976, 155-173, 2023. link
Noteworthy uses of the hyperpath heuristic or hyperpath enumeration algorithm source code should cite the following publication: Spencer Krieger and John Kececioglu, “Heuristic shortest hyperpaths in cell signaling hypergraphs,” Algorithms for Molecular Biology, vol. 17, no. 1, p. 12, 2022. link
Funding
Research supported by the US National Science Foundation through grants IIS-2041613 and CCF-1617192.