OverviewSignaling pathways and metabolic networks are cornerstones of molecular and cellular biology. Hypergraphs, hyperpaths, and factories provide a precise model for these networks and pathways but have been underutilized, due to their theoretical intractability and the lack of practical algorithms. The best previously-available method to find a shortest hyperpath, which is NP-complete, can only find acyclic hyperpaths, even though cycles are abundant in real signaling pathways. No available methods find a factory using the fewest reactions, and no approach to find hyperpaths or factories takes into account negative regulation.
MethodsWe give here source code for several new algorithms that find pathways in metabolic and signaling networks. A new formulation of the minimum-hyperedge factory problem as a mixed-integer linear program (MILP), which includes two new orders of negative regulation that take into account direct and indirect regulation. A new heuristic of finding shortest hyperpaths which for the first time may contain cycles. A new hyperpath enumeration algorithm, which exhaustively explores all possible s,t-hyperpaths. We also provide the datasets and the hypergraphs constructed from them, for all datasets referenced in our papers.
A factory in a metabolic network is a collection of reactions that produce a target substance starting from a set of source substances, properly taking into account the stoichiometries of intermediate metabolites in reactions. The reactions in the factory may form cycles, and effectively can proceed simultaneously. Our approach for finding minimum-hyperedge factories solves an MILP that minimizes the number of hyperedges, while ensuring flux constraints on intermediate metabolites are satisfied (either accumulation, where intermediate metabolites cannot be consumed, or conservation, where they also cannot be produced in excess). We also incorporate first-order negative regulation -- where a factory cannot create any metabolite that negatively regulates a reaction that is also in that factory -- as constraints in our MILP. For second-order negative regulation, we also exclude factories where molecules that negatively regulate a reaction in the factory can be created from the sources the factory uses.
Performance with negative regulation
In comprehensive experiments on all instances from several metabolic and signaling networks, we found that our MILP-based approach is remarkably fast in practice, with a median running time of only a few seconds even including first-order and second-order negative regulation.
Our fast heuristic often finds a provably optimal hyperpath on real signaling pathways. Based on Dijkstra's algorithm for shortest paths in ordinary graphs, our hyperpath heuristic finds a possibly-cyclic hyperpath in under a minute, even on large cell-signaling hypergraphs with nearly 10,000 hyperedges.
Hyperpath enumeration algorithm
Our hyperpath enumeration algorithm maintains a queue of hyperpath problem instances consisting of insets and outsets. An outset for a problem instance is a set of hyperedges that are not allowed to be in the solution. An inset is a set of hyperedges that cannot be placed in the outset of child problem instances spawned from the current instance. Together, these sets ensure that the solution to a problem instance is ever computed twice.
Hyperpath heuristic performance
On datasets from two common pathway databases, Reactome and NCI-PID, our hyperpath heuristic (in less than a minute of computation) finds a demonstrably optimal solution on 99.5% of all possible source, target instances. Furthermore, these experiments reveal the best prior methods, which were limited to acyclic hyperpaths, failed to find any solution for many instances where all source-target hyperpaths are in fact cyclic. For all such cyclic biological instances, our heuristic hyperpath is optimal.
Source codeThe minimum-hyperedge factory with negative regulation source code is available on GitHub.
The heuristic and hyperpath enumeration source code is available on GitHub.
The hyperpaths source code is free for noncommercial use, and comes with neither warranty nor guarantee. The hyperpaths source code cannot be redistributed in any form without consent of the authors. If you wish to use the hyperpaths source code for commercial purpose, you must first obtain the permission from all authors. All noteworthy uses of the hyperpaths source code should cite the related paper.
Noteworthy uses of the minimum-hyperedge factory source code should cite the following publication: Spencer Krieger and John Kececioglu, “Computing optimal factories in metabolic networks with negative regulation”, In submission for proceedings of Intelligent Systems in Molecular Biology (ISMB 2022).
Noteworthy uses of the hyperpath heuristic or hyperpath enumeration algorithm source code should cite the following publication: Spencer Krieger and John Kececioglu, “Fast approximate shortest hyperpaths for inferring cell signaling pathways”, Proceedings of the 21st International Workshop on Algorithms in Bioinformatics (WABI 2021), 20:1-20.
Research supported by the US National Science Foundation through grants IIS-2041613 and CCF-1617192.