Research
Two projects measuring what automated reasoning can and can't do — one pointed at DeFi exploits, one at materials discovery.
BRIDGE-bench — LLM reasoning against compositional bridge hacks
A benchmark of real cross-chain bridge exploits, scored on whether an analyzer can flag the vulnerability that was actually used. The exploits are compositional: the bug lives in the interaction of several components rather than in any single line, which is exactly where pattern-matching static analysis falls down. The pipeline pairs a static pre-filter — to cut the search space — with an LLM that reasons over the surviving candidates.
Result: static analysis alone scores ~0% F1 on the compositional bridge hacks; a static-pre-filtered LLM reaches ~40% F1.
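For readers unfamiliar with the setup, here is a minimal sketch of the two-stage shape described above and of how an F1 score over flagged vulnerabilities could be computed. It is not the BRIDGE-bench code: the names `analyze`, `prefilter`, `reasoner`, and `micro_f1` are hypothetical, the LLM is represented by an injected callable, and micro-averaging is an assumption about how the F1 is aggregated.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Hypothetical: a "candidate" is any hashable ID for a suspicious code
# location or component interaction. Not taken from the benchmark itself.
Candidate = str


def analyze(
    components: Iterable[Candidate],
    prefilter: Callable[[Candidate], bool],
    reasoner: Callable[[list[Candidate]], set[Candidate]],
) -> set[Candidate]:
    """Run a cheap static pre-filter, then reason over the survivors.

    `reasoner` stands in for the LLM stage; it receives only the candidates
    that passed the pre-filter and returns the ones it flags.
    """
    survivors = [c for c in components if prefilter(c)]
    return reasoner(survivors) if survivors else set()


def micro_f1(flagged: list[set[Candidate]], truth: list[set[Candidate]]) -> float:
    """Micro-averaged F1 across exploits (assumed aggregation).

    Returns 0.0 when no flagged item matches the vulnerability actually used,
    which is how an analyzer that only finds unrelated issues ends up near 0%.
    """
    tp = sum(len(f & t) for f, t in zip(flagged, truth))
    fp = sum(len(f - t) for f, t in zip(flagged, truth))
    fn = sum(len(t - f) for f, t in zip(flagged, truth))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Passing the reasoner in as a callable keeps the sketch runnable without any model API; in practice that callable would wrap a prompt over the surviving candidates.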
[code] · DOI: 10.5281/zenodo.20604295
gnome-materials — active learning for materials discovery
A GNoME-style active-learning loop for materials discovery over a pretrained CHGNet potential. Rather than labeling the full candidate pool, the loop selects the most informative candidates to evaluate, recovering most of the top-ranked materials at a fraction of the labeling cost.
Result: 95% top-100 recall at a 20% labeling budget.
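To make the metric concrete, the sketch below shows one way a budgeted selection loop and top-k recall could fit together. Everything in it is an illustrative assumption rather than the gnome-materials implementation: a nearest-neighbor lookup on a one-dimensional feature stands in for the CHGNet-based surrogate, the acquisition rule is plain greedy selection of the lowest predicted score, and `run_loop`, `top_k_recall`, and `oracle` are hypothetical names.

```python
import random


def top_k_recall(labeled, true_scores, k):
    """Fraction of the true top-k candidates (lowest score) that were labeled."""
    top_k = sorted(range(len(true_scores)), key=lambda i: true_scores[i])[:k]
    return sum(i in labeled for i in top_k) / k


def run_loop(features, oracle, budget, batch, seed=0):
    """Label at most `budget` candidates, `batch` per round.

    features: one float per candidate (toy stand-in for a structure descriptor).
    oracle:   callable returning the expensive label for candidate index i.
    Returns a dict {candidate index: label}.
    """
    rng = random.Random(seed)
    n = len(features)
    budget = min(budget, n)  # cannot label more than the pool holds
    # Seed round: random candidates, since there is nothing to learn from yet.
    labeled = {i: oracle(i) for i in rng.sample(range(n), min(batch, budget))}

    while len(labeled) < budget:
        def predict(i):
            # Toy surrogate: copy the label of the nearest labeled candidate.
            j = min(labeled, key=lambda j: abs(features[j] - features[i]))
            return labeled[j]

        pool = [i for i in range(n) if i not in labeled]
        pool.sort(key=predict)  # most promising (lowest predicted) first
        for i in pool[: min(batch, budget - len(labeled))]:
            labeled[i] = oracle(i)
    return labeled


# Toy pool: 500 candidates whose true score has a single minimum near x = 0.3.
features = [i / 500 for i in range(500)]
scores = [(x - 0.3) ** 2 for x in features]
labeled = run_loop(features, lambda i: scores[i], budget=100, batch=20)
recall = top_k_recall(labeled, scores, k=100)
```

Here `budget=100` out of 500 candidates corresponds to a 20% labeling budget, and `recall` is the share of the true top 100 that the loop chose to label. The toy surrogate and data say nothing about the reported figure; they only show how the two quantities relate.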
[code] · dataset: pending
Security write-ups on the security page; background on the about page.