Research

Two projects measuring what automated reasoning can and can't do — one pointed at DeFi exploits, one at materials discovery.

BRIDGE-bench — LLM reasoning against compositional bridge hacks

A benchmark of real cross-chain bridge exploits, scored on whether an analyzer can flag the vulnerability that was actually used. The exploits are compositional: the bug lives in the interaction of several components rather than in any single line, which is exactly where pattern-matching static analysis falls down. The pipeline pairs a static pre-filter — to cut the search space — with an LLM that reasons over the surviving candidates.

Result: static analysis alone scores ~0% F1 on the compositional bridge hacks; a static-pre-filtered LLM reaches ~40% F1.
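As a rough illustration of the two-stage setup and how an F1 score falls out of it, here is a minimal sketch. It is not the benchmark's actual code: the function names, the substring-based pre-filter, the stubbed LLM judge, and the toy contract snippets are all assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List, Set


def static_prefilter(functions: Dict[str, str], markers: Iterable[str]) -> List[str]:
    """Static stage (hypothetical): keep functions whose source contains a marker."""
    marker_list = list(markers)
    return [
        name
        for name, source in functions.items()
        if any(marker in source for marker in marker_list)
    ]


def run_pipeline(
    functions: Dict[str, str],
    markers: Iterable[str],
    llm_flags: Callable[[str, str], bool],
) -> Set[str]:
    """Run the pre-filter, then ask the LLM judge about each surviving candidate."""
    survivors = static_prefilter(functions, markers)
    return {name for name in survivors if llm_flags(name, functions[name])}


def f1_score(predicted: Set[str], actual: Set[str]) -> float:
    """Harmonic mean of precision and recall over flagged locations."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
functions = {
    "relayMessage": "target.call(payload)",
    "verifyProof": "impl.delegatecall(data)",
    "getOwner": "return owner",
}

# Stub standing in for a real LLM call: it flags every candidate it is shown.
def stub_llm(name: str, source: str) -> bool:
    return True

flagged = run_pipeline(functions, markers=("call(",), llm_flags=stub_llm)
score = f1_score(flagged, actual={"verifyProof"})
```

In this toy case the pre-filter drops `getOwner`, the stub flags both survivors, and one of the two flags matches the exploited location, so precision is 0.5 and recall is 1.0.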

[code] · DOI: 10.5281/zenodo.20604295

gnome-materials — active learning for materials discovery

A GNoME-style active-learning loop for materials discovery, built on a pretrained CHGNet potential. Rather than labeling the full candidate pool, the loop selects the most informative candidates to evaluate, recovering most of the top-ranked materials at a fraction of the labeling cost.

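Result: 95% top-100 recall at a 20% labeling budget, i.e. 95 of the 100 top-ranked materials are recovered after labeling only 20% of the candidate pool.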
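To make the metric concrete, the sketch below shows one way a budgeted selection loop and top-k recall could be wired together. It is an assumption-laden toy, not the project's implementation: a nearest-neighbour surrogate on a 1-D feature stands in for the CHGNet-based model, the acquisition rule is plain greedy selection, lower scores are assumed to be better (as with an energy), and the pool and its scores are synthetic.

```python
import random
from typing import Callable, Dict, Sequence, Set


def top_k_recall(true_scores: Sequence[float], labeled: Set[int], k: int) -> float:
    """Fraction of the true top-k candidates (lowest score = best) that were labeled."""
    top_k = sorted(range(len(true_scores)), key=lambda i: true_scores[i])[:k]
    return sum(i in labeled for i in top_k) / k


def active_learning(
    features: Sequence[float],
    oracle: Callable[[int], float],
    budget: int,
    batch_size: int,
    seed: int = 0,
) -> Dict[int, float]:
    """Label at most `budget` candidates, picking each batch by predicted score."""
    rng = random.Random(seed)
    n = len(features)
    labels: Dict[int, float] = {}

    # Seed round: a random batch, since there is nothing to predict from yet.
    for i in rng.sample(range(n), min(batch_size, budget)):
        labels[i] = oracle(i)

    while len(labels) < budget:
        unlabeled = [i for i in range(n) if i not in labels]

        # Toy surrogate: predict the label of the nearest labeled neighbour.
        def predict(i: int) -> float:
            nearest = min(labels, key=lambda j: abs(features[j] - features[i]))
            return labels[nearest]

        # Greedy acquisition: evaluate the candidates predicted to be best.
        take = min(batch_size, budget - len(labels))
        for i in sorted(unlabeled, key=predict)[:take]:
            labels[i] = oracle(i)

    return labels


# Synthetic pool, invented for illustration: 500 candidates, smooth hidden score.
features = [i / 499 for i in range(500)]
true_scores = [(x - 0.3) ** 2 for x in features]

labels = active_learning(
    features,
    oracle=lambda i: true_scores[i],
    budget=100,  # 20% of the pool
    batch_size=20,
)
recall = top_k_recall(true_scores, set(labels), k=25)
```

The recall value here depends entirely on the synthetic data and the surrogate, so it says nothing about the reported result; the point is only how budget and top-k recall relate.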

[code] · dataset: pending

Security write-ups on the security page; background on the about page.