Computational tools in drug discovery, particularly machine-learning-based ones, are typically tested on clean and often flawed benchmarks, which leads to an optimistic characterisation of their abilities. My research explores how this gap between reported and real-world performance can be measured and accounted for, specifically when predicting the binding affinity between a small-molecule drug and a protein target. I hope to build on this to improve the accuracy and generalisability of these models, and then to incorporate structural uncertainty into the models' decision-making.