Transfer learning has become crucial in various computer vision tasks, benefiting from the wide availability of pre-trained deep learning models. However, selecting an optimal model for a particular downstream task remains a challenging problem. Existing methods for measuring model transferability rely on statistical correlations between statically encoded features and task labels, but overlook the impact of the underlying representation dynamics during fine-tuning, which leads to unreliable results, especially for self-supervised pre-trained models. This paper therefore aims to address the challenge of properly ranking self-supervised models. Through the lens of potential energy, we reframe the model-selection problem and propose a novel physics-inspired method that models the fine-tuning dynamics without any network optimization. Specifically, our method treats the separation of different feature clusters as the effect of a repulsive force and captures the resulting movement of the representations through a physical motion simulation, yielding a more stable observation for transferability estimation. Experimental results on 10 downstream tasks and 12 self-supervised models demonstrate that our approach can be integrated into existing ranking techniques and boost their performance, revealing its potential for understanding the mechanisms of transfer learning.
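
To make the idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): class centroids of the extracted features repel one another, the features are displaced along with their centroids over a few simulation steps, and a simple between-/within-class separation ratio stands in for whatever existing transferability metric (e.g., LogME) would then be applied to the simulated features. The function names, step count, and step size are illustrative assumptions.

```python
import numpy as np

def simulate_repulsion(features, labels, steps=10, lr=0.1):
    """Illustrative sketch: push class centroids apart with a pairwise
    repulsive force and move each feature with its class centroid,
    mimicking how clusters separate during fine-tuning."""
    X = features.astype(np.float64).copy()
    classes = np.unique(labels)
    for _ in range(steps):
        # centroids of the current (moving) representation
        mu = np.stack([X[labels == c].mean(axis=0) for c in classes])
        disp = np.zeros_like(mu)
        for i in range(len(classes)):
            diff = mu[i] - mu                                   # vectors from other centroids
            dist = np.linalg.norm(diff, axis=1, keepdims=True) + 1e-8
            force = diff / dist**3                              # repulsion decays with distance
            force[i] = 0.0                                      # no self-interaction
            disp[i] = force.sum(axis=0)
        # move every sample along its class centroid's displacement
        for i, c in enumerate(classes):
            X[labels == c] += lr * disp[i]
    return X

def separation_score(features, labels):
    """Stand-in transferability proxy: between-class over within-class
    scatter; any existing ranking metric could be used here instead."""
    classes = np.unique(labels)
    mu_all = features.mean(axis=0)
    between, within = 0.0, 0.0
    for c in classes:
        Xc = features[labels == c]
        mu_c = Xc.mean(axis=0)
        between += len(Xc) * np.sum((mu_c - mu_all) ** 2)
        within += np.sum((Xc - mu_c) ** 2)
    return between / (within + 1e-8)

# Usage (hypothetical): rank candidate pre-trained encoders by scoring
# their simulated features on the target task.
#   X = encoder(images); y = task_labels
#   score = separation_score(simulate_repulsion(X, y), y)
```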



