Performance modeling is an important and active area of research in high-performance computing (HPC). It supports better job scheduling and can improve the overall performance of coupled applications. Sufficiently rich analytical models are challenging to develop, however, because of interactions between node components, network topologies, job interference, and application complexity. When analytical performance models become too restrictive because of application dynamics or multicomponent interactions, performance models based on machine learning (ML) can be helpful. Although ML methods do not require knowledge of the underlying system or application, they can efficiently learn the unknown interactions between application and system parameters empirically from application runs. We present a benchmark study in which we evaluate eleven ML methods for modeling the performance of four representative scientific applications that are irregular and have skewed domain configurations, on four leadership-class HPC platforms. We assess the impact of feature engineering, training-set size, modern hardware platforms, transfer learning, and extrapolation on prediction accuracy and on training and inference times. We find that bagging, boosting, and deep neural network methods are promising approaches: they achieve median R² values greater than 0.95 and do not require feature engineering. We also demonstrate that cross-platform performance prediction can be improved significantly by using transfer learning with deep neural networks.
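
For reference, the coefficient of determination (R²) reported above is conventionally defined as follows. This is the standard textbook form and is an assumption here, since the abstract does not state which variant of the metric is used. The symbols are illustrative names, not taken from the source: y_i is the measured performance of run i, \hat{y}_i is the model's prediction for that run, \bar{y} is the mean of the measured values, and n is the number of test runs.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Under this definition, a value of 1 corresponds to perfect prediction, and a value of 0 corresponds to a model that does no better than always predicting the mean of the measurements.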