If we divide the data into two 50% halves, how do we decide which one to use?
At the end, we have trained k models. Which one should we use to predict the label of an unlabeled data point?
None of the k models. K-fold cross-validation is only used to estimate the performance of our model. For real prediction, we should use the entire dataset for training, and that trained model should be used for prediction.
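The workflow described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names (fit_line, mse, k_fold_scores), the one-dimensional least-squares model, and the synthetic data are all assumptions made for the example. The k fold models are used only to compute scores and are then discarded; the model kept for prediction is refit on all of the data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_scores(xs, ys, k):
    """Train k throwaway models and return one held-out error per fold."""
    n = len(xs)
    scores = []
    for fold in range(k):
        # Every k-th point, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        test_x = [xs[i] for i in sorted(test_idx)]
        test_y = [ys[i] for i in sorted(test_idx)]
        fold_model = fit_line(train_x, train_y)  # used for scoring only
        scores.append(mse(fold_model, test_x, test_y))
    return scores


# Synthetic data (assumed): y = 2x + 1 with a small alternating perturbation.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

# Step 1: cross-validation gives a performance estimate, not a model.
scores = k_fold_scores(xs, ys, k=5)
cv_estimate = sum(scores) / len(scores)

# Step 2: the model used for real predictions is trained on all the data.
final_model = fit_line(xs, ys)

print("estimated MSE:", cv_estimate)
print("final model (slope, intercept):", final_model)
```

The cross-validated error is reported as the expected performance of final_model, even though final_model itself was never scored on held-out data.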
Underfitting: the model is too simple, so both the training error and the test error are high.
Overfitting: the model is too complex, so the training error is small but the test error is large.
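The two failure modes can be made concrete with a small sketch. Everything here is an assumption chosen for illustration: the data follow y = 2x + 1 with a deterministic perturbation standing in for noise, a constant predictor plays the role of the too-simple model, a straight line is the well-matched model, and a 1-nearest-neighbor memorizer plays the role of the too-complex model. All function names are hypothetical.

```python
def fit_constant(xs, ys):
    """Too-simple model: always predict the mean of the training labels."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Well-matched model: least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest_neighbor(xs, ys):
    """Too-complex model: memorize the training set and copy the label of the closest point."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed data: y = 2x + 1 plus a deterministic +/-1 perturbation. The test
# points sit between the training points and carry the opposite perturbation.
train_x = [float(i) for i in range(20)]
train_y = [2.0 * x + 1.0 + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]
test_x = [i + 0.4 for i in range(20)]
test_y = [2.0 * x + 1.0 + (-1.0 if i % 2 == 0 else 1.0) for i, x in enumerate(test_x)]

errors = {}
for name, fit in [("constant", fit_constant),
                  ("line", fit_line),
                  ("nearest neighbor", fit_nearest_neighbor)]:
    model = fit(train_x, train_y)
    errors[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, "train MSE:", errors[name][0], "test MSE:", errors[name][1])
```

On this data the constant model has a high error on both sets, which is the underfitting pattern. The nearest-neighbor model has zero training error because it reproduces every training label, yet its test error is worse than the line's, which is the overfitting pattern.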