>In machine learning, overfitting generally results from having too much test data or focusing on irrelevant metrics.
Huh? Overfitting usually happens when your training set is too small. The size of the test set does not affect overfitting because the test set is, by definition, only used to evaluate the accuracy of the final learned function.
In addition, overfitting doesn't happen because of "focusing on irrelevant metrics". It happens because your data set is noisy, or because your model is too simple to fully capture the observed phenomenon (the part it can't capture is known as deterministic noise).
If your model focuses on irrelevant metrics, that won't actually be a problem as long as your training set is large enough to reveal their irrelevance. After training, those metrics will not have much bearing on the output function.
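To make the small-training-set point concrete, here's a toy sketch (the setup is made up for illustration): a high-capacity model fit to a handful of noisy points nails the training set but does badly on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the true relationship is a simple quadratic,
# observed with noise.
def true_f(x):
    return x ** 2

x_train = rng.uniform(-1, 1, 10)
y_train = true_f(x_train) + rng.normal(0, 0.3, 10)

# A degree-9 polynomial has enough capacity to interpolate all 10
# noisy training points -- it fits the noise, not the signal.
coeffs = np.polyfit(x_train, y_train, 9)

x_test = rng.uniform(-1, 1, 1000)
y_test = true_f(x_test) + rng.normal(0, 0.3, 1000)

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

# train_mse ends up essentially zero, while test_mse is far larger:
# the classic overfitting signature.
```

Note the test set plays no role in fitting here; it only measures how badly the overfit happened, which is exactly the point about test data above.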
This misinterpretation of overfitting really hurts the analogy.
You're right, I had a brain hiccup with respect to the test/training sets (I used it correctly later on). However, it was my understanding that too many attributes can cause overfitting, and the wiki article suggests this, too. Where am I wrong?
"Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations."
> I had a brain hiccup with respect to the test/training sets (I used it correctly later on).
Just to be clear, it's not just that you said "test data" instead of "training data", but that you said that too much is a bad thing. More data is always a good thing for ML.
[Edit: Actually, there are times where it may not be. If you're doing something like image classification and your data is being created by hand qualitatively, you can actually get overfitting from adding data. As far as I understand, this is because the measurement based on fuzzy perceptual qualities is biased, so the algorithm will overfit to that bias. Maybe this applies with your analogy; I'm not sure.]
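A toy sketch of the more-data point, under the usual assumption that the data is i.i.d. and unbiased (the setup is invented for illustration): holding the model fixed, a larger training set drives the test error down.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: same model (a degree-9 polynomial) and same noisy
# target every time; only the amount of training data changes.
def avg_test_mse(n_train, trials=30):
    target = lambda x: np.sin(3 * x)
    x_test = np.linspace(-0.9, 0.9, 500)
    errs = []
    for _ in range(trials):
        x = rng.uniform(-1, 1, n_train)
        y = target(x) + rng.normal(0, 0.2, n_train)
        c = np.polyfit(x, y, 9)
        errs.append(np.mean((np.polyval(c, x_test) - target(x_test)) ** 2))
    return np.mean(errs)  # average over trials to smooth out luck

small_n = avg_test_mse(15)
large_n = avg_test_mse(500)
# Averaged over trials, the larger training set gives a much lower
# test error with the exact same model.
```

The caveat in the edit above is the exception: if the extra labels carry a systematic bias, the model can faithfully learn that bias, and more data won't save you.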
>"Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations."
Well, that's in reference to overfitting as it applies to statistical models, not machine learning. To apply that reasoning to machine learning you have to look at the output of the machine learning algorithm rather than the parameters fed into the algorithm itself. That is, an overfit learned function will often be characterized by excessive complexity, but this is not a result of telling the ML algorithm to look at too many parameters. It's a result of letting the ML algorithm train for too long given the size of its training set.
A key point to note is that an overfit function can be excessively complex even based on very few input parameters if it builds the learned function out of overly complex relationships between those parameters. Conversely, it can build a very simple function, even if many of the parameters prove to be irrelevant, by simply not making the learned function depend on those parameters at all.
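The second half of that point can be sketched directly (the numbers and feature counts are invented for illustration): give a plain least-squares fit many inputs of which only one matters, and with enough data the irrelevant ones get weights near zero, so the learned function stays simple.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: 20 input features, but the target actually
# depends on only the first one.
n, d = 5000, 20
X = rng.normal(size=(n, d))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, n)

# Ordinary least squares over all 20 features: a large training set
# reveals the irrelevance of the other 19, so their learned weights
# come out near zero and the learned function is effectively
# one-dimensional despite the extra inputs.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

relevant_weight = w[0]              # should be close to 3.0
irrelevant_max = np.abs(w[1:]).max()  # should be close to 0
```

So the number of inputs alone doesn't determine the complexity of what gets learned, which is the distinction being drawn above.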