>In machine learning, overfitting generally results from having too much test data or focusing on irrelevant metrics.
Huh? Overfitting usually happens when your training set is too small. The size of the test set does not affect overfitting because the test set is, by definition, only used to evaluate the accuracy of the final learned function.
In addition, overfitting doesn't happen because of "focusing on irrelevant metrics". It happens because your data set is noisy, or because your model is too simple to fully capture the observed phenomenon (the part it can't capture is known as deterministic noise).
If your model focuses on irrelevant metrics, that won't actually be a problem as long as your training set is large enough to reveal their irrelevance. After training, those metrics will not have much bearing on the output function.
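To make the small-training-set point concrete, here's a toy sketch (the setup is made up for illustration): a high-capacity model fit to a handful of noisy points nails the training set but does badly on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the true relationship is a simple quadratic,
# observed with noise.
def true_f(x):
    return x ** 2

x_train = rng.uniform(-1, 1, 10)
y_train = true_f(x_train) + rng.normal(0, 0.3, 10)

# A degree-9 polynomial has enough capacity to interpolate all 10
# noisy training points -- it fits the noise, not the signal.
coeffs = np.polyfit(x_train, y_train, 9)

x_test = rng.uniform(-1, 1, 1000)
y_test = true_f(x_test) + rng.normal(0, 0.3, 1000)

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

# train_mse ends up essentially zero, while test_mse is far larger:
# the classic overfitting signature.
```

Note the test set plays no role in fitting here; it only measures how badly the overfit happened, which is exactly the point about test data above.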
This misinterpretation of overfitting really hurts the analogy.
You're right, I had a brain hiccup with respect to the test/training sets (I used it correctly later on). However, it was my understanding that too many attributes can cause overfitting, and the wiki article suggests this, too. Where am I wrong?
"Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations."
> I had a brain hiccup with respect to the test/training sets (I used it correctly later on).
Just to be clear, it's not just that you said "test data" instead of "training data", but that you said that too much is a bad thing. More data is always a good thing for ML.
[Edit: Actually, there are times where it may not be. If you're doing something like image classification and your data is being created by hand qualitatively, you can actually get overfitting from adding data. As far as I understand, this is because the measurement based on fuzzy perceptual qualities is biased, so the algorithm will overfit to that bias. Maybe this applies with your analogy; I'm not sure.]
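A toy sketch of the more-data point, under the usual assumption that the data is i.i.d. and unbiased (the setup is invented for illustration): holding the model fixed, a larger training set drives the test error down.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: same model (a degree-9 polynomial) and same noisy
# target every time; only the amount of training data changes.
def avg_test_mse(n_train, trials=30):
    target = lambda x: np.sin(3 * x)
    x_test = np.linspace(-0.9, 0.9, 500)
    errs = []
    for _ in range(trials):
        x = rng.uniform(-1, 1, n_train)
        y = target(x) + rng.normal(0, 0.2, n_train)
        c = np.polyfit(x, y, 9)
        errs.append(np.mean((np.polyval(c, x_test) - target(x_test)) ** 2))
    return np.mean(errs)  # average over trials to smooth out luck

small_n = avg_test_mse(15)
large_n = avg_test_mse(500)
# Averaged over trials, the larger training set gives a much lower
# test error with the exact same model.
```

The caveat in the edit above is the exception: if the extra labels carry a systematic bias, the model can faithfully learn that bias, and more data won't save you.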
>"Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations."
Well, that's in reference to overfitting as it applies to statistical models, not machine learning. To apply that reasoning to machine learning you have to look at the output of the machine learning algorithm rather than the parameters fed into the algorithm itself. That is, an overfit learned function will often be characterized by excessive complexity, but this is not a result of telling the ML algorithm to look at too many parameters. It's a result of letting the ML algorithm train for too long given the size of its training set.
A key point to note is that an overfit function can be excessively complex even based on very few input parameters if it builds the learned function out of overly complex relationships between those parameters. Conversely, it can build a very simple function, even if many of the parameters prove to be irrelevant, by simply not making the learned function depend on those parameters at all.
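The second half of that point can be sketched directly (the numbers and feature counts are invented for illustration): give a plain least-squares fit many inputs of which only one matters, and with enough data the irrelevant ones get weights near zero, so the learned function stays simple.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: 20 input features, but the target actually
# depends on only the first one.
n, d = 5000, 20
X = rng.normal(size=(n, d))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, n)

# Ordinary least squares over all 20 features: a large training set
# reveals the irrelevance of the other 19, so their learned weights
# come out near zero and the learned function is effectively
# one-dimensional despite the extra inputs.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

relevant_weight = w[0]              # should be close to 3.0
irrelevant_max = np.abs(w[1:]).max()  # should be close to 0
```

So the number of inputs alone doesn't determine the complexity of what gets learned, which is the distinction being drawn above.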