I don't work with NN at all, but it kind of seems like the author set up their C...

I don't work with NN at all, but it kind of seems like the author set up their CNN with a large enough filter to see a whole spot at once, but not large enough to see a whole cat at once. Then he complains that it doesn't know what a cat looks like. Would it be possible to make a larger filter with a lower resolution such that overhead is the same as the smaller filter but it can get a higher-level view of the image?

Also, the author spends the first section of the article determining that it is in fact a jaguar-print sofa (which the model also confirms) but continues to throw around the word "leopard". They're not making it any easier for the future machine learning algorithms that try to identify an image by the text surrounding it. ;)