If you describe an image to another human (someone who has never seen the original photo, mind you) with enough fidelity to its specific expression that they are able to produce something that feels "substantially similar" to the original, that is still going to be copyright infringement. There is a reason that, if you want to reverse engineer and reimplement some piece of technology, even using only humans, you are supposed to run a clean-room process: one team carefully analyzes the device and writes documentation, and a second team, which never gets to see the original device, implements it from that documentation. In between, you run that documentation through lawyers, who carefully attempt to verify that the description covers only the factual content required for interoperability / behavior and doesn't accidentally include any of the expression inherent in the design of the original product. The examples in this demo video aren't just "image of a pile of gemstones" but stuff like "image of exactly six gems in this specific orientation with this specific color palette and texture and lighting and..."; there's just no way you are going to get a lawyer to sign off on a description like that as not including any of the expressive elements of the original image.