I'm trying something like this with GPT-3.5; the problem is that it keeps changing its mind about what the API "should" look like for the parts it can no longer see in context. Since I'm using it as a collaborator, that's actually fine, but it wouldn't work if I were trying to one-shot it or otherwise use pure LLM output.
I wouldn't expect this to work well with GPT-3.5, and with GPT-4 available I wouldn't bother trying; the capability gap between the two is too big. Still, it's nice to know 3.5 remains somewhat useful here.