The setup was essentially an app (a C++ engine on OS X running the models) that ingested camera feeds, output skeletons, and mapped sound to mouth poses, then fed all of that into Unity, which puppeteered 2D sprites.
On set, we output to a monitor that mixed the real shots with our graphics.
Nothing crazy! But it was a solid setup (except for the overheating DSLRs).