A single uncompressed frame of 1080p video occupies 28MB in RAM, so 1 minute of 24fps video takes up about 40GB. If you want to run 4 cores at once, it's 3 times that. You won't be doing that any time soon on a laptop or smartphone.
I forgot where I got 28 from, but it's indeed a mistake. For normal display you could get away with 1920 × 1080 × 3 bytes (8 bits per channel) ≈ 6MB. For a 10-bit display it would be around 8MB. You do often use 32-bit float for high-quality processing, but since what we're storing here is the output frame, you would finish all that processing and then go down to 8 or 10 bits per channel. Redoing the math, that's roughly 8GB for 1 minute of video, still way too impractical.
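To make the arithmetic in this thread easy to check, here's a quick back-of-envelope calculation (the resolutions and bit depths come from the comments above; the packing assumptions, e.g. 10-bit stored tightly rather than padded to 16-bit, are mine):

```python
# Back-of-envelope 1080p frame sizes, assuming 3-channel RGB.
W, H = 1920, 1080
pixels = W * H

rgb8 = pixels * 3             # 8 bits per channel, 3 bytes/pixel
rgb10 = pixels * 30 // 8      # 10 bits per channel, tightly packed
rgbf32 = pixels * 3 * 4       # 32-bit float per channel, 12 bytes/pixel

print(f"8-bit RGB:  {rgb8 / 1e6:.1f} MB")    # ~6.2 MB
print(f"10-bit RGB: {rgb10 / 1e6:.1f} MB")   # ~7.8 MB
print(f"float RGB:  {rgbf32 / 1e6:.1f} MB")  # ~24.9 MB

# Buffering 1 minute at 24 fps as 8-bit output frames:
minute = rgb8 * 24 * 60
print(f"1 min buffered: {minute / 1e9:.1f} GB")  # ~9.0 GB
```

Note the float figure lands near the 28MB claim upthread, and the 8-bit figure gives the ~8GB-per-minute number (the small discrepancy is decimal GB vs binary GiB).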
I think the grandparent post is talking about decoding to RGB with a full 32-bit float per channel, which is 12 bytes per pixel rather than the 3 you need for 8-bit RGB. The high precision is needed for HDR and for the extra processing you have to do to the pixels after they're decoded - motion compensation, gamma correction, etc.
The maximum number of reference frames, i.e. how many pictures the Decoded Picture Buffer has to hold, is 16. So even if a GOP is 1 minute long, you would have to hold at most 16 pictures in memory to have enough information to decode the rest of that 1-minute segment.
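For scale, a worst-case 16-frame DPB is tiny compared to the buffers discussed above. A rough estimate, assuming the decoder keeps references in 8-bit YUV 4:2:0 (the usual internal format, my assumption, not something stated in the thread):

```python
# Worst-case Decoded Picture Buffer footprint for 1080p H.264:
# the spec caps the DPB at 16 reference frames.
W, H = 1920, 1080
bytes_per_pixel = 1.5            # 8-bit YUV 4:2:0 (Y full res, Cb/Cr quarter res)
frame = int(W * H * bytes_per_pixel)
dpb = 16 * frame
print(f"per frame: {frame / 1e6:.1f} MB")  # ~3.1 MB
print(f"full DPB:  {dpb / 1e6:.1f} MB")    # ~49.8 MB
```

So ~50MB per decoder instance, which is why holding the references (as opposed to every decoded output frame) is not the memory problem.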
So I still do not see how this would prohibit parallel processing.
Not sure how that would work. You have a thread decoding frames 1 minute ahead of where playback is; if it's not decoding full frames and storing them until you need to display them, what is that thread doing?
Transcoding or video editing in slices is a common application: you cut the video into a handful of parts at keyframes, process the parts individually in a streaming manner, and then splice the partial results together.
If we're talking about playback, then generating seek thumbnails could similarly benefit from parallel processing.