Hacker News

My favorite is the parallel jobs feature of xargs. For example, say you want to run a script you wrote called process-video.sh to do some processing on all the video files in a directory (extracting audio to MP3, converting format, etc.). You want to use all 8 of your cores. You could write a Makefile and run it with -j9, or you can do this:

   find . -name "*.flv" | xargs -n 1 -P 9 ./process-video.sh
This immediately forks 9 instances of process-video.sh on the first 9 .flv files found, then starts a new instance whenever a running one completes, so 9 instances are always in flight. (I usually set it to the number of cores plus one for CPU-bound tasks, hence 9 for my i7 with eight cores [1].)

If you add -print0 to the find command and -0 to the xargs command, they use NUL-terminated filenames, which does the right thing when filenames contain whitespace (including newlines).
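A runnable sketch of that combined pipeline (the temp directory, file names, and echo are stand-ins for real videos and the real process-video.sh):

```shell
# Demo of the NUL-delimited pipeline on paths containing spaces.
dir=$(mktemp -d)
touch "$dir/a b.flv" "$dir/c.flv"
# -print0/-0 pass each path as a single argument, even with spaces or
# newlines in it; -n 1 gives each invocation one file, -P 9 keeps up to
# nine invocations in flight at once.
find "$dir" -name "*.flv" -print0 | xargs -0 -n 1 -P 9 echo processing
rm -rf "$dir"
```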

[1] Logical cores. Most i7's have four physical cores which become eight logical cores through the magic of hyperthreading.



If you like xargs but want more flexibility, I'd highly suggest GNU parallel. That flexibility includes running jobs on multiple computers, running intensive commands using all available CPUs (like xargs -P), and creating unique scripts to handle multiple parameters.

http://www.gnu.org/software/parallel/man.html


Parallel also lets you transparently run jobs on remote servers, automatically handling things like copying files back and forth. When I have some heavy ad-hoc data processing to do, my new favorite trick is spinning up 50-100 EC2 spot instances, pointing GNU parallel at them, and just firing and forgetting.


May I ask, what do you gain by setting it to +1 cores?


Even though CPU is usually the resource that limits throughput for video processing in my experience, each process will presumably do some amount of I/O as well.

If you have 8 cores running 8 jobs, then whenever one of those jobs needs to do I/O you have a core sitting idle while that job waits for the disk. If you have 8 cores running 9 jobs, then all your cores will still be fully utilized when a single job is doing I/O.

I think you actually want to do plus a small percentage, perhaps 5%, so ceil(1.05*N) jobs on N cores. That is, I have a gut feeling that 64 cores doing 65 jobs would still result in underutilization, 67-68 jobs would be better (provided the workload doesn't become I/O-limited with that much CPU power, and provided the number of videos to be converted is still much larger than the number of cores, so you don't run afoul of Amdahl's law).
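That rule of thumb is easy to compute inline; a sketch (using nproc, with integer arithmetic standing in for ceil, both my additions):

```shell
N=$(nproc)                          # number of logical cores
JOBS=$(( (N * 105 + 99) / 100 ))    # integer ceiling of 1.05 * N
echo "$JOBS"                        # e.g. 9 for N=8, 68 for N=64
# then: find . -name "*.flv" | xargs -n 1 -P "$JOBS" ./process-video.sh
```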

These are really just rules of thumb based on my gut feelings and mental models of how the system works; it might be fun to actually benchmark it and see if my ideas correspond to reality. (You probably want to reboot first, or do a bunch of unrelated I/O, to flush the disk cache.)


I definitely agree with the n+1; however, when you're dealing with HyperThreading I don't know that I'd bother. The extra 'cores' provided by HyperThreading are not full cores.

The idea behind HyperThreading is basically the same as your n+1 idea. A HyperThreaded core has duplicated circuitry to handle state (registers, etc.) but not the execution resources. When one thread is stalled or not using the execution resources (e.g., waiting on disk I/O, or on a fetch from RAM), the other thread is already in the processor, ready to go, and will be executed instead.

Obviously this is a bit of a simplification, but your n+1 idea is essentially already implemented in the hardware.

That said, if enough of your cores are still stalled waiting on disk/memory access and are stalled long enough to make the context switch worth it, it may be beneficial. If not, however, you might actually end up seeing some minor slow down as the processor is forced to switch between the competing processes.

I'd benchmark it.
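A minimal way to do that, assuming a GNU userland; sleep stands in for the real per-file work, and the -P values are arbitrary:

```shell
# Time the same batch of jobs at different parallelism levels.
bench() {
    start=$(date +%s%N)                       # GNU date: nanoseconds
    printf '0.2\n0.2\n0.2\n0.2\n' | xargs -n 1 -P "$1" sleep
    end=$(date +%s%N)
    echo "-P $1: $(( (end - start) / 1000000 )) ms"
}
bench 1   # four 0.2 s sleeps serially: roughly 800 ms
bench 4   # all four in flight at once: roughly 200 ms
# For the real workload, swap the printf | sleep for
#   find . -name "*.flv" -print0 | xargs -0 -n 1 -P "$1" ./process-video.sh
# and flush the disk cache between runs so earlier runs don't skew later ones.
```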


I'd like to get a better sense of how well hyperthreading works in practice on stuff like this. Is it really the case that my physical cores' pipelines are < 50% utilized by video processing, such that the hyperthreads are equivalent to having a whole other core? Because I'd have guessed that, IO aside, you really want slightly fewer processes than virtual cores.



