
I got interested in this, so I whipped up a notebook running my interpretation of the rules against random sequences of different lengths. Failure rates are the number of sequences that contained an error, not the number of errors across all sequences (so a sequence with 3 errors counts as one failure).
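
In case anyone wants to play with the idea without grabbing the notebook, the gist is roughly the sketch below (hypothetical Python, not the actual notebook code; rule_1, rule_2 and failure_rate are names I've made up here, and only the first two rules are shown):

    import numpy as np

    def rule_1(x, mean=0.0, sd=1.0):
        # Nelson rule 1: one point more than 3 standard deviations from the mean
        return np.any(np.abs(x - mean) > 3 * sd)

    def rule_2(x, mean=0.0):
        # Nelson rule 2: nine (or more) points in a row on the same side of the mean
        above = x > mean
        run, longest = 1, 1
        for i in range(1, len(above)):
            run = run + 1 if above[i] == above[i - 1] else 1
            longest = max(longest, run)
        return longest >= 9

    rules = [rule_1, rule_2]  # the remaining rules would slot in here

    def failure_rate(n_seqs=10000, length=100, seed=0):
        # Fraction of pure-noise N(0, 1) sequences that trigger at least one rule.
        # A sequence with several violations still counts as a single failure.
        rng = np.random.default_rng(seed)
        failures = 0
        for _ in range(n_seqs):
            x = rng.normal(0.0, 1.0, length)
            if any(rule(x) for rule in rules):
                failures += 1
        return failures / n_seqs

    for length in (10, 50, 100, 500):
        print(length, failure_rate(length=length))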

A couple of the rule descriptions, rules 5 & 6, were ambiguous (at least to me).

I'll upload a PDF of the output when I get LaTeX installed again (downloading the HTML file is probably the easiest way to see the output quickly).

Edit - updated.

PDF:

http://files.figshare.com/2196604/Analysis_of_the_false_posi...

PDF/HTML/notebook

http://dx.doi.org/10.6084/m9.figshare.1499204



Damn, missed the edit window. Fixed rule #5

http://files.figshare.com/2196656/Analysis_of_the_false_posi...

Same DOI, new version.


I'm probably missing something obvious, but why random sequences when the Nelson rules appear to be aimed at measurements of real-world properties?


Well, the assumption here is that we've got something we're measuring with a steady mean, plus either inherently noisy variation or some measurement error on top.

Lots of real-world samples follow the normal distribution, and anything that does should look roughly like that sim.

So there's no real need to use random numbers, but they're a very quick way for me to get data that looks like real data, where I know the mean and standard deviation are correct and that there should be no anomalies.

My sim can only show one side of the story, though; it can't show how often real issues are picked up. For that, we'd probably want to look at real-world data and investigate each reported issue to see what proportion are important (and then possibly try to see how many were missed).
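
If you just wanted a rough synthetic feel for that side, one option would be to inject a known step change and reuse the same counting loop (detection_rate below is made up for illustration and reuses the rules list and numpy import from the earlier sketch):

    def detection_rate(shift=1.0, n_seqs=10000, length=100, seed=1):
        # Rough power check: add a step change of `shift` sigma to the second
        # half of each sequence and count how often any rule fires.
        rng = np.random.default_rng(seed)
        hits = 0
        for _ in range(n_seqs):
            x = rng.normal(0.0, 1.0, length)
            x[length // 2:] += shift
            if any(rule(x) for rule in rules):
                hits += 1
        return hits / n_seqs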


A small random variance or a bunch of uncorrelated errors ought to produce something very similar to a normal distribution, which we can model with random generation.

The Nelson rules are basically an attempt to determine whether relatively 'healthy'-looking data is actually being driven by some specific forcing event (e.g. oscillation instead of random variance), so they look for departures from a normal distribution.
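
The oscillation case corresponds to Nelson rule 4 (fourteen or more points in a row alternating up and down). A minimal sketch of that check, assuming a numpy array of measurements (rule_4 is just an illustrative name):

    import numpy as np

    def rule_4(x):
        # Nelson rule 4: fourteen (or more) points in a row alternating in
        # direction (up, down, up, down, ...), suggesting oscillation rather
        # than independent random variation.
        diffs = np.sign(np.diff(x))
        run, longest = 1, 1
        for i in range(1, len(diffs)):
            # the run continues only if the direction flips at every step
            if diffs[i] != 0 and diffs[i] == -diffs[i - 1]:
                run += 1
            else:
                run = 1
            longest = max(longest, run)
        # 14 alternating points correspond to 13 successive direction changes
        return longest >= 13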



