That's not true unless the distribution is uniform. If it were always true, then compression would not work. Compression, in a simple case, often compresses data stored as byte-sized chunks into smaller space, precisely because those 8-bit integers do not have 8 bits of entropy each.
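A quick way to see this: bytes drawn from a skewed distribution compress well, while full-entropy bytes don't. A minimal sketch using Python's zlib (the skew 3/4 'a', 1/4 'b' is just an arbitrary example):

```python
import os
import random
import zlib

random.seed(0)

# Low-entropy stream: stored as 8-bit bytes, but heavily skewed
# toward two symbols (~0.81 bits of entropy per byte).
skewed = bytes(random.choice(b"aaab") for _ in range(10_000))

# High-entropy stream: uniform random bytes (~8 bits of entropy each).
uniform = os.urandom(10_000)

print(len(zlib.compress(skewed)))   # far smaller than 10,000
print(len(zlib.compress(uniform)))  # no real savings vs. 10,000
```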
Hmm, how many bits does the space of all positive integers have?
That's not what the (clumsily written) article is about. It's not about sampling the integers to choose an integer, it's about sampling the prime factors of an integer, as a measure of evenness of distribution of prime factors.
It's measuring the information in the prime factorization, not the information in the number as a value.
Having ripcorded out after realizing the author was trying to prove that water was wet, I'll assume that it's "normalized entropy", in a range of 0-1, indicative of the distribution across the space.
The comments section on the author's book 'How to not be wrong' is one of the best things I have read in ages. I am so glad the author left it public. Imagine releasing a book called 'How to not be wrong' and having like 200 people telling you that you are wrong, right in the comments section of your minimal personal blog.
like, the part where they get a_i log p_i,
well, the sum of this over i gives the log of the number,
but it seemed like they were treating this as… a_i being a random variable associated to p_i, or something? I wasn't really clear on what they were doing with that.
Take an $n$, chosen from $[N,2N]$. Take its prime factorization $n = \prod_{j=1}^{k} q_j^{a_j}$. Take the logarithm $\log(n) = \sum_{j=1}^{k} a_j \log(q_j)$.
Divide by $\log(n)$ to get a sum equal to $1$, and define the weights $w_j = a_j \log(q_j)/\log(n)$.
Think of $w_j$ as "probabilities". We can define an entropy of sorts as $H_{factor}(n) = - \sum_j w_j \log(w_j)$.
Heuristics (such as the Poisson–Dirichlet picture of prime factors) suggest the mean of this entropy converges to 1 as $N \to \infty$.
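A quick numerical sketch of this definition (trial-division factorization; the averaging window $[N, 2N]$ here is tiny compared to what the asymptotics need, so don't expect the mean to be exactly 1):

```python
from math import log
from statistics import mean

def factorize(n):
    """Prime factorization of n as a dict {prime: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def h_factor(n):
    """Entropy of the weights w_j = a_j * log(q_j) / log(n)."""
    logn = log(n)
    ws = [a * log(q) / logn for q, a in factorize(n).items()]
    return -sum(w * log(w) for w in ws if w > 0)

print(h_factor(23))  # a prime: single weight 1, entropy 0
print(h_factor(6))   # 2*3: roughly 0.667
# Mean over a window [N, 2N]; drifts toward 1 only very slowly.
N = 10_000
print(mean(h_factor(n) for n in range(N, 2 * N)))
```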
OpenAI tells me that the reason this might be interesting is that it gives information on whether a typical integer is built from one or a few dominant primes, or from many smaller ones. A mean entropy of 1 is saying (apparently) that there is a dominant prime factor, but not an overwhelming one. (I guess) a mean tending to 0 would mean a single dominant prime, a mean tending to infinity would mean many small factors (?), and oscillations would mean no stable structure.
Around 2001 I was working at Broadcom's networking division in San Jose. The switch chip we were working on (10Gbps x 8 ports) was understaffed and we hired a contractor at $120/hr to do verification of the design. He was pretty young, but he came across as confident and capable, and every weekly meeting he was reporting good progress.
Unfortunately we weren't reviewing his work, just trusting his reports, as we were overworked getting our own parts done. After three months of this, I said to the project lead: something smells wrong, because he hasn't filed a single bug against my design yet.
So we looked at his code, lots of files and lots of code written, all of it plumbing and test case generation, but he hadn't built the model of the chip's behavior. At the heart of it was a function which was something like:
bool verify_pins(...) {
    return true;
}
We asked him what was going on, and he said he was in over his head and had been putting off the hard part. Every morning he lied to himself that that was the day he was finally going to start building the model for the DUT. His shame seemed genuine. My boss said: we aren't paying you for the last pay period, just go away and we won't sue you.
My boss and I literally slept at work for a month, with my boss building the model and fixing other TB bugs, and I addressed the RTL bugs in the DUT as he found them.
Here's an intuitive description of the entropy, which for squarefree n works out to [log(log(n)) - (1/log(n)) * sum(log(p_i) log(log(p_i)))]:
The entropy of a random integer N is the area of the gap between how much space N takes up and how much space its internal components take up.
This can be visualized as The City of N, in base 2. (OP used log_e, but that's too hard to draw.)
1. The Foundation (The Factors)
Take a random number N and break it into its prime factors. We write these prime factors in binary, side-by-side, along the bottom of a page.
The total width of this baseline is roughly log_2(N) (the number of bits in N).
2. The Cloud Ceiling (The Potential)
We write down the length of (N written in base 2), itself in base 2. (If N = 46 = 101110 in base 2, its length is 6 = 110 in base 2.)
We write that number vertically (110) to set the Maximum Ceiling Height.
Finally, we look at the number N itself.
3. The Buildings (The Structure):
Above each prime factor, we construct a building.
* The Width: The width of the building is simply the length of that prime factor in bits.
* The Height: To determine how tall the building is, we look at its width and write that number down vertically in binary.
To normalize, we zoom our camera so the length (log) of N fills the view.
4. The Entropy (The Visible Sky):
The Sky: This is the empty space between the tops of the buildings and the top of the picture (cloud ceiling).
The Entropy of N is exactly the total area of the visible sky.
If N is prime, the building is as wide and tall as the whole city and touches the cloud ceiling. No Sky. Zero Entropy.
If N is a random integer, it usually has one wide building (the largest prime) that is almost as tall as the ceiling, and a few tiny huts (small primes) that leave a massive gap of blue sky above them.
Here is the visualization for N = 46 (binary 101110, length 6).
(Visualization not exact due to rounding of logarithms.)
Interpretation:
Building 23 is tall. Its width is 5 bits (23 = 10111), and 5 in binary is 101, so it reaches Level 3. It touches the ceiling (Level 3). There is zero sky above it.
Building 2 is short. Its width is 2 bits (2 = 10), and 2 in binary is 10, so it only reaches Level 2. There is one level of sky visible above it.
Total Entropy: The total empty area above the buildings is small (just that gap above factor 2), which matches the math: 46 is "low entropy" because it is dominated by the large factor 23.
A number with High Entropy would look like a row of low, equal-height huts, leaving a massive amount of open sky above the entire city.
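As a sanity check on this picture (in natural logs rather than base 2, and assuming squarefree n, where the algebra is exact), the "sky area" form can be verified numerically against the weight-entropy form:

```python
from math import log

def prime_factors(n):
    """Distinct prime factors of n (n assumed squarefree here)."""
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps

def entropy_weights(n):
    """-sum w_i log w_i with w_i = log(p_i)/log(n), for squarefree n."""
    logn = log(n)
    return -sum((log(p) / logn) * log(log(p) / logn)
                for p in prime_factors(n))

def entropy_sky(n):
    """The 'sky area': log(log(n)) - (1/log(n)) * sum log(p_i)*log(log(p_i))."""
    logn = log(n)
    return log(logn) - sum(log(p) * log(log(p))
                           for p in prime_factors(n)) / logn

for n in (46, 30, 2310):  # all squarefree
    print(n, entropy_weights(n), entropy_sky(n))  # the two forms agree
```

For N = 46 both forms give about 0.473, matching the "mostly one big building, a little sky over the hut at 2" picture.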
In John C. Baez's "What is Entropy?" (a 122-page PDF best suited for reading on airplanes without wifi or in-flight entertainment), he states:
> It’s easy to wax poetic about entropy, but what is it? I claim it’s the amount of information we don’t know about a situation, which in principle we could learn.
The entropy of a random integer being 1 makes intrinsic sense to me, given I didn't spend years in theoretical math classes.