import numpy as np
from scipy.stats import binomtest
= 10000
repetitions
def single_simulation(n, m, k):
"""Compute the sum modulo k of m elements in [1, n]."""
# Check input
assert n % k == 0, f"Expected n % k == 0 but got {n % k} with {n=} and {k=}"
assert m <= n, f"Excpected m <= n but got {m=} and {n=}"
1= np.random.choice(1 + np.arange(n), size=(m,), replace=False)
picks = np.sum(picks) % k
result
return result
def compute_success_probability(n, m, k, repetitions=10000, confidence_level=0.95):
"""Compute a Monte-Carlo approximation of the probability of success."""
= 0
num_success for idx in range(repetitions):
+= single_simulation(n, m, k) == 0
num_success
= num_success / repetitions
probability
2= binomtest(k=num_success, n=repetitions).proportion_ci(confidence_level=confidence_level)
result = result.low
low = result.high
high
return probability, low, high
Investigating a conjecture from Stack Overflow 1/2
A user on Stack Overflow proposes an interesting problem featuring modulos (and thus arithmetic) and sampling without replacement. I attempt to understand it better by identifying easy cases and performing some simulations.
1 Problem statement
There are \(n\) cards which have numbers \(1\)~\(n\) on each. You pick \(m\) cards from it, and you don’t put it back once you pick from it. Is the probability that their sum is divisible by \(k\) always \(\dfrac 1 k\), while \(k|n\)? If not, how do we generalize the probability?
The key features of this problem are:
- picking without replacement \(X_i \in [1, n]\).
- \(k\) possible values for the output of \(\sum X_i \bmod k\).
We would like to show that a single value has probability \(1/k\). This hints at the distribution being uniform over the \(k\) possibilities, but maybe something more complex is going on.
To investigate such questions, unless I happen to glance exactly the right analysis straight away, my process is to:
- look for easy cases
- perform simulations and try to identify patterns.
2 Simulation code
The situation is straightforward to implement in python using numpy.
3 A simple counter-example
There are \(n \choose m\) possibilities for the selection of the cards. These possibilities are equally likely. Thus, the final probability must be some integer multiple of \(\left[ n \choose m \right]^{-1}\). Thus, if \(n \choose m\) is not divisible by \(k\), then the conjecture is false.
For example, for the triplet \(n=6, m=3, k=3\), we have \({n \choose m} = 20\) and the conjecture must be false.
This is confirmed by a simulation:
from scipy.special import comb
= 6
n = 3
m = 3
k = compute_success_probability(n=n, m=m, k=k)
probability, low, high
print(f"Number of combinations: {comb(n, m) = }")
print(f"Number of combinations makes conjecture impossible: {comb(n, m) % k = }")
print(f"Estimated probability: {np.round(probability, 3)} (0.95 interval: {np.round(low,3), np.round(high,3)})")
print(f"Conjectured probability: {np.round(1/k, 3)}")
Number of combinations: comb(n, m) = 20.0
Number of combinations makes conjecture impossible: comb(n, m) % k = 2.0
Estimated probability: 0.396 (0.95 interval: (0.386, 0.405))
Conjectured probability: 0.333
4 The simple cases: \(m=1\), \(m=n\), \(m=n-1\)
Some cases are immediate.
If \(m=1\), then the result is a uniform variable with \(k\) possibilities. The conjecture is thus immediately true. [^If the sampling was with replacement instead of without, then the result would be immediately true due to this argument.]
If \(m=n\), there is no randomness: we just select all cards. The final result depends on the parity of \(k\).
- If \(k\) is odd, then the terms cancel out in the sum by pairing them: \(1\) with \(k-1\), etc. The final sum is thus always 0.
- If \(k\) is even, then we again pair the terms, but the term \(k/2\) is left standing alone. The final sum is thus:
\[ ((n / k) \bmod 2) * k / 2 \]
If \(m=n-1\), we can represent this situation as selecting a single card to exclude from the total sum. Thus, by the same argument as with \(m=1\), we know that the probability is \(1/k\).
5 An exhaustive exploration
We can also run a simple exhaustive experiment:
- test all values of \(k\) in \([2,7]\),
- test \(n\) in \(k * [1, 5]\),
- test \(m\) in \([2, n-2]\) (excluding the easy cases discussed above),
- if \({n \choose m} \bmod k = 0\) but the probability is different (which we measure using an exact 0.99 confidence interval) from \(1/k\), print out the details of the results.
Note that we expect there to be roughly 1 percent of false positives.
from IPython.display import Markdown
from tabulate import tabulate
= []
table for k in range(2, 7 + 1):
for n in range(k, k * 5 + 1, k):
for m in range(0, n+1):
if m in [0, 1, n-1, n]:
continue
= compute_success_probability(n=n,m=m,k=k,confidence_level=0.99)
p, l, h = int(comb(n, m))
c = 1/k
conjecture = l <= conjecture <= h
conjecture_validated if c % k == 0 and not conjecture_validated:
= [
line
(n, m, k),round(conjecture,4),
np.round(l,4), np.round(h,4)),
(np.
]
table.append(line)
Markdown(
tabulate(
table, =[
headers"(n, m, k)",
"Conjectured proba $1/k$",
"Probability 0.99 confidence interval"
],
) )
(n, m, k) | Conjectured proba \(1/k\) | Probability 0.99 confidence interval |
---|---|---|
(4, 2, 2) | 0.5 | (0.3157, 0.34) |
(8, 2, 2) | 0.5 | (0.4168, 0.4424) |
(8, 4, 2) | 0.5 | (0.5302, 0.556) |
(8, 6, 2) | 0.5 | (0.4187, 0.4443) |
(10, 4, 2) | 0.5 | (0.5036, 0.5294) |
(10, 6, 2) | 0.5 | (0.4573, 0.4831) |
(9, 3, 3) | 0.3333 | (0.3438, 0.3686) |
(9, 6, 3) | 0.3333 | (0.3433, 0.368) |
(8, 2, 4) | 0.25 | (0.2073, 0.2286) |
(8, 6, 4) | 0.25 | (0.2068, 0.2281) |
(12, 6, 4) | 0.25 | (0.2272, 0.2492) |
(12, 9, 4) | 0.25 | (0.2522, 0.275) |
(16, 2, 4) | 0.25 | (0.222, 0.2439) |
(16, 14, 4) | 0.25 | (0.222, 0.2439) |
(12, 2, 6) | 0.1667 | (0.1457, 0.1644) |
(12, 7, 6) | 0.1667 | (0.1442, 0.1629) |
(12, 10, 6) | 0.1667 | (0.1421, 0.1606) |
We find many examples where the \({n \choose m} \bmod k = 0\) conjecture is not true. For some of these, it is straightforward to prove that the probability is not \(1/k\).
For example, consider the triplet \(n=4, m=2, k=2\).
- The winning pairs are (2, 4) and (1, 3).
- There are 6 total pairs.
- The true probability is thus \(1/3\).
Other cases with \(m=2\) are similarly easy to analyze.
6 Mathew Spam improvement
In of the answers, Matthew Spam proposes the following observation:
If \(\operatorname{gcd}(m,n)=1\) then the probability is equal to \(\frac{1}{k}\).
and offers a proof: there is a symmetry that enables us to show that the distribution of \(\sum X_i \bmod k\) is uniform over \([1,k]\).
This is straightforward to extend to the case \(\operatorname{gcd}(m, k) = 1\).
In view of this, let’s rerun the analysis focusing only on cases where \(\operatorname{gcd}(m,k) \neq 1\).
= []
table for k in range(2, 7 + 1):
for n in range(k, k * 5 + 1, k):
for m in range(1, (n-1)+1):
= np.gcd(m, k)
divisor if divisor == 1:
continue
= compute_success_probability(n=n,m=m,k=k,confidence_level=0.99)
p, l, h = int(comb(n, m))
c = 1/k
conjecture = l <= conjecture <= h
conjecture_validated if not conjecture_validated:
= [
line
(n, m, k),
divisor,round(conjecture,4),
np.round(l,4), np.round(h,4)),
(np.
]
table.append(line)
Markdown(
tabulate(
table, =[
headers"(n, m, k)",
"gcd(m, k)",
"Conjectured proba $1/k$",
"Probability 0.99 confidence interval"
],
) )
(n, m, k) | gcd(m, k) | Conjectured proba \(1/k\) | Probability 0.99 confidence interval |
---|---|---|---|
(4, 2, 2) | 2 | 0.5 | (0.3133, 0.3375) |
(6, 2, 2) | 2 | 0.5 | (0.3941, 0.4195) |
(6, 4, 2) | 2 | 0.5 | (0.5812, 0.6066) |
(8, 2, 2) | 2 | 0.5 | (0.4154, 0.441) |
(8, 4, 2) | 2 | 0.5 | (0.5287, 0.5545) |
(8, 6, 2) | 2 | 0.5 | (0.4125, 0.438) |
(10, 2, 2) | 2 | 0.5 | (0.438, 0.4637) |
(10, 4, 2) | 2 | 0.5 | (0.5112, 0.537) |
(10, 6, 2) | 2 | 0.5 | (0.4639, 0.4897) |
(10, 8, 2) | 2 | 0.5 | (0.5448, 0.5705) |
(6, 3, 3) | 3 | 0.3333 | (0.3866, 0.4119) |
(9, 3, 3) | 3 | 0.3333 | (0.3414, 0.3661) |
(9, 6, 3) | 3 | 0.3333 | (0.3335, 0.3581) |
(12, 9, 3) | 3 | 0.3333 | (0.3385, 0.3632) |
(15, 12, 3) | 3 | 0.3333 | (0.3346, 0.3592) |
(4, 2, 4) | 2 | 0.25 | (0.1572, 0.1765) |
(8, 2, 4) | 2 | 0.25 | (0.2069, 0.2282) |
(8, 4, 4) | 4 | 0.25 | (0.2506, 0.2734) |
(8, 6, 4) | 2 | 0.25 | (0.2032, 0.2244) |
(12, 2, 4) | 2 | 0.25 | (0.2123, 0.2338) |
(12, 10, 4) | 2 | 0.25 | (0.2057, 0.227) |
(16, 2, 4) | 2 | 0.25 | (0.2225, 0.2444) |
(16, 8, 4) | 4 | 0.25 | (0.2504, 0.2732) |
(16, 14, 4) | 2 | 0.25 | (0.2268, 0.2488) |
(20, 2, 4) | 2 | 0.25 | (0.226, 0.248) |
(20, 16, 4) | 4 | 0.25 | (0.2509, 0.2737) |
(20, 18, 4) | 2 | 0.25 | (0.2205, 0.2423) |
(6, 2, 6) | 2 | 0.1667 | (0.1274, 0.1452) |
(6, 3, 6) | 3 | 0.1667 | (0.1846, 0.2051) |
(6, 4, 6) | 2 | 0.1667 | (0.1895, 0.2102) |
(12, 2, 6) | 2 | 0.1667 | (0.1444, 0.1631) |
(12, 8, 6) | 2 | 0.1667 | (0.1693, 0.1892) |
(12, 10, 6) | 2 | 0.1667 | (0.1383, 0.1566) |
(24, 22, 6) | 2 | 0.1667 | (0.1471, 0.1659) |
In these results, we observe that, when \(\operatorname{gcd}(m, k) \neq 1\):
- We often have that the probability does not match \(1/k\).
- The pattern is less apparent when \(n\) grows, but it could be due to the missmatch being smaller.
7 Conclusions
The conjecture is false, as is: we identified several counter-examples.
If \({n \choose m} \bmod k != 0\) then the conjecture is clearly wrong.
There are cases where \({n \choose m} \bmod k = 0\) but the conjecture is wrong. For example: \(n=4, m=2, k=2\) where the probability is provably \(1/3\) instead of \(1/2\).
If \(\operatorname{gcd}(m, k)=1\) the conjecture is provably true.
From a simulation study, it seems possible that when \(\operatorname{gcd}(m, k) \neq 1\), the probability is always different from \(1/k\).
These final observations has put me on the right track, and I am now able to move on to rigorous mathematical analysis. I will do so in a second post to keep things organized.