I need to generate random numbers between 100 and 500 where the mean is 150. I’ve been reading up on distribution curves and probability density functions but can’t find a concrete solution to this question. Any help is appreciated.
Answer
I can think of two possible approaches you might take. One would be to take something like a normal distribution but that might lead to values outside your range. You could discard those samples and try again, but doing so would alter your mean. So this is probably the more complicated approach.
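To illustrate why discarding out-of-range samples shifts the mean, here is a minimal sketch. The standard deviation of 50 is purely an assumption for the demonstration, and the class and method names are made up. With these numbers the lower bound of 100 cuts off far more of the bell curve than the upper bound of 500 does, so the surviving samples should average roughly 164 rather than 150.

```java
import java.util.Random;

final class TruncatedNormalDemo {
    // Draw from a normal distribution and retry until the value lies in [lo, hi].
    static double sampleTruncated(Random rng, double mean, double sd, double lo, double hi) {
        while (true) {
            double v = mean + sd * rng.nextGaussian();
            if (v >= lo && v <= hi) {
                return v;
            }
        }
    }

    // Average of n accepted samples, using the assumed parameters
    // mean = 150, sd = 50 (hypothetical), range [100, 500].
    static double empiricalMean(long seed, int n) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += sampleTruncated(rng, 150.0, 50.0, 100.0, 500.0);
        }
        return sum / n;
    }

    public static void main(String[] args) {
        // The printed value ends up noticeably above the nominal mean of 150.
        System.out.println(empiricalMean(42L, 200_000));
    }
}
```

You could compensate by shifting the nominal mean downwards until the truncated mean hits 150, but that requires solving for the shift numerically.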
The other alternative, the one I would actually suggest, is to start with a uniform distribution in the range 0 to 1. Then transform that number so the result has the properties you want.
There are many such transformations you could employ. In the absence of any strong rationale for something else, I would probably go for some formula such as
y = pow(x, a) * b + c
In that formula, x would be uniformly distributed in the [0, 1] range, while y should have the bounds and mean you want, once the three parameters have been tuned correctly.
Using b=400 and c=100 you can match the endpoints quite easily, because a number from the [0, 1] range raised to any positive power will again be a number from that range. So now all you need is to determine a. Reversing the effect of b and c, you want pow(x, a) to have a mean of (150 - c) / b = 50 / 400 = 1/8 = 0.125.
To compute the mean (or expected value) of a discrete distribution, you multiply each value by its probability and sum the products. For a continuous distribution that becomes the integral of value times probability density. The density function of our uniform distribution is 1 in the interval and 0 elsewhere, so we just need to integrate pow(x, a) from 0 to 1. The result of that is 1 / (a + 1), so you get
1 / (a + 1) = 1 / 8
a + 1 = 8
a = 7
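Written out, the integral and a quick check of the resulting mean look like this. It is only a restatement of the steps above in LaTeX notation, with E[·] denoting the expected value:

```latex
\begin{align*}
E[x^a] &= \int_0^1 x^a \cdot 1 \, dx
        = \left[ \frac{x^{a+1}}{a+1} \right]_0^1
        = \frac{1}{a+1} \qquad (a > -1) \\
E[y]   &= b \, E[x^a] + c
        = \frac{400}{7+1} + 100
        = 150
\end{align*}
```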
So taking it all together I’d suggest
return Math.pow(random.nextDouble(), 7) * 400 + 100;
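For context, a self-contained sketch around that one-liner might look as follows. The class name, the constant names and the sampleMean helper are my own illustration, not part of any library. The helper simply averages many samples so you can check that the result lands close to 150.

```java
import java.util.Random;

final class SkewedRandom {
    static final double A = 7.0;   // exponent derived from the target mean
    static final double B = 400.0; // width of the range (500 - 100)
    static final double C = 100.0; // lower bound

    // Transform a uniform sample from [0, 1) into the range [100, 500).
    static double next(Random random) {
        return Math.pow(random.nextDouble(), A) * B + C;
    }

    // Hypothetical helper: average of n samples for a given seed.
    static double sampleMean(long seed, int n) {
        Random random = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += next(random);
        }
        return sum / n;
    }

    public static void main(String[] args) {
        // Should print a value close to 150.
        System.out.println(sampleMean(42L, 1_000_000));
    }
}
```

Since nextDouble() never returns exactly 1, the upper bound of 500 is approached but not reached. If you need integers, you would have to round, which shifts the mean slightly.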
If you want to get a feeling for this distribution, you can consider
x = pow((y - c) / b, 1 / a)
to be the cumulative distribution function. The density would be the derivative of that with respect to y. Ask a computer algebra system and you get some ugly formula. You might as well ask directly for a plot, e.g. on Wolfram Alpha.
That probability density is actually infinite at 100, and then drops quickly for larger values. So one thing you don’t get from this approach is a density maximum at 150. If you had wanted that, you’d need a different approach but getting both density maximum and expected value at 150 feels really tricky.
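If you would rather evaluate the numbers yourself than plot them, here is a small sketch of the cumulative distribution function and its derivative for a=7, b=400, c=100. The density formula in the comment is my own differentiation of the CDF above, and the class name is made up. It shows the infinite density at 100, and that roughly 74% of all samples fall below the mean of 150.

```java
final class PowerDensity {
    static final double A = 7.0;
    static final double B = 400.0;
    static final double C = 100.0;

    // CDF: probability that a sample is at most y, valid for y in [100, 500].
    static double cdf(double y) {
        return Math.pow((y - C) / B, 1.0 / A);
    }

    // Density: d/dy of cdf(y) = ((y - c) / b)^(1/a - 1) / (a * b).
    // The exponent is negative, so the value is infinite at y = 100.
    static double pdf(double y) {
        return Math.pow((y - C) / B, 1.0 / A - 1.0) / (A * B);
    }

    public static void main(String[] args) {
        for (double y : new double[] {100, 101, 110, 150, 300, 500}) {
            System.out.printf("y=%5.0f  cdf=%.4f  pdf=%.6f%n", y, cdf(y), pdf(y));
        }
    }
}
```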
One more thing to consider would be reversing the orientation. If you start with b=-400 and c=500 you get a=1/7, since 1 / (a + 1) must now equal (150 - 500) / -400 = 7/8. That’s a different distribution, with different properties but the same bounds and mean. If I find the time I’ll try to plot a comparison for both of these.
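In the meantime, a rough numerical comparison of the two orientations could be sketched like this. The class and method names are hypothetical, and the stats helper just reports the sample mean and standard deviation. Both variants should average close to 150, but the first one spreads much wider (a standard deviation of about 90, versus about 44 for the reversed one).

```java
import java.util.Random;

final class OrientationComparison {
    // Original orientation: a = 7, b = 400, c = 100.
    static double forward(Random r) {
        return Math.pow(r.nextDouble(), 7.0) * 400.0 + 100.0;
    }

    // Reversed orientation: a = 1/7, b = -400, c = 500.
    static double reversed(Random r) {
        return Math.pow(r.nextDouble(), 1.0 / 7.0) * -400.0 + 500.0;
    }

    // Returns {mean, standard deviation} of n samples from the chosen variant.
    static double[] stats(boolean useReversed, long seed, int n) {
        Random r = new Random(seed);
        double sum = 0.0;
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double y = useReversed ? reversed(r) : forward(r);
            sum += y;
            sumSq += y * y;
        }
        double mean = sum / n;
        double variance = sumSq / n - mean * mean;
        return new double[] {mean, Math.sqrt(variance)};
    }

    public static void main(String[] args) {
        double[] f = stats(false, 42L, 1_000_000);
        double[] r = stats(true, 42L, 1_000_000);
        System.out.printf("forward:  mean=%.2f sd=%.2f%n", f[0], f[1]);
        System.out.printf("reversed: mean=%.2f sd=%.2f%n", r[0], r[1]);
    }
}
```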