Random and Randomness

Juan Chamero , juan.chamero@intag.org , lectures

Discrete Structures & Algorithms, ds_005, as of June 2006

Monte Carlo Approach to compute the Pi number



Introduction

Most nature processes are at random. From an infinite packaged order -the Big Bang- the Universe is relentlessly going to chaos, total absence of life, of movement. However randomness in lines, planes and volumes at a certain scale, seen from “upwards” may look like ordered, neat, with well defined lines surfaces and volumes. Dead tend to make all things the same. However, following a sequence of infinite “however”, we suppose that our Universe is causal and as such everything that happens in a system, either open or closed, is supposed to be “thermodynamically” deterministic, that is nature processes would be at last “pseudo random”, with all events being “at large” (at God scale perhaps?) predictable.


Pseudo Random Numbers: Randomness could be emulated by a computer program, for instance to generate (if binary) a sequence of (2^128 -1) strings of 128 bits each, all different but “at large” cycling in a predetermined sequence, from a given “seed” to go back to the same seed after (2^128-1) steps. They are predictable in the sense that given one string you may “guess” the string m steps ahead via a mathematic transformation, that is they are pseudo random generated by a virtual machines with (2^n – 1) different states S(j) at “time-steps” j’s. All those states are predictable from a given “seed” S(0) by an expression of the following type: S(j) = (T^j).S(0) where T^j is the “j power” of a unitary transformation T0 that obtain S(j+1) from S(j).


Random numbers are used for games, operations research, simulation, scientific experimenting and for the creation of crypto keys. In these cases, by obvious reasons, numbers should be unpredictable or at least extremely difficult to predict/unveil!. Ideally crypto keys should be created by True Random Numbers generators. True random numbers are those generated by nature process, for instance the position of a gel particle within a Brownian movement. Most nature processes are “entropy sources” in the sense that entropy is a state function.

Each state, for instance an ice cube, meanwhile its temperature is T has certain entropy that depends only of the temperature, no matter how the ice cube was formed. In the other extreme of complexity we may imagine a man of today with entropy that would depends on his “state” as a man alive, no matter how he achieves this state. Leaving this ice cube in an environment at a higher temperature, let’s say 20 Centigrade degrees, it will experiment a “degradation” becoming water, dying as ice.


If we isolate the ice cube and the room where the ice cube is on a table and consider both as a system we may measure that the air around the ice will frozen a little and we note that the ice cube start to melt becoming water that slides along the table wasting some energy: thermal energy transforms in kinetic energy. When a new equilibrium state is attained many things changed, no more ice cube being the most substantial change!. Globally we intuit that the order of the Universe, even though infinitesimally, has been degraded. This degradation is measured by the increase of its entropy, the more entropy the more the disorder is. In some extent entropy is also considered the “arrow of time” because as we go forward in time the entropy of an isolated system can only increase or remains the same, never decrease!.
Entropy is in this concern the clock of the Universe.


Our concern is particularly related to
Information Theory where entropy means how much randomness (or uncertainty) an event has. If we have to guess a number (or any object) out of a sequence of 16 our uncertainty “degree of ignorance” has a value of 16 if all numbers are equally probable. Now we are going to program a sort of “guessing game” with the aid of an informant that will provide us information -in the most economic way imagined- in order to reduce our ignorance gradually (we were tempted to say “bitwise”). First of all to impress the “public” we have to organize a little the scenario: the objects (letters) will be presented along a sequence in positions 0 to 15, representing pairs [position object] as depicted below:



Position

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

Number

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

Letter

z

%

4

a

h

w

c

*

#

1

&

5

9

@

+

=



Let’s program now the game supposing that we ignore the positional binary system. At most we were trained to know that a 0 means that something is in the lower half of a sequence and that a 1 means the contrary. To make things easier our game will deal with sets having 2^n objects, 2, 4, 8, 16, 32… 2^n, being n a natural number.


a) Someone of the public selects an object and informs our assistant either the object or the position chosen.

b) Our assistant must provide us the minimum but sufficient information to “guess” the chosen object, of course avoiding any unfair form.


Be the object chosen the number 11 (or the symbol &). Our assistant must provide us pieces of
information to reduce our “ignorance” progressively; let’s say along a binary progression of the following type: 16 => 8 => 4 => 2 => 1. If now our assistant whisper the binary sequence [1010] in our ears to inform us that the right answer is the symbol located in position 10 in decimal we may say that we passed from an uncertainty measured by 16 to the certainty, namely from 16 to 1. The same information could be obtained progressively if previously we agreed with our assistant the procedure (as discussed above) : 1 meaning that the answer is in the upper half of the uncertainty range. So the sequence [1010] will mean:

 

Step 1: [1] the number is in the upper half [9 10 11 12 13 14 15 16] being 8 the level of our uncertainty after receiving the first “bit” of information.

Step2: [0] Once received the second bit as 0, according to the agreed procedure, our uncertainty is now reduced to 4, being certain that our number will be in the lower half, among numbers [9 10 11 12];

Step 3: [1] Our informant whisper now the third bit as a 1 and we know then that our number will be one of two, namely [11 12];

Step 4: [0] finally our informant whisper the fourth bit that permit us to be certain that the right answer is 11!.

 

Shannon defined Information Entropy as a variable related to the probabilities of events, outcomes of a random process like for instance flipping a coin. If all the symbol outcomes are equally likely then increasing the number of symbols should the entropy increases.



Information - Formal definition

Claude E. Shannon in its Mathematical Theory of Communications (a reprint from The Bell System Technical Journal) defines entropy H(x) in terms of a discrete random event x, with n possible states or outcomes as:


H(x)=\sum_{i=1}^np(i)\log_2 \left(\frac{1}{p(i)}\right)=-\sum_{i=1}^np(i)\log_2 p(i).\,\!


Where p(i) are probabilities of outcomes i’s. In our last examples n1=16, p1(i) = 1/16 for i=1, 2, 3,…, 16, if all outcomes were equally probable resulting H1=4, and if n2=2, (flipping a coin) p2(0-Head)= ½ , p2(1-Tail) = ½ , giving H1=1, being H in both cases the number of bits necessary to represent precisely the uncertainty of the guessing. Effectively with four bits we may represent 16 equally different outcomes, from 0000 to 1111 and with one bit the two flipping of coin outcomes, 0 for Head and 1 for Tail.



True Random Numbers

There are many Internet services providing True Random Numbers, using natural sources like radiation, Brownian movement, and quantum mechanics effects. Below we list some of them:

1. HotBits is a site from Switzerland at FermiLab. They said textually about themselves:

HotBits is an Internet resource that brings genuine random numbers, generated by a process fundamentally governed by the inherent uncertainty in the quantum mechanical laws of nature, directly to your computer in a variety of forms. HotBits are generated by timing successive pairs of radioactive decays detected by a Geiger-Müller tube interfaced to a computer.

The HotBits generation hardware produces data at a modest rate (about 30 bytes per second). Once the random bytes are delivered, they are immediately discarded--the same data will never be sent to any other user and no records are kept of the data at this or any other site.

A really good source of entropy is a radioactive source. The points in time at which a radioactive source decays are completely unpredictable, and can be sampled and fed into a computer, avoiding any buffering mechanisms in the operating system. In fact, this is what the HotBits people at Fermilab in Switzerland are doing.

Another sources of entropy could be atmospheric noise from a radio, like that used here at random.org , or even just background noise from an office or laboratory. The lavarand people at Silicon Graphics have been clever enough to use lava lamps to generate random numbers, so their entropy source not only gives them entropy, it also looks good! The latest random number generator to come online (both lavarand and HotBits precede random.org) is Damon Hart-Davis' Java EntropyPool which gathers random bits from a variety of sources including HotBits and random.org, but also from web page hits received by the EntropyPool's web server.

2. The LavaRND people at Silicon Graphics maintain a site that provides random numbers from lava lamps (kidding!). They proudly announce that LavaRND is a cryptographically sound random number generator. At its heart, they use not lava but some small thermal noise sources like some chips easy to find in Web cams like CCD.

3. Random.org . They use atmospheric noise to generate random numbers. They proudly announce themselves as the only service which offers a large (16K) block of numbers at once. They tuned a radio into a frequency where nobody is transmitting.

The atmospheric noise picked up by the receiver is fed into a workstation through the microphone port where it is sampled by a program at a given rate and the upper seven bits of each sample are discarded. The remaining bits are turned into a stream of bits with supposedly has a high entropy value. Skew Correction is performed on the bit stream, in order to ensure that there is an approximately even distribution of 0s and 1s.

4. ID Quantique . Photons - light particles - are sent one by one onto a semi-transparent mirror and detected. The exclusive events (reflection - transmission) are associated to "0" - "1" bit values. The operation of Quantis is continuously monitored to ensure immediate detection of a failure and disabling of the random bit stream. A High bit rate of 4Mbits/sec (up to 16Mbits/sec for PCI card) could be obtained.



The Quantum random bits generation process. See its justification



.

Random Numbers Products and Manufacturers

The Marsaglia CD-ROM

One of the best sources of random numbers is the George Marsaglia CD-ROM containing 600 megabytes of random numbers. These were produced by many sources (for example rap music).

Some suspected key ideas were used on it such as “if we have two collections of all possible bytes intentionally unordered (as octets) XOR them may kill all possible remaining order”. Marsaglia used three hardware sources identified by region origin: Canada, Germany and California.


Note: Some randomness gurus have found serious failures to this generator. One source of error could be in programs that manage data because some tricky details. For instance one of these experts found that Canadian and German generators failed when tested and inspecting the series they discover the following pattern: each byte containing a 10 was preceded by a byte containing a 13!. What could have happened?. One explanation was: when you write a program to write bytes to disk, you must open it as binary. If not the program thinks you are trying to write an end-of-line and the convention on a PC is to represent this by a carriage return (13) followed by a line feed (10).


Here are some manufacturers’ web sites:

Colorado http://www.comscire.com/
Holland http://valley.interact.nl/av/com/orion/home.html
Canada http://www.tundra.com/ (find the data encryption section, then look under RBG1210. My device is an NM810 which is 2?8? RBG1210s on a PC card)
Protego from Sweden, http://www.protego.se
ID Quantique, http://www.idquantique.com/products/quantis.htm

Skew Correction

Recommended source: Request for Comments, RFC 1750


We are going to describe here a De-Skew technique, originally due to von Neumann. Suppose the probability of getting a 0 is equal to ½ plus a given error e and that all bits are independent. To eliminate skew, John von Neumann suggested to following algorithm:


• Read the bits two at a time.

• Skip 00’s and 11’s pairs.

• For 01 and 10 outputs the first bit.


Meaning to exam a bit stream as a sequence of non-overlapping pairs. We could then discard any 00 or 11 pairs found, interpret 01 as a 0 and 10 as a 1. Assume the probability of a 1 is 0.5 + e and the probability of a 0 is 0.5 - e where e is the eccentricity of the source. Then the probability of each pair is as follows:


pair

probability

00

(0.5 – e)^2 = 0.25 – e – e^2

01

(0.5 – e)*(0.5 + e) = 0.25 – e^2

10

(0.5 + e)*(0.5 - e) = 0.25 – e^2

11

(0.5 + e)^2 = 0.25 + e + e^2


So the probability of a 0 in the de-skewed sequence is the same as for a 1!. This technique will completely eliminate any bias but at the expense of taking an indeterminate number of input bits for any particular desired number of output bits. The probability of any particular pair being discarded is 0.5 + 2e^2 so the expected number of input bits to produce X output bits is X/(0.25 - e^2).



Defining Randomness

Sequences at random


Let’s consider infinite sequences of x’s pertaining to a GF2

.

Definition 1: Density D(x, n)

D(x, n) = 1/n. (x(i)), for i<n

Where the operator stands for Summa extended over i. This is like the “average” of ones. The limiting density of x’s is Lim(n)D(x, n) if that limit exists.

A true random sequence should have a limiting density ½ (though one might wonder why the limit should actually exist in the strict sense of convergence in mathematic analysis sense).

In fact, one would look for a certain degree of convergence: the D(x, n) should tend to ½

even though not too rapidly.


Definition 2:
An infinite sequence of x’s pertaining to a GF2 is Mises random if the limiting density of any subsequence (xij) is ½ where the subsequence is selected by a rule named Auswahlregel.

For example, all the following sequences should have limiting density ½.


x0, x1, x2, . . . , xn, . . .

x0, x2, x4, . . . , x2n, . . .

x1, x4, x7, . . . , x3n+1, . . .

x0, x1, x4, . . . , xn2, . . .


Auswahlregeln Rule

This rule must be defined as a function N => N without any knowledge of x: otherwise

we can simply pick a subsequence of all 0’s. Now suppose we have a countable system of Auswahlregeln and our sequence passes all these tests. In other words, we have accounted for many increasing functions f : N => N and for each of these: lim(n)D((xf(n)), n) = ½ apply. Then we can think the sequence of x’s as being true random.



Some random numbers tables



1000 numbers ranged between 1 and 10^6 - First Half

402731

355255

154123

584556

751892

903723

25205

760565

897040

213990

441393

677505

513222

477979

575753

153682

927392

462215

326299

154300

7113

538794

659094

940086

739828

164042

345218

924871

276389

877453

112344

699253

160452

245731

935663

131555

97529

453863

237865

446778

582918

659913

579583

267023

236868

776264

579878

158904

245705

314755

242246

644370

29066

55101

727869

10294

713041

664440

47941

581262

106445

963934

422305

389524

858656

312743

463920

143510

626996

75941

44558

322874

665393

370611

884583

700677

644060

841021

455419

285948

852984

651258

181086

601009

311878

348734

364168

817977

278107

638529

503425

222469

879648

542972

36080

72762

220882

229886

511935

113689

820528

65764

771115

868606

297175

886905

822500

895128

103563

82383

454158

540267

436079

20744

10822

967651

544366

804219

451866

861483

973190

671425

239898

457862

271177

301561

594764

467726

624052

513146

651157

26436

969007

730913

564395

402834

569136

716368

195069

982773

941901

502563

94654

766071

433906

695429

168568

957576

728272

148213

564964

731952

555924

755902

244080

250173

889551

96071

421689

313064

244314

167124

910755

109523

879437

869055

141570

221321

885127

792362

234977

366080

467802

234934

971705

389998

312822

502094

970228

508835

58146

707416

176557

411654

84764

988424

125248

106819

229198

302001

635963

851857

436919

875096

116619

170512

131842

205997

955607

786409

798168

616861

920186

747493

371193

89771

99944

701803

337795

321898

335517

629836

76756

32623

561271

765490

573580

903601

416345

585323

942989

541129

110975

10323

514031

491242

34238

374835

334675

627431

651885

743818

304934

663833

998674

577361

313107

399227

64378

271688

956407

831546

109956

433681

280359

728984

831667

482074

247478

748427

80184

406763

5193

530165

841365

43432

689502

968111

767587

244725

663521

171253

27660

684025

882715

759566

26623

768598

19665

484674

339912

355548

535437

673437

868915

529508

436557

12535

386206

817160

165361

204483

393140

773686

392295

201825

760986

137368

435325

743865

465678

440884

605953

584710

851201

253401

966675

200924

985525

439910

929779

823883

874702

661457

274790

748398

957099

196545

409026

273619

13936

861360

427853

193261

469031

690423

643138

682794

573367

153231

208994