Tuesday, 16 October 2018

An intuitive explanation of the chain rule

The chain rule allows us to take the derivative of a composition of two functions.  It has the form
$$(f\circ g)'(x)=f'(g(x))g'(x)$$

Intuitively, why should it be a product of the two functions' individual derivatives?

We can answer that by looking at Taylor expansions.
If we wiggle \(x\) in \(g(x)\) we get
$$g(x+\delta)\approx g(x)+g'(x)\delta$$

That means if we wiggle \(x\) by \(\delta\), the output of \(g\) gets wiggled by \(g'(x)\delta\).

A function \(f(y)\) would change in a very similar way if we wiggle \(y\):
$$f(y+\delta') \approx f(y) + f'(y)\delta'$$

We replace \(\delta'\) by \(g'(x)\delta\) because that is how much \(y\) is perturbed if we perturb \(x\) by \(\delta\).

Overall, we would get
$$f(g(x+\delta)) \approx f(g(x))+f'(g(x))g'(x)\delta$$
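The approximation above is easy to sanity-check numerically.  Below is a quick sketch with arbitrary example functions \(f\) and \(g\) (my choices for illustration, not from the derivation above): the finite-difference derivative of the composition should match \(f'(g(x))g'(x)\).

```python
# Numerical sanity check of the chain rule via a small perturbation.
# f and g are arbitrary smooth example functions chosen for illustration.
import math

def g(x):
    return x * x + 1.0   # g'(x) = 2x

def f(y):
    return math.sin(y)   # f'(y) = cos(y)

def compose(x):
    return f(g(x))

x = 0.7
delta = 1e-6

# Finite-difference derivative of the composition at x.
numeric = (compose(x + delta) - compose(x)) / delta

# Chain rule prediction: f'(g(x)) * g'(x).
analytic = math.cos(g(x)) * (2.0 * x)

assert abs(numeric - analytic) < 1e-4
```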

Thursday, 28 December 2017

Permutations and n-dimensional rotations

The question that motivates today's post is, how do we define rotations in n dimensions?

Let's write the standard Euclidean basis as \((e_1, e_2, \ldots)\).

In two dimensions, a sequence of counter-clockwise quarter-turns takes the standard basis through the following sequence:

$$(e_1, e_2), (e_2, -e_1), (-e_1, -e_2), (-e_2, e_1)$$

So each quarter-turn swaps the two basis vectors and negates the second one.

A similar situation occurs for 3-dimensional rotations about each of the three coordinate axes.

This means it makes sense to think of (counter-clockwise) rotations as operations that preserve an alternating tensor product.  WLOG, we can take that alternating product to be the determinant of all the basis vectors.

But a rotation about the origin also has the property that it preserves distances; that is, it is an isometry:

$$\| Rx \| = \|x\|$$

We can quickly derive the following properties:
If \(x \cdot y = 0\), then \(Rx \cdot Ry = 0\) (by polarization, since norms are preserved).
\( \| Re_i\| = 1 \Rightarrow [R^tR]_{(i,i)} = 1 \)

This means that \(R^tR\) must be the identity matrix.  Hence \(\det(R)^2 = 1\), so the determinant of \(R\) is \(\pm1\).  Restricting it to +1 makes the isometry a proper rotation.  If we allow -1, it is not hard to see that this introduces reflections into \(R\).

This all makes sense.  If we "locally rotate" two axes in n dimensions (that is, fix the other n-2 axes and rotate the remaining two in the ordinary sense), we preserve the sign of the determinant.  This means our intuitive sense of 2-dimensional rotation extends successfully to n dimensions.
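The "local rotation" of two axes described above is what numerical linear algebra calls a Givens rotation.  Here is a minimal pure-Python sketch (function names and the choice of axes/angle are mine) that builds one in 4 dimensions and verifies both properties: \(R^tR = I\) and \(\det(R) = +1\).

```python
# A "local rotation" of axes i and j in n dimensions (a Givens rotation):
# all other n-2 axes are fixed.  We check that R^t R = I (isometry) and
# det(R) = +1 (a proper rotation, no reflection).
import math

def givens(n, i, j, theta):
    # Identity everywhere except the 2x2 rotation block on axes i and j.
    R = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    R[i][i] = math.cos(theta); R[i][j] = -math.sin(theta)
    R[j][i] = math.sin(theta); R[j][j] = math.cos(theta)
    return R

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def det(A):
    # Laplace expansion along the first row (fine for small n).
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0.0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in A[1:]]
        total += (-1) ** c * A[0][c] * det(minor)
    return total

R = givens(4, 1, 3, 0.8)   # rotate axes 1 and 3, fix axes 0 and 2
RtR = matmul(transpose(R), R)
assert all(abs(RtR[r][c] - (1.0 if r == c else 0.0)) < 1e-12
           for r in range(4) for c in range(4))
assert abs(det(R) - 1.0) < 1e-12
```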

Monday, 13 November 2017

Two failed modifications of Paxos: towards a better understanding of the protocol

Two failed attempts at modifying Paxos.

The Paxos algorithm solves the following problem:
There are n processes.  Each process starts with a random value of 0 or 1.
Processes communicate via messages.  Messages might be lost or arbitrarily delayed without bound.
At some time in the future, all processes need to decide on one of the two values.  There must be a way to find out when the decision has been made and what value it is.  Furthermore, to prevent trivial solutions, the decided value must be one of the starting values (i.e., if all processes start with 0, they cannot ever decide on 1).  Once a decision is reached, it must never change.

The following algorithm solves the problem assuming that a majority of servers can communicate with each other.  If they cannot, a decision may simply never be reached (a liveness failure), but there will never be a case where a decision is reached and later changed (safety).

Each machine keeps a local counter and is assigned a unique server_id offline before everything starts.
Each server tries to make proposals:
1. Send a "prepare-to-propose" message to all machines (or a majority) containing its ballot number, which is the pair (counter, server_id), then increase its counter.
2. When it receives replies from a majority, do the following:
Each reply should contain a (max_ballot_accepted, accepted_value) pair, but the pair might be null.  If every reply in the majority is null, then send a propose message containing the ballot (counter, server_id) with any value.  However, if not every reply is null, then propose the accepted_value associated with the largest max_ballot_accepted.

Each server should also listen for prepare-to-propose messages and proposals.  It keeps track of max_ballot_accepted, max_ballot_promised, and accepted_value.
1. Upon receiving prepare-to-propose: if the ballot is smaller than max_ballot_promised, ignore it; if it is greater, update max_ballot_promised and reply with (max_ballot_accepted, accepted_value).
2. Upon receiving a proposal: check whether its ballot is greater than or equal to max_ballot_promised; if so, accept the proposal and update max_ballot_accepted and accepted_value.
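The acceptor rules above fit in a few lines.  Here is a minimal sketch, assuming ballots are (counter, server_id) tuples compared lexicographically; class and method names are illustrative, not from any particular implementation.

```python
# A minimal sketch of the acceptor side of Paxos.  Ballots are
# (counter, server_id) tuples; Python compares them lexicographically,
# which gives a total order with server_id as the tiebreaker.

class Acceptor:
    def __init__(self):
        self.max_ballot_promised = None   # largest ballot we promised
        self.max_ballot_accepted = None   # largest ballot we accepted
        self.accepted_value = None

    def on_prepare(self, ballot):
        """Rule 1: promise not to accept anything below `ballot`."""
        if self.max_ballot_promised is not None and ballot <= self.max_ballot_promised:
            return None  # ignore stale prepare-to-propose
        self.max_ballot_promised = ballot
        return (self.max_ballot_accepted, self.accepted_value)

    def on_propose(self, ballot, value):
        """Rule 2: accept iff the ballot is at least what we promised."""
        if self.max_ballot_promised is not None and ballot < self.max_ballot_promised:
            return False
        self.max_ballot_promised = ballot
        self.max_ballot_accepted = ballot
        self.accepted_value = value
        return True

a = Acceptor()
assert a.on_prepare((1, 1)) == (None, None)    # fresh acceptor: nothing accepted yet
assert a.on_propose((1, 1), "v1")              # promised ballot is accepted
assert a.on_prepare((0, 2)) is None            # lower ballot is ignored
assert a.on_prepare((2, 2)) == ((1, 1), "v1")  # higher ballot learns the accepted value
assert not a.on_propose((1, 1), "v1")          # old proposal now rejected
```

Note how the prepare-to-propose barrier works: once the acceptor promises ballot (2, 2), the earlier ballot (1, 1) can no longer be accepted.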

When is consensus reached?  Consensus is reached when a majority of acceptors have accepted the same ballot.  We can find out when that happens if, for example, every process sends out a status update whenever it accepts a proposal.  Of course, a status update might be lost, and there is no way of knowing whether it was lost.  But we are guaranteed that the consensus value will never change once it is reached (safety).

Some intuition on why it works:
1. The prepare-to-propose serves as a barrier to ensure that any accepted ballots will be "known" by later ballots.
2. By proposing only the value from the largest accepted ballot in a majority, we ensure that newly proposed values will equal the consensus value once consensus has been reached.

Once some majority M has accepted a ballot, any proposer with a higher ballot number must not have sent its "prepare-to-propose" message to all of M until after the acceptance: otherwise, at least one member of M would have promised the higher ballot, and that promise would have blocked this ballot from being accepted by the whole majority.
Lower ballot numbers suffer the same issue.
Now, after receiving replies to its prepare-to-propose messages from a majority M', the proposer knows M' has at least one member in common with M.  We can do a strong induction on the ballot number.  If the current ballot is n, assume all proposed ballots between the original accepted ballot and n-1 (inclusive) propose the same consensus value.  The common member of M' and M would have replied with a ballot number at least as great as the original accepted ballot.  Since the proposer must propose the value corresponding to the largest ballot number it sees, it must also propose the consensus value.

To help us understand better, let's ask whether we can modify the protocol in two ways.
1. Instead of sending out a "prepare-to-propose" message, which extracts a promise not to accept lower-numbered ballots, can we simply query for previously accepted values without asking for a promise?  We would block out lower ballots at the acceptance step instead.
The issue is that concurrent higher-numbered ballots might override the consensus value.
We could reach a state like the following (each row represents one process, with its accepted ballot and value):
1, v1
1, v1
1, v1
2, v2
Consider the following sequence of messages (p1 stands for process one).
p1.PrepareToPropose(ballot 1).SendTo(1, 2, 3)
p2.PrepareToPropose(ballot 2).SendTo(1, 2, 3)
p1.Received.PrepareResponse(null, null, null).Propose(1, v1).SendTo(1, 2, 3)
(p1, p2, p3).Accept(ballot 1, v1)
p2.Received.PrepareResponse(null, null, null).Propose(2, v2).SendTo(4) // messages to other members of the majority are lost
p4.Accept(ballot 2, v2)
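The trace above can be replayed as code.  Below is a sketch of the broken modification: acceptors keep no promise state and accept any ballot at least as large as their last accepted one.  The class and method names are mine, for illustration.

```python
# A sketch of the broken modification 1: acceptors have no promise state
# and accept any ballot >= their last accepted ballot.  Replaying the
# message trace above shows a majority accepting ballot 1 while p4
# accepts ballot 2 with a different value.

class NoPromiseAcceptor:
    def __init__(self):
        self.accepted_ballot = 0
        self.accepted_value = None

    def on_propose(self, ballot, value):
        if ballot >= self.accepted_ballot:  # no promise step to block this
            self.accepted_ballot = ballot
            self.accepted_value = value

procs = {i: NoPromiseAcceptor() for i in (1, 2, 3, 4, 5)}

# p1's queries all returned null, so it proposes (ballot 1, v1) to {1, 2, 3}.
for i in (1, 2, 3):
    procs[i].on_propose(1, "v1")

# p2's queries (sent concurrently) also returned null; its proposal
# (ballot 2, v2) reaches only p4, and nothing stops p4 from accepting it.
procs[4].on_propose(2, "v2")

assert [procs[i].accepted_value for i in (1, 2, 3)] == ["v1"] * 3
assert procs[4].accepted_value == "v2"  # a higher ballot can later override v1
```

With promises, p1's proposal would have been blocked at whichever acceptors had already promised ballot 2, which is exactly what the barrier is for.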

This happens because p2 does not know that ballot 1 is already in progress.

2. Is it possible to declare that consensus is reached as soon as a majority has picked the same value?  (Right now, we require a majority to accept the same ballot.)
No.  After a majority picks the same value, the consensus value might still change.
We again show each process's accepted ballot and accepted value, but now also the internal "promise" counter in parentheses.
To simplify, we only show the global state transitions; the reader can fill in the delivered and lost messages.

1: v1 (1)
1: v1 (1)
2: v2 (2)
2: v2 (2)
null (2) (proposal is lost, but prepare was received)

1: v1 (3) // proposal three only reaches the fifth process
1: v1 (3)
2: v2 (2)
2: v2 (2)
3: v1 (3) // here, we are wrong to think that consensus has been reached

A proposer now prepares and gets responses from p1, p2, p3.  Since the largest accepted ballot among those responses is ballot 2, it will propose v2.

4: v2 (4)
4: v2 (4)
4: v2 (4)
2: v2 (2)
3: v1 (3) // consensus value changed

Sunday, 18 September 2016

Reducible state space Markov chains

While reviewing a paper last night, I started thinking about reducible state space Markov chains.  Most undergraduate probability students are fed theorems about irreducible Markov chains.  To summarize these theorems:

  • If two states communicate, then they are either both recurrent or both transient.
  • If a MC is irreducible and all states are positive recurrent, then there exists a unique stationary distribution.
  • If a MC is irreducible, aperiodic, and has a stationary distribution, then it converges to that stationary distribution.

We can ask a set of questions about reducible Markov chains to test our intuition for the subject.
  • Let's set up a trivial 4-state Markov chain where the first two and last two states form separate communicating classes.
  • How many stationary distributions exist?
  • Do limiting distributions exist?
  • Does the limit distribution depend on the initial distribution?  How many possibilities are there?

Let's try to answer them.
  • There are an infinite number of stationary distributions.  If you take the stationary distribution of the first class and the stationary distribution of the second class, then any convex combination (weights summing to 1) of these two is a stationary distribution of the full 4-state Markov chain.
  • Limiting distributions do exist (assuming each class is aperiodic, so that the individual limits exist).  Let's think about the intuition here.  If the total probability mass on the first two states is m_0 and the rest is 1-m_0, then starting from this initial distribution, each iteration of the chain "stirs" this vector, but only within the respective classes.  So the first two entries of the vector converge to the first class's limit distribution, and likewise the last two.  This means the overall limit distribution is a mixture of the individual limit distributions weighted by (m_0, 1-m_0).  So yes, the limit depends on the initial distribution, but only through m_0: there is a one-parameter family of possible limits.
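The mixture claim is easy to verify numerically.  Below is a sketch of the 4-state chain with a block-diagonal transition matrix; the transition probabilities are arbitrary illustrative choices, and the per-block stationary distributions are solved by hand in the comments.

```python
# A 4-state reducible chain: states {0,1} and {2,3} are separate
# communicating classes, so the transition matrix is block-diagonal.
# The probabilities below are arbitrary illustrative choices.

P = [
    [0.3, 0.7, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.2, 0.8],
]

def step(dist, P):
    # One iteration of the chain: dist <- dist * P.
    return [sum(dist[i] * P[i][j] for i in range(4)) for j in range(4)]

# Start with mass m_0 = 0.25 on the first class and 0.75 on the second.
dist = [0.25, 0.0, 0.0, 0.75]
for _ in range(200):
    dist = step(dist, P)

# Per-block stationary distributions, from pi = pi * P_block:
# block 1: (6/13, 7/13); block 2: (2/7, 5/7).
# The limit should be the mixture weighted by (m_0, 1 - m_0).
expected = [0.25 * 6 / 13, 0.25 * 7 / 13, 0.75 * 2 / 7, 0.75 * 5 / 7]
assert all(abs(a - b) < 1e-9 for a, b in zip(dist, expected))
```

Changing the initial split (say, m_0 = 0.5) changes the limit, confirming that the limit depends on the initial distribution only through m_0.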

Wednesday, 19 December 2012

Course Catalog: Classical Composition

Description: This is a course on classical music composition that will teach the principles of melody, harmony, and orchestration.  However, this course does not teach you to form your musical ideas.  That comes from your talent.  If you don't have it, you can become an orchestrator.  The distinguished alumni of this course usually break with the principles we teach them and write great music nevertheless.  (Shostakovich, Debussy)

Course Credit: None

Pre-Requisites: No technical pre-requisite.  However, one should try to live a life of calamity, solitude, and mental instability through the achievement of at least one of the following:

  1. Unfulfilling / unrequited love for a string of women outside of your social class, for example members of the aristocracy or royalty.  See Beethoven.
  2. If you are homosexual, the unfortunate need to suppress your orientation due to the social pressures of your society.  See Tchaikovsky.
  3. Have a high infant mortality rate.  See Bach.
  4. Loss of either vision or hearing.  (See Beethoven or Rodrigo)
  5. Live under a repressive Communist dictatorship.  (Shostakovich)
  6. Lastly, having your wife engage in an affair with one of your best friends.  (Schumann)

Tuesday, 18 December 2012

AI-assisted political science

In the wake of the Sandy Hook shooting, there has been much debate on the radio about gun policy in the United States.  Gun advocates stress that banning ownership of guns will not take them out of the hands of criminals or decrease violent crime.  Both sides of the debate tote different statistics to support their positions.  During the presidential debates, there were numerous occasions where numbers were presented and their truthfulness contested by Obama and Romney.  One instance that comes to mind was the dispute over the factual accuracy of Romney's tax policy for the middle class.

I think this is one area where information retrieval / data science / weak AI (which doesn't mean crappy AI, but just distinguishes it from the well-defined Strong AI term) can really contribute to the public sphere.  Imagine a Google or Wolfram Alpha type program that parses the content of a debate and automatically checks it for factual accuracy: gun crime rates, economic data, demographics, tax rates, tax codes, etc., perhaps via an overlay on the screen.  This would shift public discussion from "what are the facts" to "what is the right policy".  It would also prevent cheating and misrepresentation of facts.

To push this idea further, imagine an AI-assisted data retrieval / analytics program for the common masses.  To take an example, I wanted to draw a simple correlation graph of gun ownership rates by country vs. homicide rates by country.  Just to answer this one simple question, I had to manually enter and merge two separate tables found on Wikipedia, filter the data for missing entries using MATLAB, and produce a plot.  This set of computer skills will, for the near future, remain beyond the common knowledge of the average citizen, yet citizens need these types of questions answered concretely in order to vote for the right candidate or policy for this country.  How about a more advanced search / analytics engine that, given a question in natural language, e.g. "Has the crime rate in NYC decreased since 1990?", would automatically retrieve and clean the relevant data and perform the relevant statistical analysis (regression, classification, etc.)?  Currently Google / Bing can answer factual questions, but they are still one step away from answering analytical questions.
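The merge-filter-analyze workflow described above is mechanically simple; the point is that it shouldn't require programming skill at all.  Here is a sketch of those steps in plain Python, using made-up placeholder numbers (not real statistics) in place of the Wikipedia tables:

```python
# A sketch of the merge / filter / analyze workflow: join two country-level
# tables, drop rows with missing entries, and compute a correlation.
# All numbers below are made-up placeholders, not real statistics.

guns_per_100 = {"A": 90.0, "B": 30.0, "C": 6.0, "D": 15.0, "E": 45.0}
homicide_rate = {"A": 4.7, "B": 1.4, "C": 0.3, "E": 2.1}  # "D" is missing

# Merge the two tables on country and drop rows with missing entries.
countries = sorted(set(guns_per_100) & set(homicide_rate))
xs = [guns_per_100[c] for c in countries]
ys = [homicide_rate[c] for c in countries]

# Pearson correlation coefficient, computed by hand.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
vx = sum((x - mx) ** 2 for x in xs)
vy = sum((y - my) ** 2 for y in ys)
r = cov / (vx * vy) ** 0.5

assert len(countries) == 4   # the row with a missing entry was dropped
assert -1.0 <= r <= 1.0
```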

Sunday, 25 November 2012

Richard Dawkins on the evolutionary basis of morality

I was just watching a Richard Dawkins lecture at Randolph-Macon Women's College on YouTube.  He was asked a question about the basis of atheistic morality.  Dawkins proposed that our moral sense comes from the fact that our pre-historic ancestors lived mostly with next of kin, hence 1) altruism will benefit the preservation of similar genes, and 2) there's a high probability of being in long-term contact with these individuals, who can then reciprocate the good will.
I would propose that altruism serves to improve upon another evolutionary survival objective, the preservation of the species.  When pre-historic human beings lived in tribes, there was a high risk of destruction from other animals, diseases, and natural elements.  Hence at some point, the competitive gain from extinguishing a rival member of the same species is outweighed by the risk that the size of the species might fall below a critical threshold and threaten extinction.  For example, you don't want to kill Adam for stealing your girlfriend, because he and you can work together when the tigers attack tomorrow night.
If we think about tribes and animal social organizations as entire units, then each unit evolves to maximize its survival as a whole.  If a society randomly generated a moral sense that included no altruism, in an "attempt" to maximize individual benefits, that group as a whole (and consequently each individual) might have a lower probability of survival, because the survival benefits derived from the group would be lost (i.e. strength in numbers).  In other words, when the survival of the individual is linked to a certain critical mass and critical health of his immediate social unit, the species will likely evolve some moral sense under the pressure of natural selection, even though this moral sense appears, on a first-order examination, to be anti-beneficial to the individual.