patternMinor
How Does Populating Pastry's Routing Table Work?
Viewed 0 times
pastrypopulatingworkdoeshowroutingtable
Problem
I'm trying to implement the Pastry Distributed Hash Table, but some things are escaping my understanding. I was hoping someone could clarify.
Disclaimer: I'm not a computer science student. I've taken precisely two computer science courses in my life, and neither dealt with anything remotely complex. I've worked with software for years, so I feel I'm up to the implementation task, if I could just wrap my head around the ideas. So I may just be missing something obvious.
I've read the paper that the authors published [1], and I've made some good progress, but I keep getting hung up on this one particular point in how the routing table works:
The paper claims that
A node’s routing table, $R$, is organized into $\lceil \log_{2^b} N\rceil$
rows with $2^b - 1$ entries each. The $2^b - 1$ entries at
row $n$ of the routing table each refer to a node whose nodeId shares
the present node’s nodeId in the first n digits, but whose $n + 1$th
digit has one of the $2^b - 1$ possible values other than the $n + 1$th
digit in the present node’s id.
The $b$ stands for an application-specific variable, usually $4$. Let's use $b=4$, for simplicity's sake. So the above is
A node’s routing table, $R$, is organized into $\lceil \log_{16} N\rceil$ rows
with $15$ entries each. The $15$ entries at
row $n$ of the routing table each refer to a node whose nodeId shares
the present node’s nodeId in the first n digits, but whose $n + 1$th
digit has one of the $2^b - 1$ possible values other than the $n + 1$th
digit in the present node’s id.
I understand that much. Further, $N$ is the number of servers in the cluster. I get that, too.
My question is, if the row an entry is placed into depends on the shared length of the key, why the seemingly random limit on the number of rows? Each nodeId has 32 digits, when $b=4$ (128 bit nodeIds divided into digits of b bits). So what happens when $N$ gets high enough that $\lceil\log_{16} N\rceil > 32$? I realise it would tak
Disclaimer: I'm not a computer science student. I've taken precisely two computer science courses in my life, and neither dealt with anything remotely complex. I've worked with software for years, so I feel I'm up to the implementation task, if I could just wrap my head around the ideas. So I may just be missing something obvious.
I've read the paper that the authors published [1], and I've made some good progress, but I keep getting hung up on this one particular point in how the routing table works:
The paper claims that
A node’s routing table, $R$, is organized into $\lceil \log_{2^b} N\rceil$
rows with $2^b - 1$ entries each. The $2^b - 1$ entries at
row $n$ of the routing table each refer to a node whose nodeId shares
the present node’s nodeId in the first n digits, but whose $n + 1$th
digit has one of the $2^b - 1$ possible values other than the $n + 1$th
digit in the present node’s id.
The $b$ stands for an application-specific variable, usually $4$. Let's use $b=4$, for simplicity's sake. So the above is
A node’s routing table, $R$, is organized into $\lceil \log_{16} N\rceil$ rows
with $15$ entries each. The $15$ entries at
row $n$ of the routing table each refer to a node whose nodeId shares
the present node’s nodeId in the first n digits, but whose $n + 1$th
digit has one of the $2^b - 1$ possible values other than the $n + 1$th
digit in the present node’s id.
I understand that much. Further, $N$ is the number of servers in the cluster. I get that, too.
My question is, if the row an entry is placed into depends on the shared length of the key, why the seemingly random limit on the number of rows? Each nodeId has 32 digits, when $b=4$ (128 bit nodeIds divided into digits of b bits). So what happens when $N$ gets high enough that $\lceil\log_{16} N\rceil > 32$? I realise it would tak
Solution
The idea of a routing table in Pastry (and all structured P2P networks) is to minimize its size, while guaranteeing a quicker routing.
The routing algorithm of Pastry goes as follows:
Step A. A node u searches for an object A by firstly looking it up in its leaf set.
Step B. If it was not available, then the query is forwarded to a known node that shares "a number of prefixes with $A$ that is at least larger than what node u shares with A".
Step C. If such a record is not found, then the query is forwarded to a node in the leaf set that is numerically closest to $A$.
This is why a node $u$ stores addresses of nodes organizes its table as follows:
1.Each record in row $i$ of the routing table of node $u$ is a node identifier that shares $i$ prefix bits with the identifier of $u$.
2.The $(i + 1)^{th}$ bit of the records of a row $i$ is unique and is taken from the set $\{0, …, 2^{b} – 1\}$.
Example in a typical scenario:
if u address is 1111 and object $A$ has identifier 4324: then here is what will happen: (we assume that it is of the base of 4. (i.e. addresses are from [1-4][1-4][1-4][1-4]).
Node $u$ shares 0 prefix with object $A$. Therefore, it looks in row 0.
According to rule 2 above, node $u$ stores addresses of nodes 1XXX, 2XXX, 3XXX, 4XXX, where X is a "dont-care" value. The closest among these nodes to $A$ is 4XXX. - Let's say this 4XXX is actually 4013. Then $u$ forwards to $u _1$ with address 4013. Now you are going to repeat the same thing again at node $u _1$ with address 4013.
To make it simpler, here is again an example of how it will go in 4013. $u _1$ will first look for the size common prefix between 4013 and 4324 which is 1. So it goes to row 1, which contain values such that 41XX, 42XX, 43XX, 44XX. The closes among them to $A$ is 43XX. - if this was 4331 then it will be forwards toward it.
The maximum number of hops here in is 4 hops (XXXX) ! in Pastry terms, it is $log_{2^b}$. So it is reduced as $b$ increases. But the size of the rows which are $2^b$ will increases ! -- so the authors said that $b$ = 4 is a good balance !
Practical scenarios are usually not typical as that. There may be situations in which there are not many nodes in the network. this is why we follow step C above. - However, what you need to guarantee to make this algorithm correct is that each node be connected to the closest two nodes to it (in term of identifiers). This will form a ring of ordered nodes [e.g. 1->3->4->9->10->11->1]
The routing algorithm of Pastry goes as follows:
Step A. A node u searches for an object A by firstly looking it up in its leaf set.
Step B. If it was not available, then the query is forwarded to a known node that shares "a number of prefixes with $A$ that is at least larger than what node u shares with A".
Step C. If such a record is not found, then the query is forwarded to a node in the leaf set that is numerically closest to $A$.
This is why a node $u$ stores addresses of nodes organizes its table as follows:
1.Each record in row $i$ of the routing table of node $u$ is a node identifier that shares $i$ prefix bits with the identifier of $u$.
2.The $(i + 1)^{th}$ bit of the records of a row $i$ is unique and is taken from the set $\{0, …, 2^{b} – 1\}$.
Example in a typical scenario:
if u address is 1111 and object $A$ has identifier 4324: then here is what will happen: (we assume that it is of the base of 4. (i.e. addresses are from [1-4][1-4][1-4][1-4]).
Node $u$ shares 0 prefix with object $A$. Therefore, it looks in row 0.
According to rule 2 above, node $u$ stores addresses of nodes 1XXX, 2XXX, 3XXX, 4XXX, where X is a "dont-care" value. The closest among these nodes to $A$ is 4XXX. - Let's say this 4XXX is actually 4013. Then $u$ forwards to $u _1$ with address 4013. Now you are going to repeat the same thing again at node $u _1$ with address 4013.
To make it simpler, here is again an example of how it will go in 4013. $u _1$ will first look for the size common prefix between 4013 and 4324 which is 1. So it goes to row 1, which contain values such that 41XX, 42XX, 43XX, 44XX. The closes among them to $A$ is 43XX. - if this was 4331 then it will be forwards toward it.
The maximum number of hops here in is 4 hops (XXXX) ! in Pastry terms, it is $log_{2^b}$. So it is reduced as $b$ increases. But the size of the rows which are $2^b$ will increases ! -- so the authors said that $b$ = 4 is a good balance !
Practical scenarios are usually not typical as that. There may be situations in which there are not many nodes in the network. this is why we follow step C above. - However, what you need to guarantee to make this algorithm correct is that each node be connected to the closest two nodes to it (in term of identifiers). This will form a ring of ordered nodes [e.g. 1->3->4->9->10->11->1]
Context
StackExchange Computer Science Q#3138, answer score: 5
Revisions (0)
No revisions yet.