Talk:On Presence updates/User Profiles/Collaboration: Difference between revisions

From OLPC
Jump to navigation Jump to search
(Scalability)
 
(6 intermediate revisions by 2 users not shown)
Line 6: Line 6:


I appreciate that on a mesh network, every node can be a router, and the routing protocol may require that every router know the source and/or destination. However, that data will be encrypted, and so it will be impossible to identify it without traffic analysis. Requiring a malicious eavesdropper to perform traffic analysis of encrypted streams seems a reasonable bar to set for privacy. [[User:Bemasc|Ben]] 11:40, 29 April 2008 (EDT)
I appreciate that on a mesh network, every node can be a router, and the routing protocol may require that every router know the source and/or destination. However, that data will be encrypted, and so it will be impossible to identify it without traffic analysis. Requiring a malicious eavesdropper to perform traffic analysis of encrypted streams seems a reasonable bar to set for privacy. [[User:Bemasc|Ben]] 11:40, 29 April 2008 (EDT)


:Actually I should 've made it more clear that the activities that are shared this way are only activities the user shares publicly in the mesh network. These are shared activities, '''not''' activities that the user runs privately, or shared privately with other users. Privately shared activities are shared on a per-invitation basis. The invitation mechanism is also in place, but I have not described it here.
--[[User:Ypod|Ypod]] 11:53, 29 April 2008 (EDT)


== Scalability ==
== Scalability ==


You note that the broadcast approach provides n^2 scalability, which improves to near-linear in a dense mesh. However, as you have noted, meshes have fundamental scaling limits. In particular, I have never seen a proposal for mesh routing that does not require each node to maintain memory at least linear in the number of nodes. Thus, we may expect that the internet will not be using a mesh routing approach any time soon. On a wired network, the scaling behavior is again n^2, and wired networks are where the largest n is likely to occur.
You note that the broadcast approach provides n^2 scalability, which improves to near-linear in a dense mesh. However, as you have noted, meshes have fundamental scaling limits. In particular, I have never seen a proposal for mesh routing that does not require each node to maintain memory at least linear in the number of nodes. Thus, we may expect that the internet will not be using a mesh routing approach any time soon. On a wired network, the scaling behavior is again n^2, and wired networks are where the largest n is likely to occur.

:You lost me here. "broadcast approach provides n^2 scalability": which "broadcast" are you referring to now? There is sometimes a confusion between layer-2 and layer-3 broadcast. The proposed data transport protocol scales linearly in both a sparse and dense network.


We can do better than n^2 using server-based caching, like Daf's [[Gadget]]. Support for this should not be an afterthought. [[User:Bemasc|Ben]] 11:49, 29 April 2008 (EDT)
We can do better than n^2 using server-based caching, like Daf's [[Gadget]]. Support for this should not be an afterthought. [[User:Bemasc|Ben]] 11:49, 29 April 2008 (EDT)

:"mesh. However, as you have noted, meshes have fundamental scaling limits.": True. The capacity of a mesh network (referred to as "ad-hoc" network in the literature) scales inversely to N^1/2. However, all this discussion is done within this margin. In other words, the mesh network can offer some specific capacity affected by the number of nodes in the network. This document attempts to discuss the best that can be done within the limits of the capacity available to us, before we resort to other solutions like using an XMPP server, use multiple channels to spread network traffic, etc.--[[User:Ypod|Ypod]] 12:28, 29 April 2008 (EDT)
:: OK. I was just responding to "Scalability MUST be at the core of any proposed presence and collaboration mechanisms". Scaling to hundreds of thousands of users is not going to be achieved by the mechanisms you've described so far. We need to design mesh presence systems that scale to as large as possible, but we also need to plan for those mesh presence systems to integrate seamlessly and efficiently with presence via internet, perhaps using XMPP servers. [[User:Bemasc|Ben]] 21:40, 1 May 2008 (EDT)

== Number of protocols ==

You allocate 2 bytes for "activity type". This seems far from sufficient. Since different activities are not communicating with one another, they must have different protocol numbers. However, since they are developed in a distributed fashion, there is no central authority that can hand out protocol numbers. Moreover, each version of an activity may have a new protocol number, if the protocol changes with each version, or if Develop assumes that the protocol has changed by default (for safety). Once there are 256 activity versions, the probability of collisions in 2 bytes would become large. Collisions are pretty bad, as they imply that you will be offered the ability to join some shared activity, but when you try (with the wrong client) it will always fail, and perhaps crash both sides.

I think you need a lot more than 2 bytes. I recommend a cryptographic-style hash for protocol identification. [[User:Bemasc|Ben]] 11:54, 29 April 2008 (EDT)

:Again, this is a good point. Indeed, the "activity type" size should be increased according to your line of thought, but before doing so, I 'd like to get consensus that this _is_ actually the way to do collaboration. This is not merely the first bug, but I consider this an implementation issue.
--[[User:Ypod|Ypod]] 12:03, 29 April 2008 (EDT)

Latest revision as of 01:40, 2 May 2008

Privacy

I would argue strongly that the list of activities in which I am participating is private. You wouldn't want your IRC client telling me what other applications you're running. The Sugar design calls for implementation of sharing Activities that are only visible to a specific group, where each group corresponds to a single unique keypair.

One possible way of implementing this is to set up a distinct, parallel presence system for each group. Offhand, I cannot think of any acceptably efficient way of implementing this design, especially since every non-public activity generates a new group. Instead, we may encrypt each entry in the list of shared activities with the private key of the group that has permission to join it. With this system, I can tell whether two people are in the same activity, but I cannot tell what activity it is. To improve privacy further, we may pad out each Activity entry to a fixed length with random data before encryption, so that two people joined to the same activity show different entries in their presence data.

I appreciate that on a mesh network, every node can be a router, and the routing protocol may require that every router know the source and/or destination. However, that data will be encrypted, and so it will be impossible to identify it without traffic analysis. Requiring a malicious eavesdropper to perform traffic analysis of encrypted streams seems a reasonable bar to set for privacy. Ben 11:40, 29 April 2008 (EDT)


Actually I should 've made it more clear that the activities that are shared this way are only activities the user shares publicly in the mesh network. These are shared activities, not activities that the user runs privately, or shared privately with other users. Privately shared activities are shared on a per-invitation basis. The invitation mechanism is also in place, but I have not described it here.

--Ypod 11:53, 29 April 2008 (EDT)

Scalability

You note that the broadcast approach provides n^2 scalability, which improves to near-linear in a dense mesh. However, as you have noted, meshes have fundamental scaling limits. In particular, I have never seen a proposal for mesh routing that does not require each node to maintain memory at least linear in the number of nodes. Thus, we may expect that the internet will not be using a mesh routing approach any time soon. On a wired network, the scaling behavior is again n^2, and wired networks are where the largest n is likely to occur.

You lost me here. "broadcast approach provides n^2 scalability": which "broadcast" are you referring to now? There is sometimes a confusion between layer-2 and layer-3 broadcast. The proposed data transport protocol scales linearly in both a sparse and dense network.

We can do better than n^2 using server-based caching, like Daf's Gadget. Support for this should not be an afterthought. Ben 11:49, 29 April 2008 (EDT)

"mesh. However, as you have noted, meshes have fundamental scaling limits.": True. The capacity of a mesh network (referred to as "ad-hoc" network in the literature) scales inversely to N^1/2. However, all this discussion is done within this margin. In other words, the mesh network can offer some specific capacity affected by the number of nodes in the network. This document attempts to discuss the best that can be done within the limits of the capacity available to us, before we resort to other solutions like using an XMPP server, use multiple channels to spread network traffic, etc.--Ypod 12:28, 29 April 2008 (EDT)
OK. I was just responding to "Scalability MUST be at the core of any proposed presence and collaboration mechanisms". Scaling to hundreds of thousands of users is not going to be achieved by the mechanisms you've described so far. We need to design mesh presence systems that scale to as large as possible, but we also need to plan for those mesh presence systems to integrate seamlessly and efficiently with presence via internet, perhaps using XMPP servers. Ben 21:40, 1 May 2008 (EDT)

Number of protocols

You allocate 2 bytes for "activity type". This seems far from sufficient. Since different activities are not communicating with one another, they must have different protocol numbers. However, since they are developed in a distributed fashion, there is no central authority that can hand out protocol numbers. Moreover, each version of an activity may have a new protocol number, if the protocol changes with each version, or if Develop assumes that the protocol has changed by default (for safety). Once there are 256 activity versions, the probability of collisions in 2 bytes would become large. Collisions are pretty bad, as they imply that you will be offered the ability to join some shared activity, but when you try (with the wrong client) it will always fail, and perhaps crash both sides.

I think you need a lot more than 2 bytes. I recommend a cryptographic-style hash for protocol identification. Ben 11:54, 29 April 2008 (EDT)

Again, this is a good point. Indeed, the "activity type" size should be increased according to your line of thought, but before doing so, I 'd like to get consensus that this _is_ actually the way to do collaboration. This is not merely the first bug, but I consider this an implementation issue.

--Ypod 12:03, 29 April 2008 (EDT)