Wireless Issues Apr08: Difference between revisions
Revision as of 04:04, 23 April 2008
This is an attempt to start a detailed taxonomy of the network-related problems we face as of April 2008. It is a network-centric view (please see the "focus" box below), so the level of detail is higher for network issues and lower or absent for other aspects (for example, suspend and resume, datastore, etc).
DISCLAIMER: Please contribute, so this gets bigger than the perspective of one person.
An issue here may be:
- implementation issues (bugs).
- design issues (poor choices).
In terms of details, expect to find the following:
- Focus: Libertas driver/firmware
- Slightly blurred: School server and middleware issues
- Completely Blurred: Application issues, UI issues, hardware issues.
Applications and Sugar
Deals with: problems that should be fixed/enhanced in activities (Read, Chat, etc) or in the User Interface (mesh view).
Summary: Some of our collaboration issues are related to application bugs. One good example is #6774. We don't seem to have many of the UI issues on the mesh view as we've had in the past (dialogue boxes for entering keys, etc) but the information provided in the home view and the blinking of some circles may be incorrect/incoherent. We need to check this.
Tickets:
- <trac>6774</trac> - Read fails to transfer document when using salut. [this seems specific to Read]
- <trac>5459</trac> - second circle in sugar home view provides false information [apparently a UI issue - check overall UI behaviour]
- what else?
School server
Deals with: applications that are part of or related to the school server.
Summary: As I gather, there are two major problems here: (1) DHCP fails sometimes and we don't know exactly why (we should try to find the root cause) and (2) ejabberd fails frequently. This is clearly a limitation of the software. An attentive reading of the lists and tickets supports the view that the ejabberd server is not stable.
Tickets:
- <trac>4153</trac> - Connect to linklocal instead of school mesh, DHCP failure. [why?]
- <trac>5963</trac> - Laptop fails to reliably connect with school mesh portal [this is closed and it is a duplicate of <trac>4153</trac>].
- <trac>6287</trac> - Associating with one mesh prevents you from successfully with a different one.[bad title, this is another instance of "cannot find the school server"]
- <trac>5908</trac> - Laptop unable to connect to schoolserver jabber server. [The real question of the reliability of ejabberd should be addressed, instead of suggesting that a reactive protocol is not a good choice (even if this is the case)].
Middleware
Deals with: Everything that cannot be fixed in the libertas driver/firmware or at the application belongs here. (NetworkManager, Sugar Presence Service, Telepathy salut, Telepathy gabble, Avahi/MDNS)
Summary: We have two major categories of problems here: (1) a scalability issue, due to the use of MDNS and to the avahi implementation, and (2) bugs in presence/telepathy-salut/gabble that need to be addressed. A decision should be made on whether we are switching to another presence mechanism (like cerebro), because if we are not, we need to keep optimizing/fixing the current one. Take, for example, the <trac>6572</trac> optimization: do we need it or not?
Tickets:
- <trac>5335</trac> - More mdns traffic than expected [anecdotal, could be merged with 5078]
- <trac>5078</trac> - A more mesh-friendly presence protocol for salut [same issue as above]
- <trac>6553</trac> - No XOs in the mesh view and avahi seemed crashed [no build information in the ticket]
- <trac>6572</trac> - Replace key with hash to reduce avahi TXT size [will we invest time in avahi?]
- <trac>6889</trac> - Using wired connection, gabble does not attempt reconnect to jabber server
- <trac>6888</trac> - Laptop connects to presence server, but not seen by other laptops
- <trac>6886</trac> - Laptop stops running Gabble and reverts to Salut
- <trac>6881</trac> - Laptop unable to connect to schoolserver presence service
- <trac>6882</trac> - Laptop was running both salut and gabble at same time
- <trac>6883</trac> - Other laptops aren't displayed in Neighborhood View
- <trac>6884</trac> - Incorrect number of laptops shown in neighborhood view
- <trac>6855</trac> - Need to extend the network scan to look for school server as AP [actually this is asking salut to shut up in infra mode]
- <trac>6750</trac> - Incorrect wireless setting after resume [NM and suspend/resume interaction problem]
- <trac>6872</trac> - 703 and 702 builds - network manager keeps trying to connect to mesh network even after associated w/ AP [at first, not consistent with my observations]
- <trac>6855</trac> - Need to extend the network scan to look for school server as AP [what is requested? - it seems another instance of shut salut down]
- what else?
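To give a feel for what the <trac>6572</trac> optimization buys, here is a minimal sketch comparing the size of a TXT entry that carries a full key with one that carries only a hash of it. The entry names (`key`, `key-hash`), the 400-character placeholder key and the choice of SHA-1 are assumptions made for illustration only; they are not taken from the ticket or from the salut code. The one firm fact used is that a single DNS TXT character-string can hold at most 255 bytes.

```python
import hashlib

# A single DNS TXT character-string is limited to 255 bytes.
MAX_TXT_STRING = 255


def txt_entry(name: str, value: str) -> bytes:
    """Build one 'name=value' TXT entry as it would be sent on the wire."""
    return f"{name}={value}".encode("ascii")


# Placeholder key: the real key format and length are NOT taken from the ticket.
fake_key = "A" * 400

# Hypothetical entry names; SHA-1 is an assumed hash choice.
full = txt_entry("key", fake_key)
hashed = txt_entry("key-hash", hashlib.sha1(fake_key.encode("ascii")).hexdigest())

print(len(full), len(full) <= MAX_TXT_STRING)      # 404 False
print(len(hashed), len(hashed) <= MAX_TXT_STRING)  # 49 True
```

Under these assumptions the hashed entry fits in one TXT string and is roughly an eighth of the size, and that saving is repeated in every mDNS announcement that carries the record.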
Hardware
Deals with: Problems that cannot/should not be fixed by software.
Summary: There are some reports of bad wireless interfaces (poor radio sensitivity). Are they frequent? It doesn't seem so. We cannot expect every device to have the same range. The only report I've seen was not validated (see 4068). My feeling is: low priority.
Tickets:
- <trac>4901</trac> - Ctest cannot see wireless APs [antenna sensitivity reduced, *when compared to other XO*]
- <trac>4068</trac> - Range of communication between 2 XOs is limited only to 20 meters [I tested these two XOs and they behaved as expected]
Libertas driver/firmware
Deals with: issues that need to be fixed/enhanced in the libertas driver/firmware.
Since this is the focal point of this page, a more detailed itemization follows:
Infra mode issues
Deals with: issues that prevent association or correct operation under infrastructure mode.
Summary: We currently have known compatibility issues with pre-N routers (<trac>5527</trac>) and an assortment of association problems (support for cloaked access points, failure to associate on channels other than 1, 6, or 11, and others). There seem to be no clear patterns or major issues.
Tickets:
- <trac>6279</trac> - Cannot see Linksys AP on channel 9
- <trac>2097</trac> - Can't do DHCP at vmware @ 5CC
- <trac>5527</trac> - G1G1 users complain that the XO affects their local network
- <trac>4975</trac> - Association fails
- <trac>6811</trac> - WLAN doesn't reassociate with known access points
- <trac>6117</trac> - Can't connect to Access Point if SSID is not broadcast [same as 6537]
- <trac>6537</trac> - Support for Cloaked Access Points [a duplicate of 6117, but more detailed discussion]
Path discovery issues
Deals with: issues (design and bugs) in the reactive path discovery mechanism.
Summary: Apart from a bug (<trac>6589</trac>) the issue here is the inherent burstiness of a reactive protocol. Optimizations are being studied and include changing the route expiration time (from 10 to 20s) and some other timing tweaks (rreq_delay) and possibly adjustments in the link costs.
Tickets:
- <trac>6589</trac> - xo stops responding to mesh path requests frames
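As a rough illustration of why the route expiration time matters, the sketch below estimates how many path discoveries a set of long-lived flows would trigger per minute. It is a deliberately simplified model, not a description of the firmware: it assumes every active flow rediscovers its route exactly once per expiry, and it ignores retries, rreq_delay and link costs. The function name and the figure of 30 flows are made up for the example.

```python
def discoveries_per_minute(route_lifetime_s: float, active_flows: int) -> float:
    """Path discoveries per minute if each flow rediscovers its route once per expiry.

    Simplified model: no retries, no rreq_delay, no link-cost effects.
    """
    return active_flows * 60.0 / route_lifetime_s


# Compare the current (10 s) and proposed (20 s) route expiration times
# for a hypothetical 30 concurrent flows.
for lifetime in (10, 20):
    print(lifetime, discoveries_per_minute(lifetime, active_flows=30))
# 10 180.0
# 20 90.0
```

In this model, doubling the lifetime halves the number of discovery bursts; the trade-off, which the model does not capture, is that stale routes are kept around for longer.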
Improvements to scalability
Deals with: everything that can improve scalability (by freeing airtime or implementing various adaptive behaviours).
Summary: Air time is precious - control over management frames is very important. Control over probe response retries was introduced in 22.p6 and an Adaptive Contention Window based on the number of neighbours was introduced in 22.p8. Control over beacon frequency was fixed in 22.p8. Current research in this item is focusing on the route discovery mechanism.
Tickets:
- <trac>4927</trac> - [firmware] beacon interval gets reset by other operations [beacon control is fixed in 22.p8]
Active antenna
Deals with: problems specific to the use of the standalone active antennae
Summary:
Tickets:
- programming
Improvements to testability
Deals with: issues on testing capabilities
Summary: Two issues are just waiting for driver patch approval: (1) capturing traffic from an XO is ineffective (because it keeps sending out beacon frames during the capture) and (2) NIC statistics - an important source of debug information - are garbage.
Tickets:
- <trac>6709</trac> - beaconing while monitoring [driver patch waiting for approval]
- <trac>6666</trac> - ethtool -S msh0 returning noise [driver patch waiting for approval]
Interface issues
Deals with: issues where the network design choices conflict with other design choices.
Summary: Interface with the suspend/resume feature. Right now, the activity on this front is the introduction of a multicast filter in the firmware (22.p8), so that an XO will wake up only for certain multicast frames (not for all of them). This needs support in the kernel (the driver should inform the firmware of the multicast addresses), otherwise collaboration will break (<trac>6818</trac>).
Tickets:
- <trac>6818</trac> - Driver does not set link level multicast addresses into firmware when ip address assigned to mesh interface; Mesh view not working with 22.p8/p9
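For reference, the link-level multicast addresses that such a filter has to know about are derived from the IP multicast groups the laptop has joined. The sketch below shows the standard mapping (RFC 1112 for IPv4, RFC 2464 for IPv6); the function name is made up for the example, and the code illustrates the mapping only, not the driver or firmware interface.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Return the link-level multicast MAC address for an IP multicast group."""
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    if addr.version == 4:
        # RFC 1112: 01:00:5e followed by the low 23 bits of the group address.
        low = int(addr) & 0x7FFFFF
        octets = [0x01, 0x00, 0x5E, (low >> 16) & 0xFF, (low >> 8) & 0xFF, low & 0xFF]
    else:
        # RFC 2464: 33:33 followed by the low 32 bits of the group address.
        low = int(addr) & 0xFFFFFFFF
        octets = [0x33, 0x33] + [(low >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ":".join(f"{o:02x}" for o in octets)


# The mDNS groups used by Avahi, which presence over salut depends on.
print(multicast_mac("224.0.0.251"))  # 01:00:5e:00:00:fb
print(multicast_mac("ff02::fb"))     # 33:33:00:00:00:fb
```

If the firmware filter is not told about addresses like these, mDNS frames will not wake the XO, which is consistent with the collaboration breakage described in the summary.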
Miscellanea
Deals with: issues that seem related to driver/firmware but cannot be clearly classified
Summary:
Tickets:
- <trac>6529</trac> - Multicast ping over eth0 (not mesh) sometimes produces duplicate packets [this may provide us with useful information, but it is hardly an issue by itself]
- <trac>6527</trac> - Mesh does not forward multicast packets (most of the time) - [same as above - duplicated ipv6 pings are not an issue in itself]
Tickets in Limbo
I am writing another page with a list of tickets that we should close or update (and possibly bring to this page).