Projects/xomail

From OLPC
Revision as of 23:47, 30 March 2008 by Shikhar (talk | contribs) (Notes on other features)

This page is geared towards Summer of Code 2008 work on an email activity. I am convinced it is possible to develop a functional and usable email client that implements a core feature set in the 10-week period that Google kindly sponsors.

I welcome your comments and feedback on the Talk page.

Introduction

Currently there is a Gmail activity but no real email client that can be used in Sugar. The possibility of accessing/composing emails offline does not exist. An email client with mesh integration like direct sending to mesh buddies and other fancy features would be great, but the basic groundwork of a usable email activity is needed.

Collaboration tools are a very important part of the OLPC software bundle and an activity which brings email to the XO desktop and ties in with the environment would be a very useful addition.

Background

  • Dead Email project and related discussion Talk:Email
  • Notes from a former OLPC intern examining different email clients and recommendations. [1]
  • Tinymail has an email client that could do with better Sugar integration

Use cases

In general the use cases of an email client for OLPC users should be quite clear :-)

  • collaborating on projects
  • sending/receiving attachments
  • pen pals
  • participating in discussions on mailing lists

Deliverables

Broadly,

  • A lightweight, functional email client with a child-friendly GUI
  • A daemon should be developed for sending of unsent messages and receiving of new email. [The rationale behind this is we can't assume the child will open the email activity when internet access is available.]

Other requisites:

  • POP, SMTP, and IMAP support, including over Transport Layer Security
  • Should support ASCII and MIME-encoded Unicode, with sane encoding selection during composition
  • Easy configuration on first run and later
  • Search should be central and helpful
  • Should have at least a basic address book
  • Should be able to handle large volumes of email and generally perform well

Approach

Email organization

I would like to center email organization around tags and not folders. The idea behind using tags and not folders is well-articulated here

The Journal already uses tags, and for this activity I would like to extend them to have a visual representation as a GTK widget. They should be easily managed visually, for example dragging-and-dropping a tag onto a message should apply it.

Email sending/receiving, MIME-parsing and message construction

  • Python's email module can be used for MIME parsing of incoming email and for message construction.
  • The Python libraries smtplib, poplib, and imaplib can be used for sending and receiving email.

These libraries are RFC-compliant and were reliable in my experimentation.
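
As a minimal sketch of the message construction and MIME parsing mentioned above, the following uses only the standard email package. The addresses, the subject, and the helper names build_message and extract_text are made up for illustration, and the modern EmailMessage API is an assumption rather than a decision made in this proposal.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Construct a text message; non-ASCII content is MIME-encoded as needed."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)  # text/plain; charset is picked from the content
    return msg


def extract_text(raw_bytes):
    """Parse raw message bytes and return (subject, plain-text body)."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    return msg["Subject"], body


# Round trip: build a message, serialize it, and parse it back.
raw = build_message(
    "kid@example.org", "penpal@example.org",
    "Hola from the XO", "¿Cómo estás?\n",
).as_bytes()
subject, body = extract_text(raw)
```

A message built this way could then be handed to smtplib, for example through SMTP.send_message, which is not exercised here because it needs a live server.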

Storage

Develop an abstraction layer for storage-related requests.

It seems to me that traditional mailbox formats like mbox and maildir are not very suitable if email is organized around tags.

sqlite can be used to store email in a database. Using a database for email storage is not a new idea; here is an account of someone's experiment, which was successful for his purposes: http://www.sqlite.org/cvstrac/wiki?p=ExperimentalMailUserAgent. I have examined this client, and as a proof of concept it convinced me.

The database schema would of course have to be very well thought out. There can be several tables in the database so that, for example, large blobs of email content do not slow down scanning of email metadata.
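
To make the idea of separating metadata from content concrete, here is a rough sketch of one possible layout, using Python's sqlite3 module. Every table name, column name, and helper function is hypothetical; the real schema is still to be designed.

```python
import sqlite3

# Hypothetical schema: metadata, raw content, and tags live in separate tables
# so that listing messages never has to read the large blobs.
SCHEMA = """
CREATE TABLE message (
    id       INTEGER PRIMARY KEY,
    sender   TEXT NOT NULL,
    subject  TEXT NOT NULL,
    received TEXT NOT NULL
);
CREATE TABLE message_body (
    message_id INTEGER PRIMARY KEY REFERENCES message(id),
    raw        BLOB NOT NULL
);
CREATE TABLE tag (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE message_tag (
    message_id INTEGER NOT NULL REFERENCES message(id),
    tag_id     INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (message_id, tag_id)
);
"""


def store(conn, sender, subject, received, raw, tags=()):
    """Insert one message with its raw content and tags; return its id."""
    cur = conn.execute(
        "INSERT INTO message (sender, subject, received) VALUES (?, ?, ?)",
        (sender, subject, received))
    msg_id = cur.lastrowid
    conn.execute("INSERT INTO message_body VALUES (?, ?)", (msg_id, raw))
    for name in tags:
        conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
        conn.execute(
            "INSERT INTO message_tag SELECT ?, id FROM tag WHERE name = ?",
            (msg_id, name))
    return msg_id


def subjects_for_tag(conn, name):
    """List subjects carrying a tag, oldest first, without touching blobs."""
    rows = conn.execute(
        "SELECT m.subject FROM message m"
        " JOIN message_tag mt ON mt.message_id = m.id"
        " JOIN tag t ON t.id = mt.tag_id"
        " WHERE t.name = ? ORDER BY m.received", (name,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store(conn, "dad@example.org", "Hello", "2008-06-01T10:00:00", b"raw 1", ["papa"])
store(conn, "list@example.org", "Digest", "2008-06-02T09:00:00", b"raw 2", ["lists"])
store(conn, "dad@example.org", "Re: Hello", "2008-06-03T08:00:00", b"raw 3", ["papa"])
```

With this layout, clicking on a tag only needs the message, tag, and message_tag tables; message_body is read when a single message is opened.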

The implementation will be with pysqlite initially but can be ported to C if time permits.

Service descriptors

To make first-run configuration easy and extensible for services such as Gmail, a file format for a service descriptor can be formalized.

The service descriptor would contain details about servers, protocols, junk-headers provided by the service, etc. Thus the only information required upon selection of a service should be username and password.

It should be possible to specify certain details in the service descriptor, such as whether the service sets SpamAssassin headers and which IMAP folders are not to be downloaded. For example, since Gmail provides IMAP, the Gmail service descriptor could specify that email in the 'All Mail', 'Spam' and 'Trash' folders is not to be downloaded, and that other folder names are to be interpreted as tags.
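
Purely as an illustration of what such a descriptor might look like, the sketch below uses an INI-style file read with Python's configparser. The file format, the section names, and all key names are assumptions; no format has been formalized yet, and the values shown are illustrative.

```python
import configparser

# Hypothetical descriptor format; section and key names are invented here.
GMAIL_DESCRIPTOR = """
[service]
name = Gmail

[incoming]
protocol = imap
host = imap.gmail.com
port = 993
tls = yes
skip_folders = All Mail, Spam, Trash
folders_as_tags = yes

[outgoing]
protocol = smtp
host = smtp.gmail.com
port = 587
tls = yes

[junk]
spamassassin_headers = no
"""


def load_descriptor(text):
    """Parse a descriptor into a plain dict the configuration UI could use."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    incoming = parser["incoming"]
    return {
        "name": parser["service"]["name"],
        "incoming": (incoming["protocol"], incoming["host"],
                     incoming.getint("port")),
        "skip_folders": [f.strip() for f in incoming["skip_folders"].split(",")],
        "folders_as_tags": incoming.getboolean("folders_as_tags"),
        "spamassassin": parser["junk"].getboolean("spamassassin_headers"),
    }


descriptor = load_descriptor(GMAIL_DESCRIPTOR)
```

With everything else supplied by the descriptor, the first-run dialog would only have to ask for a username and password.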

Search, Filters, Smart tags

Having a flexible, efficient, and robust search back end would have many benefits for this activity.

Something interesting would be to formalize a common grammar for searches, filters and smart tags. For example, to be able to search for "received:today", and also as easily create a smart tag called "today's email" using that string, or create a filter that applies tag "papa" to all emails I receive with "from:dad at smthn.org".

While it's probably not beyond kids to pick up a simple domain specific language ;-), the UI for these tasks should be easy-to-use for constructing these search strings.
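
One way such a shared grammar could work is sketched below: a query string is split into field:value terms, and the same parsed terms serve both a search and a filter. The field names, the Term class, and the substring matching rule are all assumptions made for illustration. Relative dates such as "today" would need to be resolved before matching, which this sketch leaves out.

```python
from dataclasses import dataclass

# Assumed set of searchable fields; bare words fall back to the "text" field.
KNOWN_FIELDS = {"from", "to", "subject", "received", "tag"}


@dataclass(frozen=True)
class Term:
    field: str
    value: str


def parse_query(query):
    """Turn 'from:dad hello' into [Term('from', 'dad'), Term('text', 'hello')]."""
    terms = []
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and value and field.lower() in KNOWN_FIELDS:
            terms.append(Term(field.lower(), value))
        else:
            terms.append(Term("text", token))
    return terms


def matches(terms, message):
    """message is a dict of strings; every term must match as a substring."""
    return all(t.value.lower() in message.get(t.field, "").lower()
               for t in terms)


# A filter is just a tag name paired with a parsed query.
FILTERS = [("papa", parse_query("from:dad@smthn.org"))]


def tags_for(message, filters=FILTERS):
    """Return the tags that the filters would apply to an incoming message."""
    return [tag for tag, terms in filters if matches(terms, message)]
```

A smart tag could be stored in the same way, as a name plus a query string, and evaluated on demand instead of on arrival.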

Search

For full-text search, an option is to index incoming email with the sqlite fts module. This could potentially be expensive in terms of flash space (to be determined).
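
A small sketch of what full-text indexing could look like follows. It assumes the FTS5 extension is compiled into the SQLite library that Python links against, which is common but not guaranteed; the table name, column names, and sample messages are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Virtual table holding the searchable text; rowid doubles as the message id.
conn.execute("CREATE VIRTUAL TABLE message_fts USING fts5(subject, body)")
conn.executemany(
    "INSERT INTO message_fts (rowid, subject, body) VALUES (?, ?, ?)",
    [
        (1, "Football on Saturday", "Bring your boots"),
        (2, "Homework", "The football match is after class"),
        (3, "Hello", "How are you?"),
    ],
)


def search(conn, query):
    """Return ids of messages whose subject or body matches the query."""
    rows = conn.execute(
        "SELECT rowid FROM message_fts WHERE message_fts MATCH ?"
        " ORDER BY rowid", (query,))
    return [row[0] for row in rows]
```

Measuring the size of the database file before and after indexing a sample mailbox would be one way to estimate the flash space cost.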

Filters

Filters are rules that are applied to incoming email. It should be easy to specify filters in the user interface. A big use of filters is mailing lists and to that extent there should be automatic tagging based on mailing list headers.
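
For the mailing list case, one possible approach is to derive a tag from the List-Id header (RFC 2919) when a message carries one. The sketch below takes the first label of the list identifier as the tag name; that naming rule and the list_tag helper are assumptions, not a settled design.

```python
import re
from email import message_from_string

# The list identifier sits inside angle brackets, after an optional phrase.
LIST_ID_RE = re.compile(r"<([^<>]+)>")


def list_tag(msg):
    """Return a tag derived from the List-Id header, or None if absent."""
    header = msg.get("List-Id")
    if not header:
        return None
    match = LIST_ID_RE.search(header)
    list_id = match.group(1) if match else header.strip()
    return list_id.split(".")[0].lower()


raw = (
    "From: someone@example.org\n"
    "List-Id: Sugar development <sugar.lists.example.org>\n"
    "Subject: [sugar] hello\n"
    "\n"
    "body\n"
)
tag = list_tag(message_from_string(raw))
```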

Smart tags

Smart tags are first-class tags, except that they can't be applied to messages: they are dynamically evaluated from the query they represent, and in that sense are like a saved search. It can be made possible to 'keep' a search as a smart tag. This is optional, but should not be very hard to implement provided the search back end is developed as planned.

Contacts

A simple address book should be implemented for address auto-completion. It can later be made more of a real address book, or share data with a (future?) contacts activity.
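
Address auto-completion can be as simple as a prefix lookup over a sorted list of names and addresses. The following is a sketch under that assumption; the AddressBook class and its methods are hypothetical.

```python
from bisect import bisect_left, insort


class AddressBook:
    def __init__(self):
        self._keys = []     # sorted lowercase names and addresses
        self._entries = {}  # key -> "Name <address>" display string

    def add(self, name, address):
        """Index a contact under both its name and its address."""
        display = f"{name} <{address}>"
        for key in (name.lower(), address.lower()):
            if key not in self._entries:
                insort(self._keys, key)
            self._entries[key] = display

    def complete(self, prefix, limit=5):
        """Return up to `limit` contacts whose name or address starts with prefix."""
        prefix = prefix.lower()
        results = []
        # Binary search to the first candidate, then scan while the prefix holds.
        for key in self._keys[bisect_left(self._keys, prefix):]:
            if not key.startswith(prefix):
                break
            display = self._entries[key]
            if display not in results:
                results.append(display)
            if len(results) == limit:
                break
        return results


book = AddressBook()
book.add("Dad", "dad@smthn.org")
book.add("Daniela", "dani@example.org")
book.add("Mom", "mom@smthn.org")
```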

Optionals

Message Threading

jwz's threading algorithm [2] can be used. It was proposed in the imapext-thread Internet Draft. There is also some Python code implementing it. [3]

It should be possible for the user to manually thread by drag-and-drop where the algorithm gets it wrong.
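
The core idea of threading by message headers can be sketched as below. This is a deliberately simplified illustration and not jwz's full algorithm: it links each message to the nearest ancestor that is actually in the store, and it omits placeholder containers for missing messages, subject-based grouping, and loop detection. The dict layout of a message is an assumption.

```python
from collections import defaultdict


def build_threads(messages):
    """Link messages to parents using References and In-Reply-To ids.

    messages: iterable of dicts with 'id' and optional 'references' (list,
    oldest first) and 'in_reply_to'. Returns (roots, children), where
    children maps a message id to the ids of its direct replies.
    """
    messages = list(messages)
    known = {m["id"] for m in messages}
    children = defaultdict(list)
    roots = []
    for m in messages:
        candidates = list(m.get("references") or [])
        if m.get("in_reply_to"):
            candidates.append(m["in_reply_to"])
        # Walk from the most recent ancestor backwards; take the first we hold.
        parent = next(
            (c for c in reversed(candidates) if c in known and c != m["id"]),
            None)
        if parent is None:
            roots.append(m["id"])
        else:
            children[parent].append(m["id"])
    return roots, dict(children)


msgs = [
    {"id": "a"},
    {"id": "b", "references": ["a"], "in_reply_to": "a"},
    {"id": "c", "references": ["a", "b"]},
    {"id": "d", "references": ["missing"]},
]
roots, children = build_threads(msgs)
```

A manual correction by drag-and-drop could then be stored as an explicit parent override that takes precedence over the headers.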

Spam filtering

At this stage of development I think it would be best to 'outsource' the spam filtering, so SpamAssassin headers can be supported. When using POP/IMAP with Gmail, spam is already filtered out by Gmail.
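
Supporting SpamAssassin headers could amount to a check like the one below, which looks at the X-Spam-Flag and X-Spam-Status headers that SpamAssassin commonly adds. The is_spam helper is hypothetical, and a server may be configured to use different headers.

```python
from email import message_from_string


def is_spam(msg):
    """Return True if upstream SpamAssassin headers mark the message as spam."""
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    if flag == "YES":
        return True
    # X-Spam-Status starts with "Yes," or "No," followed by scoring details.
    status = (msg.get("X-Spam-Status") or "").strip().lower()
    return status.startswith("yes")


spam = message_from_string(
    "From: x@example.org\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Status: Yes, score=12.3 required=5.0\n"
    "\n"
    "buy now\n"
)
ham = message_from_string(
    "From: dad@example.org\n"
    "X-Spam-Status: No, score=0.1 required=5.0\n"
    "\n"
    "hello\n"
)
```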

User Interface

The activity will have an intuitive and discoverable user interface designed with children in mind.

As I currently visualize it, there will be four main tabs: for writing email, for reading email, for managing contacts, and for configuration. Associated icons for these tabs will be displayed in the icon toolbar.

The interface for writing email will be quite similar to what is available today with most email clients, with some simplifications.

The interface for reading email will be different, since there are no folders.

A tag toolbar will be visible displaying all the tags (as mentioned above, a custom widget for tags will be developed). Clicking on a tag will display the messages associated with that tag. Sorting on metadata is possible with the use of a similar interface to [4].
Clicking on a message in the list displays that message. The tag toolbar will still be visible, and dragging-and-dropping a tag onto the message will apply it. Simply clicking on a tag will lead back to the list of messages for that tag.

<more to follow>

Language of choice

A request on the Email page for a recursive name got an interesting reply: GUBOP Underperforms Because Of Python. While this might be an apt comment, I have a feeling that a well-designed Python activity can perform decently even in the context of an email client.

For the activity, I initially propose to use Python entirely, enabling me to also focus on usability. The daemon can be coded in C and only fire up a Python interpreter when internet access is available.

If my mentor is in agreement with this approach, I would definitely make clean abstractions so that if performance does indeed turn out to be an issue, the bottleneck code can be ported to C. This I plan to do for the mail storage layer in any case (if time permits), and it should not be very hard to do because I would be initially relying on pysqlite.

Schedule

Before SOC

I will be collaborating with my mentor and community on a good design.

I would like to be able to dive into coding when the SOC period begins, so in the interim period I will further familiarize myself with pygtk/GTK+, sqlite and the Sugar activity API.

  • experiment with email storage in a database so I can have a database schema to start work with
  • work on a grammar for searches and experiment with fts-indexing
  • have sending/receiving code ready
  • have a prototype activity that I can build upon ready

Milestones

In the SOC period I can spend 8+ hours every day on this project.

  • May 29: I can officially start coding
  • July 1: Sending/receiving/storage/tagging works
  • July 21: Search/filtering/smart tag will work. Some optimizations to the database schema would have been made.
  • August 1: Configuration will have been made easy with the use of service descriptors
  • August 10: Contacts support (address auto-completion, management) will work
  • August 17: Daemon will work
  • August 18: Google's "pencils down" date. The database schema and service descriptor file format are frozen. Work continues but not under the GSoC umbrella anymore :-)

(Tentative. Associated UI work is implied.)

Beyond SOC

I would of course like to continue working on this project beyond the SOC period, initially focusing on:

  • documenting
  • testing
  • optimizing for performance and making the activity more usable
  • see below :-)

A vision for future direction

<todo>