Data Mining for Viability -- Junk Monkey
- Interns - If you are interested in this project, add your name to the Interested interns section below with a brief description of why you're interested, why you'd be a good intern for this project, and any specific ideas for execution you might have beyond the project description.
- Mentors - If you are interested in this project, add your name to the Interested mentors section below with a brief description of why you're interested, why you'd be a good mentor for this project, and any specific ideas for execution you might have beyond the project description.
- Others - If you are interested in this project in a role other than that of potential mentor or potential intern (example: you are an organization, a potential end-user/tester, may have helpful resources, or want to be notified if the project is chosen), add your name to the Other interested parties section below with contact information and details.
- Everyone - Contribute to the project description on this page, or discuss this project on the associated talk page (click the "discussion" tab on top).
The deadline for editing this proposal or adding yourself to the list is 11:59pm EST (GMT-5) on August 6, 2007.
Junk Monkey
OLPC's goal of providing computers to developing countries will be greatly supported by thin-client software. With an internet connection, users can access enormous amounts of information, tools, and software that keep them informed about worldwide events while also supporting efforts in their local communities.
The rise of blogs, e-newsletters, and online newspapers complements the rise of thin-client software. Yet much of the news we read on the internet carries very little obvious metadata, such as background on the author or on the subject of the story.
This bothers me, and I believe it presents a major challenge for the future of information, not least in the developing world, where blogs and online newsletters will undoubtedly flourish as computing services become cheaper and more readily available. To address this problem, I'd like to write a Firefox plug-in that mines the body text of news websites and cross-references it against databases.
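A real Firefox plug-in would run inside the browser; the Python sketch below only illustrates the mining and cross-referencing idea outside it. It assumes that an article's body text lives in its `<p>` elements and that a person's name looks like two or more capitalized words, both rough heuristics. `BodyTextExtractor`, `extract_body_text`, `cross_reference`, and `REFERENCE_DB` are hypothetical names, and the dictionary is an invented stand-in for whatever real database would be queried.

```python
import re
from html.parser import HTMLParser

# Rough heuristic for a person's name: two or more capitalized words.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")

# Hypothetical stand-in for an external database (e.g. Wikipedia or a
# future journalist database); this entry is invented for illustration.
REFERENCE_DB = {
    "Jane Doe": "Staff reporter; covers public health",
}


class BodyTextExtractor(HTMLParser):
    """Collects the text of <p> elements as an approximation of the article body."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.current = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self.current).split())
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is still collected.
        if self.in_paragraph:
            self.current.append(data)


def extract_body_text(html):
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def cross_reference(paragraphs, db):
    """Return every candidate name found in the text that also appears in db."""
    found = {}
    for paragraph in paragraphs:
        for name in NAME_PATTERN.findall(paragraph):
            if name in db:
                found[name] = db[name]
    return found


page = (
    "<html><body><h1>Clinic opens</h1>"
    "<p>Reported by <b>Jane Doe</b>.</p>"
    "<p>The new clinic serves Kigali residents.</p></body></html>"
)
paragraphs = extract_body_text(page)
print(cross_reference(paragraphs, REFERENCE_DB))
```

The headline in `<h1>` is deliberately skipped, since only paragraph text is treated as body text here; a real news site would likely need site-specific selectors.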
The databases could be existing ones (such as PubMed, Wikipedia, or SourceWatch) or new ones created in response to such a plug-in (for example, a database of all journalists and major bloggers).
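One way to keep the set of databases open-ended is a small registry that maps each source to a function building its search URL. The sketch below assumes the public MediaWiki Action API search for Wikipedia and NCBI's E-utilities `esearch` endpoint for PubMed; it only builds URLs and makes no network requests. `SOURCES`, `lookup_urls`, and the two URL-builder functions are hypothetical names, and SourceWatch or a future journalist database would be added as further entries.

```python
from urllib.parse import urlencode


def wikipedia_search_url(term: str) -> str:
    # MediaWiki Action API full-text search, JSON output.
    params = {"action": "query", "list": "search", "srsearch": term, "format": "json"}
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)


def pubmed_search_url(term: str) -> str:
    # NCBI E-utilities ESearch against the PubMed database, JSON output.
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(params)


# Hypothetical registry: a new database (e.g. of journalists) would be
# another entry here, without changes to the lookup code.
SOURCES = {
    "wikipedia": wikipedia_search_url,
    "pubmed": pubmed_search_url,
}


def lookup_urls(term: str) -> dict[str, str]:
    """Build one search URL per registered source for a mined term."""
    return {name: build(term) for name, build in SOURCES.items()}


for source, url in lookup_urls("Jane Doe").items():
    print(source, url)
```

Keeping URL construction separate from fetching makes it easy to swap in whatever request mechanism the browser plug-in ends up using.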
I believe the project will be quite challenging theoretically, though not especially difficult technically. I've tested some basic algorithms by taking article body text, writing a few rules, and then analyzing the text with them. The results have been very promising, and I hope to explore this idea with anyone who is interested.
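The rules used in those tests are not described on this page, so the following is only an illustration of what rule-based analysis of body text might look like: a few regular expressions that pick out a byline, an attribution ("according to X"), and a quoted speaker. The rule names, patterns, and sample text are all hypothetical assumptions, not the proposer's actual rules.

```python
import re

# Hypothetical name heuristic: two or more capitalized words separated by spaces.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"

# Hypothetical rules; each captures the name it is looking for in group 1.
RULES = {
    "byline": re.compile(rf"^By\s+({NAME})", re.MULTILINE),
    "attribution": re.compile(rf"\baccording to\s+({NAME})"),
    "quoted_speaker": re.compile(rf"\b({NAME})\s+said\b"),
}


def apply_rules(text):
    """Return (rule_name, captured_name) pairs in rule order, then text order."""
    hits = []
    for rule_name, pattern in RULES.items():
        for match in pattern.finditer(text):
            hits.append((rule_name, match.group(1)))
    return hits


SAMPLE = (
    "By Jane Doe\n"
    "The clinic opened on Monday, according to Paul Mensah.\n"
    "Health officer Amina Okafor said demand was high."
)

for hit in apply_rules(SAMPLE):
    print(hit)
```

The captured names could then be fed to a cross-referencing step against one or more databases; rules this simple will miss many names and produce some false matches, which is part of the theoretical challenge described above.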
Examples
Theoretical examples are available on the project's wiki.
Interested interns
Intern name
Contact information, why you'd be good for the job, any specific plans, variants, or details you would personally like to implement and why
Hemant Goyal - contact information
Interested mentors
Mentor name
Contact information, why you'd be good for the job, any specific plans, variants, or details you would personally like to implement and why
Other interested parties
Coogan Brennan
coogan(dot)brennan(at)columbia(dot)edu -- project proposer