Projects/OLPC ALBANET
Cross-Lingual Meaning to Word dictionary & Collection of NLP Tools
Team Participants
- Ervin Ruci, eruci@univlora.edu.al, +355-692035216, http://www.univlora.edu.al/personel/eruci
Address: L. Partizani, Rr Drashovica, Nr 48, Vlore, Albania.
Past Experience/Qualifications:
Webmaster, Mount Allison University, Sackville, NB (1997-2000)
Applications Developer, CIRA (Canadian Internet Registration Authority), Ottawa, ON (2001-2005)
Founder, Geocoder.ca, a free geocoding solution for North America (2006)
Education: Mount Allison University (Computer Science), Carleton University Graduate Studies (MSc, Computational Geometry)
Current Employer and/or School: University of Vlora, Department of Computer Science
- Tanush Shaska, shaska@univlora.edu.al, shaska@okaland.edu, http://www.albmath.org/users/shaska/index.html
Address: 546 Science and Engineering Bld., Department of Mathematics and Statistics, Oakland University, Rochester, MI, 48309-4485, USA
Phone: 248-370-3436
Past Experience/Qualifications:
Summer 07: Visiting Professor, Dep. Computer Science, Maria Curie-Sklodowska Univ., Lublin, Poland
2003-05: Assistant Professor of Mathematics, Department of Mathematics, University of Idaho
2001-03: Visiting Assistant Professor of Mathematics, University of California at Irvine
2000: Deutsche Forschungsgemeinschaft Fellow, Dep. of Mathematics, Univ. of Erlangen, Germany
Currently: Professor, University of Vlora and Assistant Professor, Oakland University
Education: PhD, University of Florida
- Eustrat Zhupa, ezhupa@univlora.edu.al, http://www.univlora.edu.al/personel/ezhupa
Address: University of Vlora, Faculty of Sciences, Vlore, Albania
Past Experience/Qualifications:
Lecturer, University of Bari, Italy (2004-2008)
Currently: Professor, University of Vlora. Dean of the Faculty of Sciences, University of Vlora
Education: PhD, University of Bari
Objectives
Project Objectives: (please list specific, measurable objectives for your project)
- Develop a cross-lingual natural language processing system with easy plug-and-extend functionality, based on the Global Wordnet project. The software will let a user describe a particular word or concept in their own language and obtain the word that matches that definition in any language installed in the software's knowledge base. A sample application illustrating this as a proof of concept is available at: http://fjalor.kerkoje.com
- Develop tools for extending and improving knowledge bases such as the Albanet project (http://albanet.univlora.edu.al) and other wordnets in the user's native language. All input will be aggregated in a centralized web service that keeps track of changes and extensions to the knowledge bases, as part of a collaborative effort to improve the quality of the Global Wordnet.
- Document all code and databases thoroughly and set up an SVN repository for tracking changes made to the software by the community.
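As a rough illustration of the meaning-to-word lookup described in the first objective, the sketch below matches a user's definition against concept glosses and returns the word for the best-matching concept in a chosen language. It is not the code behind http://fjalor.kerkoje.com. The toy knowledge base, the `find_word` name, and the word-overlap (Jaccard) scoring are all assumptions made for this example; a real system would draw its concepts from the wordnets and would likely need a stronger similarity measure.

```python
import re

# Toy knowledge base: one entry per concept, with an English gloss and
# per-language words. IDs, glosses, and layout are illustrative assumptions.
SYNSETS = {
    "c1": {"gloss": "a domesticated animal that barks and is kept as a pet",
           "lemmas": {"en": "dog", "sq": "qen", "it": "cane"}},
    "c2": {"gloss": "a clear liquid that people and animals drink",
           "lemmas": {"en": "water", "sq": "ujë", "it": "acqua"}},
    "c3": {"gloss": "a set of printed pages bound together for reading",
           "lemmas": {"en": "book", "sq": "libër", "it": "libro"}},
}

STOPWORDS = {"a", "an", "the", "that", "and", "is", "as", "of", "for", "to"}


def tokenize(text):
    """Lower-case the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def find_word(definition, target_lang):
    """Return the word in target_lang whose gloss best overlaps the definition.

    Returns None when no gloss shares a word with the definition or when
    the best concept has no word in target_lang.
    """
    query = tokenize(definition)
    best_id, best_score = None, 0.0
    for synset_id, entry in SYNSETS.items():
        gloss = tokenize(entry["gloss"])
        union = query | gloss
        score = len(query & gloss) / len(union) if union else 0.0
        if score > best_score:
            best_id, best_score = synset_id, score
    if best_id is None:
        return None
    return SYNSETS[best_id]["lemmas"].get(target_lang)


print(find_word("an animal that barks", "sq"))  # qen
```

Because the lookup goes through a language-neutral concept ID, adding a language in this sketch only means adding entries to each `lemmas` table, which is one simple way to picture the plug-and-extend goal.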
Plan of Action
We will direct our students to modify and adapt the current software into a standalone version for the XO Laptop platform, and then to extend its functionality so that the application becomes a collaborative Wordnet extension platform.
Plan and Procedure for Achieving the Stated Objectives:
- Develop the documentation and the basic technical design for the system
- Divide the coding tasks among 20 of our best students.
- Integrate and streamline all work done into a single standalone application that will
mostly work in off-line mode, but sync the data changes to a central repository whenever online.
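The off-line-first behaviour in the last step could be approached along the following lines. This is only a sketch under our own assumptions: the `PendingChanges` class, the JSON-lines file layout, and the `upload` callback are hypothetical names chosen for illustration, and the actual protocol of the central repository is not yet designed.

```python
import json
import os


class PendingChanges:
    """Keeps local edits in a JSON-lines file until they can be uploaded."""

    def __init__(self, path):
        self.path = path

    def record(self, change):
        # Append one edit (a JSON-serializable dict) to the local queue.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(change, ensure_ascii=False) + "\n")

    def load(self):
        # Return all queued edits, oldest first.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def sync(self, upload):
        """Upload queued edits in order; keep whatever could not be sent.

        `upload` is any callable that sends one edit to the central
        repository and raises ConnectionError while the laptop is offline.
        Returns the number of edits uploaded.
        """
        pending = self.load()
        sent = 0
        for change in pending:
            try:
                upload(change)
            except ConnectionError:
                break  # still offline: stop here and retry later
            sent += 1
        # Rewrite the queue with only the edits that were not uploaded.
        with open(self.path, "w", encoding="utf-8") as f:
            for change in pending[sent:]:
                f.write(json.dumps(change, ensure_ascii=False) + "\n")
        return sent
```

The application would call `record` for every local edit and call `sync` whenever a connection becomes available. Handling of conflicting edits from different users is deliberately left out of this sketch.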
Needs:
Linguistic tools are important educational resources in the under-developed world. These tools will make global knowledge more accessible to all, regardless of the language in which that knowledge was compiled.
This project will collect local knowledge bases to create a network of interconnected
concepts across different languages and dialects.
This project will provide a software platform that can be extended and used for other tasks as well, such as information retrieval, publishing, and sharing creative work.
This project will invite greater participation in the collaborative development of linguistic knowledge bases by making the software available on other platforms and environments.
Why can't this project be done in emulation using non-XO machines? We wish to use the lowest-end machines possible, to make sure that all interface functions behave properly on them. This will invite greater participation from students in the underdeveloped world who have the creative energy, but not the tools, to take part in large collaborative Natural Language Processing development efforts.
Why are you requesting the number of machines you are asking for? We need one laptop for each student who will be working on the project.
We will consider salvaged, rebuilt, and/or damaged XO laptops, as we want our software to function on the lowest common denominator of hardware. Our students will also gain additional skills from the extra challenge of fixing and reconfiguring XO laptops that are in less than optimal shape.
Sharing Deliverables:
Project URL: http://albanet.univlora.edu.al
All results will be posted on this website on a quarterly basis.
The final package will be distributed as a single self-installing software package, tested and verified to work properly on any XO laptop.
Our work will have many possible applications outside the XO community, especially in the areas of cross-lingual named entity extraction, and cross-lingual information retrieval, both areas of current active research.
We are part of the Global Wordnet project (http://globalwordnet.org/), and we will announce the progress of our work through regular contacts with the Global Wordnet community.
There are no nearby XO lending libraries we can rely on at the moment; Albania appears to be off the map when it comes to such support groups.
Moreover
- Our project will benefit from testing and documentation efforts of a wide range of
people and will achieve its true goals only after it has been widely distributed to the community.
- Teachers (especially foreign language teachers) will provide valuable input on how to use this tool as part of their curricula.
- We will promote our work on the University of Vlora Research page as well as at various conferences, such as the Kosova Freedom Software conference, where Ervin Ruci and Eustrat Zhupa are scheduled to present a paper on cross-language entity recognition systems in August 2009. “Different languages divide us, but information technology erases that division”.
- We are always looking for mentors and supporters in our quest to develop better tools
for information management and processing, so as to get closer to our goal of using technology to improve the quality of information we receive across different languages.
- The mentor will be someone with access to the natural language processing groups around the world who can provide valuable advice and guidance in our work.
Timeline
- Designing the main outline of the interfaces and the systems for sharing and gathering
information. (2 months)
- Coding the algorithms that will provide language-independent functionality across different language bases, using Hidden Markov Models and probabilistic learning algorithms to analyse and process information across languages. (8 months)
- Testing, documenting, and optimizing the software to function on laptops with low processing power and storage capacity. (4 months)
- Porting the application to other platforms and developing the off-line technology for
syncing all work done by individuals into a central repository. (4 months)
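The timeline mentions Hidden Markov Models without fixing how they will be applied. As general background only, the sketch below shows Viterbi decoding, the standard way to recover the most likely hidden state sequence from an HMM. The two-state tagging example and all of its probabilities are invented for illustration and are not taken from our data or design.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence.

    `observations` must be non-empty. Words missing from an emission
    table are given probability 0.0.
    """
    # best[s] = (probability, path) of the best sequence ending in state s
    best = {
        s: (start_p[s] * emit_p[s].get(observations[0], 0.0), [s])
        for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor state that maximizes the path probability.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1])
                 for prev in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s].get(obs, 0.0), path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Toy model (all numbers are made up): tag each word as NOUN or VERB.
STATES = ["NOUN", "VERB"]
START_P = {"NOUN": 0.6, "VERB": 0.4}
TRANS_P = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
EMIT_P = {
    "NOUN": {"dogs": 0.5, "bark": 0.1},
    "VERB": {"dogs": 0.1, "bark": 0.6},
}

prob, path = viterbi(["dogs", "bark"], STATES, START_P, TRANS_P, EMIT_P)
print(path)  # ['NOUN', 'VERB']
```

On low-powered laptops, a practical implementation would probably work with log-probabilities to avoid underflow on long sequences; that refinement is omitted here for brevity.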