Projects/OLPC ALBANET: Difference between revisions

From OLPC
(NLP tools for OLPC, Meaning to Word Multi-lingual Dictionary [Ervin Ruci, ALBANIA])
 
No edit summary
Line 1: Line 1:
= Cross-Lingual Meaning to Word dictionary & Collection of NLP Tools =
1. Project Title & Shipment Detail
Name of Project: Cross-Lingual Meaning to Word dictionary & Collection of NLP Tools
Shipping Address You've Verified:
Att: Ervin Ruci, Universiteti i Vlores, Departamenti i Shkencave Kompjuterike dhe
Inxhinjerise Elektrike, Sheshi Pavaresia, Vlore, Albania.
Phone: +355 – 692035216


== Team Participants ==
Number of Laptops You Request to Borrow: 20

Loan Length—How Many Months: 18
* Ervin Ruci, eruci@univlora.edu.al, +355-692035216,
2. Team Participants (In list form)
Name(s) & Contact Info: (include email addresses & phone numbers)
1. Ervin Ruci, eruci@univlora.edu.al, +355-692035216,
http://www.univlora.edu.al/personel/eruci
Address: L. Partizani, Rr Drashovica, Nr 48, Vlore, Albania.
Line 20: Line 13:
Computational Geometry)
Current Employer and/or School: University of Vlora, Department of Computer Science
2. Tanush Shaska, shaska@univlora.edu.al, shaska@okaland.edu,
* Tanush Shaska, shaska@univlora.edu.al, shaska@okaland.edu,
http://www.albmath.org/users/shaska/index.html
Address:
Line 38: Line 31:
Currently: Professor, University of Vlora and Assistant Professor, Oakland University
Education: PhD, University of Florida
3. Eustrat Zhupa, ezhupa@univlora.edu.al
* Eustrat Zhupa, ezhupa@univlora.edu.al
http://www.univlora.edu.al/personel/ezhupa
Address: University of Vlora, Faculty of Sciences, Vlore, Albania
Line 47: Line 40:




3. Objectives
== Objectives ==
Project Objectives: (please list specific, measurable objectives for your project)
1. Develop a cross-lingual natural language processing system with easy plug-and-extend
* Develop a cross-lingual natural language processing system with easy plug-and-extend
functionality based on the Global Wordnet project. The software will enable a user to
define a particular word or concept in their own language and obtain the word that
Line 55: Line 48:
sample application illustrating this as proof of concept can be found on the web at:
http://fjalor.kerkoje.com
2. Develop tools for extending and improving knowledge bases such as the Albanet
* Develop tools for extending and improving knowledge bases such as the Albanet
project (http://albanet.univlora.edu.al) and other wordnets in the user's native language.
All this input will be aggregated in a centralized web service that will keep track of
changes and extensions of the knowledge bases in this collaborative effort to improve
the quality of the Global Wordnet.
3. Make all code and databases well documented and set up an SVN repository for
* Make all code and databases well documented and set up an SVN repository for
tracking changes to the software by the community.

4. Plan of Action (One or more paragraphs)
== Plan of Action ==

We will direct our students to modify and adapt the current software into a standalone version for
the XO Laptop platform, then extend its current functionality to develop this application into a
collaborative Wordnet extension platform.

Plan and Procedure for Achieving the Stated Objectives:
== Plan and Procedure for Achieving the Stated Objectives: ==
1. Develop the documentation and the basic technical design for the system

2. Divide the coding tasks among 20 of our best students.
* Develop the documentation and the basic technical design for the system
3. Integrate and streamline all work done into a single standalone application that will
* Divide the coding tasks among 20 of our best students.
* Integrate and streamline all work done into a single standalone application that will
mostly work in off-line mode, but sync the data changes to a central repository
whenever online.
Line 75: Line 72:




5. Needs:
== Needs: ==

Linguistic tools are an important educational resource in the under-developed
world. These tools will make global knowledge more accessible to all, regardless of the language in which this
knowledge is compiled.

Locally? This project will collect local knowledge bases to create a network of interconnected
This project will collect local knowledge bases to create a network of interconnected
concepts across different languages and dialects.

In the greater OLPC/Sugar community? This project will provide a software platform that can be
This project will provide a software platform that can be
extended and used in other tasks as well, such as Information Retrieval, Publishing and Sharing
Creative work.

Outside the community? Will invite greater participation in collaborative linguistic knowledge bases
Will invite greater participation in collaborative linguistic knowledge bases
development by making the software available to other platforms/environments.
Why can't this project be done in emulation using non-XO machines? We wish to use the lowest end
Line 90: Line 91:
participation from students in the underdeveloped world who have the creative energy but not the tools
to participate in large Natural Language Processing collaborative development efforts.


Why are you requesting the number of machines you are asking for? We need one laptop for each
student who will be working on the project.

We will consider salvaged/rebuilt and/or damaged XO laptops as we are looking to make our software
function on the lowest common denominator, and our students will gain even greater skill in facing the
extra challenge of fixing and reconfiguring XO laptops that are not in near-optimal shape.
Will you consider (1) salvaged/rebuilt or (2) damaged XO Laptops?





6. Sharing Deliverables:
== Sharing Deliverables: ==


Project URL: http://albanet.univlora.edu.al

How will you convey tentative ideas & results back to the OLPC/Sugar community, prior to

completion? All results will be posted on this website on a quarterly basis.
All results will be posted on this website on a quarterly basis.
How will the final fruits of your labor be distributed to children or community members worldwide?

The final package will be distributed as a single self-installing software package tested and verified to
work properly on any XO laptop.

Will your work have any possible application or use outside our community?
Our work will have many possible applications outside the XO community, especially in the areas of
cross-lingual named entity extraction and cross-lingual information retrieval, both areas of current
active research.

If yes, how will these people be reached?
We are part of the Global Wordnet project (http://globalwordnet.org/), and we will announce the
progress of our work through regular contacts with the Global Wordnet community.

Have you investigated working with nearby XO Lending Libraries or Project Groups?
There are no nearby XO Lending Libraries we can rely on at the moment; Albania seems to be off
the map when it comes to the existence of such support groups.


== Moreover ==
7.
1. Our project will benefit from testing and documentation efforts of a wide range of
* Our project will benefit from testing and documentation efforts of a wide range of
people and will achieve its true goals only after it has been widely distributed to the
community.
2. Teachers (especially foreign language teachers) will provide valuable input on how to
* Teachers (especially foreign language teachers) will provide valuable input on how to
use this tool as part of their curricula.
3. We will promote our work on the University of Vlora Research page as well as various
* We will promote our work on the University of Vlora Research page as well as various
conferences such as the Kosova Freedom Software conference where Ervin Ruci and
Eustrat Zhupa are scheduled to present a paper on cross-language entity recognition
systems in August 2009. “Different languages divide us, but information technology
erases that division”.
4. We are always looking for mentors and supporters in our quest to develop better tools
* We are always looking for mentors and supporters in our quest to develop better tools
for information management and processing, so as to get closer to our goal of using
technology to improve the quality of information we receive across different languages.
5. The mentor will be someone with access to the natural language processing groups in
* The mentor will be someone with access to the natural language processing groups in
the world who can provide valuable advice and guidance in our work.
Would your Project benefit from Support, Documentation and/or Testing people?





8. Timeline (Show start to finish)
== 8. Timeline ==
1. Designing the main outline of the interfaces and the systems for sharing and gathering
* Designing the main outline of the interfaces and the systems for sharing and gathering
information. (2 months)
2. Coding the algorithms that will create language-independent functionality across
* Coding the algorithms that will create language-independent functionality across
different language bases using Hidden Markov Models and probabilistic learning
algorithms for analysing and processing information across different languages. (8
months)
3. Testing, documenting, and optimizing the software to function on laptops with low
* Testing, documenting, and optimizing the software to function on laptops with low
processing power and storage capacity. (4 months)
4. Porting the application to other platforms and developing the off-line technology for
* Porting the application to other platforms and developing the off-line technology for
syncing all work done by individuals into a central repository. (4 months)

Revision as of 21:57, 16 March 2009

Cross-Lingual Meaning to Word dictionary & Collection of NLP Tools

Team Participants

  • Ervin Ruci, eruci@univlora.edu.al, +355-692035216, http://www.univlora.edu.al/personel/eruci

Address: L. Partizani, Rr Drashovica, Nr 48, Vlore, Albania.

Past Experience/Qualifications: Webmaster, Mount Allison University, Sackville, NB (1997-2000); Applications Developer, CIRA (Canadian Internet Registration Authority), Ottawa, ON (2001-2005); Founder, Geocoder.ca, a free geocoding solution for North America (2006).

Education: Mount Allison University (Computer Science); Carleton University Graduate Studies (MSc, Computational Geometry).

Current Employer and/or School: University of Vlora, Department of Computer Science

  • Tanush Shaska, shaska@univlora.edu.al, shaska@okaland.edu, http://www.albmath.org/users/shaska/index.html

Address: 546 Science and Engineering Bld., Department of Mathematics and Statistics, Oakland University, Rochester, MI, 48309-4485, USA

Phone: 248-370-3436

Past Experience/Qualifications: Visiting Professor, Dep. Computer Science, Maria Curie-Sklodowska Univ., Lublin, Poland (Summer 2007); Assistant Professor of Mathematics, Department of Mathematics, University of Idaho (2003-05); Visiting Assistant Professor of Mathematics, University of California at Irvine (2001-03); Deutsche Forschungsgemeinschaft Fellow, Dep. of Mathematics, Univ. of Erlangen, Germany (2000).

Currently: Professor, University of Vlora and Assistant Professor, Oakland University

Education: PhD, University of Florida

  • Eustrat Zhupa, ezhupa@univlora.edu.al, http://www.univlora.edu.al/personel/ezhupa

Address: University of Vlora, Faculty of Sciences, Vlore, Albania

Past Experience/Qualifications: Lecturer, University of Bari, Italy (2004-2008).

Currently: Professor, University of Vlora; Dean of the Faculty of Sciences, University of Vlora

Education: PhD, University of Bari.


Objectives

Project Objectives: (please list specific, measurable objectives for your project)

  • Develop a cross-lingual natural language processing system with easy plug-and-extend functionality based on the Global Wordnet project. The software will enable a user to define a particular word or concept in their own language and obtain the word that matches their definition in any language installed in the software's knowledge base. A sample application illustrating this as proof of concept can be found on the web at http://fjalor.kerkoje.com

  • Develop tools for extending and improving knowledge bases such as the Albanet project (http://albanet.univlora.edu.al) and other wordnets in the user's native language. All this input will be aggregated in a centralized web service that will keep track of changes and extensions of the knowledge bases in this collaborative effort to improve the quality of the Global Wordnet.

  • Make all code and databases well documented and set up an SVN repository for tracking changes to the software by the community.
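The meaning-to-word lookup in the first objective can be sketched roughly as follows. This is only a minimal illustration, not the project's actual method: the `Concept` class, the `lookup` function, the toy data, and the word-overlap (Jaccard) scoring are all assumptions introduced here, and a real system would load its concepts and definitions from wordnet databases rather than a hard-coded list.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Concept:
    """A language-independent concept with per-language definitions and words (hypothetical structure)."""
    concept_id: str
    glosses: Dict[str, str]        # language code -> definition text
    lemmas: Dict[str, List[str]]   # language code -> words for the concept


# Toy knowledge base, invented for illustration only.
CONCEPTS = [
    Concept("c-dog", {"en": "a domesticated animal that barks"},
            {"en": ["dog"], "sq": ["qen"]}),
    Concept("c-cat", {"en": "a small domesticated animal that purrs"},
            {"en": ["cat"], "sq": ["mace"]}),
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def lookup(definition: str, source_lang: str, target_lang: str,
           concepts: List[Concept] = CONCEPTS) -> List[str]:
    """Return target-language words for the concept whose gloss best overlaps the definition."""
    query = tokenize(definition)
    best, best_score = None, 0.0
    for concept in concepts:
        gloss = concept.glosses.get(source_lang)
        if not gloss:
            continue  # no definition available in the user's language
        gloss_tokens = tokenize(gloss)
        union = query | gloss_tokens
        if not union:
            continue
        score = len(query & gloss_tokens) / len(union)  # Jaccard similarity
        if score > best_score:
            best, best_score = concept, score
    return best.lemmas.get(target_lang, []) if best else []
```

A call such as `lookup("an animal that barks", "en", "sq")` scores each English definition against the query and returns the Albanian words attached to the best-matching concept. Plain token overlap is the simplest possible choice and ignores synonyms and word order.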

Plan of Action

We will direct our students to modify and adapt the current software into a standalone version for the XO Laptop platform, then extend its current functionality to develop this application into a collaborative Wordnet extension platform.

Plan and Procedure for Achieving the Stated Objectives:

  • Develop the documentation and the basic technical design for the system.
  • Divide the coding tasks among 20 of our best students.
  • Integrate and streamline all work done into a single standalone application that will mostly work in off-line mode, but sync the data changes to a central repository whenever online.
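The off-line mode with later syncing mentioned in the last step could work along the lines of the sketch below. It is a hedged illustration only: the `OfflineChangeQueue` class, its field names, and the `send` callback standing in for the central repository are hypothetical, since the proposal does not specify a sync protocol.

```python
from collections import deque
from typing import Callable, Deque, Dict

Change = Dict[str, str]


class OfflineChangeQueue:
    """Holds local edits until they can be delivered to a central repository (hypothetical design)."""

    def __init__(self) -> None:
        self.pending: Deque[Change] = deque()

    def record(self, synset_id: str, field: str, value: str) -> None:
        """Store one edit locally; this works whether or not the laptop is online."""
        self.pending.append({"synset_id": synset_id, "field": field, "value": value})

    def sync(self, send: Callable[[Change], bool]) -> int:
        """Try to deliver pending edits in order; stop at the first failure.

        `send` is an assumed callback that returns True when the central
        repository accepted the change. Returns the number of edits delivered.
        """
        sent = 0
        while self.pending:
            if not send(self.pending[0]):
                break  # still offline or rejected: keep the edit for the next attempt
            self.pending.popleft()
            sent += 1
        return sent
```

An edit is removed from the queue only after `send` reports success, so a dropped connection leaves the remaining changes in place for the next sync attempt. Conflict resolution between users is deliberately left out of this sketch.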


Needs:

Linguistic tools are an important educational resource in the under-developed world. These tools will make global knowledge more accessible to all, regardless of the language in which this knowledge is compiled.

This project will collect local knowledge bases to create a network of interconnected concepts across different languages and dialects.

This project will provide a software platform that can be extended and used in other tasks as well, such as information retrieval, publishing, and sharing creative work.

It will invite greater participation in the development of collaborative linguistic knowledge bases by making the software available on other platforms and environments.

Why can't this project be done in emulation using non-XO machines? We wish to use the lowest-end machines possible, so as to make sure that all interface functions behave properly, inviting greater participation from students in the underdeveloped world who have the creative energy but not the tools to participate in large collaborative Natural Language Processing development efforts.


Why are you requesting the number of machines you are asking for? We need one laptop for each student who will be working on the project.

We will consider salvaged/rebuilt and/or damaged XO laptops, as we are looking to make our software function on the lowest common denominator, and our students will gain even greater skill in facing the extra challenge of fixing and reconfiguring XO laptops that are not in near-optimal shape.


Sharing Deliverables:

Project URL: http://albanet.univlora.edu.al


All results will be posted on this website on a quarterly basis.

The final package will be distributed as a single self-installing software package tested and verified to work properly on any XO laptop.

Our work will have many possible applications outside the XO community, especially in the areas of cross-lingual named entity extraction and cross-lingual information retrieval, both areas of current active research.

We are part of the Global Wordnet project (http://globalwordnet.org/), and we will announce the progress of our work through regular contacts with the Global Wordnet community.

There are no nearby XO Lending Libraries we can rely on at the moment; Albania seems to be off the map when it comes to the existence of such support groups.

Moreover

  • Our project will benefit from testing and documentation efforts of a wide range of people and will achieve its true goals only after it has been widely distributed to the community.

  • Teachers (especially foreign language teachers) will provide valuable input on how to use this tool as part of their curricula.

  • We will promote our work on the University of Vlora Research page as well as various conferences such as the Kosova Freedom Software conference where Ervin Ruci and Eustrat Zhupa are scheduled to present a paper on cross-language entity recognition systems in August 2009. “Different languages divide us, but information technology erases that division”.

  • We are always looking for mentors and supporters in our quest to develop better tools for information management and processing, so as to get closer to our goal of using technology to improve the quality of information we receive across different languages.

  • The mentor will be someone with access to the natural language processing groups in the world who can provide valuable advice and guidance in our work.


Timeline

  • Designing the main outline of the interfaces and the systems for sharing and gathering information. (2 months)

  • Coding the algorithms that will create language-independent functionality across different language bases using Hidden Markov Models and probabilistic learning algorithms for analysing and processing information across different languages. (8 months)

  • Testing, documenting, and optimizing the software to function on laptops with low processing power and storage capacity. (4 months)

  • Porting the application to other platforms and developing the off-line technology for syncing all work done by individuals into a central repository. (4 months)
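The timeline names Hidden Markov Models but does not say how they would be applied. As one possible illustration, the sketch below runs Viterbi decoding on a toy two-state HMM that labels words as noun (N) or verb (V). The `viterbi` function, the states, and every probability are invented for this example and are not taken from the project.

```python
from typing import Dict, List, Optional, Tuple


def viterbi(observations: List[str], states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most probable hidden-state sequence for the observations."""
    if not observations:
        return []
    # best[t][s] = (probability of the best path ending in state s at step t, previous state)
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None) for s in states}
    ]
    for obs in observations[1:]:
        row = {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability into s.
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            row[s] = (prob * emit_p[s].get(obs, 0.0), prev)
        best.append(row)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return path[::-1]


# Toy model parameters, invented for illustration.
STATES = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {"N": {"dogs": 0.9, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.9}}
```

Calling `viterbi(["dogs", "bark"], STATES, START_P, TRANS_P, EMIT_P)` tags the two words with the most likely state sequence under these toy parameters. In practice the probabilities would be estimated from annotated text in each language, and longer inputs would need log probabilities to avoid numerical underflow.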