Projects/OLPC ALBANET: Difference between revisions
(NLP tools for OLPC, Meaning to Word Multi-lingual Dictionary [Ervin Ruci, ALBANIA])
Latest revision as of 15:07, 21 May 2009
Cross-Lingual Meaning-to-Word Dictionary & Collection of NLP Tools
Unfortunately, our laptops were held up at customs. The officials there believe these devices are worth a lot of money, perhaps because they are unfamiliar with them. Our project cannot afford thousands of dollars in customs fees, so we have asked them to send the laptops back. On the bright side, our project is meeting its objectives, and we will have a working prototype before the end of this year. :) Thanks. eruci@univlora.edu.al
Team Participants
- Ervin Ruci, eruci@univlora.edu.al, +355-692035216, http://www.univlora.edu.al/personel/eruci
Address: L. Partizani, Rr Drashovica, Nr 48, Vlore, Albania.
Past Experience/Qualifications: Webmaster, Mount Allison University, Sackville, NB (1997-2000); Applications Developer, CIRA (Canadian Internet Registration Authority), Ottawa, ON (2001-2005); Founder, Geocoder.ca, a free geocoding solution for North America (2006).
Education: Mount Allison University (Computer Science), Carleton University Graduate Studies (MSc, Computational Geometry).
Current Employer and/or School: University of Vlora, Department of Computer Science.
- Tanush Shaska, shaska@univlora.edu.al, shaska@okaland.edu, http://www.albmath.org/users/shaska/index.html
Address: 546 Science and Engineering Bld., Department of Mathematics and Statistics, Oakland University, Rochester, MI, 48309-4485, USA.
Phone: 248-370-3436
Past Experience/Qualifications: Visiting Professor, Dep. Computer Science, Maria Curie-Sklodowska Univ., Lublin, Poland (Summer 07); Assistant Professor of Mathematics, Department of Mathematics, University of Idaho (2003-05); Visiting Assistant Professor of Mathematics, University of California at Irvine (2001-03); Deutsche Forschungsgemeinschaft Fellow, Dep. of Mathematics, Univ. of Erlangen, Germany (2000).
Currently: Professor, University of Vlora, and Assistant Professor, Oakland University.
Education: PhD, University of Florida.
- Eustrat Zhupa, ezhupa@univlora.edu.al, http://www.univlora.edu.al/personel/ezhupa
Address: University of Vlora, Faculty of Sciences, Vlore, Albania.
Past Experience/Qualifications: Lecturer, University of Bari, Italy (2004-2008).
Currently: Professor, University of Vlora; Dean of the Faculty of Sciences, University of Vlora.
Education: PhD, University of Bari.
Objectives
Project Objectives: (please list specific, measurable objectives for your project)
- Develop a cross-lingual natural language processing system with easy plug-and-extend functionality based on the Global Wordnet project. The software will enable a user to define a particular word or concept in their own language and obtain the word that matches their definition in any language installed in the software's knowledge base. A sample application illustrating this as a proof of concept can be found on the web at: http://fjalor.kerkoje.com
- Develop tools for extending and improving knowledge bases such as the Albanet project (http://albanet.univlora.edu.al) and other wordnets in the user's native language. All this input will be aggregated in a centralized web service that will keep track of changes and extensions to the knowledge bases in this collaborative effort to improve the quality of the Global Wordnet.
- Make all code and databases well documented and set up an SVN repository for tracking changes made to the software by the community.
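The meaning-to-word lookup in the first objective can be illustrated with a minimal sketch. It is not the project's actual implementation: the tiny `SYNSETS` table, the stopword list, and the `meaning_to_word` function are hypothetical, and the matching uses plain word overlap (Jaccard similarity) against English glosses only. A real system would need glosses in the user's own language and a much larger wordnet.

```python
import re

# Hypothetical miniature knowledge base, invented for illustration only.
# Each concept (synset) has a gloss and its lemmas in several languages.
SYNSETS = [
    {"id": "dog.n.01",
     "gloss": "a domesticated animal that barks and is kept as a pet",
     "lemmas": {"en": ["dog"], "sq": ["qen"]}},
    {"id": "book.n.01",
     "gloss": "a written work made of printed pages bound together",
     "lemmas": {"en": ["book"], "sq": ["libër"]}},
    {"id": "sea.n.01",
     "gloss": "a large body of salt water",
     "lemmas": {"en": ["sea"], "sq": ["det"]}},
]

# Small assumed stopword list; a real system would use a per-language list.
STOPWORDS = {"a", "an", "the", "of", "and", "is", "that", "as", "to", "in"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def meaning_to_word(definition, target_lang, synsets=SYNSETS):
    """Return the lemmas, in target_lang, of the best-matching concept."""
    query = tokenize(definition)
    scored = [(jaccard(query, tokenize(s["gloss"])), s) for s in synsets]
    best_score, best = max(scored, key=lambda pair: pair[0])
    if best_score == 0.0:
        return []  # nothing in the knowledge base overlaps the definition
    return best["lemmas"].get(target_lang, [])
```

For example, `meaning_to_word("an animal that barks", "sq")` selects the dog concept and returns its Albanian lemma. Keying the lemmas by language code is one way to let new languages be plugged in without changing the lookup logic.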
Plan of Action
We will direct our students to adapt the current software into a standalone version for the XO Laptop platform, then extend its functionality to turn this application into a collaborative Wordnet extension platform.
Plan and Procedure for Achieving the Stated Objectives:
- Develop the documentation and the basic technical design for the system.
- Divide the coding tasks among 20 of our best students.
- Integrate and streamline all work done into a single standalone application that will mostly work in off-line mode, but will sync data changes to a central repository whenever online.
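The off-line-first behaviour described in the last step can be sketched as follows. This is an illustration under stated assumptions, not the project's design: the `OfflineChangeLog` class and the `send` and `is_online` callables are hypothetical stand-ins for whatever transport and connectivity check the real application would use.

```python
class OfflineChangeLog:
    """Queue edits locally and push them to a central repository when online."""

    def __init__(self, send, is_online):
        # send: hypothetical callable taking a list of changes, returning
        #       True if the central repository accepted them.
        # is_online: hypothetical callable returning True when connected.
        self._send = send
        self._is_online = is_online
        self._pending = []

    def record(self, synset_id, field, value):
        """Store one local edit; this works whether or not we are online."""
        self._pending.append(
            {"synset": synset_id, "field": field, "value": value}
        )

    def pending_count(self):
        """Number of edits not yet accepted by the central repository."""
        return len(self._pending)

    def sync(self):
        """Try to push all pending edits; return how many were pushed."""
        if not self._pending or not self._is_online():
            return 0
        batch = list(self._pending)
        if not self._send(batch):
            return 0  # keep the edits queued so a later sync can retry
        del self._pending[:len(batch)]
        return len(batch)
```

Edits are removed from the queue only after the repository confirms the batch, so a failed or interrupted upload loses nothing. A real implementation would also persist the queue to disk and handle conflicting edits from different users.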
Needs:
Linguistic tools are important educational resources in the under-developed world. These tools will make global knowledge more accessible to all, regardless of the language in which this knowledge is compiled.
This project will collect local knowledge bases to create a network of interconnected concepts across different languages and dialects.
This project will provide a software platform that can be extended and used in other tasks as well, such as information retrieval, publishing, and sharing creative work.
This project will invite greater participation in the development of collaborative linguistic knowledge bases by making the software available on other platforms/environments.
Why can't this project be done in emulation using non-XO machines? We wish to use the lowest-end machines possible, so as to make sure that all interface functions behave properly, inviting greater participation from students in the underdeveloped world who have the creative energy but not the tools to participate in large collaborative Natural Language Processing development efforts.
Why are you requesting the number of machines you are asking for? We need one laptop for each student who will be working on the project.
We will consider salvaged/rebuilt and/or damaged XO laptops, as we want our software to work on the lowest common denominator, and our students will gain even greater skill by facing the extra challenge of fixing and reconfiguring XO laptops that are in less than optimal shape.
Sharing Deliverables:
Project URL: http://albanet.univlora.edu.al
All results will be posted on this website on a quarterly basis.
The final package will be distributed as a single self-installable software package, tested and verified to work properly on any XO laptop.
Our work will have many possible applications outside the XO community, especially in cross-lingual named entity extraction and cross-lingual information retrieval, both areas of active research.
We are part of the Global Wordnet project (http://globalwordnet.org/), and we will announce the progress of our work through regular contacts with the Global Wordnet community.
There are no nearby XO Lending Libraries we can rely on at the moment. Albania seems to be off the map when it comes to such support groups.
Moreover
- Our project will benefit from the testing and documentation efforts of a wide range of people and will achieve its true goals only after it has been widely distributed to the community.
- Teachers (especially foreign language teachers) will provide valuable input on how to use this tool as part of their curricula.
- We will promote our work on the University of Vlora Research page as well as at various conferences, such as the Kosova Freedom Software conference, where Ervin Ruci and Eustrat Zhupa are scheduled to present a paper on cross-language entity recognition systems in August 2009. “Different languages divide us, but information technology erases that division.”
- We are always looking for mentors and supporters in our quest to develop better tools for information management and processing, so as to get closer to our goal of using technology to improve the quality of information we receive across different languages.
- The mentor will be someone with access to natural language processing groups around the world who can provide valuable advice and guidance in our work.
Timeline
- Designing the main outline of the interfaces and the systems for sharing and gathering information. (2 months)
- Coding the algorithms that will create language-independent functionality across different language bases, using Hidden Markov Models and probabilistic learning algorithms for analysing and processing information across different languages. (8 months)
- Testing, documenting, and optimizing the software to function on laptops with low processing power and storage capacity. (4 months)
- Porting the application to other platforms and developing the off-line technology for syncing all work done by individuals into a central repository. (4 months)