User:Cjl/cni-dictionary: Difference between revisions
Latest revision as of 03:59, 21 October 2012
The goal is to transform a 456-page image PDF of an Asháninka-Spanish and Spanish-Asháninka dictionary into OCR'ed digital text. That digitized text is the starting point for any number of further efforts.
Here is the full image PDF, File:Dt19.pdf, which has been split into individual pages and grouped into the batches below for processing via OCR.
In preparation, download one of the zip files below (each contains 10 page images) and mark it as reserved on the wiki.
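As a quick consistency check on the batching, the arithmetic can be sketched in Python. This assumes pages are numbered from 1 and assigned to batches in order, 10 per batch; the function names are illustrative only and not part of any tool used here. Under that assumption, 456 pages need 46 batches, which matches the batch list below, and the last batch holds only 6 pages.

```python
import math

TOTAL_PAGES = 456      # page count stated for Dt19.pdf
PAGES_PER_BATCH = 10   # each zip holds 10 page images

def batch_count(total_pages: int = TOTAL_PAGES, per_batch: int = PAGES_PER_BATCH) -> int:
    """Number of zip batches needed; the last one may be partial."""
    return math.ceil(total_pages / per_batch)

def batch_for_page(page: int, per_batch: int = PAGES_PER_BATCH) -> int:
    """Batch number for a 1-based page, ASSUMING pages are batched in order."""
    if page < 1:
        raise ValueError("pages are numbered from 1")
    return (page - 1) // per_batch + 1

def pdf_batch_name(batch: int) -> str:
    """Zip name in the Batch-cni-dict-PDFnn pattern used on the wiki."""
    return f"Batch-cni-dict-PDF{batch:02d}.zip"

if __name__ == "__main__":
    print(batch_count())                        # 46
    print(pdf_batch_name(batch_for_page(456)))  # Batch-cni-dict-PDF46.zip
```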
1) Go to: http://www.onlineocr.net/Default.aspx
2) Scroll down to the bottom of the page.
3) Click "Browse" (lower right).
4) Select the next file from wherever you saved it.
5) Click "Upload".
6) Set "Recognition language:" to Spanish (pulldown menu).
7) Set "Output format:" to Text Plain (txt) (pulldown menu).
8) Enter the 6-digit CAPTCHA number.
9) Click "Recognize" (lower left) and wait for it to finish processing.
10) Scroll further down to the bottom of the page.
11) Click "Download Output File".
12) Save the file as Pagennn.txt (where nnn is the page number).
13) Repeat from step 3 with the next file. The pulldown settings should be preserved, but check them visually to be sure.
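Before zipping a batch, it can help to confirm that every page produced an output file. The sketch below assumes that nnn in Pagennn.txt is the zero-padded three-digit page number and that batch nn covers pages 10(nn-1)+1 through 10nn, capped at page 456; both are assumptions, and the helper names are hypothetical.

```python
import tempfile
from pathlib import Path

TOTAL_PAGES = 456
PAGES_PER_BATCH = 10

def pages_in_batch(batch: int) -> range:
    """Pages covered by a batch, ASSUMING batches are consecutive runs of 10 pages."""
    first = (batch - 1) * PAGES_PER_BATCH + 1
    last = min(batch * PAGES_PER_BATCH, TOTAL_PAGES)
    return range(first, last + 1)

def output_name(page: int) -> str:
    """Step 12 file name, ASSUMING nnn is the zero-padded page number."""
    return f"Page{page:03d}.txt"

def missing_outputs(folder: Path, batch: int) -> list[str]:
    """Expected Pagennn.txt files for the batch that are not yet in folder."""
    return [output_name(p) for p in pages_in_batch(batch)
            if not (folder / output_name(p)).is_file()]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        folder = Path(d)
        for page in range(11, 20):  # pretend pages 11-19 have been saved
            (folder / output_name(page)).write_text("...", encoding="utf-8")
        print(missing_outputs(folder, 2))  # ['Page020.txt']
```

An empty list means the batch is ready to be packaged.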
When complete, place the resulting text files into a folder named "Batch-cni-dict-TXTnn" (where nn is the batch number), zip it, and upload the .zip file to the wiki. Mark that batch as complete.
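The packaging step can also be scripted. This is a minimal sketch using Python's standard zipfile module; it assumes the finished Pagennn.txt files sit together in one local folder, and zip_batch is a hypothetical helper name. The text files are stored under a top-level Batch-cni-dict-TXTnn/ directory, so the archive unpacks into a folder with the name the instructions ask for.

```python
import tempfile
import zipfile
from pathlib import Path

def txt_batch_name(batch: int) -> str:
    """Folder name in the Batch-cni-dict-TXTnn pattern, with nn zero-padded."""
    return f"Batch-cni-dict-TXT{batch:02d}"

def zip_batch(src: Path, batch: int, dest_dir: Path) -> Path:
    """Write dest_dir/Batch-cni-dict-TXTnn.zip containing src's .txt files."""
    name = txt_batch_name(batch)
    zip_path = dest_dir / f"{name}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for txt in sorted(src.glob("*.txt")):
            # Store each file inside the named folder, not at the archive root.
            zf.write(txt, arcname=f"{name}/{txt.name}")
    return zip_path

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = Path(d)
        for page in (11, 12):
            (src / f"Page{page:03d}.txt").write_text("...", encoding="utf-8")
        out = zip_batch(src, 2, src)
        print(out.name)  # Batch-cni-dict-TXT02.zip
        with zipfile.ZipFile(out) as zf:
            print(zf.namelist())  # ['Batch-cni-dict-TXT02/Page011.txt', 'Batch-cni-dict-TXT02/Page012.txt']
```

Only files ending in .txt are included, so stray notes or the zip itself are not swept into the archive.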
- File:Batch-cni-dict-PDF01.zip complete
- File:Batch-cni-dict-PDF02.zip complete
- File:Batch-cni-dict-PDF03.zip complete
- File:Batch-cni-dict-PDF04.zip
- File:Batch-cni-dict-PDF05.zip
- File:Batch-cni-dict-PDF06.zip
- File:Batch-cni-dict-PDF07.zip YP complete
- File:Batch-cni-dict-PDF08.zip
- File:Batch-cni-dict-PDF09.zip
- File:Batch-cni-dict-PDF10.zip
- File:Batch-cni-dict-PDF11.zip
- File:Batch-cni-dict-PDF12.zip
- File:Batch-cni-dict-PDF13.zip
- File:Batch-cni-dict-PDF14.zip
- File:Batch-cni-dict-PDF15.zip
- File:Batch-cni-dict-PDF16.zip
- File:Batch-cni-dict-PDF17.zip
- File:Batch-cni-dict-PDF18.zip
- File:Batch-cni-dict-PDF19.zip
- File:Batch-cni-dict-PDF20.zip
- File:Batch-cni-dict-PDF21.zip
- File:Batch-cni-dict-PDF22.zip complete
- File:Batch-cni-dict-PDF23.zip
- File:Batch-cni-dict-PDF24.zip
- File:Batch-cni-dict-PDF25.zip
- File:Batch-cni-dict-PDF26.zip
- File:Batch-cni-dict-PDF27.zip
- File:Batch-cni-dict-PDF28.zip
- File:Batch-cni-dict-PDF29.zip
- File:Batch-cni-dict-PDF30.zip
- File:Batch-cni-dict-PDF31.zip
- File:Batch-cni-dict-PDF32.zip
- File:Batch-cni-dict-PDF33.zip
- File:Batch-cni-dict-PDF34.zip
- File:Batch-cni-dict-PDF35.zip
- File:Batch-cni-dict-PDF36.zip
- File:Batch-cni-dict-PDF37.zip
- File:Batch-cni-dict-PDF38.zip
- File:Batch-cni-dict-PDF39.zip
- File:Batch-cni-dict-PDF40.zip
- File:Batch-cni-dict-PDF41.zip
- File:Batch-cni-dict-PDF42.zip
- File:Batch-cni-dict-PDF43.zip
- File:Batch-cni-dict-PDF44.zip
- File:Batch-cni-dict-PDF45.zip complete
- File:Batch-cni-dict-PDF46.zip complete
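With 46 batches, a short script can summarize the status notes above. This sketch assumes the notes follow the pattern used so far: nothing after the file name for an open batch, and the word "complete" (optionally preceded by initials, as in "YP complete") for a finished one. It accepts lines either as rendered here or in the wiki source form, such as "* [[File:Batch-cni-dict-PDF07.zip]] YP complete". The helper names are hypothetical, and the "reserved" note in the sample is only an illustration, since the exact wording used for reservations is not specified.

```python
import re

# Matches both "- File:Batch-cni-dict-PDF07.zip YP complete" (rendered)
# and "* [[File:Batch-cni-dict-PDF07.zip]] YP complete" (wiki source).
STATUS_LINE = re.compile(r"Batch-cni-dict-PDF(\d+)\.zip\]*\s*(.*)$")
# Whole word only, so a note such as "incomplete" is not counted as done.
COMPLETE = re.compile(r"\bcomplete\b", re.IGNORECASE)

def parse_status(lines):
    """Map batch number -> note text after the file name ('' if none)."""
    status = {}
    for line in lines:
        match = STATUS_LINE.search(line)
        if match:
            status[int(match.group(1))] = match.group(2).strip()
    return status

def summarize(status):
    """Split batches into open (no note), complete, and other (e.g. reserved)."""
    open_, done, other = [], [], []
    for batch, note in sorted(status.items()):
        if not note:
            open_.append(batch)
        elif COMPLETE.search(note):
            done.append(batch)
        else:
            other.append(batch)
    return open_, done, other

if __name__ == "__main__":
    sample = """\
- File:Batch-cni-dict-PDF06.zip
- File:Batch-cni-dict-PDF07.zip YP complete
* [[File:Batch-cni-dict-PDF08.zip]]
* [[File:Batch-cni-dict-PDF09.zip]] reserved
"""
    print(summarize(parse_status(sample.splitlines())))  # ([6, 8], [7], [9])
```

Keeping "other" as its own bucket means a reserved but unfinished batch is neither offered as open nor counted as complete.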