Health portal/Step2

From OLPC





Revision as of 15:30, 14 May 2008

Execute these 4 commands:

wget -rp -l1 -o logfile1 http://www.nlm.nih.gov/medlineplus/healthtopics.html

Downloaded: 184 files, 2.7M in 15s (176 KB/s)

Cumulative downloaded: 184 files, 2.7M

wget -rp -l1 -o logfile2 http://www.nlm.nih.gov/medlineplus/spanish/healthtopics.html

Downloaded: 168 files, 2.5M in 14s (185 KB/s)

Cumulative downloaded: 276 files, 4.9M

wget -rp -l1 -o logfile3 http://www.nlm.nih.gov/medlineplus/all_healthtopics.html

Downloaded: 1378 files, 49M in 2m 42s (310 KB/s)

Cumulative downloaded: 1552 files, 53M

wget -rp -l1 -o logfile4 http://www.nlm.nih.gov/medlineplus/spanish/all_healthtopics.html

Downloaded: 1308 files, 30M in 2m 18s (224 KB/s)

Total Downloaded: 2297 files, 73M
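
The cumulative figures above are lower than the sum of the per-run figures, presumably because the runs share files. One way to check what actually ended up on disk is to count the files in the mirror directory. This is a minimal sketch: the helper name count_files is made up for illustration, and the directory name assumes wget's default of creating a folder named after the host.

```shell
# Count the regular files under a mirror directory.
# (File names containing newlines would be miscounted.)
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Usage, assuming wget created a folder named after the host:
#   count_files www.nlm.nih.gov
#   du -sh www.nlm.nih.gov     # total size on disk
```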


Good guidance here: http://www.nlm.nih.gov/medlineplus/faq/copyrightfaq.html

Per that FAQ, the homepage, the summaries on the Health Topics pages, the FAQs, and the same pages on MedlinePlus en español are all copyright free.


Problematic areas (Copyright)

The A.D.A.M. Medical Encyclopedia includes over 4,000 articles about diseases, tests, symptoms, injuries, and surgeries. It also contains an extensive library of medical photographs and illustrations.



http://www.patient-education.com/nlm/terms/


http://www.nlm.nih.gov/medlineplus/languages/languages.html


Need to review the images grabbed: keep the navigation images, lose the A.D.A.M. pictures. Replace some with CC images or drop them altogether. Try running the commands above without the -p flag set and compare the filesets, then transfer over the page-element images that are still needed. Use the names of the images to be excluded to hunt them down in the HTML for editing out.
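
The fileset comparison could be sketched as follows, assuming the two runs (with and without -p) were saved into separate directories. The helper name only_in_first and the directory names in the usage comment are hypothetical.

```shell
# List files present in the first tree but missing from the second,
# as paths relative to each tree's root. The function body runs in a
# subshell, so its variables do not leak into the caller.
only_in_first() (
    a=$(mktemp) && b=$(mktemp) || exit 1
    (cd "$1" && find . -type f | sort) > "$a"
    (cd "$2" && find . -type f | sort) > "$b"
    comm -23 "$a" "$b"    # lines that appear only in the first list
    rm -f "$a" "$b"
)

# Usage (hypothetical directory names):
#   only_in_first with-p/www.nlm.nih.gov without-p/www.nlm.nih.gov
```

The output should be roughly the set of page requisites (images, stylesheets) that -p pulled in, which is the list to review.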

Need to trim some stuff in each downloaded health topic:

  • Tone down the NIH branding while preserving attribution; for instance, lose or modify the footer. Keep some attribution and preserve a link back to the NIH site for original or updated content.
  • Need to figure out what to do with next-hop links. They lead to material that may be copyrighted or too US-specific. Could delete big chunks at the bottom of each page.
  • Nice feature to look at: in its current form I think it should allow easy back and forth between English and Spanish.
  • It might make sense to restructure globally for easier i18n/l10n. For instance, this currently uses a separate folder for Spanish; that is fine, but try to i18n-ize the structure so that there are one or more lang-xx folders.
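
For the next-hop links, a first step might be to list where they point. The sketch below prints each distinct absolute link that leaves a given host; the helper name external_links is made up, and the -r, -o and --include options assume GNU or BSD grep.

```shell
# Print each distinct absolute link target found in the HTML files under
# a directory, skipping links that stay on the given host.
# Note: the host is used as a regex, so its dots match any character.
external_links() {
    grep -rhoiE --include='*.html' 'href="https?://[^"]+"' "$1" \
        | sed -e 's/^[^"]*"//' -e 's/"$//' \
        | grep -viE "^https?://$2" \
        | sort -u
}

# Usage, assuming wget's default host-named folder:
#   external_links www.nlm.nih.gov www.nlm.nih.gov
```

Relative links and links without double quotes are not caught, so treat the list as a lower bound.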


Get my grep and Perl skills polished for bulk global edits on whole directories of files at once; must find my copy of the llama book.