Semantic MediaWiki

From OLPC
Revision as of 12:08, 9 December 2008 by Skierpage (talk | contribs) (→‎Implementation: update with actual steps)
   This page is part of the Wiki Cleanup Project.   [[ Wiki SEO | Cleanup | Wiki tasks ]]

There's a lot of information on wiki.laptop.org. wikipedia:Semantic MediaWiki is an extension that lets you annotate existing wiki text and templates so that you can query and explore this information. Done right, we get less duplication, more standardization, and easy information reuse.

mw:Semantic Forms is another extension that exploits SMW to let users edit template information in a form.

Where Semantics are being used

You can browse Special:Properties to see properties in use. You can use Special:Allpages to display Form: pages.

For tests

Many tests are using semantic annotations, so we can query for them. See Test cases 8.2.0 and Testcase Query Examples.

To display query results so that they look like the test case (green badge), use format=embedded. To display in some other format, someone has to create Template:Display_query_row_in_a_gray_box and use format=template | template=Display query row in a gray box

Some test results are also using semantic annotations, see TestResults 8.2.0.

Issues for Test cases

Structuring Test cases

We are using long titles that identify area, e.g. Tests/Sugar_Control_Panel/About_Me/Color_Change. Putting the organization in the name ensures test cases will appear in a useful order in generic queries.

Could also/instead use subcategories of Category:Test cases to identify the kind of test, but Semantic Forms has limited capability to assign categories. A query for a category will find things in its subcategories, so it's fine to have a detailed category hierarchy.

It might be possible to query for test cases "like" Tests/Sugar_* to show only certain test cases.

Issues for test results

The form editor lets you add multiple test results to a test case. So a page like Tests/Upgrades/DataIntegrity can have both values for Property:PassFail, and you can't tell in a query which one is associated with a particular build. To isolate each result, you would have to create a separate page for each combination of test case and build, e.g. TR_Build2230/Upgrades/DataIntegrity. That's a lot of pages to name, although Semantic Forms can automatically generate a unique page as you fill in a form. Is there any reason to have semantics for each test result, when you can simply visit a page with the test results?
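The one-page-per-result naming scheme above can be sketched in Python. This is only an illustration: the function name result_page_title is hypothetical, and the assumption that the leading "Tests/" is dropped comes from comparing the two example titles above.

```python
def result_page_title(test_case: str, build: int) -> str:
    """Build a per-result page title such as TR_Build2230/Upgrades/DataIntegrity."""
    # Assumption: the "Tests/" area prefix is dropped and the rest of the path is kept.
    prefix = "Tests/"
    path = test_case[len(prefix):] if test_case.startswith(prefix) else test_case
    return f"TR_Build{build}/{path}"


print(result_page_title("Tests/Upgrades/DataIntegrity", 2230))
```

With one page per test case and build, each page would carry a single Property:PassFail value, so a query could tell the results apart.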

Details: Form:Test_case somehow adds a Template:Test results for each test result. There's also Form:Test Results, which brings up a broken editing form and invokes a non-existent Template:Test Results, see Try_to_break_2.

  • TODO: figure out a generic way to have a link _Log a test result_ that when clicked goes to Form:Test_result and the wiki editor fills in a form that creates a test result linked to the original page. Then any page with a test on it can be "filled out" with test results that anyone can enter!

For projects and tasks

See Form:Tasks and Form:Project, and their respective template pages Template:Task and Template:Project. We would like to use the Project and Task forms to track projects, and to some extent replace the project database. The Task template should be for small open tasks that will then be listed on aggregate pages based on various properties like priority and skill set required.

For deployments

All deployments have the deployment template on them. They also have the "edit with form" tab on the main page. You can see this on, for example, OLPC Peru. This information is aggregated on the page Deployments, using a semantic query. Also see Deployment query examples.

For activities

There is currently the Template:Activity page that annotates general information about activities. Form:Activity fills this in. There is also a Template:Activity bundle that allows for multiple bundle versions so that you can keep track of, for example, the last activity version that works with build 656 separately from the latest version.

The "To add another activity version that works with other builds, click 'add another'" form interface works nicely for editing, but doesn't work for querying! It just adds more Property:Activity version and Property:Activity bundle values to the page; they aren't grouped together in any way except visually. This is the same underlying issue as adding multiple Test results to a test case, see #Issues for test results. -- Skierpage 01:08, 14 September 2008 (UTC)

We need to fix the main template so that it will also include a place for the .xo bundle.

For an example query page, see Activity_query_examples.

Adapting activities and their templates to use Semantic Forms would make it easier to

  • maintain the Activities page
  • show the status of activities in various activity packs
  • query for things like activities that don't have a translation status.

Activity update?

The XO's Software update panel (new in release 8.2.0) consults wiki pages like Activities and Activities/Joyride to figure out whether a newer version of an activity is available. Perhaps these pages could be generated by querying for the properties of activities, but the resulting HTML has to match a rigorous Activity microformat.

Other things to do ??

People want changes but what exactly?

For releases

The Releases page had duplicated summary information for each release. In September 2008 this info was annotated in each release page (see e.g. 8.1.0), so you could query for it. Some queries:

Also see Releases-test page.

ISSUE: Merging release page and "Release notes"

As of September 2008, most releases have a page that only contains the semantic info, and a separate Release_notes/xxx page with all the information about the release, for example 8.1.1 and Release notes/8.1.1. This is a hassle for editors who have to maintain two pages and the link between them, and for users who follow links to a build and then have to go to its release notes.

User:CScott experimented with moving the semantic info for 8.1.0 into Release notes/8.1.0, but that means in a query of e.g. build numbers and firmware versions the "release" appears as Release notes/8.1.0. And it's strange for a page named "Release notes/8.1.0" to make semantic statements like "my ECO is xyz", "my firmware version is nnn", etc.

User:Skierpage thinks the solution is to have pages named e.g. 9.1.0 with the semantic information in a "Standard information" section. Then as release note information becomes available, editors add it to this page in additional sections and categorize the page as Category:Release notes. There is no compelling need for release notes to be subpages of "Release notes". There is unlikely to be title conflict between release numbers and existing pages.

User:CScott also removed past release info from Releases and put it in Release notes, turning the former into Future releases. This also shows the distinction between releases and release notes is not useful. As a release page firms up, it gets release note information.

Info for each release

Here are some properties that releases may have, based on the Releases page and e-mail message "Re: [sw-eco] q2e11 firmware for 8.1.2 ECO" to devel@laptop.org of 2008-07-15:

Future releases vs. released releases

A future release has

  • [[Build number::999]] — it doesn't have a known build
  • [[Status::anything but Released]]
  • [[Target date::some date]] and no [[Release date]]

A release that's happened has

  • [[Build number::NNN]]
  • [[Status::Released]]
  • [[Release date::some date]] and no [[Target date]]
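
The two profiles above amount to a consistency rule that could be checked mechanically. The following Python sketch assumes a release page's annotations have already been read into a dict keyed by property name; the function name release_kind and the sample values in the usage line are hypothetical.

```python
PLACEHOLDER_BUILD = 999  # used on future releases that have no known build


def release_kind(props: dict) -> str:
    """Classify a release page as 'future', 'released', or 'inconsistent'."""
    status = props.get("Status")
    build = props.get("Build number")
    has_target = "Target date" in props
    has_release = "Release date" in props

    # A future release: placeholder build, not Released, Target date only.
    if (build == PLACEHOLDER_BUILD and status != "Released"
            and has_target and not has_release):
        return "future"
    # A release that has happened: real build, Released, Release date only.
    if (build not in (None, PLACEHOLDER_BUILD) and status == "Released"
            and has_release and not has_target):
        return "released"
    return "inconsistent"


# Sample values below are made up for illustration.
print(release_kind({"Build number": 999, "Status": "Planned", "Target date": "2009-03-01"}))
```

A page that mixes the two profiles, for example Status::Released together with a Target date, comes back as "inconsistent" and is a candidate for cleanup.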

Release info issues

  • If Greg wants to re-display additional status information in the queries on the Releases page like "Candidate build is out - Join the test effort!", we could add Property:More info to release pages and display in queries.
  • The Releases page used to show
    • link to release notes
    • URL for build directory
    • each objective in a bulleted list
Displaying these in the queries would require creating a query results template.

ECOs?

Each ECO is associated with a release and has info like:

  • Title: string (not always a release)
  • Date proposed: date
  • Target date: date, sometimes fuzzy like "week of 2008-03-02"
  • Trac items: multiple bug numbers (and manual copy of their text)
  • Date released
  • Priority: "normal", "high", with string
  • Champion: usually someone with a wiki page.
  • Reviewers: wiki text, often includes "triage team" and/or "Testing: 1 hour smoke"
  • Special testing required: string
  • Rollout: "Manufacturing" and "Field"
  • Checklist: wiki page for URL, usually <ECO page_name>_Checklist

It's unclear if any of this is worth annotating. Maybe a table of ECOs with their build and firmware, but the release is tied to an ECO, not vice-versa.


Reusing release values

wiki.laptop.org uses three methods to display a changing value like the release candidate on multiple pages without having to update each one.

latest release is ==> 13.2.11 || 2020-01-29
current testing stream & release is ==> official & os860
current testing stream & release is ==> {{#ask: Friends in testing/current image stream | ?Current image stream for testing=}} & {{#ask: Friends in testing/current image number | ?Current image number for testing=}}
#show should also work... {{#show: Friends in testing/current image stream ?Current image stream for testing}} & {{#show: Friends in testing/current image number ?Current image number for testing}}
As of 2008-08-07 this last semantic approach is implemented completely the wrong way; these should be generic properties Property:Stream and Property:Build associated with a descriptive page like Testing build or Stable release, not specific properties on specific page names. -- Skierpage 10:27, 12 August 2008 (UTC)

For events

Annotate events, jams, and meetings with Property:Start date, and then other pages can query to get a list or timeline of future events. See the queries on Jams and Events#Upcoming events on wiki.laptop.org.

  • Also give events a Property:End date; this is optional for meetings and jams that are usually one day.

Where semantics could be used

For translations?

The main translation infrastructure on the wiki requires that for every page that has translations you maintain a little subpage linking the different translations together. For example, Software components has Software components/translations that lists

 {{translationlist | es | it | ja | ko | orig=en }}

As an alternative, the Semantic-MediaWiki.org site uses a "Docu" template that queries to find all other pages based off the same master page and displays a translation bar linking to them. The existing Template:translation could probably be adapted to work with semantics, since the information that this query needs is already in pages, such as:

 {{ Translation | lang = de  | source = Translating/HowTo  | version = 54321 }}

The benefit is no one has to maintain the "Page foo/translations" page, and there's only one template to use, instead of Translation and Translations. (The existing Template:Translations would call this, defaulting to lang=en and source={{PAGENAME}}.)
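
To make the idea concrete, here is a small Python sketch of the grouping such a query would perform: given the lang and source values that each Translation template already records, collect every page that shares a source. The function name translation_bar and the page titles in the sample data are hypothetical.

```python
def translation_bar(pages: dict, source: str) -> list:
    """Return (lang, page title) pairs for every page translated from `source`."""
    entries = [
        (info["lang"], title)
        for title, info in pages.items()
        if info["source"] == source
    ]
    return sorted(entries)  # sorted by language code


# Hypothetical pages, each with the values its Translation template would set.
pages = {
    "Translating/HowTo/lang-de": {"lang": "de", "source": "Translating/HowTo"},
    "Translating/HowTo/lang-es": {"lang": "es", "source": "Translating/HowTo"},
    "Software components/lang-ja": {"lang": "ja", "source": "Software components"},
}
print(translation_bar(pages, "Translating/HowTo"))
```

Because the list is derived from the pages themselves, adding a new translation would need no edit to a separate /translations subpage.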

For software features

See Features-test page for some sample work.

Greg in October e-mail said

I want to come up with a set of feature, then slice them by a couple of different "selects".
That is:
  • list all features by requestor (e.g. Peru or Scott)
  • list them by technology
  • list them by strategic relevance (e.g. stability)
  • probably some others (e.g. developer, contacts etc.)
I also want to have 2 - 3 levels of detail on each. The main page can list the characteristics per above. Then each feature can link to a detailed requirements definition and a specification (or design proposal).

User:Skierpage comments

As we can see with Release notes and Future releases, wiki users get frustrated with the summaries generated by queries, so they write and maintain their own summaries alongside query results. So unless people are willing to have a "9.1.0 features" summary that they can't edit, annotating all this data will add more information and work!
  • A details page for each feature, in Category:Software features
  • Each feature page has 0-to-many Property:Requested by saying who requests it.
    • This could be a many-valued property -- "requested by OLPC Peru, with priority High", though they're a little clunky to edit.
    • A deployment or user can still easily "list all pages requested by me", just query [[Requested by::{{FULLPAGENAME}}]].
    • Note the inverse, adding 0-to-many Property:Requests feature to each deployment page, makes it very difficult to have a summary table showing a feature, its technology, and who requests it (you can't really do a join between "list of feature pages" and "pages requesting features").
  • Each feature page can have various technology categories (see CScott's Category:Subsystems?). A simple query will show all its categories.
  • I'm not sure what property or category to use for strategic relevance...
  • 2-3 levels of details sounds like Activities' use of Property:Short description and Property:Description. Note multiple "description" properties frustrate editors and get misused or stale, especially if they're in a form that makes people repeat what they already said in opening paragraphs.

Looking at Feature roadmap, a country request like "Collaboration in groups" decomposes into one or more requirements. Does a country request map to a feature, or is each requirement a separate feature, or is a feature composed of requirements from various requestors? People might not like having their specific request mapped to a feature: "That's not what I asked for, it's just something related you decided to do!" It's possible to model grouping and subsetting in SMW, but it means more pages to edit and more complicated queries.

Later Greg wrote

  • Then I plan to start tagging items for 9.1 and adding other tags. For 9.1, a quick filter will show what is top priority for that release. In the end I want it to look like this: https://launchpad.net/ubuntu/+specs

User:Skierpage comments:

You can implement properties corresponding to all the fields in that "Ubuntu Blueprints" link (Has_design_status, Has_delivery_status, Is_assigned_to, For_build_stream). It might be simpler to just have a Property:Feature_has_status on each feature page with allowed values like "high priority for 9.1.0", "9.1.0 candidate", "may defer post-9.1.0", etc.

Software features December proposal

December 2 Greg wrote to wiki-gang@laptop.org (skierpage comments indented)

How about we start by taking sections 6.2 - 6.8 and adding Semantic tags to that.

Each feature should link to its own page. All the fields for a feature will be visible on its detail page.

The main feature roadmap page will only show a query. That query will show:

  • Name
Greg wants the page titles prefixed "Feature roadmap/name of feature".
  • Primary requester (this field does not exist right now. We should be forced to pick one, then the rest can be in a new field called other requesters, or all requesters and it will show on the detail page. We can start by just changing the existing field and leaving "primary" blank until its filled in).
Done: Property:Primary requester; also Property:Requested by that you can use more than once.
  • Category (this will be the .n level feature section, e.g. "activity-related work" etc.)
OK, subcategories of Category:Software features. There's also User:Mstone's idea of Category:Subsystems
  • Helps deployability Y/N (new field, binary)
Done, Property:Helps deployability
  • Target for 9.1 Y/N (new field, binary)
Done, Property:Target for 9.1. BUT doesn't parse, maybe "Is planned for 9.1" is better? Also is inflexible, could instead do Feature_status::"Target for 9.1"/Defer/etc., or Targeted_for_release::"9.1.0"/9.1.1/9.2.0/Future/ that are more flexible.
  • Assign OLPC resources Y/N (new field, binary)
Doesn't parse, is this phrase a command, a TODO, or ??? Would it be more flexible to have Property:Team member::User:CScott , etc.?
  • Owners (same as it is today)
Instead let's use Property:Contact person, which you can use more than once.
  • Priority (Critical, High, Medium, Low)
Done, Property:Priority, with numbers in front so they sort right and you can query for priority < 2. ISSUES: Should these match trac's blocker, high, normal, low?

Start with a query sorted by Category.
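
The reason for putting numbers in front of the priority values can be shown with a short Python sketch. The exact value strings ("1 Critical", "2 High", and so on) and the feature names are assumptions for illustration; the page only says that the values carry a leading number.

```python
def roadmap_order(rows: list) -> list:
    """Sort by Category, then by Priority; the leading digit makes text sort work."""
    return sorted(rows, key=lambda row: (row["Category"], row["Priority"]))


def priority_below(rows: list, level: int) -> list:
    """Mimic a 'priority < level' query by comparing the leading number."""
    return [row for row in rows if int(row["Priority"].split()[0]) < level]


# Hypothetical feature pages and priority strings.
features = [
    {"Name": "Feature roadmap/feature A", "Category": "Network", "Priority": "2 High"},
    {"Name": "Feature roadmap/feature B", "Category": "Collaboration", "Priority": "3 Medium"},
    {"Name": "Feature roadmap/feature C", "Category": "Network", "Priority": "1 Critical"},
]
for row in roadmap_order(features):
    print(row["Category"], "|", row["Priority"], "|", row["Name"])
```

Without the numeric prefix, a plain text sort would put "Critical" before "High" but "Low" before "Medium", which is not the intended order.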

Details

Subcategories of Category:Software features, or use Property:Is part of some Category:Subsystems? The latter is User:Mstone's attempt to organize Subsystems, unfortunately it doesn't quite map to the subsections of the feature roadmap.

The subsections of Feature roadmap#Roadmap are as of 2008-12-02
Activity-related work
Power management
Hardware support
Collaboration
Performance
Reliability
Journal, File Manipulation
Localization
Security, activation and deployability
Network
GUI, Usability
Server
Other
The subcategories of Category:Subsystems are (dynamic query)
{{#ask:
 | format=list
 | link=all
 }}

Implementation

To repeat this,

  1. Edit Feature roadmap#Roadmap
  2. Open vim, :set encoding=utf-8
  3. Copy the wiki markup into vim
  4. Fix Name= to be lower-case feature names.
  5. Add some formatting to it using perl -n fixname.pl, or the vim commands are roughly:
    • :%s/{{Feature_request/{{-stop-}}^V^M{{-start-}}&/ add start and end markers
    • :%s/{{Feature_request/{{XFeature_request/ different template
    • :%s:|Name=\s*:Name=Feature roadmap/: put page into sub-area
    • Also need the page subsection to be a property since subcategories don't work well, so add |feature_subcategory=Activity-related work to XFeature template
  6. Clean up the resulting script: remove extra whitespace and some stray text after clipboard feature.
  7. Configure Pywikipedia bot using lfaraone's config file e-mailed to wiki-gang.
  8. Use Pywikipedia bot's Pagefromfile command to load pages:
 path/to/python login.py
 path/to/python pagefromfile_x.py -debug -file:feature_pages.txt -include  -titlestart:"Name=" -titleend:"\n"

or, if using perl,

 path/to/python pagefromfile_x.py -debug -file:feature_pages.txt -notitle
  9. Remove -debug and run again if it looks good.
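
The vim substitutions in step 5 can be approximated in Python. This is a rough sketch, not the fixname.pl script mentioned above. The function name to_pagefromfile is hypothetical, it keeps the leading pipe on Name=, it does not add the feature_subcategory parameter, and it assumes that only template text follows each {{Feature_request, so stray text still needs the manual cleanup of step 6.

```python
import re

START = "{{-start-}}"  # page markers, as used in the vim command above
STOP = "{{-stop-}}"


def to_pagefromfile(markup: str, prefix: str = "Feature roadmap/") -> str:
    """Wrap each Feature_request template in start/stop markers for Pagefromfile."""
    # Anything before the first template is dropped.
    chunks = markup.split("{{Feature_request")[1:]
    pages = []
    for chunk in chunks:
        # Switch to the different template name used for the new pages.
        body = "{{XFeature_request" + chunk.rstrip()
        # Put the page into the sub-area by prefixing the feature name.
        body = re.sub(r"\|Name=[ \t]*", "|Name=" + prefix, body, count=1)
        pages.append(f"{START}\n{body}\n{STOP}")
    return "\n".join(pages)


sample = "{{Feature_request\n|Name= clipboard\n|Priority=1\n}}\n{{Feature_request\n|Name=zoom\n}}\n"
print(to_pagefromfile(sample))
```

The output can be saved as feature_pages.txt and checked by eye before running the bot with -debug.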

For content repositories?

User:Lauren and others created lots of pages in Category:Content Repository about various online libraries and collections. These pages have information such as

  • Format
  • Scope
  • Multilingualism
  • Quality

which could be made into semantic annotations for properties with these names, so you could query "Show me the name, url, and formats of all content repositories of topic::music with quality::high."

These seem to come from Template:Content item (maybe there's a "Create content item" link or option somewhere?). However, skierpage looked at a dozen of these pages in September 2008 and in most of them these fields aren't filled out.

To Do list

Semantic Google maps

To embed google maps in the wiki with locations of projects/deployments/events and with links to appropriate wiki pages, we need

Meanwhile, people have made their own maps on Google
-- Skierpage 02:55, 1 October 2008 (UTC)

Easy Fixes

(ie just edit a little bit of code)


Auto generate templates
Make template an activity and edit with form
Move around the <noinclude> tags
But when empty it should show something
Make a tag that says this is not semantic stuff
Ignore semantics in a specific name space
The wiki already ignores semantic annotations in the Template: namespace. -- Skierpage 03:01, 1 October 2008 (UTC)


Alt text not working in templates
Feature part of Tests/Sugar_Control_Panel/About_Me/Color_Change
Actually this is hard to fix in SMW. If something is of Type:Page, you can't provide alt text to display in query results instead of the page link, even though it's trivial to do in wiki markup. -- Skierpage 03:01, 1 October 2008 (UTC)
Add new types
  • It's easy to add a property that has some built-in type (like [[Has type::Text]]) and restrict it to certain values and maybe control its format.
  • It's possible to add a new "linear" type that converts between different units, e.g. a Type:Area
  • It's hard to add other kinds of types; it requires writing PHP code.
  • -- Skierpage 10:42, 31 July 2008 (UTC)
Time duration
If you want auto-conversion between e.g. weeks and days and seconds, copy what you want from [1] -- Skierpage 10:42, 31 July 2008 (UTC)
User
Not sure what this means. If it's links to User:skierpage then maybe just create Property:User of type:page.
Trac bug number
This is available as Property:Trac bug number of type:Number. You should be able to use Service links and templates in query results to get this to display as a link. But numbers appear with a comma for the thousands separator. Could instead make it Type:URL and you use a template to fabricate the URL, but then it would appear as a long URL in query results. It might be possible to write PHP code to create a Type:Trac that displays appropriately, similar to how the <trac>4321</trac> code displays.
Image
Just some property of Type:Page (the default); in most cases SMW does the right thing and displays the image.
Template?
Make the free text box larger.
When creating a template, you should be able to create properties in that window as well as use already created properties.


Way to 'prettify' links when used in SMW properties with a "Page" type.
Example: Color Change instead of Tests/Sugar_Control_Panel/About_Me/Color_Change.
See #Suggestions_for_test_cases. Maybe have a "Brief name" property and display that while linking to the full page name. Or, in queries use format=template and in the template use a parser string function to change the title from Tests/Sugar_Control_Panel/About_Me/Color_Change. Or, use categories to identify the hierarchy. You risk collision if the page title is just Color change, so you could have a little "TC" prefix, just TC/Color Change or a pseudo-namespace TC:Color_change
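
The string manipulation involved is small. The Python sketch below shows the transformation a "Brief name" property or a parser string function would have to perform, plus a check for the title collisions mentioned above. Both function names are hypothetical, and the second title in the usage example is made up.

```python
from collections import defaultdict


def brief_name(title: str) -> str:
    """Keep only the last path segment of a title and turn underscores into spaces."""
    return title.rsplit("/", 1)[-1].replace("_", " ")


def find_collisions(titles: list) -> dict:
    """Map each brief name shared by more than one page to the full titles using it."""
    by_name = defaultdict(list)
    for title in titles:
        by_name[brief_name(title)].append(title)
    return {name: pages for name, pages in by_name.items() if len(pages) > 1}


print(brief_name("Tests/Sugar_Control_Panel/About_Me/Color_Change"))
print(find_collisions([
    "Tests/Sugar_Control_Panel/About_Me/Color_Change",
    "Tests/Example_Area/Color_Change",  # made-up title to show a collision
]))
```

The collision check illustrates why displaying a brief name while still linking to the full page name is safer than renaming pages to the short form.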


Translating Property names
Nathany on #cc


When making a template add a more organized presentation of properties

Complex fixes

(ie require creating a new extension)


Something in between a list of specific allowed values and free text
I want to be able to have the equivalent of an other field
One approach: have two properties like Property:Version and Property:Version_more that you always display together, the first has restricted values, the other is text. Another is to use Type:Page and have pages for the allowed values. So not-allowed values "work", but stand out in red.


Way to automatically email users who maintain pages


Automatically populate data
ex page last updated case
SMW 1.1 can't query on or display such metadata. I think other extensions like DPL can.
Way to pull %translated from pootle
Could have a bot that gets this information and populates wiki pages with it.

Semantic annotations

The wiki has the wikipedia:Semantic MediaWiki extension installed; the thing to do is use it appropriately.

Successful rollout suggestions

  • Deliver benefits early.
    • To a casual editor, SMW is just more markup, more stuff to learn, an ugly factbox on pages, more "Semantic Web 3.0" hype. So find use cases that deliver obvious benefits early.
  • Don't create lots of properties and spend time annotating lots of pages until you have use cases for them. A person reading one wiki page after another does not need semantic annotations.
  • It's fine to convert templates to make semantic annotations, since you get this "for free".


wiki.laptop.org issues

Use Template:SMW issue to note a problem on any page; it'll show up on the Property:SMW issue page.

Template conversion

  • Use arraymap to turn lists of things into multiple property assignments.
  • Use Semantic Forms... (not sure about it)
  • Beware, people take advantage of wiki markup to stick all kinds of text inside templates. Maybe have a "fieldname extra info" field for arbitrary text.
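
As an illustration of the first point, the Python sketch below mimics the effect of mapping a delimited template value to one annotation per item. It is not the arraymap parser function itself, only a model of its result; the function name array_to_annotations is hypothetical, and Requested by is used because the property is mentioned elsewhere on this page.

```python
def array_to_annotations(value: str, prop: str, delimiter: str = ",") -> str:
    """Turn 'a, b, c' into one [[prop::item]] annotation per non-empty item."""
    items = [item.strip() for item in value.split(delimiter)]
    return " ".join(f"[[{prop}::{item}]]" for item in items if item)


print(array_to_annotations("OLPC Peru, User:CScott", "Requested by"))
```

Each item becomes its own property value, so a query for one requester matches the page even though the editor typed a single comma-separated field.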

Generic names

Should Property:Test objective be a generic name, or should it be specialized to Property:Test case objective? skierpage's bias is to use generic names, but document on their property page what templates and pages use them.

How tos

Modify or remove a property

Summary:

  1. Edit the template the form fills in
  2. Edit the form
  3. Edit queries that use the property

Here's an example from User:Skierpage of removing Property:Contact person from deployment pages.

A human being should have documented the Property:Contact person page with:

Template:Foo, filled in by Form:Bar, may set this property.

1. Instead, if you edit a deployment page you can see

 {{Deployment
 ...
 |contact person=user:sbuchele

therefore Template:Deployment sets this property. So I edited the template and removed the table row that sets this property. Changing a template should (and did!) eventually update pages invoking it like OLPC_Ghana; if you don't want to wait you would Edit & Save each deployment page to get it to happen faster.

2. To modify the form, you could guess it's called Form:Deployment, or use Special:Allpages to show all pages in the Form: namespace, or look at Category:Deployments and see [[Has default form::Form:Deployment]] which makes the "edit with form" tab appear.

I'm not an expert in Semantic Forms, so I just edited Form:Deployment, deleted the "Contact Person" part of it, and it seemed to work.


3. Then, edit queries that show Contact person and delete the

 | ?Contact person

line. I did so on the Deployments page.