Sean Colombo/Case Insensitivity
See also: http://meta.wikimedia.org/wiki/Case_sensitivity_of_page_names
Core concepts
- Should be able to disable the extension and return to 'normal' functionality at any time.
- Should ensure that typing the "wrong" capitalization doesn't prevent a user from reaching a page, both on the site and via the API.
- Should make any side effects intuitive to users, even those who weren't aware of this change.
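A rough sketch of the first two concepts follows. It is written in Python rather than the PHP a MediaWiki extension would actually use. `TitleResolver`, `fold_title` and the `case_insensitive` flag are hypothetical names, and the in-memory title list stands in for the page table. An exact match is tried first and the case-folded key is only a fallback, so turning the flag off restores plain case-sensitive lookups.

```python
import unicodedata
from typing import Optional


def fold_title(title: str) -> str:
    # Hypothetical key function: decompose, apply full Unicode case folding,
    # then recompose so equivalent spellings of a character give the same key.
    return unicodedata.normalize("NFC", unicodedata.normalize("NFD", title).casefold())


class TitleResolver:
    """Hypothetical resolver: exact title first, case-folded title as a fallback."""

    def __init__(self, titles: list[str], case_insensitive: bool = True) -> None:
        # Feature flag: False restores the normal, case-sensitive behavior.
        self.case_insensitive = case_insensitive
        self._exact = set(titles)
        # Folded key -> stored title. Assumes conflicts were already resolved,
        # so each key maps to exactly one page (a later duplicate would win).
        self._folded = {fold_title(t): t for t in titles}

    def resolve(self, requested: str) -> Optional[str]:
        """Return the stored title to show, or None if no page matches."""
        if requested in self._exact:
            return requested
        if self.case_insensitive:
            return self._folded.get(fold_title(requested))
        return None
```

With the flag on, `TitleResolver(["Main Page"]).resolve("main PAGE")` returns `"Main Page"`. With `case_insensitive=False`, the same call returns `None`, which is the "return to normal" path.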
TODO
Development
- Might want to base it on http://www.mediawiki.org/wiki/Extension:TitleKey and share its index table. Note that search-suggest is already case-insensitive. Not sure whether that's because we have a Lucene backend (in which case, can we just use that instead of a separate index?) or because the table already exists. It might still be worth installing the extension, because it also makes OpenSearch case-insensitive; ours is currently case-sensitive.
- Add a case_folded_title column to the page table in the database, or use a separate table as TitleKey does (on dev).
- Write and run a script that generates the case-folded version of every title.
- Make sure case-folding works with non-ASCII characters, both simple ones such as Ü and more complex ones such as Asian scripts. See mb_strtolower($str).
- Find all redirects that differ from their target only in capitalization, and delete them with a bot.
- Compile a list of all existing conflicts (distinct titles that fold to the same case-folded title) that would cause key errors.
- Fix the conflicts in dev.
- Build the extension:
  - If a page does not exist but a page with the same case-folded title does, use that page (the code should be similar to ImpliedRedirects).
  - If the destination of a move has the same case-folded title as an existing page, disallow the move, unless the exception in the next item applies.
  - If the destination has the same case-folded title as the page being moved (i.e. the move only changes capitalization), allow it.
  - Disallow creating an article whose case-folded title duplicates an existing one. Most users won't hit this, because they will be redirected to the existing page first, but it can still happen.
  - Verify that the canonical page name in the meta tags is correct; otherwise search engines will penalize us for duplicate content. This is worth checking on normal redirects and ImpliedRedirects too.
  - Make sure the parser's red/blue link coloring shows a link as blue whenever a page with the matching case-folded title exists.
  - Change the API to use the case-folded title, but keep the existing code around so that we can switch back with a config setting. This would be a good time to test the code that extracted fuzzy page-finding into its own function (it could simply call different sub-functions for case-sensitive and case-insensitive databases).
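A minimal Python sketch can illustrate the case-folding and conflict-list steps. `fold_title` roughly follows the Unicode Standard's canonical caseless matching: decompose, case fold, then normalize again. As a result, "Ü" folds the same way whether it is typed as one code point or as "U" plus a combining diaeresis. Scripts without case, such as Chinese or Japanese, pass through unchanged.

Lowercasing and case folding are not identical. In Python, `"ß".lower()` stays `"ß"` but `"ß".casefold()` is `"ss"`, so whichever PHP function is finally used (mb_strtolower is the one mentioned above) should be chosen with that difference in mind. `find_conflicts` is a hypothetical helper, and it ignores namespaces for brevity.

```python
import unicodedata
from collections import defaultdict


def fold_title(title: str) -> str:
    """Case-insensitive key for a title (canonical caseless matching)."""
    # NFD before folding, then recompose so the stored key is in NFC form.
    return unicodedata.normalize("NFC", unicodedata.normalize("NFD", title).casefold())


def find_conflicts(titles: list[str]) -> dict[str, list[str]]:
    """Group titles by folded key and keep only groups with two or more titles.

    Every group returned would violate a unique index on the folded title.
    """
    groups = defaultdict(list)
    for title in titles:
        groups[fold_title(title)].append(title)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}
```

For example, the titles "Main Page", "Main page", "Straße", "STRASSE" and "東京" yield two conflict groups, keyed "main page" and "strasse". Running the check after the case-change-only redirects are gone keeps the list to genuine conflicts.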
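The move and creation rules in the extension items reduce to two small checks. This is a hypothetical Python sketch. A set of folded keys stands in for the index; the real checks would hook into MediaWiki's move and edit code in PHP. The sketch reads "the same case-folded title" in the move exception as "a move that only changes capitalization".

```python
import unicodedata


def fold_title(title: str) -> str:
    # Same hypothetical key function: decompose, case fold, recompose.
    return unicodedata.normalize("NFC", unicodedata.normalize("NFD", title).casefold())


def can_create(title: str, existing_keys: set[str]) -> bool:
    """Refuse creation if any existing page already has the same folded title."""
    return fold_title(title) not in existing_keys


def can_move(source: str, destination: str, existing_keys: set[str]) -> bool:
    """Allow case-only renames; otherwise refuse a destination whose key is taken."""
    if fold_title(source) == fold_title(destination):
        # Only capitalization changes (e.g. "Foo bar" -> "Foo Bar"): the page
        # keeps its own folded key, so this cannot create a conflict.
        return True
    return fold_title(destination) not in existing_keys
```

With pages "Foo bar" and "Baz", moving "Foo bar" to "Foo Bar" is allowed, but moving it to "BAZ" is refused. Creating "baz" is refused for the same reason.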
Rollout
- Add the case_folded_title column to the page table in the database (on production).
- Run the script to generate the case-folded versions of all titles.
- Get the list of conflicts.
- Resolve the conflicts as quickly as possible.
- Can we delete case-change-only redirects using the API _after_ the extension is on? If not, make sure to delete them VERY quickly right before turning the extension on.
- Flip the extension live using Wikifactory (as soon as possible after resolving the conflicts, to prevent more conflicts from being created).
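Whether the redirect cleanup happens before or after the switch, the bot first needs to know which redirects differ from their target only in capitalization. Below is a hypothetical Python filter for that step. It assumes the bot has already collected redirects as a mapping from redirect title to target title; how it gathers them, and the deletion itself, are left out.

```python
import unicodedata


def fold_title(title: str) -> str:
    # Hypothetical key function: decompose, case fold, recompose.
    return unicodedata.normalize("NFC", unicodedata.normalize("NFD", title).casefold())


def case_only_redirects(redirects: dict[str, str]) -> list[str]:
    """Return redirect titles whose target differs from them only in case."""
    return sorted(
        source
        for source, target in redirects.items()
        # Skip self-redirects; keep pairs that collapse to the same folded key.
        if source != target and fold_title(source) == fold_title(target)
    )
```

Given {"Main page": "Main Page", "MAIN PAGE": "Main Page", "Home": "Main Page"}, only "MAIN PAGE" and "Main page" are returned. "Home" is a real redirect and stays.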