From LyricWiki
I am working on inserting some more lyrics I've found. Please check Category:Green Songs to help review them.
I'm not a person... but I'm still Über-l33t!
It's true. I'm a bot.
I was created by Sean Colombo, who also created LyricWiki. My job was to take the data scraped by a brother-script of mine and use it to insert the first 200,000 songs onto the site.
I was written in Perl in April of 2006. Some music I dig:
- Daft Punk:Harder, Better, Faster, Stronger
- Orbital:Halcyon
- Lateralus by Tool.
- Still Alive by Jonathan Coulton.
- Sometimes I even parse Category:Genre/Computer Science Gangsta Rap
- The Prodigy
I hope you like my work.
-ÜberBot
Past Work
- 4/06/06 - 04/14/06 -- Data Insertion
- Added lyrics for 200,000 songs that I collected from various places on the internet. Successfully completed.
- 4/10/06 -- Bug Fix.
- Went through categories and fixed a bug (admittedly my own fault) in categorizations of songs that started with symbols. Successfully completed.
- 4/10/06 -- Categorization
- Made all of the Artist_[letter], Album_, Song_ categories sub-categories of just Artists, Albums, and Songs. Successfully completed.
- 5/11/06 -- {{CatAZ}} Template
- Added Jeff Q's {{CatAZ}} Template to Artist_[letter], Album_, Song_ categories. Successfully completed.
- 5/16/06 -- Genres.
- Grabbed genres from ID3v2 tags in Sean's mp3s and applied them to the artists. Successfully completed.
- 6/29/06 -- UTF8 Scrape.
- To test my UTF8 support, I scraped lyrics for Böhse Onkelz. One difficulty arose: there does not seem to be a way to do a movePage on a page with foreign characters. Completed with mixed results; the question of page moves is still open.
- 10/06 -- Another batch of lyrics.
- I inserted another large batch of lyrics (so large that the site's stats still can't seem to update?). I put the songs into Category:Review Me so that they can be verified by people (i.e., not bots).
Ongoing Work
- Song of the Day. Each day I automatically update the Song of the Day. I do the following:
- Get the next SOTD from the queue.
- If there are 3 songs or fewer in the queue, I record a warning (to be sent at the end).
- Put a badge on the new SOTD page that shows that it is/was the SOTD on the current date
- Add the SOTD to the Archive on LyricWiki:Song of the Day
- Update Template:Song Of The Day with the info for the new SOTD
- Inform the nominator (if there is one) on their talk page that their song has been selected.
- Write any errors or warnings to User_talk:Sean Colombo
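The daily routine above could be sketched roughly as follows. This is Python for illustration only (the bot itself was written in Perl), and the `Nomination` and `Wiki` types are hypothetical stand-ins that merely record what would be done rather than editing a real site. Whether the queue-size check happens before or after taking the next song is not stated above; the sketch assumes it counts the songs remaining afterwards.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

LOW_QUEUE_THRESHOLD = 3  # "3 songs or fewer" triggers a warning


@dataclass
class Nomination:
    page: str                        # page title, e.g. "Orbital:Halcyon"
    nominator: Optional[str] = None  # user name, if someone nominated the song


@dataclass
class Wiki:
    """Hypothetical stand-in for the site: it only records the actions taken."""
    actions: List[str] = field(default_factory=list)

    def do(self, action: str) -> None:
        self.actions.append(action)


def run_song_of_the_day(queue: List[Nomination], wiki: Wiki, today: date) -> List[str]:
    """Perform one daily update and return the warnings that were reported."""
    warnings: List[str] = []
    if not queue:
        warnings.append("SOTD queue is empty; nothing was updated.")
    else:
        song = queue.pop(0)  # get the next SOTD from the queue
        if len(queue) <= LOW_QUEUE_THRESHOLD:
            # Assumption: the check counts the songs left after this one.
            warnings.append(f"Only {len(queue)} song(s) left in the SOTD queue.")
        wiki.do(f"badge {song.page} as SOTD for {today.isoformat()}")
        wiki.do(f"append {song.page} to archive on LyricWiki:Song of the Day")
        wiki.do(f"update Template:Song Of The Day with {song.page}")
        if song.nominator:
            wiki.do(f"notify User_talk:{song.nominator} about {song.page}")
    # Errors and warnings are written out last, after all other steps.
    for message in warnings:
        wiki.do(f"report to User_talk:Sean Colombo: {message}")
    return warnings
```

Collecting warnings in a list and writing them only at the end mirrors the "to be sent at the end" note in the steps, so a low queue never interrupts the update itself.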
I'm back!
Here are some of the things I'm doing (as I do them, I'll write them here).
- Went through this list and removed [[Link title]] from those pages. It's just junk text that gets inserted when someone accidentally clicks the button above the text-area on an edit page. This can be re-run occasionally (it was primarily to test my new structure and make sure I still handle UTF8 correctly).
- Every time I see...
- An artist page:
- I fix any poorly-capitalized red links (pages that don't exist yet).
- I look for the red links that are still there, then I go out and do a web-search for the lyrics. I usually find them, then I put them into a correctly formatted page (if the song was listed under an album on the artist's page, I link back to that album). This is so fun it makes my gears shake!
- An album page:
- A song page:
- Any page:
- I remove the underscores from the links (that just confuses people).
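Two of the cleanups in this list, stripping the stray [[Link title]] text and removing underscores from links, can be illustrated with a short sketch. It is Python for illustration only (the bot itself was written in Perl), the function names are hypothetical, and it assumes simple, non-nested [[target]] or [[target|label]] links.

```python
import re

# Placeholder text left behind when the link button on an edit page is
# clicked by accident.
JUNK_LINK = "[[Link title]]"

# Matches [[target]] and [[target|label]]; the target ends at "|" or "]".
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(\|[^\[\]]*)?\]\]")


def remove_junk_links(wikitext: str) -> str:
    """Delete every literal occurrence of the junk placeholder."""
    return wikitext.replace(JUNK_LINK, "")


def remove_link_underscores(wikitext: str) -> str:
    """Replace underscores with spaces in link targets only."""
    def fix(match):
        target = match.group(1).replace("_", " ")
        label = match.group(2) or ""  # keep any "|label" part unchanged
        return f"[[{target}{label}]]"
    return WIKILINK.sub(fix, wikitext)


def clean_page(wikitext: str) -> str:
    return remove_link_underscores(remove_junk_links(wikitext))
```

Only the link target is rewritten, so underscores elsewhere in the lyrics or in a link's display label are left alone. Python strings are Unicode, so a target such as Böhse_Onkelz passes through without special handling.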
In The Lab
- Currently programming:
- Create a page of possible song-covers to help people figure out who covered what and do cross-linking.
- Update: This was coded, but had too many results. Eventually text-analysis will be used to make the list smaller.
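One simple way to build such a list, not necessarily the way it was actually coded, is to group pages by song title and keep the titles that appear under more than one artist. The sketch below is Python for illustration only and assumes page titles of the form Artist:Song, as in Orbital:Halcyon; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def possible_covers(page_titles: Iterable[str]) -> Dict[str, List[str]]:
    """Map each lower-cased song title to the artists that have a page for it,
    keeping only titles shared by at least two artists."""
    artists_by_song = defaultdict(set)
    for title in page_titles:
        # Assumption: the first ":" separates the artist from the song.
        artist, sep, song = title.partition(":")
        if not sep or not song.strip():
            continue  # not an "Artist:Song" page
        artists_by_song[song.strip().lower()].add(artist.strip())
    return {
        song: sorted(artists)
        for song, artists in artists_by_song.items()
        if len(artists) > 1
    }
```

Matching on the title alone also pairs unrelated songs that merely share a name, which may be one reason a list like this grows too long; comparing the lyric text of each candidate pair would be the natural next filter.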
Future Work
- Convert all categories for songs/artists/albums that begin with 0, 1, 2... 9 to ONE category of 0-9.
- What for? There will be too many songs. Digits should be categorized the same way as letters, I'm sure. The Other Saluton 02:33, 3 September 2008 (EDT)
- Apply standard formatting to all lyrics.
- Create a tool to fix an entire band name and all references to that band. Once complete, use this to move The Clarks to "The Clarks"
- This has been done, but it needs to be run on Category:The again, and some errors (like White Stripes, The) need to be fixed.
- (Are you sure? The latter is better for indexing than having a massive page of bands beginning with "The".)
- RE: To overcome that, you can use MediaWiki's sort-ordering, so you would say [[Category:Artists W|White Stripes, The]] on the page and it would index it by that instead of the page name.
- Go through and add {{succession box}}'s to the bottom of pages where possible. Example: Fergie:Fergalicious
- Go through artists without hometown info and see if we can glean that info from the InfoBoxes on Wikipedia (there might be other useful/public info there as well).
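The sort-ordering reply and the proposed 0-9 category from this list can be combined in a small sketch. It is Python for illustration only, the function names are hypothetical, and the "Artists W" naming simply follows the example tag quoted above; titles starting with symbols are not handled specially.

```python
def sort_key(artist: str) -> str:
    """Move a leading "The " to the end: "The White Stripes" becomes
    "White Stripes, The". Other names are returned unchanged."""
    prefix = "The "
    if artist.startswith(prefix) and len(artist) > len(prefix):
        rest = artist[len(prefix):]
        return f"{rest}, The"
    return artist


def bucket(key: str) -> str:
    """Pick the category suffix: one shared bucket for digits, else the letter."""
    if not key:
        raise ValueError("empty name")
    first = key[0]
    if first in "0123456789":
        return "0-9"  # the single digit category proposed above
    return first.upper()


def artist_category_tag(artist: str) -> str:
    """Build the category tag for an artist page, with a sort key if needed."""
    key = sort_key(artist)
    if key == artist:
        return f"[[Category:Artists {bucket(key)}]]"
    return f"[[Category:Artists {bucket(key)}|{key}]]"
```

With this approach the page keeps its natural title, and only the tag carries the reordered name, so `artist_category_tag("The White Stripes")` yields the same `[[Category:Artists W|White Stripes, The]]` tag shown in the reply.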
SUGGEST SOMETHING! :)
on my talk page.

