Just Solve the File Format Problem:Community portal

Revision as of 05:36, 17 November 2012 by Chronomex (Talk | contribs)

Please add your signature by typing ~~~~ when you add or reply to an item.



Below is a list of "issues" which would ordinarily be in a ticketing system of some kind, but are here on the Wiki instead, because that's how we roll. When things are resolved feel free to remove or strike out the issue.

Time zone setting

OK, to get this community portal going... is this the place to report issues with the wiki server configuration? Anyway, its time-zone setting seems to be a bit odd. It's set (apparently) to UTC + 4 hours (somewhere in Asia?), but it thinks it's in UTC, so if you set up your user configuration to adjust it to your local time zone, it ends up 4 hours off. I had to use "-08:00" to get my current EDT time. (That will change by one hour in a week or two when Daylight Saving Time ends.) Dan Tobias (talk) 22:23, 28 October 2012 (UTC)
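The arithmetic behind that "-08:00" can be sketched as follows. This is only an illustration, assuming the server clock really runs 4 hours ahead of UTC while the wiki treats it as UTC; the variable names are made up for the example.

```python
from datetime import timedelta

# Assumption taken from the report above: the server clock is really
# at UTC+4, but the wiki believes it is showing UTC.
server_error = timedelta(hours=4)

# EDT is UTC-4.
wanted_offset = timedelta(hours=-4)

# The user preference has to cancel the server error first and then
# apply the real local offset.
preference = wanted_offset - server_error
hours = preference.total_seconds() / 3600

print(f"{hours:+03.0f}:00")  # prints -08:00
```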


The Wiki is obviously working (yay), but it would be nice if the MediaWiki .htaccess file were adjusted so that index.php disappears from wiki URLs.


I'm not a lawyer, but I don't think declaring the content here Public Domain (US), as discussed in the FAQ, will have desirable consequences outside the United States. It's been my experience working with datasets published by the US federal government that people outside the US are uncertain about whether or not they can use the data. I think using CC0 would be preferable to just saying Public Domain, if you really want to encourage reuse. Edsu (talk) 14:26, 1 November 2012 (UTC)

Using templates

I'm a newbie with MediaWiki. Could someone add a quick help note about using the templates for this site? --Phillipkent (talk) 14:34, 4 November 2012 (UTC)

The MediaWiki site has extensive info about that: http://www.mediawiki.org/ Maurice.de.rooij (talk) 23:03, 7 November 2012 (UTC)

Using bots

I am working on a bot that hoovers up all file format extensions and creates pages for them in the "ext" namespace. More bots are welcome for gardening. Any ideas? Also see http://justsolve.archiveteam.org/api.php for inspiration. Maurice.de.rooij (talk) 22:51, 7 November 2012 (UTC)

How do you figure out what in a page is a file extension? Does an infobox have to be filled in correctly? Dan Tobias (talk) 23:29, 7 November 2012 (UTC)
If the FormatInfo template is filled in correctly, the bot uses that; otherwise the text is parsed with regex. PRONOM signature files are parsed locally to check whether the extension is known. That is my plan for version 0.0001. Maurice.de.rooij (talk) 23:36, 7 November 2012 (UTC)
Template:FormatInfo for people unfamiliar with this template Maurice.de.rooij (talk) 23:48, 7 November 2012 (UTC)
So I assume the articles themselves should not be the file extension but some more descriptive string? Like Adobe Photoshop instead of PSD, or ScreamTracker 3 instead of S3M? --Darkstar (talk) 03:12, 8 November 2012 (UTC)
Yes, that would be easiest for the bot to pick up. I also started adding categories, not only for better navigation, but also to create some kind of ontology, such as Thing > Physical Format > Video > Cassette > VHS. The nav headers are also very useful for that. Maurice.de.rooij (talk) 13:39, 8 November 2012 (UTC)
Bender the bot has finished its first run of pages with extensions and their corresponding PUIDs: http://justsolve.archiveteam.org/index.php?title=Special:RecentChanges&hidebots=0

Maurice.de.rooij (talk) 19:28, 16 November 2012 (UTC)
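
The "use the infobox if present, otherwise fall back to regex" approach described in this thread could look roughly like the sketch below. It is not Bender's actual code. The parameter name `extensions`, the comma-separated value format, and the function name `find_extensions` are all assumptions for illustration, and nested templates inside the infobox are not handled.

```python
import re

# Matches a {{FormatInfo ...}} block. Assumes no nested templates inside it.
TEMPLATE_RE = re.compile(r"\{\{\s*FormatInfo\b(.*?)\}\}", re.IGNORECASE | re.DOTALL)

# Hypothetical parameter name: "extensions" is an assumption, not confirmed
# anywhere in this discussion.
FIELD_RE = re.compile(r"\|\s*extensions\s*=\s*([^|}]*)", re.IGNORECASE)

# Fallback: loose matches such as ".psd" or "*.s3m" in running text.
LOOSE_EXT_RE = re.compile(r"(?<![\w.])\*?\.([A-Za-z0-9]{1,8})\b")


def find_extensions(wikitext):
    """Return a sorted list of lower-case extensions found in a page."""
    template = TEMPLATE_RE.search(wikitext)
    if template:
        field = FIELD_RE.search(template.group(1))
        if field and field.group(1).strip():
            parts = re.split(r"[,\s]+", field.group(1).strip())
            cleaned = {p.lstrip("*.").lower() for p in parts}
            return sorted(c for c in cleaned if c)
    # No usable infobox field: parse the page text instead.
    return sorted({m.lower() for m in LOOSE_EXT_RE.findall(wikitext)})


infobox_page = "{{FormatInfo\n|formattype=electronic\n|extensions=.psd, .psb\n}}\nAdobe Photoshop"
plain_page = "Modules are saved as .s3m files."

print(find_extensions(infobox_page))
print(find_extensions(plain_page))
```

The regex fallback is deliberately loose and will produce false positives on ordinary prose, which is one reason to check candidates against a list of known extensions, as proposed above with the PRONOM signature files.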

Headers and ontologies

For a while, people were adding "breadcrumb" headers linking to the ontology levels above the current page. Then they were removing those headers in favor of just using the ontology links in the format infoboxes. Now they seem to be adding the breadcrumbs back. Which style is actually the "official" standard here? Dan Tobias (talk) 21:31, 11 November 2012 (UTC)

Well, there's no improvement yet... there are still people adding breadcrumbs and people removing them, so I have no idea whose style to follow. Dan Tobias (talk) 07:02, 13 November 2012 (UTC)
I prefer categories, as breadcrumbs are ugly and tedious to add and maintain. I suggest we use categories, as that is the official (?) policy on Wikipedia. Maurice.de.rooij (talk) 04:21, 16 November 2012 (UTC)

Additions and suggestions

(This is probably the wrong place for this but I figured here was a good start...)

Is there a place for a new template object - signature or magic number? e.g. jpg has ÿØÿè as a highly dependable BOF marker.
This has already been researched extensively; please read PRONOM. Maurice.de.rooij (talk) 04:17, 16 November 2012 (UTC)
Although it would be a good idea to do just that and create a form of registry. The wiki API can output several formats, e.g. XML. Maurice.de.rooij (talk) 04:37, 16 November 2012 (UTC)
First experimental page: Registry:fmt/111 Maurice.de.rooij (talk) 05:36, 16 November 2012 (UTC)
It's a shame that the pages would be named after something as opaque as a number, but I don't know how you'd get around that... But I'd recommend, if possible, to create a field containing only the literal hexadecimal byte values, instead of the hodgepodge of regex abbreviations. And if you add categories based on the magic numbers, then someone could generate a finite state machine that serves as a file format identifier. Just thinking out loud here. Gphemsley (talk) 05:08, 17 November 2012 (UTC)
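
To make the finite-state-machine idea concrete, here is a minimal sketch that builds a byte trie from signatures written as literal hexadecimal values and walks it over the start of a file. The signature table holds a few widely known magic numbers for illustration only; it is not taken from PRONOM or from the registry pages, and the names `SIGNATURES`, `build_trie` and `identify` are invented for this example.

```python
# Signatures as literal hex bytes, as suggested above. Illustrative table only.
SIGNATURES = {
    "JPEG": "FF D8 FF",
    "PNG": "89 50 4E 47 0D 0A 1A 0A",
    "GIF": "47 49 46 38",
    "PDF": "25 50 44 46",
}


def build_trie(signatures):
    """Build a trie keyed by byte value; accepting nodes carry a 'format' entry."""
    root = {}
    for name, hex_bytes in signatures.items():
        node = root
        for byte in bytes.fromhex(hex_bytes):
            node = node.setdefault(byte, {})
        node["format"] = name
    return root


def identify(data, trie):
    """Follow the trie byte by byte and return the longest matching format, or None."""
    node = trie
    best = None
    for byte in data:
        node = node.get(byte)
        if node is None:
            break
        best = node.get("format", best)
    return best


TRIE = build_trie(SIGNATURES)
print(identify(b"\xff\xd8\xff\xe0\x00\x10JFIF", TRIE))  # prints JPEG
print(identify(b"%PDF-1.4", TRIE))  # prints PDF
```

This only covers signatures that sit at offset 0 and contain no wildcards. Signatures with variable offsets or alternatives would need a richer automaton.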

Harvesting resources

IMHO it would be a good idea to harvest all webpages which are linked in the resource sections. Of course we have Wayback, but, but, but? ... Maurice.de.rooij (talk) 04:15, 16 November 2012 (UTC)

At the end of the month (and perhaps periodically thereafter) Archiveteam will be doing a deep focused crawl of all the sites linked from this wiki, and passing it off to the Wayback Machine at archive.org. Chronomex (talk) 05:36, 17 November 2012 (UTC)