Just Solve the File Format Problem talk:Community portal
Latest revision as of 18:37, 22 November 2012
Time zone setting
OK, to get this community portal going... is this the place to report issues with the wiki server configuration? Anyway, its time-zone setting seems to be a bit odd. It's set (apparently) to UTC + 4 hours (somewhere in Asia?), but it thinks it's in UTC, so if you set up your user configuration to adjust it to your local time zone, it ends up 4 hours off. I had to use "-08:00" to get my current EDT time. (That will change by one hour in a week or two when Daylight Saving Time ends.) Dan Tobias (talk) 22:23, 28 October 2012 (UTC)
- I just set it to EST. --Jason Scott (talk) 23:19, 18 November 2012 (UTC)
- Seems to work now, but perhaps as a side effect, now I don't see any new changes after around 6 PM (EST) in the Recent Changes function. Dan Tobias (talk) 03:04, 19 November 2012 (UTC)
 
- Oh, I am SURE the time change annoyed the Wiki. It'll work out. --Jason Scott (talk) 06:22, 19 November 2012 (UTC)

index.php
The Wiki is obviously working (yay), but it would be nice if the MediaWiki .htaccess file were adjusted so that index.php disappears from wiki URLs.
- It's a minor aesthetic thing, but I have happily made those changes. It should be /wiki/ from now on. --Jason Scott (talk) 23:30, 18 November 2012 (UTC)
Licensing
I'm not a lawyer, but I don't think declaring the content here Public Domain (US), as discussed in the FAQ, will have desirable consequences outside the United States. It's been my experience working with datasets published by the US federal government that people outside the US are uncertain about whether or not they can use the data. I think using CC0 would be preferable to just saying Public Domain, if you really want to encourage reuse. Edsu (talk) 14:26, 1 November 2012 (UTC)
- Public Domain is Public Domain in the US and in many countries. It is trivial for someone to take this public domain wiki and turn it into a bestselling novel, a screensaver, a movie starring Christian Bale, a CC0-licensed Wiki, a GFDL-licensed wiki, or a floor wax. So in the future, if it needs to be the case that someone wants to take this work and convert it randomly to a new format, the openness of a full public-domain-declared wiki will work for them. --Jason Scott (talk) 06:22, 19 November 2012 (UTC)
- I got talked into CC0. --Jason Scott (talk) 19:02, 19 November 2012 (UTC)

Using templates
I'm a newbie with MediaWiki. Could someone add a quick help note about using the templates for this site? --Phillipkent (talk) 14:34, 4 November 2012 (UTC)
- The MediaWiki site has extensive info about that: http://www.mediawiki.org/ Maurice.de.rooij (talk) 23:03, 7 November 2012 (UTC)
Using bots
I am working on a bot that hoovers up all file format extensions and creates pages for them in the "ext" namespace. More bots are welcome for gardening; any ideas? Also see http://justsolve.archiveteam.org/api.php for inspiration. Maurice.de.rooij (talk) 22:51, 7 November 2012 (UTC)
- How do you figure out what in a page is a file extension? Does an infobox have to be filled in correctly? Dan Tobias (talk) 23:29, 7 November 2012 (UTC)
- If the FormatInfo template is filled in correctly, the bot uses that; otherwise the page text is parsed with regex. PRONOM signature files are parsed locally to check whether the extension is known. That is my plan for version 0.0001. Maurice.de.rooij (talk) 23:36, 7 November 2012 (UTC)
- See Template:FormatInfo, for people unfamiliar with this template. Maurice.de.rooij (talk) 23:48, 7 November 2012 (UTC)
- So I assume the articles themselves should not be the file extension but some more descriptive string? Like Adobe Photoshop instead of PSD, or ScreamTracker 3 instead of S3M? --Darkstar (talk) 03:12, 8 November 2012 (UTC)
- Yes, that would be easiest for the bot to pick up. I also started adding categories, not only for better navigation, but also to create some kind of ontology, like Thing > Physical Format > Video > Cassette > VHS. The nav headers are also very useful for that. Maurice.de.rooij (talk) 13:39, 8 November 2012 (UTC)
- Bender the bot has finished its first run of pages with extensions and their corresponding PUIDs: http://justsolve.archiveteam.org/index.php?title=Special:RecentChanges&hidebots=0 Maurice.de.rooij (talk) 19:28, 16 November 2012 (UTC)
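For anyone wanting to experiment with a similar bot, the extraction step described above (use the FormatInfo template if it is filled in, otherwise fall back to regex, then check against known extensions) could be sketched roughly as follows. This is only an illustration and not Bender's actual code: the `{{ext|...}}` layout inside the template, the function name `extract_extensions`, and the `known_extensions` set (standing in for extensions parsed from the PRONOM signature files) are all assumptions.

```python
import re

# Assumed infobox layout: {{FormatInfo |extensions={{ext|psd}}, {{ext|psb}} }}
EXT_TEMPLATE = re.compile(r"\{\{\s*ext\s*\|\s*([A-Za-z0-9_+-]+)\s*\}\}", re.IGNORECASE)

# Fallback: bare tokens such as ".s3m" in running text (1 to 8 characters,
# starting with a letter, not preceded by a word character or another dot).
BARE_EXT = re.compile(r"(?<![\w.])\.([A-Za-z][A-Za-z0-9]{0,7})\b")


def extract_extensions(wikitext, known_extensions):
    """Return (extension, is_known) pairs found in a page's wikitext.

    The template markup is preferred; the bare-text regex is used only
    when the template yields nothing. `known_extensions` is a hypothetical
    set of lowercase extensions taken from a signature file.
    """
    found = EXT_TEMPLATE.findall(wikitext)
    if not found:
        found = BARE_EXT.findall(wikitext)
    # Lowercase and de-duplicate while keeping first-seen order.
    ordered = list(dict.fromkeys(ext.lower() for ext in found))
    return [(ext, ext in known_extensions) for ext in ordered]


if __name__ == "__main__":
    page = "{{FormatInfo\n|extensions={{ext|psd}}, {{ext|PSB}}\n}}"
    print(extract_extensions(page, {"psd"}))
```

Flagging unknown extensions instead of discarding them is a design choice made here so that a human can review them later; the real bot may behave differently.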
Headers and ontologies
For a while, people were adding "breadcrumb" headers linking to the ontology levels above the current page. Then they were removing those headers in favor of just using the ontology links in the format infoboxes. Now they seem to be adding the breadcrumbs back. Which style is actually the "official" standard here? Dan Tobias (talk) 21:31, 11 November 2012 (UTC)
- Well, there's no improvement yet... there are still people adding breadcrumbs and people removing them, so I have no idea whose style to follow. Dan Tobias (talk) 07:02, 13 November 2012 (UTC)
- I prefer categories, as breadcrumbs are ugly and tedious to add and maintain. I suggest we use categories, as that is the official (?) policy on Wikipedia. Maurice.de.rooij (talk) 04:21, 16 November 2012 (UTC)
- I am comfortable with both being on any page during this period, and then a cleanup operation being done down the line. As soon as categories and infoboxes have a better consistency, Breadcrumbs are doomed. --Jason Scott (talk) 06:24, 19 November 2012 (UTC)

Additions and suggestions
(This is probably the wrong place for this but I figured here was a good start...)
- Is there a place for a new template object - signature or magic number? e.g. jpg has ÿØÿè as a highly dependable BOF marker.
- This has already been researched extensively; please read PRONOM. Maurice.de.rooij (talk) 04:17, 16 November 2012 (UTC)
- I'm not sure a hop to PRONOM helps. If the point of this wiki is to aggregate data about formats, we should consider surfacing signature info directly, and linking the reference back to PRONOM. At the moment the main identifying characteristic of a "format" is not visible on the main record page for that format... --JaygattusoNLNZ (talk) 02:06, 20 November 2012 (UTC)
- Although it would be a good idea to do just that and create a form of registry. The wiki API can output several formats, e.g. XML. Maurice.de.rooij (talk) 04:37, 16 November 2012 (UTC)
-  First experimental page: Registry:fmt/111 Maurice.de.rooij (talk) 05:36, 16 November 2012 (UTC)
-  It's a shame that the pages would be named after something as opaque as a number, but I don't know how you'd get around that... But I'd recommend, if possible, to create a field containing only the literal hexadecimal byte values, instead of the hodgepodge of regex abbreviations. And if you add categories based on the magic numbers, then someone could generate a finite state machine that serves as a file format identifier. Just thinking out loud here. Gphemsley (talk) 05:08, 17 November 2012 (UTC)
- Agree about the regex hodgepodge. This is an experiment, so all input is welcome! Did you mean a finite state machine, something like http://justsolve.archiveteam.org/wiki/ByteSequence:0/d0cf11e0a1b11ae1 ? Maurice.de.rooij (talk) 19:40, 19 November 2012 (UTC)
-  Well, it's not clear to me exactly what that example is showing, but I was thinking "Byte 1=X, go here if Byte 2=Y, go here if Byte 2=Z", etc. So then you'd have a hierarchy of which sequence of bytes means which format. Gphemsley (talk) 17:11, 21 November 2012 (UTC)
- The example shows a potential lookup for a byte sequence, in this case for OLE2, and refers to the PUID it belongs to. It will become clear when I have added more context in the coming days. Your hierarchy idea would also fit here and I am for sure going to try that, although IMHO it would be better to use the hexadecimal byte value - Maurice.de.rooij (talk) 11:13, 22 November 2012 (UTC)
- The hexadecimal byte value was exactly what I was suggesting to use. Gphemsley (talk) 18:37, 22 November 2012 (UTC)
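To make the "Byte 1=X, go here if Byte 2=Y" idea from this thread concrete, here is a minimal sketch of a state machine built from literal hexadecimal byte values. It assumes every signature starts at offset 0 and contains no wildcards, which real PRONOM signatures do not guarantee. The `SIGNATURES` table and the names `build_trie` and `identify` are illustrative only; the OLE2 sequence is the one from the ByteSequence link above, and the other entries are well-known magic numbers added for the example.

```python
# Hypothetical table: literal hex byte sequence at offset 0 -> format label.
SIGNATURES = {
    "d0cf11e0a1b11ae1": "OLE2",
    "89504e470d0a1a0a": "PNG",
    "ffd8ff": "JPEG",
    "474946383761": "GIF87a",
    "474946383961": "GIF89a",
}


def build_trie(signatures):
    """Build a nested-dict state machine: one level per signature byte."""
    root = {}
    for hex_sig, label in signatures.items():
        node = root
        for byte in bytes.fromhex(hex_sig):
            node = node.setdefault(byte, {})
        node[None] = label  # the None key marks an accepting state
    return root


def identify(data, trie):
    """Follow the leading bytes of `data`; return the longest match or None."""
    node = trie
    match = None
    for byte in data:
        if byte not in node:
            break
        node = node[byte]
        match = node.get(None, match)
    return match


if __name__ == "__main__":
    trie = build_trie(SIGNATURES)
    print(identify(b"GIF89a" + b"\x00" * 4, trie))
```

The two GIF entries share their first four bytes and only branch at the fifth, which is the hierarchy of byte sequences described above.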
Harvesting resources
IMHO it would be a good idea to harvest all webpages which are linked in the resource sections. Of course we have Wayback, but, but, but? ... Maurice.de.rooij (talk) 04:15, 16 November 2012 (UTC)
- At the end of the month (and perhaps periodically thereafter) Archiveteam will be doing a deep focused crawl of all the sites linked from this wiki, and passing it off to the Wayback Machine at archive.org. Chronomex (talk) 05:36, 17 November 2012 (UTC)
- Nice, thanks for the info. Maurice.de.rooij (talk) 09:59, 19 November 2012 (UTC)
 
Not all extensions show up in category pages?
Johanvanderknijff (talk) 10:43, 20 November 2012 (UTC)
It looks like not all file extensions that are defined using the infoboxes show up in the category overview pages. For example, looking at:
http://justsolve.archiveteam.org/index.php/WQ1
And:
http://justsolve.archiveteam.org/index.php?title=Category:File_formats_by_extension&subcatfrom=P%0AFile+formats+with+extension+.png#mw-subcategories
There's no .wq1 extension on the category page, and I have no idea why! There are some more examples.
- It looks like if the category is new, you need to create the page with e.g. "Category:File formats by extension|W" (replace quotes with internal links [[ ]]) Halftheisland (talk) 12:00, 20 November 2012 (UTC)
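If a bot were to create these missing category pages, the step described above might look something like the sketch below. The page title pattern ("File formats with extension .xyz") and the rule that the sort key is the uppercase first character of the extension are assumptions inferred from existing category names, and `extension_category_page` is a made-up helper name.

```python
def extension_category_page(extension):
    """Return (page_title, wikitext) for a new per-extension category page.

    Assumes the sort key is the first character of the extension,
    uppercased, e.g. "W" for ".wq1".
    """
    ext = extension.lstrip(".").lower()
    if not ext:
        raise ValueError("extension must not be empty")
    title = f"Category:File formats with extension .{ext}"
    wikitext = f"[[Category:File formats by extension|{ext[0].upper()}]]"
    return title, wikitext


if __name__ == "__main__":
    print(extension_category_page(".WQ1"))
```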

