Just Solve the File Format Problem:Community portal
Revision as of 21:26, 23 August 2021
- please add your signature by typing ~~~~ if you add or reply
Open issues
Below is a list of "issues" which would ordinarily be in a ticketing system of some kind, but are here on the Wiki instead, because that's how we roll. As things are resolved, they will be moved to the Discussion page. If there's an appeal or an issue, the conversation can continue there - this page will be for open issues.
Use of case in URLs / links. I went through all the electronic format type pages and tried to normalise the links where I could. There was a mix of link structures; I've tried to get them all (apart from animation - I've been at it all day!) into the form file extension - file type name. I notice that we have a mix of upper- and lower-case file extensions throughout. This means we may have two links that should point to the same URL (e.g. mix and MIX). Is this a known issue with the current layout? --JaygattusoNLNZ (talk) 01:32, 20 November 2012 (UTC)
- Since you're linking both the extension and the name, does that mean that there are supposed to be separate articles for each? I don't know if there's really a need for "mainspace" articles by extension, since there are already categories for that purpose; you can browse them through Category:File formats by extension. Dan Tobias (talk) 02:12, 20 November 2012 (UTC)
- I just copied the most common model that I found on the formats pages. The problem is, if you don't homogenize the method, the linking/crosslinking doesn't work properly. All instances of .doc (for example) should point to the same resource page / disambiguation page. If someone has linked to only the format in one place (e.g. MS Word (.doc)), and someone else to the extension (MS Word - doc), we can't make sure they point to the same place. The problem occurs because format names and extensions are used interchangeably. You raise an interesting question about the relationship between the ext and the format name. I would argue they are not equal (1:1), nor (1:many) / (many:1), so it makes sense to protect both aspects as definable things - the extension because that's what's most commonly searched for and referred to by users, and 'format name' because it's more accurate. How is the Category:File formats by extension populated? --JaygattusoNLNZ (talk) 18:31, 20 November 2012 (UTC)
- The categories are inserted when you use the ext template in the infobox. My preference is to have articles by actual format name and use multiple navigation aids (menus, cats, etc.) to get to them. Dan Tobias (talk) 01:38, 21 November 2012 (UTC)
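To make the mix/MIX concern above concrete, here is a minimal sketch of how one might look for link targets that differ only by letter case. It assumes the titles have already been exported into a plain list; the function name `find_case_collisions` and the sample titles are hypothetical and not part of any wiki tooling.

```python
from collections import defaultdict


def find_case_collisions(titles):
    """Group titles that differ only by letter case.

    Returns a dict mapping the case-folded title to the sorted list of
    spellings in use, for groups with more than one spelling.
    """
    groups = defaultdict(set)
    for title in titles:
        groups[title.casefold()].add(title)
    # Keep only the groups where more than one spelling is in use.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Hypothetical sample of link targets collected from index pages.
links = ["mix", "MIX", "doc", "GIF", "gif", "S3M"]
collisions = find_case_collisions(links)
print(collisions)
```

Each reported group is a candidate for a redirect or a disambiguation page; the sketch only finds the collisions and does not decide which spelling should win.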
 
 
Article naming convention
As mentioned above, there's some dispute over whether to name articles after the full name of a format or its file extension. If using full names, you then get into issues of whether to use the full technical name or a shorter thing that's more popularly used, and in some cases that's even the same as the extension (GIF, for instance). And you also get into tricky issues of capitalization: all-caps like an acronym, all-lowercase like filenames are often done (though this is OS-dependent; some, like MS-DOS, use all-uppercase), or mixed case (proper names capitalized)? And then there's the disambiguation issue of how to name articles on different things that have the same name, which happens sometimes even with long official names, but even more often with short acronyms and file extensions. But there's also yet another issue of which things get separate articles and which are combined, like formats that have had many different versions, etc.
Currently you have things like CI and CT, recently-created articles that represent two different file types within the data of one type of music tracker. The spec document they link to is the same one, which documents all the file types used in that tracker. Unless there's going to be really a lot to say about each of the specific file types, my own preference would be to have one article called CyberTracker that discusses all the formats used by the program in question, with subheaders within the article for the different file types, and all the extensions listed in the infobox (and hence in associated categories). If any other indices by extension are built up, they'd also have entries for both CI and CT. For instance, when I documented Softdisk Family Tree, I covered all the various file formats in one article, though there are several versions and multiple files for each. Dan Tobias (talk) 13:39, 21 November 2012 (UTC)
- I realise I'm as guilty of this as anyone, having used both forms at some point (e.g. Surprise! Adlib Tracker v2.0 and CI). Indeed, the two articles - CI and CT - you refer to were created by me. I guess in general I would favour using a descriptive page name rather than simply the file extension - that seems to be something that's being taken care of by infoboxes and categories.
- On the issue of what gets a separate page and what doesn't, I guess that just comes down to individual discretion. There will be instances where a format has undergone a number of minor revisions over time or has a number of minor variants (e.g. the variant forms of Chaos Music Composer's CMC) where it would make sense to keep them all to a single page, while a major revision would necessitate a multi-page approach (e.g. the shift with Capella from the binary CAP to the XML-based CapXML format).
- However, I'm not sure I agree with CI and CT having a single CyberTracker page. While both link to the same spec document and both are used by the same program, they are different formats serving different purposes. I think in general we should try and distinguish between program and file format - S3M doesn't belong on the ScreamTracker page, although each should link to the other. Halftheisland (talk) 14:04, 21 November 2012 (UTC)
- Since the purpose of the wiki is to document file formats, I think it's good that as many formats as possible are listed in the category pages and that you can browse these pages for format extensions. Sometimes it might be better to link multiple extensions to the same article (e.g. a specific application), but not always. I think it is difficult to come up with a strict rule for this (but maybe recommendations and, even better, good examples). --PN (talk) 15:08, 21 November 2012 (UTC)
- It's a judgment call, certainly. It depends on how the files are typically encountered, distributed, used, etc., and how they're thought of by people who use them; if a bunch of file types related to a particular program are usually found together as part of a larger data set, they most likely belong together in one article (with subsections to describe the function of the particular files), but if they're distinct entities with their own particular treatment (like separate areas of file trading sites for enthusiasts) they should have separate articles, though more descriptive names like "CyberTracker instrument file" might be better than a cryptic and likely ambiguous CI. Dan Tobias (talk) 15:46, 21 November 2012 (UTC)
- And then, somebody has also used a robot to create pages in a separate namespace devoted to file extensions, like Ext:cin. That's yet another navigational system for getting to information by extension, though those pages oddly don't actually have direct links to the normal pages here about those file formats. Dan Tobias (talk) 15:56, 21 November 2012 (UTC)
- Yes, that was me with Bender the bot. Still experimenting with it and working on creating a list of all pages in relation to extensions. Maurice.de.rooij (talk) 15:22, 22 November 2012 (UTC)
 
- What I'd like to avoid is the messy format somebody did to a few index pages like Compression, where each line has separately hyperlinked format names and extensions (not always in a consistent order) where often one or the other is a redlink, or one redirects to the other, or one is just a disambiguation page, making a somewhat confusing hodgepodge. Dan Tobias (talk) 16:22, 21 November 2012 (UTC)
- I've started rearranging the Compression page to be a little less messy. Dan Tobias (talk) 16:56, 22 November 2012 (UTC)
 
 
So now what?
The official month of this project is now over... what are the plans for the site now? It's made a good start at documenting file formats, but has a good long way to go yet. (A project like this can never possibly be "finished", since there are always more file formats coming out of the woodwork, both new ones that are introduced, and old ones that are discovered.) Dan Tobias (talk) 05:10, 1 December 2012 (UTC)
- This is an awesome project and I will stay committed to it. Of course this first month is just a start. Let's roll people! Maurice.de.rooij (talk) 23:22, 3 December 2012 (UTC)
Anybody else still around?
Everybody else seems to have vanished around the middle of December... I'm the only one editing here lately. I hate to put more effort into improving a ghost town... anyone else even reading this? Dan Tobias (talk) 23:16, 2 January 2013 (UTC)
- I will be editing more once I get back to work - still don't have a home internet connection and working from the local library computers / girlfriend's netbook over public wi-fi is a pain. It would be nice to see more contributions from others - you can see how much work is left to do on the music section alone, and I've really only been creating stub entries for most things. Halftheisland (talk) 13:51, 3 January 2013 (UTC)
- Well, I still stop by on occasion, and I've vowed to use the site as my first stop when I come across a file format I don't recognize, but I never made any substantial additions, so I'm not sure if that gives you any useful information. (My edits were mostly technical or editorial.) GPHemsley (talk) 00:18, 13 January 2013 (UTC)
- I'll be editing from time-to-time. Currently a bit snowed under with other work, but planning to do more later in the year. Would also like to review the InfoBox(es) at some point, to ensure the information on this site can be reliably linked up to other information sources. AndyJackson (talk) 12:10, 18 January 2013 (UTC)
- I'm here. Like Andy, my workload is quite high, but I'll be popping in and out. --Rhetoric X (talk) 12:31, 18 January 2013 (UTC)
- Hi there! I sometimes add a word here or there. I must say this Wiki is pretty good now. Popular formats are nicely described and niche formats are just niche formats so it's sometimes hard to add anything about them. I think that maybe it would be helpful to start adding images to posts. An image explaining format details or a screenshot of an image editor may be a nice addition. What about algorithms in pseudo-code? --Tekkno (talk) 0:28, 7 September 2018 (UTC)
Spam
I see the spammers have found the site, as I worried would happen; I run a wiki myself (MPedia, about things related to Mensa) and have to constantly play whack-a-mole with them; even adding such annoyances (for legitimate users) as a captcha and e-mail confirmation seems to only slightly slow the spammers down. I don't know the solution. Dan Tobias (talk) 12:59, 18 January 2013 (UTC)
- ...but "learn-to-read-Korean-in-15-minutes" is a legitimate addition, going to a comic strip explaining the Hangul writing system, which is in fact a legitimate article here since "file formats" is interpreted expansively to include human written languages. That link sounds a bit spammy, but if it was from a spammer, it would go to some page selling a dodgy language-learning tool, not a free-to-read resource! (It can start to get tricky distinguishing spam from legitimate stuff when you've got such a wide range of topics here to begin with! Once there's a huge flood of spam to get rid of, there's some danger of legitimate users getting caught in the net too.) Dan Tobias (talk) 13:03, 18 January 2013 (UTC)
- Yes, it's incumbent on me to make sure we can have people sign up, and be a part of it, without getting spammers. We'll keep exploring. At least bots can't take us on.... I think.... --Jason Scott (talk) 19:28, 18 January 2013 (UTC)
 
- If you've got some tips about how to configure MediaWiki to have open signups but not get the flood of spambots, let me know; that would help me with my own wiki. Dan Tobias (talk) 12:56, 22 January 2013 (UTC)
 
 
Orphaned / Blank Pages
I've been making an attempt to clear up some of the orphaned pages, but there are a few I'm not sure of - maybe Dan or someone could sort them out?
- Emulation
- FAQ:File Format
- File format extensions list (seems to be used for the "ext:" pages but hasn't been updated)
- Library
- Original Plan
- RAD Game Tools (should probably have the individual formats moved to appropriate sections)
- Statistica (clearly belongs in Scientific Data formats, but I'm not sure where)
I've also come across a few pages that should probably be deleted - either because they've been blanked at some point (I know I did this to a few pages) or because they contain data duplicated elsewhere.
Halftheisland (talk) 10:41, 22 January 2013 (UTC)
- OK, I deleted those last three; I'll look at the others. Dan Tobias (talk) 12:58, 22 January 2013 (UTC)
- I put Statistica under "Mathematics" in the science category. Dan Tobias (talk) 13:02, 22 January 2013 (UTC)
Hi Dan, got another one for you - I merged the info from ODS files created by Microsoft Office 2007 SP2 into the main OpenDocument Spreadsheet page. Halftheisland (talk) 13:59, 25 February 2013 (UTC)
Added Barnes & Noble to the list (made a bit of a mess and forgot about the rename feature) Johanvanderknijff (talk) 19:05, 21 April 2016 (UTC)
Permissions for user pages
Is there any way we can get permission to delete sub-pages of our own user pages? I've been using mine to draft articles bit by bit, rather than release half-finished articles into the wild, and it would be nice to be able to remove the drafts once complete Halftheisland (talk) 12:43, 3 October 2013 (UTC)
- I'm not sure, but as an admin I can delete anything you ask. It might also be possible to use the Move function to move it directly into the intended place. Dan Tobias (talk) 16:45, 3 October 2013 (UTC)
cd.textfiles.com
All the files on http://cd.textfiles.com/ disappeared a few days ago, breaking about a million links on this wiki. Does anyone have any information about that? Jsummers (talk) 18:48, 25 January 2015 (UTC)
- As I recall from Jason's Twitter feed, he had some server problems, with most of his sites going down at least temporarily, and most of them eventually coming back up, but maybe that one had a harder crash. Dan Tobias (talk) 19:50, 25 January 2015 (UTC)
The "Creative Commons 0" image at the bottom of every page (https://www.mediawiki.org/w/skins/common/images/cc-0.png) is broken. Can that be fixed? Jsummers (talk) 00:06, 10 July 2015 (UTC)
- Still broken 5 years later... Is this place even maintained? GoodClover (talk) 23:23, 12 March 2021 (UTC)
- Ok so it appears it should probably be this image, it matches the 88x31px that the HTML claims the image would be if it was there. Who maintains this site so it can be fixed? GoodClover (talk) 00:01, 13 March 2021 (UTC)
- I guess that would be Jason Scott. I'm an admin, but if I have any ability to edit that part of the site I have no idea how. Dan Tobias (talk) 01:31, 13 March 2021 (UTC)
 
 
Wikipedia links
At least in my geographical area, Wikipedia has been redirecting "http:" links to "https:". So, all of the [[Wikipedia:...]] links in this wiki are getting redirected. Could/should we change these links to use "https:" directly?
The magic "RFC" links like RFC 822 could also use https:, though the http: links still work. Jsummers (talk) 00:10, 10 July 2015 (UTC)
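As a rough illustration of the http: to https: change discussed above, the following sketch rewrites plain external Wikipedia URLs in a piece of wikitext. It only touches literal `http://...wikipedia.org` URLs; links written with the `[[Wikipedia:...]]` prefix are resolved by the wiki's own configuration and would not be affected by a text rewrite like this. The function name and the sample text are hypothetical.

```python
import re

# Matches "http://" followed by wikipedia.org or any subdomain of it.
# The negative lookahead rejects look-alike hosts such as
# "wikipedia.org.example.com".
WIKIPEDIA_HTTP = re.compile(
    r"http://((?:[a-z0-9-]+\.)*wikipedia\.org)(?![\w.-])",
    re.IGNORECASE,
)


def upgrade_wikipedia_links(wikitext):
    """Return wikitext with plain http: Wikipedia URLs switched to https:."""
    return WIKIPEDIA_HTTP.sub(r"https://\1", wikitext)


sample = "See http://en.wikipedia.org/wiki/GIF and http://example.com/ for details."
print(upgrade_wikipedia_links(sample))
```

Other hosts are left alone, so the same approach could be pointed at a different domain by changing the pattern, if that were ever wanted.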
Google Code
We still have around 50 articles that link to Google Code. My understanding is that the next phase of Google Code's shutdown process will happen on 2016-01-25 (two weeks from today). It would be good to update as many of these as possible before then.
Jsummers (talk) 21:05, 11 January 2016 (UTC)
Cleanup of top-level categories
(Call for objections.) I want to do some cleanup of the top-level categories, and make sure there's at least one category for virtually every article. (See Special:UncategorizedPages.) My plans:
- A new "Meta" category, for articles about the File Formats Wiki (e.g. FAQ, Original Plan, Statement of Project, Main Page, ...).
- Rename the Geek humor category to "Humor"
- Remove the Computer facts category
- A new "Information" category, for relevant informative articles (Ontology, Patents, ...) that don't have a more suitable top-level category.
- Maybe someday: A category named "Devices", or "Hardware", or even "Things". Most computers and Networked devices just aren't formats, IMHO. (But I'm not going to delete the infobox from all the "Networked devices" articles. If we can't figure out a way to have infoboxes for nonformats, then I'll leave them be.)
Jsummers (talk) 15:56, 1 June 2017 (UTC)
Love It!
Hi there, kudos to all you guys who helped create this valuable resource. Wikipedia is such a snob when it comes to detailed technical documentation so this wiki is a lifesaver. I added a few things to:
Thanks again!
PS: Can the "thumbs up" icon be changed to something better? Do you want me to design a possible logo?
Hgupta (talk) 05:42, 17 August 2017 (UTC)
- Nice work! As for the thumb icon, you'd have to ask Jason Scott, the owner of this site (and the one who put the thumb up). Dan Tobias (talk) 13:02, 17 August 2017 (UTC)
What time is it?
I'm making this edit at 17:10 UTC, but the timestamp is: Jsummers (talk) 17:25, 2 May 2018 (UTC)
- "Does anybody really know what time it is; does anybody really care?" -- Chicago
[posted at 01:20 UTC; let's see when it thinks it is] Dan Tobias (talk) 01:36, 3 May 2018 (UTC)
Type / Creator codes
Curious what everyone's thoughts are on collecting Type/Creator Codes for Macintosh formats. There seem to be a few attempts at doing this around the web. Is there a way here to gather them all into one area of the wiki? --Thorsted (talk) 17:46, 4 May 2019 (UTC)
- Type Code : Wikipedia
- Creator Code : Wikipedia
- TCDBx unmaintained
- The Programmers Apple Mac Sourcebook
- Mac Signatures
- Maybe do it similar to how file extensions are handled, as an item in the infobox that links to a category? Dan Tobias (talk) 19:09, 4 May 2019 (UTC)
- An article for Mac type/creator codes has been on my to-do list for a while, so we could at least do that, and see if there's any interest in listing lots of codes there. Should it be one article, or two? FormatInfo already has a "type code" param that is supposed to be for the Mac code. Maybe we are supposed to make a "Type Code" template to go along with it, so we can do like "|type code={{Type Code|XXXX}}". Jsummers (talk) 21:07, 4 May 2019 (UTC)
 
- If they were listed in a single article as opposed to a series of categories, I don't see what a template would do. In that case, the text on the left side of the infobox could link to the list page (although this might be ugly). (It would be convenient if there was something between the complexity of the MediaWiki category system and a list page, but I don't think anything like that exists in a plain MediaWiki installation.) Effect2 (talk) 21:30, 4 May 2019 (UTC)
 
 
- Even if they went into the infobox, the category system could potentially be left out, as is currently done with FOURCCs and MIME types (the latter links to an external database, but whether anything is there is based on luck more than anything else, as there are so many unregistered MIME types). These can still be found with the wiki's search feature. Effect2 (talk) 21:13, 4 May 2019 (UTC)
 
- And there's also the Creator Code, as noted above; that refers to what program created the file, so there might be several associated with one file type code (and several file type codes associated with one creator). Perhaps there needs to be a section of the article listing all the code values associated with a given format and/or program (depending on what's covered by the article). Dan Tobias (talk) 21:44, 4 May 2019 (UTC)
 
 
- I like the idea of at least a uniform template for using codes within format descriptions. Most files from the early Macintosh days don't have an extension (unless they were cross-platform and the Windows extension is used), so the only way to identify such a file is from its Type/Creator code. I don't think Apple ever released the full registry, but some estimates are well over 50,000 entries. --Thorsted (talk) 03:24, 5 May 2019 (UTC)
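For anyone collecting codes, a small sketch of how a Type or Creator code can be handled programmatically may help. It assumes the usual convention that each code is exactly four bytes, shown as four characters in the Mac OS Roman encoding and stored big-endian when treated as a 32-bit number. The helper names are hypothetical, and 'TEXT' / 'ttxt' are used only as commonly cited examples of a type code and a creator code.

```python
def encode_ostype(code):
    """Convert a four-character code such as 'TEXT' to its 4 raw bytes."""
    raw = code.encode("mac_roman")
    if len(raw) != 4:
        raise ValueError(f"expected exactly 4 bytes, got {len(raw)}: {code!r}")
    return raw


def decode_ostype(raw):
    """Convert 4 raw bytes back to the four-character code."""
    if len(raw) != 4:
        raise ValueError(f"expected exactly 4 bytes, got {len(raw)}")
    return raw.decode("mac_roman")


def ostype_as_uint32(code):
    """Return the code as a big-endian 32-bit integer."""
    return int.from_bytes(encode_ostype(code), "big")


print(hex(ostype_as_uint32("TEXT")))   # type code example
print(decode_ostype(b"ttxt"))          # creator code example
```

Because codes are case-sensitive and may contain spaces, a template or list page would need to preserve them exactly; the length check above is one way to catch entries that were trimmed or mistyped.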
 
Reverse engineering formats
I am trying to reverse engineer some formats. Sometimes successfully, sometimes not. My most recent attempt is:
Maybe we can do this together instead of everyone here focusing on different things? Also is there a better way to discuss things than writing here? Tekkno (talk) 01:39, 9 May 2019 (UTC)
- You should set up an NNTP newsgroup for discussing the reverse engineering of file formats (if there isn't already an appropriate newsgroup). (I have done some reverse engineering of file formats myself, but I have not set up a newsgroup to discuss them. I do have an NNTP server, so you can suggest newsgroups there if wanted, I suppose.) --Zzo38 (talk) 21:26, 23 August 2021 (UTC)
CAPTCHA
AT is no longer on EFnet: https://archiveteam.org/index.php?title=Archiveteam:IRC#Special_ArchiveTeam_IRC_rules Arlo James Barnes (talk) 02:57, 8 November 2020 (UTC)
- This is a pretty serious problem. Are there any plans to fix it? -Jsummers (talk) 16:22, 12 November 2020 (UTC)
- Seems like it has been fixed, by removing the CAPTCHA altogether. Let's all keep a keen eye out for spamdalism. Arlo James Barnes (talk) 03:33, 23 November 2020 (UTC)
 
special:interwiki
don't see it at special:specialpages? Arlo James Barnes (talk) 02:57, 8 November 2020 (UTC)
List of things that maybe should be added
Things that I think should probably be added (when someone has the information to add):
- TRON character encoding
- TRON Application Databus
- BANCStar
- C67 (music)
(I might add a few others later if I remember more.) --Zzo38 (talk) 09:30, 31 July 2021 (UTC)

