Just Solve the File Format Problem:Community portal
Dan Tobias (Talk | contribs) (→Spam) |
Jason Scott (Talk | contribs) (→Spam) |
Revision as of 19:27, 18 January 2013
- please add your signature by typing ~~~~ if you add or reply
Open issues
Below is a list of "issues" which would ordinarily be in a ticketing system of some kind, but are here on the Wiki instead, because that's how we roll. As things are resolved, they will be moved to the Discussion page. If there's an appeal or an issue, the conversation can continue there - this page will be for open issues.
Use of case in URLs / links. I went through all the electronic format type pages and tried to normalise them where I could. There was a mix of link structures; I've tried to get them all (apart from animation - I've been at it all day!) into the form file extension - file type name. I notice that we have a mix of upper- and lower-case file extensions throughout. This means we may have two links which should point to the same URL (e.g. mix and MIX). Is this a known issue with the current layout? --JaygattusoNLNZ (talk) 01:32, 20 November 2012 (UTC)
- Since you're linking both the extension and the name, does that mean that there are supposed to be separate articles for each? I don't know if there's really a need for "mainspace" articles by extension, since there are already categories for that purpose; you can browse them through Category:File formats by extension. Dan Tobias (talk) 02:12, 20 November 2012 (UTC)
- I just copied the most common model that I found on the formats pages. The problem is, if you don't homogenize the method, the linking/crosslinking doesn't work properly. All instances of .doc (for example) should point to the same resource page / disambiguation page. If someone has linked to only the format in one place (e.g. MS Word (.doc)), and someone else to the extension (MS Word - doc), we can't make sure they point to the same place. The problem occurs because format names and extensions are used interchangeably. You raise an interesting question about the relationship between the ext and the format name. I would argue they are not equal (1:1), nor (1:many) / (many:1), so it makes sense to protect both aspects as definable things - the extension because that's what's most commonly searched for and referred to by users, and 'format name' because it's more accurate. How is the Category:File formats by extension populated? --JaygattusoNLNZ (talk) 18:31, 20 November 2012 (UTC)
- The categories are inserted when you use the ext template in the infobox. My preference is to have articles by actual format name and use multiple navigation aids (menus, cats, etc.) to get to them. Dan Tobias (talk) 01:38, 21 November 2012 (UTC)
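For anyone wanting to check how widespread the mixed-case problem raised in this thread is, here is a rough sketch of one way to spot link targets that differ only by letter case (such as mix and MIX). The function name and the sample list are made up for illustration; it assumes you already have a plain list of link targets from somewhere, and it doesn't touch the wiki itself.

```python
from collections import defaultdict


def find_case_collisions(link_targets):
    """Group link targets that differ only by letter case.

    `link_targets` is a hypothetical list of link titles; how it is
    collected from the wiki is outside the scope of this sketch.
    """
    groups = defaultdict(set)
    for target in link_targets:
        # casefold() gives a case-insensitive key for comparison.
        groups[target.casefold()].add(target)
    # Keep only the keys that were written in more than one way.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Illustrative sample only, not real data from the site.
links = ["mix", "MIX", "doc", "GIF", "gif", "S3M"]
collisions = find_case_collisions(links)
```

Each entry in the result is a candidate for merging into one page or turning into a redirect; which spelling should win is still a judgment call for the naming discussion.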
Article naming convention
As mentioned above, there's some dispute over whether to name articles after the full name of a format or its file extension. If using full names, you then get into issues of whether to use the full technical name or a shorter thing that's more popularly used, and in some cases that's even the same as the extension (GIF, for instance). And you also get into tricky issues of capitalization: all-caps like an acronym, all-lowercase like filenames are often done (though this is OS-dependent; some, like MS-DOS, use all-uppercase), or mixed case (proper names capitalized)? And then there's the disambiguation issue of how to name articles on different things that have the same name, which happens sometimes even with long official names, but even more often with short acronyms and file extensions. But there's also yet another issue of which things get separate articles and which are combined, like formats that have had many different versions, etc.
Currently you have things like CI and CT, recently-created articles that represent two different file types within the data of one type of music tracker. The spec document they link to is the same one, which documents all the file types used in that tracker. Unless there's going to be really a lot to say about each of the specific file types, my own preference would be to have one article called CyberTracker that discusses all the formats used by the program in question, with subheaders within the article for the different file types, and all the extensions listed in the infobox (and hence in associated categories). If any other indices by extension are built up, they'd also have entries for both CI and CT. For instance, when I documented Softdisk Family Tree, I covered all the various file formats in one article, though there are several versions and multiple files for each. Dan Tobias (talk) 13:39, 21 November 2012 (UTC)
- I realise I'm as guilty of this as anyone, having used both forms at some point (e.g. Surprise! Adlib Tracker v2.0 and CI). Indeed, the two articles - CI and CT - you refer to were created by me. I guess in general I would favour using a descriptive page name rather than simply the file extension - that seems to be something that's being taken care of by infoboxes and categories.
- On the issue of what gets a separate page and what doesn't, I guess that just comes down to individual discretion. There will be instances where a format has undergone a number of minor revisions over time or has a number of minor variants (e.g. the variant forms of Chaos Music Composer's CMC) where it would make sense to keep them all to a single page, while a major revision would necessitate a multi-page approach (e.g. the shift with Capella from the binary CAP to the XML-based CapXML format).
- However, I'm not sure I agree with CI and CT having a single CyberTracker page. While both link to the same spec document and both are used by the same program, they are different formats serving different purposes. I think in general we should try and distinguish between program and file format - S3M doesn't belong on the ScreamTracker page, although each should link to the other. Halftheisland (talk) 14:04, 21 November 2012 (UTC)
- Since the purpose of the wiki is to document file formats, I think it's good that as many formats as possible are listed in the category pages and that you can browse these pages for format extensions. Sometimes it might be better to link multiple extensions to the same article (e.g. a specific application), but not always. I think it is difficult to come up with a strict rule for this (but maybe recommendations and, even better, good examples). --PN (talk) 15:08, 21 November 2012 (UTC)
- It's a judgment call, certainly. It depends on how the files are typically encountered, distributed, used, etc., and how they're thought of by people who use them; if a bunch of file types related to a particular program are usually found together as part of a larger data set, they most likely belong together in one article (with subsections to describe the function of the particular files), but if they're distinct entities with their own particular treatment (like separate areas of file trading sites for enthusiasts) they should have separate articles, though more descriptive names like "CyberTracker instrument file" might be better than a cryptic and likely ambiguous CI. Dan Tobias (talk) 15:46, 21 November 2012 (UTC)
- And then, somebody has also used a robot to create pages in a separate namespace devoted to file extensions, like Ext:cin. That's yet another navigational system for getting to information by extension, though those pages oddly don't actually have direct links to the normal pages here about those file formats. Dan Tobias (talk) 15:56, 21 November 2012 (UTC)
- Yes, that was me with Bender the bot. Still experimenting with it and working on creating a list of all pages in relation to extensions. Maurice.de.rooij (talk) 15:22, 22 November 2012 (UTC)
- What I'd like to avoid is the messy format somebody did to a few index pages like Compression, where each line has separately hyperlinked format names and extensions (not always in a consistent order) where often one or the other is a redlink, or one redirects to the other, or one is just a disambiguation page, making a somewhat confusing hodgepodge. Dan Tobias (talk) 16:22, 21 November 2012 (UTC)
- I've started rearranging the Compression page to be a little less messy. Dan Tobias (talk) 16:56, 22 November 2012 (UTC)
So now what?
The official month of this project is now over... what are the plans for the site now? It's made a good start at documenting file formats, but has a good long way to go yet. (A project like this can never possibly be "finished", since there are always more file formats coming out of the woodwork, both new ones that are introduced, and old ones that are discovered.) Dan Tobias (talk) 05:10, 1 December 2012 (UTC)
- This is an awesome project and I will stay committed to it. Of course this first month is just a start. Let's roll people! Maurice.de.rooij (talk) 23:22, 3 December 2012 (UTC)
Anybody else still around?
Everybody else seems to have vanished around the middle of December... I'm the only one editing here lately. I hate to put more effort into improving a ghost town... anyone else even reading this? Dan Tobias (talk) 23:16, 2 January 2013 (UTC)
- I will be editing more once I get back to work - still don't have a home internet connection and working from the local library computers / girlfriend's netbook over public wi-fi is a pain. It would be nice to see more contributions from others - you can see how much work is left to do on the music section alone, and I've really only been creating stub entries for most things. Halftheisland (talk) 13:51, 3 January 2013 (UTC)
- Well, I still stop by on occasion, and I've vowed to use the site as my first stop when I come across a file format I don't recognize, but I never made any substantial additions, so I'm not sure if that gives you any useful information. (My edits were mostly technical or editorial.) GPHemsley (talk) 00:18, 13 January 2013 (UTC)
- I'll be editing from time-to-time. Currently a bit snowed under with other work, but planning to do more later in the year. Would also like to review the InfoBox(es) at some point, to ensure the information on this site can be reliably linked up to other information sources. AndyJackson (talk) 12:10, 18 January 2013 (UTC)
- I'm here. Like Andy, my workload is quite high, but I'll be popping in and out. --Rhetoric X (talk) 12:31, 18 January 2013 (UTC)
Spam
I see the spammers have found the site, as I worried would happen; I run a wiki myself (MPedia, about things related to Mensa) and have to constantly play whack-a-mole with them; even adding such annoyances (for legitimate users) as a captcha and e-mail confirmation seems to only slightly slow the spammers down. I don't know the solution. Dan Tobias (talk) 12:59, 18 January 2013 (UTC)
- ...but "learn-to-read-Korean-in-15-minutes" is a legitimate addition, going to a comic strip explaining the Hangul writing system, which is in fact a legitimate article here since "file formats" is interpreted expansively to include human written languages. That link sounds a bit spammy, but if it was from a spammer, it would go to some page selling a dodgy language-learning tool, not a free-to-read resource! (It can start to get tricky distinguishing spam from legitimate stuff when you've got such a wide range of topics here to begin with! Once there's a huge flood of spam to get rid of, there's some danger of legitimate users getting caught in the net too.) Dan Tobias (talk) 13:03, 18 January 2013 (UTC)
- Yes, it's incumbent on me to make sure we can have people sign up, and be a part of it, without getting spammers. We'll keep exploring. At least bots can't take us on.... I think....