Just Solve the File Format Problem - User contributions [en]

Just Solve the File Format Problem:Community portal

2026-05-13T19:54:38Z

265 993 303: /* CAPTCHA issue */

: ''Please add your signature by typing <nowiki>~~~~</nowiki> if you add or reply.''

== Open issues ==

Below is a list of "issues" which would ordinarily be in a ticketing system of some kind, but are here on the Wiki instead, because that's how we roll. As things are resolved, they will be moved to the Discussion page. If there's an appeal or an issue, the conversation can continue there - this page will be for open issues.

Use of case in URLs / links. I went through all the electronic format types pages and tried to normalise them where I could. There was a mix of link structures; I've tried to get them all (apart from animation - I've been at it all day!) so they are [[file extension]] - [[file type name]].
I notice that we have a mix of upper and lower case file extensions throughout. This means we may have 2 links which should point to the same URL (e.g. [[mix]] and [[MIX]]). Is this a known issue with the current layout? --[[User:JaygattusoNLNZ|JaygattusoNLNZ]] ([[User talk:JaygattusoNLNZ|talk]]) 01:32, 20 November 2012 (UTC)
:Since you're linking both the extension and the name, does that mean that there are supposed to be separate articles for each? I don't know if there's really a need for "mainspace" articles by extension, since there are already categories for that purpose; you can browse them through [[:Category:File formats by extension]]. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 02:12, 20 November 2012 (UTC)
::I just copied the most common model that I found on the formats pages. The problem is, if you don't homogenize the method, the linking/crosslinking doesn't work properly. All instances of .doc (for example) should point to the same resource page / disambiguation page. If someone has linked only the format in one place (e.g. [[MS Word]] (.doc)), and someone else the extension (MS Word - [[doc]]), we can't make sure they point to the same place. The problem occurs because format names and extensions are used interchangeably. You raise an interesting question about the relationship between the ext and the format name. I would argue they are not equal (1:1), nor (1:many) / (many:1) so it makes sense to protect both aspects as definable things - the extension because that's what's most commonly searched for and referred to by users and 'format name' because it's more accurate. How is the [[:Category:File formats by extension]] populated? --[[User:JaygattusoNLNZ|JaygattusoNLNZ]] ([[User talk:JaygattusoNLNZ|talk]]) 18:31, 20 November 2012 (UTC)
:::The categories are inserted when you use the ext template in the infobox. My preference is to have articles by actual format name and use multiple navigation aids (menus, cats, etc.) to get to them. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 01:38, 21 November 2012 (UTC)
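As an illustration of the case issue raised above: in a default MediaWiki configuration only the first letter of a page title is case-insensitive, so [[mix]] resolves to "Mix" while [[MIX]] stays "MIX", and the two links can end up on separate pages. The following is a minimal sketch, not the wiki's actual code, that groups link targets differing only by letter case. The function names and the sample list are hypothetical, and the normalization is only an approximation of MediaWiki's behaviour under the assumed default settings.

```python
from collections import defaultdict

def normalize_title(link: str) -> str:
    """Approximate MediaWiki's default link normalization (assumed config):
    underscores become spaces, surrounding whitespace is trimmed, and only
    the first character is upper-cased."""
    title = link.replace("_", " ").strip()
    return title[:1].upper() + title[1:]

def find_case_collisions(links):
    """Group normalized titles that differ only by letter case."""
    groups = defaultdict(set)
    for link in links:
        title = normalize_title(link)
        groups[title.casefold()].add(title)
    # Keep only the groups that contain more than one distinct title.
    return {key: sorted(titles) for key, titles in groups.items() if len(titles) > 1}

# Hypothetical sample of link targets collected from index pages.
sample_links = ["mix", "MIX", "Mix", "doc", "MS Word"]
print(find_case_collisions(sample_links))
```

Each reported group is a candidate for a redirect or a disambiguation page, so that every spelling of an extension lands in the same place.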

== Article naming convention ==

As mentioned above, there's some dispute over whether to name articles after the full name of a format or its file extension. If using full names, you then get into issues of whether to use the full technical name or a shorter thing that's more popularly used, and in some cases that's even the same as the extension (GIF, for instance). And you also get into tricky issues of capitalization: all-caps like an acronym, all-lowercase like filenames are often done (though this is OS-dependent; some, like MS-DOS, use all-uppercase), or mixed case (proper names capitalized)? And then there's the disambiguation issue of how to name articles on different things that have the same name, which happens sometimes even with long official names, but even more often with short acronyms and file extensions. But there's also yet another issue of which things get separate articles and which are combined, like formats that have had many different versions, etc.

Currently you have things like [[CI]] and [[CT]], recently-created articles that represent two different file types within the data of one type of music tracker. The spec document they link to is the same one, which documents all the file types used in that tracker. Unless there's going to be really a lot to say about each of the specific file types, my own preference would be to have one article called [[CyberTracker]] that discusses all the formats used by the program in question, with subheaders within the article for the different file types, and all the extensions listed in the infobox (and hence in associated categories). If any other indices by extension are built up, they'd also have entries for both CI and CT. For instance, when I documented [[Softdisk Family Tree]], I covered all the various file formats in one article, though there are several versions and multiple files for each. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 13:39, 21 November 2012 (UTC)
: I realise I'm as guilty of this as anyone, having used both forms at some point (e.g. [[Surprise! Adlib Tracker v2.0]] and [[CI]]). Indeed, the two articles - [[CI]] and [[CT]] - you refer to were created by me. I guess in general I would favour using a descriptive page name rather than simply the file extension - that seems to be something that's being taken care of by infoboxes and categories.

:On the issue of what gets a separate page and what doesn't, I guess that just comes down to individual discretion. There will be instances where a format has undergone a number of minor revisions over time or has a number of minor variants (e.g. the variant forms of Chaos Music Composer's [[CMC]]) where it would make sense to keep them all to a single page, while a major revision would necessitate a multi-page approach (e.g. the shift with Capella from the binary [[CAP]] to the XML-based [[CapXML]] format).

:However, I'm not sure I agree with [[CI]] and [[CT]] having a single [[CyberTracker]] page. While both link to the same spec document and both are used by the same program, they are different formats serving different purposes. I think in general we should try and distinguish between program and file format - [[S3M]] doesn't belong on the [[ScreamTracker]] page, although each should link to the other. [[User:Halftheisland|Halftheisland]] ([[User talk:Halftheisland|talk]]) 14:04, 21 November 2012 (UTC)

:Since the purpose of the wiki is to document file formats, I think it's good that as many formats as possible are listed in the category pages and that you can browse these pages for format extensions. Sometimes it might be better to link multiple extensions to the same article (e.g. a specific application), but not always. I think it is difficult to come up with a strict rule for this (but maybe recommendations and, even better, good examples). --[[User:PN|PN]] ([[User talk:PN|talk]]) 15:08, 21 November 2012 (UTC)

::It's a judgment call, certainly. It depends on how the files are typically encountered, distributed, used, etc., and how they're thought of by people who use them; if a bunch of file types related to a particular program are usually found together as part of a larger data set, they most likely belong together in one article (with subsections to describe the function of the particular files), but if they're distinct entities with their own particular treatment (like separate areas of file trading sites for enthusiasts) they should have separate articles, though more descriptive names like "CyberTracker instrument file" might be better than a cryptic and likely ambiguous CI. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 15:46, 21 November 2012 (UTC)
::And then, somebody has also used a robot to create pages in a separate namespace devoted to file extensions, like [[Ext:cin]]. That's yet another navigational system for getting to information by extension, though those pages oddly don't actually have direct links to the normal pages here about those file formats. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 15:56, 21 November 2012 (UTC)
::: Yes, that was me with Bender the bot. Still experimenting with it and working on creating a list of all pages in relation to extensions. [[User:Maurice.de.rooij|Maurice.de.rooij]] ([[User talk:Maurice.de.rooij|talk]]) 15:22, 22 November 2012 (UTC)
::What I'd like to avoid is the messy format somebody did to a few index pages like [[Compression]], where each line has separately hyperlinked format names and extensions (not always in a consistent order) where often one or the other is a redlink, or one redirects to the other, or one is just a disambiguation page, making a somewhat confusing hodgepodge. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 16:22, 21 November 2012 (UTC)
:::I've started rearranging the Compression page to be a little less messy. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 16:56, 22 November 2012 (UTC)

== So now what? ==
The official month of this project is now over... what are the plans for the site now? It's made a good start at documenting file formats, but has a good long way to go yet. (A project like this can never possibly be "finished", since there are always more file formats coming out of the woodwork, both new ones that are introduced, and old ones that are discovered.) [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 05:10, 1 December 2012 (UTC)
: This is an awesome project and I will stay committed to it. Of course this first month is just a start. Let's roll people! [[User:Maurice.de.rooij|Maurice.de.rooij]] ([[User talk:Maurice.de.rooij|talk]]) 23:22, 3 December 2012 (UTC)

== Anybody else still around? ==
Everybody else seems to have vanished around the middle of December... I'm the only one editing here lately. I hate to put more effort into improving a ghost town... anyone else even reading this? [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 23:16, 2 January 2013 (UTC)

: I will be editing more once I get back to work - still don't have a home internet connection and working from the local library computers / girlfriend's netbook over public wi-fi is a pain. It would be nice to see more contributions from others - you can see how much work is left to do on the music section alone, and I've really only been creating stub entries for most things. [[User:Halftheisland|Halftheisland]] ([[User talk:Halftheisland|talk]]) 13:51, 3 January 2013 (UTC)

:Well, I still stop by on occasion, and I've vowed to use the site as my first stop when I come across a file format I don't recognize, but I never made any substantial additions, so I'm not sure if that gives you any useful information. (My edits were mostly technical or editorial.) [[User:Gphemsley|GPHemsley]] ([[User talk:Gphemsley|talk]]) 00:18, 13 January 2013 (UTC)

:I'll be editing from time-to-time. Currently a bit snowed under with other work, but planning to do more later in the year. Would also like to review the InfoBox(es) at some point, to ensure the information on this site can be reliably linked up to other information sources. [[User:AndyJackson|AndyJackson]] ([[User talk:AndyJackson|talk]]) 12:10, 18 January 2013 (UTC)

:I'm here. Like Andy, my workload is quite high, but I'll be popping in and out. --[[User:Rhetoric X|Rhetoric X]] ([[User talk:Rhetoric X|talk]]) 12:31, 18 January 2013 (UTC)

:I dip in and out when I want a challenge (or can stomach the frustration.) [[User:Foxtrot|Foxtrot]] ([[User talk:Foxtrot|talk]]) 11:47, 11 January 2023 (UTC)

:Hi there! I sometimes add a word here or there. I must say this Wiki is pretty good now. Popular formats are nicely described and niche formats are just niche formats so it's sometimes hard to add anything about them. I think that maybe it would be helpful to start adding images to posts. An image explaining format details or a screenshot of an image editor may be a nice addition. What about algorithms in pseudo-code? --[[User:Tekkno|Tekkno]] ([[User talk:Tekkno|talk]]) 0:28, 7 September 2018 (UTC)

::A description of file formats and pseudo-codes would be helpful (although you do not necessarily need a picture). --[[User:Zzo38|Zzo38]] ([[User talk:Zzo38|talk]]) 05:19, 5 May 2019 (UTC)

== Spam ==

I see the spammers have found the site, as I worried would happen; I run a wiki myself ([http://mpedia.dan.info/ MPedia], about things related to Mensa) and have to constantly play whack-a-mole with them; even adding such annoyances (for legitimate users) as a captcha and e-mail confirmation seems to only slightly slow the spammers down. I don't know the solution. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 12:59, 18 January 2013 (UTC)

:...but "learn-to-read-Korean-in-15-minutes" is a legitimate addition, going to a comic strip explaining the [[Hangul]] writing system, which is in fact a legitimate article here since "file formats" is interpreted expansively to include human written languages. That link ''sounds'' a bit spammy, but if it was from a spammer, it would go to some page selling a dodgy language-learning tool, not a free-to-read resource! (It can start to get tricky distinguishing spam from legitimate stuff when you've got such a wide range of topics here to begin with! Once there's a huge flood of spam to get rid of, there's some danger of legitimate users getting caught in the net too.) [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 13:03, 18 January 2013 (UTC)

::Yes, it's incumbent on me to make sure we can have people sign up, and be a part of it, without getting spammers. We'll keep exploring. At least bots can't take us on.... I think.... --[[User:Jason Scott|Jason Scott]] ([[User talk:Jason Scott|talk]]) 19:28, 18 January 2013 (UTC)

:::If you've got some tips about how to configure MediaWiki to have open signups but not get the flood of spambots, let me know; that would help me with my own wiki. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 12:56, 22 January 2013 (UTC)

== Orphaned / Blank Pages ==

I've been making an attempt to clear up some of the orphaned pages, but there are a few I'm not sure of - maybe Dan or someone could sort them out?

* [[Emulation]]
* [[FAQ:File Format]]
* [[File format extensions list]] (seems to be used for the "ext:" pages but hasn't been updated)
* [[Library]]
* [[Original Plan]]
* [[RAD Game Tools]] (should probably have the individual formats moved to appropriate sections)
* [[Statistica]] (clearly belongs in Scientific Data formats, but I'm not sure where)

I've also come across a few pages that should probably be deleted - either because they've been blanked at some point (I know I did this to a few pages) or because they contain data duplicated elsewhere.

* [[AA]]
* [[Compressed executable (.com)]]
* [[SAP]]
* [[Barnes & Noble Fixed-layout Format]]

[[User:Halftheisland|Halftheisland]] ([[User talk:Halftheisland|talk]]) 10:41, 22 January 2013 (UTC)

:OK, I deleted those last three; I'll look at the others. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 12:58, 22 January 2013 (UTC)
:I put Statistica under "Mathematics" in the science category. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 13:02, 22 January 2013 (UTC)

Hi Dan, got another one for you - I merged the info from [[ODS files created by Microsoft Office 2007 SP2]] into the main [[OpenDocument Spreadsheet]] page. [[User:Halftheisland|Halftheisland]] ([[User talk:Halftheisland|talk]]) 13:59, 25 February 2013 (UTC)

Added Barnes & Noble to the list (made a bit of a mess and forgot about the rename feature) [[User:Johanvanderknijff|Johanvanderknijff]] ([[User talk:Johanvanderknijff|talk]]) 19:05, 21 April 2016 (UTC)

== Permissions for user pages ==

Is there any way we can get permission to delete sub-pages of our own user pages? I've been using mine to draft articles bit by bit, rather than release half-finished articles into the wild, and it would be nice to be able to remove the drafts once complete [[User:Halftheisland|Halftheisland]] ([[User talk:Halftheisland|talk]]) 12:43, 3 October 2013 (UTC)

:I'm not sure, but as an admin I can delete anything you ask. It might also be possible to use the Move function to move it directly into the intended place. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 16:45, 3 October 2013 (UTC)

== cd.textfiles.com ==
All the files on http://cd.textfiles.com/ disappeared a few days ago, breaking about a million links on this wiki. Does anyone have any information about that? [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 18:48, 25 January 2015 (UTC)

:As I recall from Jason's Twitter feed, he had some server problems, with most of his sites going down at least temporarily, and most of them eventually coming back up, but maybe that one had a harder crash. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 19:50, 25 January 2015 (UTC)

== Broken image in footer ==
The "Creative Commons 0" image at the bottom of every page (https://www.mediawiki.org/w/skins/common/images/cc-0.png) is broken. Can that be fixed? [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 00:06, 10 July 2015 (UTC)
:Still broken 5 years later... Is this place even maintained? [[User:GoodClover|GoodClover]] ([[User talk:GoodClover|talk]]) 23:23, 12 March 2021 (UTC)
::Ok so it appears it should probably be [https://licensebuttons.net/l/zero/1.0/88x31.png this image], it matches the 88x31px that the HTML claims the image would be if it was there. Who maintains this site so it can be fixed? [[User:GoodClover|GoodClover]] ([[User talk:GoodClover|talk]]) 00:01, 13 March 2021 (UTC)
:::I guess that would be Jason Scott. I'm an admin, but if I have any ability to edit that part of the site I have no idea how. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 01:31, 13 March 2021 (UTC)

== Wikipedia links ==
At least in my geographical area, Wikipedia has been redirecting "http:" links to "https:". So, all of the <nowiki>[[Wikipedia:...]]</nowiki> links in this wiki are getting redirected. Could/should we change these links to use "https:" directly?

The magic "RFC" links like RFC 822 could also use https:, though the http: links still work. [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 00:10, 10 July 2015 (UTC)

== Google Code ==
We still have around 50 articles that link to Google Code. My understanding is that the next phase of Google Code's shutdown process will happen on 2016-01-25 (two weeks from today). It would be good to update as many of these as possible before then.
* [http://fileformats.archiveteam.org/index.php?title=Special%3ALinkSearch&target=http%3A%2F%2Fcode.google.com&namespace= links to http://code.google.com]
* [http://fileformats.archiveteam.org/index.php?title=Special%3ALinkSearch&target=https%3A%2F%2Fcode.google.com&namespace= links to https://code.google.com]
[[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 21:05, 11 January 2016 (UTC)
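The two Special:LinkSearch links above follow a simple pattern, so similar searches for other hosts can be built the same way. A minimal sketch using only the Python standard library; the helper name <code>linksearch_url</code> is hypothetical, and the base URL is copied from the links above.

```python
from urllib.parse import urlencode

# Base URL taken from the LinkSearch links listed above.
WIKI_INDEX = "http://fileformats.archiveteam.org/index.php"

def linksearch_url(target: str, namespace: str = "") -> str:
    """Build a Special:LinkSearch URL listing pages that link to `target`."""
    params = {
        "title": "Special:LinkSearch",
        "target": target,
        "namespace": namespace,  # empty string means all namespaces
    }
    return WIKI_INDEX + "?" + urlencode(params)

# Search both schemes, since http: and https: links are indexed separately.
for scheme in ("http", "https"):
    print(linksearch_url(f"{scheme}://code.google.com"))
```

The same helper could be pointed at any other host that is about to shut down, to get a worklist of affected articles.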

== Cleanup of top-level categories ==
(Call for objections.) I want to do some cleanup of the [[:Category:Top Level Categories|top-level categories]], and make sure there's at least one category for virtually every article. (See [[Special:UncategorizedPages]].) My plans:
* A new "Meta" category, for articles about the File Formats Wiki (e.g. [[FAQ]], [[Original Plan]], [[Statement of Project]], [[Main Page]], ...).
* Rename the [[:Category:Geek humor|Geek humor]] category to "Humor"
* Remove the [[:Category:Computer facts|Computer facts]] category
* A new "Information" category, for relevant informative articles ([[Ontology]], [[Patents]], ...) that don't have a more suitable top-level category.
* Maybe someday: A category named "Devices", or "Hardware", or even "Things". Most computers and [[Networked devices]] just aren't formats, IMHO. (But I'm not going to delete the infobox from all the "Networked devices" articles. If we can't figure out a way to have infoboxes for nonformats, then I'll leave them be.)
[[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 15:56, 1 June 2017 (UTC)

[[Category:Meta]]

== Love It! ==
Hi there, kudos to all you guys who helped create this valuable resource. Wikipedia is such a snob when it comes to detailed technical documentation so this wiki is a lifesaver. I added a few things to:

* [[SWF#Software]]
* [[FLA#Software]]
* [[BSON#Libraries]]

Thanks again!

PS: Can the "thumbs up" icon be changed to something better? Do you want me to design a possible logo?

[[User:Hgupta|Hgupta]] ([[User talk:Hgupta|talk]]) 05:42, 17 August 2017 (UTC)

:Nice work! As for the thumb icon, you'd have to ask Jason Scott, the owner of this site (and the one who put the thumb up). [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 13:02, 17 August 2017 (UTC)

:I support the idea of changing the logo. [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 16:08, 18 August 2017 (UTC)

== What time is it? ==
I'm making this edit at 17:10 UTC, but the timestamp is: [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 17:25, 2 May 2018 (UTC)
:"Does anybody really know what time it is; does anybody really care?" -- Chicago
[posted at 01:20 UTC; let's see when it thinks it is] [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 01:36, 3 May 2018 (UTC)

== Type / Creator codes ==

Curious what everyone's thoughts are on collecting Type/Creator Codes for Macintosh formats. There seem to be a few attempts at doing this around the webs. Is there a way here to gather them all into one area of the wiki? --[[User:Thorsted|Thorsted]] ([[User talk:Thorsted|talk]]) 17:46, 4 May 2019 (UTC)

* [[Wikipedia:Type_code|Type Code : Wikipedia]]
* [[Wikipedia:Creator_code|Creator Code : Wikipedia]]
* [http://www.lacikam.co.il/tcdb/ TCDBx unmaintained]
* [https://vintageapple.org/macprogramming/pdf/The_Programmers_Apple_Mac_Sourcebook_1989.pdf The Programmers Apple Mac Sourcebook]
* [https://www.macdisk.com/macsigen.php Mac Signatures]

:Maybe do it similar to how file extensions are handled, as an item in the infobox that links to a category? [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 19:09, 4 May 2019 (UTC)

::An article for Mac type/creator codes has been on my to-do list for a while, so we could at least do that, and see if there's any interest in listing lots of codes there. Should it be one article, or two? FormatInfo already has a "type code" param that is supposed to be for the Mac code. Maybe we are supposed to make a "Type Code" template to go along with it, so we can do like "<nowiki>|type code={{Type Code|XXXX}}</nowiki>". [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 21:07, 4 May 2019 (UTC)

:::If they were listed in a single article as opposed to a series of categories, I don't see what there would be for a template to do. In that case, the text on the left side of the infobox could link to the list page (although this might be ugly). (It would be convenient if there was something between the complexity of the MediaWiki category system and a list page, but I don't think anything like that exists in a plain MediaWiki installation.) [[User:Effect2|Effect2]] ([[User talk:Effect2|talk]]) 21:30, 4 May 2019 (UTC)

::Even if they went into the infobox, the category system could potentially be left out, as is currently done with FOURCCs and MIMETypes (the latter links to an external database, but whether anything is there is based on luck more than anything else, as there are so many unregistered mimetypes). These can still be found with the wiki's search feature. [[User:Effect2|Effect2]] ([[User talk:Effect2|talk]]) 21:13, 4 May 2019 (UTC)

:::And there's also the Creator Code, as noted above; that refers to what program created the file, so there might be several associated with one file type code (and several file type codes associated with one creator). Perhaps there needs to be a section of the article listing all the code values associated with a given format and/or program (depending on what's covered by the article). [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 21:44, 4 May 2019 (UTC)

::I like the idea of at least a uniform template for using codes within format descriptions. Since most of the files from the early Macintosh days don't have an extension, unless they were cross platform and the Windows extension is used, then the only way to identify the file is from its Type/Creator code. I don't think Apple ever released the full registry, but some estimates are well over 50,000 entries.--[[User:Thorsted|Thorsted]] ([[User talk:Thorsted|talk]]) 03:24, 5 May 2019 (UTC)
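For anyone collecting these codes: classic Mac OS type and creator codes are four-byte values, usually written as four characters. The following is a minimal sketch of validating a code and converting it to and from its 32-bit integer form, which is how it often appears in hex dumps. The function names are hypothetical, and big-endian byte order and Mac OS Roman encoding are assumed.

```python
def pack_ostype(code: str) -> int:
    """Convert a four-character type/creator code to a 32-bit integer.
    Assumes Mac OS Roman encoding and big-endian byte order."""
    raw = code.encode("mac_roman")
    if len(raw) != 4:
        raise ValueError(f"expected exactly 4 bytes, got {len(raw)}: {code!r}")
    return int.from_bytes(raw, "big")

def unpack_ostype(value: int) -> str:
    """Convert a 32-bit integer back to its four-character code."""
    return value.to_bytes(4, "big").decode("mac_roman")

# Example: the generic plain-text type code.
print(hex(pack_ostype("TEXT")))    # 0x54455854
print(unpack_ostype(0x54455854))   # TEXT
```

A check like this could help keep a list or template parameter consistent, since codes are case-sensitive and may contain spaces or non-ASCII characters.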

== Reverse engineering formats ==
I am trying to reverse engineer some formats. Sometimes successfully, sometimes not. My most recent attempt is:
* http://fileformats.archiveteam.org/wiki/DGI_(Digi-Pic)

Maybe we can do this together instead of everyone here focusing on different things? Also is there a better way to discuss things than writing here?
[[User:Tekkno|Tekkno]] ([[User talk:Tekkno|talk]]) 01:39, 9 May 2019 (UTC)

:You should set up an NNTP server for discussing the reverse engineering of file formats (if there isn't already an appropriate newsgroup). (I have done some reverse engineering of file formats too, but I have not set up a newsgroup to discuss them. I do have an NNTP server, so you can suggest newsgroups there if wanted, I suppose.) --[[User:Zzo38|Zzo38]] ([[User talk:Zzo38|talk]]) 21:26, 23 August 2021 (UTC)

== CAPTCHA ==

AT is no longer on EFnet: https://archiveteam.org/index.php?title=Archiveteam:IRC#Special_ArchiveTeam_IRC_rules [[User:Arlo James Barnes|Arlo James Barnes]] ([[User talk:Arlo James Barnes|talk]]) 02:57, 8 November 2020 (UTC)

: This is a pretty serious problem. Are there any plans to fix it? -[[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 16:22, 12 November 2020 (UTC)

:: <del>Seems like it has been fixed, by removing the CAPTCHA altogether.</del> [2021-12-30 edit: I spoke too soon; still 'efnet'.] Let's all keep a keen eye out for spamdalism. [[User:Arlo James Barnes|Arlo James Barnes]] ([[User talk:Arlo James Barnes|talk]]) 03:33, 23 November 2020 (UTC)

== [[special:interwiki]] ==
don't see it at [[special:specialpages]]? [[User:Arlo James Barnes|Arlo James Barnes]] ([[User talk:Arlo James Barnes|talk]]) 02:57, 8 November 2020 (UTC)

: Perhaps the [https://www.mediawiki.org/wiki/Extension:Interwiki Interwiki extension] is not installed? — [[User:Rjt|rjt]] ([[User talk:Rjt|talk]]) 15:16, 30 December 2023 (UTC)

== List of my idea what maybe should be added on ==
Things I think should probably be added (when someone has the information to add):
* TRON character encoding
* TRON Application Databus
* BANCStar
* C67 (music)
(I might add a few others later if I remember more.) --[[User:Zzo38|Zzo38]] ([[User talk:Zzo38|talk]]) 09:30, 31 July 2021 (UTC)

==Hello==
I’ve joined and made a few edits, starting with the Linux page, where I modified the attribution of Linux to iOS, which is BSD.
I made a few tweaks to HLP by creating a page for the source file.
As a retro tech enthusiast I think I could help a bit on some of the older files, especially tape and disk formats, both the originals and their modern emulation files, as well as format info.
You can take a look at the original writeup I did here [http://wiki.digital-digest.com/index.php?title=History_of_AV], though it has errors and is lacking many formats.

==SSL and SEO==
SSL has been a factor in web site indexing for a while (see https://security.googleblog.com/2014/08/https-as-ranking-signal_6.html). Anecdotally I am seeing this more profoundly with personal websites. I wonder if Just Solved can be upgraded to HTTPS sometime in the near future. This should help SEO rankings, which benefits us all as it attracts more users. It also protects those of us using the site. [[User:Ross-spencer|Ross-spencer]] ([[User talk:Ross-spencer|talk]]) 07:34, 3 January 2023 (UTC)

:You'd have to ask Jason about this server-level stuff. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 16:16, 3 January 2023 (UTC)

:Please do not make TLS mandatory. However, optional TLS is a good idea. --[[User:Zzo38|Zzo38]] ([[User talk:Zzo38|talk]]) 21:13, 5 January 2023 (UTC)

:NB. SSL is listed with further rationale in the TODO: http://fileformats.archiveteam.org/wiki/JustSolve:To_Do [[User:Ross-spencer|Ross-spencer]] ([[User talk:Ross-spencer|talk]]) 09:49, 11 June 2024 (UTC)

== Mediawiki version ==

What are the plans to upgrade this site? MediaWiki is currently at 1.38; Just Solved File Formats is at 1.19, with its dependencies MySQL and PHP somewhat far behind current standards too. [[User:Ross-spencer|Ross-spencer]] ([[User talk:Ross-spencer|talk]]) 07:34, 3 January 2023 (UTC)

: Over the past six weeks, I've sent a few emails to Jason Scott asking how we could get some maintenance for this site. Although he has replied, with some indication that he might be willing to help, I haven't been able to figure out the right thing to say to make that actually happen. At this point, I'm not optimistic that my emails will be sufficient. -[[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 14:48, 25 February 2023 (UTC)

:: [[special:version#Installed_software]] is where MW stores the version, for anyone wondering. [[User:Arlo James Barnes|Arlo James Barnes]] ([[User talk:Arlo James Barnes|talk]]) 05:08, 3 January 2025 (UTC)

== About this community portal ==

Hi there! I'm a new editor here <sup>^^</sup> I would like to know a bit more on the status of the project.

Also, I've noticed that both this page and its associated [[{{TALKPAGENAME}}|talk page]] are being used as the community portal. I think we should choose just one of them, consolidate all the discussion there, and redirect the unused page to the one we decide to keep.

For reference, Inkipedia uses the [https://splatoonwiki.org/wiki/Inkipedia_talk:Ink_Pump talk page] as the central community space, with the project page redirecting to it, which works well since the talk page includes an "Add topic" button for easy participation. [[User:It's moon|It's moon]] ([[User talk:It's moon|talk]]) 04:18, 23 May 2025 (UTC)

== File information template ==

On another topic, would y'all be interested in having a template to add to file pages so we can provide license, source and author info? I created [https://deadmau5.miraheze.org/wiki/Template:File this template] on another wiki I edit. [[User:It's moon|It's moon]] ([[User talk:It's moon|talk]]) 04:18, 23 May 2025 (UTC)

== CAPTCHA issue ==

The other day I made some edits to the [[Quattro_Pro]] page. I was able to save most of them, but after another edit that inserted a "references" tag I ended up with a CAPTCHA: "Your edit includes new external links. To help protect against automated spam, please answer the question that appears below", with the "question" being "Write wiki@textfiles.com to ask for an account". This is a bit puzzling as I already have an account and I'm seeing this while logged in! Does anyone know why this happens (or better, have a fix)? Thanks!

[[User:Johanvanderknijff|Johanvanderknijff]] ([[User talk:Johanvanderknijff|talk]]) 11:55, 3 June 2025 (UTC)

:Try inputting the password that was provided via email, when creating your account. [[User:Anonymoususer852|Anonymoususer852]] ([[User talk:Anonymoususer852|talk]]) 21:45, 30 July 2025 (UTC)

:: What about when I never received a verification email and I still can't make any constructive edits because of it? [[User:265 993 303|265 993 303]] ([[User talk:265 993 303|talk]]) 07:50, 22 February 2026 (UTC)

::I get stuck in a scenario where [[Special:ConfirmEmail]] says "A confirmation code has already been e-mailed to you; if you recently created your account, you may wish to wait a few minutes for it to arrive before trying to request a new code." After pressing the "Mail a confirmation code" button it says "Confirmation e-mail sent.", but I never receive the confirmation at all (not in spam either), despite already having the correct e-mail address on my account. [[User:265 993 303|265 993 303]] ([[User talk:265 993 303|talk]]) 19:54, 13 May 2026 (UTC)

== Removing copyrighted images ==

There are some images on the wiki that are likely copyrighted, and thus are not suitable for the wiki's CC0 license. In particular:
* [[:File:Boyfriend.gif]], [[:File:Girlfriend.png]] (art assets from the game Friday Night Funkin'; see [https://web.archive.org/web/20250511162953/https://en.wikipedia.org/wiki/File:Boyfriend-2.png discussion of copyright situation at Wikipedia])
* [[:File:Arcade-gaming-1361761483gDu.jpg]] (the author might have uploaded this to the internet as "public domain", but the focus of the image is a screenshot of the copyrighted Sega game ''Rambo'')
Is there a standard way to request their deletion? [[User:Havoc Crow|Havoc Crow]] ([[User talk:Havoc Crow|talk]]) 06:10, 28 June 2025 (UTC)
:If nothing else, there should be contact information at [[Talk:Main Page]].

Since none of these images are currently in use on any page, and the copyright status has been brought into question, I have deleted them. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 18:30, 22 February 2026 (UTC)

== IRC links are invalid on Main page ==
<s>[[Main_Page]] contains a potentially obsolete IRC server and channel. These should point to the Hackint IRC network with the same channel name, #justsolve. I don't have permission to edit that page or to create a talk page for it.</s>

<s>[[User:Anonymoususer852|Anonymoususer852]] ([[User talk:Anonymoususer852|talk]]) 18:35, 2 August 2025 (UTC)</s>

This is already noted on [[JustSolve:To_Do]], under "Admin side". --[[User:Anonymoususer852|Anonymoususer852]] ([[User talk:Anonymoususer852|talk]]) 18:47, 2 August 2025 (UTC)

Just Solve the File Format Problem:Community portal

2026-02-22T07:50:50Z

265 993 303: /* CAPTCHA issue */

: ''please add your signature by typing <nowiki>~~~~</nowiki> if you add or reply

== Open issues ==

Below is a list of "issues" which would ordinarily be in a ticketing system of some kind, but are here on the Wiki instead, because that's how we roll. As things are resolved, they will be moved to the Discussion page. If there's an appeal or an issue, the conversation can continue there - this page will be for open issues.

Use of case in URLs / links. I went through all the electronic format types pages, and tried to normalise all the pages where I could (there was a mix of link structures - I've tried to get them all (apart from animation - I've been at it all day!)) so they are [[file extension]] - [[file type name]].
I notice that we have a mix of upper- and lower-case file extensions throughout. This means we may have 2 links which should point to the same URL (e.g. [[mix]] and [[MIX]]). Is this a known issue with the current layout? --[[User:JaygattusoNLNZ|JaygattusoNLNZ]] ([[User talk:JaygattusoNLNZ|talk]]) 01:32, 20 November 2012 (UTC)
:Since you're linking both the extension and the name, does that mean that there are supposed to be separate articles for each? I don't know if there's really a need for "mainspace" articles by extension, since there are already categories for that purpose; you can browse them through [[:Category:File formats by extension]]. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 02:12, 20 November 2012 (UTC)
::I just copied the most common model that I found on the formats pages. The problem is, if you don't homogenize the method, the linking/crosslinking doesn't work properly. All instances of .doc (for example) should point to the same resource page / disambiguation page. If someone has linked only the format in one place (e.g. [[MS Word]] (.doc)), and someone else the extension (MS Word - [[doc]]), we can't make sure they point to the same place. The problem occurs because format names and extensions are used interchangeably. You raise an interesting question about the relationship between the ext and the format name. I would argue they are not equal (1:1), nor (1:many) / (many:1) so it makes sense to protect both aspects as definable things - the extension because that's what's most commonly searched for and referred to by users and 'format name' because it's more accurate. How is the [[:Category:File formats by extension]] populated? --[[User:JaygattusoNLNZ|JaygattusoNLNZ]] ([[User talk:JaygattusoNLNZ|talk]]) 18:31, 20 November 2012 (UTC)
:::The categories are inserted when you use the ext template in the infobox. My preference is to have articles by actual format name and use multiple navigation aids (menus, cats, etc.) to get to them. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 01:38, 21 November 2012 (UTC)
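
To gauge how widespread the mixed-case link problem is, something along these lines could group link targets that differ only by letter case. This is a minimal sketch, assuming the link targets have already been extracted into a plain list; <code>find_case_collisions</code> and <code>normalize_first_letter</code> are hypothetical helper names, not part of MediaWiki. It relies on MediaWiki's default behaviour of capitalizing only the first letter of a title, so [[mix]] and [[Mix]] reach the same page while [[MIX]] is a separate one.

```python
from collections import defaultdict


def normalize_first_letter(title: str) -> str:
    """Mimic MediaWiki's default first-letter capitalization of page titles."""
    title = title.strip().replace("_", " ")
    return title[:1].upper() + title[1:]


def find_case_collisions(titles):
    """Group titles that are distinct pages but differ only by letter case."""
    groups = defaultdict(set)
    for title in titles:
        canonical = normalize_first_letter(title)
        groups[canonical.casefold()].add(canonical)
    # Keep only groups holding more than one distinct page title.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical sample of link targets, for illustration only.
links = ["mix", "MIX", "Mix", "doc", "MS Word", "GIF"]
collisions = find_case_collisions(links)
print(collisions)
```

Each reported group is a candidate for a redirect or a disambiguation page; whether the variants should really be merged remains a judgment call.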

== Article naming convention ==

As mentioned above, there's some dispute over whether to name articles after the full name of a format or its file extension. If using full names, you then get into issues of whether to use the full technical name or a shorter thing that's more popularly used, and in some cases that's even the same as the extension (GIF, for instance). And you also get into tricky issues of capitalization: all-caps like an acronym, all-lowercase like filenames are often done (though this is OS-dependent; some, like MS-DOS, use all-uppercase), or mixed case (proper names capitalized)? And then there's the disambiguation issue of how to name articles on different things that have the same name, which happens sometimes even with long official names, but even more often with short acronyms and file extensions. But there's also yet another issue of which things get separate articles and which are combined, like formats that have had many different versions, etc.

Currently you have things like [[CI]] and [[CT]], recently-created articles that represent two different file types within the data of one type of music tracker. The spec document they link to is the same one, which documents all the file types used in that tracker. Unless there's going to be really a lot to say about each of the specific file types, my own preference would be to have one article called [[CyberTracker]] that discusses all the formats used by the program in question, with subheaders within the article for the different file types, and all the extensions listed in the infobox (and hence in associated categories). If any other indices by extension are built up, they'd also have entries for both CI and CT. For instance, when I documented [[Softdisk Family Tree]], I covered all the various file formats in one article, though there are several versions and multiple files for each. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 13:39, 21 November 2012 (UTC)
: I realise I'm as guilty of this as anyone, having used both forms at some point (e.g. [[Surprise! Adlib Tracker v2.0]] and [[CI]]). Indeed, the two articles - [[CI]] and [[CT]] - you refer to were created by me. I guess in general I would favour using a descriptive page name rather than simply the file extension - that seems to be something that's being taken care of by infoboxes and categories.

:On the issue of what gets a separate page and what doesn't, I guess that just comes down to individual discretion. There will be instances where a format has undergone a number of minor revisions over time or has a number of minor variants (e.g. the variant forms of Chaos Music Composer's [[CMC]]) where it would make sense to keep them all to a single page, while a major revision would necessitate a multi-page approach (e.g. the shift with Capella from the binary [[CAP]] to the XML-based [[CapXML]] format).

:However, I'm not sure I agree with [[CI]] and [[CT]] having a single [[CyberTracker]] page. While both link to the same spec document and both are used by the same program, they are different formats serving different purposes. I think in general we should try and distinguish between program and file format - [[S3M]] doesn't belong on the [[ScreamTracker]] page, although each should link to the other. [[User:Halftheisland|Halftheisland]] ([[User talk:Halftheisland|talk]]) 14:04, 21 November 2012 (UTC)

:Since the purpose of the wiki is to document file formats, I think it's good that as many formats as possible are listed in the category pages and that you can browse these pages for format extensions. Sometimes it might be better to link multiple extensions to the same article (e.g. a specific application), but not always. I think it is difficult to come up with a strict rule for this (but maybe recommendations and, even better, good examples). --[[User:PN|PN]] ([[User talk:PN|talk]]) 15:08, 21 November 2012 (UTC)

::It's a judgment call, certainly. It depends on how the files are typically encountered, distributed, used, etc., and how they're thought of by people who use them; if a bunch of file types related to a particular program are usually found together as part of a larger data set, they most likely belong together in one article (with subsections to describe the function of the particular files), but if they're distinct entities with their own particular treatment (like separate areas of file trading sites for enthusiasts) they should have separate articles, though more descriptive names like "CyberTracker instrument file" might be better than a cryptic and likely ambiguous CI. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 15:46, 21 November 2012 (UTC)
::And then, somebody has also used a robot to create pages in a separate namespace devoted to file extensions, like [[Ext:cin]]. That's yet another navigational system for getting to information by extension, though those pages oddly don't actually have direct links to the normal pages here about those file formats. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 15:56, 21 November 2012 (UTC)
::: Yes, that was me with Bender the bot. Still experimenting with it and working on creating a list of all pages in relation to extensions. [[User:Maurice.de.rooij|Maurice.de.rooij]] ([[User talk:Maurice.de.rooij|talk]]) 15:22, 22 November 2012 (UTC)
::What I'd like to avoid is the messy format somebody did to a few index pages like [[Compression]], where each line has separately hyperlinked format names and extensions (not always in a consistent order) where often one or the other is a redlink, or one redirects to the other, or one is just a disambiguation page, making a somewhat confusing hodgepodge. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 16:22, 21 November 2012 (UTC)
:::I've started rearranging the Compression page to be a little less messy. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 16:56, 22 November 2012 (UTC)

== So now what? ==
The official month of this project is now over... what are the plans for the site now? It's made a good start at documenting file formats, but has a good long way to go yet. (A project like this can never possibly be "finished", since there are always more file formats coming out of the woodwork, both new ones that are introduced, and old ones that are discovered.) [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 05:10, 1 December 2012 (UTC)
: This is an awesome project and I will stay committed to it. Of course this first month is just a start. Let's roll people! [[User:Maurice.de.rooij|Maurice.de.rooij]] ([[User talk:Maurice.de.rooij|talk]]) 23:22, 3 December 2012 (UTC)

== Anybody else still around? ==
Everybody else seems to have vanished around the middle of December... I'm the only one editing here lately. I hate to put more effort into improving a ghost town... anyone else even reading this? [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 23:16, 2 January 2013 (UTC)

: I will be editing more once I get back to work - still don't have a home internet connection and working from the local library computers / girlfriend's netbook over public wi-fi is a pain. It would be nice to see more contributions from others - you can see how much work is left to do on the music section alone, and I've really only been creating stub entries for most things. [[User:Halftheisland|Halftheisland]] ([[User talk:Halftheisland|talk]]) 13:51, 3 January 2013 (UTC)

:Well, I still stop by on occasion, and I've vowed to use the site as my first stop when I come across a file format I don't recognize, but I never made any substantial additions, so I'm not sure if that gives you any useful information. (My edits were mostly technical or editorial.) [[User:Gphemsley|GPHemsley]] ([[User talk:Gphemsley|talk]]) 00:18, 13 January 2013 (UTC)

:I'll be editing from time-to-time. Currently a bit snowed under with other work, but planning to do more later in the year. Would also like to review the InfoBox(es) at some point, to ensure the information on this site can be reliably linked up to other information sources. [[User:AndyJackson|AndyJackson]] ([[User talk:AndyJackson|talk]]) 12:10, 18 January 2013 (UTC)

:I'm here. Like Andy, my workload is quite high, but I'll be popping in and out. --[[User:Rhetoric X|Rhetoric X]] ([[User talk:Rhetoric X|talk]]) 12:31, 18 January 2013 (UTC)

:I dip in and out when I want a challenge (or can stomach the frustration.) [[User:Foxtrot|Foxtrot]] ([[User talk:Foxtrot|talk]]) 11:47, 11 January 2023 (UTC)

:Hi there! I sometimes add a word here or there. I must say this Wiki is pretty good now. Popular formats are nicely described and niche formats are just niche formats so it's sometimes hard to add anything about them. I think that maybe it would be helpful to start adding images to posts. An image explaining format details or a screenshot of an image editor may be a nice addition. What about algorithms in pseudo-code? --[[User:Tekkno|Tekkno]] ([[User talk:Tekkno|talk]]) 0:28, 7 September 2018 (UTC)

::A description of file formats and pseudo-code would be helpful (although you do not necessarily need a picture). --[[User:Zzo38|Zzo38]] ([[User talk:Zzo38|talk]]) 05:19, 5 May 2019 (UTC)

== Spam ==

I see the spammers have found the site, as I worried would happen; I run a wiki myself ([http://mpedia.dan.info/ MPedia], about things related to Mensa) and have to constantly play whack-a-mole with them; even adding such annoyances (for legitimate users) as a captcha and e-mail confirmation seems to slow the spammers down only slightly. I don't know the solution. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 12:59, 18 January 2013 (UTC)

:...but "learn-to-read-Korean-in-15-minutes" is a legitimate addition, going to a comic strip explaining the [[Hangul]] writing system, which is in fact a legitimate article here since "file formats" is interpreted expansively to include human written languages. That link ''sounds'' a bit spammy, but if it was from a spammer, it would go to some page selling a dodgy language-learning tool, not a free-to-read resource! (It can start to get tricky distinguishing spam from legitimate stuff when you've got such a wide range of topics here to begin with! Once there's a huge flood of spam to get rid of, there's some danger of legitimate users getting caught in the net too.) [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 13:03, 18 January 2013 (UTC)

::Yes, it's incumbent on me to make sure we can have people sign up, and be a part of it, without getting spammers. We'll keep exploring. At least bots can't take us on.... I think.... --[[User:Jason Scott|Jason Scott]] ([[User talk:Jason Scott|talk]]) 19:28, 18 January 2013 (UTC)

:::If you've got some tips about how to configure MediaWiki to have open signups but not get the flood of spambots, let me know; that would help me with my own wiki. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 12:56, 22 January 2013 (UTC)

== Orphaned / Blank Pages ==

I've been making an attempt to clear up some of the orphaned pages, but there are a few I'm not sure of - maybe Dan or someone could sort them out?

* [[Emulation]]
* [[FAQ:File Format]]
* [[File format extensions list]] (seems to be used for the "ext:" pages but hasn't been updated)
* [[Library]]
* [[Original Plan]]
* [[RAD Game Tools]] (should probably have the individual formats moved to appropriate sections)
* [[Statistica]] (clearly belongs in Scientific Data formats, but I'm not sure where)

I've also come across a few pages that should probably be deleted - either because they've been blanked at some point (I know I did this to a few pages) or because they contain data duplicated elsewhere.

* [[AA]]
* [[Compressed executable (.com)]]
* [[SAP]]
* [[Barnes & Noble Fixed-layout Format]]

[[User:Halftheisland|Halftheisland]] ([[User talk:Halftheisland|talk]]) 10:41, 22 January 2013 (UTC)

:OK, I deleted those last three; I'll look at the others. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 12:58, 22 January 2013 (UTC)
:I put Statistica under "Mathematics" in the science category. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 13:02, 22 January 2013 (UTC)

Hi Dan, got another one for you - I merged the info from [[ODS files created by Microsoft Office 2007 SP2]] into the main [[OpenDocument Spreadsheet]] page. [[User:Halftheisland|Halftheisland]] ([[User talk:Halftheisland|talk]]) 13:59, 25 February 2013 (UTC)

Added Barnes & Noble to the list (made a bit of a mess and forgot about the rename feature) [[User:Johanvanderknijff|Johanvanderknijff]] ([[User talk:Johanvanderknijff|talk]]) 19:05, 21 April 2016 (UTC)

== Permissions for user pages ==

Is there any way we can get permission to delete sub-pages of our own user pages? I've been using mine to draft articles bit by bit, rather than release half-finished articles into the wild, and it would be nice to be able to remove the drafts once complete. [[User:Halftheisland|Halftheisland]] ([[User talk:Halftheisland|talk]]) 12:43, 3 October 2013 (UTC)

:I'm not sure, but as an admin I can delete anything you ask. It might also be possible to use the Move function to move it directly into the intended place. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 16:45, 3 October 2013 (UTC)

== cd.textfiles.com ==
All the files on http://cd.textfiles.com/ disappeared a few days ago, breaking about a million links on this wiki. Does anyone have any information about that? [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 18:48, 25 January 2015 (UTC)

:As I recall from Jason's Twitter feed, he had some server problems, with most of his sites going down at least temporarily, and most of them eventually coming back up, but maybe that one had a harder crash. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 19:50, 25 January 2015 (UTC)

== Broken image in footer ==
The "Creative Commons 0" image at the bottom of every page (https://www.mediawiki.org/w/skins/common/images/cc-0.png) is broken. Can that be fixed? [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 00:06, 10 July 2015 (UTC)
:Still broken 5 years later... Is this place even maintained? [[User:GoodClover|GoodClover]] ([[User talk:GoodClover|talk]]) 23:23, 12 March 2021 (UTC)
::Ok so it appears it should probably be [https://licensebuttons.net/l/zero/1.0/88x31.png this image], it matches the 88x31px that the HTML claims the image would be if it was there. Who maintains this site so it can be fixed? [[User:GoodClover|GoodClover]] ([[User talk:GoodClover|talk]]) 00:01, 13 March 2021 (UTC)
:::I guess that would be Jason Scott. I'm an admin, but if I have any ability to edit that part of the site I have no idea how. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 01:31, 13 March 2021 (UTC)

== Wikipedia links ==
At least in my geographical area, Wikipedia has been redirecting "http:" links to "https:". So, all of the <nowiki>[[Wikipedia:...]]</nowiki> links in this wiki are getting redirected. Could/should we change these links to use "https:" directly?

The magic "RFC" links like RFC 822 could also use https:, though the http: links still work. [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 00:10, 10 July 2015 (UTC)

== Google Code ==
We still have around 50 articles that link to Google Code. My understanding is that the next phase of Google Code's shutdown process will happen on 2016-01-25 (two weeks from today). It would be good to update as many of these as possible before then.
* [http://fileformats.archiveteam.org/index.php?title=Special%3ALinkSearch&target=http%3A%2F%2Fcode.google.com&namespace= links to http://code.google.com]
* [http://fileformats.archiveteam.org/index.php?title=Special%3ALinkSearch&target=https%3A%2F%2Fcode.google.com&namespace= links to https://code.google.com]
[[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 21:05, 11 January 2016 (UTC)
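
The two search links above follow the same pattern, so equivalent queries for other hosts could be generated rather than typed by hand. A minimal sketch, assuming the wiki's <code>index.php</code> endpoint exactly as it appears in those links; <code>linksearch_url</code> is a hypothetical helper name used only for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the links above.
WIKI_INDEX = "http://fileformats.archiveteam.org/index.php"


def linksearch_url(target: str) -> str:
    """Build a Special:LinkSearch query URL for pages linking to `target`."""
    query = urlencode({"title": "Special:LinkSearch", "target": target, "namespace": ""})
    return f"{WIKI_INDEX}?{query}"


# Search both URL schemes, since LinkSearch treats them as separate targets here.
for scheme in ("http", "https"):
    print(linksearch_url(f"{scheme}://code.google.com"))
```

The output reproduces the two URLs listed above, so the same helper can be pointed at any other host that needs a link audit.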

== Cleanup of top-level categories ==
(Call for objections.) I want to do some cleanup of the [[:Category:Top Level Categories|top-level categories]], and make sure there's at least one category for virtually every article. (See [[Special:UncategorizedPages]].) My plans:
* A new "Meta" category, for articles about the File Formats Wiki (e.g. [[FAQ]], [[Original Plan]], [[Statement of Project]], [[Main Page]], ...).
* Rename the [[:Category:Geek humor|Geek humor]] category to "Humor"
* Remove the [[:Category:Computer facts|Computer facts]] category
* A new "Information" category, for relevant informative articles ([[Ontology]], [[Patents]], ...) that don't have a more suitable top-level category.
* Maybe someday: A category named "Devices", or "Hardware", or even "Things". Most computers and [[Networked devices]] just aren't formats, IMHO. (But I'm not going to delete the infobox from all the "Networked devices" articles. If we can't figure out a way to have infoboxes for nonformats, then I'll leave them be.)
[[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 15:56, 1 June 2017 (UTC)

[[Category:Meta]]

== Love It! ==
Hi there, kudos to all you guys who helped create this valuable resource. Wikipedia is such a snob when it comes to detailed technical documentation so this wiki is a lifesaver. I added a few things to:

* [[SWF#Software]]
* [[FLA#Software]]
* [[BSON#Libraries]]

Thanks again!

PS: Can the "thumbs up" icon be changed to something better? Do you want me to design a possible logo?

[[User:Hgupta|Hgupta]] ([[User talk:Hgupta|talk]]) 05:42, 17 August 2017 (UTC)

:Nice work! As for the thumb icon, you'd have to ask Jason Scott, the owner of this site (and the one who put the thumb up). [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 13:02, 17 August 2017 (UTC)

:I support the idea of changing the logo. [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 16:08, 18 August 2017 (UTC)

== What time is it? ==
I'm making this edit at 17:10 UTC, but the timestamp is: [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 17:25, 2 May 2018 (UTC)
:"Does anybody really know what time it is; does anybody really care?" -- Chicago
[posted at 01:20 UTC; let's see when it thinks it is] [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 01:36, 3 May 2018 (UTC)

== Type / Creator codes ==

Curious what everyone's thoughts are on collecting Type/Creator Codes for Macintosh formats. There seem to be a few attempts at doing this around the webs. Is there a way here to gather them all into one area of the wiki? --[[User:Thorsted|Thorsted]] ([[User talk:Thorsted|talk]]) 17:46, 4 May 2019 (UTC)

* [[Wikipedia:Type_code|Type Code : Wikipedia]]
* [[Wikipedia:Creator_code|Creator Code : Wikipedia]]
* [http://www.lacikam.co.il/tcdb/ TCDBx unmaintained]
* [https://vintageapple.org/macprogramming/pdf/The_Programmers_Apple_Mac_Sourcebook_1989.pdf The Programmers Apple Mac Sourcebook]
* [https://www.macdisk.com/macsigen.php Mac Signatures]

:Maybe do it similar to how file extensions are handled, as an item in the infobox that links to a category? [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 19:09, 4 May 2019 (UTC)

::An article for Mac type/creator codes has been on my to-do list for a while, so we could at least do that, and see if there's any interest in listing lots of codes there. Should it be one article, or two? FormatInfo already has a "type code" param that is supposed to be for the Mac code. Maybe we are supposed to make a "Type Code" template to go along with it, so we can do like "<nowiki>|type code={{Type Code|XXXX}}</nowiki>". [[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 21:07, 4 May 2019 (UTC)

:::If they were listed in a single article as opposed to a series of categories, I don't see what a template would do. In that case, the text on the left side of the infobox could link to the list page (although this might be ugly). (It would be convenient if there was something between the complexity of the MediaWiki category system and a list page, but I don't think anything like that exists in a plain MediaWiki installation.) [[User:Effect2|Effect2]] ([[User talk:Effect2|talk]]) 21:30, 4 May 2019 (UTC)

::Even if they went into the infobox, the category system could potentially be left out, as is currently done with FOURCCs and MIMETypes (the latter links to an external database, but whether anything is there is based on luck more than anything else, as there are so many unregistered mimetypes). These can still be found with the wiki's search feature. [[User:Effect2|Effect2]] ([[User talk:Effect2|talk]]) 21:13, 4 May 2019 (UTC)

:::And there's also the Creator Code, as noted above; that refers to what program created the file, so there might be several associated with one file type code (and several file type codes associated with one creator). Perhaps there needs to be a section of the article listing all the code values associated with a given format and/or program (depending on what's covered by the article). [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 21:44, 4 May 2019 (UTC)

::I like the idea of at least a uniform template for using codes within format descriptions. Since most of the files from the early Macintosh days don't have an extension, unless they were cross platform and the Windows extension is used, then the only way to identify the file is from its Type/Creator code. I don't think Apple ever released the full registry, but some estimates are well over 50,000 entries.--[[User:Thorsted|Thorsted]] ([[User talk:Thorsted|talk]]) 03:24, 5 May 2019 (UTC)
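
For anyone collecting these codes, it may help to keep in mind that classic Mac OS type and creator codes are each four bytes, conventionally written as four characters. The following is a minimal sketch of validating and pairing them, assuming the codes are supplied as Mac OS Roman text; <code>parse_ostype</code> and <code>FileSignature</code> are hypothetical names for illustration, not an existing library or a wiki template. The example pair is 'TEXT' (plain text type) with 'ttxt' (the creator code used by SimpleText).

```python
from dataclasses import dataclass


def parse_ostype(code: str) -> bytes:
    """Encode a type or creator code and check that it is exactly four bytes."""
    raw = code.encode("mac_roman")
    if len(raw) != 4:
        raise ValueError(f"expected a 4-byte code, got {len(raw)} bytes: {code!r}")
    return raw


@dataclass(frozen=True)
class FileSignature:
    """A type code / creator code pair, stored as raw bytes."""
    type_code: bytes
    creator_code: bytes

    @classmethod
    def from_text(cls, type_code: str, creator_code: str) -> "FileSignature":
        return cls(parse_ostype(type_code), parse_ostype(creator_code))

    def __str__(self) -> str:
        return f"{self.type_code.decode('mac_roman')}/{self.creator_code.decode('mac_roman')}"


sig = FileSignature.from_text("TEXT", "ttxt")
print(sig)                   # TEXT/ttxt
print(sig.type_code.hex())   # 54455854
```

Storing the raw bytes alongside the readable form matters because some codes contain spaces or non-ASCII characters that are easy to lose when copied as plain text.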

== Reverse engineering formats ==
I am trying to reverse engineer some formats. Sometimes successfully, sometimes not. My most recent attempt is:
* http://fileformats.archiveteam.org/wiki/DGI_(Digi-Pic)

Maybe we can do this together instead of everyone here focusing on different things? Also is there a better way to discuss things than writing here?
[[User:Tekkno|Tekkno]] ([[User talk:Tekkno|talk]]) 01:39, 9 May 2019 (UTC)

:You should set up an NNTP server for discussing file format reverse engineering (if there isn't already an appropriate newsgroup). (I have done some reverse engineering of file formats myself, but I have not set up a newsgroup to discuss them. I do have an NNTP server, so you can suggest newsgroups there if you want, I suppose.) --[[User:Zzo38|Zzo38]] ([[User talk:Zzo38|talk]]) 21:26, 23 August 2021 (UTC)

== CAPTCHA ==

AT is no longer on EFnet: https://archiveteam.org/index.php?title=Archiveteam:IRC#Special_ArchiveTeam_IRC_rules [[User:Arlo James Barnes|Arlo James Barnes]] ([[User talk:Arlo James Barnes|talk]]) 02:57, 8 November 2020 (UTC)

: This is a pretty serious problem. Are there any plans to fix it? -[[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 16:22, 12 November 2020 (UTC)

:: <del>Seems like it has been fixed, by removing the CAPTCHA altogether.</del> [2021-12-30 edit: I spoke too soon; still 'efnet'.] Let's all keep a keen eye out for spamdalism. [[User:Arlo James Barnes|Arlo James Barnes]] ([[User talk:Arlo James Barnes|talk]]) 03:33, 23 November 2020 (UTC)

== [[special:interwiki]] ==
don't see it at [[special:specialpages]]? [[User:Arlo James Barnes|Arlo James Barnes]] ([[User talk:Arlo James Barnes|talk]]) 02:57, 8 November 2020 (UTC)

: Perhaps the [https://www.mediawiki.org/wiki/Extension:Interwiki Interwiki extension] is not installed? — [[User:Rjt|rjt]] ([[User talk:Rjt|talk]]) 15:16, 30 December 2023 (UTC)

== List of my idea what maybe should be added on ==
My ideas for things that should probably be added (when someone has the information to add):
* TRON character encoding
* TRON Application Databus
* BANCStar
* C67 (music)
(I might add a few others later if I remember more.) --[[User:Zzo38|Zzo38]] ([[User talk:Zzo38|talk]]) 09:30, 31 July 2021 (UTC)

==Hello==
I’ve joined and made a few edits, starting with the Linux page, where I modified the attribution of Linux to iOS, which is BSD.
Made a few tweaks to HLP by creating a page for the source file.
As a retro tech enthusiast I think I could help a bit on some of the older files, especially tape and disk formats from back then and their modern emulation files, as well as format info.
You can take a look at the original writeup I did here [http://wiki.digital-digest.com/index.php?title=History_of_AV], though it has errors and is missing many formats.

==SSL and SEO==
SSL has been a factor in web site indexing for a while: https://security.googleblog.com/2014/08/https-as-ranking-signal_6.html. Anecdotally I am seeing this more profoundly with personal websites. I wonder if Just Solved can be upgraded to HTTPS sometime in the near future. This should help SEO rankings, which benefits us all as it attracts more users. It also protects those of us using the site. [[User:Ross-spencer|Ross-spencer]] ([[User talk:Ross-spencer|talk]]) 07:34, 3 January 2023 (UTC)

:You'd have to ask Jason about this server-level stuff. [[User:Dan Tobias|Dan Tobias]] ([[User talk:Dan Tobias|talk]]) 16:16, 3 January 2023 (UTC)

:Please do not make TLS mandatory. However, optional TLS is a good idea. --[[User:Zzo38|Zzo38]] ([[User talk:Zzo38|talk]]) 21:13, 5 January 2023 (UTC)

:NB. SSL is listed with further rationale in the TODO: http://fileformats.archiveteam.org/wiki/JustSolve:To_Do [[User:Ross-spencer|Ross-spencer]] ([[User talk:Ross-spencer|talk]]) 09:49, 11 June 2024 (UTC)

== Mediawiki version ==

What are the plans to upgrade this site? MediaWiki is currently at 1.38; Just Solved File Formats is at 1.19, with its dependencies MySQL and PHP somewhat far behind current standards too. [[User:Ross-spencer|Ross-spencer]] ([[User talk:Ross-spencer|talk]]) 07:34, 3 January 2023 (UTC)

: Over the past six weeks, I've sent a few emails to Jason Scott asking how we could get some maintenance for this site. Although he has replied, with some indication that he might be willing to help, I haven't been able to figure out the right thing to say to make that actually happen. At this point, I'm not optimistic that my emails will be sufficient. -[[User:Jsummers|Jsummers]] ([[User talk:Jsummers|talk]]) 14:48, 25 February 2023 (UTC)

:: [[special:version#Installed_software]] is where MW stores the version, for anyone wondering. [[User:Arlo James Barnes|Arlo James Barnes]] ([[User talk:Arlo James Barnes|talk]]) 05:08, 3 January 2025 (UTC)

== About this community portal ==

Hi there! I'm a new editor here <sup>^^</sup> I would like to know a bit more on the status of the project.

Also, I've noticed that both this page and its associated [[{{TALKPAGENAME}}|talk page]] are being used as the community portal. I think we should choose just one of them, consolidate all the discussion there, and redirect the unused page to the one we decide to keep.

For reference, Inkipedia uses the [https://splatoonwiki.org/wiki/Inkipedia_talk:Ink_Pump talk page] as the central community space, with the project page redirecting to it, which works well since the talk page includes an "Add topic" button for easy participation. [[User:It's moon|It's moon]] ([[User talk:It's moon|talk]]) 04:18, 23 May 2025 (UTC)

== File information template ==

On another topic, would y'all be interested in having a template to add to file pages so we can provide license, source and author info? I created [https://deadmau5.miraheze.org/wiki/Template:File this template] on another wiki I edit. [[User:It's moon|It's moon]] ([[User talk:It's moon|talk]]) 04:18, 23 May 2025 (UTC)

== CAPTCHA issue ==

The other day I made some edits to the [[Quattro_Pro]] page. I was able to save most of them, but after another edit that inserted a "references" tag I ended up with a CAPTCHA: "Your edit includes new external links. To help protect against automated spam, please answer the question that appears below", with the "question" being "Write wiki@textfiles.com to ask for an account". This is a bit puzzling, as I already have an account and I'm seeing this while logged in! Does anyone know why this happens (or better, have a fix)? Thanks!

[[User:Johanvanderknijff|Johanvanderknijff]] ([[User talk:Johanvanderknijff|talk]]) 11:55, 3 June 2025 (UTC)

:Try inputting the password that was provided via email, when creating your account. [[User:Anonymoususer852|Anonymoususer852]] ([[User talk:Anonymoususer852|talk]]) 21:45, 30 July 2025 (UTC)

:: What if I never received the verification email and still can't make any constructive edits because of it? [[User:265 993 303|265 993 303]] ([[User talk:265 993 303|talk]]) 07:50, 22 February 2026 (UTC)

== Removing copyrighted images ==

There are some images on the wiki that are likely copyrighted, and thus are not suitable for the wiki's CC0 license. In particular:
* [[:File:Boyfriend.gif]], [[:File:Girlfriend.png]] (art assets from the game Friday Night Funkin; see [https://web.archive.org/web/20250511162953/https://en.wikipedia.org/wiki/File:Boyfriend-2.png discussion of copyright situation at Wikipedia])
* [[:File:Arcade-gaming-1361761483gDu.jpg]] (the author might have uploaded this to the internet as "public domain", but the focus of the image is a screenshot of the copyrighted Sega game ''Rambo'')
Is there a standard way to request their deletion? [[User:Havoc Crow|Havoc Crow]] ([[User talk:Havoc Crow|talk]]) 06:10, 28 June 2025 (UTC)
:If nothing else, there should be contact information at [[Talk:Main Page]].

== IRC links are invalid on Main page ==
<s>[[Main_Page]] contains potentially obsoleted IRC server and channel. These should be pointed to Hackint IRC network with the same channel name; #justsolve. I don't have permissions to edit that page, or create talk page for that.</s>

<s>[[User:Anonymoususer852|Anonymoususer852]] ([[User talk:Anonymoususer852|talk]]) 18:35, 2 August 2025 (UTC)</s>

This is already noted on [[JustSolve:To_Do]], under "Admin side". --[[User:Anonymoususer852|Anonymoususer852]] ([[User talk:Anonymoususer852|talk]]) 18:47, 2 August 2025 (UTC)

Plain text

2025-09-13T09:59:20Z

265 993 303: Unicode as of 17.0 still does not include U+0A00 or U+0A0D

{{FormatInfo
|formattype=electronic
|subcat=Document
|extensions={{ext|txt}}, {{ext|text}}, {{ext|doc}}, {{ext|asc}}, {{noext}}, many others
|mimetypes={{mimetype|text/plain}}
|pronom={{PRONOM|x-fmt/111}}
|wikidata={{wikidata|Q1145976}}
}}
'''Plain text''' files (also known by the extension TXT) consist of characters encoded sequentially in some particular [[character encoding]]. Plain text files contain no formatting information other than white space characters. Some data formats (usually those intended to be human-readable) are based on plain text; see [[Text-based data]] for some structured formats that are stored in plain text (and hence can be opened in a plain text editor if no more specific program is available).

Traditionally, [[ASCII]] was used much of the time for maximum interoperability, though many platform-specific character sets were also in use. For non-English text an encoding supporting a broader character repertoire is needed, often [[UTF-8]] nowadays. Note that if the file consists only of 7-bit ASCII characters, the bytes of the file are identical in us-ascii, ISO-8859-1, UTF-8, and a number of other encodings, so such a file can be identified as any of these depending on what is most convenient for a particular application. It is only when characters out of this repertoire are used that encoding-specific details need be considered. Some formats, such as [[HTML]] and [[XML]], provide some sort of escape sequences (such as ampersands used for character references and entities) allowing special characters to be referenced within the document while leaving the document itself entirely ASCII.
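The two points above can be illustrated with a short Python sketch. The function names are made up for this example; the encoding names and the <code>xmlcharrefreplace</code> error handler are standard Python.

```python
def decodes_identically(data, encodings=("ascii", "iso-8859-1", "utf-8")):
    """Return True if every listed encoding decodes data to the same string."""
    results = set()
    for name in encodings:
        try:
            results.add(data.decode(name))
        except UnicodeDecodeError:
            # At least one encoding rejects these bytes outright.
            return False
    return len(results) == 1


def to_ascii_with_char_refs(text):
    """Encode text as pure ASCII, using HTML/XML numeric character
    references for anything outside the ASCII repertoire."""
    return text.encode("ascii", "xmlcharrefreplace")
```

A file of 7-bit ASCII bytes passes <code>decodes_identically</code>, while a file containing any byte above <code>0x7F</code> does not, which is the point at which the encoding has to be known.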

Another point of contention or incompatibility in text-file formats is the conventions for line and paragraph breaks. Depending on what system the file was created on or intended to be viewed on, line breaks may be done as Carriage Return (ASCII 0D hex) and Linefeed (ASCII 0A hex) together (usually in that order, though in rare cases in the opposite order), or just one of those characters alone. Some text viewing or editing programs that are not cross-platform-friendly will really mess up badly in attempting to view/edit files using a different line break convention than the program expects, so you might see lines overwriting one another instead of going to the next line, or peculiar control characters show up within the file, or other strangeness. Files with linefeed alone are often referred to as "UNIX mode" (and the linefeed, in this context, referred to as NL for Newline), while files with carriage return alone are referred to as "Mac mode" (though it's also common in other early platforms such as the Apple II and Commodore 64, and no longer used in current Macs), while the CR+LF format is called "DOS" or "PC" or "Windows" mode (though it was used in various mainframes and network protocols as well).
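As a rough sketch, the line break convention of a file can be guessed by counting the three common forms. The function names here are hypothetical, and the rare LF+CR order is not treated specially (it is counted as one lone LF and one lone CR).

```python
def count_line_endings(data):
    """Count CR+LF pairs, lone LFs and lone CRs in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                      # "DOS" / "Windows" mode
        "LF": data.count(b"\n") - crlf,    # "UNIX mode"
        "CR": data.count(b"\r") - crlf,    # "Mac mode" (classic Mac OS)
    }


def dominant_line_ending(data):
    """Return the most frequent convention, or None if there are no breaks."""
    counts = count_line_endings(data)
    name, count = max(counts.items(), key=lambda item: item[1])
    return name if count > 0 else None
```

Files with mixed conventions do occur, so the individual counts are often more informative than the single dominant answer.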

Files may also use hard line breaks to keep line length within a fixed number of columns (usually 80, but other values such as 40 or 65 are used sometimes), or just have line breaks at the end of paragraphs and expect systems to word-wrap long lines; encountering files of a different convention than you expect may result in lines running way off to the right of the screen and requiring horizontal scrolling, or else short, choppy lines. Many text editors have a "paragraph reformat" command to bring paragraphs into compliance with your desired conventions.
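A "paragraph reformat" of the kind described above might look like the following sketch, which uses Python's standard <code>textwrap</code> module. It assumes paragraphs are separated by blank lines and that line breaks are already normalised to LF; <code>reflow</code> is a name made up for this example.

```python
import textwrap


def reflow(text, width=80):
    """Re-wrap each blank-line-separated paragraph to at most `width` columns."""
    paragraphs = text.split("\n\n")
    wrapped = []
    for paragraph in paragraphs:
        # Join the existing hard-wrapped lines into one run of words first.
        joined = " ".join(paragraph.split())
        wrapped.append(textwrap.fill(joined, width=width))
    return "\n\n".join(wrapped)
```

Calling it with a very large width effectively converts hard-wrapped text into one line per paragraph; a small width does the opposite.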

Most operating systems include a simple text editor (e.g., Windows Notepad) which can open text files, but many other text editors exist (and computer people sometimes have "holy wars" over which one is best). Some of the common text editors are EMACS, vi, and UltraEdit. In the earlier days of computing, there was less distinction between text editors and word processors than there is now, as word processors generally used a format that was mostly plain text and could even be completely plain text if you refrained from using special embedded commands and features. However, modern word processors such as Microsoft Word default to using program-specific save formats that have little resemblance to plain text, unless you go out of your way to "Save As" .txt. A common "newbie error" is to attempt to create or edit plain text files in such a program, leaving the files as proprietarily-formatted in a way that messes up the operation of other programs that expect to find plain text.

Creating artwork using text characters is known as [[ASCII Art]], or other variants such as [[ANSI Art]] if special control or escape codes are used in addition to the plain text characters.

== Extension ==

The traditional extension for text files is <code>.txt</code>, but lots of other extensions have been used. Occasionally on systems permitting extensions longer than three letters, <code>.text</code> has been used, and <code>.asc</code> for ASCII has also had some use; <code>.doc</code> has also sometimes been used for files "documenting" something (like the manual accompanying a piece of downloaded software), but that went out of common use once that extension became associated with Microsoft Word's [[DOC]] format.

== Identification ==

[[UTF-32]] text files are arrays of 32-bit integers representing Unicode code points. They are usually detected by a leading ''Byte Order Mark'' (BOM), <code>0x0000FEFF</code>, stored as the bytes <code>FF FE 00 00</code> (little endian) or <code>00 00 FE FF</code> (big endian). UTF-32 files may also occur without the BOM. In that case the file can still be checked for validity: only <code>0x00000000</code>–<code>0x0000D7FF</code> and <code>0x0000E000</code>–<code>0x0010FFFF</code> are valid values for a dword; <code>0x0000D800</code>–<code>0x0000DFFF</code> and <code>0x00110000</code>–<code>0xFFFFFFFF</code> are invalid.
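A minimal sketch of the UTF-32 checks described above, with hypothetical function names. It only tests the BOM and the valid dword ranges; it does not attempt to judge whether the resulting text is plausible.

```python
def utf32_bom(data):
    """Return "little" or "big" if data starts with a UTF-32 BOM, else None."""
    if data[:4] == b"\xff\xfe\x00\x00":
        return "little"
    if data[:4] == b"\x00\x00\xfe\xff":
        return "big"
    return None


def is_valid_utf32(data, byteorder):
    """Check that every dword is a Unicode scalar value.

    byteorder is "little" or "big", as accepted by int.from_bytes.
    """
    if len(data) % 4 != 0:
        return False
    for i in range(0, len(data), 4):
        value = int.from_bytes(data[i:i + 4], byteorder)
        # Surrogates and anything above 0x10FFFF are invalid.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            return False
    return True
```

For a file without a BOM, one option is to try both byte orders and accept the one (if any) for which <code>is_valid_utf32</code> succeeds.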

[[UTF-16]] text files are arrays of 16-bit integers representing code units. They are usually detected by a leading byte order mark (BOM), <code>0xFEFF</code>, stored as the bytes <code>FF FE</code> (little endian) or <code>FE FF</code> (big endian). UTF-16 files may also occur without the BOM, in which case detection is not guaranteed to be reliable. A useful heuristic relies on the line feed (<code>0x000A</code>): its byte-reversed form (<code>0x0A00</code>) is not assigned in ''Unicode 17.0'', and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned <code>00 0A</code> (big endian) or <code>0A 00</code> (little endian) rules out 8-bit encodings and one of the two byte orders, and may therefore be used for UTF-16 detection. Conversely, the bytes <code>0D 0A</code> read as little-endian UTF-16 form <code>U+0A0D</code>, which is not assigned in ''Unicode 17.0'' either; since this byte pair is a common newline (CR+LF) in 8-bit encodings, it points to an 8-bit encoding rather than UTF-16. The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16. UTF-16 added surrogate pairs, formed by <code>0xD800</code>–<code>0xDBFF</code> followed by <code>0xDC00</code>–<code>0xDFFF</code>; any other use of <code>0xD800</code>–<code>0xDFFF</code> is invalid.
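The line feed heuristic can be sketched as follows. This is an illustration only: the function name is made up, and it assumes UTF-32 has already been ruled out (a UTF-32 little-endian BOM also begins with <code>FF FE</code>).

```python
def guess_utf16_byteorder(data):
    """Guess the byte order of UTF-16 text, or return None if undetermined.

    Checks the BOM first, then looks for a word-aligned line feed.
    """
    if data[:2] == b"\xff\xfe":
        return "little"
    if data[:2] == b"\xfe\xff":
        return "big"
    # Step through the data one 16-bit code unit at a time.
    for i in range(0, len(data) - 1, 2):
        unit = data[i:i + 2]
        if unit == b"\x0a\x00":   # U+000A stored little endian
            return "little"
        if unit == b"\x00\x0a":   # U+000A stored big endian
            return "big"
    return None
```

A result of <code>None</code> does not prove the file is not UTF-16; it only means neither a BOM nor an aligned line feed was found.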

[[ASCII|ASCII-only]] text files may be detected by verifying that every byte in the file is in the range <code>0x01</code>–<code>0x7F</code>. The bytes <code>0x80</code>–<code>0xFF</code> are not used in ASCII, and the null character (<code>0x00</code>) is not typically found in plain text; null bytes are much more likely to occur in UTF-16 or UTF-32 text.
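In Python the ASCII check reduces to a one-line test over the bytes (the function name is hypothetical). Note that an empty file trivially passes.

```python
def is_ascii_text(data):
    """Return True if every byte is in 0x01-0x7F (no nulls, no high bytes)."""
    return all(0x01 <= byte <= 0x7F for byte in data)
```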

[[UTF-8]] text files may be detected by the presence of at least one byte in the range <code>0x80</code>–<code>0xFF</code> (to avoid processing ASCII-only files as UTF-8), the absence of null bytes (if UTF-16 and UTF-32 haven't been ruled out yet), and verification that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are:
* <code>0xxxxxxx</code> (where x forms <code>0x00</code>–<code>0x7F</code>)
* <code>110xxxxx</code> <code>10xxxxxx</code> (where x forms <code>0x0080</code>–<code>0x07FF</code>, but not <code>0x00</code>–<code>0x7F</code>)
* <code>1110xxxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> (where x forms <code>0x0800</code>–<code>0xD7FF</code> or <code>0xE000</code>–<code>0xFFFF</code>, but not <code>0x0000</code>–<code>0x07FF</code> or <code>0xD800</code>–<code>0xDFFF</code>)
* <code>11110xxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> (where x forms <code>0x10000</code>–<code>0x10FFFF</code>, but not <code>0x0000</code>–<code>0xFFFF</code> or <code>0x110000</code>–<code>0x1FFFFF</code>)
UTF-8 text files may also start with the UTF-8 byte order mark (<code>EF BB BF</code>), but should still be verified for validity.
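The three UTF-8 conditions can be sketched in Python as below. Rather than re-implementing the bit patterns, this example leans on Python's built-in strict UTF-8 decoder, which rejects overlong forms, encoded surrogates, and values above <code>0x10FFFF</code>; the function name is made up for illustration.

```python
def looks_like_utf8(data):
    """Apply the three checks: a high byte present, no nulls, valid UTF-8."""
    if not any(byte >= 0x80 for byte in data):
        return False          # ASCII-only; no need to call it UTF-8
    if b"\x00" in data:
        return False          # more likely UTF-16 or UTF-32
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True
```

Because text in most 8-bit encodings rarely happens to form valid multi-byte UTF-8 sequences, a file that passes this test is very likely to be UTF-8, though this remains a heuristic.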

When a file is known to be plain text but [[UTF-32]], [[UTF-16]], [[ASCII]], and [[UTF-8]] have all been ruled out, only 8-bit encodings or mixed single-byte/double-byte encodings (such as [[JIS|Shift JIS]]) remain. In this case, the only option left (other than applying complex heuristics) is to assume the regional or system text encoding, such as [[Windows 1252|CP1252]], [[Windows 1250|CP1250]], [[CP437]], or [[CP852]].
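The final fallback step might be sketched like this, assuming UTF-16 and UTF-32 were excluded earlier. The function name and the default of <code>cp1252</code> are arbitrary choices for the example; a real tool would pick the fallback from the system or regional settings.

```python
def decode_with_fallback(data, fallback="cp1252"):
    """Try ASCII, then UTF-8, then fall back to an assumed 8-bit code page.

    Returns a (text, encoding_name) tuple.
    """
    for encoding in ("ascii", "utf-8"):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Bytes undefined in the code page become U+FFFD instead of raising.
    return data.decode(fallback, errors="replace"), fallback
```

The same bytes give different text under different fallbacks, which is why the choice of regional encoding matters once the self-identifying encodings have been eliminated.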

== See also ==
* [[Text file creation software]]

== Software ==
* [http://textract.readthedocs.org/en/latest/ Textract: extract text from various document formats]

== Sample files ==
* {{DexvertSamples|text/txt}}
* {{DexvertSamples|text/utf16Text}}

== Links and References ==
* [http://en.wikipedia.org/wiki/Text_file Text file (Wikipedia)]
* [http://textfiles.com/ textfiles.com: a site full of old text files]
* [http://www.greenwoodsoftware.com/less/index.html Less: a Unix/Linux text file pager (for viewing files)]
* [http://www.webarchive.org.uk/wayback/archive/20120518233003/http://www.openplanetsfoundation.org/blogs/2011-08-16-scenario-discussion-text-files Scenario for discussion: Text files.]
* [http://graydon2.dreamwidth.org/193447.html Always bet on text]

[[Category:Text-based data]]
[[Category:File formats with too many extensions]]

Plain text

2024-09-10T21:22:35Z

265 993 303: Unicode as of 16.0 still does not include U+0A00 or U+0A0D, so the heuristic still works

{{FormatInfo
|formattype=electronic
|subcat=Document
|extensions={{ext|txt}}, {{ext|text}}, {{ext|doc}}, {{ext|asc}}, {{noext}}, many others
|mimetypes={{mimetype|text/plain}}
|pronom={{PRONOM|x-fmt/111}}
|wikidata={{wikidata|Q1145976}}
}}
'''Plain text''' files (also known by the extension TXT) consist of characters encoded sequentially in some particular [[character encoding]]. Plain text files contain no formatting information other than white space characters. Some data formats (usually those intended to be human-readable) are based on plain text; see [[Text-based data]] for some structured formats that are stored in plain text (and hence can be opened in a plain text editor if no more specific program is available).

Traditionally, [[ASCII]] was used much of the time for maximum interoperability, though many platform-specific character sets were also in use. For non-English text an encoding supporting a broader character repertoire is needed, often [[UTF-8]] nowadays. Note that if the file consists only of 7-bit ASCII characters, the bytes of the file are identical in us-ascii, ISO-8859-1, UTF-8, and a number of other encodings, so such a file can be identified as any of these depending on what is most convenient for a particular application. It is only when characters out of this repertoire are used that encoding-specific details need be considered. Some formats, such as [[HTML]] and [[XML]], provide some sort of escape sequences (such as ampersands used for character references and entities) allowing special characters to be referenced within the document while leaving the document itself entirely ASCII.

Another point of contention or incompatibility in text-file formats is the conventions for line and paragraph breaks. Depending on what system the file was created on or intended to be viewed on, line breaks may be done as Carriage Return (ASCII 0D hex) and Linefeed (ASCII 0A hex) together (usually in that order, though in rare cases in the opposite order), or just one of those characters alone. Some text viewing or editing programs that are not cross-platform-friendly will really mess up badly in attempting to view/edit files using a different line break convention than the program expects, so you might see lines overwriting one another instead of going to the next line, or peculiar control characters show up within the file, or other strangeness. Files with linefeed alone are often referred to as "UNIX mode" (and the linefeed, in this context, referred to as NL for Newline), while files with carriage return alone are referred to as "Mac mode" (though it's also common in other early platforms such as the Apple II and Commodore 64, and no longer used in current Macs), while the CR+LF format is called "DOS" or "PC" or "Windows" mode (though it was used in various mainframes and network protocols as well).

Files may also use hard line breaks to keep line length within a fixed number of columns (usually 80, but other values such as 40 or 65 are used sometimes), or just have line breaks at the end of paragraphs and expect systems to word-wrap long lines; encountering files of a different convention than you expect may result in lines running way off to the right of the screen and requiring horizontal scrolling, or else short, choppy lines. Many text editors have a "paragraph reformat" command to bring paragraphs into compliance with your desired conventions.

Most operating systems include a simple text editor (e.g., Windows Notepad) which can open text files, but many other text editors exist (and computer people sometimes have "holy wars" over which one is best). Some of the common text editors are EMACS, vi, and UltraEdit. In the earlier days of computing, there was less distinction between text editors and word processors than there is now, as word processors generally used a format that was mostly plain text and could even be completely plain text if you refrained from using special embedded commands and features. However, modern word processors such as Microsoft Word default to using program-specific save formats that have little resemblance to plain text, unless you go out of your way to "Save As" .txt. A common "newbie error" is to attempt to create or edit plain text files in such a program, leaving the files as proprietarily-formatted in a way that messes up the operation of other programs that expect to find plain text.

Creating artwork using text characters is known as [[ASCII Art]], or other variants such as [[ANSI Art]] if special control or escape codes are used in addition to the plain text characters.

== Extension ==

The traditional extension for text files is <code>.txt</code>, but lots of other extensions have been used. Occasionally on systems permitting extensions longer than three letters, <code>.text</code> has been used, and <code>.asc</code> for ASCII has also had some use; <code>.doc</code> has also sometimes been used for files "documenting" something (like the manual accompanying a piece of downloaded software), but that went out of common use once that extension became associated with Microsoft Word's [[DOC]] format.

== Identification ==

[[UTF-32]] text files are arrays of 32-bit integers representing Unicode code points. They are usually detected by a leading ''Byte Order Mark'' (BOM), <code>0x0000FEFF</code>, stored as the bytes <code>FF FE 00 00</code> (little endian) or <code>00 00 FE FF</code> (big endian). UTF-32 files may also occur without the BOM. In that case the file can still be checked for validity: only <code>0x00000000</code>–<code>0x0000D7FF</code> and <code>0x0000E000</code>–<code>0x0010FFFF</code> are valid values for a dword; <code>0x0000D800</code>–<code>0x0000DFFF</code> and <code>0x00110000</code>–<code>0xFFFFFFFF</code> are invalid.

[[UTF-16]] text files are arrays of 16-bit integers representing code units. They are usually detected by a leading byte order mark (BOM), <code>0xFEFF</code>, stored as the bytes <code>FF FE</code> (little endian) or <code>FE FF</code> (big endian). UTF-16 files may also occur without the BOM, in which case detection is not guaranteed to be reliable. A useful heuristic relies on the line feed (<code>0x000A</code>): its byte-reversed form (<code>0x0A00</code>) is not assigned in ''Unicode 16.0'', and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned <code>00 0A</code> (big endian) or <code>0A 00</code> (little endian) rules out 8-bit encodings and one of the two byte orders, and may therefore be used for UTF-16 detection. Conversely, the bytes <code>0D 0A</code> read as little-endian UTF-16 form <code>U+0A0D</code>, which is not assigned in ''Unicode 16.0'' either; since this byte pair is a common newline (CR+LF) in 8-bit encodings, it points to an 8-bit encoding rather than UTF-16. The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16. UTF-16 added surrogate pairs, formed by <code>0xD800</code>–<code>0xDBFF</code> followed by <code>0xDC00</code>–<code>0xDFFF</code>; any other use of <code>0xD800</code>–<code>0xDFFF</code> is invalid.

[[ASCII|ASCII-only]] text files may be detected by verifying that every byte in the file is in the range <code>0x01</code>–<code>0x7F</code>. The bytes <code>0x80</code>–<code>0xFF</code> are not used in ASCII, and the null character (<code>0x00</code>) is not typically found in plain text; null bytes are much more likely to occur in UTF-16 or UTF-32 text.

[[UTF-8]] text files may be detected by the presence of at least one byte in the range <code>0x80</code>–<code>0xFF</code> (to avoid processing ASCII-only files as UTF-8), the absence of null bytes (if UTF-16 and UTF-32 haven't been ruled out yet), and verification that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are:
* <code>0xxxxxxx</code> (where x forms <code>0x00</code>–<code>0x7F</code>)
* <code>110xxxxx</code> <code>10xxxxxx</code> (where x forms <code>0x0080</code>–<code>0x07FF</code>, but not <code>0x00</code>–<code>0x7F</code>)
* <code>1110xxxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> (where x forms <code>0x0800</code>–<code>0xD7FF</code> or <code>0xE000</code>–<code>0xFFFF</code>, but not <code>0x0000</code>–<code>0x07FF</code> or <code>0xD800</code>–<code>0xDFFF</code>)
* <code>11110xxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> (where x forms <code>0x10000</code>–<code>0x10FFFF</code>, but not <code>0x0000</code>–<code>0xFFFF</code> or <code>0x110000</code>–<code>0x1FFFFF</code>)
UTF-8 text files may also start with the UTF-8 byte order mark (<code>EF BB BF</code>), but should still be verified for validity.

When a file is known to be plain text but [[UTF-32]], [[UTF-16]], [[ASCII]], and [[UTF-8]] have all been ruled out, only 8-bit encodings or mixed single-byte/double-byte encodings (such as [[JIS|Shift JIS]]) remain. In this case, the only option left (other than applying complex heuristics) is to assume the regional or system text encoding, such as [[Windows 1252|CP1252]], [[Windows 1250|CP1250]], [[CP437]], or [[CP852]].

== See also ==
* [[Text file creation software]]

== Software ==
* [http://textract.readthedocs.org/en/latest/ Textract: extract text from various document formats]

== Sample files ==
* {{DexvertSamples|text/txt}}
* {{DexvertSamples|text/utf16Text}}

== Links and References ==
* [http://en.wikipedia.org/wiki/Text_file Text file (Wikipedia)]
* [http://textfiles.com/ textfiles.com: a site full of old text files]
* [http://www.greenwoodsoftware.com/less/index.html Less: a Unix/Linux text file pager (for viewing files)]
* [http://www.webarchive.org.uk/wayback/archive/20120518233003/http://www.openplanetsfoundation.org/blogs/2011-08-16-scenario-discussion-text-files Scenario for discussion: Text files.]
* [http://graydon2.dreamwidth.org/193447.html Always bet on text]

[[Category:Text-based data]]
[[Category:File formats with too many extensions]]

Color format

2024-08-16T11:22:11Z

265 993 303:

{{FormatInfo
|formattype=electronic
|subcat=Elements of File Formats
}}
A '''color format''', not to be confused with [[Color profiles]] or [[Graphics|Graphics formats]], describes the way in which data is stored within an image, and specifically concerns the way that data is loaded by dedicated graphics hardware. The nomenclature is inconsistent: color formats are often referred to as ''Pixel'', ''Texture'', ''Image'', or ''Graphics'' formats. In [[Direct3D]], they are known as ''Surface Formats''.

Dedicated graphics hardware often has a fixed range of supported '''color formats''', chosen both to meet the needs of those who integrate with the hardware and to offer the best performance or stability on that hardware.

A color format is described by the number of bits reserved for each channel and the order of those channels, e.g. ''R8G8B8''. As color formats concern graphics hardware, they often specify the ''red'', ''green'', ''blue'' and ''alpha'' channels, although some formats exist for other graphics purposes, such as ''normal maps'' or ''depth maps''.

The more bits that are allocated, the more colors can be reproduced with that format. However, this comes at the cost of size. As humans are more sensitive to changes in luminance than in colour, and more sensitive to certain wavelengths of light, some formats deliberately allocate fewer bits to certain channels as a compromise between visual fidelity and size.

Whilst there is no standardisation of the letters when describing a format, the following are commonly used:

{| class="wikitable"
! Letter
! Purpose
|-
| R
| ''Red''
|-
| G
| ''Green'' (or ''Grey'')
|-
| B
| ''Blue''
|-
| A
| ''Alpha''
|-
| L
| ''Luminance''
|-
| U, V, W & Q
| Bump or normal data
|-
| P
| Palette index (e.g. ''P8'' denotes a palette of 256 colors)
|-
| X
| Padding bits (ignored but kept for alignment)
|}
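Since format names follow a letter-plus-bit-count pattern, they can be taken apart mechanically. The sketch below assumes every channel is written as a single uppercase letter followed by a decimal bit count, which holds for the names on this page but not necessarily for every vendor's naming scheme; the function names are hypothetical.

```python
import re


def parse_color_format(name):
    """Split a name such as "R5G6B5" into [(letter, bits), ...] in order."""
    return [(letter, int(bits)) for letter, bits in re.findall(r"([A-Z])(\d+)", name)]


def bits_per_pixel(name):
    """Total number of bits used by one pixel in the named format."""
    return sum(bits for _, bits in parse_color_format(name))
```

For example, <code>bits_per_pixel("X8R8G8B8")</code> counts the padding bits as well, since they occupy space even though they are ignored.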

== Example ==

The simplest color format is a 1-bit format, which has 2 colors (usually black and white).

The most common color format is '''R8G8B8''', signifying that 8 bits (a [[byte]]) are reserved for each of the ''red'', ''green'' and ''blue'' channels, in that order, for a total of 24 bits per pixel. Each channel can take 256 values (0 to 255), giving over 16 million potential colours. The values are not on a linear scale but on an [[sRGB]] scale.
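As an illustration of the arithmetic, the sketch below packs three 8-bit channels into one 24-bit integer. Placing red in the most significant bits is an assumption made for this example; the actual byte order in memory depends on the API and hardware.

```python
def pack_r8g8b8(r, g, b):
    """Pack three 0-255 channel values into a 24-bit integer (R highest)."""
    return (r << 16) | (g << 8) | b


def unpack_r8g8b8(pixel):
    """Split a 24-bit integer back into (r, g, b)."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


# 8 bits per channel, 3 channels: 2**24 distinct colours.
TOTAL_COLOURS = 2 ** 24
```

<code>TOTAL_COLOURS</code> evaluates to 16,777,216, which is the "over 16 million" figure quoted above.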

By comparison, '''A8B8G8R8''' is similar to the above, with an additional 8 bits for alpha information, totalling 32 bits per pixel. Whether the alpha is expected to be ''premultiplied'' or ''straight'' is determined by the hardware.
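The difference between ''straight'' and ''premultiplied'' alpha is that in the latter each color channel has already been scaled by the alpha value. A minimal sketch of the conversion for 8-bit channels follows; the function name and the rounding rule (round to nearest) are choices made for this example, and hardware may round differently.

```python
def premultiply(r, g, b, a):
    """Convert straight-alpha 8-bit RGBA to premultiplied alpha.

    Each color channel is scaled by a/255, rounded to the nearest integer.
    """
    def scale(channel):
        return (channel * a + 127) // 255

    return scale(r), scale(g), scale(b), a
```

A fully opaque pixel is unchanged by this conversion, and a fully transparent one becomes all zeros, which is why the two conventions cannot be told apart from the bit layout alone.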

== R5G6B5 ==

'''R5G6B5''' was a common format for early 3D game consoles. It allows 5 bits each for ''red'' and ''blue'', and 6 bits for ''green'', resulting in 16-bit pixels. The additional ''green'' bit reflects the human eye's higher sensitivity to green light.
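The 5-6-5 split can be sketched as bit shifts. As in other examples of this kind, the layout (red in the top bits) and the function names are assumptions for illustration; expanding back to 8 bits by replicating the high bits is one common approach, not the only one.

```python
def pack_r5g6b5(r, g, b):
    """Reduce 8-bit channels to 5/6/5 bits and pack them into 16 bits."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def unpack_r5g6b5(pixel):
    """Expand a 16-bit R5G6B5 pixel back to approximate 8-bit channels."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    # Replicate the high bits into the low bits so full scale maps to 255.
    return (r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2)
```

The round trip is lossy: the low 3 bits of red and blue and the low 2 bits of green are discarded when packing.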

== See also ==

* [https://www.khronos.org/opengl/wiki/Image_Format Image formats] on the OpenGL wiki
* [https://switchbrew.org/wiki/GPU_Texture_Formats GPU Texture formats] on the SwitchBrew wiki
* [https://learn.microsoft.com/en-us/previous-versions/windows/desktop/bb153344(v=vs.85) Direct3D Surfaces] and [https://learn.microsoft.com/en-us/previous-versions/ms859044(v=msdn.10)?redirectedfrom=MSDN Surface Format] documentation.

[[Category:Graphics]]

Plain text

2023-09-13T18:24:21Z

265 993 303:

{{FormatInfo
|formattype=electronic
|subcat=Document
|extensions={{ext|txt}}, {{ext|text}}, {{ext|doc}}, {{ext|asc}}, {{noext}}, many others
|mimetypes={{mimetype|text/plain}}
|pronom={{PRONOM|x-fmt/111}}
|wikidata={{wikidata|Q1145976}}
}}
'''Plain text''' files (also known by the extension TXT) consist of characters encoded sequentially in some particular [[character encoding]]. Plain text files contain no formatting information other than white space characters. Some data formats (usually those intended to be human-readable) are based on plain text; see [[Text-based data]] for some structured formats that are stored in plain text (and hence can be opened in a plain text editor if no more specific program is available).

Traditionally, [[ASCII]] was used much of the time for maximum interoperability, though many platform-specific character sets were also in use. For non-English text an encoding supporting a broader character repertoire is needed, often [[UTF-8]] nowadays. Note that if the file consists only of 7-bit ASCII characters, the bytes of the file are identical in us-ascii, ISO-8859-1, UTF-8, and a number of other encodings, so such a file can be identified as any of these depending on what is most convenient for a particular application. It is only when characters out of this repertoire are used that encoding-specific details need be considered. Some formats, such as [[HTML]] and [[XML]], provide some sort of escape sequences (such as ampersands used for character references and entities) allowing special characters to be referenced within the document while leaving the document itself entirely ASCII.

Another point of contention or incompatibility in text-file formats is the conventions for line and paragraph breaks. Depending on what system the file was created on or intended to be viewed on, line breaks may be done as Carriage Return (ASCII 0D hex) and Linefeed (ASCII 0A hex) together (usually in that order, though in rare cases in the opposite order), or just one of those characters alone. Some text viewing or editing programs that are not cross-platform-friendly will really mess up badly in attempting to view/edit files using a different line break convention than the program expects, so you might see lines overwriting one another instead of going to the next line, or peculiar control characters show up within the file, or other strangeness. Files with linefeed alone are often referred to as "UNIX mode" (and the linefeed, in this context, referred to as NL for Newline), while files with carriage return alone are referred to as "Mac mode" (though it's also common in other early platforms such as the Apple II and Commodore 64, and no longer used in current Macs), while the CR+LF format is called "DOS" or "PC" or "Windows" mode (though it was used in various mainframes and network protocols as well).

Files may also use hard line breaks to keep line length within a fixed number of columns (usually 80, but other values such as 40 or 65 are used sometimes), or just have line breaks at the end of paragraphs and expect systems to word-wrap long lines; encountering files of a different convention than you expect may result in lines running way off to the right of the screen and requiring horizontal scrolling, or else short, choppy lines. Many text editors have a "paragraph reformat" command to bring paragraphs into compliance with your desired conventions.

Most operating systems include a simple text editor (e.g., Windows Notepad) which can open text files, but many other text editors exist (and computer people sometimes have "holy wars" over which one is best). Some of the common text editors are EMACS, vi, and UltraEdit. In the earlier days of computing, there was less distinction between text editors and word processors than there is now, as word processors generally used a format that was mostly plain text and could even be completely plain text if you refrained from using special embedded commands and features. However, modern word processors such as Microsoft Word default to using program-specific save formats that have little resemblance to plain text, unless you go out of your way to "Save As" .txt. A common "newbie error" is to attempt to create or edit plain text files in such a program, leaving the files as proprietarily-formatted in a way that messes up the operation of other programs that expect to find plain text.

Creating artwork using text characters is known as [[ASCII Art]], or other variants such as [[ANSI Art]] if special control or escape codes are used in addition to the plain text characters.

== Extension ==

The traditional extension for text files is <code>.txt</code>, but lots of other extensions have been used. Occasionally on systems permitting extensions longer than three letters, <code>.text</code> has been used, and <code>.asc</code> for ASCII has also had some use; <code>.doc</code> has also sometimes been used for files "documenting" something (like the manual accompanying a piece of downloaded software), but that went out of common use once that extension became associated with Microsoft Word's [[DOC]] format.

== Identification ==

[[UTF-32]] text files are arrays of 32-bit integers representing Unicode code points. They are usually detected by a leading ''Byte Order Mark'' (BOM), <code>0x0000FEFF</code>, stored as the bytes <code>FF FE 00 00</code> (little endian) or <code>00 00 FE FF</code> (big endian). UTF-32 files may also occur without the BOM. In that case the file can still be checked for validity: only <code>0x00000000</code>–<code>0x0000D7FF</code> and <code>0x0000E000</code>–<code>0x0010FFFF</code> are valid values for a dword; <code>0x0000D800</code>–<code>0x0000DFFF</code> and <code>0x00110000</code>–<code>0xFFFFFFFF</code> are invalid.

[[UTF-16]] text files are arrays of 16-bit integers representing code units. They are usually detected by a leading byte order mark (BOM), <code>0xFEFF</code>, stored as the bytes <code>FF FE</code> (little endian) or <code>FE FF</code> (big endian). UTF-16 files may also occur without the BOM, in which case detection is not guaranteed to be reliable. A useful heuristic relies on the line feed (<code>0x000A</code>): its byte-reversed form (<code>0x0A00</code>) is not assigned in ''Unicode 15.1'', and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned <code>00 0A</code> (big endian) or <code>0A 00</code> (little endian) rules out 8-bit encodings and one of the two byte orders, and may therefore be used for UTF-16 detection. Conversely, the bytes <code>0D 0A</code> read as little-endian UTF-16 form <code>U+0A0D</code>, which is not assigned in ''Unicode 15.1'' either; since this byte pair is a common newline (CR+LF) in 8-bit encodings, it points to an 8-bit encoding rather than UTF-16. The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16. UTF-16 added surrogate pairs, formed by <code>0xD800</code>–<code>0xDBFF</code> followed by <code>0xDC00</code>–<code>0xDFFF</code>; any other use of <code>0xD800</code>–<code>0xDFFF</code> is invalid.

[[ASCII|ASCII-only]] text files may be detected by verifying that every byte in the file is in the range <code>0x01</code>–<code>0x7F</code>. The bytes <code>0x80</code>–<code>0xFF</code> are not used in the ASCII encoding, and null characters (<code>0x00</code>) are not typically found in plain text; null bytes are much more likely to occur in UTF-16 or UTF-32 text.
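
As a small illustration, the ASCII test reduces to a single range check per byte. The function name <code>is_ascii_text</code> is an assumption for this example, as is the choice to treat an empty file as ASCII.

```python
def is_ascii_text(data: bytes) -> bool:
    """True if every byte is in 0x01-0x7F (hypothetical helper).

    An empty input returns True; callers may want to handle that case
    separately.
    """
    return all(0x01 <= b <= 0x7F for b in data)
```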

[[UTF-8]] text files may be detected by the presence of at least one byte in the range <code>0x80</code>–<code>0xFF</code> (so that ASCII-only files are not reported as UTF-8), the absence of null bytes (if UTF-16 and UTF-32 haven't been ruled out yet), and verification that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are <code>0xxxxxxx</code> (where the x bits encode <code>0x00</code>–<code>0x7F</code>), <code>110xxxxx</code> <code>10xxxxxx</code> (where the x bits encode <code>0x0080</code>–<code>0x07FF</code>, but not <code>0x00</code>–<code>0x7F</code>), <code>1110xxxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> (where the x bits encode <code>0x0800</code>–<code>0xD7FF</code> or <code>0xE000</code>–<code>0xFFFF</code>, but not <code>0x0000</code>–<code>0x07FF</code> or <code>0xD800</code>–<code>0xDFFF</code>), and <code>11110xxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> (where the x bits encode <code>0x10000</code>–<code>0x10FFFF</code>, but not <code>0x0000</code>–<code>0xFFFF</code> or <code>0x110000</code>–<code>0x1FFFFF</code>). UTF-8 text files may also start with the UTF-8 byte order mark (<code>EF BB BF</code>), but should still be verified for validity.
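
The bit patterns listed above can be checked by hand, as in the following sketch. The names <code>is_valid_utf8</code> and <code>looks_like_utf8</code> are hypothetical; in practice, a strict <code>bytes.decode("utf-8")</code> in Python performs an equivalent validation.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Validate UTF-8 by checking each sequence against the bit patterns."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # 0xxxxxxx
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:             # 110xxxxx
            length, cp, minimum = 2, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:           # 1110xxxx
            length, cp, minimum = 3, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF7:           # 11110xxx
            length, cp, minimum = 4, b & 0x07, 0x10000
        else:                             # stray 10xxxxxx or 0xF8-0xFF
            return False
        if i + length > n:                # sequence cut off at end of file
            return False
        for c in data[i + 1:i + length]:
            if c & 0xC0 != 0x80:          # continuation must be 10xxxxxx
                return False
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong forms, surrogates, and values above 0x10FFFF.
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return False
        i += length
    return True


def looks_like_utf8(data: bytes) -> bool:
    """Apply all three conditions: high byte present, no nulls, valid."""
    has_high_byte = any(b >= 0x80 for b in data)
    return has_high_byte and b"\x00" not in data and is_valid_utf8(data)
```

A leading <code>EF BB BF</code> needs no special handling in this sketch, because the BOM is itself a valid three-byte sequence and contains high bytes.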

When a file is known to be a plain text file but [[UTF-32]], [[UTF-16]], [[ASCII]], and [[UTF-8]] have already been ruled out, only 8-bit encodings or mixed single-byte/double-byte encodings (such as [[JIS|Shift JIS]]) remain. In this case, the only option left (other than applying complex heuristics) is to assume the regional or system text encoding, such as [[Windows 1252|CP1252]], [[Windows 1250|CP1250]], [[CP437]], or [[CP852]].
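
A fallback of this kind could be sketched as below. The function name <code>decode_with_fallback</code> and the default of CP1252 are assumptions for illustration; the right regional encoding depends on where the file came from.

```python
def decode_with_fallback(data: bytes, regional: str = "cp1252") -> str:
    """Try UTF-8 first, then fall back to an assumed regional encoding."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes undefined in the regional code page become U+FFFD
        # instead of raising an error.
        return data.decode(regional, errors="replace")
```

The same bytes give different text under different code pages: <code>63 61 66 E9</code> reads as "café" in CP1252 but as "cafΘ" in CP437, which is why the choice of fallback matters.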

== See also ==
* [[Text file creation software]]

== Software ==
* [http://textract.readthedocs.org/en/latest/ Textract: extract text from various document formats]

== Links and References ==
* [http://en.wikipedia.org/wiki/Text_file Text file (Wikipedia)]
* [http://textfiles.com/ textfiles.com: a site full of old text files]
* [http://www.greenwoodsoftware.com/less/index.html Less: a Unix/Linux text file pager (for viewing files)]
* [http://www.webarchive.org.uk/wayback/archive/20120518233003/http://www.openplanetsfoundation.org/blogs/2011-08-16-scenario-discussion-text-files Scenario for discussion: Text files.]
* [http://graydon2.dreamwidth.org/193447.html Always bet on text]

[[Category:Text-based data]]
[[Category:File formats with too many extensions]]

Plain text

2023-08-23T05:34:39Z

265 993 303: 0xFEFF is byte order mark

{{FormatInfo
|formattype=electronic
|subcat=Document
|extensions={{ext|txt}}, {{ext|text}}, {{ext|doc}}, {{ext|asc}}, {{noext}}, many others
|mimetypes={{mimetype|text/plain}}
|pronom={{PRONOM|x-fmt/111}}
|wikidata={{wikidata|Q1145976}}
}}
'''Plain text''' files (also known by the extension TXT) consist of characters encoded sequentially in some particular [[character encoding]]. Plain text files contain no formatting information other than white space characters. Some data formats (usually those intended to be human-readable) are based on plain text; see [[Text-based data]] for some structured formats that are stored in plain text (and hence can be opened in a plain text editor if no more specific program is available).

Traditionally, [[ASCII]] was used much of the time for maximum interoperability, though many platform-specific character sets were also in use. For non-English text an encoding supporting a broader character repertoire is needed, often [[UTF-8]] nowadays. Note that if the file consists only of 7-bit ASCII characters, the bytes of the file are identical in us-ascii, ISO-8859-1, UTF-8, and a number of other encodings, so such a file can be identified as any of these depending on what is most convenient for a particular application. It is only when characters out of this repertoire are used that encoding-specific details need be considered. Some formats, such as [[HTML]] and [[XML]], provide some sort of escape sequences (such as ampersands used for character references and entities) allowing special characters to be referenced within the document while leaving the document itself entirely ASCII.
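
The point about 7-bit text can be checked directly. The short sketch below uses Python codec names chosen for illustration and compares the bytes produced by several encodings.

```python
text = "Plain ASCII text, 7-bit only.\n"
encodings = ["ascii", "iso-8859-1", "utf-8", "cp1252", "cp437"]

# Pure ASCII text produces the same bytes in every encoding listed.
encoded = {name: text.encode(name) for name in encodings}
all_identical = len(set(encoded.values())) == 1

# One accented character is enough to make the encodings diverge.
accented = "caf\u00e9"
latin1_bytes = accented.encode("iso-8859-1")   # E9 is a single byte
utf8_bytes = accented.encode("utf-8")          # C3 A9 is two bytes

# An HTML/XML-style character reference keeps the file itself ASCII.
escaped = accented.encode("ascii", errors="xmlcharrefreplace")
```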

Another point of contention or incompatibility in text-file formats is the conventions for line and paragraph breaks. Depending on what system the file was created on or intended to be viewed on, line breaks may be done as Carriage Return (ASCII 0D hex) and Linefeed (ASCII 0A hex) together (usually in that order, though in rare cases in the opposite order), or just one of those characters alone. Some text viewing or editing programs that are not cross-platform-friendly will really mess up badly in attempting to view/edit files using a different line break convention than the program expects, so you might see lines overwriting one another instead of going to the next line, or peculiar control characters show up within the file, or other strangeness. Files with linefeed alone are often referred to as "UNIX mode" (and the linefeed, in this context, referred to as NL for Newline), while files with carriage return alone are referred to as "Mac mode" (though it's also common in other early platforms such as the Apple II and Commodore 64, and no longer used in current Macs), while the CR+LF format is called "DOS" or "PC" or "Windows" mode (though it was used in various mainframes and network protocols as well).
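
A rough way to see which convention a file uses is to count each kind of line break, as in this sketch. The helper names <code>count_line_endings</code> and <code>normalize_newlines</code> are hypothetical, and the rare LF+CR order is not treated specially (it is counted as one LF and one CR).

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR+LF pairs, lone LF, and lone CR in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                      # "DOS" / "Windows" mode
        "LF": data.count(b"\n") - crlf,    # "UNIX mode"
        "CR": data.count(b"\r") - crlf,    # "Mac mode"
    }


def normalize_newlines(data: bytes, newline: bytes = b"\n") -> bytes:
    """Rewrite all three conventions to a single chosen one."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", newline)
```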

Files may also use hard line breaks to keep line length within a fixed number of columns (usually 80, but other values such as 40 or 65 are used sometimes), or just have line breaks at the end of paragraphs and expect systems to word-wrap long lines; encountering files of a different convention than you expect may result in lines running way off to the right of the screen and requiring horizontal scrolling, or else short, choppy lines. Many text editors have a "paragraph reformat" command to bring paragraphs into compliance with your desired conventions.
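
A "paragraph reformat" operation can be approximated with Python's standard <code>textwrap</code> module, as sketched below. The function name <code>reflow</code> is hypothetical, and the sketch assumes paragraphs are separated by blank lines, which is only one of the conventions described above.

```python
import textwrap


def reflow(text: str, width: int = 80) -> str:
    """Re-wrap each blank-line-separated paragraph to the given width."""
    paragraphs = text.split("\n\n")
    wrapped = [
        # Collapse existing hard breaks, then wrap to the new width.
        textwrap.fill(" ".join(p.split()), width=width)
        for p in paragraphs
    ]
    return "\n\n".join(wrapped)
```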

Most operating systems include a simple text editor (e.g., Windows Notepad) which can open text files, but many other text editors exist (and computer people sometimes have "holy wars" over which one is best). Some of the common text editors are EMACS, vi, and UltraEdit. In the earlier days of computing, there was less distinction between text editors and word processors than there is now, as word processors generally used a format that was mostly plain text and could even be completely plain text if you refrained from using special embedded commands and features. However, modern word processors such as Microsoft Word default to using program-specific save formats that have little resemblance to plain text, unless you go out of your way to "Save As" .txt. A common "newbie error" is to attempt to create or edit plain text files in such a program, leaving the files as proprietarily-formatted in a way that messes up the operation of other programs that expect to find plain text.

Creating artwork using text characters is known as [[ASCII Art]], or other variants such as [[ANSI Art]] if special control or escape codes are used in addition to the plain text characters.

== Extension ==

The traditional extension for text files is <code>.txt</code>, but lots of other extensions have been used. Occasionally on systems permitting extensions longer than three letters, <code>.text</code> has been used, and <code>.asc</code> for ASCII has also had some use; <code>.doc</code> has also sometimes been used for files "documenting" something (like the manual accompanying a piece of downloaded software), but that went out of common use once that extension became associated with Microsoft Word's [[DOC]] format.

== Identification ==

[[UTF-32]] text files are usually detected by a leading ''Byte Order Mark'' (BOM), which consists of the bytes <code>FF FE 00 00</code> (little-endian <code>0x0000FEFF</code>) or <code>00 00 FE FF</code> (big-endian <code>0x0000FEFF</code>). UTF-32 files may also occur without the BOM. In that case, detection can rely on the fact that only <code>0x00000000</code>–<code>0x0000D7FF</code> and <code>0x0000E000</code>–<code>0x0010FFFF</code> are valid values for each 32-bit unit; <code>0x0000D800</code>–<code>0x0000DFFF</code> and <code>0x00110000</code>–<code>0xFFFFFFFF</code> are invalid.

[[UTF-16]] text files are usually detected by a leading byte order mark (BOM), which consists of the bytes <code>FF FE</code> (little-endian <code>0xFEFF</code>) or <code>FE FF</code> (big-endian <code>0xFEFF</code>). UTF-16 files may also occur without the BOM, in which case detection is not guaranteed to be reliable. Two observations help. First, the byte-reversed form of the line feed <code>0x000A</code>, namely <code>0x0A00</code>, is not assigned in ''Unicode 15.0''. Second, null bytes are unlikely to occur in other text encodings. The presence of word-aligned <code>00 0A</code> or <code>0A 00</code> can therefore rule out 8-bit encodings and one of the two byte orders, and so may be used for UTF-16 detection. On the other hand, the bytes <code>0D 0A</code> read in little-endian order form <code>U+0A0D</code>, which is not assigned in ''Unicode 15.0'' either, but this byte pair is a common newline in 8-bit encodings, so it more likely indicates an 8-bit encoding than UTF-16. The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16; UTF-16 added surrogate pairs, formed by <code>0xD800</code>–<code>0xDBFF</code> followed by <code>0xDC00</code>–<code>0xDFFF</code>, with any other use of <code>0xD800</code>–<code>0xDFFF</code> being invalid.

[[ASCII|ASCII-only]] text files may be detected by verifying that every byte in the file is in the range <code>0x01</code>–<code>0x7F</code>.

[[UTF-8]] text files may be detected by the presence of at least one byte in the range <code>0x80</code>–<code>0xFF</code>, the absence of null bytes (if UTF-16 hasn't been ruled out yet), and verification that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are <code>0xxxxxxx</code> (where the x bits encode <code>0x00</code>–<code>0x7F</code>), <code>110xxxxx</code> <code>10xxxxxx</code> (where the x bits encode <code>0x0080</code>–<code>0x07FF</code>, but not <code>0x00</code>–<code>0x7F</code>), <code>1110xxxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> (where the x bits encode <code>0x0800</code>–<code>0xD7FF</code> or <code>0xE000</code>–<code>0xFFFF</code>, but not <code>0x0000</code>–<code>0x07FF</code> or <code>0xD800</code>–<code>0xDFFF</code>), and <code>11110xxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> <code>10xxxxxx</code> (where the x bits encode <code>0x10000</code>–<code>0x10FFFF</code>, but not <code>0x0000</code>–<code>0xFFFF</code> or <code>0x110000</code>–<code>0x1FFFFF</code>). UTF-8 text files may also start with the UTF-8 byte order mark (<code>EF BB BF</code>).

When a file is known to be a plain text file but [[UTF-32]], [[UTF-16]], [[ASCII]], and [[UTF-8]] have already been ruled out, only 8-bit encodings or mixed single-byte/double-byte encodings (such as [[JIS|Shift JIS]]) remain. In this case, the only option left (other than applying complex heuristics) is to assume the regional or system text encoding, such as [[Windows 1252|CP1252]], [[Windows 1250|CP1250]], [[CP437]], or [[CP852]].

== See also ==
* [[Text file creation software]]

== Software ==
* [http://textract.readthedocs.org/en/latest/ Textract: extract text from various document formats]

== Links and References ==
* [http://en.wikipedia.org/wiki/Text_file Text file (Wikipedia)]
* [http://textfiles.com/ textfiles.com: a site full of old text files]
* [http://www.greenwoodsoftware.com/less/index.html Less: a Unix/Linux text file pager (for viewing files)]
* [http://www.webarchive.org.uk/wayback/archive/20120518233003/http://www.openplanetsfoundation.org/blogs/2011-08-16-scenario-discussion-text-files Scenario for discussion: Text files.]
* [http://graydon2.dreamwidth.org/193447.html Always bet on text]

[[Category:Text-based data]]
[[Category:File formats with too many extensions]]

Plain text

2023-06-20T10:45:33Z

265 993 303:

{{FormatInfo
|formattype=electronic
|subcat=Document
|extensions={{ext|txt}}, {{ext|text}}, {{ext|doc}}, {{ext|asc}}, {{noext}}, many others
|mimetypes={{mimetype|text/plain}}
|pronom={{PRONOM|x-fmt/111}}
|wikidata={{wikidata|Q1145976}}
}}
'''Plain text''' files (also known by the extension TXT) consist of characters encoded sequentially in some particular [[character encoding]]. Plain text files contain no formatting information other than white space characters. Some data formats (usually those intended to be human-readable) are based on plain text; see [[Text-based data]] for some structured formats that are stored in plain text (and hence can be opened in a plain text editor if no more specific program is available).

Traditionally, [[ASCII]] was used much of the time for maximum interoperability, though many platform-specific character sets were also in use. For non-English text an encoding supporting a broader character repertoire is needed, often [[UTF-8]] nowadays. Note that if the file consists only of 7-bit ASCII characters, the bytes of the file are identical in us-ascii, ISO-8859-1, UTF-8, and a number of other encodings, so such a file can be identified as any of these depending on what is most convenient for a particular application. It is only when characters out of this repertoire are used that encoding-specific details need be considered. Some formats, such as [[HTML]] and [[XML]], provide some sort of escape sequences (such as ampersands used for character references and entities) allowing special characters to be referenced within the document while leaving the document itself entirely ASCII.

Another point of contention or incompatibility in text-file formats is the conventions for line and paragraph breaks. Depending on what system the file was created on or intended to be viewed on, line breaks may be done as Carriage Return (ASCII 0D hex) and Linefeed (ASCII 0A hex) together (usually in that order, though in rare cases in the opposite order), or just one of those characters alone. Some text viewing or editing programs that are not cross-platform-friendly will really mess up badly in attempting to view/edit files using a different line break convention than the program expects, so you might see lines overwriting one another instead of going to the next line, or peculiar control characters show up within the file, or other strangeness. Files with linefeed alone are often referred to as "UNIX mode" (and the linefeed, in this context, referred to as NL for Newline), while files with carriage return alone are referred to as "Mac mode" (though it's also common in other early platforms such as the Apple II and Commodore 64, and no longer used in current Macs), while the CR+LF format is called "DOS" or "PC" or "Windows" mode (though it was used in various mainframes and network protocols as well).

Files may also use hard line breaks to keep line length within a fixed number of columns (usually 80, but other values such as 40 or 65 are used sometimes), or just have line breaks at the end of paragraphs and expect systems to word-wrap long lines; encountering files of a different convention than you expect may result in lines running way off to the right of the screen and requiring horizontal scrolling, or else short, choppy lines. Many text editors have a "paragraph reformat" command to bring paragraphs into compliance with your desired conventions.

Most operating systems include a simple text editor (e.g., Windows Notepad) which can open text files, but many other text editors exist (and computer people sometimes have "holy wars" over which one is best). Some of the common text editors are EMACS, vi, and UltraEdit. In the earlier days of computing, there was less distinction between text editors and word processors than there is now, as word processors generally used a format that was mostly plain text and could even be completely plain text if you refrained from using special embedded commands and features. However, modern word processors such as Microsoft Word default to using program-specific save formats that have little resemblance to plain text, unless you go out of your way to "Save As" .txt. A common "newbie error" is to attempt to create or edit plain text files in such a program, leaving the files as proprietarily-formatted in a way that messes up the operation of other programs that expect to find plain text.

Creating artwork using text characters is known as [[ASCII Art]], or other variants such as [[ANSI Art]] if special control or escape codes are used in addition to the plain text characters.

== Extension ==

The traditional extension for text files is <code>.txt</code>, but lots of other extensions have been used. Occasionally on systems permitting extensions longer than three letters, <code>.text</code> has been used, and <code>.asc</code> for ASCII has also had some use; <code>.doc</code> has also sometimes been used for files "documenting" something (like the manual accompanying a piece of downloaded software), but that went out of common use once that extension became associated with Microsoft Word's [[DOC]] format.

== Identification ==

UTF-32 text files are usually detected by a leading byte order mark (BOM), which consists of the bytes FF FE 00 00 (little-endian 0x0000FEFF) or 00 00 FE FF (big-endian 0x0000FEFF). UTF-32 files may also occur without the BOM. In that case, detection can rely on the fact that only 0x00000000–0x0000D7FF and 0x0000E000–0x0010FFFF are valid values for each 32-bit unit; 0x0000D800–0x0000DFFF and 0x00110000–0xFFFFFFFF are invalid.

UTF-16 text files are usually detected by a leading byte order mark (BOM), which consists of the bytes FF FE (little-endian 0xFEFF) or FE FF (big-endian 0xFEFF). UTF-16 files may also occur without the BOM, in which case detection is not guaranteed to be reliable. Two observations help. First, the byte-reversed form of the line feed 0x000A, namely 0x0A00, is not assigned in Unicode 15.0. Second, null bytes are unlikely to occur in other text encodings. The presence of word-aligned 00 0A or 0A 00 can therefore rule out 8-bit encodings and one of the two byte orders, and so may be used for UTF-16 detection. On the other hand, the bytes 0D 0A read in little-endian order form U+0A0D, which is not assigned in Unicode 15.0 either, but this byte pair is a common newline in 8-bit encodings, so it more likely indicates an 8-bit encoding than UTF-16. The detection of UCS-2 text works similarly, since UCS-2 is the precursor of UTF-16; UTF-16 added surrogate pairs, formed by 0xD800–0xDBFF followed by 0xDC00–0xDFFF, with any other use of 0xD800–0xDFFF being invalid.

ASCII-only text files may be detected by verifying that every byte in the file is in the range 0x01–0x7F.

UTF-8 text files may be detected by the presence of at least one byte in the range 0x80–0xFF, the absence of null bytes (if UTF-16 hasn't been ruled out yet), and verification that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are 0xxxxxxx (where the x bits encode 0x00–0x7F), 110xxxxx 10xxxxxx (where the x bits encode 0x0080–0x07FF, but not 0x00–0x7F), 1110xxxx 10xxxxxx 10xxxxxx (where the x bits encode 0x0800–0xD7FF or 0xE000–0xFFFF, but not 0x0000–0x07FF or 0xD800–0xDFFF), and 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (where the x bits encode 0x10000–0x10FFFF, but not 0x0000–0xFFFF or 0x110000–0x1FFFFF). UTF-8 text files may also start with the UTF-8 byte order mark (EF BB BF).

When a file is known to be a plain text file but UTF-32, UTF-16, ASCII, and UTF-8 have already been ruled out, only 8-bit encodings or mixed single-byte/double-byte encodings (such as Shift JIS) remain. In this case, the only option left (other than applying complex heuristics) is to assume the regional or system text encoding, such as CP1252, CP1250, CP437, or CP852.

== See also ==
* [[Text file creation software]]

== Software ==
* [http://textract.readthedocs.org/en/latest/ Textract: extract text from various document formats]

== Links and References ==
* [http://en.wikipedia.org/wiki/Text_file Text file (Wikipedia)]
* [http://textfiles.com/ textfiles.com: a site full of old text files]
* [http://www.greenwoodsoftware.com/less/index.html Less: a Unix/Linux text file pager (for viewing files)]
* [http://www.webarchive.org.uk/wayback/archive/20120518233003/http://www.openplanetsfoundation.org/blogs/2011-08-16-scenario-discussion-text-files Scenario for discussion: Text files.]
* [http://graydon2.dreamwidth.org/193447.html Always bet on text]

[[Category:Text-based data]]
[[Category:File formats with too many extensions]]

OpenType

2023-06-20T10:12:23Z

265 993 303:

{{FormatInfo
|formattype=electronic
|subcat=Fonts
|extensions={{ext|otf}}, {{ext|ttf}}
|pronom={{PRONOM|fmt/520}}
}}
[[OpenType]] is an outline font format developed by Microsoft and Adobe. It is the successor to both [[TrueType]] and [[Adobe Type 1]] font formats.

== Identification ==
{{PAGENAME}} files containing "CFF data" begin with ASCII "{{magic|OTTO}}" (4F 54 54 4F), whereas {{PAGENAME}} files containing TrueType outlines begin with the bytes 00 01 00 00.
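
The two signatures above can be checked against the first four bytes of a file, as in this minimal sketch. The function name <code>sniff_opentype</code> and its return labels are assumptions for illustration; a match on these bytes alone does not prove the rest of the file is a well-formed font.

```python
def sniff_opentype(header: bytes):
    """Classify a font by its first four bytes (hypothetical helper)."""
    if header.startswith(b"OTTO"):
        return "CFF"          # 4F 54 54 4F: OpenType with CFF data
    if header.startswith(b"\x00\x01\x00\x00"):
        return "TrueType"     # 00 01 00 00: TrueType outlines
    return None
```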

== See also ==
* [[Embedded OpenType]]
* [[Open Font Format]]
* [[TrueType]]
* [[Adobe Type 1]]

== Specifications ==
* [https://www.microsoft.com/typography/otspec/ OpenType specification]

== Sample files ==
* http://tug.ctan.org/tex-archive/fonts/cm-unicode/fonts/otf/

== Metaformat files ==
* {{Synalysis|opentype}}

== Links ==
* [[Wikipedia:OpenType|Wikipedia article]]
* [http://www.adobe.com/products/type/opentype/opentype-faq.html Adobe's OpenType FAQ]

[[Category:Microsoft]]
[[Category:Adobe]]