josephscott

Things That Shouldn’t Be in File Names for $1,000 Alex

> Cue Jeopardy music and image of Alex Trebek <

Space, mixed case, slash, backslash, question mark, colon, asterisk, quotation mark and control codes.

What are things that shouldn’t be in file names?

Okay, all kidding aside, having goofy file names can make life miserable. On most days I move between Mac OS X (HFS+), Windows XP (mostly NTFS, some FAT32), Windows 2003 (NTFS), FreeBSD (UFS/UFS2) and Linux (pick one). Most filesystem interaction falls into two camps: NTFS and Unix-like. Sadly they have different limitations on file names, and worse yet, they have drastically different social norms for what is acceptable in a file name.

So here is my list of things that should not appear in file names:

Spaces : Both camps support spaces in file names, but they are generally frowned upon in the Unix camp, while those using NTFS are generally encouraged to use them. Including spaces in a file name is a pain because they have to be escaped or quoted on the command line. Fortunately they are easy enough to spot.

Mixed case : The NTFS camp is case preserving but case insensitive, while the Unix camp is case sensitive. Moving files from Unix to NTFS can be unpleasant if you have to rename several files because they differ only by case. Please just make your life easier and use lower case characters for file names unless you have a compelling reason not to (and there are some).

Slash and Backslash : NTFS uses backslash as a directory separator and Unix uses forward slash. Neither of them has any business being used in a file name. You’ve been warned.

Question Mark and Asterisk : Both of these characters are meant to be used as wild cards, not as characters in a file name.

Colon and Vertical Bar : I can understand why these may be tempting to use, but please don’t. Colon is a problem in NTFS and vertical bar is used for pipes.

Quotation Mark (double and single) : Quotes are used for grouping on the command line. These are worse than spaces because there really is no reason to use them in a file name.

Trailing Period : After my recent run-in with a trailing period I have a special dislike for them. On Windows systems the last period is usually followed by a three character extension, so having a period as the last character will only confuse things. I’d put leading period in here as well, but its use has long been established in Unix systems for semi-hidden files.

Trailing Space : Like the trailing period, I have a special place in my heart for trailing spaces. This merits a specific mention because detecting that a file name has one is not easy to do visually.

Greater Than and Less Than : Really, why would you do this to your poor sysadmin? These characters are used for redirecting input and output of a program on the command line.

And the worst possible offender:

Control Codes : Most Unix systems are kind enough to allow just about anything in a file name. Unfortunately this means that control codes (except for NULL) are allowed. To include one of these is just plain evil. I really don’t want to hear the BELL beep as part of the file name. Sure it’s funny once, but after that it is pure, unrefined annoyance.
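As a rough illustration, the list above can be turned into a small checker. This is only a sketch in Python (the post itself does not use any particular language). The names `name_problems` and `case_collisions` are made up for this example, and the rules are the ones from this post, not an official specification for any filesystem.

```python
# Hypothetical helper: labels and rules come from the list in this post,
# not from any operating system's documentation.
SPECIAL_CHARACTERS = {
    "/": "slash",
    "\\": "backslash",
    "?": "question mark",
    "*": "asterisk",
    ":": "colon",
    "|": "vertical bar",
    '"': "double quote",
    "'": "single quote",
    "<": "less than",
    ">": "greater than",
}


def name_problems(name):
    """Return the reasons why `name` breaks the rules listed above."""
    problems = []
    if " " in name:
        problems.append("space")
    if name != name.lower():
        problems.append("mixed case")
    for char, label in SPECIAL_CHARACTERS.items():
        if char in name:
            problems.append(label)
    if name.endswith("."):
        problems.append("trailing period")
    if name.endswith(" "):
        problems.append("trailing space")
    # Control codes: ASCII 0-31 plus DEL (127).
    if any(ord(c) < 32 or ord(c) == 127 for c in name):
        problems.append("control code")
    return problems


def case_collisions(names):
    """Group names that differ only by case."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    for candidate in ["notes.txt", "My File .doc", "beep\a", "report. "]:
        # repr() makes trailing spaces and control codes visible.
        print(repr(candidate), name_problems(candidate))
    print(case_collisions(["README", "readme", "todo"]))
```

Printing names with `repr()` is a cheap way to catch the trailing space and the BELL character, both of which are hard to see in a plain directory listing.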

When it comes to naming your files you should be descriptive, brief and conservative. Ideally this means a simple series of lower case letters, possibly separated by a dash (-) or an underscore (_), that isn’t absurdly long. If you are using Windows then you’ll also include a period followed by three characters that are determined by the type of file you are naming.
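Here is one way that “descriptive, brief and conservative” could be checked and enforced, again as a Python sketch. Several things here are my assumptions rather than rules from this post: digits are allowed alongside lower case letters, the extension can be any length, and `MAX_LENGTH` is an arbitrary stand-in for “absurdly long”. The function names are made up for illustration.

```python
import re

MAX_LENGTH = 64  # assumption: the post only says "not absurdly long"

# Lower case letters and digits in groups joined by a single dash or
# underscore, optionally followed by a period and an extension.
CONSERVATIVE = re.compile(r"[a-z0-9]+(?:[-_][a-z0-9]+)*(?:\.[a-z0-9]+)?")


def is_conservative(name):
    """True if `name` fits the conservative pattern and length limit."""
    return len(name) <= MAX_LENGTH and CONSERVATIVE.fullmatch(name) is not None


def make_conservative(name):
    """Best-effort, lossy rewrite of `name`; the result may be empty."""
    name = name.strip().lower()
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no period at all, so there is no extension
        stem, ext = name, ""
    # Turn every run of disallowed characters into one underscore,
    # collapse repeated separators, then trim separators from the ends.
    stem = re.sub(r"[^a-z0-9-]+", "_", stem)
    stem = re.sub(r"[-_]{2,}", "_", stem).strip("-_")
    ext = re.sub(r"[^a-z0-9]+", "", ext)
    return stem + ("." + ext if ext else "")


if __name__ == "__main__":
    for candidate in ["My File .doc", "Q3 Budget (final).XLS", "report."]:
        print(repr(candidate), "->", repr(make_conservative(candidate)))
```

A rewrite like this throws information away (two different originals can map to the same result), so it is better suited to suggesting a name than to renaming files in bulk without checking for clashes.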

By keeping your file names simple and consistent you’ll save yourself a lot of headaches.

17 replies on “Things That Shouldn’t Be in File Names for $1,000 Alex”

You forgot files starting with -

It’s not hard to get around, but it can be annoying when you do:

rm -asdf

and rm tells you asdf are not options.

You can tell rm (or anything that uses getopt(3)) to stop processing options using the special option ‘--’. I.e. rm -- -asdf.

/N
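The same trick applies when a script builds the command for you. A minimal Python sketch, assuming a POSIX system with `rm` on the path; `rm_command` is a made-up helper name, not part of any library:

```python
import os
import subprocess
import tempfile


def rm_command(names):
    """Build an argument list for rm that treats every name as a file.

    Everything after "--" is taken as an operand, so a name such as
    "-asdf" is no longer parsed as a bundle of options.
    """
    return ["rm", "--", *names]


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing real is deleted.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "-asdf"), "w").close()
        subprocess.run(rm_command(["-asdf"]), cwd=tmp, check=True)
        print(os.listdir(tmp))
```

Passing the arguments as a list also sidesteps shell quoting, so spaces in a name need no escaping here. If the file is being removed from Python anyway, `os.remove("-asdf")` does no option parsing at all.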

I’m aware of the technical issues for these restrictions, but you’re thinking entirely about issues that an end user (imagine an administrative assistant using a shared directory on your network to store her memos, spreadsheets, etc.) does not know about and should not have to know about.

The MacOS (and eventually Windows) enhancements that allowed users to write long file names (now even longer) and put (relatively) arbitrary characters in them, including spaces, should be regarded as improvements, don’t you think? Allowing the computer to get out of the way? _Humans_ use spaces in written communication, not underscores.

Would you also argue that metadata is a bad idea?

If you’re going to start imposing all these technical restrictions on the end-user, why not be done with it and just require that the user choose a unique, upper-case 8.3 filename, because, you know, Windows hashes to one of those internally anyway on FAT32 filesystems, and you never know. You could get rid of metadata by forcing the user to include the creation and modification date and version number in the filename.

Or you could _actually support_ the end user and support the flexibility and user-friendliness that each OS actually provides! But I agree, it is far easier to blame the end user…

And you can use \ to escape a space, but that doesn’t make it any less of a pain in the ass. At least the shell will tab complete the escaped file name normally, whereas one has to remember to stop option processing.

Good list. A few points:

You can also say “rm ./-x” to remove a file whose name starts with a hyphen.

Mac OS X is among the Unix systems which now support case-preserving filenames.

IMHO, allowing spaces in filenames is definitely not an improvement. For *me*, as an end-user, they’re a pain in the butt whenever I want to write a small shell script to move, copy, or otherwise process some subset of files, and for which no GUI tools exist.

I’d include a period as the first character in a filename to be a bad idea as well .. on a Mac that indicates an invisible file .. and what a pain it is trying to work with THAT kind of file ..

Just try cleaning your hidden files with ‘rm -rf .*’ and you might have a reason to think before you glob.

Another nice way, in Linux or Unix, to find and get rid of files with problem characters is to run “ls -i” to note the inode number of the file. Then (making sure you’re in the directory you want to delete the file from, since inode numbers aren’t unique across partitions) run something like:

find . -inum 5678 -exec rm {} \;

It almost always works, but I almost always use it as a last resort, also 🙂

Nice post!

, Mike
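For anyone who would rather not type the problem name at all, the inode approach from the comment above can also be sketched in Python. `paths_with_inode` is a made-up name. Comparing the device number is my addition to cover the point that inode numbers are not unique across partitions; plain `find -inum` does not do that check.

```python
import os
import tempfile


def paths_with_inode(root, inode):
    """Return paths under `root` with this inode on the same device as `root`."""
    root_device = os.lstat(root).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat: do not follow symlinks
            if info.st_ino == inode and info.st_dev == root_device:
                matches.append(path)
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A name with a BELL character and a trailing space.
        awkward = os.path.join(tmp, "bad\aname ")
        open(awkward, "w").close()
        inode = os.lstat(awkward).st_ino
        for path in paths_with_inode(tmp, inode):
            print("removing", repr(path))
            os.remove(path)
        print(os.listdir(tmp))
```

Hard links share an inode, so more than one path can come back. It is worth printing the matches with `repr()` before removing anything.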

I just realized when loading files on the internet that most search engines don’t understand the underscore in a file name. They don’t see it as a separator; they view it as a character. If you want to use a separator for internet files, use a hyphen.

1) What about the semicolon? ‘;’

2) In which cases is upper case justifiable? Do you mean just README-like, or even for, say, names of people?

Tnx for a nice article

nice 🙂
@ in macOS file names when using the console annoys me like hell, but sadly they’re part of the standard retina png naming scheme…

Once I was performing a GUI-based install of some software and got an error pop-up telling me “C:\Program not found”. I see no good reason to put the default separator smack in the middle of a file name. One of the team’s favorites around here is a name like “my file .doc” … Yuck!

I had a colleague that once put a space in the host name of his windows computer and claimed it broke things so badly he had to reinstall the OS.

ifYouCantReadThisYouDoNotBelongInTheComputerBusiness
