Getting Your Filesystem Hierarchy Less Wrong

All mail clients suck. This one just sucks less.
motto of the Mutt email client

Figuring out what folders to use is a fundamental part of organizing files. It can also be quite difficult. It turns out there’s a reason that it’s difficult: hierarchical folders are a terrible model for organizing files.

Although discussing the weaknesses of folders may seem like an off-topic rant, I don’t want to bypass it because unless you understand the weaknesses of a system, you don’t understand the system well enough to reduce the impact of those weaknesses. Therefore, we’ll start by looking at how hierarchical models are fundamentally flawed as a system for organizing files. Then, since there are no other viable general-purpose options for organizing files and we’re stuck with hierarchies, we’ll explore some principles that assist us in making hierarchies that are less wrong. Not right – because the right way would require using a better model – but better than they would be without the principles.

Filesystem hierarchies suck

Hierarchies have their advantages. The model is easy to understand (it works just like putting things in boxes or folders in the physical world) and easy to implement in software. Everyone understands the idea and every filesystem in common use supports it. Plenty of things are naturally hierarchical (chronology, for example, divides neatly into years, then subdivides into months).

These benefits mean hierarchies excel at certain small-scale tasks. If you need to keep track of physical items without maintaining duplicates or relying on elaborate indexing schemes, they’re probably the most practical option. If you only have a few files, or if your needs for searching them are limited, hierarchies do a great job.

Unfortunately, they don’t scale. Here’s the essential problem: it’s extremely rare to need to search a collection of things in only one way. If you’re looking for an eBook on your computer, say, you might know the title, or part of the title, or the author, or the year it was published, or just a vague topic you want to find books about, or some combination of all of those. If you try to organize your books in a hierarchy, you have to pick one Single True Path™ that you follow every time you look for a book. Maybe you decide to organize by author, then by year, then by title. But what happens if you know the title but not the author or year? Then you have to look through every author and year folder on your computer to find it. (As I am told the 1897 Sears catalog suggested, “If you don’t find what you are looking for in the index, look very carefully through the entire catalog.”)

It goes without saying that organizing the hierarchy in a different order does nothing to solve the problem; at best, you might be able to optimize by designing your hierarchy in the order that’s most often helpful, thus slightly decreasing the frequency with which you run into the dreaded “look through everything” problem.
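To make the “look through everything” problem concrete, here is a minimal sketch in Python. It assumes a library laid out as author/year/title; the function names and the three sample books are my own choices for illustration, not part of any real tool.

```python
import os
import tempfile
from pathlib import Path


def find_by_path(root, author, year, title):
    """Follow the Single True Path: one step per level, no searching."""
    candidate = Path(root) / author / year / title
    return candidate if candidate.is_file() else None


def find_by_title(root, title):
    """Without the author and year, every folder has to be opened."""
    matches = []
    folders_opened = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        folders_opened += 1
        if title in filenames:
            matches.append(Path(dirpath) / title)
    return matches, folders_opened


def build_demo_library(root):
    """Create a tiny sample library laid out as author/year/title."""
    books = [
        ("Austen", "1813", "Pride and Prejudice.epub"),
        ("Austen", "1815", "Emma.epub"),
        ("Melville", "1851", "Moby-Dick.epub"),
    ]
    for author, year, title in books:
        folder = Path(root) / author / year
        folder.mkdir(parents=True, exist_ok=True)
        (folder / title).touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_library(tmp)
        # Knowing the whole path: a direct lookup.
        print(find_by_path(tmp, "Melville", "1851", "Moby-Dick.epub"))
        # Knowing only the title: a walk over the entire tree.
        matches, folders_opened = find_by_title(tmp, "Moby-Dick.epub")
        print(matches, "after opening", folders_opened, "folders")
```

With only three books, the title search already opens every folder in the library (the root, two authors, and three years) to find one file. The direct lookup touches a single path no matter how large the library grows.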

Of course, you can try to use the “search” feature in your operating system to locate the file. This often helps, but it’s a bolted-on feature that lies outside of the normal organizational system and is often slow and error-prone. Even if you can get the search function to locate a file you want, that doesn’t mean you’ve actually organized your files in a logical way that’s easy to work with; it just means you managed to dig that file out this time.

Another major problem with hierarchical filesystems is that you periodically have to reorganize them to maintain a usable structure as your needs change. This would be only a minor annoyance were it not for the fact that there’s no way to reference a file except by its full path, which means reorganizing breaks any existing links to every file that has moved. Nowhere is this problem more obvious than on the Web, whose URL structure is essentially a glorified hierarchical filesystem spanning millions of computers. Several studies have found that as many as 50% of links in various kinds of published material are broken!
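The breakage is easy to reproduce on your own machine. The following Python sketch saves a path the way a bookmark or a shortcut might, then tidies the folder; the file name, the “notes” folder, and both helper functions are hypothetical and exist only for this illustration.

```python
import shutil
import tempfile
from pathlib import Path


def broken_references(references):
    """Return the saved paths that no longer lead to anything."""
    return [ref for ref in references if not Path(ref).exists()]


def reorganize(root):
    """Move every .txt file at the top level into a new 'notes' folder."""
    notes = Path(root) / "notes"
    notes.mkdir(exist_ok=True)
    for item in sorted(Path(root).glob("*.txt")):
        shutil.move(str(item), str(notes / item.name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "ideas.txt"
        original.write_text("remember the milk")
        bookmarks = [str(original)]          # a reference saved elsewhere
        print(broken_references(bookmarks))  # nothing is broken yet
        reorganize(tmp)
        print(broken_references(bookmarks))  # the saved path is now dead
```

The file still exists after the move, and nothing about its contents changed. The only thing that changed is the path, which is the one handle the saved reference had.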

Project Nayuki has a fabulous article about all the problems with hierarchies and a proposal for a radically different kind of filesystem that could improve matters. If you’re interested in this topic and don’t mind some technical details, definitely check it out.

However, for now this proposal and others like it are still a pipe dream. All current general-purpose filesystems are organized hierarchically, so we’re stuck with hierarchies. Therefore, let’s move on to explore several principles that can make cramming your files into a hierarchy less painful.

In order to find any file in a hierarchical system, you must be able to identify its Single True Path and follow it in your filesystem to its conclusion. Following these principles will help you do that more reliably, making it easier to find your files.

Principle 1: The Single-Question Principle

At each level of your hierarchy, strive to make all folder names answer the same question.

This is the most crucial principle, and virtually all tips for filesystem organization can be derived partially from this one. If you can fully understand and use this principle, your ability to efficiently create and find things in hierarchies will improve enormously. It’s also a difficult principle to describe. I find it best illustrated by example: which of these two hierarchies would you rather try to find an item in?

Example hierarchy #1

Cars
   |- Sedans
      |- Blue
         |- Two-door
         |- Four-door
      |- Red
         |- Two-door
         |- Four-door
      |- Gray
         |- Two-door
         |- Four-door
   |- SUVs
      |- Red
         |- With seat warmers
         |- Without seat warmers
      |- Gray
         |- With seat warmers
         |- Without seat warmers
   |- Pickup trucks
      |- Gray
         |- Front-wheel drive
         |- All-wheel drive

Example hierarchy #2

Household Objects
   |- Doors
   |- Horizontal Surfaces
      |- Beds
      |- Floors
         |- Bare
            |- Clean
            |- Dirty
            |- Linoleum
            |- Wood
         |- Carpeted
      |- Roads
      |- Tables
   |- Kitchen
      |- Refrigerators
      |- Toasters
   |- Light
      |- Lamps
      |- Lightbulbs
      |- Light switches
      |- Windows

In example #1, although we subdivided sedans differently than SUVs and pickup trucks, at each individual level of the hierarchy, all options are different responses to the same question: What color is the car? How many doors does it have? Does it have seat warmers? As a result, finding what you’re looking for is a cinch; just answer a series of questions with obvious answers and you’re there (provided you’ve actually put the right cars in the right folders).

In example #2, in stark contrast, the hierarchy is in a shambles. Try explaining the function of each level, or what question it’s asking – you can’t! The options under “Kitchen”, “Light”, and “Floors” are fine, but:

  • Why are bare floors divided into clean and dirty as well as linoleum and wood? Two independent distinctions are being made mutually exclusive. You wouldn’t ask someone, “Is that floor clean, or is it wooden?” Presumably the person who set up these folders put different kinds of floors into the clean/dirty and linoleum/wood folders and knew the difference (maybe the clean/dirty folders are used for floors in the person’s own house and the linoleum/wood folders are used for floors in his friend Alice’s house which is always spotlessly clean). But looking at the folder structure, we have no idea. Even if you’re the only one involved, as the maxim in software development goes, “There are always at least two programmers: you and you six weeks from now.” Once you lose track of the original function of the folders, you’ll probably start choosing which to use for new items at random, which only makes the problem worse.
  • “Roads” shouldn’t even be in here at all – while it may be a horizontal surface, it isn’t a household object. Even though it fits with the things it’s next to, you’re never going to think to look in “household objects” for roads unless you happen to remember putting them there.
  • At the top level, there are at least three different types of entries: individual objects (doors), functions that objects have (light, horizontal surfaces), and locations objects are kept (kitchen). This forces you to remember how you laid out the filesystem to determine how to proceed. If you’re looking for a window that’s in a kitchen, did you decide that the salient feature of this window was that it’s found in the kitchen, or that it gives light? You just have to know. If you ask only one question at a time, it will always be obvious how you’re expected to answer. (You might not always remember the answer itself, but at least you know what piece of information you need to continue on the correct path.)

The Single-Question Principle is a direct reflection of the requirement that every item in your hierarchy have a Single True Path: since there is only one path, a floor can’t be clean and wooden at the same time unless you put “clean” inside of “wooden” or vice versa. Putting “clean” and “wooden” at the same level is ignoring the fact that a real-world floor – what we’re trying to model in our filesystem – can in fact be both clean and wooden. It is building a contradiction directly into the design of the filesystem. It should not surprise you that a filesystem with fewer embodied contradictions makes finding things much easier. Unfortunately, most filesystems have dozens of these contradictions.

Following this principle typically entails adding folders. As discussed later under the Depth Principle, there’s no need to be afraid of new folders.

Exercise

To further help you understand this principle, I highly recommend you grab a piece of paper and try redesigning hierarchy #2 (the one I tore apart above). It may help to think about the questions that each level answers.

Answer: Here’s one possible version, annotated with appropriate questions. I would be amazed if you got anything like it, though. Because I didn’t give you any information about how this hierarchy is used, it’s hard to say what kind of ordering or organization would be most effective. Focus on whether you were able to devise straightforward and easy-to-answer questions.

Household Objects          [What broad category of object?]
   |- Appliances           [What function?]
      |- Food preparation  [What appliance?]
         |- Refrigerators
         |- Toasters
      |- Lighting          [What appliance?]
         |- Lamps
   |- Building components  [What component?]
      |- Doors
      |- Floors
         |- Bare           [What surface material?]
            |- Linoleum    [What cleanliness level?]
               |- Clean
               |- Dirty
            |- Wood        [What cleanliness level?]
               |- Clean
               |- Dirty
         |- Carpeted
      |- Light switches
      |- Windows
   |- Consumable items     [What item?]
      |- Lightbulbs
   |- Furniture            [What piece of furniture?]
      |- Beds
      |- Tables
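If you’d like to check a design mechanically, here is a rough Python sketch. It models just the “Floors” branch from hierarchy #2 and from my redesign as nested dictionaries, then flags any folder whose subfolders answer more than one question. The QUESTION_OF table is an assumption: you have to decide by hand which question each folder name answers, and the labels below are simply my reading of the example.

```python
# Hypothetical labels: the question each folder name is an answer to.
QUESTION_OF = {
    "Bare": "covering",
    "Carpeted": "covering",
    "Clean": "cleanliness",
    "Dirty": "cleanliness",
    "Linoleum": "material",
    "Wood": "material",
}

# The "Floors" branch as it appears in example hierarchy #2.
FLOORS_BEFORE = {
    "Floors": {
        "Bare": {"Clean": {}, "Dirty": {}, "Linoleum": {}, "Wood": {}},
        "Carpeted": {},
    }
}

# The same branch after the redesign.
FLOORS_AFTER = {
    "Floors": {
        "Bare": {
            "Linoleum": {"Clean": {}, "Dirty": {}},
            "Wood": {"Clean": {}, "Dirty": {}},
        },
        "Carpeted": {},
    }
}


def mixed_levels(tree, path=()):
    """Return (folder, questions) for folders whose children mix questions."""
    problems = []
    for name, children in tree.items():
        here = path + (name,)
        questions = {QUESTION_OF.get(child, "unlabeled") for child in children}
        if len(questions) > 1:
            problems.append(("/".join(here), sorted(questions)))
        problems.extend(mixed_levels(children, here))
    return problems


if __name__ == "__main__":
    print(mixed_levels(FLOORS_BEFORE))
    print(mixed_levels(FLOORS_AFTER))
```

The original branch gets flagged at “Floors/Bare”, where cleanliness and material sit side by side; the redesigned branch comes back clean. The hard part, of course, is filling in the table of questions honestly, and no script will do that for you.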

Corollary 1: The Separation Principle

Whenever possible, limit each folder to containing only files or only other folders.

This is a corollary of the Single-Question Principle. Having both files and additional folders at a single level nearly always violates the principle that all items at one level should answer a single question.

Following this principle is easier than you would expect. Frequently you see a pattern like this:

My Interesting Project
   |- old versions/
      |- document1-2018-01-01.txt
      |- document1-2018-01-05.txt
   |- photos/
      |- image1.jpg
      |- image2.jpg
   |- document1.txt
   |- document2.txt
   |- document3.txt

Refactoring this into a more intuitive form is straightforward:

My Interesting Project
   |- documents/
      |- current/
         |- document1.txt
         |- document2.txt
         |- document3.txt
      |- old/
         |- document1-2018-01-01.txt
         |- document1-2018-01-05.txt
   |- photos/
      |- image1.jpg
      |- image2.jpg

I can understand if you’re saying, “What’s the difference?” I thought the same thing myself. Then I tried it and discovered it actually does reduce mental load and makes it faster to find things. Give it a try.

Caution: It is absolutely possible to make your filesystem harder to use by following this principle blindly. Sometimes a file really does belong with folders; for example, maybe inside a project folder you have a number of subfolders along with a document that explains what’s in each of the folders. Creating a new folder for this file isn’t going to reduce mental load; it’s just going to make it harder to spot the file that’s supposed to help you find things! So use your own judgment on this one.

For extra points, avoid mixing folders that contain only other folders with folders that contain files; again, this suggests you may be violating the Single-Question Principle.
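To see where your own folders stand, a short script can list every folder that holds both files and subfolders. This is only a sketch: the function names are mine, the two layouts are trimmed-down versions of the “My Interesting Project” example above, and hidden files that some systems drop into folders will count as files unless you filter them out.

```python
import os
import tempfile
from pathlib import Path


def mixed_folders(root):
    """Return folders that directly hold both files and subfolders."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirnames and filenames:
            found.append(Path(dirpath))
    return sorted(found)


def make_tree(root, relative_files):
    """Create empty files (and their parent folders) under root."""
    for relative in relative_files:
        target = Path(root) / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()


# Trimmed-down versions of the two project layouts.
BEFORE = [
    "old versions/document1-2018-01-01.txt",
    "photos/image1.jpg",
    "document1.txt",
]
AFTER = [
    "documents/current/document1.txt",
    "documents/old/document1-2018-01-01.txt",
    "photos/image1.jpg",
]

if __name__ == "__main__":
    for label, layout in (("before", BEFORE), ("after", AFTER)):
        with tempfile.TemporaryDirectory() as tmp:
            make_tree(tmp, layout)
            flagged = [str(p.relative_to(tmp)) for p in mixed_folders(tmp)]
            print(label, flagged)
```

Treat the output as a list of places to look at, not a list of things to fix. A folder holding a single explanatory document next to its subfolders will show up here too, and as noted in the caution above, that one is usually fine.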

Principle 2: The Domain Principle

Organize files in different domains differently.

The way you generate Single True Paths cannot reasonably be the same for all of your files. For instance, if you try to subdivide your photos, your music library, your tax returns, and your recipes all by chronology, you’re asking for trouble. A chronological hierarchy probably works very well for your tax returns and as well as anything will for your photos, but you won’t have much fun trying to remember the date associated with your favorite curry recipe. (On the other hand, your knowledge of music history will probably be second to none among your friends. There’s a silver lining to every terrible design.)

In order to comply with the Single-Question Principle while still organizing your files into useful folders, you need to identify specific domains which you will organize with their own set of questions. Domains have a specific function or relate to a specific kind of activity. Some of mine include Jobs, Software, Photos, Music, Writing, and Websites. You’ll probably have a folder for each domain containing all the files and folders in that domain, though you could end up with more than one depending on how you decide to organize in the end.

In order to reliably determine the Single True Path of a file you want to find, it must be abundantly clear what goes in each domain. If it’s not, try experimenting with changing up the domains. While you shouldn’t settle too easily, also recognize that you aren’t going to get it perfect; the nature of hierarchies means there’s always bound to be some ambiguity. (If I take photos to document what my workplace looks like, do those go in the Photos domain or the Jobs domain? They’re not really photos in the sense that they document an event like most of my photos do, and I’d certainly like to see them when looking at information about the job, but it’s also confusing to have photos outside of the Photos directory.)

Caution: Don’t make the common mistake of creating your domains based on the file type. For instance, it isn’t very useful to have all your JPG images in one folder and all your PNG images in another, or all your word-processor documents in one folder and all your spreadsheets in another – this does a poor job of keeping files you’re going to need at the same time next to each other, and it’s rarely the most relevant information when you’re trying to locate something.

That said, some segregation by file type often happens naturally. For instance, you should expect to find only image files with your family photos; if you find yourself putting MP3s or executable programs there, you’re probably doing something wrong. Similarly, it might make sense within a particular project to separate your reports (word-processor documents) from your numerical data (spreadsheets). But there’s no need to force the issue; if some of your numerical data is in spreadsheets, some of it is in CSV files, and some of it is in WordPerfect 5.1 files because it came from that one crazy professor upstairs who still uses Windows 95, those files can all go in the same place. And you certainly shouldn’t put all the Excel spreadsheets you use for every project you do on your computer in a single folder. Domains are far more useful when they’re based on function or activity rather than on trivial details like the program that opens the files in them.

Principle 3: The Depth Principle

Prefer deep hierarchies over shallow ones.

Shallow hierarchies require you to navigate through only a couple of levels of folders to get to a file. Deeper hierarchies may require eight or more.

The advantage of shallow hierarchies is that you don’t have to click on or type as many folder names to get to a given file. The advantage of deep hierarchies is that they’re more precise and there are fewer items in each category, so it’s easier to figure out where you’re going and to spot the right file once you get there.

I’ve tried both methods and concluded, as have many others, that deeper hierarchies are almost always superior. The reason is simple: there are all kinds of tricks to quickly navigate through lots of folders, but there are no tricks to compensate for poor organization. Additionally, it’s far less annoying to click through lots of folders than you might think when it means you know exactly where you’re going.

It’s possible to get overzealous; if you find yourself creating dozens of folders that contain just two or three files, you may have gone overboard. In general, though, it’s better to have too many folders than too few.
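If you want a quick read on whether you’ve gone too far in either direction, the Python sketch below counts the files sitting directly in each folder and reports the nearly empty and the overfull ones. The thresholds (three files or fewer, fifty or more) are arbitrary assumptions of mine, as are the function names; adjust both to suit your own files.

```python
import os
import tempfile
from pathlib import Path


def files_per_folder(root):
    """Map each folder under root to the number of files directly in it."""
    return {Path(d): len(files) for d, _dirs, files in os.walk(root)}


def sparse_and_crowded(root, few=3, many=50):
    """Split folders into nearly empty ones and overfull ones.

    Folders with no files at all (pure containers) are ignored.
    The thresholds are arbitrary assumptions; tune them to taste.
    """
    counts = files_per_folder(root)
    sparse = sorted(p for p, n in counts.items() if 0 < n <= few)
    crowded = sorted(p for p, n in counts.items() if n >= many)
    return sparse, crowded


def fill(folder, count):
    """Create a folder containing the given number of empty files."""
    folder.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        (folder / f"file{i}.txt").touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fill(Path(tmp) / "tiny", 2)
        fill(Path(tmp) / "comfortable", 10)
        fill(Path(tmp) / "huge", 60)
        sparse, crowded = sparse_and_crowded(tmp)
        print("sparse:", [p.name for p in sparse])
        print("crowded:", [p.name for p in crowded])
```

One or two sparse folders are nothing to worry about. Dozens of them suggest the overzealous case described above, while a long list of crowded folders suggests the hierarchy could stand to be deeper.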

In the next post, we start looking at links, which can allow us to circumvent some of the limitations of hierarchies that better design doesn’t help with.