Why Keep a Local Filesystem?

I like to start out my series here on Control-Alt-Backspace by discussing why you should care about the topic. After all, if I can’t convincingly answer that question, I’m spending quite a few hours over the course of the series writing useless content.

A series about organizing a filesystem implies that you have files to organize, and specifically that they are all stored in one place on your local computer (or perhaps on your company’s network drive or a home network-attached-storage unit). Fifteen or twenty years ago, the answer to the question posed in the title might have been obvious: if you didn’t have a local filesystem, you didn’t have any data and you couldn’t do a whole lot with your computer. Many people didn’t have webmail yet, so you couldn’t even read email without putting files on your computer!

Nowadays, though, it’s quite a bit less obvious. Software-as-a-Service products like Google Docs and Evernote, social media sites, and mobile apps have taken over many of the roles that standard files created with desktop programs used to play for many people. With (nearly) always-on connections to the Internet, it’s now possible to get away with not keeping any files on your computer at all, and it’s often easier to work this way.

Is it a good idea, though? I don’t think so. Here are four reasons I keep my files local when possible and why you should think about it too. One simple principle unites them: saving yourself time and trouble in the short run often comes at the expense of limiting available time-saving options and causing worse problems in the long run.

Centralization and Organization

I currently have 230 items in my password manager. I haven’t gone through and counted them individually, but probably a solid 200 of them are websites that I have accounts on or have used in the past. That’s a lot of websites! Most of them are places I’ve bought things from or forums or customer service sites I’ve commented or posted on; some are financial institutions; but many are webapps of various kinds on which I might have data.

Now, I try to limit the amount of work I do on these services and keep a local copy of any important data they generate. Imagine that I didn’t do that. I could easily have 20, 30, or more services that had important data on them. There is no way I could keep my data organized and backed up across 20 services. Heck, I wouldn’t even be able to remember where I put something, much less easily find related content, if I hadn’t used it recently! Computers are supposed to help us integrate and organize our information, not spread it out even more thinly than in the physical world.

If you keep most of your data local, it’s all together and you can organize it in whatever manner makes sense. If you don’t, you’re forced to organize it by the service you used to create it, which often doesn’t make any sense at all. Cloud services may be “easier” at the start in the sense that all you have to do is sign up and start using the service, but once your needs start to get complex, the overhead of hopping back and forth between lots of different services which often don’t integrate with each other starts to cost you.

If you think you don’t have 200 accounts, you probably don’t; I’m a geek who’s been using and trying out all kinds of webservices for years. But I guarantee you that unless you have a password manager you use to track all your accounts, however many accounts you think you have on various websites, you have at least twice that number.

Performance and Uptime

What? It’s only a few seconds [to load your email program]? Brothers and sisters, this is a computer. It should open instantaneously. You should be able to flit in and out of it with no delay at all. Boom, it’s here. Boom, it’s gone. Not, “Switch to the workplace that has the Web browser running, open a new tab, go to gmail, and watch a company with more programming power than any other organization on planet earth give you a…progress bar.”
Stephen Ramsay

Even with today’s high internet speeds, having your files on your local machine is still nearly always noticeably faster for day-to-day work. (Sure, if you’re doing complicated computations or data analysis, you might end up saving time by running them on a high-powered server, but that’s not the normal case.)

You can also keep using your computer when you’re on an airplane, when you’re out of the house and running out of data on your cell phone plan, when the next city over, where your internet access comes from, gets hit by a hurricane, or when your cable provider screws up and cuts off your connection again. These things may not happen to most people very often, but they have a way of completely stopping your work for hours at a time at exactly the moment you can least afford it.

Flexibility and Freedom

Having your files in a particular app or cloud service works great as long as you only want to use the features available in that app or website. When you want to do something else, you are often left with no options at all. Having your data in ordinary files in your local filesystem means you don’t have to do anything special to use different software with them – just point the new software at the same location!

Even better, you can quickly create your own software, in the form of shell scripts, to retrieve information from or take automated actions on many files at once. Imagine renaming thousands of photos to match a new format, locating a particular page in hundreds of PDF files by providing a reference number, automatically downloading and filing recordings or photos as soon as you plug your device in, or determining how many times you’ve used the word “hallelujah” across all the documents you’ve ever written. Heck, when I told you above that I had 230 passwords in my password manager, I wrote a script on the fly to count the number since the software didn’t show that value to me:

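# Count the files in the password store, skipping housekeeping files.
find .password-store/ -type f |
    grep -v '/\.git' |            # Git metadata (.git/, .gitattributes)
    grep -Ev '/\.gpg(-id)?$' |    # key-ID files such as .gpg-id
    wc -l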

Despite the ominous-looking punctuation and scripting’s reputation as a “power user feature,” it’s actually not difficult and could easily be part of every computer user’s education. (Once upon a time, it was. Then we developed “superior” point-and-click interfaces that, while offering real advantages, caused us to lose sight of the whole picture and what we were giving up.) Later in this series we’ll take a look at the basics of shell scripting.
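
To give a sense of how small such scripts can be, here is a minimal sketch of the “hallelujah” example mentioned above. It is only an illustration, not a script I actually use: the function name count_word is made up, and it assumes your documents are plain-text files ending in .txt (word-processor formats would need to be converted to text first) and that your grep supports the -o option, as GNU and BSD grep do.

```shell
#!/bin/sh
# count_word WORD DIR
# Hypothetical helper: print how many times WORD appears, as a whole
# word and ignoring case, in all .txt files under DIR.
count_word() {
    word=$1
    dir=$2
    # grep flags: -o prints each match on its own line, -h omits file
    # names, -i ignores case, -w matches whole words only.
    # wc -l then counts the matches; tr strips the padding some
    # versions of wc add around the number.
    find "$dir" -type f -name '*.txt' -exec grep -ohiw -- "$word" {} + |
        wc -l |
        tr -d ' '
}
```

With the function loaded into your shell, running count_word hallelujah ~/Documents would print a single number (the directory here is just an example). Because the files are ordinary files in an ordinary folder, nothing about them has to be exported or converted before a tool like this can read them.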

Recently there has been a very positive trend towards cloud services offering a usable, effective data-export feature that allows you to take your content elsewhere. This is by no means universal, though, and even when export is available, it often loses a great deal of metadata and organization, which means you’re really only able to export some of your work.

Reliability and Safety

Not that long ago, I heard about a service called Picturelife on the podcast Reply All. The service automatically uploaded, stored, and organized photos users took on their phones. Then one day, it suddenly dropped off the face of the Internet, taking all its users’ photos with it. A woman called in to the Super Tech Support segment on the podcast asking for help retrieving several years of photos of her kid that she had stored only on Picturelife. The hosts managed to get in touch with the owner of Picturelife, who was strangely optimistic and not as apologetic as you might have expected about the whole situation. The owner explained that Picturelife’s server company had essentially held the user data hostage because Picturelife had failed to pay its bills. When Picturelife managed to retrieve the data, it lost the database that kept track of which images belonged to which users and was having to rebuild it manually.

Several months later, the company closed down but managed to get almost all of the data restored to a different, similar service that stepped in to help them out. But that was pretty much dumb luck!

Of course, this is a worst-nightmare horror story. It’s not as if dozens of service providers are failing every day across the world. That said, I’m frankly shocked that we haven’t seen more disasters like this. Regulations about reliability, service providers’ liability to customers, and customer data storage are minimal to nonexistent in most countries. Sure, losing substantial portions of your customers’ data likely means the end of your business, so you’re going to put some work into making sure it doesn’t happen, but every year thousands of non-cloud businesses go bankrupt as a result of similarly dumb practices like not making proper backups, not purchasing appropriate insurance, and so on.

The point is, you should never, ever have any data in only one place unless you’re willing to lose it. That goes double when you leave that single copy with someone else and trust them to do the right thing with it. I don’t trust random startups with the only copy of my data. I don’t even trust Google or Apple with the only copy of my data. Do you?