Saving Uploaded Images to the Database

August 12th, 2006

Judging from the questions people ask on the #rubyonrails channel, it seems that it's still a common practice to store uploaded images in the database.

At first glance, it seems sensible. You're storing all the metadata about the image in the database: its filename, its accessibility, its dimensions, when it was uploaded. Why not put everything about the image (including the bits of the file itself) in one place?

The approach even makes sense when you start to think about backups. By storing the images in the database, you can rest assured that your database backups will include the images, and that the images and metadata will be consistent in those backups.

It's an idea that I've found attractive myself. But I discussed this with Rob, and he came up with some really solid arguments against it:

Once you've considered this, the only benefits of storing images in the database are that they're centrally accessible, covered by your database backups, and easy to code for. But these benefits are weak in the face of the arguments above.

As far as central accessibility is concerned, all modern operating systems support some kind of network filesystem. The central accessibility goal is reached simply by exporting the image store as a network filesystem like CIFS or NFS.

When it comes to backups, production servers should almost certainly have a backup regimen that includes more than just the database. Since you should already have at least some paths in your filesystem backed up regularly, it's almost no additional effort to include images in the filesystem backup regimen.

The last remaining benefit is ease of coding. And yes, I imagine that in some application frameworks, it requires more work to store images outside the database. However, put this aside for a moment, and consider everything else we've covered so far. If an RDBMS is a bad fit for image storage, why isn't your application framework making it easy for you to store your images outside the database?

For Ruby on Rails developers, there are at least two rock-solid APIs for attaching images (and other binary files) to your models in a way that's so transparent, you couldn't tell from the code that the images are not stored in the database. For Singles Everywhere (a free online singles / dating site I recently completed for a customer), I used the file_column plugin, which was a pleasure to work with. There's also technoweenie's acts_as_attachment plugin.
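To make the idea of external storage concrete, here is a minimal sketch in plain Ruby. It is not the file_column or acts_as_attachment API; the `ImageStore` class, its methods, and the record layout are all hypothetical, and a plain Hash stands in for the database row. The point is only that the bytes go to the filesystem while the metadata and a relative path go to the database.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for a model layer: the "database row" is a plain
# Hash holding metadata and a relative path. The image bytes live on disk.
class ImageStore
  def initialize(root)
    @root = root
    @next_id = 1
  end

  # Writes the bytes under the store root and returns the metadata that
  # would be saved in the database.
  def save(original_filename, bytes)
    id = @next_id
    @next_id += 1

    # Only the extension of the client-supplied name is used in the path,
    # so the client cannot choose where the file lands.
    leaf = File.basename(original_filename)
    ext = File.extname(leaf).downcase
    relative_path = File.join(id.to_s, "image#{ext}")

    absolute_path = File.join(@root, relative_path)
    FileUtils.mkdir_p(File.dirname(absolute_path))
    File.binwrite(absolute_path, bytes)

    { id: id, filename: leaf, path: relative_path,
      size: bytes.bytesize, uploaded_at: Time.now }
  end

  # Loads the bytes back using only what the "row" contains.
  def read(record)
    File.binread(File.join(@root, record[:path]))
  end
end

root = Dir.mktmpdir("image_store")   # in production: a local or network-mounted path
store = ImageStore.new(root)
record = store.save("holiday.JPG", "fake image bytes".b)
```

Because the record stores a path relative to the store root, the same row stays valid whether the root is a local directory or an NFS or CIFS mount.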

I suppose there may be cases where it makes more sense to put the images in the database. But I recommend that your default preference be for external storage, and that you insist on strong motivation that outweighs the concerns outlined above.

10 Responses to “Saving Uploaded Images to the Database”

  1. Larry Myers said on
    Or if you want to do this by hand without a plugin, it's pretty easy:

    http://myersds.com/notebook/2006/07/25/basic_file_uploads_with_rails
  2. Sheldon Hearn said on
    Nice hand-rolled solution for simple cases.

    Thing is, the file_column plugin means less work even for simple examples like this, is attached to a model, and has support for test fixtures and testing uploads.

    By the way, does your by-hand method protect against upload filenames that contain relative paths like ../config/environment.rb?
  3. Larry Myers said on
    I didn't really bother with a lot of protection since my upload method is only accessible internally through the admin section.

    But yes, that would be necessary were this method to be used for an upload system accessible to the public.
  4. Anonymous said on
    I also like the simplicity of the solution for simple cases. It definitely makes sense to save them. - ben @ http://rubyonrailsblog.com
  5. Anonymous said on
    It makes a LOT of sense to store images in the database if you couple that with Rails caching mechanism. That way you can just add another physical server, even, and only the first hit for each image would go to the database - after that, the new server has a copy in its file system too. Compare that to loading images from a network drive...

    And you get the additional benefit from simple backups, too. To Keep Things Simple (Stupid), the app goes in subversion/other source control and repository + database has backups.

    Even if everything else burns to the ground, you could set up and configure a totally new environment from that within an hour from the new computers arriving.

    Sometimes it pays to not listen to conventional "we tried that and it didn't work because..." :)
  6. Anonymous said on
    No, actually all this discussion proves is that you have decided what you think and will not even look at the alternative. It is quite funny that you see it as reinforcement though. Apparently you haven't ever managed your own server for any real pressure, but only regurgitate something someone else once told you, out of context.

    Of course it's view caching, I thought that went without saying... 9_9 You have, for instance, a route like :id.:ext which maps to different jpg, gif, png etc. It could even do on the fly conversion if you like, with rmagick. With the new resources system that is even easier than that (check edge rails).

    Cache expiry is, unless you already have another solution in place for your OTHER caching (oh right, you haven't been doing big setups, I forgot).

    Otherwise it is, just like other maintenance and sync always needed on large enough setups, handled by a daemon (Rails even has specialized ones now) or, for more relaxed needs, a 1-per-minute cron job. Usually an image update is no more critical than that.

    Actually, speaking of "usually": Usually images never need to expire at all. That is true for almost any type of site, and most of them don't. Avatars in forums usually get a unique id and the old ones stay around. Exception might be high traffic image boards like 4chan or something, which is a highly specialized case. In which case a sweeper daemon/cron would do the job very nicely.

    Oh, how? By marking it in the database. That is always shared (or replicated for enormous setups). Sweeper notices (by polling, callbacks, what have you) that an image file older than the marked time should be deleted. Boom. Fixed.

    A network drive can never be as fast as just delivering the images directly from disk. That is what spreading the images across web servers does. One slow hit per server, then fast. You completely failed to understand even this simple point.

    And like I said, images almost never need expiring, so once out there, it is there.

    And yes, no need to go into backups, because your solution is overly complex and much worse.

    I could go on, but you will just continue not thinking it through anyways. Just wanted you to know that you are just repeating "truths" without mind, and maybe someone else reading this can get a bit of help with possible alternatives.

    The whole idea is that it should be easy to build, easy to maintain, and easy to fix. My way does that, yours - not so much. Too much spread in both data and code. Not good.

    Real professionals are LAZY. =)
  7. Sheldon Hearn said on
    "It makes a LOT of sense to store images in the database if you couple that with Rails caching mechanism. That way you can just add another physical server, even, and only the first hit for each image would go to the database - after that, the new server has a copy in its file system too. Compare that to loading images from a network drive..."

    Sure thing.

    First, Rails caching applies to the actions and the view, not models. And it's still more resource intensive than cached filesystem access, which you get for free, even with NFS.

    If you don't use Rails view caching, you'll need an on disk image cache, per server. This requires cache invalidation across the cluster of servers. So now storing images in the database not only increases cluster-wide storage requirements, it also introduces application complexity.

    So actually, if you carefully compare the storing of images in the database with storing the images on a network filesystem, you'll see that the network filesystem is better in terms of performance and complexity.

    I haven't responded to the comment about backups, because I think I covered that adequately in the original post. But thanks for your comment, which provided a good opportunity to reinforce my argument that the database has no value to add with respect to caching.
  8. Sheldon Hearn said on
    Thanks for taking the time to continue the discussion. I'm sure it will be of benefit to other readers.

    There are two cases to consider for displaying images.

    In the first case, images are publicly available. Here, the images exist in some (possibly networked) location in the filesystem and can be served directly by Apache, or whatever proxies requests into your application. Having Apache serve images is significantly faster than having Rails do it, whether the images exist on local disk or on a network filesystem, and even if you route away from your controllers.

    This is because NFS is not without its own cache control mechanisms, which are fast. In fact, NFSv4 supports callback cache invalidation, so that clients don't have to perform any network IO to determine the staleness of objects they cache. Earlier versions of NFS had varying degrees of support for cache control.

    The second case is where access to images must be mediated by your controllers, for example, for profile photos that are only made available to certain users. Here, Rails action and view caching don't help much if you're making a per-user display decision. And so it comes back to the filesystem's ability to cache filesystem objects better than Rails can cache models.

    In both cases, leaving caching up to the filesystem is effortless; the work is done for you by the operating system. This is as lazy as can be, but also produces the best performing result, whether the filesystem is local or not.

    Also, your argument seems to assume that local disk access is faster than network filesystems, even without buffer caching. However, for large clusters, fiber-connected network storage can and usually does outperform local storage on application servers, which typically have cheap storage (since they aren't responsible for the persistence of domain data).

    And in clusters of the scale that make fiber-connected network storage fiscally feasible, the number of network connections to the database becomes a serious issue. Of course, at that point you have bigger problems to worry about, since Rails defaults to one connection per request handler and has no out-of-the-box connection pooling support. :-)

    I don't know CIFS nearly as well as I know NFS. Perhaps CIFS has poor cache control, and every read on a CIFS-exported object involves network IO? That would certainly explain your resistance to trusting the filesystem to do what it does very well.
  9. Anonymous said on
    Actually, Rails caching does use the filesystem, so all the above arguments are moot. Cache a view, there's a file in the file system for it. The web server running the Rails app always looks for a file in your public directory before turning to the actual Rails app (that's why you have to delete index.html from your public directory before your default route will work), so it is actually no different from simply serving static files on the filesystem.
  10. Sheldon Hearn said on
    Rails has several options for caching, and at least one of them (memcached) does not necessarily use the local filesystem on the web servers (or even necessarily the filesystem of the network share that hosts the images).

    Using index.html as an example of how Rails always reads files locally is strange.

    In a properly configured production environment, Apache or lighttpd is configured to ensure that requests that don't match an object on disk are passed to a request handler.

    And keep in mind that a stat() system call is much cheaper than reading a file. The stat() call just reads file metadata, which is almost always cached, even for network filesystems.

    There's no substitute for a solid systems background when structuring a production environment.
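
The question in the second response, about upload filenames containing relative paths like ../config/environment.rb, can be illustrated with a short sketch. This is one possible approach under stated assumptions, not what file_column or the hand-rolled solution actually does; the helper names `sanitize_upload_filename` and `upload_path` and the `/srv/uploads` directory are hypothetical.

```ruby
# Hypothetical helper: reduces a client-supplied filename to a safe leaf name.
def sanitize_upload_filename(name)
  # Treat backslashes as separators too, since a client may send a Windows path.
  leaf = File.basename(name.to_s.tr("\\", "/"))
  # Replace every character outside a conservative whitelist.
  leaf = leaf.gsub(/[^A-Za-z0-9._-]/, "_")
  # Fall back to a fixed name if nothing usable is left (empty or only dots).
  leaf = "upload" if leaf.empty? || leaf.match?(/\A\.+\z/)
  leaf
end

# Hypothetical helper: joins the sanitized name to the upload directory and
# double-checks that the result still lies inside that directory.
def upload_path(upload_dir, name)
  root = File.expand_path(upload_dir)
  path = File.expand_path(sanitize_upload_filename(name), root)
  unless path.start_with?(root + File::SEPARATOR)
    raise ArgumentError, "path escapes upload directory"
  end
  path
end

safe_name = sanitize_upload_filename("../config/environment.rb")
target = upload_path("/srv/uploads", "../config/environment.rb")
```

Taking only the leaf of the supplied name discards any directory components, and the second check is a belt-and-braces guard in case the sanitizing rules are later loosened.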
