
2015.06.21 Hybrid SSD/HDD/SSHD Windows Software Technical Specifications

posted Jun 21, 2015, 3:31 AM by Troy Cheek   [ updated Jun 21, 2015, 6:56 AM ]
What I'm asking for is a simple, plain, automatic hybrid utility that can move files from the SSD to the HDD and back.  It will sit silently in the background watching the SSD drive, scanning all files except in folders we've told it to exclude, like maybe the Windows folder because that's probably best left on the SSD anyway.  When the SSD drive gets filled to a user-specified amount, say 80%, the utility will check the SSD looking for the least used files or the oldest files or the files whose "accessed by" date is oldest or whatever's easiest to program.  Let's call these "stale" files.  Enough of these stale files will be moved to the HDD to free up enough space to bring the SSD under 80%.  Symbolic links or junctions will be created so that as far as the user is concerned, these stale files are still on the SSD.  If he tries to access the files, he'll be able to in the usual fashion, and he might not even notice that it's loading off the slower HDD.  If a file is accessed like this, it's no longer considered stale.  The utility will at the first opportunity silently move the file back to the SSD and remove the symbolic link.  Optionally, if SSD usage drops below a certain threshold, say the user defines this as 50%, then the freshest of the stale files will be moved back from the HDD to fill the SSD to that level.
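For the programmers in the audience, the watching-and-leveling rule above boils down to a few thresholds. Here's a minimal Python sketch, not a real implementation; the drive letter and percentages are just the examples from the paragraph above:

```python
import shutil

HIGH_WATER = 0.80  # start moving stale files out when the SSD is this full
LOW_WATER = 0.50   # optionally refill the SSD with fresh files to this level

def used_fraction(drive="C:\\"):
    """Fraction of the drive currently in use (drive letter is an example)."""
    usage = shutil.disk_usage(drive)
    return usage.used / usage.total

def needs_demotion(drive="C:\\"):
    """True when the SSD has crossed the high-water mark."""
    return used_fraction(drive) > HIGH_WATER

def can_promote(drive="C:\\"):
    """True when there's room to move fresh stale files back from the HDD."""
    return used_fraction(drive) < LOW_WATER
```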

In a previous article, I talked about how I had this flash memory SSD that was new and fast but unfortunately small and expensive. I also had this standard mechanical HDD that was old and slow but also very large and cheap. I talked about how I had to manually decide which games and applications and data I had to put on which drive to get the perfect balance between speed and storage space. You'd think this would be something your computer could do for you automatically. And, it turns out, if you're running a recent Linux or Mac OS X or "enterprise" version of Windows, your computer can do this for you automatically. If you're running standard Windows 7 like I am, you can't.  I didn't like that and set out to see if such a thing was possible.  With a few hundred lines of BASIC code, I proved that the project was technically feasible, almost simple.  However, I'm not a good enough programmer to write something that I want to entrust all my data to.  I might, though, be a good enough communicator to describe what I'm asking for in terms clear enough that a good programmer could create what I want.  What follows is my attempt at such a description.  Since my weakest point in programming has always been the user interface, I'll be describing the software in terms of the GUI.

Configuration Screen

This is where the user sets all the options for the program.  The first time the program runs, this should be the first screen they see.  Options should include...

Set SSD:  This lets the user specify which drive should contain the "hot" files, the games or applications or data that the user uses often.  It doesn't have to be an SSD; it could just be a faster drive or an internal drive as opposed to an external one.  NAS is outside the scope of this specification.  Default would be the smallest fixed drive.

Set Free Space:  I read somewhere that an SSD drive operates most effectively if it has 25% free space.  (My 120 GB SSD should then have 30 GB free instead of the current 18.  This whole project started because I had 12 GB free and wanted to install a Steam game that required 15, causing me to spend half the night moving files around instead of playing the game I'd just spent $30 on.)  It could be set as a percentage or perhaps a number of GB.  When "leveling" the drives as explained later, this would be the goal the program strives toward.  Default would be 25% of SSD.
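The arithmetic for that goal is a one-liner. A sketch in Python, assuming the goal is stored as a fraction of total capacity:

```python
import shutil

def bytes_to_free(ssd_root="C:\\", free_goal=0.25):
    """How many bytes have to leave the SSD to hit the free-space goal.
    free_goal is a fraction of capacity (the 25% default suggested above)."""
    usage = shutil.disk_usage(ssd_root)
    return max(0, int(usage.total * free_goal) - usage.free)
```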

Set HDD:  This lets the user specify which drive/folder should contain the "cool" files, the games or applications or data that the user hasn't used in a while but wants to keep available.  ("Cold" files would be those stored in archives or offline storage.)  It can be an internal or external drive, but not NAS because reasons.  Default would be a folder on the largest fixed drive.  Check to make sure the user specified a folder and not the whole drive.  Part of the beauty of this process is that the user can continue to use the HDD as normal if he wants to manually move files around on it.  We might want to hide the folder.

Always Ignore:  Allow user to specify folders or file extensions or individual files to exclude from the whole hot/cool game.  I think it would be a good idea to ignore the WINDOWS folder, system temporary folders, temporary files, and the like.  A professional video editor would probably want his current project folder to be ignored.  Default would be WINDOWS, TEMP, TMP, and whatever folder we've installed this program in.

Always Ignore Files Smaller Than:  In my tests, I found that you sometimes have to move a lot of small files to equal moving one large file.  Which would you rather work with: a thousand 1 kilobyte files or one 1 megabyte file?  This option allows users to ignore smaller files.  Default size of 1 MB.

Always Process:  The opposite of Always Ignore, specifies folders or extensions or individual files which should always be moved to the cool drive.  Examples would include archives, downloads, torrent files, etc.  These are files that you know you aren't going to use often or don't need to access quickly or plan to sort into another folder if you actually start using them.  No default suggested.

Always Process Files Larger Than:  Some files are huge and even if they're used daily, it's unlikely that the entire huge file is loaded in one fell swoop.  A video file viewer or editor is probably only going to load a relatively small section of the file at a time.  A game loading a huge game data file is probably going to load part of the file, process it, load more, process, etc.  In other words, for most really large files, disk access time isn't always the bottleneck; it's what you do with the files as/after you load them that slows things down.  So, in order to free up space quickly and easily, always move large files to the cool disk.  Default size of 512 MB.  (On my whole SSD, I found about 6 of these, and most were old test files or installation files that I thought I'd already deleted.)
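Taken together, these last four options amount to a per-file classifier. A hedged Python sketch; the folder names are the suggested defaults, and the wildcard patterns are only illustrative examples:

```python
import os
from fnmatch import fnmatch

IGNORE_DIRS = {"WINDOWS", "TEMP", "TMP"}    # Always Ignore defaults
IGNORE_SMALLER_THAN = 1 * 1024**2           # 1 MB
PROCESS_LARGER_THAN = 512 * 1024**2         # 512 MB
ALWAYS_PROCESS = ["*.zip", "*.torrent"]     # illustrative examples only

def classify(path):
    """Return 'ignore', 'always_move', or 'candidate' for one file."""
    if {p.upper() for p in path.split(os.sep)} & IGNORE_DIRS:
        return "ignore"
    size = os.path.getsize(path)
    if size < IGNORE_SMALLER_THAN:
        return "ignore"
    if size > PROCESS_LARGER_THAN:
        return "always_move"
    name = os.path.basename(path).lower()
    if any(fnmatch(name, pat) for pat in ALWAYS_PROCESS):
        return "always_move"
    return "candidate"   # only moved if we still need the space
```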

Process Automatically:  Allow the user to decide if/when the program would level the drives automatically.  This would include options like "nightly" (every day at midnight or 2:00 AM or whenever), "when computer is idle for" (no user interaction for X hours or when the screen saver kicks in), "at start up" (when Windows first starts or reboots), "at shut down" (duh), and "low disk warning" (Windows throws a warning in a system log when free space is less than 10%).  I think most/all of these options can be accomplished with Windows Task Scheduler, meaning that the program doesn't have to worry about implementing any type of monitoring or scheduling option; it just has to know how to create the scheduled tasks.
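Task Scheduler is scriptable through the schtasks command, so "knowing how to create the scheduled tasks" really is just shelling out. A sketch of two of the triggers above; the task names are made up:

```python
import subprocess

def schedule_nightly(exe, task="HybridLeveler"):
    """Register a 2 AM daily run with the built-in Windows Task Scheduler."""
    subprocess.run(["schtasks", "/Create", "/F", "/TN", task,
                    "/TR", exe, "/SC", "DAILY", "/ST", "02:00"], check=True)

def schedule_on_idle(exe, minutes=60, task="HybridLevelerIdle"):
    """Run after the machine has sat idle for the given number of minutes.
    ONSTART and ONLOGON schedules cover the start-up options the same way."""
    subprocess.run(["schtasks", "/Create", "/F", "/TN", task,
                    "/TR", exe, "/SC", "ONIDLE", "/I", str(minutes)],
                   check=True)
```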

Scan the SSD

This command scans all the files in all the folders on the drive specified as the SSD or hot storage.  What we're scanning for is the file name, complete path, size, and file created date (or modified or accessed).  Yes, that's potentially thousands of folders and millions of files (in my case, 310K files in 36K folders), but if we ignore hidden files/folders and system files/folders and user-specified files/folders and files smaller than the default 1 MB, the numbers are more manageable (in my case, less than 9K files).  This info should be prettied up and presented to the user in a nice list sortable by any of the criteria (bonus if you use bubble sort!) including a notification as to whether the file should be moved into cool storage and why.  Reasons for moving the file into cool storage include:  file is in an Always Process folder or wildcard match, file is too big as previously specified, or file is too old or hasn't been accessed in a while and would count towards the free space goal.  If there's enough free space, we shouldn't be moving any files.
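The scan itself is a straightforward directory walk. A minimal Python sketch, with the default ignore folders and size cutoff from the Configuration Screen:

```python
import os

def scan_ssd(root, ignore_dirs=("WINDOWS", "TEMP", "TMP"), min_size=1024**2):
    """Collect (path, size, last-accessed time) for every move candidate."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored folders in place so the walk never descends into them.
        dirnames[:] = [d for d in dirnames if d.upper() not in ignore_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue          # locked, vanished, or permission denied
            if st.st_size >= min_size:
                results.append((path, st.st_size, st.st_atime))
    results.sort(key=lambda r: r[2])   # stalest (oldest accessed) first
    return results
```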

Determining the last accessed time might be a bit tricky.  The file may have been created years ago and modified months ago, but if it's read multiple times every day we don't want to move it.  NTFS keeps track of file creation, modification, and last accessed times, but since Windows Vista, tracking of last accessed times has been disabled by default.  Apparently, there's a tiny performance hit every time a file is accessed if this option is enabled, but I've enabled it and haven't noticed a difference.  Some have reported that last accessed times are sometimes the same as creation time, the same as modified time, or seemingly random.  And, of course, if the user has been running with accessed times disabled, the last accessed time is going to be incorrect and probably set to modified time.  By the way, do not check last accessed time by right clicking on the file and choosing Properties, because that counts as accessing the file and will change the time.  It took me an embarrassingly long time to figure that one out.  Instead, modify your folder view to add a Date Accessed column or use the dir /ta command.  There may be other file metadata maintained by Windows or NTFS that will help us here that I don't know about.  The point is that we're trying to do this whole process without having to install some kind of monitoring program or system hook to keep track of every single file as it's being accessed in real time.  Unless it's easier for you to program it that way, in which case you totally kick ass and I want to have your babies.
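For what it's worth, reading the timestamp through a metadata call like stat() doesn't open the file, so it sidesteps the Properties trap. A sketch; the fsutil output parsing assumes the Windows 7-era "DisableLastAccess = 0/1" format:

```python
import os
import subprocess
import time

def last_accessed(path):
    """Read Date Accessed without opening the file; stat reads metadata
    only, so it shouldn't reset the timestamp the way opening the file
    can."""
    return time.ctime(os.stat(path).st_atime)

def atime_tracking_enabled():
    """Ask NTFS whether last-access tracking is on.  The command prints
    'DisableLastAccess = 0' when tracking is enabled; it has defaulted
    to disabled (= 1) since Vista.  Needs an administrator prompt."""
    out = subprocess.run(
        ["fsutil", "behavior", "query", "disablelastaccess"],
        capture_output=True, text=True).stdout
    return "= 0" in out
```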

Once the information is presented to the user in a nice list, the user should be able to right click on any file or folder and make some selections such as adding it to the Always Process group, adding it to the Always Ignore group, "freshening" the file (changing Date Accessed to the current time), or just pinning the file to the SSD or HDD.  I will probably use this option to check out all the files and say about most of them, "What?  That's still around?  I thought I deleted that file years ago!" or "Oh, that's right!  I thought that game sounded cool and downloaded it but must have never played it."
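"Freshening" in particular is cheap. The sketch below stamps Date Accessed to now while leaving the modified time alone, since os.utime wants both values:

```python
import os
import time

def freshen(path):
    """Set Date Accessed to now so the next scan won't call this file
    stale; keep the existing modified time untouched."""
    st = os.stat(path)
    os.utime(path, (time.time(), st.st_mtime))
```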

Move to Cool Storage

Or whatever we're going to call the command that actually moves the files from the SSD to the HDD.  I keep wanting to call it cold storage, but in IT terms that apparently means data that is packed into archive files or stored offline somewhere and isn't immediately available.  We're going to move these files to new locations.  Make sure that the Date Accessed file attribute doesn't change, because having every file we move look like it was just accessed would make things difficult later.  To keep the files available at the old locations, use the mklink command to create NTFS symbolic links in the old locations pointing to the new locations.  This command is, I'm told, only available from Windows Vista on.  There is a "junction" program available that does much the same thing and works on older Windows, but a) it only works on folders and not individual files, and b) if you're using an SSD on Windows XP/2000/3.1.1 then you've probably got bigger problems than hybridizing your SSD with your HDD.  A symbolic link looks like a standard shortcut file of 0 KB but acts like a super shortcut invisible link to the file at a different location, whether a different folder, different partition, or different drive.  By creating symbolic links, the user or application or game or the operating system itself can access the file at the old location without noticing a difference.  We just need a little bit of error checking.  If a file can't be moved because it's in use or locked or read only, it probably needs to stay where it is.  If the mklink command fails and we can't create the symbolic link, we probably need to move the file back where it was.  What I did was to replicate the folder structure of the SSD on the HDD (in a hidden folder) for any file I wanted to move.  That way I didn't have to create a database keeping track of what file came from where.
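Here's a hedged Python sketch of that demotion step, mirroring the SSD's folder tree on the HDD as described. os.symlink wraps the same API that mklink uses and normally needs administrator rights on Windows; the timestamps are saved and restored because the copy itself counts as an access:

```python
import os
import shutil

def move_to_cool(src, ssd_root, hdd_root):
    """Move one file from the SSD to the mirrored tree on the HDD and
    leave a symbolic link behind at the old location."""
    rel = os.path.relpath(src, ssd_root)
    dst = os.path.join(hdd_root, rel)       # mirrored path: no database needed
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    st = os.stat(src)                       # remember timestamps before touching it
    try:
        shutil.move(src, dst)
    except OSError:
        return False                        # in use, locked, or read-only: leave it
    os.utime(dst, (st.st_atime, st.st_mtime))   # undo the access the copy made
    try:
        os.symlink(dst, src)                # the "super shortcut" at the old spot
    except OSError:
        shutil.move(dst, src)               # couldn't create the link: undo the move
        os.utime(src, (st.st_atime, st.st_mtime))
        return False
    return True
```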

If the Move command immediately follows the Scan command, we can get right to work.  If not, we probably need to Scan before we Move.

Scan the HDD

This command scans all the files in all the folders of wherever the user specified for the HDD or cool storage.  In addition to what we scan for when we Scan the SSD, we also need to check for the presence of a symbolic link at the old location.  If the symbolic link is missing, then maybe the user moved or deleted the file.  We probably need to do the same, but only after asking the user about it.  The nice list of files should mention this, along with any files that have recently become "hot" again.  If the Date Accessed file attribute works like I think it does, accessing the cool file through any means (symbolic link at old location or file itself at new location) should update the Date Accessed time on either the link or on the moved file or both.  This will give us an indication that the file was used since the last time the program was run without having a background process constantly checking file handles or something.  The nice list of files would allow the user to right click on any file or folder and make selections as described in Scan the SSD.
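A sketch of that per-file check, assuming the program records when it last ran (last_run_time below is that hypothetical timestamp):

```python
import os

def check_cool_file(old_path, new_path, last_run_time):
    """Classify one cool-storage file.  old_path is where the symlink
    should be on the SSD; new_path is the real file on the HDD."""
    if not os.path.islink(old_path):
        return "orphaned"    # user moved/deleted the link: ask before acting
    if os.stat(new_path).st_atime > last_run_time:
        return "hot"         # accessed since last run: candidate to move back
    return "cool"
```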

Move to Hot Storage

Or whatever we're going to call the command that moves the file back from the HDD to the SSD.  All you have to do is delete the symbolic link and put the file back at its old location, making sure to update the Date Accessed file attribute to the current date.  Again, if the file can't be moved, leave it where it is and re-create the symbolic link.  As before, unless you've just run Scan on the disk, you'll have to Scan before doing this step.
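The promotion step is the mirror image of the demotion sketch above:

```python
import os
import shutil
import time

def move_to_hot(link_path, cool_path):
    """Delete the symbolic link, move the real file back into its old
    spot, and stamp Date Accessed to now."""
    os.remove(link_path)                    # removes the link, not the file
    try:
        shutil.move(cool_path, link_path)
    except OSError:
        os.symlink(cool_path, link_path)    # move failed: restore the link
        return False
    st = os.stat(link_path)
    os.utime(link_path, (time.time(), st.st_mtime))   # freshen Date Accessed
    return True
```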

Level the Disks

This definitely needs a better name.  This combines both Move to Cool Storage and Move to Hot Storage.  Remember to Scan first.  At any given time, let's assume that the user has both some cool files on the SSD that can be moved to cool storage and some hot files on the HDD that definitely need to be moved back to hot storage.  While we could just move all the hot files from HDD to SSD, that might lead to a case where we have 30 GB of files we want to move and only 20 GB of free space.  So we might want to move 10 GB of cool files from the SSD to the HDD first.  There is also the goal of space we want to keep free on the SSD.  We'd end up moving more cool files from the SSD to the HDD just to free up space.  This might be done at the start or in sections, moving a few files one way and then the other, just in case the user gets bored and wants to cancel the operation halfway in.  Hmm.  Start by bringing the SSD down to the free space goal plus an additional 5 GB by moving cool files to the HDD.  Move 5 GB of hot files from the HDD to the SSD.  Repeat until done.
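The batching might look something like this sketch; the 5 GB figure is the headroom suggested above, and the candidate lists come pre-sorted from the scans (stalest first for demotion, freshest first for promotion):

```python
CHUNK = 5 * 1024**3   # 5 GB slices, so cancelling halfway leaves things sane

def take_batch(candidates, budget=CHUNK):
    """Pop (path, size) pairs off the front of a candidate list until the
    batch fills the byte budget; the caller then moves just that batch.
    Alternating demote/promote batches until both lists run dry gives the
    'repeat until done' loop described above."""
    batch, total = [], 0
    while candidates and total < budget:
        path, size = candidates.pop(0)
        batch.append(path)
        total += size
    return batch
```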

Also, this may not be a problem, but we want to avoid a case of data ping ponging.  We don't want to move a file to the SSD, only for it to take up space requiring we move other files to the HDD, only for the user to want those files back on the SSD, only for the program to move the original file back to the HDD, other files taking up space, ad infinitum.  Unless the user is accessing a lot of huge files daily and has a microscopic SSD, I can't see this as being a major problem, but I thought it should be mentioned.

Automatic Operation

Technically, this isn't part of the GUI.  It's how the program works when it's in automatic mode.  Ideally, it should be as simple as scanning both disks, then moving files around as described in Level the Disks.  I'd personally like to do this at 2 AM every day, but I understand some people don't leave their computers running all night, so they may wish to run the program automatically when the computer is idle, or whenever the computer starts up, whenever the computer is shut down, or just manually.  We need a lot of options because I can see any one of these options pissing off somebody for some reason.  I personally know people who would conceivably schedule a task for 2 AM, shut off their computer at 9 PM, and then wonder why the task was just now running the next morning.  I can see people upset that Windows is taking too long to start up.  I can see people upset that Windows is taking too long to shut down.  Part of the reason for having an SSD in the first place is to make Windows start and shut down faster.  Another reason is so games and applications can start faster, but if we accidentally slow down their favorite game because we moved a file to cool storage, they'll get upset, even if moving the file didn't really slow anything down or if they only think we moved the file.  If we have to interrupt saving some file because they've run out of disk space and we have to free some up for them, they'll get upset even if it meant they wouldn't be able to save the file unless they freed up space manually in the first place.  In other words, I want to somehow make this program easy and convenient for people who understand the reasoning behind it and completely unavailable to people who don't.

The program would have to be sure to generate plenty of log files for those of us trying to debug programs or who just like to look at such things.
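Python's standard logging module covers that in a couple of lines; a sketch, with a made-up log file name and example paths:

```python
import logging

# Append a timestamped line for every move, which is all the audit trail
# a debugging session should need.
logging.basicConfig(filename="hybrid.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logging.info("demoted %s -> %s (%d bytes)",
             r"C:\Games\big.dat", r"D:\.cool\Games\big.dat", 734003200)
```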

The Philosophy Behind All This

I heard that SSD drives could really speed up a computer, but I couldn't afford one big enough to hold all my files.  I supplemented my storage space with a large mechanical HDD.  I put my larger files on the HDD.  I reconfigured my Windows Documents and similar folders to point to the HDD.  As space on the SSD became scarce, I moved more files to the HDD.  If I needed more speed, I'd move things back to the SSD.  This sometimes involved uninstalling and re-installing games and applications to a different drive.  It sometimes meant installing or saving new things on the HDD and avoiding the SSD altogether.  Then it hit me:  Why am I doing all these things to make it easier on the computer when it's the computer's main function to make life easier for me?

I decided that it would make more sense for me to always install or save files to the SSD and let the computer decide what files needed to be moved where for fastest access times and most storage space.  I began searching for a way for the computer to do that for me.  Solid State Hybrid Drives (SSHD) sounded like an answer, but then I read that most of them had microscopic amounts of flash RAM (the SSD part) which was used mostly as a cache for the HDD part.  Even the ones with a larger SSD part seemed to move all the data to the HDD part and then maybe move the most frequently used data back to the SSD part eventually.  That wasn't what I was looking for.

Then I read about Apple's Fusion Drive which operated like what I wanted.  Data is saved to the SSD part, optionally mirroring it to the HDD when the drive is idle.  When the SSD gets full, data is moved to the HDD (or, if it was mirroring all along, simply deleted from the SSD).  If you start using data from the HDD, more space is cleared on the SSD and the hot data is moved there.  Exactly what I was looking for!  Unfortunately, I'd need to buy a new Fusion Drive and move all my data there.  Then I'd need to buy a Mac because this product only worked with one.  But, wait!  It turns out that Fusion Drive only works with Apple products because all the hard work is done by the Mac OS X operating system.  In fact, OS X can provide the same function with separate SSD and HDD devices.  Exactly what I was looking for!  Unfortunately, that meant I'd have to sell my Windows computer and buy a Mac, or figure out how to run OS X on my existing computer.  Let's throw away 20 years of Windows programs and knowledge.

Then I read about Linux and btier, which again operated exactly like what I wanted.  Again, I'd have to scrap Windows to use it.

Then I read about certain data tiering options for "enterprise" level operations using Windows Server 2012 R2.  Exactly what...  Oh, who am I kidding?  While it's basically a "data center" version of Windows 7/8, it's overkill for a single-user desktop system and costs somewhere in the arm/leg/testicle range.  Or maybe not.  There are about five different editions, some less expensive than others, but I'm not sure which provide for data tiering.  If I understand what I read correctly, I'd have to at the very least install a new version of Windows and convert my drives to the new ReFS, which will probably mean losing all my data.  I think I can still use all my existing programs.

Update!  I forgot about Intel's Smart Response Technology.  It does mostly what I want, but it requires an Intel CPU and certain Intel chip sets on the motherboard.  I don't have those, so I'd have to scrap my current computer and buy another just to get that capability.

The thing is, to get this functionality, I shouldn't have to change hardware or operating systems or buy new drives.  While it's a crappy little BASIC program that I threw together in a few days, I've got proof of concept that this functionality can exist on NTFS file systems on consumer versions of Windows starting with Vista.

If you can program your way out of a paper bag and are interested in coding this project for me, please let me know.
