Friday, 4 February 2011

Concurrency - Obvious Limitations

I've discovered over 280 duplicate folders in my Pictures folder. It may be to do with my attempt to merge my "modified" and "original" iPhoto folders when moving back to Windows for full-time development, but for whatever reason I now have a cluttered view of my photos - time to re-organise.

I wrote a program (TDD all the way - it's always an interesting test of your TDD resolve when you're at full liberty at home to write something that will work but has 0 tests) that crawled through the root directory of my pictures and reported two kinds of duplicate: directories that were identical (name, file contents, timestamps etc.), and directories that were identical in name only (i.e. some of the files within were different).
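To make the two categories concrete, here is a minimal sketch of how such a crawl could be structured. It is not the program itself: the class name DuplicateFolders and the helpers groupByName, sameFiles and regularFiles are hypothetical, and it assumes that comparing file names and bytes is enough (the timestamp check is left out to keep it short).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: names and structure are illustrative assumptions.
public class DuplicateFolders {

    // Group every directory below root by its simple name and keep
    // only the names that occur more than once.
    static Map<String, List<Path>> groupByName(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            Map<String, List<Path>> byName = paths
                .filter(Files::isDirectory)
                .filter(p -> !p.equals(root))
                .collect(Collectors.groupingBy(
                    p -> p.getFileName().toString(),
                    TreeMap::new,
                    Collectors.toList()));
            byName.values().removeIf(dirs -> dirs.size() < 2);
            return byName;
        }
    }

    // True when both directories hold the same file names with the same bytes.
    // Assumption: timestamps are ignored here; sub-directories are not descended into.
    static boolean sameFiles(Path a, Path b) throws IOException {
        List<Path> filesA = regularFiles(a);
        List<Path> filesB = regularFiles(b);
        if (filesA.size() != filesB.size()) {
            return false;
        }
        for (int i = 0; i < filesA.size(); i++) {
            Path fa = filesA.get(i);
            Path fb = filesB.get(i);
            if (!fa.getFileName().equals(fb.getFileName())) {
                return false;
            }
            if (!Arrays.equals(Files.readAllBytes(fa), Files.readAllBytes(fb))) {
                return false;
            }
        }
        return true;
    }

    // Regular files directly inside dir, sorted by file name so that
    // two listings can be compared position by position.
    static List<Path> regularFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .sorted(Comparator.comparing((Path p) -> p.getFileName().toString()))
                .collect(Collectors.toList());
        }
    }
}
```

Under these assumptions, two same-named directories for which sameFiles returns true fall into the "identical" group, and the rest are "identical in name only". Reading every byte of every file is exactly the sort of work that keeps the disk busy rather than the CPU.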

All good so far - it takes 86 seconds to run and, from my verification, everything works as expected.

I knew that the vast majority of the work being done was disk bound, but as an exercise I went ahead and multithreaded it. I started off with 2 threads doing the directory comparisons, then 3, then 4.

The most I gained was 5 seconds (roughly 6% of the 86-second run) - barely worth the additional complexity, but this was pretty much what I expected. I reverted the code to the pre-threading version as it was simpler (not by a lot though - the Java Concurrency package makes multi-threading much easier).
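For a sense of how little code the threading needs, here is a minimal sketch using java.util.concurrent. It is an illustration rather than the code that was reverted: the class name ParallelComparisons and the runAll helper are hypothetical, and each directory comparison is assumed to be wrapped in a Callable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: names are illustrative assumptions.
public class ParallelComparisons {

    // Runs every task on a fixed pool of `threads` workers and
    // returns the results in the order the tasks were supplied.
    static <T> List<T> runAll(List<Callable<T>> tasks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until all tasks have completed.
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            // Always release the worker threads, even if a task failed.
            pool.shutdown();
        }
    }
}
```

Changing the thread count from 2 to 3 to 4 is then a one-argument change. When the tasks spend most of their time waiting on a single disk, though, extra workers mostly queue up behind the same device, which would be consistent with the small gain seen here.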

I have 6 cores available to me, but only 1 disk. I wonder if I would see a bigger improvement if I had 2 disks and split the pictures evenly across them? Mmm, I feel another exercise coming on...
