Checksum algorithm

From Ultracopier wiki

Stream checksum, wrong algorithm

This is what the version of TeraCopy I have seen appears to do:

 block=source.getBlock();
 source_checksum=checksumMD5(block);
 destination.write(block);

and inside destination.write(block):

 destination_checksum=checksumMD5(block);

So the whole thing is equivalent to:

 block=source.getBlock();
 source_checksum=checksumMD5(block);
 destination_checksum=checksumMD5(block);

This checks exactly the same in-memory data twice, so the two checksums can never differ. It also cannot cope with seeks within the file.
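A minimal sketch in Python of why this check can never fail. The `FlakyDestination` class is hypothetical: it only simulates a medium that flips one bit when storing data, and is not taken from TeraCopy or Ultracopier.

```python
import hashlib


class FlakyDestination:
    """Hypothetical destination that corrupts the first byte it stores."""

    def __init__(self):
        self.stored = b""

    def write(self, block: bytes) -> str:
        # The checksum is taken from the in-memory block, as in the
        # algorithm above, not from what the medium really stored.
        checksum = hashlib.md5(block).hexdigest()
        corrupted = bytes([block[0] ^ 0x01]) + block[1:] if block else block
        self.stored += corrupted
        return checksum


def copy_block_wrong(block: bytes, destination: FlakyDestination) -> bool:
    """Return True when the two (identical by construction) sums match."""
    source_checksum = hashlib.md5(block).hexdigest()
    destination_checksum = destination.write(block)
    return source_checksum == destination_checksum


destination = FlakyDestination()
block = b"some file content"
check_passed = copy_block_wrong(block, destination)
really_identical = destination.stored == block
```

Here `check_passed` is true even though `really_identical` is false: the check reports success while the stored data is corrupted.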

Data security: See the next algorithm

Performance: See the next algorithm (except that the time to calculate the checksum is doubled)

Stream checksum, right algorithm

 block=source.getBlock(); // or: block=destination.getBlock();
 both_checksum=checksumMD5(block); // one checksum is valid for both sides

This is useful if there is a .md5 or .sha1 file next to the file: the checksum is computed quickly during the transfer and compared with it. The result is correct provided no error occurs.
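A minimal sketch of this stream checksum in Python, assuming the sum file uses the `md5sum` layout (hex digest, then the file name, on one line). The function names and the block size are illustrative choices, not Ultracopier's.

```python
import hashlib
from pathlib import Path


def copy_with_stream_checksum(source: Path, destination: Path,
                              block_size: int = 64 * 1024) -> str:
    """Copy source to destination, hashing each block once on the way."""
    digest = hashlib.md5()
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while block := src.read(block_size):
            digest.update(block)   # one checksum, valid for both sides
            dst.write(block)
    return digest.hexdigest()


def read_expected_sum(sum_file: Path) -> str:
    """Return the hex digest found at the start of a sum file."""
    # Assumed layout: "<hex digest>  <file name>", as written by md5sum.
    return sum_file.read_text().split()[0].lower()
```

The transfer is accepted when `copy_with_stream_checksum(...)` equals `read_expected_sum(...)`. Note that this only proves the bytes read from the source matched the sum file, not that the destination stored them correctly.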

Data security: This assumes that both media are reliable during the transfer; if they are not, the transfer is unsafe in every case. So this checksum only matters when corruption is possible. If the file is corrupted, it has to be fetched again the same way, and it can be corrupted again. The medium can be asked to write a 1 but, because of media corruption, return a 0 at the next read. This kind of media problem/corruption is not detected.

Performance: This checksum has to be computed inside the transfer loop, so it slows down the transfer, mainly with big files. That is because it introduces latency into sequential code (it acts like throttling: the file is only being copied X% of the time). It also has a great impact on OS/FS that are sensitive to latency because they run out of buffer/cache (or manage it badly), or on FS with a lot of synchronous operations.

Full checksum

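 source.seek(0);
 while(source.position()<source.size())
 {
 	block=source.getBlock();
 	source_checksum.update(block); // accumulate over the whole source
 }
 destination.seek(0);
 while(destination.position()<destination.size())
 {
 	block=destination.getBlock();
 	destination_checksum.update(block); // accumulate over the whole destination
 }
 // then compare the two checksums, with each other or with a .md5/.sha1 file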

This checksums everything. It is useful after an error, to check that the checksums of the whole source and the whole destination match. It also allows matching against a .sha1 or .md5 file.
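The two re-read loops above can be sketched in Python as follows. This is only an illustration: the function names, the `algorithm` parameter and the block size are assumptions, not part of Ultracopier.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5",
                  block_size: int = 64 * 1024) -> str:
    """Re-read a whole file from the start and return its hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while block := handle.read(block_size):
            digest.update(block)   # accumulate over every block
    return digest.hexdigest()


def full_checksum_matches(source: Path, destination: Path,
                          algorithm: str = "md5") -> bool:
    """Compare the digests of two files as returned by the OS."""
    return (file_checksum(source, algorithm)
            == file_checksum(destination, algorithm))
```

Because `file_checksum` returns an ordinary hex digest, its result can also be compared with the content of a .md5 or .sha1 file (with `algorithm="sha1"` for the latter).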

Data security: Good, because it does not check the data in transit but the data actually returned by the OS, which is what any other application will get.

Performance: The whole file is re-read. For a big file the source is no longer in the cache, and if the cache/buffer is not shared the destination is never in the cache (this depends on the OS/FS). While the file data is being read, no other transfer can be done. The time to compute the checksum is added too. The whole file has to be read before the sums can be compared.

Full matching

 source.seek(0);
 destination.seek(0);
 while(source.position()<source.size())
 {
 	// compare the two files block by block
 	if(checksumMD5(source.getBlock())!=checksumMD5(destination.getBlock()))
 		return false; // the two files are not the same
 }
 return true;

This checksums everything, block by block. It is useful after an error, to check that the whole source and the whole destination match.
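A minimal Python sketch of the block-by-block matching above. The up-front size check is an addition that the pseudocode does not have, and the function name and block size are illustrative.

```python
import hashlib
from pathlib import Path


def full_matching(source: Path, destination: Path,
                  block_size: int = 64 * 1024) -> bool:
    """Compare two files block by block, stopping at the first mismatch."""
    # Assumption: files of different sizes can be rejected without reading.
    if source.stat().st_size != destination.stat().st_size:
        return False
    with open(source, "rb") as src, open(destination, "rb") as dst:
        while True:
            src_block = src.read(block_size)
            dst_block = dst.read(block_size)
            if (hashlib.md5(src_block).digest()
                    != hashlib.md5(dst_block).digest()):
                return False
            if not src_block:   # both files ended at the same point
                return True
```

Since both blocks are in memory anyway, comparing `src_block != dst_block` directly would give the same answer without the cost of hashing; the MD5 call is kept here only to mirror the pseudocode.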

Data security: Good, because it does not check the data in transit but the data actually returned by the OS, which is what any other application will get. But it cannot be compared with a .md5 or .sha1 file.

Performance: The whole file is re-read. For a big file the source is no longer in the cache, and if the cache/buffer is not shared the destination is never in the cache (this depends on the OS/FS). While the file data is being read, no other transfer can be done.


In Ultracopier

In Ultracopier, only full matching on error seems to be correct. I also take into account that a lot of files have no .md5 or .sha1 file to compare the checksum with.

Quick checksum

Compute the sum during the copy and compare it with the sum file. This is done only if a sum file exists.

If an error is found during the copy, re-read all the sources and compare with the sum (full checksum).

Full checksum

Re-read the sources. The destination is opened as read/write so that it can be re-read entirely.

If the destination cannot be opened as read/write, fall back to the quick checksum.

Checksum error

On a checksum error: for now, just ignore it for the quick checksum, and retry everything for the full checksum.
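The fallback and error rules above can be written down as a small decision sketch in Python. All names here (`Mode`, `Action`, `effective_mode`, `on_checksum_error`) are hypothetical and only restate the rules; they are not Ultracopier's actual code.

```python
from enum import Enum


class Mode(Enum):
    QUICK = "quick"
    FULL = "full"


class Action(Enum):
    IGNORE = "ignore"
    RETRY_ALL = "retry all"


def effective_mode(requested: Mode, destination_read_write: bool) -> Mode:
    """Fall back to quick when the destination cannot be re-read."""
    if requested is Mode.FULL and not destination_read_write:
        return Mode.QUICK
    return requested


def on_checksum_error(mode: Mode) -> Action:
    """Quick checksum errors are ignored for now; full ones trigger a retry."""
    return Action.IGNORE if mode is Mode.QUICK else Action.RETRY_ALL
```

For example, a full checksum requested on a destination that can only be opened write-only becomes a quick checksum, and an error found there is ignored.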

MD5

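Note: MD5 is broken for security purposes: practical collision attacks let an attacker produce two different files with the same MD5, so a matching MD5 is no proof against deliberate tampering. It still detects accidental corruption.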

As a result, MD5 is now considered deprecated, and SHA-1 is starting to be used everywhere. Note that practical collisions have since been shown for SHA-1 as well.