General documentation

From Ultracopier wiki

Algorithm for file transfer

Introduction

This page describes the file transfer algorithms used for copy/move. You need to understand some basic terms first.

I have discovered that each OS and FS behaves differently.

  • Windows/NTFS has a correct FS layer, with asynchronous inode and data layers. Synchronous operation is not allowed, so flush returns instantly regardless of whether the data has really been written to the disk (this prevents the application from freezing; in every case the data still in memory can be lost, whether the application is blocked or not).
  • Linux/ext4 is only partially asynchronous for the data layer (in a read/write loop, the read blocks the write and the write blocks the read). Inode creation and manipulation are synchronous, but because the volume is low, aggressive access can be done without a big loss of performance.


In my work I found a lot of problems. The main ones were:

  • Closing a file descriptor after writing a lot of content calls a flush function or similar, so the close is slow until the whole file has been written, regardless of whether the HDD is idle or lightly loaded.
  • In a loop of "read 1 block, write 1 block", the operation that runs out of cache/buffer blocks the other operation for no reason.
  • Inode accesses can be issued in parallel so that the OS groups them. In a parallel copy, the mkdir calls that create the destination folder can race with each other.
  • The graphic thread, and therefore the main thread, can be slowed down in some conditions (such as Linux with slow open source graphics drivers and a large copy list), and the I/O access is blocking.
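
The mkdir race mentioned above can be sketched in Python as follows. This is only an illustration, not Ultracopier's own code: the helper names ensure_dir and create_concurrently are hypothetical, and the idea is simply that "the folder already exists" is treated as success, so several copy threads can create the same destination folder at once.

```python
import os
import threading


def ensure_dir(path):
    """Create path (and its parents) without failing if another thread wins the race."""
    # exist_ok=True turns "already exists" into a success, so there is no
    # separate "check, then create" step that another thread could interleave with.
    os.makedirs(path, exist_ok=True)


def create_concurrently(path, workers=8):
    """Call ensure_dir(path) from several threads at once and return the errors seen."""
    errors = []

    def worker():
        try:
            ensure_dir(path)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

With a plain os.mkdir after an "exists?" check, some of the workers could fail with FileExistsError when they lose the race.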


Ultracopier 0.1

Ultracopier-0.1-copy-engine.png

 While { read (position); write (position) }

Plain red: write, Plain blue: read, Dashed red: pre and post operation for the write file, Dashed blue: pre and post operation for the read file

  • Advantage: Very simple, and used by most developers for a simple file copy.
  • Disadvantage: Without a cache, the read is blocking. Even if the write buffer is not full and the write could be done without blocking, the loop and the thread are stuck in the read function, and vice versa.
  • Implementation mistake: A slowdown in the interface slows down the copy.
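
A minimal Python sketch of this single-threaded loop, for illustration only (the function name copy_sequential and the block size are assumptions, not taken from Ultracopier):

```python
BLOCK_SIZE = 64 * 1024  # hypothetical block size


def copy_sequential(src, dst, block_size=BLOCK_SIZE):
    """Copy src to dst one block at a time in a single thread.

    src and dst are binary file objects. Returns the number of bytes copied.
    """
    total = 0
    while True:
        block = src.read(block_size)  # the thread waits here: no write can progress
        if not block:
            break
        dst.write(block)              # the thread waits here: no read can progress
        total += len(block)
    return total
```

The two comments mark the disadvantage listed above: at any moment the only thread is stuck in either the read or the write, never both.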

Ultracopier 0.2

Ultracopier-0.2-copy-engine.png

 Thread 1: While { read (position); } , Thread pool for write: While { write (position); } close(); , and pipe like communication

Plain red: write, Plain blue: read, Dashed red: pre and post operation for the write file, Dashed blue: pre and post operation for the read file

  • Advantage: The close function blocks inside a thread, and while it is blocked another write thread is used. The read no longer blocks the write, and vice versa (this helps when the source and destination media differ, and the buffer and cache levels can change separately and work concurrently).
  • Disadvantage: It is complicated in some programming aspects: I needed goto to minimize the code, and the read function is very large. Choosing the write thread can be complicated too. The list parsing and the read share the same code to keep the programming intuitive. Extra work such as variable initialization is not parallelized, so operations that are not the real copy can slow it down. Parallel inode and data operations are not possible. Destination file write corruption can't be recovered.
  • Implementation mistake: Built with threads and a lot of blocking functions (not events, which would give a cleaner design), based on the 0.1 design. A slowdown in the interface slows down the copy.
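
The read/write split can be sketched in Python like this. It is a simplified illustration under stated assumptions: one writer thread instead of a thread pool, a bounded queue standing in for the pipe-like communication, and a hypothetical function name copy_pipelined.

```python
import queue
import threading


def copy_pipelined(src, dst, block_size=64 * 1024, max_blocks=16):
    """Copy src to dst with one reader thread and one writer thread.

    Returns the number of bytes written; re-raises the first I/O error.
    """
    blocks = queue.Queue(maxsize=max_blocks)  # bounded, pipe-like buffer
    errors = []
    written = [0]

    def reader():
        try:
            while True:
                block = src.read(block_size)
                if not block:
                    break
                blocks.put(block)  # waits only when the buffer is full
        except Exception as exc:
            errors.append(exc)
        finally:
            blocks.put(b"")  # empty block = end-of-stream marker

    def writer():
        failed = False
        while True:
            block = blocks.get()  # waits only when the buffer is empty
            if not block:
                break
            if failed:
                continue  # keep draining so the reader never hangs on put()
            try:
                dst.write(block)
                written[0] += len(block)
            except Exception as exc:
                errors.append(exc)
                failed = True

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return written[0]
```

The read only waits on the write when the buffer is full, and the write only waits on the read when it is empty, which is the decoupling described in the advantage above.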

Ultracopier 0.3

Ultracopier-0.3-copy-engine.png

 Thread 1: Copy list send/receive event (start transfer, stop transfer event), Thread list of {Read thread, Write thread, transfer thread} with pipe like communication.

Plain red: write, Plain blue: read, Dashed red: pre and post operation for the write file, Dashed blue: pre and post operation for the read file

  • Advantage: Inode and data access can run in parallel, which prevents non-copy operations from slowing down the copy. Inode accesses can be grouped via parallel access, but parallel data access is bad in general. Gives asynchronous-like behavior on every OS/FS (including synchronous OS/FS like Linux/ext4). Much cleaner design, with separate control possible on each transfer. Destination file write corruption can be recovered.
  • Disadvantage: Requires mastering multi-threading, data locality, and a lot of advanced algorithms.

More details on: Copy engine 0.3

Binary analysis

The analysis of the binary version of Ultracopier is here:

Disk-size-ultracopier.png

The analysed version is the Windows version with 16 languages.

Ultracopier program structure

The old structure

Ultracopier-structure-0.2.png

The new structure

Ultracopier-0.3-structure-full-r1.png

Transfer and list management

Transfer-thread-copy-engine-ultracopier-0.31.png Ultracopier-CopyEngine-0.3.png Syncro-interface-copy-engine-list.png

Order of transfer

Ultracopier starts transferring whichever file was opened most quickly. This prevents idle time and improves performance. The order is mostly the same as the transfer list, but not always.

The transfer order and the open order depend on the FS, the media, the OS, ...
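
A rough Python sketch of "transfer the most quickly opened file first", assuming the opens are issued from a thread pool. The function name transfer_in_open_order and the transfer callback are hypothetical and only illustrate the ordering idea, not the real copy engine.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def transfer_in_open_order(paths, transfer, max_workers=4):
    """Open all paths in parallel and transfer each file as soon as its open completes.

    transfer(path, fileobj) is a caller-supplied callback (hypothetical).
    Returns the paths in the order they were actually transferred.
    """
    order = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Issue every open at once; map each pending open back to its path.
        pending = {pool.submit(open, path, "rb"): path for path in paths}
        # as_completed yields the opens in completion order, not list order.
        for future in as_completed(pending):
            path = pending[future]
            with future.result() as fileobj:
                transfer(path, fileobj)
            order.append(path)
    return order
```

The returned order contains every path of the list, but it may differ from the list order when some files are slower to open than others.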

Advanced transfer view

Transfer normal.png Transfer read error.png Transfer write error.png

Ultracopier plugin

Draft 1 - Plugin/resource

Full information on Tar format on CatchChallenger wiki

This is draft 1 for the plugin/resource package.

The package will be released as: [category]-[internalname]-[version]-[architecture].urc (ultracopier resource compressed)

Add debug before the end if needed: [category]-[internalname]-[version]-[architecture]-debug.urc (ultracopier resource compressed)

Or where architecture is not needed: [category]-[internalname]-[version].urc (ultracopier resource compressed)
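
The three naming patterns above can be expressed as a small Python helper. This is only a sketch: the function name package_name and the example version number are made up, and the behavior of debug without an architecture is an assumption, since the draft does not specify it.

```python
def package_name(category, internal_name, version, architecture=None, debug=False):
    """Build a .urc file name following the draft 1 naming patterns."""
    parts = [category, internal_name, version]
    if architecture:
        parts.append(architecture)  # omitted when the package is architecture independent
    if debug:
        parts.append("debug")       # "debug" goes just before the extension
    return "-".join(parts) + ".urc"


# Hypothetical plugin name and version, for illustration only:
# package_name("Themes", "Example", "1.0")  → "Themes-Example-1.0.urc"
# package_name("CopyEngine", "Example", "1.0", "linux-x86_64-pc", debug=True)
#   → "CopyEngine-Example-1.0-linux-x86_64-pc-debug.urc"
```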

The package uses the .tar.xz format, where xz is run with --check=crc32. The tar options `--owner=0 --group=0 --mtime='2010-01-01' -H ustar` allow better compression and avoid leaking information:

 tar cf - plugin-name/ --owner=0 --group=0 --mtime='2010-01-01' -H ustar | xz -9 --check=crc32 > category-plugin-name-version.urc

The root of the package includes informations.xml (metadata for the plugin package) and the Ultracopier tree of the plugin.

Every plugin has its own function. Plugin categories:
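
For platforms without tar and xz, a similar package can be approximated with the Python standard library. This is a sketch, not the official packaging tool: the function name build_urc is hypothetical, the fixed date is taken as UTC (tar interprets --mtime in the local time zone), and clearing the user/group names is an extra assumption beyond the tar options shown above.

```python
import calendar
import lzma
import os
import tarfile

# 2010-01-01 00:00:00, taken as UTC here (assumption).
FIXED_MTIME = calendar.timegm((2010, 1, 1, 0, 0, 0))


def _normalize(info):
    """Drop owner and date information from one archive member."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""  # assumption: also hide the user/group names
    info.mtime = FIXED_MTIME
    return info


def build_urc(plugin_dir, out_path):
    """Pack plugin_dir into out_path as a ustar archive compressed with xz -9 and CRC32."""
    arcname = os.path.basename(os.path.normpath(plugin_dir))
    with lzma.open(out_path, "wb", format=lzma.FORMAT_XZ,
                   check=lzma.CHECK_CRC32, preset=9) as xz:
        with tarfile.open(fileobj=xz, mode="w", format=tarfile.USTAR_FORMAT) as tar:
            tar.add(plugin_dir, arcname=arcname, filter=_normalize)
    return out_path
```

The output is not guaranteed to be byte-identical to the tar | xz pipeline, but it carries the same kind of normalized metadata.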

  • Languages
  • CopyEngine
  • Listener
  • PluginLoader
  • SessionLoader
  • Themes

Architecture:

  • windows-x86_64
  • windows-x86
  • mac-os-x
  • linux-i386-pc
  • linux-i486-pc
  • linux-i586-pc
  • linux-i686-pc
  • linux-x86_64-pc

Not everything is finalised; I am still looking for the best way to implement it.

See the informations.xml in each plugin for an example.

Plugin binary release

  • Windows 32-bit: compiled with the official Qt 4.8.0 + the Mingw32 shipped with it
  • Windows 64-bit: mingw-w64-bin_x86_64-mingw_20111031_sezero (Mingw64 -> Toolchains targetting Win64 -> Personal Builds -> sezero_4.4_20111031), + Qt 4.8.0 (configure.exe -release -opensource)
  • Mac: official Qt 4.8.0 version + Xcode 3.2.3 on Snow Leopard

FS

General overview

It mostly looks like this, but it can differ:

File-access-time.png

Latency overview

Inode-latency-cumulated.png

A normal inode resolution includes HDD time, system time (FS/VFS), memory time (cache/buffer), and device layer time. The problem is that all of it is sequential, so all the latencies are cumulated and no single part is ever used at 100%.

Grouping the queries means the cumulated latency is not multiplied by the number of files, but stays near a semi-fixed latency. Waiting for the HDD can block/lock parts of the system or FS layer, so the queries are accumulated and grouped to be served the next time.

As a result, a user-space program slows down a lot when it loops over open/read/close for each file. To maximize bandwidth, it is better to use heavy inode parallelization: the HDD bandwidth is low, the inode polling gets grouped, and when the HDD/FS fetches a block it often contains the information of multiple inodes that are stored very near each other.
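
As an illustration of inode parallelization, the following Python sketch issues the stat calls from a thread pool instead of a sequential loop. The function name stat_all and the worker count are assumptions; whether the OS really groups the queries depends on the FS and the media.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def stat_all(paths, max_workers=32):
    """Return {path: os.stat_result}, issuing the stat calls from several threads."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Many inode queries are in flight at once, which gives the OS
        # a chance to group them instead of paying each latency in turn.
        return dict(zip(paths, pool.map(os.stat, paths)))
```

Only the metadata access is parallelized here; the file data itself would still be read one file at a time.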

Parallelizing access to file data is a bad idea, for all of these reasons:

  • there are fewer layers to go through to reach this data, and these layers have a cache/buffer
  • because the data is accessed in large blocks, the HDD time to get it is large, so every other latency is minor
  • for writes, the data is written into a buffer in a partially asynchronous way (asynchronous until the buffer runs out on large data, which happens in many file copies)

So most of the HDD data bandwidth is already used. Parallelizing in this case causes useless HDD head movement (because of the amount of data, the positions are frequently far apart) and a risk of data locking (keeping integrity requires thread safety).

Specific FS informations

Ext4

It is very slow on concurrent writes to an HDD.

Btrfs

It does not have the slowdown that ext4 shows on concurrent writes to an HDD.

Transfer speed

Speed-into-ultracopier.png

Ultracopier guidelines for development and maintenance

All frequently used features need to be in the mainline. To avoid becoming bloatware:

  • Less used features need to go into plugins outside the mainline (or just be provided as a patch)
  • Keep the complexity as close to O(1) as possible, O(n) in the worst case
  • Design the software for both large and small copy lists (1 to 1 million items), plugins, ...
  • Benchmark before each new release under different workloads (copy/move of many small files, big files, ...)
  • Don't make one plugin that does everything through options; split features into different plugins (including for the themes)
  • Test a major release extensively before releasing it (users will hate you if they lose their files)
  • Minor releases must contain only bug fixes

Ultracopier platform

The platform set up and used for Ultracopier and Supercopier: Ultracopier-platform.png