[physfs] 7z/lzma in trunk

Dennis Schridde devurandom at gmx.net
Wed Sep 27 12:03:58 EDT 2006


Am Mittwoch, 27. September 2006 17:56 schrieb Dennis Schridde:
> Am Mittwoch, 27. September 2006 16:06 schrieb Ryan C. Gordon:
> > > After I hard-fixed the CRC calculation, I tested random access on 2
> > > files. That was very slow, too, even slower than expected... I'll use
> > > multiple caches in the next version...
> >
> > Generally speaking, it's not worth going out of the way to optimize
> > random access to formats that don't support it, like zipfiles, which
> > have to decompress the whole file to get to the seek point...I assume
> > lzma has the same problem.
> >
> > The solution is usually to cache the decompressed file in RAM inside
> > PhysicsFS, but this is basically unacceptable on low-memory systems like
> > the PlayStation 2...especially considering that most apps do not seek
> > randomly through a file, or seek at all. Games and apps that need to
> > deal with a slow file can just as easily cache the data through
> > PHYSFS_read() without adding complexity or resource usage to the library.
> >
> > So if the solution is "cache more," I'd encourage you to leave it as a
> > slow operation. If there was a fast way to jump to roughly the correct
> > location in the compressed stream and then figure out the right
> > plaintext offset by uncompressing a block or two of data, that would be
> > a win, but that's not usually possible.
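
(For what it's worth, that kind of application-side cache is only a handful of
PHYSFS calls. Very roughly, with error handling left out and a made-up helper
name and filename:)

    #include <stdlib.h>
    #include "physfs.h"

    /* sketch only: read the whole (possibly compressed) file into RAM in one
       sequential pass, then seek freely in the returned buffer */
    static char *slurp_file(const char *name, PHYSFS_sint64 *len)
    {
        PHYSFS_file *f = PHYSFS_openRead(name);
        char *buf;
        *len = PHYSFS_fileLength(f);
        buf = (char *) malloc((size_t) *len);
        PHYSFS_read(f, buf, 1, (PHYSFS_uint32) *len);
        PHYSFS_close(f);
        return buf;  /* all random access happens in this buffer from now on */
    } /* slurp_file */
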
>
> Thanks for that advice.
>
> The current implementation (which just redirects down to 7z / LZMA) works
> like this:
> - Decompress the whole block that contains the file we want to read from
> into the archive's cache.
> - Every subsequent read just finds the correct position in that block and
> copies the requested part into the buffer passed to LZMA_read().
> - If a different file is read from, the archive's cache is freed and the
> block containing the new file is cached instead.
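
(In rough C, that read path behaves something like the sketch below; the names
are made up for illustration and are not the actual symbols in the archiver
code:)

    #include <stdlib.h>
    #include <string.h>

    typedef struct
    {
        unsigned char *cache;   /* decompressed data of the cached block     */
        int cachedBlock;        /* index of that block, -1 if nothing cached */
    } archive_t;

    typedef struct
    {
        archive_t *archive;
        int blockIndex;         /* block this file is stored in              */
        size_t offsetInBlock;   /* where the file starts inside the block    */
        size_t position;        /* current read position within the file     */
    } file_t;

    /* placeholder for the SzExtract call that fills archive->cache */
    extern void decompress_block(archive_t *archive, int blockIndex);

    static size_t read_sketch(file_t *file, void *buf, size_t len)
    {
        archive_t *a = file->archive;
        if (a->cachedBlock != file->blockIndex)
        {
            free(a->cache);                          /* drop the old block   */
            decompress_block(a, file->blockIndex);   /* cache the new one    */
            a->cachedBlock = file->blockIndex;
        } /* if */

        memcpy(buf, a->cache + file->offsetInBlock + file->position, len);
        file->position += len;
        return len;
    } /* read_sketch */
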
>
>
> I bet there is some documentation on this, but here is what I found by
> experimenting:
> - Apparently a file is always contained completely in one block; put the
> other way round, a block always contains at least one complete file.
> - Apparently multiple files can end up in one block, either in a completely
> solid archive or in a solid archive with a block size greater than the size
> of two of the files.
> - Without such solid voodoo there is exactly one file in each block.
>
> > (10 minutes to read 100 kilobytes of sequential data would be a problem
> > worth optimizing though!)
>
> Yes, I am working on this. :)
>
> Question:
> Is the current approach of using 7z's cache ok?
> Or should I try to decompress only the needed part of the file? (I don't
> know yet how this could work...)
If that would be ok (memory usage, delay on first access, etc.), I would
implement it so that each decompressed block is cached and each file keeps a
reference to the block it is stored in, plus its offset within that block.
That way SzExtract only needs to be called on the first access to a block,
and if multiple files live in one block, they each hold a reference to the
same "shared" block.
That would also avoid the repeated CRC digest calculation, since SzExtract
would be called only once per block instead of on every access to the file,
as is done currently.
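
In rough C, the layout I have in mind would be something like this (again, the
names are only illustrative):

    #include <stddef.h>

    /* illustrative sketch of the proposed caching scheme, not actual code */
    typedef struct
    {
        unsigned char *data;    /* decompressed contents of one block        */
        int refcount;           /* open files still pointing at this block   */
    } block_t;

    typedef struct
    {
        block_t *block;         /* shared, cached block this file lives in   */
        size_t offsetInBlock;   /* where the file starts inside block->data  */
        size_t position;        /* current read position within the file     */
    } fileref_t;

    /* SzExtract would be called once, on the first access to a block, to
       fill block->data; every later read from any file in the same block is
       just a memcpy from block->data + offsetInBlock + position, so the CRC
       digest would be computed only once per block. */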

(If the block is already cached, SzExtract does nothing but calculate the
digest. That CRC calculation is what took hours.)

--Dennis