[physfs] 7z/lzma in trunk

Dennis Schridde devurandom at gmx.net
Wed Sep 27 11:56:38 EDT 2006


On Wednesday, 27 September 2006 16:06, Ryan C. Gordon wrote:
> > After I hardfixed the CRC calculation I tested with random access on 2
> > files. Was very slow, too. Even more than expected... I'll use multiple
> > caches in the next version...
>
> Generally speaking, it's not worth going out of the way to optimize
> random access to formats that don't support it, like zipfiles, which
> have to decompress the whole file to get to the seek point...I assume
> lzma has the same problem.
>
> The solution is usually to cache the decompressed file in RAM inside
> PhysicsFS, but this is basically unacceptable on low-memory systems like
> the PlayStation 2...especially considering that most apps do not seek
> randomly through a file, or seek at all. Games and apps that need to
> deal with a slow file can just as easily cache the data through
> PHYSFS_read() without adding complexity or resource usage to the library.
>
> So if the solution is "cache more," I'd encourage you to leave it as a
> slow operation. If there was a fast way to jump to roughly the correct
> location in the compressed stream and then figure out the right
> plaintext offset by uncompressing a block or two of data, that would be
> a win, but that's not usually possible.
Thanks for that advice.

The current implementation (which just redirects down to 7z / LZMA) works 
like this:
- Decompress the whole block that contains the file we want to read into the 
archive's cache.
- Every subsequent read just finds the correct position in that block and 
copies the requested part into the buffer passed to LZMA_read().
- If a different file is read, the archive's cache is freed and the block 
containing the new file is cached.
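
As a rough sketch of the scheme above (not the actual PhysicsFS or 7z code): 
the cache holds one decompressed block at a time, reads copy out of it, and 
asking for a different block throws the old one away. All names here 
(BlockCache, cache_read, fake_decompress_block) are made up for illustration, 
and the "decompressor" is a stand-in that just fills 16 bytes per block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-block cache: holds exactly one decompressed block. */
typedef struct {
    int cached_block;          /* index of the cached block, -1 if none */
    unsigned char *data;       /* decompressed contents of that block */
    size_t size;               /* number of valid bytes in data */
    unsigned decompress_calls; /* counts how often we had to decompress */
} BlockCache;

/* Stand-in for the real decoder (an assumption, not a 7z API):
 * produces 16 bytes whose values depend on the block index. */
static unsigned char *fake_decompress_block(int block, size_t *size)
{
    size_t n = 16, i;
    unsigned char *buf = malloc(n);
    if (buf == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        buf[i] = (unsigned char)(block * 16 + i);
    *size = n;
    return buf;
}

static void cache_init(BlockCache *c)
{
    c->cached_block = -1;
    c->data = NULL;
    c->size = 0;
    c->decompress_calls = 0;
}

static void cache_free(BlockCache *c)
{
    free(c->data);
    c->data = NULL;
    c->size = 0;
    c->cached_block = -1;
}

/* Copy up to len bytes starting at offset inside the given block.
 * Returns the number of bytes copied, or -1 if decompression failed. */
static long cache_read(BlockCache *c, int block, size_t offset,
                       void *out, size_t len)
{
    assert(c != NULL && out != NULL);

    if (c->cached_block != block) {
        /* Different block requested: decompress it, then drop the old one. */
        size_t size = 0;
        unsigned char *fresh = fake_decompress_block(block, &size);
        if (fresh == NULL)
            return -1;
        free(c->data);
        c->data = fresh;
        c->size = size;
        c->cached_block = block;
        c->decompress_calls++;
    }

    if (offset >= c->size)
        return 0;                       /* read starts past end of block */
    if (len > c->size - offset)
        len = c->size - offset;         /* clamp to what is available */
    memcpy(out, c->data + offset, len);
    return (long)len;
}
```

With this layout, repeated reads from the same block cost only a memcpy, 
while alternating between files in different blocks forces a full 
decompression on every switch, which matches the slow random access across 
two files mentioned above.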


I bet there is some documentation on it, but this is what I found in 
experiments:
- Apparently a file is always completely contained in one block. In other 
words, a block always includes at least one complete file.
- Apparently multiple files can share one block, either when working on a 
completely solid archive or on a solid archive with a block size larger than 
the size of two files.
- If no such solid voodoo is used, there is exactly one file in each block.


> (10 minutes to read 100 kilobytes of sequential data would be a problem
> worth optimizing though!)
Yes, I am working on this. :)

Question:
Is the current approach of using 7z's cache ok?
Or should I try to decompress only the needed part of the file? (I don't know 
yet how this could work...)

--Dennis