[physfs] 7z/lzma in trunk

Dennis Schridde devurandom at gmx.net
Thu Sep 28 09:30:47 EDT 2006


On Wednesday, 27 September 2006 19:40, Dennis Schridde wrote:
> On Wednesday, 27 September 2006 18:03, Dennis Schridde wrote:
> > On Wednesday, 27 September 2006 17:56, Dennis Schridde wrote:
> > > On Wednesday, 27 September 2006 16:06, Ryan C. Gordon wrote:
> > > > > After I hardfixed the CRC calculation I tested with random access
> > > > > on 2 files. Was very slow, too. Even more than expected... I'll use
> > > > > multiple caches in the next version...
> > > >
> > > > Generally speaking, it's not worth going out of the way to optimize
> > > > random access to formats that don't support it, like zipfiles, which
> > > > have to decompress the whole file to get to the seek point...I assume
> > > > lzma has the same problem.
> > > >
> > > > The solution is usually to cache the decompressed file in RAM inside
> > > > PhysicsFS, but this is basically unacceptable on low-memory systems
> > > > like the PlayStation 2...especially considering that most apps do not
> > > > seek randomly through a file, or seek at all. Games and apps that
> > > > need to deal with a slow file can just as easily cache the data
> > > > through PHYSFS_read() without adding complexity or resource usage to
> > > > the library.
> > > >
> > > > So if the solution is "cache more," I'd encourage you to leave it as
> > > > a slow operation. If there was a fast way to jump to roughly the
> > > > correct location in the compressed stream and then figure out the
> > > > right plaintext offset by uncompressing a block or two of data, that
> > > > would be a win, but that's not usually possible.
> > >
> > > Thanks for that advice.
> > >
> > > The current implementation (which just redirects down to 7z / LZMA)
> > > works like this:
> > > - Decompress the whole block that contains the file we want to read
> > > into the archive's cache.
> > > - Every subsequent read just finds the correct position in that
> > > block and copies the requested part into the buffer passed to
> > > LZMA_read().
> > > - If a different file is read, the archive's cache is freed and the
> > > block of the new file is cached.
> > >
> > >
> > > I bet there is some documentation on it, but this is what I found in
> > > experiments:
> > > - Apparently a file is always completely in one block. Or a block
> > > always includes at least one complete file.
> > > - Apparently it is possible that multiple files are in one block,
> > > either when working on a completely solid archive or a solid archive
> > > with a block size greater than the file size of 2 files.
> > > - If not using such solid voodoo there is exactly one file in each
> > > block.
> > >
> > > > (10 minutes to read 100 kilobytes of sequential data would be a
> > > > problem worth optimizing though!)
> > >
> > > Yes, I am working on this. :)
> > >
> > > Question:
> > > Is the current approach of using 7z's cache ok?
> > > Or should I try to decompress only the needed part of the file? (I
> > > don't know yet how this could work...)
> >
> > If that is ok (memory usage, delay on first access, etc.), then I
> > would implement it so that each decompressed block is cached and the
> > files keep a reference to the block they are in and their offset
> > within it. That way I only need to call SzExtract on the first access
> > to a block, and if multiple files are in a block they each keep a
> > reference to the "shared" block.
>
> Implemented in the attached patch. If it is not ok this way just tell me
> and I'll dig into LZMA further and find out what I can do about it.
> Speed has greatly improved, but now every block (folder) which has open
> files stays allocated as long as those files are open.
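
For readers without the patch at hand, the shared-block scheme described
above can be sketched roughly as follows. This is only an illustration, not
the patch itself: Folder, File, file_open, file_read, file_close and
decompress_folder are hypothetical names, and decompress_folder is a dummy
that stands in for the SzExtract call so the sketch compiles without the
LZMA SDK.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the real decompression call (SzExtract in the
 * mail). It fills a 16-byte buffer so the sketch runs without the LZMA SDK,
 * and counts how often it is invoked. */
static int decompress_calls = 0;

static unsigned char *decompress_folder(size_t folder_index, size_t *size)
{
    unsigned char *buf = (unsigned char *) malloc(16);
    if (buf == NULL)
        return NULL;
    memset(buf, (int) ('A' + folder_index), 16);
    *size = 16;
    decompress_calls++;
    return buf;
}

/* One entry per block (folder) of the archive. */
typedef struct {
    unsigned char *cache;  /* decompressed data, NULL while not loaded */
    size_t size;           /* size of the decompressed data */
    unsigned references;   /* number of open files inside this folder */
} Folder;

/* An open file only remembers its folder and where it starts in it. */
typedef struct {
    Folder *folder;
    size_t offset;    /* start of the file inside the folder cache */
    size_t size;      /* length of the file */
    size_t position;  /* current read position inside the file */
} File;

/* Free the folder cache once no open file refers to it any more. */
static void folder_release_if_unused(Folder *folder)
{
    if (folder->references == 0) {
        free(folder->cache);
        folder->cache = NULL;
        folder->size = 0;
    }
}

/* Decompress the folder on first access only; later opens share the cache.
 * Returns 1 on success, 0 on failure. */
static int file_open(File *file, Folder *folders, size_t folder_index,
                     size_t offset, size_t size)
{
    Folder *folder = &folders[folder_index];

    if (folder->cache == NULL) {
        folder->cache = decompress_folder(folder_index, &folder->size);
        if (folder->cache == NULL)
            return 0;
    }
    if (offset > folder->size || size > folder->size - offset) {
        folder_release_if_unused(folder);
        return 0;
    }
    folder->references++;
    file->folder = folder;
    file->offset = offset;
    file->size = size;
    file->position = 0;
    return 1;
}

/* Copy from the shared cache; no decompression happens here. */
static size_t file_read(File *file, void *out, size_t len)
{
    size_t remaining = file->size - file->position;
    if (len > remaining)
        len = remaining;
    memcpy(out, file->folder->cache + file->offset + file->position, len);
    file->position += len;
    return len;
}

/* Drop this file's reference; the last close frees the folder cache. */
static void file_close(File *file)
{
    Folder *folder = file->folder;
    assert(folder != NULL && folder->references > 0);
    folder->references--;
    folder_release_if_unused(folder);
    file->folder = NULL;
}
```

Opening two files from the same folder triggers one decompression, and the
cache is released when the second of them is closed. That is exactly the
trade-off mentioned above: faster reads, but the folder stays in memory for
as long as any file in it is open.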
The size of the folder array could be halved if I did not store the size
and the index of the folder and instead passed dummy variables to
SzExtract. The size benefit would of course differ if
sizeof(PHYSFS_uint8*) != 4, e.g. on 64-bit systems.
This would be a bit hacky, but would work with the current implementation
of SzExtract and LZMA_read.

If the implementation of SzExtract some day relied on the passed index
being correct, or if LZMA_read called SzExtract more than once (so that
SzExtract would fail because the cache size is incorrect), it would create
problems.
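
To make the size argument concrete, here is a small sketch. The layouts are
an assumption made purely for illustration (four fields of which two would
be dropped); the field names and the helper functions are hypothetical and
not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of a folder entry that stores everything. */
typedef struct {
    uint8_t *cache;       /* decompressed folder data */
    uint32_t size;        /* size of the cache */
    uint32_t index;       /* folder index handed to the extractor */
    uint32_t references;  /* number of open files in this folder */
} FolderFull;

/* Assumed layout without size and index; dummy variables would be
 * passed to the extractor instead. */
typedef struct {
    uint8_t *cache;
    uint32_t references;
} FolderSlim;

/* Memory needed for a folder array with n entries of each kind. */
static size_t full_array_bytes(size_t n) { return n * sizeof(FolderFull); }
static size_t slim_array_bytes(size_t n) { return n * sizeof(FolderSlim); }
```

With 4-byte pointers the slim entry is half the size of the full one. With
8-byte pointers the pointer dominates and alignment padding comes into play,
so the saving is typically smaller than one half.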

--Dennis