So, we check if the size of the bits string is multiple of 8 or not by checking the modulo of 8. Our decompressing/un-huff program must have some mechanism to account for these extra or "padding" bits since these bits do not represent compressed information. Also, because of buffering, if all output is done in eight-bit chunks and your program writes exactly 61 bits explicitly, then 3 extra bits will be written so that the number of bits written is a multiple of eight. In any case, when you write 3 bits, then 2 bits, then 10 bits, all the bits are eventually written, but you cannot be sure precisely when they're written during the execution of your program. On many systems all file sizes are multiples of 8 or 16 bits so that it isn't possible to have a 122 bit file. Operating systems also typically require that disk files have sizes that are multiples of some architecture/operating system specific unit, e.g., a byte or word. This means output is actually written to disk when some internal buffer is full, not every time you write to a stream in a program. When you write output the operating system typically buffers the output for efficiency. For non-leaf nodes there's no information that needs to be written, just the bit that indicates there's an internal node. For leaf nodes, you will also need to write the character stored. One way to do this is write a single bit for each node, say 1 for leaf and 0 for non-leaf. You must differentiate leaf nodes from internal/non-leaf nodes. One method for doing this is to do a pre-order traversal, writing each node visited. You can store the tree at the beginning of the file. You could use a "standard" character frequency, e.g., for any English language text you could assume weights/frequencies for every character and use these in constructing the tree for both compression and uncompression. If you do the latter, you must include some method for indicating the character, e.g., store character/count pairs. You can store counts for every character, or counts for the non-zero characters. Store the character counts at the beginning of the file. There are several alternatives for storing the tree: You must store some initial information in the compressed file that will be used by the uncompression/un-huffing program (the tree basically). One priority queue to apply the minimum heap minimum extraction property in the tree construction algorithm.Two unordered maps, one for the character and coding pairs and another for the character and the frequency or number of times repeated.The program that does the reverse, producing a regular file from a compressed file, will be called the uncompression or un-huffing program.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |