File Compression and Decompression Using Huffman Coding (mds96589/File-Compression-and-Decompression-Using-Huffman-Coding)

A Project on Text File Compression and Decompression using Huffman Coding in C++

Huffman Coding is a lossless data compression algorithm. The key idea is to use variable-length codes for encoding characters, where more frequent characters get shorter codes and less frequent characters get longer codes. This is achieved using a binary tree (Huffman tree).
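To make the variable-length idea concrete, here is a minimal sketch that encodes a string with a small hand-written code table. The table, the three-letter alphabet, and the names `exampleCodes` and `encodeWithTable` are assumptions made for illustration; they are not taken from the project's source.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical prefix-free code table for the alphabet {a, b, c}.
// 'a' is assumed to be the most frequent symbol, so it gets the shortest code.
std::unordered_map<char, std::string> exampleCodes() {
    return {{'a', "0"}, {'b', "10"}, {'c', "11"}};
}

// Concatenate the code of every character in `text`.
std::string encodeWithTable(const std::string& text,
                            const std::unordered_map<char, std::string>& codes) {
    std::string bits;
    for (char ch : text) {
        auto it = codes.find(ch);
        assert(it != codes.end());  // every symbol must have a code
        bits += it->second;
    }
    return bits;
}
```

With this table, `encodeWithTable("aaabac", exampleCodes())` yields the 8-bit string `00010011`, whereas a fixed 8-bit encoding of the same six characters needs 48 bits. No code in the table is a prefix of another, which is what lets a decoder split the bit string unambiguously.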

Key Functionalities

Frequency Calculation: Compute the frequency of each character in the input text.
Heap Construction: Build a min-heap of nodes where each node contains a character and its frequency.
Huffman Tree Construction: Construct a Huffman tree from the min-heap.
Code Generation: Generate Huffman codes for each character based on the Huffman tree.
Text Encoding: Encode the input text using the generated Huffman codes.
Padding and Byte Conversion: Pad the encoded text to ensure its length is a multiple of 8, then convert it to a byte array.
File Writing: Write the byte array to a binary file.
Text Decoding: Decode the binary file back to the original text by reversing the above steps.
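The first four functionalities (frequency calculation, heap construction, tree construction, and code generation) might look roughly like the sketch below. It is one possible arrangement under stated assumptions, not the repository's actual code: the `Node` layout, the use of `std::shared_ptr`, and the function names `countFrequencies`, `buildTree`, and `generateCodes` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

struct Node {
    char symbol;                  // meaningful only for leaf nodes
    std::size_t freq;             // frequency of the symbol or subtree
    std::shared_ptr<Node> left;
    std::shared_ptr<Node> right;
};
using NodePtr = std::shared_ptr<Node>;

// Reversed comparison turns std::priority_queue (a max-heap) into a min-heap.
struct ByFrequency {
    bool operator()(const NodePtr& a, const NodePtr& b) const {
        return a->freq > b->freq;
    }
};

// Count how often each character occurs in the text.
std::unordered_map<char, std::size_t> countFrequencies(const std::string& text) {
    std::unordered_map<char, std::size_t> freq;
    for (char ch : text) ++freq[ch];
    return freq;
}

// Repeatedly merge the two least frequent nodes until one root remains.
NodePtr buildTree(const std::unordered_map<char, std::size_t>& freq) {
    assert(!freq.empty());
    std::priority_queue<NodePtr, std::vector<NodePtr>, ByFrequency> heap;
    for (const auto& entry : freq) {
        heap.push(std::make_shared<Node>(Node{entry.first, entry.second, nullptr, nullptr}));
    }
    while (heap.size() > 1) {
        NodePtr a = heap.top(); heap.pop();
        NodePtr b = heap.top(); heap.pop();
        heap.push(std::make_shared<Node>(Node{'\0', a->freq + b->freq, a, b}));
    }
    return heap.top();
}

// Walk the tree: append '0' for a left edge and '1' for a right edge.
void generateCodes(const NodePtr& node, const std::string& prefix,
                   std::unordered_map<char, std::string>& codes) {
    if (!node) return;
    if (!node->left && !node->right) {
        // A one-symbol input has a root leaf; give it a 1-bit code.
        codes[node->symbol] = prefix.empty() ? "0" : prefix;
        return;
    }
    generateCodes(node->left, prefix + "0", codes);
    generateCodes(node->right, prefix + "1", codes);
}
```

For the input `aaaabbc`, the frequencies are a=4, b=2, c=1, so `b` and `c` are merged first and `a` ends up one edge below the root. The exact bit patterns depend on how ties and left/right placement are resolved, but the code lengths are fixed: 1 bit for `a` and 2 bits each for `b` and `c`.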

Process Flow

Compression:
- Read the input file and calculate character frequencies.
- Build a min-heap from the frequency dictionary.
- Construct the Huffman tree from the heap.
- Generate Huffman codes by traversing the Huffman tree.
- Encode the input text using the Huffman codes.
- Pad the encoded text and convert it to a byte array.
- Write the byte array to a binary file.
Decompression:
- Read the binary file and convert it to a bit string.
- Remove the padding from the bit string.
- Decode the bit string to the original text using the reverse mapping of Huffman codes.
- Write the decoded text to an output file.
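The padding, byte conversion, and decoding steps can be sketched as follows. The README does not say how the padding length is recorded or how the decompressor obtains the code table, so this sketch assumes that the first byte stores the number of padding bits and that the reverse mapping is simply passed in. The names `packBits`, `unpackBits`, and `decodeBits` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Pad `bits` with zeros to a multiple of 8 and pack it into bytes.
// Assumed layout: byte 0 holds the number of padding bits (0 to 7).
std::vector<std::uint8_t> packBits(const std::string& bits) {
    const std::size_t padding = (8 - bits.size() % 8) % 8;
    const std::string padded = bits + std::string(padding, '0');
    std::vector<std::uint8_t> bytes;
    bytes.push_back(static_cast<std::uint8_t>(padding));
    for (std::size_t i = 0; i < padded.size(); i += 8) {
        std::uint8_t byte = 0;
        for (std::size_t j = 0; j < 8; ++j) {
            assert(padded[i + j] == '0' || padded[i + j] == '1');
            byte = static_cast<std::uint8_t>((byte << 1) | (padded[i + j] == '1' ? 1 : 0));
        }
        bytes.push_back(byte);
    }
    return bytes;
}

// Expand bytes back into a bit string and drop the padding bits.
std::string unpackBits(const std::vector<std::uint8_t>& bytes) {
    if (bytes.empty()) return "";
    std::string bits;
    for (std::size_t i = 1; i < bytes.size(); ++i) {
        for (int shift = 7; shift >= 0; --shift) {
            bits += ((bytes[i] >> shift) & 1) ? '1' : '0';
        }
    }
    const std::size_t padding = bytes[0];
    if (padding <= bits.size()) bits.resize(bits.size() - padding);
    return bits;
}

// Grow a candidate code bit by bit; emit a character on each match.
// This works because Huffman codes are prefix-free.
std::string decodeBits(const std::string& bits,
                       const std::unordered_map<std::string, char>& reverseCodes) {
    std::string text;
    std::string current;
    for (char bit : bits) {
        current += bit;
        auto it = reverseCodes.find(current);
        if (it != reverseCodes.end()) {
            text += it->second;
            current.clear();
        }
    }
    return text;
}
```

As a usage example, packing the 10-bit string `0001001110` adds 6 padding bits and produces three bytes: `6`, `0x13`, and `0x80`. Unpacking those bytes returns the original 10 bits, and decoding them with the reverse table `{"0": 'a', "10": 'b', "11": 'c'}` gives `aaabacb`. Writing the byte vector to disk would typically use a `std::ofstream` opened with `std::ios::binary`.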

Across input files of varying sizes, compression averages 45%.
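The README does not define how the percentage is measured. Assuming it means space saving (the share of the original size that compression removes), it could be computed as in the sketch below; the function name `spaceSavingPercent` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Space saving in percent: 100 * (1 - compressed / original).
// Assumption: this is the metric behind the reported figure.
double spaceSavingPercent(std::uintmax_t originalBytes, std::uintmax_t compressedBytes) {
    assert(originalBytes > 0);  // undefined for an empty input file
    return 100.0 * (1.0 - static_cast<double>(compressedBytes) /
                              static_cast<double>(originalBytes));
}
```

Under this definition, a 1000-byte input that compresses to 550 bytes has a space saving of 45%. The file sizes themselves could be read with `std::filesystem::file_size` in C++17.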

The test cases are stored in the "Input" folder, the resulting compressed .bin files in the "Binary Compressed Files" folder, and the files decompressed from them in the "Decompressed Files" folder.

This implementation compresses and decompresses text files with the Huffman coding algorithm, using a binary tree to represent the codes, a min-heap to build the tree, and hash maps to hold the frequency and code tables.

Repository Contents

- Binary Compressed Files
- Decompressed Files
- Input
- FileCompressionHuffmanCoding.cpp
- FileCompressionHuffmanCoding.exe
- LICENSE
- README.md
