Wednesday, June 24, 2009

Huffman Code

So this is another little assignment that we had just before final exams. It's about Huffman code and its application in file compression. We were supposed to write a program to read a file, determine total bits for that file, then use Huffman code to compress it and calculate the total bits after the compression. The file should only contain alphanumeric characters [A-Za-z0-9], and maybe spaces, new lines and dots and commas.

This script is not the most elegant since I only had 2, 3 days to do it (really it's a dump of chunk of codes) but it works so hoorah! Basically what the script does is:

  1. Read a file (*.txt) and determine frequency of each characters and total bits.

  2. Build a Huffman tree based on previous frequency table. Each node keeps a character & its frequency, path (eg. 011, 0011, 01110 etc), vertex flag (whether it's a left or right child) and visit flag (whether it has been visited or not - for node searching later). Note though, not every node keeps a character, only leaf nodes do (Huffman tree properties). Sorting algorithm to build priority queue to work with when building the tree is selection sorting algorithm.

  3. Follow every root-to-leaf path to get the bits for each character. This script uses breadth-first search algorithm to look for nodes inside the tree.

  4. Write compression information in a new file (compressed one) but this script only write the bits sequence into the file.


The file here: huffman.cpp (C++)

Some screenshots:

Media_httpblogcawanpi_jgmfc



Media_httpblogcawanpi_hvyna



Media_httpblogcawanpi_odnwx

No comments:

Post a Comment