问渠网-数据结构实战-第六节哈夫曼编码

本文作者：李德强

第六节哈夫曼编码

哈夫曼编码又称为前缀码，即：每一个编码都不会是另一个编码的前缀。哈夫曼编码的生成非常简单，即从哈夫曼树的根节点开始，采用前序遍历找到每一个叶子节点，经过的路径中左节点为0，右节点为1。我们来看一下下面这两棵哈夫曼树：

由根节点向叶子节点遍历，找到节点7时，经过了两次左节点，则7的哈夫曼编码为00；而找到节点8时，经过的路径为01，所以节点8的哈夫曼编码为01；节点4的哈夫曼编码为100；节点5的哈夫曼编码为101……

再来看这棵哈夫曼树，节点8的编码为000，节点11的编码为001，节点23的编码为01，节点29的编码为10……

下面我们来实现生成哈夫曼编码和解哈夫曼编码的算法：

//生成哈夫曼编码
void tree_huffman_code(s_tree *tree, s_node *node, int *no, char **code)
{
	if (tree == null)
	{
		return;
	}
	if (tree->root == null)
	{
		return;
	}
	if (node == null)
	{
		return;
	}

	//如果左右节点均为空，说明是叶子节点
	if (node->left_child == null && node->right_child == null)
	{
		s_node *p = node;
		int i = 0;
		char *temp = malloc(MAX_CODE_LEN);
		//由叶子结节向根节点循环生成哈夫曼编码
		while (p->parent != null)
		{
			//如果当前结节为其父节点的左孩子，则编码为0
			if (p->parent->left_child == p)
			{
				temp[i] = '0';
			}
			//如果当前结节为其父节点的左孩子，则编码为1
			else if (p->parent->right_child == p)
			{
				temp[i] = '1';
			}
			//向上遍历
			p = p->parent;
			i++;
		}

		//由于上由叶子节点向上遍历，所以要将生成的编码逆转
		int len = i;
		i--;
		for (int j = 0; j < len; j++, i--)
		{
			code[*no][j] = temp[i];
		}
		//字符串结尾
		code[*no][len] = '\0';
		free(temp);

		(*no)++;
	}

	tree_huffman_code(tree, node->left_child, no, code);
	tree_huffman_code(tree, node->right_child, no, code);
}

//找到指定哈夫曼编码的节点数据
bool tree_get_data_by_huffman_code(s_tree *tree, char *code, void **data)
{
	if (tree == null)
	{
		return false;
	}
	if (tree->root == null)
	{
		return false;
	}
	if (code == null)
	{
		return false;
	}

	//从根节点开始遍历
	s_node *p = tree->root;
	int i = 0;
	//编码结束标识
	while (code[i] != '\0')
	{
		//编码0表示左节点
		if (code[i] == '0')
		{
			if (p->left_child == null)
			{
				return false;
			}
			p = p->left_child;
		}
		//编码1表示右节点
		else if (code[i] == '1')
		{
			if (p->right_child == null)
			{
				return false;
			}
			p = p->right_child;
		}
		i++;
	}

	if (p == null)
	{
		return false;
	}
	if (p->data == null)
	{
		return false;
	}

	//返回数据地址
	*data = p->data;

	return true;
}

Click and drag to move

当然，在实际的编码应用中，并不是采用字符串来存放哈夫曼编码的，这样非常浪费存储空间，在数据传输时也浪费了大量的时间。在实际应用中通常采用整数来存放哈夫曼编码，例如在第一个例子中：4520转化为哈夫曼编码为100 101 11111 111100 将这些编码再组合成整数：10010111111111100 = 0x12FFC。本例中为了让printf函数能方便的显示哈夫曼编码所以使用了字符串来存放这些编码。

在main函数中实现生成哈夫曼编码和解哈夫曼编码的测试代码：

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "tree.h"
#include "list.h"

int main(int argc, char **args)
{

    //构建哈夫曼树过程省略


	//申请存放哈夫曼编码的字符串的二级指针
	char **code = malloc(sizeof(char *) * 9);
	for (int i = 0; i < 9; i++)
	{
		//申请存放哈夫曼编码的字符串
		code[i] = malloc(sizeof(char) * MAX_CODE_LEN);
	}
	int no = 0;
	//生成哈夫曼编码
	tree_huffman_code(&tree, tree.root, &no, code);

	//显示生成的哈夫曼编码
	for (int i = 0; i < 9; i++)
	{
		printf("%s ", code[i]);
	}
	printf("\n");

	//根据哈夫曼编码解码找到原数据
	for (int i = 0; i < 9; i++)
	{
		int *d = null;
		tree_get_data_by_huffman_code(&tree, code[i], (void **) &d);
		visit_int(d);
		free(code[i]);
	}
	printf("\n\n");
	free(code);

}

Click and drag to move

运行结果：

7 8 4 5 6 3 0 1 2 
00 01 100 101 110 1110 111100 111101 11111 
7 8 4 5 6 3 0 1 2 

8 11 23 29 14 7 3 5 
000 001 01 10 110 1110 11110 11111 
8 11 23 29 14 7 3 5

Click and drag to move

本例代码：

code path   chapter.05/e.g.5.6/

https       https://github.com/magicworldos/datastructure.git
git         git@github.com:magicworldos/datastructure.git
subverion   https://github.com/magicworldos/datastructure

Click and drag to move