04-树6. Huffman Codes

In 1953, David A. Huffman published his paper “A Method for the Construction of Minimum-Redundancy Codes”, and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string “aaaxuaxz”, we can observe that the frequencies of the characters ‘a’, ‘x’, ‘u’ and ‘z’ are 4, 2, 1 and 1, respectively. We may either encode the symbols as {‘a’=0, ‘x’=10, ‘u’=110, ‘z’=111}, or in another way as {‘a’=1, ‘x’=01, ‘u’=001, ‘z’=000}, both compress the string into 14 bits. Another set of code can be given as {‘a’=0, ‘x’=11, ‘u’=100, ‘z’=101}, but {‘a’=0, ‘x’=01, ‘u’=011, ‘z’=001} is NOT correct since “aaaxuaxz” and “aazuaxax” can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2 <= N <= 63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] … c[N] f[N]
where c[i] is a character chosen from {‘0’ - ‘9’, ‘a’ - ‘z’, ‘A’ - ‘Z’, ‘_’}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (<=1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]
where c[i] is the i-th character and code[i] is a string of ‘0’s and ‘1’s.

Output Specification:

For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Sample Input:
7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11
Sample Output:
Yes
Yes
No
No

代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
#include<iostream>
#include<queue>
#include<functional>
#include<string>
#include<map>
#include<algorithm>
using namespace std;

const int MAX_SIZE = 100;
struct Compare;
struct HuffmanNode
{
char node;
int weight;
HuffmanNode *left, *right;
bool lessthan(const HuffmanNode* hNode) const
{

return weight > hNode->weight;
}
// bool operator < (const HuffmanNode* &a) const
// {
// return weight <= a->weight; //最小值优先
// }
HuffmanNode(char ch) : node(ch), weight(0), left(NULL), right(NULL) {};
HuffmanNode(char ch, int w) : node(ch), weight(w), left(NULL), right(NULL) {};
};

struct Compare
{
bool operator ()(HuffmanNode *hNode1, HuffmanNode *hNode2)
{

return hNode1->lessthan(hNode2);
}
};

class HuffmanTree
{
public:
HuffmanTree();
~HuffmanTree();
void Release(HuffmanNode *hf);
void CreateHuffmanTree(priority_queue<HuffmanNode*, vector<HuffmanNode*>, Compare> pq, int N);
void Insert(char node, string huffmanCode, int &flag, map<char, int> &hMap);
void Check(int &flag, int &mWPL, int WPL);
bool levelOrder(HuffmanNode *hNode, int &mWPL);
void postOrder(HuffmanNode *hNode, int &WPL, int mCase);
void ComputeWPL(int &WPL);
int GetHeight(HuffmanNode *hNode);
private:
HuffmanNode *root;

};

HuffmanTree::HuffmanTree()
{
root = new HuffmanNode('-');
}

HuffmanTree::~HuffmanTree()
{
Release(root);
}

void HuffmanTree::Release(HuffmanNode *hf)
{
if(hf != NULL)
{
Release(hf->left);
Release(hf->right);
delete hf;
}
}
void HuffmanTree::Insert(char node, string huffmanCode, int &flag, map<char, int> &hMap)
{
int length = huffmanCode.size();
int i = 0;
HuffmanNode *hNode = root;
while(i < length)
{
if(huffmanCode[i] == '0')
{
if(hNode->left != NULL)
{
if(hNode->node != '-' || i == length - 1)
{
flag = 0;
cout << "No" << endl;
break;
}
}
else
{
if(i == length - 1)
{
hNode->left = new HuffmanNode(node, hMap[node]);
}
else
{
hNode->left = new HuffmanNode('-');
}
}
hNode = hNode->left;
}
else
{
if(hNode->right != NULL)
{
if(hNode->node != '-' || i == length - 1)
{
flag = 0;
cout << "No" << endl;
break;
}
}
else
{
if(i == length - 1)
{
hNode->right = new HuffmanNode(node, hMap[node]);
}
else
{
hNode->right = new HuffmanNode('-');
}
}
hNode = hNode->right;
}
++i;
}
}

void HuffmanTree::Check(int &flag, int &mWPL, int WPL)
{
postOrder(root, mWPL, 1);
if(WPL != mWPL)
{
flag = 0;
cout << "No" << endl;
}
// if(!levelOrder(root, mWPL) || WPL != mWPL)
// {
// flag = 0;
// cout << "No" << endl;
// }
}

//bool HuffmanTree::levelOrder(HuffmanNode *hNode, int &WPL)
//{
// HuffmanNode *Q[MAX_SIZE], *temp;
// int rear = -1, front = -1;
// Q[++rear] = hNode;
// while(rear != front)
// {
// temp = Q[++front];
// if(temp->left != NULL && temp->right != NULL)
// {
// Q[++rear] = temp->left;
// Q[++rear] = temp->right;
// WPL += temp->weight;
// }
// else if(temp->left == NULL && temp->right == NULL)
// {
// ;
// }
// else
// {
// return false;
// }
// }
// return true;
//}
void HuffmanTree::CreateHuffmanTree(priority_queue<HuffmanNode*, vector<HuffmanNode*>, Compare> pq, int N)
{
HuffmanNode *hNode;
for(int i = 0; i < N - 1; ++i)
{
hNode = new HuffmanNode('-', 0);
hNode->left = pq.top();
pq.pop();
hNode->right = pq.top();
pq.pop();
hNode->weight = hNode->left->weight + hNode->right->weight; //这一步可以省略,后面postOrder函数中重复计算了
pq.push(hNode);
}
root = hNode;
}

void HuffmanTree::ComputeWPL(int &WPL)
{
postOrder(root, WPL, 0);
}

void HuffmanTree::postOrder(HuffmanNode *hNode, int &WPL, int mCase)
{
if(hNode != NULL)
{
postOrder(hNode->left, WPL, mCase);
postOrder(hNode->right, WPL, mCase);
if(hNode->left != NULL && hNode->right != NULL)
{
hNode->weight = hNode->left->weight + hNode->right->weight;
WPL += hNode->weight;
}
}
}

int main()
{

int N = 0; //N is the number of character
cin >> N;
priority_queue<HuffmanNode*, vector<HuffmanNode*>, Compare> pqHuffmanCode; //最小值优先级队列
map<char, int> hMap;
int WPL = 0;
for(int i = 0; i < N; ++i)
{
char ch;
int w = 0;
cin >> ch >> w;
hMap[ch] = w;
HuffmanNode *ln = new HuffmanNode(ch, w);
pqHuffmanCode.push(ln);
}
// while(!pqHuffmanCode.empty())
// {
// cout << pqHuffmanCode.top()->node << " " << pqHuffmanCode.top()->weight << " ";
// pqHuffmanCode.pop();
// }
// cout << endl;
HuffmanTree ht;
ht.CreateHuffmanTree(pqHuffmanCode, N);
ht.ComputeWPL(WPL);
// cout << "WPL is : " << WPL << endl;

int M = 0; //M is number of student submissions
cin >> M;

for(int i = 0; i < M; ++i)
{
HuffmanTree hTree;
int flag = 1;
int mWPL = 0;
for(int j = 0; j < N; ++j)
{
char nodeVal;
string code;
cin >> nodeVal >> code;
if(flag)
{
hTree.Insert(nodeVal, code, flag, hMap);
}
}
if(flag && N > 1)
{
hTree.Check(flag, mWPL, WPL);
}
if(flag)
{
cout << "Yes" << endl;
}
}
return 0;
}

这道题搞了两天,第一次写完往上测试,最后一个点过不了,测试点3说段错误,第一次的算法是先判断是不是叶子结点,如果都是叶子结点在判断树的所有结点是不是有度为1的结点,如果有说明不是哈夫曼树,但是这种方法由于最后一个测试点和测试点3过不了,还是老老实实算WPL,然后刚开始准备用最小堆来做,但是感觉最小堆做实在是麻烦,虽然最小堆已经写过了,后来在网上看别人的代码,有的人也没有实际建立哈夫曼树。
而是用STL里的优先队列,然后就百度百度priority_queue怎么用的,参见这篇文章用的过程中设计到排序,由于我的优先队列元素是结构体,我要用优先队列模拟最小堆,然后建立哈夫曼树,所以我的优先队列的元素就变成了结构体指针,然后如果优先队列的元素如果是结构体,则须重载运算符,但是如果是(结构体)指针,却没有这种用法,然后百度找了这篇文章解决了我的问题,这样就可以保证每次 push操作后,top()是最小的。
然后创建哈夫曼树的算法就是按照mooc里讲的,计算WPL,我并没有按照Wi*Li来计算,而是将所有非叶子结点的权值累加,得到的结果也是 WPL.计算过程采用后序遍历的方法。
最后过了,解题过程真是曲折,能力明显不够,C++primer找时间还要继续看呐。

测试结果:

测试点 结果 用时(ms) 内存(kB) 得分/满分
0 答案正确 1 256 17/17
1 答案正确 1 360 7/7
2 答案正确 1 240 3/3
3 答案正确 54 360 1/1
4 答案正确 1 256 1/1
5 答案正确 1 232 1/1