** SPOILERS ** 3

To prove the answer to 2(d): start by considering what any target function that can be exactly learned using feature set 1 must look like.  What is the minimum number of nodes in such a decision tree?  Call that number m.  You will then show that you can create an equivalent tree using only feature set 2 features with O(mk) nodes.

More **SPOILERS** ?