To prove the answer to 2(d): start by considering what any target function that can be exactly learned using feature set 1 must look like. What is the minimum number of nodes in such a decision tree? Call that number m. You will then show that you can create an equivalent tree using only feature set 2 features with O(mk) nodes.
More **SPOILERS** ?