Summary 10

Lecture Summary

The main topic of the lecture is the correctness proofs of the minimum spanning tree algorithms, with the goal of gaining a deeper understanding of the underlying graph theory. A number of the homework problems relate to this. At the end of the lecture, clustering is introduced as an application of the minimum spanning tree algorithm. Students were fairly engaged with the clustering discussion at the end.

This lecture is a bit short - it came to just about 44 minutes.

This lecture was given two days after the spanning tree algorithms were introduced. You will be coming back to the algorithms after a much longer break from spanning trees - so you may need to spend more time on the slide introducing the algorithms.

The edge inclusion lemma is important - so students should be clear on what the lemma says. If students are not comfortable with what the lemma says, you should stop for discussion.

An observer pointed out that the lecturer tends to point to the code without identifying what he is pointing to. In this case the algorithm is simple enough that it probably isn't an issue.

The instructor wants the students to state how the edge inclusion lemma is used to prove that the algorithm is correct - however, it is not very clear what he is asking. What he is asking is "How is the edge inclusion lemma used to show that when Prim's algorithm adds an edge to T, it is in the minimum spanning tree?" A more specific version of the question is "What is the set S that the edge inclusion lemma would use to show that edges are correctly added to the minimum spanning tree?" The student answer is "in the cloud versus out of the cloud" - in class, we have been referring to the collection of vertices chosen in Prim's algorithm (or Dijkstra's algorithm) as "the cloud". (This is not standard terminology - so no need to use it.)
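
For reference, here is a minimal Python sketch of Prim's algorithm that makes the set S explicit. It assumes the edge inclusion lemma is the usual cut property (with distinct costs, the cheapest edge with exactly one endpoint in S is in the minimum spanning tree). The function name prim_mst and the (cost, u, v) edge format are choices made for this sketch, not taken from the lecture slides.

```python
import heapq

def prim_mst(n, edges, start=0):
    """Return MST edges of a connected graph on vertices 0..n-1.

    edges: list of (cost, u, v); costs are assumed distinct.
    """
    adj = {v: [] for v in range(n)}
    for cost, u, v in edges:
        adj[u].append((cost, u, v))
        adj[v].append((cost, v, u))

    S = {start}               # the "cloud": vertices already in the tree
    heap = list(adj[start])   # candidate edges with one endpoint in S
    heapq.heapify(heap)
    tree = []
    while heap and len(S) < n:
        cost, u, v = heapq.heappop(heap)
        if v in S:
            continue          # both endpoints in S: not a crossing edge
        # (u, v) is the cheapest edge from inside S to outside S, so the
        # inclusion lemma applied to S puts it in the MST.
        tree.append((cost, u, v))
        S.add(v)
        for e in adj[v]:
            if e[2] not in S:
                heapq.heappush(heap, e)
    return tree
```

The comment inside the loop is the student's answer in code form: S is the cloud at the moment the edge is added.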

The same approach is taken for Kruskal's algorithm: the students are asked how to apply the inclusion lemma. On this slide, the instructor is clear when he asks the question. A student gives a correct answer - but it is very hard to understand. The instructor draws a diagram, and then explains what the student said, so you shouldn't worry about trying to understand the student.
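
If it helps to have the argument in front of you, here is a small Python sketch of Kruskal's algorithm with one standard choice of S noted in a comment. The choice of S shown (the component containing one endpoint) is an assumption about what the student and instructor described, and the name kruskal_mst and the (cost, u, v) edge format are made up for this sketch.

```python
def kruskal_mst(n, edges):
    """Return MST edges for vertices 0..n-1; costs assumed distinct."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the component representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):      # cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:
            # Let S be the current component containing u. Every cheaper
            # edge was already processed and did not leave S, so (u, v) is
            # the cheapest edge leaving S and the inclusion lemma applies.
            tree.append((cost, u, v))
            parent[ru] = rv               # merge the two components
    return tree
```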

At 14:00 the class is asked: "Why can't the most expensive edge on a cycle be on a spanning tree". The student answer is not audible. There are several more student comments - which are easier to understand. These comments are true - but don't go as far as giving a proof. At about 16:00 a student gives a sketch of a correct proof. The instructor starts the proof. At 17:40 he asks "what do we do with the cycle" but gets no response - so continues. At 20:35 the instructor asks why the edge e₁ has lower cost than e. This is a simple question which can test whether the audience is following the discussion.

At about 22:00 the instructor applies the lemma to show that the algorithm works. The question asked at about 22:45 is how we know that the most expensive edge which does not disconnect the graph is the most expensive edge on a cycle. This is answered by a student, and the answer is explained.
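
The algorithm in question here appears to be reverse-delete (the name these notes use later). A rough Python sketch follows, assuming distinct costs and a connected graph on vertices 0..n-1; the helper names connected and reverse_delete_mst are hypothetical, and the connectivity check is deliberately naive.

```python
def connected(n, edges):
    """True if the graph on vertices 0..n-1 with these edges is connected."""
    adj = {v: [] for v in range(n)}
    for _, u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == n

def reverse_delete_mst(n, edges):
    """Delete edges from most to least expensive when safe to do so."""
    kept = sorted(edges)
    for e in sorted(edges, reverse=True):
        rest = [f for f in kept if f != e]
        # If the graph stays connected without e, then e lies on a cycle of
        # the remaining graph. All more expensive edges still present are
        # bridges, so e is the most expensive edge on that cycle.
        if connected(n, rest):
            kept = rest
    return kept
```

The comment in the loop is the answer to the 22:45 question: a more expensive edge that survived could not be removed, so it is not on any cycle.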

The instructor returns to the detail of assuming that edges have distinct costs. This allowed the proofs to be much simpler. However, it could be a source of concern, since many of the cases where these algorithms are used will have edges with the same weights. (Students didn't seem too concerned.)

At 24:55 the instructor asks how to get rid of duplicates on an example. Stop the video before the student answers. The student correctly says to add small values to the edges.
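
One way to make "add small values to the edges" concrete is sketched below in Python. It assumes integer edge costs, and the particular perturbation (a multiple of 1/(m*m + 1) per edge) is just one choice that keeps the total adjustment below 1; the lecture does not specify the values. The name make_costs_distinct is made up for this sketch.

```python
from fractions import Fraction

def make_costs_distinct(edges):
    """edges: list of (cost, u, v) with integer costs.

    Returns the same edges with tiny, distinct amounts added to the costs.
    """
    m = len(edges)
    eps = Fraction(1, m * m + 1)
    # Edge i gets i * eps added. The perturbations over any set of edges
    # sum to less than 1, so they can break ties between trees of equal
    # cost but cannot reverse a strict (integer) cost difference.
    return [(cost + i * eps, u, v) for i, (cost, u, v) in enumerate(edges)]
```

Exact fractions are used instead of floats so that the small values cannot be lost to rounding.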

The clustering application is introduced. Clustering is a very important practical problem, and there are far more sophisticated methods than the one introduced here. However - the spanning tree algorithm is used frequently (and many people use the algorithm, without realizing that it comes from spanning trees!)

In the clustering definition, note that we do not care about the size of the clusters - just the number of clusters. (If we put restrictions on the size of the clusters, the problem becomes much harder.)

The instructor was trying to be careful in defining the distance - so you should make sure that students follow the definition. There is a chance that this might be hard to follow because of the language (and the use of phrases such as "maximizing the minimum distance").
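
A tiny example may help if students get stuck on the phrase. The Python sketch below assumes the quantity being maximized is the usual "spacing" of a clustering: the smallest distance between two points that lie in different clusters. The function name spacing and the example points are invented for illustration.

```python
from itertools import combinations

def spacing(clusters, dist):
    """Smallest distance between two points in different clusters."""
    return min(
        dist(p, q)
        for a, b in combinations(clusters, 2)  # every pair of clusters
        for p in a
        for q in b
    )

# Points on a line, split into three clusters.
clusters = [[0, 1], [5, 6], [20]]
print(spacing(clusters, lambda p, q: abs(p - q)))  # closest pair is 1 and 5
```

"Maximizing the minimum distance" then means: among all ways of splitting the points into the required number of clusters, pick one where this value is as large as possible.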

The next three slides have the same point set and ask it to be split into two, three, and four clusters. These could be done quickly as student submissions - do all three without restarting the video. The way to break it into three sets is obvious, but two or four are unclear (and the answer doesn't really matter) - the goal is just to make the clustering problem clear.

At about 35:00 a student proposes an algorithm for the problem. What the student proposes is the reverse-delete spanning tree algorithm. The instructor admits he had not been thinking of that one - but acknowledges that the algorithm will work.

At 36:00 a student says that to get K clusters you compute a minimum spanning tree and knock out K edges. The instructor says that this is "approximately correct" (actually, K-1 edges are removed).
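
A short Python sketch of the idea is given below. Instead of building the whole tree and removing the K-1 most expensive edges, it runs Kruskal's algorithm and stops once K components remain, which produces the same clusters when distances are distinct. The name k_clusters and the input format are assumptions of this sketch, and it expects 1 <= k <= number of points.

```python
def k_clusters(points, k, dist):
    """Split points into k clusters by running Kruskal and stopping early."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise distances, cheapest first.
    pairs = sorted(
        (dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    components = n
    for _, i, j in pairs:
        if components == k:
            break                  # the K-1 costliest tree edges are never added
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(points[i])
    return list(groups.values())
```

A tree on the points has one fewer edge than there are points, and each removed edge adds one component, which is why the count is K-1 rather than K.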

The lecture ends with a discussion of the spanning tree based clustering algorithm. At the end of the lecture the instructor makes reference to the text, and says a few words about the algorithms that are not being covered.