Storing the outgoing set of registers (program state) and loading the new set of registers (program state), since this usually requires (slow) memory access.
Storing PCBs in the processor via multiple sets of registers would make switching nearly free. The OS could then instruct the processor to switch PCBs when a new process is to be scheduled.
A new PCB, address space (including page table entries for code, data, a new stack and heap), and copies of all OS bookkeeping entries (like file handles, etc.).
A thread needs just a new program counter, register set, and stack. No new memory structures or operating system resources are needed.
Kernel threads allow scheduling on a thread level, and prevent all threads in a process from blocking when a single thread (and therefore the process) blocks.
8 [for P1] + (12 - 0.4) [for P2] + (13 - 1) [for P3] = 31.6 units total time, or 10.53 units average.
8 [for P1] + (9 - 1) [for P3] + (13 - 0.4) [for P2] = 28.6 units total time, or 9.53 units average.
1 [for P3] + (6 - 0.4) [for P2] + 14 [for P1] = 20.6 units total time, or 6.87 units average.