Here are my answers for CSE590yb Problem Set 1.

SCORING: 20 points total; breakdown described below.

1. The Brewer paper claims that most web sites are limited by some I/O bottleneck. List all the different I/O bottlenecks that might exist in a giant-scale web site (e.g., I/O interrupt handling, network bandwidth, ...) and identify a popular web service that you believe would be limited by each potential bottleneck (e.g., web searching, Amazon, CNN, ICQ, ...).

a. network bandwidth - ICQ produces a large volume of data without taxing other I/O bottlenecks, so network bandwidth may be its principal bottleneck.

b. disk bandwidth - What is a web service that spends a relatively large portion of its time reading/writing the disk? Services that access large on-disk databases, such as web search engines, fit the bill.

c. I/O interrupt handling - What is a web service that spends relatively little time computing versus performing I/O? A busy ski report service fits the bill: it does little computation, all its data fits in memory, and it doesn't transmit much data. Thus handling interrupts may dominate.

d. memory system bandwidth - What is a web service that spends significant time massaging data in memory? A mail service like Hotmail temporarily stores email messages in memory and prepares them for delivery.

e. front-end - Any busy web site could suffer from a front-end bottleneck.

There are a number of other bottlenecks, but I would not consider them I/O bottlenecks. Some examples are CPU speed, limited server replication, memory capacity, and software efficiency.

SCORING: a-e are each worth 2 points (1 for the bottleneck and 1 for the named service). Only 4 of 5 needed. 8 pts total.

2. Almost every hardware computing technology in mass deployment has been improving in cost/performance at a rapid exponential rate. For example, CPU cost/MFLOP, wired and wireless networking cost/bandwidth, and DRAM cost/capacity have all been improving at roughly 60%/year.
Disks have been improving faster, at 100%/year. Starting from today's prices, and supposing the same rates of improvement continue: when will we be able to buy a GFLOP for $5? A terabyte of disk for $5? A gigabit wireless connection? An effective voice recognition system can be built using only a few GFLOPS and a huge disk; when will we be able to afford to put one in every microwave?

Let C be a measure of cost/value (e.g., cost/MFLOPS, cost/capacity). If we can expect an annual factor of s improvement (i.e., 60% improvement => s = 1.6), then the following equality gives the predicted cost/value C' after n years:

    C' = C/(s^n)

We solve for n:

    s^n = C/C'
    n = log_s(C/C')        {log_s == log base s}
      = log(C/C')/log(s)

GFLOPS for $5

Because CPUs mostly "spend" their exponential improvement on becoming cheaper (versus becoming faster), I want to base my initial C on a processor with 1 GFLOPS performance. It turns out that n is not highly sensitive to such a careful selection of C; small multiplicative factors get lost in the log. According to hardwarecentral.com, you can get 998 MFLOPS (call it 1 GFLOPS) out of a 733MHz Pentium III. According to an article by Mike Magee at "The Register" (http://www.theregister.co.uk/000425-000004.html), this chip goes for $337 each in quantities of 1000. Thus C = $337/1GFLOPS, C' = $5/1GFLOPS, and s = 1.6 (i.e., 60% improvement per year). Substituting, we get...

    n = log(($337/1GFLOPS)/($5/1GFLOPS))/log(1.6)
      = log(337/5)/log(1.6)
      = log(67)/log(1.6)
      = 9 years

TB of disk for $5

Because disks mostly "spend" their exponential improvement on increasing capacity (versus becoming cheaper), I want to base my initial C on a disk that costs around $5. No such disk exists, so I'll look for a cheap one. Again, the log hides this issue: scaling C by a factor of 5 only changes n by a couple of years. According to pricewatch.com, you can get a 2.1GB (call it 2GB) disk (by Quantum) for $49 (call it $50).
Thus C = $50/2GB, C' = $5/1TB, and s = 2 (i.e., 100% improvement per year). Substituting, we get...

    n = log(($50/2GB)/($5/1TB))/log(2)
      = log(10 * 500)/log(2)
      = log(5000)/log(2)
      = 12.3 years
      = 12 years

But I'm skeptical. $5 is quite cheap, given how disks spend their improvement. For context, reaching 1TB for $50 in 9 years is probably a more accurate prediction.

Gb wireless network

Notice this question makes no mention of cost, so I'll assume that cost, c, is fixed (i.e., the same in C and C'). This conveniently finesses the problem of figuring out what "cost" means here. Connection-time cost? Network card cost? Monthly wireless ISP cost?

What kinds of wireless connections are out there? 56Kbps modems on a cell phone. 1.5Mbps from gainwireless.com in Tucson. And there exist 11Mbps network cards (wireless ethernet). I'm not going to use the 56Kbps figure, because it doesn't use a "real" data network. I'll be aggressive and use the 11Mbps figure. C = c/11Mbps, C' = c/1Gbps, and s = 1.6.

    n = log((c/11Mbps)/(c/1Gbps))/log(1.6)
      = log(1000/11)/log(1.6)
      = log(91)/log(1.6)
      = 9.6 years
      = 10 years

Ubiquitous voice recognition

An effective voice recognition system requires a few GFLOPS ($5/GFLOPS in 9 years) and a huge disk ($5/TB in 12 years?). In about a decade, FLOPS and bytes will be cheap enough that we can stuff them into microwaves to do wacky things like voice recognition.

SCORING: There are 5 parts to this question: (a) identify how price/performance changes over time [2 points], (b) GFLOPS [2 points], (c) disk capacity [2 points], (d) wireless networks [2 points], (e) voice recognition [3 points].
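The three projections above all instantiate the same formula, n = log(C/C')/log(s), so they are easy to sanity-check with a short script. This is just a sketch of that check; the function name is mine, and the prices, rates, and baselines are the ones quoted in the answers above.

```python
import math

def years_until(C, C_target, s):
    """Years n until cost/value C improves to C_target, given an annual
    improvement factor s:  C' = C / s**n  =>  n = log(C/C_target)/log(s)."""
    return math.log(C / C_target) / math.log(s)

# GFLOPS for $5: $337/GFLOPS today, 60%/year improvement (s = 1.6)
print(round(years_until(337, 5, 1.6), 1))            # -> 9.0 years

# TB of disk for $5: $50/2GB = $25/GB today vs $5/1000GB, 100%/year (s = 2)
print(round(years_until(50 / 2, 5 / 1000, 2.0), 1))  # -> 12.3 years

# Gb wireless: the fixed cost c cancels, leaving the bandwidth ratio
# 1Gbps/11Mbps = 1000/11, with s = 1.6
print(round(years_until(1000 / 11, 1, 1.6), 1))      # -> 9.6 years
```

Note how insensitive n is to the starting point: doubling the Pentium III price only shifts the GFLOPS answer by log(2)/log(1.6), about a year and a half.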