Personal Webtops

February 15th, 2006

Recently I became interested in really customizing my Google homepage. Not just having a bunch of news feeds and the weather, but really making a page that does things. So I looked at some of the tasks that I do frequently. Well, I do check the news… ok, so keep one of the news feeds. I read and write emails… ok, so I need to find a way to track all my email accounts, not just my Gmail account. I chat online… hmm, I wonder if there’s something that I can do about that. I also write software… not sure if that can be put on my homepage, yet.

Ok, so there are a few things that I do regularly. Well, it turns out that the Google homepage is quite similar to all of the other personalized web pages from the likes of live.com, protopage.com, and pageflakes.com. They all use AJAX techniques to build rich web UIs. So if there is a web service, or a simple UI task to be done, a custom JavaScript widget can be written to accomplish it. It can be anything from a game, to a calculator, to eventually a full office suite. (There are already a couple of word processors, such as Zoho Writer.)

So why is this interesting? Well, for one… imagine not losing your files when your computer crashes. Imagine not having to worry about computer viruses taking over your PC. Imagine not having to drag that dang laptop with you when you go to the in-laws’ house just so you can check email. If you can get to the web, you can get to your files. You can get to your desktop. Your PC doesn’t need to run anything but a browser. It doesn’t need a hard drive to hold any files. No hard drive, no viruses (well, almost).

So what’s my part in all of this? Well, I’m starting to code my own webtop apps. I’ve got a calculator already. I’m looking into creating a service for using AIM via my webtop. And in the future, I’d like to have an IDE for my development work.

So what will all this mean? I dunno, but by the looks of it, we’ll be able to have our desktop no matter where we are. Perhaps we can have a music player that ties into iTunes or another music library. Maybe we’ll find a use for that stupid “Active Desktop” feature that nobody has actually turned on for the last 10 years. In any case, we’ll see new and possibly useful ways of interacting with the web.

The State of Media

February 15th, 2006

For quite some time I’ve been watching the various media outlets. I have over two years of links to news articles about the media industry. When reviewing that collection, I notice one overriding theme: A revolution is upon us. We have the music industry suing their customers. We have the movie industry doing the same. And we have music and movie consumers “pirating” content in droves. Why are the producers and the consumers so at odds with each other? It seems that this is more than just somebody wanting something for free. As evidence, I point to the “success” of the iTunes music service from Apple. Not to mention the explosive popularity of LaunchCast by Yahoo.

So why is there such a disconnect between the producers and consumers? For starters, digital media allows lossless duplication. Consumers can learn about music from their friends, and with one click, copy the music to their own players.

What’s to stop them from just keeping the copy that they got from their friends instead of purchasing their own legitimate copy? This is the question that the music industry is struggling with. This is the first time in history that the media companies have had to deal with lossless duplication of their works. In the past, any duplication technology would degrade the media somewhat; after only a few generations of copies, the media would be useless. So, at best, someone would borrow a book, read it, and give it back. Then if they wanted to read it again, they would go out and purchase their own copy. If they tried to copy the media, their copy would not be as high quality as the original. The same was true of records and video tapes. Once CDs came out, this started to shift. Now people could duplicate the media perfectly. There was no degradation. A copy could be made of the copy, and so on. The media companies were fearful, and they started looking for solutions to this threat.

The current solution that the media companies have come up with is a copy protection scheme known as “Digital Rights Management” (DRM) that only lets the media be played by a specific vendor’s player, and only if the consumer is authorized to play that specific media file. This is not a completely bad idea; however, since there is no standard, several incompatible schemes exist. This makes the situation worse, because the copy protection now prevents consumers from enjoying media that they purchased legitimately. That is acceptable to the media companies, since it does prevent consumers from sharing media with others.

A better solution would be an open, standard copy protection scheme that could be implemented by every player, thereby allowing music to be enjoyed by the authorized consumer but not shared globally. The problem with this solution is that current copy protection schemes depend on “secret” methods for securing the content. Another problem is that, even if a standard were put in place, software could be written to simply skip the authorization step.

Perhaps an even better solution would be to examine the data and see if a problem really exists. Perhaps duplicating media isn’t as widespread as the media companies claim. If it isn’t, then DRM is a solution looking for a problem. Now, this is not to say that duplication of music is not a problem. It is. But DRM does nothing to curb industrial “fake CD” production. That problem requires policing, investigation, and zero tolerance for purchasing managers who turn a blind eye to “really good deals.”

Once the media companies start focusing on ways of delivering what their customers are requesting, instead of suing them out of existence, they will find that the new market is even more lucrative than the old. Otherwise, they may find themselves left by the wayside as new, independent artists produce fresh media outside of their control.

The Difficulty of High Volume Servers

October 5th, 2005

Well, it has been far too long since I’ve posted anything. My apologies. For the last several months, I’ve been working on some interesting projects in the IPTV (Internet Protocol Television) world. In particular, I’ve been working on a Real-Time Encryption Server (RTES) for broadcast-quality MPEG streams. A typical broadcast stream runs about 3.5-4.0 Mbps. That’s a good chunk of data. In order to encrypt it in real time, you need a fairly lean pipeline. Again, no surprises. The trick is getting a server to handle several of these streams concurrently. When I started on this project, the current RTES was capable of handling 20 streams, or channels. For the typical TV service, such as your local cable company, this means that between three and ten servers are needed at the head-end to encrypt all of the channels. That becomes very expensive once you take into account the need for redundant server hardware, gigabit routers to manage everything, and so forth. Besides, when you look at the numbers, 20 channels is really only about 80 Mbps of data. A lot, certainly, but nowhere near the limit for gigabit Ethernet, and really only approaching the limit for Fast Ethernet.

So why could we only handle 20 channels? Well, it turns out that the previous server was starving itself of CPU time because between two and four threads were being allocated for each channel. The server was collapsing under the crushing burden of context switches between all of those threads.

Well, here was a nice chance to dive in and see what we could do. There happens to be a nice little system call for querying multiple sockets for data: select(). Sure it has its problems, but for the most part, select() is quite good at polling a group of sockets to see if there’s data waiting on them.
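
Here’s roughly what that looks like (a minimal sketch, with socket setup and error handling left out; channel_fds and handle_readable are just placeholder names, not anything from RTES):

    // Minimal sketch of select()-based polling (POSIX sockets; error handling
    // and socket setup left out). channel_fds stands in for the per-channel
    // sockets; the names are placeholders, not anything from RTES.
    #include <sys/select.h>
    #include <vector>

    void poll_channels(const std::vector<int>& channel_fds)
    {
        fd_set read_set;
        FD_ZERO(&read_set);

        int max_fd = -1;
        for (int fd : channel_fds) {
            FD_SET(fd, &read_set);
            if (fd > max_fd) max_fd = fd;
        }

        // Block until at least one socket has data waiting (or a second passes).
        timeval timeout = { 1, 0 };
        int ready = select(max_fd + 1, &read_set, nullptr, nullptr, &timeout);
        if (ready <= 0) return;  // timeout or error; the caller decides what to do

        for (int fd : channel_fds) {
            if (FD_ISSET(fd, &read_set)) {
                // handle_readable(fd);  // hand the socket off for processing
            }
        }
    }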

After a bit of rewriting, we got RTES to use select() to query each socket and notify an object when the given socket was ready for reading. At that point a small thread pool was used to execute the encryption pipeline on that socket. This tripled our capacity: we can now handle 60 channels. An impressive gain; however, the story doesn’t end there. With this new design, RTES is now very sensitive to buffer overruns in the network driver. When that happens, flow control is triggered and all threads block on IO (or return a “would block” error, in our case). Once flow control is triggered, no sockets can be read until the single thread handling the flow finishes ALL of its IO. By the time that’s done, everything has fallen so hopelessly behind that flow control is triggered on the next socket read, and the next, and so on. I describe it as the kid on a skateboard holding on to the bumper of the bus: once he lets go, he can’t ever catch up, and he falls on his face. But I digress.
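
To make the shape of that design a bit more concrete, here’s a rough sketch (not actual RTES code; ReadyQueue, worker, and encrypt_and_forward are made-up names). The select() loop pushes ready sockets onto a queue, and a small pool of workers pulls them off and runs the pipeline:

    // Rough sketch of the select-then-dispatch shape described above (not
    // actual RTES code). Ready sockets are queued for a small pool of workers
    // that run the per-channel pipeline, so the select() loop itself never
    // blocks on IO.
    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>

    class ReadyQueue {
    public:
        void push(int fd) {
            { std::lock_guard<std::mutex> lock(mutex_); fds_.push(fd); }
            cond_.notify_one();
        }
        int pop() {  // blocks until a socket is available
            std::unique_lock<std::mutex> lock(mutex_);
            cond_.wait(lock, [this] { return !fds_.empty(); });
            int fd = fds_.front();
            fds_.pop();
            return fd;
        }
    private:
        std::mutex mutex_;
        std::condition_variable cond_;
        std::queue<int> fds_;
    };

    void worker(ReadyQueue& queue) {
        for (;;) {
            int fd = queue.pop();
            // encrypt_and_forward(fd);  // run the per-channel pipeline (made up)
            (void)fd;                    // placeholder so the sketch compiles
        }
    }

    // The select() loop pushes each readable fd with queue.push(fd); a handful
    // of std::thread workers (four, say) service all of the channels.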

How do we avoid the flow control problem? Since this is all functionality in the network driver, we can’t really tweak too much; however, we can wildly increase buffer sizes so flow control is triggered far less often. This seems to help quite a bit, but it really just pushes the problem farther away rather than resolving it.
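
For the curious, “increasing buffer sizes” here just means growing the kernel’s receive buffer on each channel socket, something like the following (the 4 MB figure is purely illustrative; on Linux the value is also capped by net.core.rmem_max, which may need raising as well):

    // One way to "wildly increase buffer sizes": grow the kernel receive
    // buffer on each channel socket with SO_RCVBUF. The 4 MB default below is
    // illustrative only; the right value depends on the stream rate and how
    // far behind a channel is allowed to fall.
    #include <sys/socket.h>

    bool grow_receive_buffer(int fd, int bytes = 4 * 1024 * 1024)
    {
        // On Linux the effective size is also capped by net.core.rmem_max,
        // which may need to be raised separately.
        return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == 0;
    }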

Well, I went digging for other designs that could be used. A buddy here at work pointed me at a paper written by Matt Welsh, David Culler, and Eric Brewer at UC Berkeley. They describe a modular architecture that they call SEDA (Staged Event-Driven Architecture). It’s a fairly interesting read, and strangely enough, right around the time they came up with this back in 2001, a few friends of mine and I were working on a generic data processing engine, for an outfit called Create-A-Check, using a very similar architecture.

So what is SEDA? Well, basically it is a collection of objects, called stages, that each handle a small component of processing. A web server stage list might look like this.

  1. Listen for connections
  2. Read request
  3. Return cached static page
  4. Fetch static page and add to cache
  5. Fetch dynamic page

Each stage has a small queue at the front that holds incoming events. Each stage acts on an event and then forwards it to the next stage. The execution looks like a tree traversal: stage 1 to stage 2 to stage 3, stage 1 to stage 2 to stage 5, and so on.
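
To make that concrete, here’s a bare-bones sketch of what a stage might look like (the stages in the actual SEDA paper also carry their own thread pools and admission controllers; the Event fields and class names below are made up):

    // A bare-bones take on a SEDA-style stage, just to make the shape concrete.
    // (The stages in the paper also carry thread pools and admission control;
    // the Event fields and names here are made up.)
    #include <queue>
    #include <string>
    #include <utility>

    struct Event {
        int connection_fd;
        std::string payload;
    };

    class Stage {
    public:
        explicit Stage(Stage* next = nullptr) : next_(next) {}
        virtual ~Stage() = default;

        void enqueue(Event event) { inbox_.push(std::move(event)); }

        // Drain the inbox, act on each event, then forward it to the next stage.
        void run_once() {
            while (!inbox_.empty()) {
                Event event = std::move(inbox_.front());
                inbox_.pop();
                if (handle(event) && next_) next_->enqueue(std::move(event));
            }
        }

    protected:
        // Returns true if the event should continue on to the next stage.
        virtual bool handle(Event& event) = 0;

    private:
        std::queue<Event> inbox_;
        Stage* next_;
    };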

So what are the advantages to this modular architecture? Well, for one, you can dynamically adjust your pipeline based on things like server load. For example, if a server is getting slammed, it can start to reject dynamic page requests, and just handle static requests. Perhaps it can redirect requests that would require a cache-refresh, or insert a new stage that just returns a global “Service is Down” page for all requests. The business rules could be applied in a much more fine-grained fashion, and it would be easier to change them.
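
Continuing the sketch above, shedding load is really just a matter of swapping in a stage that short-circuits the pipeline. Something like this, purely illustrative again:

    // Continuing the Stage sketch above: a purely illustrative stage that
    // short-circuits the pipeline and answers everything with a stock page.
    class ServiceDownStage : public Stage {
    protected:
        bool handle(Event& event) override {
            event.payload = "HTTP/1.0 503 Service Unavailable\r\n\r\n"
                            "Service is Down";
            return true;  // forward straight to whatever stage writes responses
        }
    };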

So now we’re looking at ways that SEDA could help us solve the high-volume multicast problems that RTES is facing. We are also looking at applying this model to our key request/certificate generation/verification servers. SEDA seems like a perfect fit for them.

In all of this, I’ve learned a couple of things. When it comes to high-performance servers, it’s not the first 90% of capacity that’s the hard part. It’s that last 10%. Everything gets a bit blurry, and things can degrade very quickly and unexpectedly. Handling load gracefully is a fun trick, and there’s no real silver bullet. As computing progresses, and high-volume servers become more and more prevalent, new techniques will surely develop. It should be a fun ride.

-Joe

Thinking in objects instead of in code

June 8th, 2005

Lately I have had a couple of conversations regarding code design. One thing that I seem to find is that several people do their “thinking” in code. They typically will sit down, start coding up a prototype, and get it written from beginning to end, one step at a time. They tend to think in a linear, or at least procedural, fashion. Once they have their first run of code, they will reduce it down into objects, frameworks, etc. This is quite different from the way that I tend to code. I am not suggesting that my way is better; it is, however, interesting to look at the differences. Typically I will try to reduce my problem to a “black box,” meaning I will try to find the absolute inputs and the absolute outputs. Once I have that data path, I will make an interface that offers it. I then make a pass through the interface, filling in high-level functionality. I make a second pass, filling in lower-level functionality, and so on, until all the functionality is present. It looks very much like developing a fractal. I do this because I don’t think about how the code will look so much as how the interfaces will look. That drives me to visualize the objects, so I can quickly see what patterns will fit a given implementation. I can also see if my design is not fitting the problem, and avoid or change things before I spend all my time implementing the bottom-level details.
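
As a toy illustration of that first “interface pass,” here’s what the skeleton might look like (the report generator example is entirely made up; the point is just that the absolute inputs and outputs get pinned down before any of the details do):

    // Toy example only: a made-up report generator, sketched interface-first.
    // First pass: pin down the absolute input (raw records) and the absolute
    // output (a formatted report) before any implementation details exist.
    #include <string>
    #include <vector>

    struct SalesRecord {
        std::string product;
        double amount;
    };

    class ReportGenerator {
    public:
        virtual ~ReportGenerator() = default;
        virtual std::string generate(const std::vector<SalesRecord>& records) = 0;
    };

    // Second pass: the high-level flow, with grouping, totals, and formatting
    // left for later passes.
    class PlainTextReport : public ReportGenerator {
    public:
        std::string generate(const std::vector<SalesRecord>& records) override {
            std::string report = "Sales Report\n";
            for (const auto& record : records) {
                report += record.product + ": " + std::to_string(record.amount) + "\n";
            }
            return report;
        }
    };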

One disadvantage I do have, though, is that I tend to take slightly more time to develop my solution, since it’s not usable until it is complete. Typically a “code thinker” will have a functional first run, even if it’s not elegant. However, in their “reduction phase” they may have to restructure the code significantly, with a rippling effect. That can end up causing their final product to take more time to develop than starting from a high level and filling in. I also tend to run into more language limitations, since I am going from abstract toward concrete, and sometimes the language will not allow a construct that I require. Since I don’t fill in the details until later, I may not discover this until it’s too late. This is very rare, but it does happen. When it does, I may have to rework a few bits of my design to compensate for the language’s limits, but even these changes seem to be fairly isolated.

So, given these two different ways of coding, which works better? They both do. They seem to reflect fundamental ways that our brains work. Being aware of how others work can be beneficial, even if it’s not practical to adopt their solutions. In the (in)famous words of Larry Wall, “there’s more than one way to do it.”

-Joe