Ultraviolet Consulting Blog: May 2008

Monday, May 26, 2008

Scrum at Google: Ken Schwaber Talk

Ken Schwaber addresses Google about Google's Scrum implementation:

In this video he discusses the history of scrum coming out of Japan when Japanese firms were stressed to compete at higher levels. He also talks about how Google's competitor Yahoo uses scrum.

Related Resources

Agile podcast and interviews about Scrum with Ken Schwaber

These three introductions to Scrum by Ken are excellent. They give you very clean and clear definitions of what the major aspects of Scrum are. Highly reocmmended.

Part 1: http://blogs.conchango.com/howardvanrooijen/archive/2005/10/25/2296.aspx
Part 2: http://blogs.conchango.com/howardvanrooijen/archive/2005/10/31/2323.aspx
Part 3: http://blogs.conchango.com/howardvanrooijen/archive/2005/11/05/2362.aspx

Ken Schwaber's book:

Agile Project Management with Scrum, by Microsoft Press: http://www.amazon.com/Agile-Project-Management-Microsoft-Professional/dp/073561993X

Scrum Masters 2: funny video

Review: Scrum and XP from the Trenches

I am reading the book Scrum and XP from the Trenches by Henrik Kniberg. It is about his personal experience successfully implementing a Scrum-based methodology in his development team of 40 people in Stockholm, Sweeden.

You can read the book as a PDF on InfoQ's web site here: http://www.infoq.com/minibooks/scrum-xp-from-the-trenches

Or, you can purchase the book at LuLu here:
http://www.lulu.com/content/899349

Review
Here are my comments on each section of the book:

Foreward by Jeff Sutherland

Jeff co-created Scrum. In his foreward he makes these points:

JSG paraphase:

Scrum is becoming very important to venture capitalists (especially his own partners)
Teams must know their project velocity -- This is how much of a backlog a team can accomplish in a given sprint. (See http://www.scrumalliance.org/articles/39-glossary-of-scrum-terms#1110 for details)
Only teams that match all of the criteria below in the excerpts will be funded by his venture capitalist partners.

Direct excerpts:

Iterations must have fixed time boxes and be less than six weeks
long.
Code at the end of the iteration must be tested by QA and be
working properly.
A Scrum team must have a Product Owner and know who that
person is.
The Product Owner must have a Product Backlog with estimates
created by the team.
The team must have a Burndown Chart and know their velocity.
There must be no one outside a team interfering with the team
during a Sprint.

Foreward by Mike Cohn

Mike is a founding member of the Scrum Alliance. He has written books, articles, and speaks regularly. Learn more at http://www.scrumalliance.org/profiles/8-mike-cohn.

Mike makes these points:

Scrum and XP are both practical, results-oriented approaches. They are about Getting Things Done
Prototype early and often rather than documenting requirements at an exhaustive level
Avoid excess modeling, prefer prototyping instead
Work on things that have potential to become part of the actual working solution
Refer to other resources for the theory behind Scrum. It is out there, but this book focuses on implementing Scrum successfully

Preface

He was skeptical going in, but was convinced shortly thereafter starting
Scrum worked for the author's 40 person team
Team's quality was way below acceptable, but implementing Scrum solved their problems
He will use Scrum by default for new projects in the future unless there is a specific reason not to use it

1. Introduction

Scrum is not a magic bullet
You tailor it to your context
His team was fighting fires all the time
Quality was low
Overtime was up
Deadlines were missed
"Scrum" was just a buzzword to most people
Implemented across teams of 3 to 12 people
Learned about different ways of managing backlog (Excel, Jira, index cards, etc)
Experimented w/different sprint sizes (2 to 6 weeks)
Used other XP practices like pair programming
Used continuous integration
Used TDD
Took Ken's certification course
Most useful info came from real "War Stories" and case studies of people actually solving problems with Scrum

2. How we do product backlogs

Backlog is hear and soul of Scrum
Backlog contains customer's desired features, prioritized by most critical
Call the backlog items User Stories or just Stories
ID, Name, Importance, Initial Estimate, How-To-Demo, Notes
6 fields were most often used
Kept in Excel spreadsheet on shared drive with Multiple Editing allowed
Additional: Track, Components, Requestor, Bug Tracking ID if needed
Keep the product backlog at the Business Level not a technical level
Let the team figure out the How-To Technical Level
Ask Why as many teams as needed to get to the underlying intent if the Product Owner does state it in technical terms, and move technical language to Notes field.

3. How we prepare for Sprint planning

Product Backlog MUST exist first in ship-shape form
One Product Backlog and One Product Owner per Product
All items should have a unique Importance level
Leave large gaps. 100 does not mean 5 times more important than 20. Easier than making 20.5 if you did 20 and 21 instead when item C comes up in the middle
Product Owner should understand the intent of each story
Others can add stories, but only Product Owner can assign importance
Only the team can add an estimate
Tried using Jira for keeping the backlog, but too many clicks for Product Owner
Have not yet tried VersionOne or other Scrum tools

4. How we do Sprint planning

TODO page 25

5. How we communicate Sprints

6. How we do Sprint backlogs

7. How we arrange the team room

8. How we do daily Scrum

9. How we do Spring demos

10.How we do Spring retrospectives

11.Slack time between Sprints

12.How we do release planning and fixed priced contracts

13.How we combine Scrum with XP

14.How we do testing

15.How we handle multiple Scrum teams

16.How we handle geographically distributed teams

17.Scrum master checklist

18.Parting words

Recommended reading

Saturday, May 24, 2008

Getting Real, Release It!, and The Perpetual Beta: Modern Web Apps

I've been thinking an awful lot lately about web application architecture and release strategies. This has a lot to do with a project I am working on professionally, but it ties into everything I've been thinking about regarding architecture in general.

Here are some links for reading more about these topics and insights from some of today's top developers, including 37signals.com, the creators of the Ruby-on-Rails framework and developers of http://www.Meetup.com

Getting Real, by 37Signals

Learn about 37Signals by googling them. They are the creators of the Ruby-on-Rails framework that has pushed a lot of other people to higher levels of quality. Only a true Microsoft fanboy would not know that tons of what is inside of Microsoft's forthcoming ASP.NET MVC framework comes straight out of the Ruby-on-Rails framework.

Getting Real is a book by the 37Signals crew about how they do Agile development.

Read it here: http://gettingreal.37signals.com/index.php

Here is a key excerpt from chapter 2 in the essay entitled Fix Time and Budget Flex Scope

"Launching something great that's a little smaller in scope than planned is better than launching something mediocre and full of holes because you had to hit some magical time, budget, and scope window. Leave the magic to Houdini. You've got a real business to run and a real product to deliver.

Here are the benefits of fixing time and budget, and keeping scope flexible:

Prioritization
You have to figure out what's really important. What's going to make it into this initial release? This forces a constraint on you which will push you to make tough decisions instead of hemming and hawing.
Reality
Setting expectations is key. If you try to fix time, budget, and scope, you won't be able to deliver at a high level of quality. Sure, you can probably deliver something, but is "something" what you really want to deliver?
Flexibility
The ability to change is key. Having everything fixed makes it tough to change. Injecting scope flexibility will introduce options based on your real experience building the product. Flexibility is your friend."

Comments
I really like these ideas with regard to web applications. These principles become extremely important when you are migrating a system from an older technology to a new technology, especially when the benefit you seek is primarily for the application infrastructure as opposed to for the end-user benefit. You have to ask yourself questions like:

How does this migration actually benefit the user?
Is is tested well enough to replace the existing system such that users do not notice the change?
Is there a way to minimize any possible damage the new infrastructure could cause to profitability should it not work as hoped?

As a consultant, you have to make clients aware of these questions. Your job is to inform them about possible problems and strategies for risk mitigation. Ultimately, they may select to do something you either agree with or don't, but you have to due your diligence when you recommend a migration strategy.

The situation is more complex when you are implementing not just a back-end migration, but also adding brand new features that you want to introduce to actually improve the user experience.

Suppose you have this scenario:

You have a client that operates a popular shopping web site
The existing application is running ASP with COM objects written in Visual Basic 6.0
The existing application works well and has been tested through real-world use for more than five years
The existing application continues in increase in value, leading to higher and higher ROI each year
Your client wants to migrate to Visual Basic.NET and ASP.NET 3.5
Your client wants to add new features to the system that will increase the usability and utility for the system's users such as Ajax-enhanced search pages and RESTful URLs that offer strong Search-Engine-Optimization benefits

You have to carefully weigh all of these demands and criteria. Ask questions like:

How important is time-to-market?
How important is it that users do not have any interruptions in service?
What are the performance and scalability requirements?

Yes, Yes, and Yes
In my experience, most clients will want it as soon as possible and with as few interruptions as possible and with as good or better performance as the existing system. This is just a given.

But, you really cannot deliver all three of those concurrently. You have to make some trade-offs.

In this case, I believe the best migration strategy is what is called a vertical migration strategy. Read more about this in the Dr. Dobbs link below.

Create the new foundational architecture in Visual Basic.NET to support the system
Create Interop assemblies on top of the COM objects
Create the new, value-added functionality first
Bring up the system in beta side-by-side to the existing system so that the value-added features can be delivered to the users and so that the client realizes early return-on-investment (ROI) and gets early user feedback.
Monitor the system's performance and refactor any of the bottleneck areas caused by interop by implementing them in pure .NET code.
Add more features to the beta to slowly replace the existing application, getting user feedback and important performance and scalability information all the while to help direct your refactorings.

This is different from a horizontal migration strategy. In a horizontal strategy, you would select an entire layer of system, such as the UI, the Business Logic, or the Persistence Layer. A horizontal strategy is typically more complex and time-consuming and requires more testing.

However, you can use a very similar risk mitigation strategy to what you do in a vertical migration. You can bring up the new system side-by-side with the existing one and allow users to alpha and beta test it while you measure the usability, performance, and scalability and refactor as needed before it replaces the existing system.

You can read much more about the various approaches to ASP.NET application migration in this Dr. Dobbs online article: http://www.ddj.com/windows/184406077

Figure 1: Vertical Migration Strategy

Figure 2: Horizontal Migration Strategy

Release It!: Design and Deploy Production-Ready Software

Another book that I have eyed on shelves of late is called Release It!: Design and Deploy Production-Ready Software by Michael Nygard. Here is an interview with the author on InfoQ about the book and his lessons learned:

http://www.infoq.com/articles/nygard-release-it

Michael Nygard: First off, there's quite a bit of variation in what people mean by "feature complete". Even at best, it just means that all the specified functionality for a release has passed functional testing. For an agile team, it should also mean that all the acceptance tests pass. In some cases, though, all it means is that the developers finished their first draft of the code and threw it over the wall to the testers.

"Production ready" is orthogonal to "feature complete". Whether the acceptance tests pass or the testers give it a green check mark tells me nothing about how well the system as a whole is going to hold up under the stresses of real-world, every day use. Could be horrible, could be great.

For example, does it have a memory leak? Nobody actually runs a test server in the QA environment for a week or a month at a time, under realistic loads. We're lucky to get a week of testing, total, let alone a week just for longevity testing. So, passing QA doesn't tell me anything about memory leaks. It's very easy for memory leaks to get into production. Well, now that creates an operational problem, because the applications will have to be restarted regularly. Every memory leak I've seen is based on traffic, so the more traffic you get, the faster you leak memory. That means that you can't even predict when you'll have to restart the applications. It might be the middle of the busiest hour on your busiest day. Actually, it's pretty likely to happen during the busiest (i.e., the worst) times.

This is crucially important. Memory leaks come from third-party vendors just as often as they come from your own internal code. There is nothing like having to log in remotely to a web server when you're trying to have fun and the third-party component is causing your web server to hang. These are things that people rarely think about up front because they typically are problems revealed only by real system usage. I'll give a real world example:

Suppose you have a Visual Basic 6.0 COM object that use ADO internally. It may keep a RecordSet open to allow for consuming code to rewind the cursor and start over from the beginning. Well, .NET uses non-deterministic finalization, so you have to take care to call System.Interop.Marshal.ReleaseCOMObject to inform the runtime that it should destroy the COM object when you are finished with it. If you do not do this, you could end up with long-standing blocks against your database until the garbage collector frees the COM object.

I have run into this problem, and luckily was able to refactor my wrapper class in one single place to alleviate the problem. In the web application, we never rewind the collection, so it was safe for us to free the object after the while loop in the IEnumerable.GetEnumerator() completed.

As for this book, you can read the full table of contents and excerpts from the book in this PDF extracted from the book:

http://media.pragprog.com/titles/mnee/mnee-patterns.pdf

After looking at the TOC, I know this is a book I want to read.

Web 2.0 Applications and Agile Methodologies
This brings us into the territory of web 2.0 applications and the topic of the agile methodology.

The wikipedia entry for Web 2.0 defines the following key characteristics for a web 2.0 application:

"The sometimes complex and continually evolving technology infrastructure of Web 2.0 includes server-software, content-syndication, messaging-protocols, standards-oriented browsers with plugins and extensions, and various client-applications. The differing, yet complementary approaches of such elements provide Web 2.0 sites with information-storage, creation, and dissemination challenges and capabilities that go beyond what the public formerly expected in the environment of the so-called "Web 1.0".

Web 2.0 websites typically include some of the following features/techniques:

Cascading Style Sheets to aid in the separation of presentation and content
Folksonomies (collaborative tagging, social classification, social indexing, and social tagging)
Microformats extending pages with additional semantics
REST and/or XML- and/or JSON-based APIs
Rich Internet application techniques, often Ajax-based
Semantically valid XHTML and HTML markup
Syndication, aggregation and notification of data in RSS or Atom feeds
mashups, merging content from different sources, client- and server-side
Weblog-publishing tools
wiki or forum software, etc., to support user-generated content
Internet privacy, the extended power of users to manage their own privacy in cloaking or deleting their own user content or profiles."

Suppose your task is to migrate a web 1.0 application to a web 2.0 application such that it increasingly resembles these features. How do you do that with maximal risk mitigation and protection of the existing system's ROI?

Taking Cue's from Yahoo and Google's Lead
First, from a process methodology standpoint, both Yahoo and Google have adopted an agile process. In particular, they have adopted a Scrum-based development methodology. You can watch the following videos to learn more about that:

Scrum Tuning: Lessons Learned from Scrum Implementation: http://video.google.com/videoplay?docid=8795214308797356840
Scrum et al: http://video.google.com/videoplay?docid=-7230144396191025011

Just to summarize, however, here is what Scrum looks like visually:

Verbally, the wikipedia article describes it as:

Scrum is a process skeleton that includes a set of practices and predefined roles. The main roles in scrum are the ScrumMaster who maintains the processes and works similar to a project manager, the Product Owner who represents the stakeholders, and the Team which includes the developers.
During each sprint, a 15-30 day period (length decided by the team), the team creates an increment of potential shippable (usable) software. The set of features that go into each sprint come from the product backlog, which is a prioritized set of high level requirements of work to be done. What backlog items go into the sprint is determined during the sprint planning meeting. During this meeting the Product Owner informs the team of the items in the product backlog that he wants completed. The team then determines how much of this they can commit to complete during the next sprint.^[4] During the sprint, no one is able to change the sprint backlog, which means that the requirements are frozen for sprint.

There are several good implementations of systems for managing the Scrum process and the "sprints" while others prefer to use yellow stickers and white-boards. One of Scrum's biggest advantages is that it is very easy to learn and requires little effort to start using.

Why have both Google and Yahoo adopted scrum? Well, just listen to what someone inside of Yahoo had to say about this in the Scrum Mailing List.

http://shmula.com/159/scrum-at-yahoo

"What the Times doesn’t say is that Yahoo! is now 18 month into its adoption of Scrum, and has upwards of 500 people (and steadily growing) using Scrum in the US, Europe, and India. Scrum is being used successfully for projects ranging from new product development Yahoo! Podcasts, which won a webby 6 months after launch, was built start-to-finish in distributed Scrum between the US and India) to heavy-duty infrastructure work on Yahoo! Mail (which serves north of a hundred million users each month). Most (but not all) of the teams using Scrum at Yahoo! are doing it by the book, with active support from inside and outside coaches (both of which in my opinion are necessary for best results).

Pete Deemer Chief Product Officer, Yahoo! Bangalore / CSM"

Microsoft also uses Scrum-based methodologies for building systems. See this eWeek article here for details:

http://www.eweek.com/c/a/IT-Management/Microsoft-Lauds-Scrum-Method-for-Software-Projects/

David Treadwell, corporate vice president of the .Net Developer Platform group at Microsoft, said that while Microsoft welcomes the use of methodologies like Scrum, "were not mandating them, but were encouraging them. So Scrum is one process—the idea that teams meet once a day for half an hour, figure out what theyre going to do then go off and do their work very quickly.
"Its most important to mandate levels of quality. You have to give teams some flexibility to achieve those results as is most effective for those teams."
Indeed, "There are a lot of things going on," at Microsoft on this task, Treadwell said. "We have realized inside Microsoft over the years that software practices we used in the mid-90s dont scale to the size of problems that were tackling today.
"And we made some assumptions around the turn of the century that those processes would scale up and result in certain time frames that we would be able to ship software.
"And what happened is as the projects got larger and larger, we introduced too many complex interdependencies on early software, more so than we could really digest throughout the system," said Treadwell.
"And that was super-challenging for us, given the scale of these projects. But now were being much more precise about where we take those interdependencies."

Microsoft ASP.NET and MVC: A new direction
We see further changes in Microsoft's approach to development in the way they are releasing upgrades to ASP.NET and the MVC framework. Take a look at http://www.codeplex.com/aspnet. This is the new home for the ASP.NET framework, where Microsoft makes "often and early" releases to the developer community. They are still a few steps short of going fully open-source, but I think they will get there sooner rather than later. At least, I hope they do if they hope to survive in the competitive market.

The Perpetual Beta: A way forward that ensures quality

Finally, arrive at the idea of a perpetual beta. This is something that Tim O'Reilly discussed a few years back about the nature of a web 2.0 system. Read more about his comments here:

http://en.wikipedia.org/wiki/Perpetual_beta

and here:

http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html?page=4

His key point about the concept of a perpetual beta is this:

"Users must be treated as co-developers, in a reflection of open source development practices (even if the software in question is unlikely to be released under an open source license.) The open source dictum, "release early and release often" in fact has morphed into an even more radical position, "the perpetual beta," in which the product is developed in the open, with new features slipstreamed in on a monthly, weekly, or even daily basis. It's no accident that services such as Gmail, Google Maps, Flickr, del.icio.us, and the like may be expected to bear a "Beta" logo for years at a time."^[1]

The Perpetual Beta Deployment Model in Practice

In practice, you could implement the Perpetual Beta model as follows:

Web Farm: production.samplesite.com
[ Production Server #1 - App v1.0 ] [ Production Server #2 - App v1.0]

Beta Server: beta.samplesite.com
[ Beta Server #1 - App v1.x ]

To follow Google and Yahoo's lead here, you would deploy release candidate builds to the Beta Server and allow users to select to use that version of the system, but they could always navigate to the production URL instead of the beta.

This provides you the following benefits:

Users become part of your testing process and help you develop and guide the evolution of the system
Your team becomes far less stressed and annoyed because they have a true metric of user satisfaction and system stability to gauge before changes actually go into the true production environment
You decrease your time-to-market by having a system that is continually improving

Cautions and Caveats to this process:

You have to take care in testing the beta build well in advance of pushing it to the beta server. Make sure there are no database concurrency issues or blocking processes that could cause the beta and the production system to conflict.
You should be exercising automated tests and using regression test tools well in advance of the beta deployment.
You will still not catch all problems before it goes into the real production environment. This is just the way development is.

Conclusion
These are just some of my notes and thoughts on the way forward for web application development. I know I'm already behind on most of this stuff, but some of my readers are further behind :-)

Is there a silver lining to all of this change and fast pace?

I believe there is. The silver lining is HTTP and URI.

If you build your URIs to last, last they will. That is the fundamental change in mindset that is taking place for most of the successful players right now. They are realizing that they can construct their services using RESTful designs that allow both applications and users to repurpose content and data in ways that nobody thought possible before.

If you don't believe me, just head on over to http://pipes.yahoo.com or read up on Google Base http://www.google.com/base.

URI stands for Uniform Resource Identifier. It's about time we started treating it like one. We've got to stop reinventing the wheel and start driving the car we have with the wheel we got already.

Check out my previous posts on REST or just read Roy Fielding's dissertation to learn more.

Related Resources

Orbitz.com Lead Architect Brian Zimmer on Challenges on a Large Scale Project: http://www.infoq.com/interviews/Architecture-Brian-Zimmer
James Shore, author of "The Art of Agile Development" discusses his book: http://www.infoq.com/interviews/The-Art-of-Agile-Development-James-Shore

Thursday, May 22, 2008

Querying Google Base using GLinq in C#

Google Base is google's open data repository. Watch a video and learn more about the open API to use Google Base here: http://code.google.com/apis/base/

The GLinq project is a beta project that provides strongly-typed access to Google Base. Check it out at: http://www.codeplex.com/glinq.

Once you download the beta, you can run the sample application, which consists of this code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleTest
{
class Program
{
static void Main(string[] args)
{
//You need a google key. It's easy - http://code.google.com/apis/base/signup.html
Console.WriteLine("Enter your Google Base Key (http://code.google.com/apis/base/signup.html):");
string GoogleKey = Console.ReadLine();
GoogleItems.GoogleContext gc = new GoogleItems.GoogleContext(GoogleKey);
var r = from ipods in gc.products
where ipods.BaseQuery == "mp3 players" && ipods.Brand == "apple"
where ipods.Price > 200 && ipods.Price < 400
orderby ipods.Price descending
select ipods;

foreach (GoogleItems.Product product in r.Take(100))
{
Console.WriteLine("{0} for ${1}",product.Title, product.Price.ToString("#.##"));
}
Console.ReadKey(true);
}
}
}

This returns a big list of the first 100 items returned from Google Base:

16GB Apple iPod Touch for $399.99
IPOD for $399.99
Black 80GB Video Apple Ipod for $399.99
Apple 160GB iPod classic â?" Black for $399.99
Apple iPod Photo 60GB - 15000 songs & 25000 photos in Your Pocket for $399.99
Apple 160GB iPod classic â?" Black for $399.99
16GB Apple iPod Touch for $399.99
Apple 60 GB iPod with Video Playback Black for $399.99
Apple iPod Touch 16GB WiFi Digital Music/Photo/Video Player for $399.99
BOSE SoundDock Portable Black Digital Music System for the iPod for $399.99
Apple iPod touch 16GB* MP3 Player (with software upgrades) - Black for $399.99
Apple 8GB iPod touch for $399.99
APPLE IPOD 8GB TOUCH for $399.99
BOSE SoundDock Portable Black Digital Music System for the iPod for $399.99
Apple 16GB iPod touch for $399.99

Sunday, May 11, 2008

Web Application Architecture in 2008 and Beyond

Application Architecture 2008: The More Things Change the More they Stay the Same
This post is about the future of Web Application Architecture in 2008 and beyond. It details some trends in "returning to the basics" regarding the adoption of REST-based services that are happening right now that I believe will lead to companies being able to build solid long-term platforms for service integration and collaboration with external partners and end-users alike.

Back to the Basics, Son
A good architecture should minimize the degree of difficulty for end-users, software components, and external vendors and third-party software to use a system. It should maximize the opportunity for the same people and software to derive value from and contribute value to the system.

The way the existing World Wide Web works, with the simple HTTP protocol and URI standards, provides a good model that companies like Yahoo, Amazon, Google, and Microsoft (as of late) are capitalizing on to build long-term scalability and integration with disparate systems. The architecture of the web is called Representational State Transfer (REST), a term coined by the principal author of the HTTP and URI specifications, Roy Thomas Fielding .

Prerequisite Reading
To fully appreciate this post, I recommend you read a little more about the history of REST and WWW architecture as well as how they compare and contrast to SOAP / RPC models for Web Services and Service Oriented Architecture.

To summarize what you will find in the background reading, here is what REST boils down. Note that I am applying the example HTTP here because HTTP is an example of a REST-based architectural style.

HTTP focuses on finding named resources through the global addressing scheme afforded by the URI standard
HTTP allows applications to GET, PUT, POST, and DELETE to those URIs.
GET retrieves a representation of an resource located at a specific URI
PUT attempts to replace the resource at the given URI with a new copy
REST is not a "buzzword" or a "fad". It is a description of how the WWW actually operates and how it has scaled to the size and success that it has thus far.

Addressability and Uniform Interface are the Keys to Success

As explained in the Wikipedia entry, and alluded to above, addressability and the uniform interface are two of the most crucial features for a REST-based architecture.

I want to compare HTTP's implementation of a REST based architecture with a similar example of SQL.

In HTTP, we have the standard verbs, or methods, GET, POST, PUT, and DELETE.

Similarly, in SQL, we have SELECT, INSERT, UPDATE, DELETE.

Imagine a web site with the following URL:

http://www.mysite.com/Items/ViewItem.aspx?id=123

Here, we have several parts:

http://www.mysite.com = Host address

/Items = Subdirectory off the root

/ViewItem.aspx?id=123 = This breaks down into several parts:

View = verb (action)
Item = noun (resource)
id = parameter name
123 = parameter value

The actual HTTP request for this would look something like:

GET /ViewItem.aspx?id=123

Now, consider an alternative URL:

http://www.mysite.com/Items/123

Here we still have the first two parts the same, but we roll up all the others into the /123. This makes sense because we already have a verb, it's GET. We already know we are looking at the "Items" subdirectory, so that identifies the type of noun that we want already.

Detailed Example of Exposing and Consuming Data

Let's take the example a little further with microformats.

http://www.mysite.com/Items/123

Background Reading

These are some links for understanding Web (REST) architecture.

Real World Business Point of View

Mark Nottingham (principal architect at Yahoo) discussing how Yahoo uses REST-based principles for integrating all of their properties: http://www.infoq.com/presentations/services-without-soap-yahoo

Quick technical review

Overview of REST: http://en.wikipedia.org/wiki/Representational_State_Transfer
Paul Prescod on the REST vs. SOAP debate: http://www.prescod.net/rest/rest_vs_soap_overview/

Technical In-Depth analysis

Pete Lacey on InfoQ about REST and WS-*: http://www.infoq.com/interviews/pete-lacey-rest
My recent book review of RESTful Web Services: http://joshuagough.blogspot.com/2008/02/book-review-restful-web-services.html
Other recent interviews with architects on InfoQ: http://www.infoq.com/rest/
Steve Jones' post "Want to be cool? Learn REST. Want a career? Learn WS": http://service-architecture.blogspot.com/2006/11/want-to-be-cool-learn-rest-want-career.html

Detailed understanding

Roy Fielding's Ph.D. dissertation that introduced the term, "Architectural Styles and the Design of Network-Based Software Architectures": http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
RESTful Web Services book from O'Reilly: http://www.oreilly.com/catalog/9780596529260/

Miscellaneous
These are just some miscellaneous notes.

The more things change the more they stay the same
During the early years of the web we had a few building blocks:

HTTP (GET, POST, PUT, DELETE)
URI (http://www.hostname.com/resource)
HTML (and friends like the blink and marquee tags)
User agents (Netscape, Mosaic, Lynx, MSIE)
CGI and PERL

Later, we had things like:

Cascading Style Sheets
XML
Java
Javascript

A little later than that we got:

PHP
.NET
SOAP based Web Services
XmlHttpRequest

Most recently, we've seen:

Python popularity growth
Ruby
Ruby on Rails with ActiveRecord
Asynchronous Javascript and XML (AJAX)

Things that will never happen

My grandfather Gene Gough worked at IBM for 35 years as a programmer and manager. He sent out a funny email the other day. I don't know if this list is accurate or not, but here is the list posted on line: http://www.rense.com/general81/dw.htm.

Here are three good ones from the list

"Louis Pasteur's theory of germs is ridiculous fiction." -- Pierre Pachet, Professor of Physiology at Toulouse, 1872
"I think there is a world market for maybe five computers." --Thomas Watson, chairman of IBM, 1943

"There is no reason anyone would want a computer in their home." -- Ken Olson, president, chairman, founder of Digital Equipment Corp. 1977.

I worked with the people at CDC's Epi-X (http://www.cdc.gov/epix) program in the past, and I think they'd strongly disagree not just the first one, but with all three.

We are all familiar with the now famous "Technology Adoption Curve", which looks like a standard bell curve. You can read more about this here: http://en.wikipedia.org/wiki/Diffusion_of_innovations

REST Notes

These are links and notes from various sites about REST architectural styles.

http://www.infoq.com/interviews/pete-lacey-rest#

Q: Can you elaborate a little on why you think these would be tightly coupled? I mean everybody is talking about loose coupling and SOAP

A: I miss that one. When I was selling Systinet's products I talked about loosely coupled, when I present information for Burton Group on SOA I certainly stress "Trying to do your best to loosely couple client to server", because obviously loose coupling is a very good idea. The issue and I have said this before in print, is that you can strive for the largest amount of loose coupling possible in the SOAP WS-* world, and when you do the best job possible and when you obey all the best practices, you still end up with a tightly coupled system.

And you know the answer, but for the audience out there the fact of the matter is when you are creating a SOAP web service, a client, you have to know a great deal about that service, whether you learnt it via WSDL or some other mechanism, you have to know a great deal about it, and generally if the service changes in any not even terribly significant way, your client has to change with it or it will simply stop working. And you can go through heroic efforts to keep the existing clients alive and you can try and do your best at the design and development phase to loosely couple them, but the fact that the client has to know all the operation names and all the message formats before it can get any value out of a web service, is tight coupling to me.

Where of course in contrast all I need to know about a RESTful web service is its URI. Now I might not be able to derive all of the business values that the service can offer, but I can "GET" it and maybe something interesting will come from that, and it may be all the information I need. So properly designed RESTful system is dramatically loosely coupled, whereas a properly designed SOAP WS-* based system is unfortunately tightly coupled and all you can do is your best effort to avoid more tightly coupling than necessary.

Saturday, May 10, 2008

RESTful Web Services: Notes

This is continued from a prior post. These are my notes from the book RESTful Web Services that Mike (http://www.slipjig.org) and I have been reading. He is finished. He's already implementing a commercial application using RESTful design and the ASP.NET MVC framework at http://www.aps.net/mvc.

=Chapter 8: REST and ROA Best Practices=
*Expose all interesting nouns as Resources
*Perform all access to Resources through HTTP's uniform interface: GET, PUT, POST, DELETE
*Serve Representations of the Resource, not direct access to it
*Put complexity into Representations not into access methods!

=The Generic ROA Procedure=
*Figure out data set
*Split data set into resources
*Name the resources with URIs
*Expose a subset of the uniform interface
*Design the representation(s) accepted from the client
*Design the representation(s) served to the client
*Integrate the resource into existing resources, using hypermedia links and forms
*Consider the typical course of events: what's supposed to happen? Standard control flows like the Atom Publishing Protocol can help
*Consider error conditions: what might go wrong? Again, standard control flows can help

=Addressability=
*RESTful web services should be highly addressable through thousand or infinitely variable addresses
*By contrast: RPC / SOAP based web services typically expose just one or a few addresses
*URIs = Universal Resource Identifier
*Never make a URI represent more than one resource
*Ideally, each variation of representation should have its own URI (think of a URI with en-us for English, ko-kr for Korean, etc)
*Set Content-Location header to specify the canonical location of a resource
*URIs travel better if they specify both a resource and a representation

=State and Statelessness=
*Resource state: stays on server, sent to client as a representation
*Application state: stays on client, until passed to server to use for Create, Modify, or Delete operations
* Service is "stateless" if server never maintains in memory or disk any application state
** Each request is considered in isolation in terms of resource state
** Client sends all application state to the server with each request, including credentials

=Connectedness, aka Hypermedia as the Engine of Application State=
=Server guides client to paths for state transitions of resources using links and forms
=Links and forms are the "levers of state" transition
=XHTML and XML are good representational formats

=Uniform Interface=
*Resources expose one or more of HTTP's interface
*GET: requests info about a resource: returns headers and a representation
*HEAD : same as GET, but only headers are returned
*PUT: assertion about resource state. Usually a representation is sent to the server and the server tries to adjust the resource state to match the representation
*DELETE: assertion that the resource should be removed.
*POST: attempt to create a new resource from an existing one. Root resource may be a parent resource or a "factory" resource. POST can also be used to append state of an existing resource.
*OPTIONS: attempt to discover which other methods are supported (rarely used)
*You can overload POST if you need another method, but consider first whether you can implement your need by simply designing another resource
*For transactions, consider making them resources as well

=Safety and Idempotence=
*GET or HEAD should be safe: resource state on server should not change as a result
**Server can log, increase view count, but client is not at fault
*PUT or DELETE should be idempotent: making more than one of the same request should have the same effect as one single request.
**Avoid PUT requests that are actually instructions, like "Increment x by 5"
**Instead, PUT specific final values
*POST requests for resource creation are neither safe nor idempotent
** Consider: Post Once Exactly

=New Resources: Put Versus Post=
*PUT can only create new resources when it can calculate the actual URI
*POST can create new resources even when the server decides the new URI
**Ex: /{dbtable}/{itemid}, POST to /{dbtable} and the server returns the new URI

=Overloading POST=
*Can use POST to transform the resource into an RPC-style message processor (think: SOAP web services)
*Use of overloaded POST (for XML-RPC or SOAP) is strongly discouraged by the author.
**Using this breaks the Uniform Interface.
**No longer is the web a collection of well-defined URIs with a uniform interface, instead:
**It becomes a collection of known entry points into a universe of DIFFERING INTERFACES, few compatible with each other
*Legit overloaded POST:
**Work around lack of PUT and DELETE support
**Work around limitations on URI length
***POST http://name/resource?_method=GET and payload contains huge data set.
**Avoid methods in GET URIs: /blog/rebuild-index. This is not idempotent

=This Stuff Matters=
*Principles are not arbitrary
*Advantages: simpler, more interoperable, easier to combine than RPC
*JSG: They have, in fact, revolutionized the world by being so simple and constrained to allow loosely coupled "links" from all over the planet to anywhere else.

=Why Addressability Matters=
*Every interesting noun, or concept, is immediately accessible through one operation on its URI: GET
*URIs provide:
** Unique structured name for each item (you own your own domain name so name.com/item/12345 is always unique)
** Allows bookmarking
** Allows URIs to pass to other apps as input
** Allows for mashups you never imagined (pipes.yahoo.com)
* URIs are like:
** Cell addresses in Excel
** File paths on disk
** JSG: Longitude and Lattitude coordinates
** JSG: XPath queries against XML
** JSG: SQL SELECT statements against relational tables

=Why Statelessness Matters=
*The king of simplifying assumptions!
*Each request contains all application state needed for server to complete request
**No application state on server
**No application state implied by previous request
**Each request evaluated in isolation
*Makes it trivial to scale application up
**Add a load balancer and there is no need for server affinity
*Can scale up until resource (database) access becomes the bottleneck
*JSG This is where COM Interop introduces database latency by forcing connections open longer than they need to be because over marshalling data across process boundaries
*Increases reliability (requests that timeout can simply be requested again)
*Keeping session state can harm scalability and reliability, so use it wisely
** JSG if using a cookie, the cookie can be used to reinstantiate the srver side state for the user no matter what server handles the request

=Why the Uniform Interface Matters=
*Provides a standard way of interaction
*Given http://www.example.com/myresource, you know:
**GET retrieves it
**POST can attempt to append it or place a subordinate resource beneath it
**DELETE can assert that it should be removed

=Why Connectedness Matters=
* Provides a standard way to navigate from link to link and state to state

=Resource Design=
*Need a resource for each "thing" in your service
**Apply to any data object or algorithm
*Three types of resources:
**Predefined one-off resources: static list, object, db table
**Large (maybe infinite) number of resources of individual items: db row
**Large (usually infinite) number of resources representing ouputs of an algorithm: db query, search results
*For difficult situations, the solution is almost always to expose another resource.
**May be more abstract, but that is OK

=Relationships between Resourcs=
*Alice and Bob get married, do you:
**PUT update to Alice and to Bob, or:
**POST new resource to the "marriage" factory resource?
**Answer: you should create a third resource that links to both Alice and Bob
*JSG this leaves you wondering how you navigate the other direction, but this is not any different at all from a relational database table that has a linking table.

=Asynchronous Operations=
*A single HTTP request itself is synchronous
*Not all requests finish and many take a long time
*Use the 202 status code "Accepted"
*Example: ask server to calculate huge result, sever returns:

202 Accepted
Location: http://jobs.example.com/queue/job11a4f9

*This URI identifies the job uniquely for the client to come back to later
*Client can GET the URI for status updates and DELETE it to cancel it or DELETE the results later
*This overcomes the asynchronous "limitation" by using a new resource URI
*Caveat: use POST when you will spawn a new resource asynchronously to avoid breaking idempotency if you were to use GET or PUT

=Batch Operations=
*Factory resources can accept a collection of representations and creates many in response
*Create a "job" in response with a URI for the client to check status, or:
*Use WebDAV extension for 207: multi-status; client needs to look in entity body for a list of codes

=Transactions=
*You can implement them as resources, just like batch operations can be
*Example: financial transaction

1. POST to a transaction factory to get a URI for your transcation (201 Created response)

Request:
POST /transactions/account-transfer

Response:
201 Created

Location: /transactions/account-transfer/11a5

2. PUT new balance for checking account to this URI

PUT /transactions/account-transfer/11a5/accounts/checking

balance=150

3. Then PUT the new value for the savings account:

PUT /transactions/account-transfer/11a5/accounts/savings

balance=250

3. Commit the transaction

PUT /transactions/account-transfer/11a5

committed=true

*The server should make sure the representations make sense (no deleted money, no newly minted money, etc)
*RESTful transactions are more complex to implement, but they have advantages of being addressable, transparent, archived and linked

=When in Doubt, Make it a Resource=
*Anything can be a resource
*Strive to maintain the Uniform Interface

=URI Design=
*URIs should be well-designed and meaningful
*URIs should be "hackable" to increase the "surface area"
*Make it so clients can bookmark just about anything to get right to it
**Don't make clients have to repeat dozens of manual steps to get back to a view of a resource
*Go for general to specific
** Example: /weblogs/myweblog/entries/100
*Use punctuation to separate multiple data inputs at the same level
*Use commas when order matters (eg long and lat)
*Use semi-colons when order doesn't matter: /color-blends/red;blue
*Use query variables only for algorithm inputs
*URIs denote resource, not operations: almost never appropriate to put method names in them
**/object/do-operation is a bad style

=Outgoing Representations=
Bookmark: Page 254

Note to Self: Music and Salsa

So I'm back to technical/work mode in general. Not a lot of time to listen to interesting podcasts or audiobooks right now. But, I am not letting that stop me from doing things I want and need to do for fun.

I am taking piano/keyboard lessons at http://www.l5pmusiccenter.com

And, I'm continuing to take salsa lessons, usually at http://www.takeholdballroom.com and http://www.pasofinopro.com. Check out http://www.atlantagasalsa.com for a full list of salsa events in the area. I'll be taking a cruise to Alaska in August and it is a salsa cruise!

Ultraviolet Consulting Blog

My Favorite Blogs

The Atlanta Mentors Leadership Group

Carola L. Gough Art Gallery in Second Life

Monday, May 26, 2008

Scrum at Google: Ken Schwaber Talk

Review: Scrum and XP from the Trenches

Saturday, May 24, 2008

Getting Real, Release It!, and The Perpetual Beta: Modern Web Apps

Prioritization

Reality

Flexibility

Thursday, May 22, 2008

Querying Google Base using GLinq in C#

Sunday, May 11, 2008

Web Application Architecture in 2008 and Beyond

REST Notes

Saturday, May 10, 2008

RESTful Web Services: Notes

Note to Self: Music and Salsa

About Me

Labels

Blog Archive

My First Book