Saturday, May 10, 2008

RESTful Web Services: Notes

This is continued from a prior post. These are my notes from the book RESTful Web Services, which Mike (http://www.slipjig.org) and I have been reading. He has already finished it and is implementing a commercial application using RESTful design and the ASP.NET MVC framework (http://www.asp.net/mvc).


=Chapter 8: REST and ROA Best Practices=
*Expose all interesting nouns as Resources
*Perform all access to Resources through HTTP's uniform interface: GET, PUT, POST, DELETE
*Serve Representations of the Resource, not direct access to it
*Put complexity into Representations, not into access methods!

=The Generic ROA Procedure=
*Figure out the data set
*Split the data set into resources
*Name the resources with URIs
*Expose a subset of the uniform interface
*Design the representation(s) accepted from the client
*Design the representation(s) served to the client
*Integrate the resource into existing resources, using hypermedia links and forms
*Consider the typical course of events: what's supposed to happen? Standard control flows like the Atom Publishing Protocol can help
*Consider error conditions: what might go wrong? Again, standard control flows can help

=Addressability=
*RESTful web services should be highly addressable, through thousands or even an infinite number of addresses
*By contrast: RPC / SOAP based web services typically expose just one or a few addresses
*URI = Uniform Resource Identifier
*Never make a URI represent more than one resource
*Ideally, each variation of a representation should have its own URI (think of URIs with en-us for English, ko-kr for Korean, etc.)
*Set the Content-Location header to specify the canonical location of a resource
*URIs travel better if they specify both a resource and a representation
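To make the per-representation URI idea concrete, here is a minimal Python sketch. Everything in it is hypothetical (the `/report` resource, the `.en-us`/`.ko-kr` suffix scheme, the `get_report` function). `/report` picks a language from the client's preference, `/report.<lang>` names one representation explicitly, and the Content-Location header carries the URI of the representation actually served.

```python
# Hypothetical resource with one URI per language-specific representation.
REPRESENTATIONS = {
    "en-us": "Quarterly report",
    "ko-kr": "분기 보고서",
}

def get_report(path, accept_language="en-us"):
    """Serve /report (negotiated) or /report.<lang> (explicit).
    Returns (status, headers, body)."""
    if path == "/report":
        # Negotiate: fall back to English if the preferred language is unknown.
        lang = accept_language if accept_language in REPRESENTATIONS else "en-us"
    elif path.startswith("/report."):
        lang = path[len("/report."):]
        if lang not in REPRESENTATIONS:
            return 404, {}, ""
    else:
        return 404, {}, ""
    # Content-Location names the URI of this specific representation.
    headers = {"Content-Location": f"/report.{lang}", "Content-Language": lang}
    return 200, headers, REPRESENTATIONS[lang]
```

A client that bookmarks the Content-Location value gets back the same language next time, whatever its browser settings are.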

=State and Statelessness=
*Resource state: stays on server, sent to client as a representation
*Application state: stays on client, until passed to server to use for Create, Modify, or Delete operations
* A service is "stateless" if the server never maintains any application state, in memory or on disk
** Each request is considered in isolation in terms of resource state
** Client sends all application state to the server with each request, including credentials

=Connectedness, aka Hypermedia as the Engine of Application State=
*Server guides client to paths for state transitions of resources using links and forms
*Links and forms are the "levers of state transition"
*XHTML and XML are good representational formats

=Uniform Interface=
*Resources expose one or more of the methods in HTTP's uniform interface
*GET: requests info about a resource: returns headers and a representation
*HEAD: same as GET, but only headers are returned
*PUT: assertion about resource state. Usually a representation is sent to the server and the server tries to adjust the resource state to match the representation
*DELETE: assertion that the resource should be removed.
*POST: attempt to create a new resource from an existing one. The existing resource may be a parent resource or a "factory" resource. POST can also be used to append to the state of an existing resource.
*OPTIONS: attempt to discover which other methods are supported (rarely used)
*You can overload POST if you need another method, but consider first whether you can implement your need by simply designing another resource
*For transactions, consider making them resources as well
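Here is a rough sketch of what exposing "a subset of the uniform interface" can look like in plain Python, with no web framework. The `Resource` base class, the `do_GET`-style method names and the `Bookmark` resource are my own assumptions. A resource implements only the methods it supports. HEAD is derived from GET, OPTIONS lists the allowed methods, and anything else gets 405 Method Not Allowed with an Allow header.

```python
class Resource:
    """Hypothetical base class: subclasses define do_GET, do_PUT, etc. only
    for the methods they support; HEAD and OPTIONS come for free."""

    METHODS = ("GET", "PUT", "POST", "DELETE")

    def allowed(self):
        allowed = [m for m in self.METHODS if hasattr(self, "do_" + m)]
        if "GET" in allowed:
            allowed.append("HEAD")
        allowed.append("OPTIONS")
        return allowed

    def handle(self, method, body=None):
        """Dispatch one request; returns (status, headers, body)."""
        if method == "OPTIONS":
            return 200, {"Allow": ", ".join(self.allowed())}, ""
        if method == "HEAD" and hasattr(self, "do_GET"):
            status, headers, _ = self.do_GET()
            return status, headers, ""  # same headers as GET, no body
        handler = getattr(self, "do_" + method, None)
        if handler is None:
            return 405, {"Allow": ", ".join(self.allowed())}, ""
        return handler(body) if method in ("PUT", "POST") else handler()


class Bookmark(Resource):
    """Supports GET, PUT and DELETE, but not POST."""

    def __init__(self):
        self.url = None

    def do_GET(self):
        if self.url is None:
            return 404, {}, ""
        return 200, {"Content-Type": "text/plain"}, self.url

    def do_PUT(self, body):
        self.url = body  # PUT asserts the new state of the resource
        return 200, {}, ""

    def do_DELETE(self):
        self.url = None
        return 204, {}, ""
```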

=Safety and Idempotence=
*GET or HEAD should be safe: resource state on server should not change as a result
**The server may log the request or increment a view count; the client is not responsible for those side effects
*PUT or DELETE should be idempotent: making more than one of the same request should have the same effect as one single request.
**Avoid PUT requests that are actually instructions, like "Increment x by 5"
**Instead, PUT specific final values
*POST requests for resource creation are neither safe nor idempotent
** Consider: Post Once Exactly
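Here is a tiny in-memory illustration of the difference. It assumes a hypothetical `/accounts/checking` resource held in a Python dict rather than a real server.

```python
# Hypothetical in-memory resource state: URI -> balance.
accounts = {"/accounts/checking": 100}
view_counts = {}  # server-side bookkeeping, not resource state

def get_balance(uri):
    """Safe: reading never changes the resource. Bumping a view counter is
    the server's own side effect; the client didn't ask for it."""
    view_counts[uri] = view_counts.get(uri, 0) + 1
    return accounts[uri]

def put_balance(uri, balance):
    """Idempotent: PUT asserts a final value, so a repeat changes nothing."""
    accounts[uri] = balance

def apply_increment(uri, amount):
    """An instruction like "increment x by 5". Repeating it changes the
    result, so it must not be exposed as PUT (it could be a POST)."""
    accounts[uri] += amount

put_balance("/accounts/checking", 150)
put_balance("/accounts/checking", 150)       # client retries after a timeout
after_puts = accounts["/accounts/checking"]  # 150, same as a single PUT

apply_increment("/accounts/checking", 5)
apply_increment("/accounts/checking", 5)     # the retry is applied twice
after_increments = accounts["/accounts/checking"]  # 160, not 155
```

Retrying the PUT is harmless, and that is exactly what lets a client resend a request that timed out. Retrying the increment is not harmless.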

=New Resources: Put Versus Post=
*PUT can only create a new resource when the client can calculate its actual URI
*POST can create new resources even when the server decides the new URI
**Ex: /{dbtable}/{itemid}, POST to /{dbtable} and the server returns the new URI
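Here is a sketch of the two creation styles against a hypothetical in-memory table. The `put_item` and `post_item` names and the sequential integer IDs are assumptions, not anything from the book. The PUT caller names the URI itself. The POST caller lets the server choose and reads the new URI back, as it would from a Location header.

```python
import itertools

# Hypothetical in-memory table exposed as /{dbtable}/{itemid}.
items = {}
_next_id = itertools.count(1)

def put_item(table, item_id, data):
    """PUT: the client already knows the full URI, so it can create (201)
    or replace (200) the resource there. Returns (status, uri)."""
    uri = f"/{table}/{item_id}"
    status = 200 if uri in items else 201
    items[uri] = data
    return status, uri

def post_item(table, data):
    """POST to the factory /{table}: the server picks the new item's URI
    and reports it back (the Location header of a 201 Created)."""
    uri = f"/{table}/{next(_next_id)}"
    items[uri] = data
    return 201, uri
```

Repeating the PUT leaves one resource at the same URI. Repeating the POST creates a second resource, which is why POST is not idempotent.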

=Overloading POST=
*Can use POST to transform the resource into an RPC-style message processor (think: SOAP web services)
*Use of overloaded POST (for XML-RPC or SOAP) is strongly discouraged by the authors.
**Using this breaks the Uniform Interface.
**No longer is the web a collection of well-defined URIs with a uniform interface, instead:
**It becomes a collection of known entry points into a universe of DIFFERING INTERFACES, few compatible with each other
*Legit overloaded POST:
**Work around lack of PUT and DELETE support
**Work around limitations on URI length
***POST http://name/resource?_method=GET, with the huge data set in the request body
*Avoid method names in GET URIs, such as /blog/rebuild-index: a GET that rebuilds an index is not safe, because it changes server state
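If a server does honor the `_method` workaround, the check might look something like this sketch. The `_method` parameter follows the example above. Restricting overrides to POST, and to a short whitelist of methods, is my assumption; it is chosen so that a plain GET can never be turned into something unsafe.

```python
from urllib.parse import parse_qs, urlsplit

def effective_method(method, url):
    """Return the method a request stands for. Only POST may be overridden
    via ?_method=...; honoring it on a GET would make GET unsafe."""
    if method != "POST":
        return method
    override = parse_qs(urlsplit(url).query).get("_method")
    if override and override[0].upper() in ("GET", "PUT", "DELETE"):
        return override[0].upper()
    return "POST"
```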

=This Stuff Matters=
*Principles are not arbitrary
*Advantages: simpler, more interoperable, easier to combine than RPC
*JSG: They have, in fact, revolutionized the world by being simple and constrained enough to allow loosely coupled "links" from anywhere on the planet to anywhere else.

=Why Addressability Matters=
*Every interesting noun, or concept, is immediately accessible through one operation on its URI: GET
*URIs provide:
** Unique structured name for each item (you own your domain name, so name.com/item/12345 is always unique)
** Allows bookmarking
** Allows URIs to be passed to other apps as input
** Allows for mashups you never imagined (pipes.yahoo.com)
* URIs are like:
** Cell addresses in Excel
** File paths on disk
** JSG: Longitude and Latitude coordinates
** JSG: XPath queries against XML
** JSG: SQL SELECT statements against relational tables

=Why Statelessness Matters=
*The king of simplifying assumptions!
*Each request contains all application state needed for server to complete request
**No application state on server
**No application state implied by previous request
**Each request evaluated in isolation
*Makes it trivial to scale the application up
**Add a load balancer and there is no need for server affinity
*Can scale up until resource (database) access becomes the bottleneck
*JSG: This is where COM Interop hurts: marshalling data across process boundaries keeps database connections open longer than they need to be, which adds database latency
*Increases reliability (requests that time out can simply be made again)
*Keeping session state can harm scalability and reliability, so use it wisely
** JSG: if using a cookie, the cookie can be used to reinstantiate the server-side state for the user, no matter which server handles the request
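One way to read JSG's cookie remark is a signed cookie that carries the state itself, so any server behind the load balancer can rebuild it without shared session storage. This is a sketch under stated assumptions: a hypothetical `SECRET` shared by all servers and JSON-serializable state, using only the Python standard library. Note that the HMAC signature stops tampering but does not hide the contents.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret shared by every server behind the load balancer.
SECRET = b"shared-secret-for-all-servers"

def make_cookie(state):
    """Serialize the user's state and sign it so the client can't alter it."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def read_cookie(cookie):
    """Any server can rebuild the state from the cookie alone.
    Returns None if the signature doesn't match."""
    payload, _, sig = cookie.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))
```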

=Why the Uniform Interface Matters=
*Provides a standard way of interaction
*Given http://www.example.com/myresource, you know:
**GET retrieves it
**POST can attempt to append to it or place a subordinate resource beneath it
**DELETE can assert that it should be removed

=Why Connectedness Matters=
* Provides a standard way to navigate from link to link and state to state

=Resource Design=
*Need a resource for each "thing" in your service
**Apply to any data object or algorithm
*Three types of resources:
**Predefined one-off resources: static list, object, db table
**Large (maybe infinite) number of resources of individual items: db row
**Large (usually infinite) number of resources representing outputs of an algorithm: db query, search results
*For difficult situations, the solution is almost always to expose another resource.
**May be more abstract, but that is OK

=Relationships between Resources=
*Alice and Bob get married, do you:
**PUT update to Alice and to Bob, or:
**POST new resource to the "marriage" factory resource?
**Answer: you should create a third resource that links to both Alice and Bob
*JSG: this leaves you wondering how to navigate in the other direction (from Alice to her marriage), but it is no different from a relational database that uses a linking table.
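The marriage example can be sketched with a hypothetical `/marriages` factory (all names here are made up for illustration). Neither Alice's nor Bob's resource is modified. The marriage resource links to both of them, and the reverse lookup behaves like a query against a linking table.

```python
import itertools

people = {"/people/alice": {"name": "Alice"}, "/people/bob": {"name": "Bob"}}
marriages = {}
_ids = itertools.count(1)

def post_marriage(spouse_a, spouse_b):
    """POST to the /marriages factory: create a third resource that links
    to both people. Returns (status, new_uri)."""
    uri = f"/marriages/{next(_ids)}"
    marriages[uri] = {"spouses": [spouse_a, spouse_b]}
    return 201, uri

def marriages_of(person_uri):
    """Navigate the other direction, like querying a linking table."""
    return [uri for uri, m in marriages.items() if person_uri in m["spouses"]]
```

A representation of `/people/alice` could then include links to whatever `marriages_of` returns.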

=Asynchronous Operations=
*A single HTTP request itself is synchronous
*Not every request finishes quickly; some take a long time
*Use the 202 status code "Accepted"
*Example: ask the server to calculate a huge result; the server returns:

202 Accepted
Location: http://jobs.example.com/queue/job11a4f9

*This URI identifies the job uniquely for the client to come back to later
*Client can GET the URI for status updates, and DELETE it to cancel the job or to discard the results later
*This overcomes HTTP's synchronous "limitation" by exposing the job as a new resource URI
*Caveat: use POST when a request will spawn a new resource asynchronously; doing this with GET or PUT would break their safety or idempotency
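Below is a minimal in-memory sketch of the 202 pattern. The `jobs.example.com` URI shape comes from the example above. The job store, the `run_pending_jobs` stand-in for a background worker and the toy calculation are assumptions.

```python
import uuid

jobs = {}  # hypothetical job store: URI -> job record

def post_calculation(n):
    """Accept a long-running request; return 202 plus a job URI to poll."""
    uri = f"http://jobs.example.com/queue/job{uuid.uuid4().hex[:6]}"
    jobs[uri] = {"status": "pending", "input": n, "result": None}
    return 202, {"Location": uri}

def run_pending_jobs():
    """Stand-in for a background worker that finishes queued jobs later."""
    for job in jobs.values():
        if job["status"] == "pending":
            job["result"] = sum(range(job["input"]))  # the "huge" calculation
            job["status"] = "done"

def get_job(uri):
    """GET the job resource for a status update (and the result when done)."""
    job = jobs.get(uri)
    return (404, None) if job is None else (200, job)

def delete_job(uri):
    """Cancel a pending job, or discard the results of a finished one."""
    return 204 if jobs.pop(uri, None) is not None else 404
```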

=Batch Operations=
*Factory resources can accept a collection of representations and create many resources in response
*Create a "job" in response, with a URI the client can check for status, or:
*Use the WebDAV extension status code 207 Multi-Status; the client needs to look in the entity body for a list of per-item status codes
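Here is a simplified sketch of a batch POST that answers 207 Multi-Status. Real WebDAV multi-status bodies are XML; here the per-item results are plain Python dicts. The `/contacts` factory and its validation rule are hypothetical.

```python
import itertools

store = {}
_ids = itertools.count(1)

def post_batch(factory, representations):
    """POST a collection of representations to a factory resource.

    Returns 207 Multi-Status plus a body listing each item's own status,
    so the client must inspect the body to learn which items were created.
    """
    results = []
    for rep in representations:
        if "name" not in rep:  # hypothetical validation rule
            results.append({"status": 400, "error": "missing name"})
            continue
        uri = f"{factory}/{next(_ids)}"
        store[uri] = rep
        results.append({"status": 201, "location": uri})
    return 207, results
```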

=Transactions=
*You can implement them as resources, just like batch operations can be
*Example: financial transaction

1. POST to a transaction factory to get a URI for your transaction (201 Created response)

Request:
POST /transactions/account-transfer

Response:
201 Created
Location: /transactions/account-transfer/11a5

2. PUT the new balance for the checking account to a subresource of this URI

PUT /transactions/account-transfer/11a5/accounts/checking

balance=150

3. Then PUT the new value for the savings account:

PUT /transactions/account-transfer/11a5/accounts/savings

balance=250

4. Commit the transaction

PUT /transactions/account-transfer/11a5

committed=true

*The server should make sure the representations make sense (no deleted money, no newly minted money, etc)
*RESTful transactions are more complex to implement, but they have advantages of being addressable, transparent, archived and linked
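The transaction steps can be sketched in Python with everything held in memory. The account names and final balances follow the example. The starting balances, the 409 Conflict response when the sanity check fails, and the absence of any locking or concurrency control are assumptions of this sketch.

```python
import itertools

accounts = {"checking": 200, "savings": 200}  # committed resource state
transactions = {}                             # tx URI -> pending balances
_ids = itertools.count(0x11A5)

def post_transaction():
    """Step 1: POST to the factory; returns 201 and the new transaction URI."""
    uri = f"/transactions/account-transfer/{next(_ids):x}"
    transactions[uri] = {}  # pending balances, not yet visible to anyone
    return 201, uri

def put_balance(tx_uri, account, balance):
    """Steps 2 and 3: PUT {tx_uri}/accounts/{account} with balance=..."""
    transactions[tx_uri][account] = balance
    return 200

def commit(tx_uri):
    """Step 4: PUT committed=true. Apply only if no money is created or
    destroyed; otherwise reject and leave committed state untouched."""
    pending = transactions[tx_uri]
    before = sum(accounts[a] for a in pending)
    after = sum(pending.values())
    if before != after:
        return 409  # Conflict (assumed response code)
    accounts.update(pending)
    del transactions[tx_uri]
    return 200
```

Until the commit, the pending balances live only in the transaction resource, so other clients keep seeing the old committed values.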

=When in Doubt, Make it a Resource=
*Anything can be a resource
*Strive to maintain the Uniform Interface

=URI Design=
*URIs should be well-designed and meaningful
*URIs should be "hackable" to increase the "surface area"
*Make it so clients can bookmark just about anything to get right to it
**Don't make clients have to repeat dozens of manual steps to get back to a view of a resource
*Go for general to specific
** Example: /weblogs/myweblog/entries/100
*Use punctuation to separate multiple data inputs at the same level
*Use commas when order matters (e.g., latitude and longitude)
*Use semi-colons when order doesn't matter: /color-blends/red;blue
*Use query variables only for algorithm inputs
*URIs denote resources, not operations: it is almost never appropriate to put method names in them
**/object/do-operation is a bad style
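Here are a few hypothetical helper functions that follow these URI rules. The `/weblogs` and `/color-blends` shapes come from the notes. `/Earth/{lat},{lon}` and `/search?q=` are made-up examples. Sorting the semicolon-separated inputs, so that one blend has exactly one canonical URI, is my own design choice.

```python
from urllib.parse import quote, urlencode

def entry_uri(weblog, entry_id):
    """General to specific: /weblogs/{weblog}/entries/{id}."""
    return f"/weblogs/{quote(weblog)}/entries/{entry_id}"

def location_uri(lat, lon):
    """Order matters (latitude before longitude), so separate with a comma."""
    return f"/Earth/{lat},{lon}"

def blend_uri(*colors):
    """Order doesn't matter, so use semicolons; sorting yields one canonical URI."""
    return "/color-blends/" + ";".join(sorted(colors))

def search_uri(query):
    """Query variables only for algorithm inputs, such as a search term."""
    return "/search?" + urlencode({"q": query})
```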

=Outgoing Representations=
Bookmark: Page 254
