Velocity Conference 2008
Last edited June 27, 2008
By John B

My notes from the O'Reilly Velocity Web Performance and Operations Conference
http://en.oreilly.com/velocity2008/public/content/home

June 23-24, 2008 at the San Francisco Airport Marriott in Burlingame, California

Day 1 - Green Data Centers

Bill Coleman.  Philanthropist and startup guru (3rd).  CEO of Cassatt Corporation

  • Cassatt
  • currently unsustainable data centers.  energy costs will double in next 5 yrs
  • the cloud
    • needs to become the web's platform
    • cloud 1.0
      • build proprietary tools on proprietary data
    • cloud 2.0
      • better mashups but still proprietary
    • cloud 3.0
      • PCs commoditize  everything below it
    • "active power mgmt"
      • "when you leave the room, turn off the light"
      • you beat complexity by automating it --- set some policies




Day 1 - Keynote Systems Launches KITE

Vik Chaudhary and Abelardo Gonzalez

  • single perf testing env for web devs
  • real-time testing
  • interactively test perf from desktop, last mile, and cloud
  • can test from multiple locations
  • KITE is FREE
Day 1 - Jiffy: OS Perf Measurements

Scott Ruthfield, VP at whitepages.com

  •  you can't manage what you can't measure
  • gomez data, monitors performance over 24 hr period
  • end-to-end system for measuring and reporting on page load activity
  • components
    • js library
    • apache config to log measurements
    • database schema/rollup
    • ingestor to parse
    • firebug plugin (written by Bill Scott @ Netflix)
  • Concept: Mark & measure
    • mark - start timing from this point
    • measure - report elapsed time
  • open source (OS), Apache 2.0 License
  • http://code.whitepages.com
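
The mark & measure concept can be sketched in a few lines. This is a Python illustration only, not Jiffy's API (Jiffy itself is a JavaScript library); the `Timeline` class and its method names are hypothetical. `mark` records a start time under a name, and `measure` reports the time elapsed since that mark.

```python
import time


class Timeline:
    """Hypothetical mark/measure recorder (not Jiffy's actual API)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock       # injectable clock, handy for testing
        self._marks = {}          # mark name -> start timestamp
        self.measurements = []    # (mark name, event name, elapsed seconds)

    def mark(self, name):
        """Start timing from this point."""
        self._marks[name] = self._clock()

    def measure(self, event, mark_name):
        """Report the elapsed time since the named mark."""
        elapsed = self._clock() - self._marks[mark_name]
        self.measurements.append((mark_name, event, elapsed))
        return elapsed


timeline = Timeline()
timeline.mark("page_load")
elapsed_seconds = timeline.measure("dom_ready", "page_load")
```

Several measurements can share one mark, which is how a single page load can report more than one event.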
Day 1 - Keynote

Artur Bergman, Wikia
  • why do users of WoW accept downtime each week, but users of Friendster/Twitter don't?
  • need to set expectations ahead of time
  • business
    • revenue CPM - cost per pageview = gross margin
  • reduced cache hits from 300ms to 190ms, but cache misses take 5sec
    • most logged in users get cache misses, so loyal users creating content see slowest page loading times
  • ads slow down pages
    • overrode document.write
      • function enableWikiaWrtier(adSpaceId){ document.write = writeFake; document.writeln = writelnFake; }
    • we lose money but edits increase
Day 1 - Clouds are no substitute for competence

Javier Soltero, CEO, Hyperic
  • cloudstatus.com:8000
  • first step toward reporting service uptime and availability for AWS, Google App Engine, etc.

Day 1 - Hotmail Tuning Best Practices

  • Perf Best Practices
  • identify your perf bottlenecks & critical paths
  • trim down your page weights up and downstream
  • move your contents closer to your customers
    • edge caching
    • edge computing
    • network routing optimization
  • Trim page wgts (downstream)
    • trim down your features to the core min
    • render most of content on server side
    • trim down image sizes by:
      • min usage
      • image clustering
      • reducing their color palettes
    • delay load, slow down, cap and monitor ads
    • use cache control, exp dates & ETags effectively
    • group your static content into fewer bigger files
    • optimize between inline and stand-alone js and css
    • full postbacks vs. atomic updates using ajax
  • trim page wgt (up)
    • trim cookies by:
      • eliminating them by moving your static content to a different domain
      • optimizing their use
      • moving them away from your root domain and root path "/"
      • compressing cookies (headers), not just bodies
      • grouping multiple smaller files into fewer bigger ones (image clustering)
      • trim down the # of requests and redirects (round trips)


Day 1 - Lessons Learned in Live Search Moving to and then Away from Ajax

Eric Schurman

  • Problem
    • HTML already loaded, but needs to wait for headers and all JS (with extra dependencies) 
  • Stage 1
    • use fewer GETs
    • use fewer bytes
    • fewer serialized actions
  • Stage 2
    • eliminated DNS lookup/TCP connect -- HOW?
  • Today
    • HTML, inlined CSS and JS
    • single image (sprite)
    • after page load, d/l external CSS and JS for upcoming page views
  • build process
    • minify: strip whitespace and comments from JS/CSS
  • measure the right things!
    • no cache scenario is really important
    • use network emulators pre-release (fiddler)

Day 1 - Mobile Browser Performance Factors

  • Blackberry sucks.  Low concurrent connections and only 21% gzip support
  • WinCE/Mobile is 3G, but doesn't fully utilize it.  Look for new 3G iPhone to push this at release in July
  • Tips
    • optimize js
    • reduce DOM elements
    • lazy load components
    • use GET unless you need POST
    • use JSON instead of XML (don't want to walk the XML tree!)
  • http://cloudfour.com/blog
Day 1 - High Perf Ajax

Julien Lecomte
  • plan for perf from day 1
  • less is more
    • don't do anything until it becomes absolutely necessary
  • work on improving perceived performance
    • users can deal with some reasonable amount of slowness if
      • UI remains reactive at all times
      • informed that operation is pending
    • cheat when you can: update the UI first, then do the work
    • but what if the op fails?
    • my solution: store the request and try again; if it still fails, inform the user. Still a problem if the user navigates off the page; you can attach an onunload handler that warns that the request hasn't completed.
  • profile code during dev
  • automate profiling/perf testing
  • keep historical records of how features perform
  • consider keeping some (small amount of) profiling code in production
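
The "update the UI, then do the work" idea and the retry fallback from my notes can be sketched as follows. This is a Python illustration under stated assumptions, not code from the talk: `optimistic_update`, `make_sender`, and all callables passed in are hypothetical, and a network failure is modeled as an `IOError`.

```python
def optimistic_update(apply_ui, send_request, notify_user, max_retries=1):
    """Update the UI first, then do the work; retry before telling the user.

    All three callables are hypothetical and supplied by the caller.
    Returns True if the request eventually succeeded.
    """
    apply_ui()                               # cheat: show the result right away
    for _attempt in range(1 + max_retries):  # first try plus stored retries
        try:
            send_request()
            return True
        except IOError:
            continue                         # keep the request and try again
    notify_user("Could not save your change; please try again.")
    return False


def make_sender(outcomes):
    """Hypothetical test double: each call consumes one outcome.

    Exception instances in `outcomes` are raised; anything else succeeds.
    """
    remaining = list(outcomes)

    def send():
        outcome = remaining.pop(0)
        if isinstance(outcome, Exception):
            raise outcome

    return send
```

For example, `optimistic_update(show_saved, make_sender([IOError("down"), None]), alert)` succeeds on the second attempt without notifying the user (`show_saved` and `alert` being whatever UI callbacks the page has).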
Part 2
Part 3 - High Perf JS
  • look-up : the scope chain
    • never use with keyword
    • always declare with var when possible; avoid global variables and climbing the scope chain
  • how to handle long running JS processes
  • Prototype 2 -- limits # of events you can attach to window.  has new event dispatcher.. find out more
  • consider using the onmousedown event instead of the onclick event
    • get a head start by making use of the small delay between the time a user presses the mouse button and the time s/he releases it
    • see xjs site
Part 4 - Misc tips
  • avoid
    • js for layout
    • IE expressions
    • IE filters
  • use sprites
Part 6
  • Optimize user experience
    • update UI when request gets sent
    • lock UI/data structures with finest possible granularity
    • let user know that something is happening
    • let the user know why a UI object is locked
    • unlock UI
    • handle errors gracefully
  • be aware of max # of concurrent connections
  • multiplex ajax requests whenever possible, if backend supports it -- how?
  • favor json over xml
  • push, don't poll.  use COMET to send real-time notifications to the browser
  • consider using local storage to cache data locally, request a diff from the server
    • IE's userData
    • flash local storage
    • DOM:Storage (WhatWG persistent storage)
    • GoogleGears
    • etc.
Part 7 - Perf Tools
  • YSlow
  • Task manager
  • IE Leak detector aka Drip
  • stopwatch profiling
    • ajaxview
    • jslex
    • YUI profiler
  • Venkman or firebug profiler
Day 1 - Cadillac or Nascar? A Non-religious Investigation of Modern Web Technologies

Akara Sucharitakul (Sun Microsystems Inc.), Shanti Subramanyam (Sun Microsystems Inc.)

Introduction
  •  Client-side
    • AJAX/JSON
  • 3 implementations of a social events app
    • PHP
      • UnixODBC/PDO
      • PECL memcache
      • LocalFS/NFS/Distributed FS
    • Ruby on Rails
      • RoR Framework
      • memcache/ DFS (future)
    • Java EE
      • Servlets, JSPs
      • JPA
      • Whalin memcache
      • LocalFS/NFS/Distributed FS
  • Deployments
    • PHP: AMMP
    • Java EE: JAMM
    • RoR: Mongrel, memcache, MySQL
  • Workload
Performance Results
  • PHP
    • throughput scales linearly
    • network becomes a problem
    • doesn't utilize CPU as it scales
  • JavaEE
    • can scale with single process
    • low memory footprint
    • reduces DB load using cache
  • RoR
    • full func, no caching
    • thin is more efficient than mongrel
    • JRuby is 3-4x better than Ruby 1.8.6
    • On Solaris, ruby in Cool Stack 1.3 gives 40% improvement
  • memcached - thread scaling
    • 4 threads is sweet spot
  • memcached client
    • no standard client libs
    • perf issues are more common on client side
    • PHP clients
      • PECL seems most std
      • many roll their own
    • Java Clients
      • Whalin issues: huge cpu (single-byte read), huge syscalls, overhead of socket-pooling
      • spy issues: single-threaded
  • MySQL 5.1 has a huge perf improvement over 5.0
Suggestions
  • Apache/PHP tuning
    • network stack
      • tune TCP time-wait if handling lots of conn
    • apache
      • do not load modules that you do not need (in httpd.conf)
      • tune ListenBacklog(8192), ServerLimit(2048), MaxClients(2048)
    • php
      • turn off safe_mode if you don't need it
        • safe_mode = off
      • increase realpath_cache_size if you have lots of files
        • realpath_cache_size = 128K
      • use xcache (stability problems? ... participant in audience said he uses it in large scale and works great) or APC
  • Memcached tuning
    • network
      • ensure network processing is distributed across CPUs
      • bind memcached to CPUs not processing interrupts
    • run memcached 1.2.5 with 4 threads (default)
    • use in 64-bit mode for a large cache size
    • run in multi-threaded mode
  • MySQL tuning
    • joins over sub-queries
    • use limits
Conclusions
  • Networking
    • link aggregation solves some short term problems
    • large interrupt load, needs spreading across CPUs
    • 1GbE bottleneck for web apps on modern systems
    • 10GbE immature

Day 1 - Improving Netflix Performance

Bill Scott (Netflix)

  • have 2 week release cycle
  • interested in tracking trends
  • 10% have over 100 movies in queue
  • most have < 10
  • side note: they use bucket tests to release new features
  • improve perf
    • apache config
      • gzip, far future expires, etag
    • images
      • far future expires (harder than it seems)
      • sprites
    • JS/CSS
      • switch to YUI minifier
      • refactor, migrate to jquery
  • http://billwscott.com/jiffyext
Day 1 - LinkedIn Communications Architecture

Sean Dawson (LinkedIn), Ruslan Belkin (LinkedIn)
  • "Don't call me, I'll call you"
  • push updates when an event occurs
  • when a member generates a new event, it is pushed to each of his/her friends
  • partition updates by type
  • scales horizontally, partition by member id
  • reading is much quicker, but incurs additional storage
  • lessons
    • underestimated vol of updates to be processed
runs out of time...slides went too fast.  :(
 
Day 1 - Performance Metrics

John Rauser (John Rauser), Peter Sevcik (NetForecast, Inc.), Eric Goldsmith (AOL, LLC), Eric Schurman (Microsoft), Vik Chaudhary (Keynote Systems, Inc.)
  • When optimizing perf you can't just look at the mean
    • you have to look at the median, percentiles, and outliers
  • If optimizing for the 90th percentile severely degrades the 25th percentile, you might want to think twice
  • How do you explain these numbers?
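
A small worked example shows why the mean alone misleads. The load times below are made-up numbers, not data from the panel, and `percentile` is a simple nearest-rank helper written for this sketch. One slow outlier drags the mean far above what a typical user sees.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical page load times in milliseconds; one request hit a slow path.
load_times_ms = [120, 130, 125, 140, 135, 128, 132, 138, 126, 4000]

summary = {
    "mean": statistics.mean(load_times_ms),
    "median": statistics.median(load_times_ms),
    "p25": percentile(load_times_ms, 25),
    "p90": percentile(load_times_ms, 90),
    "max": max(load_times_ms),
}
```

Here the mean lands above even the 90th percentile, so reporting it alone would describe almost no real request.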
Day 2 - Eucalyptus: Elastic utility computing architecture

Rich Wolski (UCSB)
http://eucalyptus.cs.ucsb.edu/

Intro
  • Commercial cloud
  • How do we run scientific apps in the cloud?
    • current cloud not well suited
  • Open source clouds
    • Nimbus
      • client side cloud computing interface to Globus-enabled TeraPort cluster at U of C
      • based on GT4 and Globus virtual workspace service
    • Enomalism
      • startup company
      • REST APIs
      • user "dashboard"
      • multi-virtualization support
Eucalyptus
  • Intro
    • Web services based implementation of elastic/cloud computing infrastructure
    • linux image hosting ala Amazon
    • interface compatible with EC2
      • works with command line tools from amazon w/o mod
      • enables leverage of emerging EC2 value-added service (e.g. Rightscale)
    • functions as a software overlay
    • "one button" install using Rocks
  • Not industrial strength
    • wants to be the common denominator
    • doesn't focus on the scalability
    • We need a layered approach.  Think plug and play.
  • What's it made out of?
    • axis2 and axis2c
    • hibernate
    • HSQLDB
    • (didn't catch the rest)
Day 2 - How to Accelerate Non-cacheable, Dynamic Sites Leveraging a Globally Distributed Platform

Harald Prokop (Akamai Technologies)
  • Basic CDN for Static sites
  • Dynamic sites
    • content is not cacheable so CDN adds no value -- actually only partly true
  • Problem 1: Route to datacenter may perform poorly
    • Sol: Akamai SureRoute to optimize route
  • Prob 2: Many round trips for initial large download
    • Sol: Akamai Communication Protocol
  • Prob 3: Many round trips for personalized or cold objects
    • Sol: Akamai intelligent prefetching
  • Cache on edge
    • All pieces of say BestBuy.com are cached except for 1 or 2 small pieces that are fetched from the edge
  • Acceleration from US
    • to US: 1.1 s saved
    • to UK/EU: 2.9 s
    • to APAC: 10.5 s
  • Speed matters
    • Under Armour: doubled conversion rate
Day 2 - HTTP Profiling Tools

HTTPWatch
http://httpwatch.com/
  • displays HTTP traffic for IE
  • commercial program ($295 for single user license), but has free version
  • features
    • A plug-in HTTP Viewer for Internet Explorer 6 & 7.
    • See headers, cookies, caching and POST data
    • Detailed information about HTTPS, compression, redirection & chunked encoding
    • Real-time page and request level time charts
    • Your users and customers can send you log files for free
    • Millisecond accurate timings and network level data
    • Use it in automated tests written in C#, Ruby, Javascript, ...
  • blocked time is usually what you want to focus on
    • caused by connection limiting behavior
  • can show what comes from cache bc it's integrated in the browser

Fiddler
http://www.fiddler2.com/fiddler2/
  • freeware web debugger platform
  • runs as local proxy; registers as system proxy while capturing
  • IE, safari, opera use system proxy so you can see their traffic; req some work of changing proxy to see FF traffic
  • downsides of running as proxy
    • can't see requests that don't hit the network
    • but you see everything... lots of traffic
      • you can use 'filters' to see only certain types of traffic
  • fiddler for perf
    • measure request size, page wgt
    • analyze caching, compression, page composition
    • simulate low-speed/high-latency connections
  • can do breakpoint debugging
    • you can modify outbound cookies before they are sent
  • traffic modification
    • redirect requests to a particular datacenter
    • simulate a downed server
  • traffic archives
    • preserve traffic and timings in compressed formats (.SAZ file)
    • great for writing tickets
  • Meddler
    • generates HTTP
  • Perf testing walkthrough

AOL PageTest


Firebug for Profiling
  • sorry no notes, already use this
Day 2 - Success: A Survival Guide

Adam Jacob (HJK Solutions), Shayan Zadeh (Zoosk, Inc.), Brian Moon (dealnews.com), Don MacAskill (SmugMug), John Allspaw (Flickr (Yahoo!)), Michael Halligan (BitPusher, LLC), Farhan Mashraqi (Fotolog)

http://en.oreilly.com/velocity2008/public/schedule/detail/4762


Best scalability story
  • Shayan:
    • launched Zoosk (online dating site) on fb last year before xmas
    • woke up on xmas and had 300k users...ahh!
    • viral product, so must keep running
  • Brian
    • tech blogger on yahoo picked up a deal and got a ton of traffic when it became homepage headline (brought 40k uniques/hr)
  • Don
    • Nadal won the French Open and everyone started posting photos on SmugMug
    • during first win, didn't realize they could get 1000s of comments on a single photo, so didn't build comment system to scale
    • next year stats tracking crumbled site -- logging all people that were viewing the photos
    • 3rd win.. no meltdown!
  • Farhan
    • decided it was a good idea to partition by first letter of username
      • most ppl had names starting in "m" or "s"
What happens weeks after getting techcrunched?
  • Don
    • doesn't want datacenters anymore...ever
    • 600TB on S3
    • EC2
    • have 4 datacenters, building a 5th
    • "the more i can narrow my business to photo sharing the more i can help my customers"

What does it mean to have architecture to scale?
  • John
    • all hard problems involve databases
    • solid state drives, memcached, want to push things to a magical cloud
  • Farhan
    • people try to plan their growth and partition on that, but most either overdevelop or underestimate what growth will be
    • recommend: app should be unaware of partition strategy. app should ask where a user lives and go get it
    • if you have problems bc of foreign key objects, check whether the db offers clustered indexes.  will cut down on disk seeks.
    • HyperDB
  • Brian
    • you start with a normalized db, but you need to denormalize when you start scaling
    • have triggers that hit processing nodes to figure out how to build a new row to add to db as denormalized data
  • Don
    • if you are a brand new startup don't worry about partitioning, sharding
    • it's a balancing act
    • better to launch to all than have a private beta and hope those users are still interested in 6 months
    • good to think about how you could use sharding down the road
  • Shayan
    • we have MVC for front end, and we need and will have MVC for backend soon
Does EC2 become expensive?
  • Don
    • it will be cheaper to have your own servers if you leave all servers on
    • what you want to do is turn instances off when you have low usage
    • most web sites have usage graphs.  you can leverage that to automatically turn instances on/off when needed
    • transfer between EC2 and S3 is free
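
Don's point about following the usage graph can be made concrete with some arithmetic. The sketch below uses invented load figures and a made-up per-instance capacity, and calls no AWS API; `instances_needed` and `schedule` are hypothetical helpers that only compute how many instances each period would need.

```python
import math


def instances_needed(requests_per_sec, capacity_per_instance, minimum=1):
    """Instances required to serve the load, never fewer than `minimum`."""
    return max(minimum, math.ceil(requests_per_sec / capacity_per_instance))


def schedule(load_per_period, capacity_per_instance, minimum=1):
    """Instance count for each period of a usage graph."""
    return [
        instances_needed(load, capacity_per_instance, minimum)
        for load in load_per_period
    ]


# Hypothetical requests/sec over eight equal periods of a day.
load_per_period = [50, 40, 30, 200, 900, 1500, 700, 100]
plan = schedule(load_per_period, capacity_per_instance=250)

always_on_instance_periods = max(plan) * len(plan)  # sized for peak, left on
elastic_instance_periods = sum(plan)                # turned on/off with demand
```

With these numbers, leaving peak capacity on all day costs 48 instance-periods, while following the curve costs 18.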

Side questions
  • Don
    • How do you determine what goes on EC2 and what is on your hardware?
      • EC2 does all photo and vid processing
      • S3 has all photo storage
      • Own datacenters take care of serving front end pages and have dbs that store users, etc.
Day 2 - Even faster web sites

Steve Souders (Google)

Went to other session, but really want to see this.  Will look at presentation notes after conf and make comments here...

  • 14 Rules
    1. Make fewer HTTP requests
    2. Use a CDN
    3. Add an Expires header
    4. Gzip components
    5. Put stylesheets at the top
    6. Put scripts at the bottom
    7. Avoid CSS expressions
    8. Make JS and CSS external
    9. Reduce DNS lookups
    10. Minify JS
    11. Avoid redirects
    12. Remove duplicate scripts
    13. Configure ETags
    14. Make AJAX cacheable
  • Additional Rules
    1. Split the initial payload
    2. Load scripts without blocking
    3. Don't scatter inline scripts
    4. Split dominant domains
    5. Make static content cookie-free
    6. Reduce cookie weight
    7. Minify CSS
    8. Optimize images
    9. Use iframes sparingly
    10. To www or not to www
  • split your JavaScript between what's needed to render the page and everything else
  • load "everything else" after the page is rendered
  • load scripts without blocking



Day 2 - Doloto: Speeding Up Web 2.0 Apps with Dynamic Code Loading

Was talking to Don (SmugMug) and missed this presentation.  Here's what it says on their web site...

Ben Livshits (Microsoft)

Problem:

Modern Web 2.0 applications, such as GMail, Live Maps, Facebook and many others, use a combination of Dynamic HTML, JavaScript and other Web browser technologies commonly referred as AJAX to push page generation and content manipulation to the client web browser. This improves the responsiveness of these network-bound applications, but the shift of application execution from a back-end server to the client also often dramatically increases the amount of code that must first be downloaded to the browser. This creates an unfortunate Catch-22: to create responsive distributed Web 2.0 applications developers move code to the client, but for an application to be responsive, the code must first be transferred there, which takes time.

Solution:

Doloto is a system that analyzes application workloads and automatically performs code splitting of existing large Web 2.0 applications. After being processed by Doloto, an application will initially transfer only the portion of code necessary for application initialization. The rest of the application's code is replaced by short stubs -- their actual function code is transferred lazily in the background or, at the latest, on-demand on first execution. Since code download is interleaved with application execution, users can start interacting with the Web application much sooner, without waiting for the code that implements extra, unused features.
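
The stub idea can be illustrated outside the browser. The sketch below is in Python rather than JavaScript and is not Doloto's implementation; `make_stub` and `load_spellcheck` are hypothetical names. A stub stands in for a function and fetches the real body only on first execution.

```python
def make_stub(loader):
    """Return a stand-in that loads the real function on first call.

    `loader` is a hypothetical zero-argument callable that fetches and
    returns the real implementation (in Doloto's setting, a code download).
    """
    real = None

    def stub(*args, **kwargs):
        nonlocal real
        if real is None:          # first execution: fetch the real body
            real = loader()
        return real(*args, **kwargs)

    return stub


downloads = []                    # records each simulated code download


def load_spellcheck():
    """Simulated on-demand download of a rarely used feature."""
    downloads.append("spellcheck")
    return lambda text: text.replace("teh", "the")


spellcheck = make_stub(load_spellcheck)   # nothing has been downloaded yet
```

Calling `spellcheck("teh web")` triggers the one simulated download; later calls reuse the loaded function.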

Experimental Results:

To demonstrate the effectiveness of Doloto in practice, we have performed experiments on five large widely-used Web 2.0 applications. Doloto reduces the size of initial application code download by hundreds of kilobytes or as much as 50% of the original download size.


http://research.microsoft.com/projects/doloto/
http://research.microsoft.com/research/pubs/view.aspx?tr_id=1402
Day 2 - Image Optimization: How Many of These 7 Mistakes Are You Making

Stoyan Stefanov (Yahoo! Inc)

7 things you should be doing to your images:
Day 2 - Shared Dictionary Compression Over HTTP

Wei-Hsin Lee (Google, Inc. )
  •  open protocol to speed up google and the web
"Shared Dictionary Compression over HTTP protocol (SDCH) aims at reducing data redundancy across HTTP responses. The protocol is meant to work with current schemes (gzip, deflate) to further compress the HTTP responses. This protocol is different from original proposed rfc3229 (differential compression), as it does not require the browser to cache the last version of pages."

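The benefit of a dictionary shared ahead of time can be demonstrated with zlib's preset-dictionary support. This is an analogy only and not the SDCH protocol or its wire format; the page content and helper names are invented for the sketch. Both sides hold the same dictionary, so bytes that repeat across responses do not need to be sent again.

```python
import zlib


def compress_with_dictionary(payload: bytes, dictionary: bytes) -> bytes:
    """Deflate `payload` using a dictionary both sides already hold."""
    compressor = zlib.compressobj(zdict=dictionary)
    return compressor.compress(payload) + compressor.flush()


def decompress_with_dictionary(blob: bytes, dictionary: bytes) -> bytes:
    """Inverse of compress_with_dictionary; needs the same dictionary."""
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()


# Hypothetical boilerplate that appears in every response from a site.
shared = b'<html><head><title>Search results</title></head><body><div class="result">'
page = shared + b"velocity 2008 notes</div></body></html>"

plain = zlib.compress(page)                         # no shared dictionary
with_dict = compress_with_dictionary(page, shared)  # dictionary sent once, earlier
```
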
Day 2 - Building an Automated Infrastructure

Adam Jacob (HJK Solutions)

Went to other session, but really want to see this.  Will look at presentation notes after conf and make comments here...
Day 2 - Building Faster Pages in Firefox and Internet Explorer

Eric Lawrence (Microsoft), Mike Connor (Mozilla Corporation), Christian Stockwell (Microsoft Corporation)

Mike Connor
  • new tools
    • dtrace
    • shark
    • talos
  • new tricks
    • profile-guided optimization (PGO)
    • compiler option archaeology
Eric/Christian
  • XDomainRequest
    • cross domain communication w/o server-side proxy
  • Improved XMLHttpRequest obj, now with timeout attribute
  • Selectors API support
  • DataURI support
    • suitable for use inside cached CSS stylesheets
    • can be misused and not cached on their own
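
For reference, a data URI simply embeds the resource bytes in the URL itself. The sketch below builds one in Python; `to_data_uri` is a hypothetical helper, and the six bytes used are only a GIF header standing in for a real image.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in bytes (a GIF header only, not a complete image).
icon_uri = to_data_uri(b"GIF89a", "image/gif")

# Embedded in a stylesheet, the image is cached along with the CSS file.
css_rule = ".icon { background: url(%s); }" % icon_uri
```

Placed directly in an HTML page instead, the same bytes would be re-sent with every page view, which is the misuse noted above.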
Day 2 - Scaling MySQL-powered Web Sites by Sharding and Replication

Peter Zaitsev (MySQL Performance Blog)

  • challenges
    • page generation layer
      • scale by adding servers
    • storage layer
      • add more hard drives/CDNs
    • database
      • hard to scale
  • DBs
    • mysql cluster
    • mysql proxy
    • kickfire
    • continuent/sequoia
    • bigtable
    • simpledb
  • Growth choices for MySQL
    • start with single instance
      • fast joins, ease of retrieval
    • becomes limited by CPU or disk IO
      • problems internal scaling (issues w/ too many CPU cores)
    • "scale up is limited and expensive"
      • especially for single thread perf
    • simple next choices
      • vertical partition
      • replication
  • Vertical partitioning
    • put forums db on a different MySQL server
      • light joins can be coded in app or use federated tables
      • these tend to grow too large
    • don't do this
  • Replication
    • most apps have read load
      • though most reads are served from memcache
    • using 1 or several slaves to assist with read load
    • replication is asynchronous
    • does not help to scale writes
      • slaves have lower write capacity than master bc slaves apply replicated writes on a single thread
    • slave caches are typically highly duplicated
  • sharding
    • when vertical partition and replication can't help
    • "only" solution for large scale apps
    • can be hard to implement if not designed for...needs planning
    • how?
      • want most queries to be run on same shard
        • good: sharding blogs by user_id
        • bad: sharding by country_id (large portion can come from same country)
    • techniques
      • fixed hash sharding
        • even IDs on A, odd on B
      • data dictionary
        • user 25 has data on server D
        • dict can become bottleneck
      • mixed hashing
    • HiveDB
    • HSCALE
  • caching
    • important and should be done, but only postpones problems.  will eventually need to do sharding and replication
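
The sharding techniques listed above can be sketched in Python. This is an illustration under assumptions, not code from the talk: all names are hypothetical, and `BucketedShards` is only one possible reading of "mixed hashing" (hash IDs into buckets, then look up each bucket's server in a small dictionary).

```python
def fixed_hash_shard(user_id, shards):
    """Fixed hash sharding: the shard is a pure function of the ID."""
    return shards[user_id % len(shards)]


class ShardDictionary:
    """Data dictionary sharding: an explicit user -> server table."""

    def __init__(self):
        self._location = {}

    def assign(self, user_id, server):
        self._location[user_id] = server

    def lookup(self, user_id):
        # Every query pays for this lookup, so the dictionary can become
        # the bottleneck.
        return self._location[user_id]


class BucketedShards:
    """Assumed 'mixed' scheme: hash into buckets, map buckets to servers."""

    def __init__(self, bucket_to_server):
        self._bucket_to_server = list(bucket_to_server)

    def lookup(self, user_id):
        return self._bucket_to_server[user_id % len(self._bucket_to_server)]

    def move_bucket(self, bucket, server):
        """Rebalance by moving one bucket; user IDs keep their bucket."""
        self._bucket_to_server[bucket] = server
```

With `fixed_hash_shard(user_id, ["A", "B"])`, even IDs land on A and odd IDs on B. In all three variants the application only asks where a user lives, so the strategy can change without touching calling code.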

 
Day 2 - Reverse Proxying with Squid and Varnish


  • Edge cache
  • forced caching
    • 1s --> 15s --> 30s (now we cache at 30s and still no one complained..that's good)
  • we need to increase cache hits!
  • more hits
    • ignore if-modified-since and purge
      • explicit purges
      • mediawiki does multicast HTCP
    • accept-encoding
  • mediawiki has cache policy in code
  • doesn't like squid
  • loves varnish
    • a bit unstable
    • segfaults under load (running trunk)
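
Forced caching with explicit purges can be sketched as a small TTL cache. This is a Python illustration of the idea only, not Squid or Varnish configuration; `ForcedTTLCache` and its parameters are hypothetical. Every object is served from cache for a fixed time regardless of revalidation requests, and an edit triggers an explicit purge.

```python
import time


class ForcedTTLCache:
    """Serve cached copies for a fixed TTL; purge explicitly on change."""

    def __init__(self, fetch, ttl_seconds=30, clock=time.monotonic):
        self._fetch = fetch        # hypothetical callable that hits the origin
        self._ttl = ttl_seconds
        self._clock = clock        # injectable clock, handy for testing
        self._entries = {}         # key -> (stored_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1         # served from cache, origin untouched
            return entry[1]
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value

    def purge(self, key):
        """Explicit purge, e.g. when a page is edited."""
        self._entries.pop(key, None)
```

Raising `ttl_seconds` trades freshness for hit rate, which is the 1s to 15s to 30s progression in the notes.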