Weblogs: Javascript
Breaking the Web with hash-bangs
Tuesday, February 08, 2011
Update 10 Feb 2011: Tim Bray has written a much shorter, clearer and less technical explanation of the broken use of hash-bang URLs. I thoroughly recommend reading and referencing it.
Update 11 Feb 2011: Another very insightful (and balanced) response, this one from Ben Ward (Hash, Bang, Wallop.), does a great job of separating the wheat from the chaff.
Lifehacker, along with every other Gawker property, experienced a lengthy site-outage on Monday over a misbehaving piece of JavaScript. Gawker sites were reduced to being an empty homepage layout with zero content, functionality, ads, or even legal disclaimer wording. Every visitor coming through via Google bounced right back out, because all the content was missing.
JavaScript dependent URLs
Gawker, like Twitter before it, built their new site to be totally dependent on JavaScript, even down to the page URLs. The JavaScript failed to load, so no content appeared, and every URL on the page was broken. In terms of site brittleness, Gawker’s new implementation got turned up to 11.
Every URL on Lifehacker now looks like this:
http://lifehacker.com/#!5753509/hello-world-this-is-the-new-lifehacker
Before Monday the URL was almost the same, but without the #!. So what?
Fragment identifiers
The # is a special character in a URL: it marks the rest of the URL as a fragment identifier, so everything after it refers to an HTML element id, or a named anchor in the current page. The current page here is the Lifehacker homepage.
So on Sunday Lifehacker was a 1 million page site; today it's a one page site with 1 million fragment identifiers.
Why? I don't know. Twitter's response when faced with this question on launching "New Twitter" is that Google can index individual tweets. True, but they could do that in the previous proper URL structure before too, with much less overhead.
A solution to a problem
The #!-baked URL (hash-bang) syntax first came into the general web developer spotlight when Google announced a method web developers could use to allow Google to crawl Ajax-dependent websites.
Back then best practice web development wasn’t well known or appreciated, and sites using fancy technology like Ajax to bring in content found themselves not well listed or ranked for relevant keywords, because Googlebot couldn’t find the content they’d hidden behind JavaScript calls.
Although Google spent many laborious hours trying to crack this problem, they eventually admitted defeat and tackled the problem in a different manner. Instead of trying to find this mythical content, let's get website owners to tell us where the content actually is. They produced a specification aimed at doing just that.
In writing about it, Google were careful to stress that web developers should develop sites with progressive enhancement and not rely on JavaScript for its content, noting:
If you’re starting from scratch, one good approach is to build your site’s structure and navigation using only HTML. Then, once you have the site’s pages, links, and content in place, you can spice up the appearance and interface with Ajax. Googlebot will be happy looking at the HTML, while users with modern browsers can enjoy your Ajax bonuses.
So the #! URL syntax was especially geared for sites that got the fundamental web development best practices horribly wrong, and gave them a lifeline to getting their content seen by Googlebot.
And today, this emergency rescue package seems to be regarded as the One True Way of web development by engineers from Facebook, Twitter, and now Lifehacker.
Clean URLs
In Google’s specification, the #!-patterned URLs are called pretty URLs, and they are transformed by Googlebot (and other crawlers supporting Google’s lifeline specification) into something more grotesque.
On Sunday, Lifehacker’s URL scheme looked like this:
http://lifehacker.com/5753509/hello-world-this-is-the-new-lifehacker
Not bad. The 7-digit number in the middle is the only unclean thing about this URL, and Gawker’s content system needs that as a unique identifier to map to the actual article. So it’s a mostly clean URL.
Today, the same piece of content is now addressable via this URL:
http://lifehacker.com/#!5753509/hello-world-this-is-the-new-lifehacker
This is less clean than before. The addition of the #! fundamentally changes the structure of the URL:
- The path /5753509/hello-world-this-is-the-new-lifehacker becomes /
- A new fragment identifier of !5753509/hello-world-this-is-the-new-lifehacker gets added
What does this achieve? Nothing. And the URL mangling doesn’t end there.
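The change in structure can be checked with the standard URL parser that ships with browsers and Node.js. This is only a minimal sketch; the variable names before and after are mine, chosen for illustration:

```javascript
// Parse last week's URL and this week's hash-bang URL with the built-in URL parser.
const before = new URL("http://lifehacker.com/5753509/hello-world-this-is-the-new-lifehacker");
const after = new URL("http://lifehacker.com/#!5753509/hello-world-this-is-the-new-lifehacker");

// The path is what a server gets asked for; the fragment stays in the client.
console.log(before.pathname); // "/5753509/hello-world-this-is-the-new-lifehacker"
console.log(before.hash);     // ""
console.log(after.pathname);  // "/"
console.log(after.hash);      // "#!5753509/hello-world-this-is-the-new-lifehacker"
```

Every hash-bang URL on the site parses to the same path, which is why a server (or a crawler) only ever sees a request for the homepage.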
Google’s specification says that it will transform the hash-bang URL into a query string parameter, so the example URL above becomes:
http://lifehacker.com/?_escaped_fragment_=5753509/hello-world-this-is-the-new-lifehacker
That uglier URL actually returns the content of the article. So this is the canonical reference to this piece of content. This is the content that Google indexes. (This is also the same with Twitter’s hash-bang URLs.)
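As a rough sketch of that crawler-side rewrite, the function below turns a hash-bang URL into its _escaped_fragment_ form. The function name toEscapedFragmentUrl is hypothetical, and the set of characters it percent-escapes is my assumption about what would otherwise break a query string; the real crawler's escaping rules may differ in detail:

```javascript
// Sketch: "#!state" becomes "?_escaped_fragment_=state".
// Assumption: only %, #, &, + and space are percent-escaped here.
function toEscapedFragmentUrl(hashBangUrl) {
  const marker = hashBangUrl.indexOf("#!");
  if (marker === -1) return hashBangUrl; // not a hash-bang URL, leave untouched
  const base = hashBangUrl.slice(0, marker);
  const state = hashBangUrl.slice(marker + 2);
  const escaped = state.replace(/[%#&+ ]/g, (ch) =>
    "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0"));
  // Append to an existing query string if the base URL already has one.
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_escaped_fragment_=" + escaped;
}

console.log(toEscapedFragmentUrl(
  "http://lifehacker.com/#!5753509/hello-world-this-is-the-new-lifehacker"));
// http://lifehacker.com/?_escaped_fragment_=5753509/hello-world-this-is-the-new-lifehacker
```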
This URL scheme looks a lot like:
http://example.com/default.asp?page=about_us
Lifehacker/Gawker have thrown away a decade’s worth of clean URL experience, and ended up with something that actually looks worse than the typical templated Classic ASP site. (How much more FrontPage can you get?)
Clean? Not on your life!
What’s the problem?
The main problem is that LifeHacker URLs now don’t map to actual content. Well, every URL references the LifeHacker homepage. If you are lucky enough to have the JavaScript running successfully, the homepage then triggers off several Ajax requests to render the page, hopefully with the desired content showing up at some point.
Far more complicated than a simple URL, far more error prone, and far more brittle.
So, requesting the URL assigned to a piece of content doesn’t result in the requestor receiving that content. It’s broken by design. LifeHacker is deliberately preventing crawlers from following links on the site towards interesting content. Unless you jump through a hoop invented by Google.
Why is this hoop there?
The why of hash-bang
So why use a hash-bang if it’s an artificial URL, and a URL that needs to be reformatted before it points to a proper URL that actually returns content?
Out of all the reasons, the strongest one is “Because it’s cool”. I said strongest not strong.
Engineers will mutter something about preserving state within an Ajax application. And frankly, that’s a ridiculous reason for breaking URLs like that. The URL of an href can still be a proper addressable reference to content. You are already using JavaScript, so you can do this damage much later with JavaScript using a click handler on the link. The transform between last week’s LifeHacker URL scheme and this week’s hash-bang mangling is trivial to do in JavaScript using a click handler.
At the risk of invoking the wrath of Jamie Zawinski, LifeHacker can keep its mostly clean URL of last week (http://lifehacker.com/5753509/hello-world-this-is-the-new-lifehacker) and obtain the mangled version by this regular expression:
var mangledUrl = this.href.replace(/(\d+)/, "#!$1");
Doing this mangling in JavaScript (during the click handler of the link) means you keep your apparent state benefits, but without needlessly preventing crawlers from traversing your site, and any other non-JavaScript eventuality.
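To make that concrete, here is a minimal sketch of such a click handler. The names toHashBang, enhanceLinks and navigate are hypothetical, and navigate stands in for whatever function the site's Ajax code uses to load a hash-bang state. Note that the regular expression rewrites the first run of digits in the URL, so it assumes the hostname contains none:

```javascript
// Insert "#!" before the article id; \d matches the digits of the id.
function toHashBang(href) {
  return href.replace(/(\d+)/, "#!$1");
}

// Hypothetical wiring: `doc` is the page's document, `navigate` is an assumed
// site-specific function that loads the hash-bang state.
function enhanceLinks(doc, navigate) {
  doc.addEventListener("click", (event) => {
    const link = event.target.closest ? event.target.closest("a[href]") : null;
    if (!link) return;        // not a link click, let the browser handle it
    event.preventDefault();   // only reached when JavaScript is running
    navigate(toHashBang(link.href));
  });
}

// Only attach in a browser; without JavaScript the plain links still work.
if (typeof document !== "undefined") {
  enhanceLinks(document, (url) => { window.location.href = url; });
}
```

The markup keeps the clean href, so crawlers and anything else without JavaScript follow real URLs, and the mangling happens only at click time.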
Disallow all bots (except Googlebot)
All non-browser user-agents (crawlers, aggregators, spiders, indexers) that completely support both HTTP/1.1 and the URL specification (RFC 2396, for example) cannot crawl any Lifehacker or Gawker content. Except Googlebot.
This has ramifications that need to be considered:
- Caching is now broken: since intermediary servers have no canonical representation of content, they are unable to cache it. This results in Lifehacker being perceived as slower. It means Gawker don’t save bandwidth costs through any edge caching of chunks of content, and they are on their own in dealing with spikes of traffic.
- HTTP/1.1 and RFC-2396 compliant crawlers now cannot see anything but an empty homepage shell. This has knock-on effects on the applications and services built on such crawlers and indexers.
- The potential use of Microformats (and upper-case Semantic Web tools) has now dropped substantially - only browser-based aggregators or Google-led aggregators will see any Microformatted data. This removes Lifehacker and other Gawker sites from being used as datasources in Hackdays (rather ironic, really).
- Facebook Like widgets that use page identifiers now need extra work to allow articles to be liked. (by default, since the homepage is the only page referenceable by a non-mangled URL, and all mangled URLs resolve down to being the homepage)
Being dependent on perfect JavaScript
If content cannot be retrieved from a server given its URL, then that site is broken. Gawker have deliberately made the decision to break these URLs. They’ve left their site availability open to all sorts of JavaScript-related errors:
- A failure of JavaScript to load led to a 5 hour outage on all Gawker media properties on Monday. (Yes, Sproutcore and Cappuccino fans, empty divs are not an appropriate fallback.)
- A trailing comma in an array or object literal will cause a JavaScript error in Internet Explorer - for Gawker, this will translate into a complete site-outage for IE users
- A debugging console.log line accidentally left in the source will cause Gawker’s site to fail when the visitor’s browser doesn’t have the developer tools installed and enabled (Firefox, Safari, Internet Explorer)
- Adverts regularly trip up with errors, so Gawker’s site availability is completely in the hands of advert JavaScript. Experienced web developers know that JavaScript from advertisers is some of the worst code out there on the web.
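For the stray console.log case in the list above, a defensive wrapper is one common guard. This is only a sketch in modern JavaScript: safeLog is a hypothetical helper, and host is whatever global object the page runs in (assumed to be passed in by the caller):

```javascript
// Hypothetical helper: log only when the host has a console with a callable log.
// Returns true if the message was logged, false if it was silently dropped.
function safeLog(host, ...args) {
  const c = host && host.console;
  if (c && typeof c.log === "function") {
    c.log(...args);
    return true;
  }
  return false;
}

// Usage: never throws, even where no console object exists.
safeLog(globalThis, "article rendered");
```

A guard like this only removes one failure mode; it does nothing for the case where the script fails to load at all.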
Such brittleness, for no real reason and no benefit that outweighs the downside. There are far better methods than what Gawker adopted; even HTML5’s History API (with appropriate polyfills) would be a better solution.
(If you thought that invalid XHTML delivered with the correct mimetype was not fit for the web, this JavaScript mangled-URLs approach is far worse)
An Architectural Nightmare
Gina Trapani tweets: "Lay down your pitchforks and give @Lifehacker’s redesign a week before you swear it off and insist that the staff doesn’t care about you."
A week won’t solve Gawker’s architectural nightmare.
Gawker/Lifehacker have violated the principle of progressive enhancement, and they paid for it immediately with an extended outage on day one of their new site launch. Every JavaScript hiccup will cause an outage, and directly affect Gawker’s revenue stream and the trust of their audience.
Updates (9th February 2011)
Wow. I (and my VPS) am overwhelmed by the conversation this post has sparked. Thank you for contributing towards a constructive discussion. Some of the posts that caught my eye today:
All of the features that hash-bangs are providing can be done today in a safer, more web-friendly way with HTML5's pushState from the History API. (thanks Kerin Cosford & Dan Sanderson)
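A minimal sketch of that approach, assuming a site-specific loadContent function that fetches and renders an article given its clean URL. The names supportsPushState and enhanceWithPushState are hypothetical; history.pushState and the popstate event are the standard History API:

```javascript
// True when the window-like object offers the History API's pushState.
function supportsPushState(win) {
  return Boolean(win && win.history && typeof win.history.pushState === "function");
}

// Links keep their real URLs in the markup. When pushState is available, clicks
// load content via `loadContent` (assumed) and put the same clean URL in the
// address bar. Without pushState, nothing is attached and plain links work.
function enhanceWithPushState(win, loadContent) {
  if (!supportsPushState(win)) return false;
  win.document.addEventListener("click", (event) => {
    const link = event.target.closest ? event.target.closest("a[href]") : null;
    if (!link) return;
    event.preventDefault();
    win.history.pushState({}, "", link.href);
    loadContent(link.href);
  });
  // Back/forward buttons: load content for whatever URL is now current.
  win.addEventListener("popstate", () => loadContent(win.location.href));
  return true;
}

if (typeof window !== "undefined") {
  enhanceWithPushState(window, (url) => { /* fetch and render `url` here */ });
}
```

Because the address bar always holds a real path, the same URL works for curl, for crawlers, and for browsers with or without JavaScript.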
The Next Web reports that Gawker blogs have disappeared from Google News searches. A Gawker media editor is quoted as saying that they hope to have it resolved soon. They are listed again but using the _escaped_fragment_ form of the URL. So much for clean URLs. Though, the link seems intermittently broken, claiming the URL requested is not available (with a redirect to http://gawker.com/#ERR404).
I did like this tl;dr summary of this post over on theawl.com by mrmcd.
Webmonkey have a summary story, but link off to some very handy resources for clean URL strategies. (I first learnt HTML from Webmonkey back in the previous century)
Phillip Tellis, one of the handful of Yahoos I regret not meeting, blogs some Thoughts on performance, well worth reading. Also highly recommended is warpspire's URL Design.
Danny Thorpe talks about Side effects of hash-bang URLs, including URL Cache equivalence. Oliver Nightingale has a nicely worked example using HTML5's pushState in a progressively enhanced way (great job!)
The very short geeky summary of this post (try curling a Lifehacker article canonical URL):
$ curl http://lifehacker.com//hello-world- \
this-is-the-new-lifehacker | grep "Hello"
$
or as Ben Ward put it: If site content doesn’t load through curl it's broken.
Broken HTTP Referrers
Watching my logfiles I'm seeing a number of inbound links to this post from gawker.com and kotaku.com, all from the homepage (i.e. the fragment identifier is stripped out). So somewhere on those sites there's a discussion going on about my post, and there's no way of finding it thanks to Gawker's use of hash-bang URLs.
@mrjyn
February 19, 2011
Hash-bangs: 10% of me that understands this is pissed off!
Singer Faces 20 Years In Prison for YouTube Prank on Kids
Adrian Chen —
21-year-old Michigan resident Evan Emory currently faces 20 years in prison for "manufacturing child sexual abusive material". His crime: He posted a YouTube video that made it appear he was singing an explicit song to a classroom of elementary students.
Emory tricked administrators at Beechnau Elementary School into letting him perform a song for the kids on video, claiming he wanted to build his portfolio. He sung an innocent song in front of the kids, but when the room was empty recorded a sexually explicit song. ("I like the way you make your body move. C'mon, girl...See how long it takes to make your panties mine...I'll add some foreplay in just to make it fun. I want to stick my index finger in your anus.")
Through trick editing, Emory made it appear that he had been singing the song to the kids while they smiled and laughed along. He included a disclaimer—"No children were exposed to the 'graphic content' of this video"—and posted it on YouTube earlier this week.
On Wednesday, Emory was arrested on charges of manufacturing "child sexual abusive material". Said the county prosecutor:
"The bottom line in this case is that he walked into a classroom and took advantage and victimized every single child in that classroom," Tague said.
"This case is very disturbing to law enforcement officials. We have launched a full-fledged investigation with the sheriff."
At his arraignment, outraged parents of the kids in the video appeared at the courthouse to rally for jail time.
We can understand why the parents and school would be upset. But these are clearly laws designed to punish hardcore sex offenders—not some bro who came up with a misguided idea for a prank. In the end, the video appears to have been online for about a day or two and was probably seen by a few hundred people at most. This is a very broad definition of "victimization!" One law professor says the charges are likely unconstitutional.
As Radley Balko points out, the hysteria is fueled by the volatile combination of children + sex + The Internet. Add to that an overreaction by a humiliated school district. Here's hoping the judge realizes this, too.
Note: The embedded video is another one of Emory's pranks—not the video in question
HTML5 Periodic Table
HTML5 Elements
The table below shows the 104 elements currently in the HTML5 working draft and two proposed elements (marked with an asterisk).
How are they used?
Periodic Table of the Elements
Elements for html5advent.com
Elements shown in the table (a number in parentheses is the figure displayed beside that element for html5advent.com):
html (1), col, table, head (1), span (79), fieldset, form, body (1), h1 (25), section (25), colgroup, tr, title (1), a (216), pre, meter, select, aside, h2 (25), header (1), caption, td, meta (6), rt, dfn, em, i, small (24), ins, hr, br (2), div (86), blockquote, legend, optgroup, address, h3 (21), nav, menu, th, base, rp, abbr, time, b, strong (48), del, s, p (87), ol, dl, label, option, datalist, h4 (3), article (1), command, tbody, link (6), noscript, q, var, sub, mark, kbd, wbr, figcaption, ul (12), dt, input, output, keygen, h5, footer (1), summary, thead, style, script (6), cite, samp, sup, ruby, bdo, code, figure, li (72), dd, textarea, button, progress, h6, hgroup (22), details, tfoot, img (61), area, map, embed, object, param, source, iframe, canvas, track*, audio, video, device*
Element descriptions:
- <section>: Contains elements grouped by theme, for example a chapter or tab box.
- <aside>: Content related to surrounding elements that doesn't belong inline, such as advertising or quotes.
- <rp>: Contains semantically meaningless markup for browsers that don't understand ruby annotations.
- <wbr>: Opportunity for a line break.
- <ruby>: Contains text with annotations, such as pronunciation hints. Commonly used in East Asian text.
- <figure>: Contains elements related to a single concept, such as an illustration or code example.
- <hgroup>: Collection of headings for the current section. The highest ranked heading represents the group in the document outline.
- <track>*: Specifies external timing track for media elements. This element is still being drafted.
- <device>*: Allows scripts to access devices such as a webcam. This element is still being drafted.
Categories:
Root element
Text-level semantics
Forms
Tabular data
Metadata and scripting
Grouping content
Document sections
Interactive elements
Embedding content
Bidding Farewell to National Inventor's Month
September 2, 2010
Thomas Edison's Light Bulb, 1880. Gift of the Department of Engineering, Princeton University, 1961. Photo courtesy of NMAH.
Sadly, summer is whizzing by. August has come and gone, and we have yet to acknowledge National Inventors Month! So happy belated! We bring you the Around the Mall blog team’s “Top Ten Inventions from the National Museum of American History’s Collections.” The museum, after all, is home to the Lemelson Center for the Study of Invention and Innovation, which celebrates National Inventors Month every year.
THE CLASSICS
1. Thomas Edison’s Incandescent Light Bulb
“The Wizard of Menlo Park” has many inventions to his credit (an electric vote recorder, the phonograph, a telephone transmitter), but his most famous was the light bulb. He scribbled more than 40,000 pages full of notes and tested more than 1,600 materials, everything from hairs from a man’s beard to coconut fiber, in his attempts to find the perfect filament. In 1879, he finally landed on carbonized bamboo and created the first modern-looking light bulb: filament, glass bulb, screw base and all. The light bulb was manufactured by Corning, a leader in glass and ceramics for the last 159 years.
2. Alexander Graham Bell’s Large Box Telephone
In its collection, the NMAH has one of two telephones Alexander Graham Bell used to conduct a call from Boston to Salem on November 26, 1876. The system, which worked when sound waves induced a current in electromagnets that was conducted over wires to another telephone where the current produced audible air vibrations, was used commercially starting in 1877.
3. Abraham Lincoln’s Patent Model for a Device for Raising Boats off Sand Bars
As a 40-year-old lawyer in Illinois, Abraham Lincoln designed floats that could be employed alongside a river boat to help it avoid getting caught in shallow waters. He was granted a patent from the U.S. Patent Office on May 22, 1849. The product never came to fruition, but Lincoln remains the only U.S. president to hold a patent.
4. Sewing Machine Patent Model
Though not the first sewing machine, John Bachelder’s version, patented on May 8, 1849, was an improvement on the original. It was rigged with a leather conveyor belt that kept the fabric moving as it was being sewn. The patent was purchased by sewing machine giant I. M. Singer and became part of a pool of patents used to barter the Sewing Machine Combination, a team of three sewing machine manufacturers including the I. M. Singer Co. that propelled the industry forward.
5. Morse Daguerreotype Camera
Perhaps the first camera in the United States, this one made the trip from Paris with its owner Samuel F. B. Morse, inventor of the telegraph. Morse and French artist Louis Daguerre, who invented the daguerreotype process for photography, brainstormed invention ideas together.
(AND SOME SURPRISES…)
6. Magnavox Odyssey Video Game Unit
Months before Pong, the ping-pong game by Atari, overtook the video game scene in 1972, Magnavox Odyssey, the first home video game system, was released. The system merged traditional board games with the new video game concept by incorporating things like dice, paper money and cards. (Watch inventors Ralph Baer and Bill Harrison play a video game here, at the Smithsonian Lemelson Center’s 2009 National Inventors Month celebration.) Success, however, wasn’t in the cards. Fewer than 200,000 units were sold, while Pong sales skyrocketed. Baer went on to invent Simon, the electronic memory game.
The National Museum of American History has, in its collection, an AbioCor Total Artificial Heart, the first-ever electro-hydraulic heart to be implanted in a human. Photo courtesy of NMAH.
7. The Rickenbacker Frying Pan, the First Electric Guitar
Musicians had been experimenting with using electricity to amplify the sound of string instruments for decades, but it was George Beauchamp and Adolph Rickenbacker who built the first commercial electric guitar around 1931. The electric guitar had its critics, who argued that it didn’t create an “authentic” musical sound, but it found its place with the rock and roll genre.
8. AbioCor Total Artificial Heart
Cardiac surgeons Laman Gray and Robert Dowling replaced patient Robert Tools’ diseased heart with an AbioCor Total Artificial Heart on July 2, 2001, at Jewish Hospital in Louisville, Kentucky, making it the first electro-hydraulic heart implanted in a human. The battery-powered heart is capable of pumping more than 2.5 gallons of blood a minute to the lungs and the rest of the body. The invention was in clinical trials at the time of Tools’ surgery. He lived for only five months with the artificial heart, but even that was well beyond the experimental goal of 60 days.
9. Krispy Automatic Ring-King Junior Doughnut Machine
Used by the Krispy Kreme Doughnut Corporation in the 1950s and ’60s, the Ring-King Junior could spit out about 720 doughnuts an hour! The miraculous machine and other Krispy Kreme artifacts were donated to the museum in 1997 on the 60th anniversary of the doughnut maker.
10. And last but not least, The World’s First Frozen Margarita Machine
As we savor the last days of summer, this one had to make the list. In 2005, the museum acquired the first-ever frozen margarita machine, invented by Dallas restaurateur Mariano Martinez in 1971. Museum director Brent Glass called the invention a “classic example of the American entrepreneurial spirit.” With the advent of the machine, margaritas became as standard as chips and salsa at Tex-Mex restaurants. (Next time I have one, I shall toast Mariano!)
What’s your favorite invention represented in the museum’s collections?
Update: This post has been updated to clarify that this list reflects the editorial whims of the Around the Mall blog team and is not an official ranking created by the National Museum of American History.
Posted By: Megan Gambino — American History Museum
The music dies for once popular 'Guitar Hero' video game - CNN.com
February 9, 2011 8:36 p.m. EST
"Guitar Hero" was once credited with reviving teens' interest in rock. Now the video game will be discontinued.
STORY HIGHLIGHTS
- "Guitar Hero" will be discontinued this year, its publisher says
- The video game maker cites "continued declines" in rock
Los Angeles (CNN) -- The video game "Guitar Hero," believed a few years ago to be helping revive rock in the hip-hop era, will jam no longer.
Activision Blizzard Inc. announced Wednesday it will cease publishing the game this year.
"Due to continued declines in the music genre, the company will disband Activision Publishing's 'Guitar Hero' business unit and discontinue development on its 'Guitar Hero' game for 2011," the company said in a statement.
The decision was "based on the desire to focus on the greatest opportunities that the company currently has to create the world's best interactive entertainment experiences," the firm said.
The cancellation came as the Santa Monica, California-based company announced a record operating cash flow of $1.4 billion in 2010. Its revenues last year from digital channels grew more than 20 percent to $1.5 billion.
The once-popular video game was even featured in the Vince Vaughn film "Couples Retreat" in 2009.
As rock struggled against rap music, video games like "Guitar Hero" and "Rock Band" were credited with creating a new appreciation for rock 'n' roll among the millennial generation born in the '90s who didn't know much about Aerosmith, Pat Benatar and other musicians of the '70s and '80s.
At the time, Geoff Mayfield, senior analyst and director of charts for Billboard magazine, said he saw a direct cause and effect for some of the artists who licensed their songs to "Guitar Hero."
"A few weeks ago, when the game featuring Aerosmith ('Guitar Hero: Aerosmith') came out, there was more than a 40 percent increase in their catalog sales," Mayfield said in 2008.
The video games even increased interest in guitars, according to the nationwide Guitar Center chain. Bars held "Guitar Hero" nights. And schools like Roosevelt High in Los Angeles, where most teens have grown up on a steady diet of hip-hop and R&B, sponsored a three-day "Guitar Hero Face-Off" in its auditorium, blaring heavy metal.
For now, the art of playing air guitar -- and "Guitar Hero" -- seems a quaint bygone.