Last November John Markoff of the New York Times published a very interesting article about the new tools and technologies that will drive Web 3.0. Apart from being of general interest, this matters because there are claims that within these new technologies there is a “Google Killing” application. Initial reaction across blogs was fairly negative, with an eclectic but interesting set of observations.

So, rather than resort to a quick and merciless blog entry, I thought it more interesting to dig deeper and get past the “label veneer.” This ended up being an odyssey of link-following, and of generally learning more about some obscure and complicated new technologies ranging from OWL and ontologies to SPARQL. In summary, they are technologies that provide a foundation for the categorization, interaction, and delivery of information available via the web.

CONTEXT
THE 3R Hypothesis, A GENUINE UPGRADE?
CHALLENGES
Web 3.0 - JUST NOT THIS ONE!
EXISTING TECHNOLOGIES
RECOMMENDATIONS
WRAP UP
DEFINITIONS

CONTEXT

There is currently a lot of discussion about the wave of open source computing, peering, and how the world is “flat”. Arguably, the search engine space is a major arbiter of “flatness.” Contrary to this “flatness” theory, however, is the fact that there is an exponential growth of data, which makes the process of finding the right answers problematic. When everyone contributes, that’s a lot of data!

The semantic web, or Web 3.0, by definition suggests the World Wide Web is a jungle that needs to be tamed and sorted into a more organized garden. A better analogy may be that the web provides instantaneous access to anything in the world; but, as with mining for diamonds, if you don’t have the tools or the right location you won’t find what you’re looking for.

To combat this issue, there are a number of start-up companies with very decent funding that are hyping Web 3.0 technologies. Their emphasis is on W3C standards that will potentially make content on the web more “meaningful”. Meaningful in this sense means that the words and content on any particular site are tagged and linked to a contextual model (ontology) that determines their relationship and meaning within a particular “class.”

FOAF (Friend of a Friend) is an example of a popular new way to represent information about people and the connections between them. It’s an open source project that uses RDF/OWL tools to enable better identification of, and linkages between, people. This is a well explained and practical example of the thinking in the Web 3.0 space. Because FOAF is by its nature structured data, using the right query technique will yield the exact result, so it’s closer to a database style query. So if web site developers adopt FOAF tools, every reference to a person would follow a standard that structured queries can readily access.
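
To make the “database style query” idea concrete, here is a minimal Python sketch. It is only an illustration: the three people, their e-mail addresses and the helper function are invented for this example, and plain dictionaries stand in for a real RDF store. Only the property names (foaf:name, foaf:mbox, foaf:knows) are borrowed from the FOAF vocabulary.

```python
# Hypothetical FOAF-style records. The people and addresses are invented;
# a real FOAF file would be RDF, not a Python dictionary.
PEOPLE = {
    "alice": {
        "foaf:name": "Alice Example",
        "foaf:mbox": "mailto:alice@example.org",
        "foaf:knows": ["bob"],
    },
    "bob": {
        "foaf:name": "Bob Example",
        "foaf:mbox": "mailto:bob@example.org",
        "foaf:knows": ["alice", "carol"],
    },
    "carol": {
        "foaf:name": "Carol Example",
        "foaf:mbox": "mailto:carol@example.org",
        "foaf:knows": ["bob"],
    },
}


def friends_of_friends(people, person_id):
    """Return ids reachable in exactly two 'knows' hops, excluding
    the person and the people they already know directly."""
    direct = set(people[person_id]["foaf:knows"])
    second_hop = set()
    for friend in direct:
        second_hop.update(people[friend]["foaf:knows"])
    return second_hop - direct - {person_id}


if __name__ == "__main__":
    for pid in sorted(friends_of_friends(PEOPLE, "alice")):
        print(PEOPLE[pid]["foaf:name"])
```

Because every record uses the same property names, the question “who are Alice’s friends of friends?” has one exact answer rather than a ranked list of guesses, which is the contrast with keyword search drawn above.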

So when such sites are interrogated by Artificial Intelligence based search engines, or just plain old “machines,” powerful new services can be created. The proponents of this next big thing (NBT) go so far as to suggest that artificial intelligence engines will then be able to start learning at an exponential rate because they will have a structured web as their playing ground.

Nova Spivack, a web entrepreneur and leading light of the Web 3.0 movement, speaks eloquently of this utopian learning model and its implications. His background includes working for Raymond Kurzweil, whose idea of a Technological Singularity is well worth a diversionary read. The key theory behind this, the Law of Accelerating Returns (an extension of Moore’s Law), is very thought provoking (I recommend you check it out). The common thread between Spivack’s thinking and Kurzweil’s could be that when a system can learn by itself from the WWW knowledge base, this becomes one of the fundamental paradigm shifts along the road to the major singularity that Kurzweil discusses. However, my feeling is that this may well suffer from being MTBT (Much Too Big a Thing) for anyone to really buy into.

THE 3R Hypothesis, A GENUINE UPGRADE?

Thus we have the following landscape evolving. It’s fairly clear that Web 1.0 was mostly about one-way content publishing: the “read” web. Web 2.0 is now generally agreed to be a phase of offerings that allow participative sharing of content and expression: the “read and write” web.

Web 3.0, or the semantic web, seems to be about the third “R”, a form of grown up reasoning capability emanating from the integration of smart questioning machines and well formatted data. Note that I use the term reasoning as opposed to meaning because the evidence to date suggests that interpretation of content by artificial applications currently IS a form of reasoning, i.e. an interpretation of what is being scanned, indexed or viewed.

We could call this the 3R Hypothesis - a kind of “grown up” or more matured web. It should be recognized that there is a great deal of very interesting work going on in this space as well as venture capital funding. Some of the companies are quite secretive – examples include Radar Networks, ZoomInfo and Metaweb Technologies.

CHALLENGES WITH THIS HYPOTHESIS

Even so, for the 3R Hypothesis to come to pass, there are a number of dependencies and problems that must be overcome.

The wild web will need to play “ball” with prescribed standards. Prescriptive implementation is required. Most blog critics around this topic believe that the web really is wild and that common content/tagging standards will never come to pass (i.e. RDF and SPARQL like adoption). I agree with this, but since a lot of the current work is focused on text interpretation, this is a potentially spurious argument. If you can read, why worry about tags?

Data will need to be available and accurate enough to enable the new technologies to answer the questions they are asked to answer. Open access will be required. Today most sites want to be found by search engines; arguably this is not going to be true in the future. They may want to be found but keep their true content to themselves to avoid aggregators stealing their advertising dollars. As will be argued later, everyone is into peer technologies today, but tomorrow may be different.

AI software must be capable of delivering the goods. The web is no different from systems in general, and unfortunately systems actually aren’t all that smart. The best AI applications are very niche in nature (chess, risk, weather) and even then they struggle to solve complex problems. This is compounded by the data itself. Who decides what content is true and accurate in a world where content is created by anyone with any purpose? There is a long way to go before the accuracy of a fact can be determined by the number of instances of a statement on different web pages.

Other innovative drivers must not drown out this vision of the future. It’s interesting to think about what “it” is that is being searched. A novel perspective on this could be a world where anything of importance has an IP address (because network connectivity is ubiquitous and power is always on). Network presence or physical location becomes more tangible than data as everything becomes network aware. Entities that have a network address will store, selectively publish and validate information about themselves.

A likely example would be automobiles that reveal their location and status to other systems. FOAF is in a way a forerunner of this concept. In the future state one has to believe that entirely new modes of search, discovery and meaning will be developed around more tangible entities. The web page is merely the gateway to more secure internal systems for physical entities.

Web 3.0 WILL COME – JUST NOT THIS ONE!

Based on this line of reasoning it’s very unclear that the 3R version of the Web is the next major milestone. This doesn’t mean that it isn’t valuable to think through the implications, or to examine what the evolution of the web might mean in practical terms for businesses and individuals. We shouldn’t, in fact, overlook the obvious, i.e. that search is still the hottest category online.

EXISTING TECHNOLOGIES, GOOD OL’ SEARCH IS STILL IMPORTANT

Regardless of the next phase of the web, search engines still play a dramatic role in our digital activities. According to Comscore Media Metrix (FYI, this is a fantastic service for understanding real traffic patterns online):

• 94% of all online Canadians went to a search engine in December 2006.
• 84% of Canadians visited Google.
• CMM shows some 294 search engine sites that register in their research.

This is a lot of competition. Note that they all have their own indexes, databases and value propositions. It’s important, therefore, not to miss the forest for the trees. Rather than looking to the Next Big Thing (NBT), look to Darwin and the survival of the fittest. Web businesses have succeeded first and foremost by capturing visitors. Watching who wins this battle may be the best way to spot early paradigm shifts in technology.

RECOMMENDATIONS

A number of critical points are therefore worth considering for channel owners or serious web site operators.

The numbers don’t lie. SEO should be driving at least 50% of your thinking around traffic generation and channel growth. Even more importantly, search engines have the data on what people want to find, what they know and what they buy, and thus may become the ultimate research companies in the future (see, for example, Google Trends).

Google has nowhere else to go. It has saturated the search market in terms of traffic. Look at the other search engines, which are all working on their version of the NBT. Yes, Google has a huge share, but 30% of the online population do visit other SEs.

Google’s other problem is that more accurate search punishes its bread and butter business: laser sharp results would mean fewer click-through fees from advertisements. So get to know the others better.

Text and content probably will start to matter. The 3R style web and aggressive search engines could start to adopt agents that understand sentences as “subject-verb-object” combinations. Being descriptive matters. It might help people better understand your value proposition anyway!

Consider more carefully what to publish in front of the firewall. If your model relies on traffic to your site others may monetize your content before you, using better Search technology. Worse, you may find yourself in poor company as part of typical search results. Aggregators are getting very clever. See Yahoo’s latest technical tour de force, Pipes (this is pretty cool FYI).

Consider RIA (Rich Internet Applications). The newer more powerful development languages contain more structured tagging for content which makes them inherently better suited for SEO (think RDF/OWL – see acronym table). Not only will you be delivering more interactive powerful on line experiences, the content will be made more relevant to SE’s.

WRAP UP

It is inevitable that the web as it grows older will go through major gyrations of what is “hot,” ie the NBT syndrome. The 3R stage is probably overly simplistic. It’s easy to see how some could think that all the world’s knowledge is simply a hyperlink away and thus attainable for commercial or intellectual advantage. But in fact this could be like the space program – well intentioned with many beneficial side technologies developed along the way.

It’s clear that already social participation and rich media (YouTube) have created another dimension of content that needs to be searched. It is quite likely that new rich media and, as earlier discussed, physical entities themselves will be invented or added that create yet another indexing challenge. Thus content types may change faster than search engine technologies can be developed to sort through them.

Consider something else: what if today’s open source and “peering” is in essence really a powerful meme (think social trend)? Blogging is a huge trend in self expression, but how long will it last? The web has been described by some as the ultimate “meme vector.” It is also a meme accelerator. If it’s a good idea it really gets around fast. This means that the current “participation” meme will change or evolve, as with all trends. What if tomorrow communities close up shop and withdraw support for “openness” and a new tightly knit meme takes over?

Broad based search models could be in for difficult times; and the Tipping Point could in fact tip the other way. The point about really keeping a close eye on the numbers and where WWW traffic is going becomes critical in this context; it will tell us of such changes in advance.

Finally, it’s important not to confuse technology trends with memes, social trends, or simply commercial self interest. I believe that overly grandiose, indeed revolutionary, labels such as “Wikinomics” and the “World is Flat” could be describing OURSELVES rather than the underlying enablers of the web. There is nothing wrong with this so long as it’s recognized for what it is. One danger is that perfectly valuable technical ideas and tools do not get funding because they are out of step with whatever trend prevails today.

Open source software has the benefit of many minds contributing to it, but it doesn’t necessarily create the most advanced or the most stable technology from a software platform perspective. Yes, peer contributions can play a very important role in the ultimate development of more advanced technology, but this is a function of how much the contributors choose to contribute.

So, it’s more likely that Web 3.0 will turn out to be a label based on people and how they want to interact than on the latest technical functional developments. Since we have a very hard time figuring ourselves out, knowing what the next upgrade will be could be tricky. One thing is for certain: when it changes you’ll see it first on the web!

Miles Faulkner
Principal, Faulkner Consulting
Toronto

ONE MORE ASIDE

A funny and slightly dark aside was recently made by the principal academic behind “KnowItAll,” a program at the University of Washington that has an online service called TextRunner (yes, knowing it all does seem to be on their long term agenda!). Oren Etzioni commented that his TextRunner application has “Attention Deficit Syndrome” in that it often got caught up in the random nonsense of single “facts” in the online universe, and that he needed to teach it “focus”. Sounds just a bit like a science fiction plot, don’t you think?

DEFINITIONS

MEME (http://en.wikipedia.org/wiki/Meme)
Refers to a unit of cultural information transferable from one mind to another. Examples of memes are tunes, catch-phrases, beliefs, clothes fashions, ways of making pots or of building arches. A meme propagates itself as a unit of cultural evolution and diffusion, analogous in many ways to the behavior of the gene (the unit of genetic information). Often memes propagate as more-or-less integrated cooperative sets or groups, referred to as memeplexes or meme-complexes. In this context open source collaboration is surely a typical example of a meme: a belief in sharing.

OWL / Web Ontology Language
OWL is a prescriptive language used in the markup of web content. It defines the concepts, attributes and relationships that exist within a particular group or community. For instance, the word “drive” is defined differently depending on which class or community is using it; it has a clearly different meaning when used in reference to automobiles, computers, hockey or psychology.

RDF / Resource Description Framework
A family of World Wide Web Consortium (W3C) specifications which has come to be used as a general method of modeling information, through a variety of syntax formats. The RDF model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For instance, one way to represent the notion “The sky has the color blue” in RDF is as a triple of specially formatted strings: a subject denoting “the sky”, a predicate denoting “has the color”, and an object denoting “blue”.
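
The sky example can be written down as data in a few lines of Python. This is a sketch under assumptions: the ex: prefixed identifiers are made-up stand-ins for the URIs a real RDF document would use, and a plain Python set stands in for an RDF graph.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


# "The sky has the color blue", plus one more statement for contrast.
# The ex: names are hypothetical placeholders, not a published vocabulary.
GRAPH = {
    Triple("ex:sky", "ex:hasColor", "ex:blue"),
    Triple("ex:grass", "ex:hasColor", "ex:green"),
}


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {
        t.obj
        for t in graph
        if t.subject == subject and t.predicate == predicate
    }


if __name__ == "__main__":
    print(objects(GRAPH, "ex:sky", "ex:hasColor"))
```

The point of the triple form is that every statement has the same three-slot shape, so a program can look facts up by position instead of parsing sentences.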

SPARQL / SPARQL Protocol and RDF Query Language
Modeled loosely after SQL, the query language SPARQL is emerging as the de facto RDF query language. On track to become a W3C Recommendation, it was released as a Candidate Recommendation in April 2006.
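
The core idea behind a SPARQL query, matching a triple pattern that contains variables against stored triples, can be imitated in plain Python. The sketch below is not SPARQL and not any library’s API; the match function, the data and the ex: names are all invented for illustration, and it handles only a single triple pattern.

```python
# Hypothetical data: (subject, predicate, object) tuples with made-up ex: names.
GRAPH = [
    ("ex:sky", "ex:hasColor", "ex:blue"),
    ("ex:sea", "ex:hasColor", "ex:blue"),
    ("ex:grass", "ex:hasColor", "ex:green"),
]


def match(graph, pattern):
    """Return one {variable: value} dict per triple that fits the pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly the question a SPARQL engine would answer for
#   SELECT ?thing WHERE { ?thing ex:hasColor ex:blue }
# (given a suitable PREFIX declaration for ex:).
if __name__ == "__main__":
    for row in match(GRAPH, ("?thing", "ex:hasColor", "ex:blue")):
        print(row["?thing"])
```

The resemblance to SQL is in the shape of the request: name the variables you want, state the conditions they must satisfy, and get back rows of bindings.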

SEO / Search Engine Optimization
The process of structuring web content in such a way that search engines find, index and return content in a rational manner.
