Schema.org – Threat or Opportunity?

What exactly is Schema.org?

  • It is a list of instructions for adding structured data to HTML pages.
  • Webmasters can choose from a long, but finite list of types and properties.
  • Data-enhanced web pages trigger richer displays in Google/Bing/Yahoo search result pages.
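
To make the "structured data in HTML" idea a bit more concrete, here is a minimal sketch of what such markup looks like and how a consumer might read it. The page fragment is made up; `Person`, `name`, `jobTitle` and `url` are schema.org terms. The parser is a hypothetical helper built on Python's standard library, not anything the search engines have published, and it only handles flat items (no nested `itemscope`, no child elements inside a property).

```python
from html.parser import HTMLParser

# Hypothetical page fragment using schema.org's Person type.
HTML = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>,
  <span itemprop="jobTitle">Professor</span>
  <a itemprop="url" href="http://example.com/jane">Homepage</a>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collects the properties of flat (non-nested) Microdata items."""

    def __init__(self):
        super().__init__()
        self.items = []     # one dict per itemscope element
        self._prop = None   # itemprop whose text content is being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.items.append({"type": attrs.get("itemtype"), "properties": {}})
            return
        if "itemprop" in attrs and self.items:
            name = attrs["itemprop"]
            if tag in ("a", "link") and "href" in attrs:
                # URL-valued properties take their value from href
                self.items[-1]["properties"][name] = attrs["href"]
            else:
                self._prop, self._text = name, []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first end tag closes the current property
        if self._prop is not None:
            self.items[-1]["properties"][self._prop] = "".join(self._text).strip()
            self._prop = None


def extract_items(html):
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


items = extract_items(HTML)
print(items)
```

A real consumer would also have to deal with nested items, `itemref`, `itemid` and the various value-carrying attributes (`content`, `src`, `datetime`), which this sketch deliberately ignores.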

Why the uproar?

  • Schema.org proposes the use of Microdata, a rather new RDF format that was not developed by the RDF community.
  • Schema.org introduces a new vocabulary which doesn’t re-use terms from existing RDF schemas.

Who can benefit from it?

  • The web, because the simple template-like instructions on schema.org will boost the amount of structured data, similar to Facebook’s Open Graph Protocol.
  • The semantic web market, by offering complementing as well as extending/competing solutions.
  • SEO people, because they can offer their service with less effort.
  • Website owners, who can more reliably customize their search engine displays and increase CTRs.
  • Possibly HTML5 (doctype) deployment, because the supported structures are based on HTML5’s Microdata.
  • Verticals around popular topics (Music, Food, …) because the format shakeout will make their parser writers’ lives easier.
  • Verticals who manage to successfully establish a schema.org extension (e.g. Job Offers).
  • The search engine companies involved, because extracting (known) structures can be less expensive and more accurate than NLP and statistical analysis. Controlling the vocabulary also means being able to tailor it to semantic advertising needs; integrating the schema.org taxonomy with AdWords would make a lot of (business) sense. And finally, the search engines can more easily generate their own verticals now (as Google has already successfully done with shopping and recipe browsers), making it harder for specialized aggregators to gain market share.
  • Spammers, unless the search engines manage to integrate the structured markup with their existing stats-based anti-spam algorithms.

Who might be threatened and how could they respond?

  • Microformats and overlapping RDF vocabularies such as FOAF or GoodRelations, which Schema.org already calls “earlier work”. Even if they continue to be supported for the time being, implementers will switch to schema.org vocabulary terms. One opportunity for RDF schema providers lies in grounding their terms in the schema.org taxonomy and highlighting use cases beyond the simple SEO/Ad objectives of Schema.org. RDF vocabs excel in the long tail, and there are many opportunities left (especially for non-motorcycle businesses ;-). This will work out best if applications finally emerge that utilize these advanced data structures. If the main consumers continue to be search engines, there is little incentive to invest in higher granularity.
  • The RDFa community. They think they are under attack here, and I wonder if Manu is overreacting perhaps? Hey, if they had listened to me they wouldn’t have this problem now, but they had several reasons to stick to their approach and I don’t think these arguments get simply wiped away by Schema.org. They may have to spend some energy now on keeping Facebook on board, but there are enough other RDFa adopters that they shouldn’t be worried too much. And, like the RDF vocab providers, they should highlight use cases beyond SEO. The good news is that potential spam problems, which are more likely to occur in the SEO context, will now get associated with Microdata, not RDFa. And the Schema.org graph can be manipulated by any site owner while Facebook’s interest graph is built by authenticated users. Maybe the RDFa community shouldn’t have taken the SEO train in the first place anyway. Now Schema.org simply stole the steam. After all, one possible future of the semantic web was to creatively destroy centralized search engines, and not to suck up to them. So maybe Schema.org can be interpreted as a kick in the back to get back on track.
  • The general RDF community, but unnecessarily so. RDFers kicked off a global movement which they can be proud of, but they will have to accept that they no longer dictate what the semantic web is going to look like. Schema.org seems to be a syntax fight, but Microdata maps nicely to RDF, which RDFers often ignore (that’s why schema.rdfs.org was so easy to set up). The real wakeup call is less obvious. I’m sure that until now, many RDFers didn’t notice that a core RDF principle is dying. RDFers used to think that distinct identifiers for pages and their topics are needed. This assumption was already proved wrong when Facebook started their page-based OGP effort. Now, with Schema.org’s canonical URLs, we have a second, independent group that is building a semantic web truly layered on top of the existing web, without identifier mirrors (and so far without causing any URI identity crisis). This evolving semantic web is closer to the existing web than the current linked data layer, and probably even more compatible with OWL, too. There is a lot we can learn. Instead of becoming protective, the RDF community should adapt and simplify their offerings if they want to keep their niches relevant. Luckily, this is already happening, as e.g. the Linked Data API demonstrates. And I’m very happy to see Ivan Herman increasingly speaking/writing about the need to finally connect web developers with the semantic web community.
  • Early adopters in the CMS market. Projects like Drupal and IKS have put non-trivial resources into integrating machine-readable markup, and most of them are using RDFa. Microdata, in my experience, is easier to tame in a CMS than RDFa, especially when it comes to JavaScript operations. But whether semantic CMSs should add support for (or switch to) Schema.org microdata and their vocabulary depends more on whether they want/need to utilize SEO as a (short-term) selling proposition. Again, this will also depend on application developer demands.
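
The claim that Microdata maps nicely to RDF can be sketched in a few lines. The following is an illustrative simplification, not the mapping any particular toolkit or specification uses: it assumes a flat item, uses the page URL itself as the subject (in line with the page-based identifiers discussed above), and derives property URIs from the namespace of the item type. The function name `item_to_ntriples` and the example values are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def item_to_ntriples(page_url, item_type, properties):
    """Turn one flat Microdata item into N-Triples lines (simplified)."""
    # Assumption: the vocabulary namespace is the item type up to the last "/"
    vocab = item_type.rsplit("/", 1)[0] + "/"
    lines = [f"<{page_url}> <{RDF_TYPE}> <{item_type}> ."]
    for name, value in properties.items():
        if value.startswith(("http://", "https://")):
            # URL values become resource references
            obj = f"<{value}>"
        else:
            # Everything else becomes a plain literal with basic escaping
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{page_url}> <{vocab}{name}> {obj} .")
    return lines


triples = item_to_ntriples(
    "http://example.com/jane",
    "http://schema.org/Person",
    {"name": "Jane Doe", "url": "http://example.com/jane"},
)
print("\n".join(triples))
```

Note that the subject is simply the page URL: there is no separate identifier for "the thing the page is about", which is exactly the design choice that makes this layer sit so close to the existing web.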

What about Facebook?

Probably the most interesting aspect of this story: what will Facebook do? Their interest graph combined with linked data has big potential, not only for semantic advertising. And Facebook is interested in getting as many of their hooks into websites as possible. Switching to Microdata and/or aligning their types with Schema.org’s vocabulary could make sense. Webmasters would probably welcome such a consolidation step as well. On the other hand, Facebook is known for wanting to keep things under their own control, too, so the chance of them adopting Schema.org and Microdata is rather low. This could well turn into an RSS déjà vu with a small set of formats (OGP-RDFa, full RDFa, Schema.org-Microdata, full Microdata) fighting for publisher and developer attention.

Conclusion

I am glad that Microdata finally gets some deserved attention and that someone acknowledged the need for a format that is easy to write and to consume. Yes, we’ll get another wave of “see, RDF is too complicated” discussions, but we should be used to them by now. I expect RDF toolkits to simply integrate Microdata parsers soon-ish (if we’re good at one thing then it’s writing parsers), and the Linked Data community gets just another taxonomy to link to. Schema.org owns the SEO use case now, but it’s also a nice starting point for our more distributed vision. The semantic web vision is bigger than data formats and it’s definitely bigger than SEO. The enterprise market which RDF has mainly been targeting recently is a whole different beast anyway. No kittens killed. Now go build some apps, please 😉

Via bnode.org 

New Products Announced at SemTech 2011

New Products Announced at SemTech 2011 http://www.prweb.com/releases/2011/6/prweb8497129.htm

 

Leading industry companies will unveil and debut the newest products paving the way in semantic technology.

 

New York, New York (PRWEB) June 01, 2011

 

Mediabistro.com (a division of WebMediaBrands Inc., Nasdaq: WEBM) today announced new product releases that will be revealed at the Semantic Technology Conference (#SemTech), the world’s largest conference on the commercialization of semantic technologies, taking place June 5-9, 2011 at the San Francisco Hilton in Union Square.

 

SemTech is the preferred industry platform for exhibitors to announce product launches and breaking news. Attendees will have the rare opportunity to view products and services from top industry insiders including the following: 

 

Revelytix will discuss several products, including: Spyder – a Relational to RDF conversion tool, Spinner – a SPARQL federation tool, and Rex – a RIF rules engine. Together these tools transform the information management capabilities of any enterprise.

 

Oracle will show how semantic tools within Oracle Database can effectively store, manage, inference and query RDF/OWL data for enterprise applications.

 

Ontotext will present the Web Mining Framework (WMF), which involves a process of focused web crawling, screen scraping, text-mining, normalization, data merging, and de-duplication, resulting in normalized, structured data.

 

Inform Technologies will launch the Inform AdContext Service, which uses semantic metadata to fine-tune ad selection and make ads more topically relevant to the content.

 

Clark & Parsia will show how Pellet 3 (a leading OWL 2 reasoner) and Stardog (the new, world-class RDF database featuring fast SPARQL query performance) can be used to build fast and scalable semantic applications for the enterprise.

 

Cambridge Semantics will demonstrate how non-technical business users can combine Microsoft Excel and Anzo ETL to intuitively create mappings from an existing clinical database to the industry standard SDTM ontology and do live analysis.

 

Cray, Inc. will launch the Cray XMT System, designed specifically to run challenging big data graph analytics workloads that bring traditional systems to their knees.

 

Pragmatech will debut CTRL, a semantic engine that goes beyond words into concepts, which are then composed into topics that are subsequently analyzed to identify the ‘key’ topics that describe what a certain document is about.

 

ai-one will debut the Topic-Mapper SDK for text, enabling creation of intelligent applications that deliver better capabilities for semantic discovery, lightweight ontologies, knowledge collaboration, sentiment analysis, AI and data mining.

 

Talis will present Kasabi, a new web application that aims to support organisations in the publishing and monetization of data on the web.

Protégé will provide updated information on the latest enhancements to the tool and a description of WebProtégé, the web-based version that provides lightweight ontology editing directly in your ontology browser.

 

Knowledge Hives will introduce Civet, which uses NLP techniques to identify and analyze keywords in text; map them to concepts from vocabularies such as WordNet; and deliver an RDFa document with key words, phrases and names referencing Linked Data concepts.

Semantrix will debut its SM3 Social Multimedia Metadata Manager, which delivers enhanced content value through metadata extraction, annotation and cross-referencing using NLP, Search, Ontology and proprietary concept extraction.

 

MIT and Zepheira will show the latest work on Exhibit 3.0, fixing many shortcomings of the original, popular tool from the MIT Simile Project, making it far more scalable, modular, and feature rich.

 

To register for the conference, request a press pass, or to view the program schedule, visit http://semtech2011.semanticweb.com

2011 SemTech sponsors, leading vendors, and developers will demonstrate dozens of innovations at the SemTech Expo Hall. They include Ontotext, Oracle, Revelytix, Elsevier, Fluid Operations, iQser, Ontoprise, OpenAmplify, OpenText, Orbis, TopQuadrant, XSB, Cognition, Morgan Kaufmann, Expert System, Tom Sawyer Software, Semantic Valley, Semantifi, Liaison Technologies, DERI, Franz, Semantic Arts, Semsphere, and more.

 

For sponsorship and exhibit information, contact: 

 

Frank Fazio
Senior Director of Sales, Events
203-662-2887
eventsales(at)webmediabrands(dot)com

 

About WebMediaBrands Inc.

 

WebMediaBrands Inc. (Nasdaq: WEBM) (http://www.webmediabrands.com), headquartered in New York, NY, is a leading Internet media company that provides content, education, and career services to media and creative professionals through a portfolio of vertical online properties, communities, and trade shows. The Company’s online business includes: (i) mediabistro.com, a leading blog network providing content, education, community, and career resources (including the industry’s leading online job board) about major media industry verticals including new media, social media, Facebook, TV news, sports news, advertising, public relations, publishing, design, mobile, and the Semantic Web; and (ii) AllCreativeWorld.com, a leading network of online properties providing content, education, community, career, and other resources for creative and design professionals. The Company’s online business also includes community, membership and e-commerce offerings including a freelance listing service, a marketplace for designing and purchasing logos and premium membership services. The Company’s trade show and educational offerings include conferences, online and in-person courses, and video subscription libraries on topics covered by the Company’s online business.

 

All WebMediaBrands press releases are here: 

 

http://www.webmediabrands.com/corporate/press.html

For information about WebMediaBrands contact:
Amanda Barrett
Director of Marketing
212-547-7879
press(at)webmediabrands(dot)com