Why Watson and SPSS Are IBM’s Big Data Yin and Yang


By Ramy Ghaly March 02, 2011

However, the promise of this holistic data environment doesn’t stop with IBM.

As I reported in a recent post explaining the limits of Watson as a machine-learning platform, its ability to process and answer questions based on natural language is a big deal, but the system as currently constituted is largely relegated to answering specific questions based on the very specific data loaded into it. However, thanks to a $14 billion investment in analytics acquisitions over the past several years, IBM has a robust portfolio of products with which to complement Watson’s impressive capabilities. According to IBM VP of Predictive Analytics Deepak Advani, SPSS — the predictive piece of IBM’s analytics puzzle, which it bought for $1.2 billion in 2009 — might just be Watson’s ideal mate.

The next step is determining the best treatment plan, and that’s where Watson comes in.

Asked a question about the best-possible treatment plan for that particular patient, given the patient’s particular background, Watson could scour its database to suggest a plan, or plans, for the doctor to consider.

Read the full article in The New York Times, via ctrl-News.

 


Is “Personalized News” a Trend That Is Here to Stay, or Just a Fad?


February 17, 2011

 

A growing number of conventional media outlets, especially online, are now looking at customized news as an opportunity to ride the social wave, offering their audiences a free service that lets them “personalize the news”. The trend is not new, but lately major players among the conventional news media giants, such as the WSJ, NYT, Washington Post, and LA Times, have been jumping in and offering their own “personalized news” solutions.

Here are some trends and facts about the media industry and where I see it going. A deeper analysis is still needed to determine whether this trend is sustainable or just a fad.

I call this trend “News Social Communities“. Personalized news services are new to traditional news agencies: they engage the audience to share, build communities, discuss hot topics in the news, and even publish their own blog posts on trending stories, much like the product and brand conversations on social media platforms such as Twitter, Facebook, and LinkedIn, to name a few.

One finding worth highlighting concerns the search technology these products use today. So far, I have looked into most of these services, and none offers a semantic approach to finding news. They all still rely on keyword-based search with lengthy profile-building to extract relevant news according to the user’s preferences. That could change the game in the marketplace once differentiating players introduce superior search capabilities.

Would you like to “personalize your news feed” using a semantic approach?

I think most current and prospective users with some experience here can already see the difference between “keyword-based search technologies” and “topic-based semantic technologies”. One semantic technology product that recently introduced a free service for personalizing news feeds, built on natural language processing with very high accuracy by current industry standards, is “ctrl-News”. Introducing this concept not only makes for a unique product with strong differentiating search capabilities, but it will also challenge other players’ market positioning for at least the next three to five years. News Social Communities are attractive in many ways; one element worth stressing is “real ctrl”: it gives readers genuine control over the news they like and follow, along with the ability to add their opinions and thoughts and to engage with their groups and communities in making the news more personal.
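
To make the distinction concrete, here is a minimal, hypothetical sketch in Python (it has nothing to do with the ctrl engine’s actual implementation, and the tiny TOPIC_MAP is invented purely for illustration). A literal keyword match misses an article that covers the same topic in different words, while even a crude topic mapping catches it:

```python
# Toy comparison of keyword matching vs. a crude topic-based match.
# The topic map below is invented for the example.
TOPIC_MAP = {
    "car": "automobiles", "automobile": "automobiles", "sedan": "automobiles",
    "vaccine": "health", "flu": "health", "influenza": "health",
}

def keyword_match(query: str, document: str) -> bool:
    """True only if every literal query word appears in the document."""
    doc_words = set(document.lower().split())
    return all(word in doc_words for word in query.lower().split())

def topic_match(query: str, document: str) -> bool:
    """True if the query and document share at least one mapped topic."""
    def topics(text: str) -> set:
        return {TOPIC_MAP[w] for w in text.lower().split() if w in TOPIC_MAP}
    return bool(topics(query) & topics(document))

article = "New sedan models cut fuel use sharply"
print(keyword_match("car", article))  # False: the word 'car' never appears
print(topic_match("car", article))    # True: 'car' and 'sedan' map to the same topic
```

A real semantic engine does far more than a lookup table (disambiguation, summarization, entity extraction), but the contrast above is the gap the post is describing.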

For a real example of this trend, you can click on the link here: http://ht.ly/3TKCD

For another example of this trend, please see the WSJ community: http://online.wsj.com/community

If interested, you can also find this article on Mashable.com, entitled “trends in conventional media building social communities”: http://mashable.com/2010/08/10/personalized-news-stream/


What do you think about this trend? Would you like to name other similar services that are worth highlighting? Do you think this trend is here to stay?

 

What is “ctrl-news”? FAQs #search #media #analytics #semantic #textmining #datamining #news


Ramy Ghaly February 16, 2011

ctrl-News, powered by the “ctrl semantic engine“, is an online service that can be viewed as a “customized news homepage” for its users.

The ctrl-News homepage allows users to filter news semantically and topically, depending on their preferred and customized subject(s) of interest. Moreover, users can see an automatically-generated summary, as well as the entities and the key topics (disambiguated) for every news article received.

Who is behind ctrl-News?

ctrl-News is not only powered by the ctrl semantic engine; it was created by the same team behind ctrl to show ONE possible application that can be built on top of their semantic engine. ctrl is the first product of the Research and Development group at PRAGMATECH.

 

How do I register for this service?

Click on the “FREE Sign Up” button and fill in the required fields in the personal settings. You also need to provide ctrl-News with at least one subject of interest that you would like to follow.

 

The following error appeared “Subject of interest X has 44 words. It should have a minimum of 80 words” why is that?

ctrl-News relies on heavy semantic analysis of the text that comprises your subjects of interest in order to find the most related articles from each day’s news. Because of this, we recommend entering at least 80 words in each subject of interest to ensure the best results.

 

How do I retrieve my password?

Click on the “forgot my password” link at the top, and provide us with either your username or email. We will reset your password and send a new one to your registered email.

 

Concerning retrieved articles, what retrieval accuracy should I expect from ctrl-news?

ctrl-News results are all automatically generated by the ctrl semantic engine, and therefore rely entirely on the topical similarity between the entered subject of interest and the processed news articles. In some cases, the returned articles will not match the user’s expectations, although we are confident that our accuracy in determining topic relevancy is extremely high. Please use the feedback functionality to report such cases and help us improve the system.

 

How does ctrl-News decide on the similarity between articles and my subject of interest?

ctrl-News uses the ctrl semantic engine to process each subject of interest you enter. It then tries to find the most related articles from the processed news articles based on their topical similarity (in addition to other subtle document attributes). By topical similarity we mean that the main topics of the subject of interest are similar to those of the returned articles. Accordingly, you might receive articles that discuss the same topic with different ‘players’ (locations, people, etc.). Please use the feedback functionality to report cases where you didn’t find the results relevant.
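
As a rough illustration of what a topical-similarity score can look like in general, here is a generic Python sketch (this is not the ctrl engine’s actual algorithm, and the topic labels and weights are invented): represent the subject of interest and each article as weighted topic vectors, then rank articles by cosine similarity.

```python
import math

def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse topic-weight vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical topic weights for one subject of interest and two articles.
subject   = {"mobile payments": 0.9, "retail banking": 0.4}
article_1 = {"mobile payments": 0.7, "smartphones": 0.5}
article_2 = {"football": 0.8, "transfers": 0.6}

print(cosine_similarity(subject, article_1))  # high score: shared main topic
print(cosine_similarity(subject, article_2))  # 0.0: no topics in common
```

Under a scheme like this, an article about the same main topic but with different ‘players’ still scores highly, which matches the behaviour described above.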

 

How far back does ctrl-news retain my retrieved results?

Currently, two weeks. However, we may extend this period in the future if enough users deem it necessary.

 

What is meant by entities?

The Entities section displays all the ‘names’ of things that ctrl automatically identified in the text, along with their specific type and definition. These include names of agencies, associations, awards, brands, companies, consumer products, diseases, educational institutes, events, medicines, organizations, outlets, people, political events, political groups, publications, software systems, teams, tournaments, websites, and gadgets.

 

What languages does ctrl-News support?

Currently, ctrl-News only supports the English language. However, with relatively minimal effort, its semantic engine can support virtually any language.

 

Can I advertise on ctrl-News?

Pragmatech’s ctrl team is currently working on another application that can showcase their ctrl engine’s superiority in semantic analysis through semantic push of ads on ctrl-News. This functionality will be free of charge and very easy to use. Stay posted!!!

 

Which media sources do you fetch your news from?

At the bottom of the homepage (when not logged in), a list of news sources appears in a Flash animation. You can also find the list in the archive section, where it contains more detailed information about the RSS feeds ctrl-News fetches from each of these news sources.

 

ctrl-News has a predetermined list of news sources. Can I add or suggest a new one?

We welcome all suggestions and we will try to process all requests. Clearly, multiple requests for the same news source will receive more consideration than others.

 

How many friends can I invite to join ctrl-News?

Invite as many friends as you like, or share it on your favorite social-network by using the existing share panel found on all pages.

 

How do I stop receiving the daily emails while retaining my registration with ctrl-News?

Login and go to your “My Profile” page (link on top). Click on the “Personal Settings” panel, and tick the “Deactivate Your Daily Email Report” checkbox.

 


Since ctrl-News is powered by the ctrl semantic engine, its results are more accurate than those of the other existing keyword-based attempts. We hope you find ctrl-News beneficial. Don’t forget to provide feedback, and why not even invite a few of your friends!

Via ctrl-News

Take ctrl of the news today! Now you can personalize your news feed in a way never done before


As some of you might have already noticed, a new version of ctrl-News is now online. This version includes many new features along with a completely revamped and more user-friendly design.

 

The new website allows you to add subjects of interest on the fly using the “ctrl-this” functionality. It also displays the Entities (companies, people, tournaments, publications, brands, products, countries, etc) and the category automatically identified for each article along with the pre-existing sections of the summary, the key topics, as well as a sample of topically related articles.

 

Also added are two sections that let you view the automatically generated “Hot News” and “Most Followed” articles, as well as your ratings statistics and the status of your friends’ invitations.

 

The new ctrl-News version makes your user-experience more pleasant!

 

ctrl-News always welcomes your comments and feedback.

 

Via Pragmatech R&D Team – ctrl Group

 

2011: A Big Year for Big Data


Posted by Ramy Ghaly February 03, 2011

During the 2011 NFL playoff TV broadcasts, amid commercials featuring Anheuser-Busch Clydesdales and auto-racing driver Danica Patrick, one ad features an IBM researcher talking about data analytics.

With data volumes moving past terabytes to tens of petabytes and more, business and IT leaders across the board face significant opportunities and challenges from big data. For a large company, big data may be in the petabytes or more; for a small or mid-size enterprise, data volumes that grow into tens of terabytes become “big data”.

In the financial services industry, Fidelity National Information Services (FIS), which sells risk management and fraud detection services to credit card issuers, uses big data analytics to better detect credit card fraud.

Nonetheless, 2011 is shaping up to be a big year for big data.

This summary was made possible by ctrl-News, via The New York Times.

Does Social Media Set a “New World Order” in Building Revolutions Throughout the Middle East?


By consultramy

While the uprising started in Tunisia with demands that a long-ruling dictator step down, Egypt is following suit after the Tunisian Jasmine Revolution, seeking reform, freedom, and respect, and using social media to organize, communicate, and express its feelings, not only to the world but to its dictators: “this is the end of your rule, and now is our time to make the change whether you like it or not”.

The young catalysts of Twitter, Facebook, and social media are the driving force behind the revolution, not only in Tunisia and Egypt but also in Jordan and Yemen, where social media has established a new world order in digital communications, one capable of bringing down totalitarian regimes across the Middle East. An unstoppable young generation, the educated elite of the Arab world, is speaking out and demanding freedom. Freedom is a scarce commodity in the Arab world, where no real democracies exist, only decades of totalitarian regimes that took power from their people and turned them into modern serfs. After thirty years of unprecedented, selfish rule that stole their dignity by shutting them out of decision-making and responsibility for their own future, Egypt’s sidelined elite is now fighting for its pride: setting a road map to win freedom, the right to speak out, and respect, and rewriting history in a way that makes them proud citizens, setting an example not only for Egypt but for other dictatorships in the Middle East and the world. Social media has put fear into dictatorships throughout the Arab world. Egypt’s regime shut down social networks and communications to pressure the young protesters into stopping their uprising, but the move backfired: it sparked more anger and gave the youth more reason not to give up, embracing the fight for freedom to the end while Mr. Mubarak witnesses the last moments of his thirty-year rule.

Going social in the digital world is building a new world order, driving change and bringing down dictators, at least in the Arab world. Egypt’s sidelined young elite is seeking everything from regime reform, free elections, and the ability to choose its own leaders to basic civil rights: to be heard and respected. Achieving that becomes even harder when chaos takes over the streets and changes the game entirely. Social media is playing a vital role in making change possible, giving these young rebels an unprecedented tool with the power to bring down dictators and set a new world order in digital communications.

Important facts about Egypt, according to the CIA World Factbook’s latest figures (2010):

Population: 80 Million

Median age: 24 years

GDP: Approximately $216.8 billion (2009)

Per capita income: $6,200 (GDP/year-PPP 2010)

Unemployment: 9.7% (est.)

Poverty: 40%

Strategic interests to the world:

  • The Israeli/Egyptian Peace Treaty was signed in 1979.
  • An estimated 10 percent of global crude oil demand passes through the Suez Canal.
  • Egypt has the largest population in the Middle East.
  • Egypt receives nearly $2-$3 billion in aid per year from the United States.
  • Egypt holds ancient treasures and artifacts.
  • The Pyramids are among the top wonders of the world.
  • It is a global destination for tourists, making it one of the top tourist markets in the world.

Popular Hash Tags used for the “uprising in Egypt” on Twitter:

#liberation technology

#social freedom

#twitter revolution

#SM revolution

#Egypt

#jan25

#freedom

#Tahrir

#Cairo

#reform

#protest

#democracy

#support

#liberty

#civil liberty

#uprising

#Mona Altahawy

Latest sentiment analysis of Egypt’s uprising since it started six days ago, on January 25, 2011:

Link here: Egypt’s Uprising Sentiment Analysis

About the Author:

Ramy Ghaly is a marketing strategist with more than ten years of experience in international markets. He has held professional and managerial positions in various global markets and industries, ranging from retail, wholesale, and consumer goods to technology product management, with a concentration in channel development. He holds a degree in International Marketing Management with a minor in International Relations and Middle Eastern Studies from Daytona State College. He is interested in social media developments, next-generation search technologies, semantic search engines, and text analytics; he also keenly follows, and enjoys writing about, geopolitical strategy, Middle Eastern studies, and the environmental factors that affect global business growth.

The main issue in “Approximate String Matching in databases” or “Fuzzy Matching” is PERFORMANCE


Ramy Ghaly January 28, 2011

In other words, I’m trying to find out who has this technology and what claims they make about the performance of their systems.

My question is more technical than commercial. However, the application is for commercial use of “fuzzy search matching” technology and its performance.

For example, one vendor that specializes in name and address fuzzy matching quotes the performance below:

matchIT SQL performance (based on Windows XP, SQL Server 2005, Intel Core 2 Quad CPU, 2.40GHz, 4 GB RAM):

  • Key generation: 10 million records/hour
  • Find matches (one table, standard low-volume match keys): 1 million records in 14 minutes
  • Find overlap (two tables, standard high-volume match keys): 40 million + 5 million records in 2.5 hours

 

What Are Key Professionals Saying?



Cohan Sujay Carlos PostgreSQL has a module for fuzzy matching of names: http://www.postgresql.org/docs/current/static/fuzzystrmatch.html

Solr/Lucene uses Levenshtein distance: http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Fuzzy%20Searches

Pentaho provides a choice of algorithms in its ETL engine.

1 day ago
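
For readers who want to see what the per-pair computation behind these fuzzy queries actually involves, here is a minimal pure-Python sketch of Levenshtein edit distance (a generic textbook dynamic-programming version, not code taken from Lucene or PostgreSQL):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

print(levenshtein("jonathon", "jonathan"))  # 1
print(levenshtein("smith", "smythe"))       # 2
```

The cost grows roughly with the product of the two string lengths, and it has to be paid for every candidate pair unless an index (match keys, n-grams, phonetic codes) prunes the candidates first, which is exactly why performance dominates this discussion.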


Anurag Gupta FAST ESP (now FSIS) provides choices in fuzzy matching, phonetic matching, and even full wildcards. Performance here is a function of hardware resources, starting with a few million documents being matched in sub-seconds on a single-server install. There are also other factors, like query load, document size, and query size, that all need to be considered for appropriate sizing. Hope this helps.

1 day ago

Vineet Yadav Sphinx (http://sphinxsearch.com/) is an open-source full-text search engine server, and it can be used with both SQL databases and NoSQL datastores. It supports real-time indexing.
You can find a performance comparison between Sphinx and other open-source search engines at http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/. Postgres also supports fuzzy string matching algorithms (http://www.viget.com/extend/alternatives-to-sphinx-fuzzy-string-matching-in-postgresql/). Agrep (http://en.wikipedia.org/wiki/Agrep) is used for fast fuzzy string matching and is based on the Wu-Manber algorithm (http://webglimpse.net/pubs/TR94-17.pdf). I suggest you look at the modified version of the Wu-Manber algorithm.

1 day ago

Hans Hjelm



This article springs to mind:

http://cs.anu.edu.au/~Peter.Christen/publications/ausdm2008christen.pdf

As for commercial systems, that’s a different story.


Steve Harris There’s some fuzzy word matching inside 4store, but we designed it for fuzzy matching on names, and document titles, so it might not be efficient enough on large blocks of text. It’s based on double metaphones, and supports a number of languages.

http://4store.org/trac/wiki/TextIndexing

2 days ago


Moty Mondiano The problem with fuzzy matching is that every permutation of the word has to be calculated and matched in real time (unless the dictionary is very small, pre-calculating and indexing all possible permutations is out of the question); i.e., fuzzy/distance matching is inherently costly.
An alternative would be to compare the “canonical” form of the words (create a single canonical form for each word), then match toCanonic(input) with toCanonic(candidate). This way performance is the same as a regular comparison.
Consider using “syd” or “soundex” for implementing “toCanonic”. It all depends on whether, for example, Soundex (using the phonetic signature of the word) is a good enough match for your needs.

2 days ago
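
As a concrete illustration of that canonical-form idea, here is a simplified Soundex-style toCanonic in Python (a sketch of the classic algorithm that ignores its special H/W separator rule; it is not anyone’s production code):

```python
# Simplified American Soundex: letters map to digit classes, vowels break runs.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"), "l": "4",
    **dict.fromkeys("mn", "5"), "r": "6",
}

def to_canonic(word: str) -> str:
    """Collapse a word to a 4-character phonetic code, so equality on codes
    replaces an expensive fuzzy comparison on the raw strings."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    first = word[0].upper()
    collapsed = []
    for ch in word:
        code = SOUNDEX_CODES.get(ch, "")
        if code and (not collapsed or code != collapsed[-1]):
            collapsed.append(code)
        elif not code:
            collapsed.append("")  # vowels and similar letters break a run
    digits = "".join(c for c in collapsed if c)
    if SOUNDEX_CODES.get(word[0]) and digits:
        digits = digits[1:]  # the kept first letter's own code is dropped
    return (first + digits + "000")[:4]

print(to_canonic("Robert"), to_canonic("Rupert"))   # R163 R163
print(to_canonic("Smith") == to_canonic("Smythe"))  # True
```

Matching then becomes a plain equality test (or an indexed join) on the canonical codes, at the cost of some false positives that a second, exact pass can filter out.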


Jeremy Branham Maybe you could use a stemming algorithm to create the canonical form of the word. Like the Porter Stemmer algorithm…
http://tartarus.org/~martin/PorterStemmer/

2 days ago
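
A quick sketch of that suggestion, assuming the NLTK library is installed: the Porter stem acts as a canonical form for morphological variants, although, as the last pair below shows, it does nothing for misspellings.

```python
# Requires: pip install nltk
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def to_canonic(word: str) -> str:
    """Use the Porter stem as a crude canonical form for matching."""
    return stemmer.stem(word.lower())

pairs = [("running", "runs"), ("connected", "connection"), ("smith", "smyth")]
for a, b in pairs:
    print(a, b, to_canonic(a) == to_canonic(b))
# running/runs and connected/connection collapse to the same stem;
# smith/smyth do not -- stemming handles morphology, not spelling variation.
```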


Adi Eyal Depends on what exactly it is that you want to fuzzy search. Canonical forms like stemming, soundex, nysiis etc don’t work very well in practice and don’t cover important semantic classes. If you want to find Chuck when searching for Charles or Prescription Shoppe for Drugstore, there’s not much to do but to either index Chuck along with Charles or to convert the search string to Charles or Chuck in the background.

The trade-off of course is computational time vs. storage. If you pre-index, you increase the storage requirements of your database, while pre-processing the search string increases the computational burden at search time.

I don’t think there’s a silver bullet to solve this problem and the difficulty depends on what you’re trying to achieve.

1 day ago


Mark Bishop For best-fit problems you may be interested in the application of a Swarm Intelligence meta-heuristic called Stochastic Diffusion Search.

For time complexity issues see: <http://www.doc.gold.ac.uk/~mas02mb/Selected%20Papers/1998%20Time%20complexity.pdf>.

2 days ago

Neil Smith

Web Developer at Basekit


Compare it to the Sphinx full-text search engine, particularly when running in clustered mode and indexing large-volume document sets.

Typically quoted: 10-15 MB/second indexing speed per core, 500 qps full-text search, 50 million queries per day (Craigslist)

It’s widely used on sites such as Craigslist and other large scale sites – routinely with > 2TB datasets.

 

Links:

· http://sphinxsearch.com/about/sphinx/

· http://notes.jschutz.net/wp-content/uploads/2009/04/sphinx-performance.pdf

· http://sphinxsearch.com/info/powered/

Posted 2 days ago

Edwin Stauthamer

Consultant Search Solutions at Emid Consult B.V. Specialist on Google, Autonomy and Exalead technologies.


I would like to ask you a question instead of answering it: “Why are you using a database for this?”
And now the answer: “Search engines were invented specifically for this kind of text retrieval.”

There are plenty of engines out there that can solve this performance issue, among them Autonomy, Solr, and Exalead.

Databases are commonly optimized for transactional performance, not for retrieval purposes.

Posted 2 days ago


Charlie Hull In my experience this isn’t likely to be publicly available information, although vendors will tell partners and potential customers if asked.

2 days ago


Sam Mefford We created a large-scale fuzzy search for a government organization searching across 20 million names. With tuning and distribution, we were able to get most queries to run sub-second. We used Autonomy IDOL.

22 hours ago


Charlie Hull

We’ve implemented fuzzy search for a media monitoring application – there’s more at http://www.flax.co.uk/blog/2010/12/13/next-generation-media-monitoring-with-open-source-search/ – and made some great improvements in accuracy for our client. We used Python and Xapian.

39 minutes ago


Vadim Berman
Yep, it always is. The thing is, it’s more of a database-related issue, if I understand it correctly. Different vendors resolve it in different ways, with different strengths and weaknesses, so you might want to benchmark different databases for your particular application.

You might need to tweak things on your side as well. Generally, if you have a fixed prefix (e.g. “blah%” and not “%blah%”), it is easier to resolve for most systems, and does not require full-text indexing.

Otherwise, there are still ways to speed it up, but it’s not as straightforward. Of course, full-text index helps.

I don’t want to start “religious wars” over which one is better, but from what I know, PostgreSQL is more “tweakable” than MySQL and scales better for clustering and such. I never tried it myself, though. Try these:

http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#Advanced_Indexing
http://www.postgresonline.com/journal/archives/51-Cross-Compare-of-SQL-Server,-MySQL,-and-PostgreSQL.html


Michael Belanger Look into Kenneth Baclawski’s work (Northeastern University)

2 days ago


Simon Spero A lot depends on the size of the data set vs. the query rate. A lot also depends on whether there is domain-specific knowledge that can be brought to bear. For example, there has been a lot of work on approximate name matching, and a lot of systems have support for coding schemes like Soundex. See, e.g.:

Cohen, W., P. Ravikumar, and S. Fienberg (2003). “A comparison of string distance metrics for name-matching tasks”. In: Proceedings of the IJCAI. American Association for Artificial Intelligence. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.15.178&rep=rep1&type=pdf

A good introduction to large scale approximate matching can be found in:

Scheuren, Fritz, William E Winkler, and Thomas Herzog (2007). Data quality and record linkage techniques. New York: Springer. ISBN: 9780387695051.

Winkler, William E. (2006). Overview of Record Linkage and Current Research Directions. Research Report Series, Statistics #2006-2. Statistical Research Division U.S. Census Bureau. URL: http://www.census.gov/srd/papers/pdf/rrs2006-02.pdf

Some systems can maintain n-gram indexes, which can dramatically improve query times at the expense of slower updates. PostgreSQL has had support for trigram indexes for some time, and PostgreSQL 9 greatly improved the update speed for these indexes.

See: http://www.postgresql.org/docs/9.0/static/pgtrgm.html

PostgreSQL also has support for fuzzy matching; some of the methods can be used to build indexes; however, Levenshtein distances must be computed on the fly.

See: http://www.postgresql.org/docs/9.0/static/fuzzystrmatch.html

2 days ago
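
To show the idea behind those trigram indexes, here is a small pure-Python sketch: strings are broken into character trigrams, which are cheap to index, and candidate pairs are scored by set overlap. This uses a Jaccard-style score; pg_trgm’s own similarity() pads and scores somewhat differently, so treat it purely as an illustration.

```python
def trigrams(text: str) -> set:
    """Character 3-grams of a lowercased, space-padded string."""
    padded = "  " + text.lower() + " "  # padding roughly mimics pg_trgm
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of trigram sets; 1.0 means identical trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(round(trigram_similarity("William Winkler", "Wiliam Winckler"), 2))  # misspelled variant: substantial overlap
print(round(trigram_similarity("William Winkler", "Fritz Scheuren"), 2))   # unrelated names: no overlap
```

Because a query’s trigrams can be looked up in an inverted index, most of the table never needs an edit-distance comparison at all; only the few candidates sharing enough trigrams are checked exactly.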


Charles Patridge

I have been doing “fuzzy search matching” since the 1980s, primarily using SAS software. Yes, performance is a big issue, and it chews up a lot of CPU.

SAS is pretty good at this, but it is expensive and slow to get the results you desire. In SAS, I have to use a number of functions such as INDEX, INDEXC, SOUNDEX, RXPARSE, and others, most of which are CPU-bound.

It seems regular expressions and Perl have more power for fuzzy search matching, but I cannot give specifics as to how much faster they are than SAS; I suspect they are faster, based on what I have seen.

I do not know of any other tools that do this kind of searching, nor any vendors that promote this capability.

If you happen across a vendor that says it is good at fuzzy search, I would like to hear about them.

2 days ago


Marie Risov SQL Server Integration Services (SSIS) has fuzzy grouping and fuzzy matching transformations. They are also resource intensive.

2 days ago

Note: Posted in 28 targeted groups on LinkedIn with 15-20 thousand potential views, this question made a big impact, to the point of being voted “Top Influential Discussion of the Week” in 80 to 90 percent of them. It would be nice to keep this conversation going with some replies to the responses above.

For more information, link to the question here: LinkedIn, or you can also link to Q&A.

Thank you all for your support and cooperation.

What are the most challenging issues in Sentiment Analysis (opinion mining)?


Ramy Ghaly January 28, 2011

Hossein Said:

Opinion mining/sentiment analysis is a somewhat recent subtask of natural language processing. Some compare it to text classification; some take a deeper view of it. What do you think are the most challenging issues in sentiment analysis (opinion mining)? Can you name a few?

Hightechrider Said:

The key challenges for sentiment analysis are:

1) Named Entity Recognition – What is the person actually talking about, e.g. is 300 Spartans a group of Greeks or a movie?

2) Anaphora Resolution – the problem of resolving what a pronoun, or a noun phrase refers to. “We watched the movie and went to dinner; it was awful.” What does “It” refer to?

3) Parsing – What is the subject and object of the sentence, which one does the verb and/or adjective actually refer to?

4) Sarcasm – If you don’t know the author you have no idea whether ‘bad’ means bad or good.

5) Twitter – abbreviations, lack of capitals, poor spelling, poor punctuation, poor grammar, …
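
As a toy illustration of why points 2 and 4 above matter, here is a minimal word-counting sentiment scorer in Python (the mini-lexicon is invented for the example). It cannot say whether the movie or the dinner was awful, and it happily scores a sarcastic line as positive:

```python
# A toy lexicon-counting scorer with no context awareness at all.
POSITIVE = {"great", "awesome", "love", "good"}
NEGATIVE = {"awful", "terrible", "hate", "bad"}

def naive_sentiment(text: str) -> int:
    """+1 per positive word, -1 per negative word."""
    score = 0
    for word in text.lower().replace(",", " ").replace(".", " ").split():
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return score

# Anaphora: the -1 gives no clue whether the movie or the dinner was awful.
print(naive_sentiment("We watched the movie and went to dinner; it was awful."))
# Sarcasm: scored +2 even though the writer clearly means the opposite.
print(naive_sentiment("Oh great, my phone died again. I just love Mondays."))
```

Real systems need the parsing, coreference, and context handling listed above to attribute and interpret that sentiment correctly.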

 

ealdent Said:

I agree with Hightechrider that those are areas where sentiment analysis accuracy can see improvement. I would also add that sentiment analysis tends to be done on closed-domain text for the most part. Attempts to do it on open-domain text usually wind up having very bad accuracy/F1 measure/what have you, or else they are pseudo-open-domain because they only look at certain grammatical constructions. So I would say topic-sensitive sentiment analysis that can identify context and make decisions based on that is an exciting area for research (and industry products).

I’d also expand his 5th point from Twitter to other social media sites (e.g. Facebook, Youtube), where short, ungrammatical utterances are commonplace.

 

Skarab Said:

I think the answer is language complexity, along with mistakes in grammar and spelling. There is a vast range of ways people express their opinions; e.g., sarcasm could be wrongly interpreted as extremely positive sentiment.

 

What do you think? Do you agree? Would you like to ask a question and get an answer? Try out: Q&A for professional and enthusiast programmers

 


Social commerce to surge


NEW YORK: Social commerce sales will rise dramatically during the next five years, encouraging brands and retailers to enhance their presence on sites like Facebook, Booz & Co has argued.

In a new report, the consultancy stated marketers must shift the terms of engagement with consumers using Web 2.0 properties from “like” to “buy”.

“The market for social commerce has been embryonic to date, but that will change over the next five years as companies race to establish stores,” it said.

“Trendsetting companies are focused on products and services that benefit from the unique characteristics of social media, including the opportunity to get quick feedback from multiple friends and family members.”

The study praised 1-800-Flowers, which boasts a fully-functioning Facebook store allowing customers to complete purchases without leaving the network’s pages.

It has also implemented other innovative strategies, for example linking Facebook’s calendar and “group gifting” features to its Mother’s Day campaign.

“We are going to continue to invest in certain areas to help drive future growth,” Bill Shea, 1-800-Flowers’ chief financial officer, said in late 2010.

“Whether it be franchising efforts for both the consumer floral and our food group, investments in mobile and social commerce [or] floral supply chain in Celebrations.com, we are going to continue to invest.”

Dell was cited as another pioneering early-adopter, having earned millions of dollars in revenue through Twitter.

The IT specialist is becoming increasingly active in the smartphone and tablet segments, which the organisation believes will transform the retail sector.

“It used to be ‘We’re going to tell you how you’re going to experience our store,'” said Brian Slaughter, Dell’s director, end-user solutions, large enterprises.

“Now the consumer is walking in and saying: ‘No, I’m going to tell you how I’m going to use your store to give me more information.’ The tools they have at their disposal are very cool.”

Similarly, Quidsi, owned by Amazon, recently set up Facebook outlets for its Soap.com and Diapers.com platforms, although the ability to make purchases is limited to members of these two portals.

“No one has yet cracked the nut on Facebook e-commerce,” said Josh Himwich, Quidsi’s vp, ecommerce.

Overall, Booz estimated sales of physical goods via social channels would hit $5bn (€3.7bn; £3.1bn) globally in 2011, with the US contributing 20% of this total.

Revenues were pegged to reach $9bn by the close of 2012, incorporating $3bn generated by American internet users.

Such figures should achieve $14bn and $5bn respectively for 2013, while US customers deliver nearly half of the $20bn returns yielded in 2014.

By 2015, the worldwide expenditure attributable to this medium is anticipated to come in at $30bn, housing $14bn from the world’s biggest economy.

A previous Booz survey of netizens dedicating one hour or more a month to social networks, and who bought at least one product online in the last year, found 20% proved willing to pay for items through these sites.

Elsewhere, 10% suggested this spending would be incremental to their current outlay, but 71% added “liking” a brand on Facebook did not improve the probability of buying it.

The consultancy predicted that social media will have the greatest impact on consideration, conversion, loyalty and customer service.

Facebook’s chief executive Mark Zuckerberg certainly supported such an optimistic reading when rolling out the Places geolocation system last year.

“If I had to guess, social commerce is the next area to really blow up,” he said in August.

Data sourced from Booz & Co, Seeking Alpha, Daily Finance; additional content by Warc staff, 21 January 2011

Via: WARC
