search

July 07, 2009

Open Source Search

Blogger: Larry Cannell

A few months ago I asked readers of this blog: “Is Enterprise Search Ripe for Open Source Disruption?” This marked the start of my interest in the intersection of these two intriguing topics: open source and search.

Since then Burton Group published a report I authored entitled “Open Source Search: Bringing Enterprise Search Out into the Open.” Here is an excerpt from the opening paragraphs:

‘It has been over ten years since “open source” was first used to describe what was previously called “free software.” Early detractors of open source software pointed to potential risks and claimed only commercial vendors could produce high quality software. However, leading open source development communities quietly moved forward with a sometimes slow, but disciplined, progression of releases to the point at which the quality and robustness of these offerings is no longer easily questioned or challenged.

‘While popular open source projects like the Linux operating system, the Apache Web Server, and the MySQL database were capturing headlines, open source projects that tackle the problem of searching large quantities of content (e.g., Apache Lucene, which provides a high-quality Java search library) have become the basis for search capabilities provided by thousands of Internet sites and many software products. Like popular open source products that have come before, open source search is finding its way into enterprise computing environments by first earning its stripes through successful implementations on the Internet—an ultra-competitive environment where a search-based user experience can be the difference between success and failure.’

I also had the pleasure of moderating a lively panel discussion (that was also titled “Is Enterprise Search Ripe for Open Source Disruption?”) at the Enterprise 2.0 Conference two weeks ago. Participating on the panel were:

  • Jerome Pesenti, Chief Scientist and Co-Founder, Vivisimo
  • Marc Krellenstein, Chief Technology Officer, Lucid Imagination
  • Sid Probstein, Chief Technology Officer, Attivio
  • Stephen “The Search Guy” Green, Senior Staff Engineer, Sun Microsystems Laboratories

Jerome Pesenti put up a good fight and provided the strongest opposition to the idea that open source was ready for enterprise use. Marc Krellenstein, as expected, was the most vocal proponent for open source. In addition, Sid Probstein and Stephen Green contributed their unique perspectives. Sid’s company, Attivio, uses Lucene in its product. Stephen Green is the author of an open source search engine called Minion. Although somewhat contentious (and loud) at times, the conversation highlighted many of the opportunities and concerns with using open source for enterprise search.

For those of you attending the Burton Group Catalyst Conference later this month, be sure to sit in on the session “Open Source Search: Good Stuff Cheap (With a Few Caveats)” where I will be providing an overview of the topic and discussing the open source products Lucene, Solr, Nutch, Xapian, Flax, OpenPipeline, and SMILA.

May 21, 2009

Changing Assumptions About Search

Blogger: Larry Cannell

Unless you’ve been unplugged from the Internet for the past month, you’ve likely heard about Wolfram Alpha. Early on (before it was released for public use), and based on the first impressions of many, it was claimed to be a “Google Killer.” Now we are seeing a chorus of bloggers saying it is something different and will definitely not kill Google.

While Wolfram Alpha doesn’t appear to be a Google Killer, it does kind of look and act like a search engine. So the confusion is understandable. At the very least, Wolfram Alpha is a thought provoking experiment in information retrieval.

However, the fact that so many people first described Wolfram as a search engine (contrary to how Wolfram describes itself in its FAQ) is what I find most intriguing. In many ways, Wolfram is causing each of us to re-examine our own definition of search:

  • If an application uses a natural language processing (NLP) interface (i.e., just type text in a box and click submit), does that make it search? Probably not. But our use of the simple Google type-and-click experience is clearly coloring our expectations of computer systems. This reminds me of a theme FAST (and now, Microsoft) pushed the last two years at the FASTforward conference: search is a “user experience,” not just information retrieval.
  • Many of you may have noticed that Wolfram Alpha contains structured data but search generally deals with unstructured content. So clearly Wolfram Alpha is not search, right? The problem is this statement makes virtually no sense to the average enterprise knowledge worker (“structured data…unstructured…huh?”) who might find a system like this useful. They just want information to make a decision, structured or otherwise. Describing a solution by the type of data it uses is a slippery slope.

This isn’t the first time there’s been confusion around what is and isn’t search. Enterprise examples of this are being sold by companies like Endeca, FAST, and Attivio. These systems are blending structured data and unstructured content in new and interesting ways, with interfaces that allow broad exploration of information, regardless of its source. The blogosphere’s confusion around Wolfram Alpha will be familiar to IT strategists who’ve tried to explain these systems to their CIO.

Maybe these new systems shouldn’t be called search. But how would you describe them otherwise? Let me know by posting a comment below.

May 13, 2009

Sometimes Even Google Needs Metadata

Blogger: Larry Cannell

Earlier this week Google announced a number of new features coming to their Internet search engine. The most interesting to me was their plan to start returning a “snippet” about a web page in search results. Here is what they said:

“These ‘rich snippets’ extract and show more useful information from web pages than the preview text that you are used to seeing. For example, if you are thinking of trying out a new restaurant and are searching for reviews, rich snippets could include things like the average review score, the number of reviews, and the restaurant's price range:”

Sounds cool. But the most intriguing line from the post was (emphasis added):

“We can't provide these snippets on our own, so we hope that web publishers will help us by adopting microformats or RDFa standards to mark up their HTML and bring this structured data to the surface.”

Which made me think about how Google has been marketing their Google Search Appliance (GSA), which they sell to enterprises to crawl and index content on intranets. They have been quite vocal about how a GSA can be dropped into an intranet and immediately return great search results, with virtually no effort at all. For example, here is a line from a whitepaper on Google’s enterprise search site:

“A come-as-you-are approach to indexing eliminates the overhead of preparing documents for admission to the body of searchable data. In any case, your data shouldn’t need a laborious makeover for your search solution to provide relevant results.”

In short, Google is saying: “Metadata!? We don’t need no metadata! Our search appliance eliminates the need for metadata.” So while other enterprise search vendors are encouraging customers to attach metadata to their documents and web pages, Google is telling the same people not to worry about “preparing documents.” (Although, to be fair, recent releases of the Google Search Appliance have added features that make better use of metadata.)

However, to deliver this new Internet search feature Google now admits they “can't provide these snippets on our own” and that they need additional information embedded within the web page. In other words, sometimes even Google needs a little help from…don’t say it too loud…metadata.

Of course, I am being a little tongue-in-cheek here. But all kidding aside, this could be an important development. The source for these snippets is communicated to the search engine through its support for microformats and RDFa, which describe how to structure metadata embedded in a web page. This metadata provides information in a way that search engines and any other application crawling a web page can read directly, rather than inferring it from the page’s text. In the example above, the four-star review for “Drooling Dog Bar B Q” came from this metadata.
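To make “read directly, rather than inferring” a little more concrete, here is a minimal Python sketch of what a crawler might do with marked-up review data. Everything in it is an assumption for illustration: the page fragment, the class names (`fn`, `rating`, `count`, loosely modeled on microformat-style markup rather than copied from any specification), the review count, and the `extract_snippet` helper. It is not how Google builds rich snippets.

```python
from html.parser import HTMLParser

# Hypothetical page fragment. The class names and the review count are
# illustrative only; they are not taken from a microformat or RDFa spec.
PAGE = """
<div class="review-summary">
  <span class="fn">Drooling Dog Bar B Q</span>
  <span class="rating">4</span>
  <span class="count">15</span>
</div>
"""

# Class names this sketch treats as metadata fields (an assumption).
FIELDS = {"fn", "rating", "count"}


class SnippetParser(HTMLParser):
    """Collects the text of elements whose class marks them as metadata."""

    def __init__(self):
        super().__init__()
        self.current = None  # field name of the element being read, if any
        self.data = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in FIELDS:
                self.current = name

    def handle_data(self, text):
        # Store the first non-blank text found inside a marked element.
        if self.current and text.strip():
            self.data[self.current] = text.strip()
            self.current = None

    def handle_endtag(self, tag):
        self.current = None


def extract_snippet(html):
    """Return the marked-up fields found in an HTML fragment."""
    parser = SnippetParser()
    parser.feed(html)
    return parser.data


snippet = extract_snippet(PAGE)
print(snippet["fn"], "-", snippet["rating"], "stars")
```

The point of the sketch is that the crawler never has to guess which number on the page is the rating: the publisher has labeled it.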

Although there has been some skepticism expressed about Google’s efforts here, it will be interesting to see if they get more content providers to use these standards.

February 11, 2009

This is a Microsoft Conference?

Blogger: Larry Cannell

Today is the final day for the FASTforward 2009 Conference. This is the fourth year it has been held but the first one planned, developed and run under Microsoft management. Now, I have been to many Microsoft enterprise IT conferences, but this one is different from any other I’ve attended.

Granted, there has been plenty of corporate speak, the occasional mind-numbing speaker (note to self: pacing, mumbling, and having no slides to help listeners follow you is not a good idea), and the obligatory sales pitches (from both Microsoft and FAST speakers). However, a number of things jumped out at me this week that may reflect changes coming to Microsoft or may just be anomalies which the corporate immune system will soon correct:

  • FASTforward 2009 had the coolest Contoso and Litware demos I've ever seen at a Microsoft conference. For those of you not familiar with these names, Contoso and Litware are pseudo-companies Microsoft often uses in mock-ups to show off their technology. For example, instead of the staid, boring default SharePoint themes showing how Litware manages their documents, this week we saw a cool website featuring David Bowie!
  • It is strange to be at a Microsoft conference and hear words assuring customers that their investments in Linux and Unix-based solutions are safe.
  • Although I’ve heard Microsoft people refer to “user experience” at previous conferences (usually in passing), this week’s speakers talk about user experience like they mean it.
  • I haven’t seen any bloggers make note of this, but in Clay Shirky’s keynote Monday afternoon, products from Microsoft competitors (IBM’s Dogear and Apple’s iPod) played important roles in his examples. I am sure someone from Microsoft reviewed these slides, but they were left in anyway. Maybe not a big deal, but there are many companies that would not have allowed this, regardless of whether the slides were expressing interesting ideas.

Time will tell if these changes extend beyond this conference. Do they reflect a change at Microsoft? Are they a reflection of an aging company going through a mid-life crisis, trying to find its new identity? What do you think?

February 10, 2009

FAST Search for SharePoint

Blogger: Larry Cannell

A little over a year after Microsoft acquired FAST, the company outlined plans today for incorporating the high-end search technology into SharePoint in a forthcoming feature called “FAST Search for SharePoint.” The official press release is available here.

The high points of the FAST/SharePoint roadmap include:

  • SharePoint “14” provides the basis for this deep integration with FAST.
  • FAST Search for SharePoint will be included with Microsoft Office SharePoint Server Enterprise Edition. However, customers will also need to purchase server-based licenses for all required FAST servers (this is the first SharePoint capability I can think of that requires both client and server CALs).
  • Customers who want to use FAST with their SharePoint environment now can purchase “ESP for SharePoint.”  This does not provide the same capabilities expected with FAST Search for SharePoint but is still more sophisticated than SharePoint’s search (and likely to be similar to the free FAST webparts announced last year).
  • Beta testing for FAST Search for SharePoint will be aligned with SharePoint “14.”

In an analyst meeting at the FASTforward 2009 conference yesterday, Microsoft briefly showed a couple of sample screenshots of mock-ups illustrating how FAST will integrate with SharePoint “14.” The first one was a faceted navigation UI showing facets in a left side menu (with “exact counts” next to each facet) on a page also showing related search queries, and featured content.

The second example showed an integration with SharePoint people search, providing the capability for phonetic name lookup, organization-based browsing, recently authored content, expertise identification, and filters using focus expertise, and other attributes.

Also revealed were the company’s long-term plans for FAST to provide the foundation for all enterprise search technologies from Microsoft. However, for the “14” release, SharePoint search technologies will still provide the basis for Search Server Express, search in the standard edition of SharePoint, and Search Server.

January 30, 2009

Is Enterprise Search Ripe for Open Source Disruption?

Blogger: Larry Cannell

This past week we’ve seen some attention-getting news from both Autonomy (buying Interwoven) and Microsoft (John Lervik, the FAST CEO, is resigning). What you may have missed in all of the hubbub was an announcement about Lucid Imagination. This is a startup which recently received funding to compete in the enterprise search market by providing commercial support for the open source search solutions Lucene and Solr.

In a Burton Group report published last year we highlighted Lucene as an example of an innovative open source product. However, it doesn’t yet have a significant presence in the enterprise market. Lucid Imagination is out to change that.

I should note that Lucene is not an application. Rather, Lucene is a Java library that provides indexing and search capabilities, which applications can use. It’s not a free enterprise search product that you just download, install, and run like, for example, WordPress, an open source blogging platform.

Lucid Imagination will provide support for both Lucene and Solr (a younger open source project which extends Lucene). Together they are an indexing engine with a web service interface and frameworks for building search capabilities. These can be used for creating standalone enterprise search sites but may be more valuable for adding search capabilities to vertical applications.

This is not just simple Google-style search, either. In addition to a number of other capabilities, Solr extends Lucene by enabling faceted search, a powerful approach to information retrieval in which results are grouped by attributes that users can then filter on. If you are interested in learning more, the Lucid Imagination site has a good example of faceted search. They’ve indexed a number of the Lucene and Solr project websites and provide a faceted search interface to browse across all of them.
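For readers who have not seen faceted search in action, the idea can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the documents, the `project` and `type` fields, and the `search` and `facet_counts` helpers are all made up for the example, and none of this is Solr’s or Lucene’s actual API.

```python
from collections import Counter

# Hypothetical documents; the fields and values are illustrative only.
DOCS = [
    {"title": "Getting started with Solr", "project": "Solr", "type": "wiki"},
    {"title": "Lucene scoring explained", "project": "Lucene", "type": "wiki"},
    {"title": "Solr faceting question", "project": "Solr", "type": "mailing list"},
    {"title": "Lucene index format", "project": "Lucene", "type": "docs"},
]


def search(docs, term, filters=None):
    """Return docs whose title contains the term and that match every filter."""
    filters = filters or {}
    term = term.lower()
    return [
        d for d in docs
        if term in d["title"].lower()
        and all(d.get(field) == value for field, value in filters.items())
    ]


def facet_counts(hits, fields):
    """Count how many hits fall under each value of each facet field."""
    return {field: dict(Counter(d[field] for d in hits)) for field in fields}


# An empty term matches every title, so this shows facets for the whole set.
hits = search(DOCS, "")
print(facet_counts(hits, ["project", "type"]))

# Clicking the "Lucene" facet is just the same search with a filter applied.
narrowed = search(DOCS, "", {"project": "Lucene"})
print([d["title"] for d in narrowed])
```

The counts next to each facet value tell users how many results they will get before they click, which is what makes this style of navigation feel like browsing rather than guessing at keywords.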

I think Lucid Imagination’s approach of supporting developers who use its search capabilities may be a good strategy. First, it would be hard to disrupt a market that already has a low-cost solution like the Google Mini. In addition, this approach provides Lucid opportunities for revenue streams from consulting and project work, buying them time while they continue enhancing core Lucene and Solr capabilities and building a community. In my opinion, this is key. Commercial open source companies based on existing projects must be leaders in the project’s community, and this approach may have a better chance of strengthening, even enlarging, the Lucene/Solr community and making it more sustainable.

Open source communities can be fabulous sources of innovation, if they are able to sustain themselves. One of the keys to success for an open source community is motivating others to join in and be willing to compete on features which extend the core product. In this case this could mean (instead of competing on the building blocks of search) competing on things such as user experience, navigation, and supporting new types of “content.” This includes, of course, multimedia content but might also include adding faceted search capabilities to applications with structured data sources too.

Lucid Imagination is an interesting company to keep your eye on. But I think the key will be to watch the Lucene/Solr community. If it grows then it’s likely Lucid will too.

December 24, 2008

Is an Intranet Infrastructure or Application?

Blogger: Larry Cannell

While keeping the driveway clear of snow and battling slippery roads to finish some last-minute holiday shopping, I’ve also been thinking about a report I am working on (to be published in the February/March time frame). The topic is intranets and the role they play in an enterprise. It seems to me the intranet is something many of us take for granted, but its importance has changed dramatically over the 10-15 years intranets have been around.

Most definitions of intranets describe intranets in terms of technology. For example, the Wikipedia entry for intranet starts out this way:

An intranet is a private computer network….

In my opinion, saying intranets are just technology is similar to Henry Ford saying “Any customer can have a car painted any colour he wants so long as it is black.” At the time Ford said this he was only looking at cars as technology, but they mean so much more to us. Perhaps our prevailing view of intranets is as mature as cars were in 1909 (when Ford said this).

We discount the importance of intranets because, at one time, they were simply a bunch of technologies. They were pieces of infrastructure. Just deploy a few intranet technologies (like maybe a portal, a web content management system, or even collaborative workspaces), similar to how we might install a router, and they will simply pay for themselves.

This is where we got it wrong. In my opinion, intranets should not be treated as infrastructure. They should be treated like a suite of applications which support the most important processes used within the enterprise. Intranets support how we work online, and this is something we all should feel strongly about since it impacts our personal and organizational effectiveness more than any other set of tools we use.

When NCSA launched the Mosaic browser in 1993, it was a rudimentary client, just good enough to get us thinking about the potential of an interconnected web. A couple of years later the Apache web server project was born out of another NCSA project, and we were off making websites and demonstrating how easy it is to connect everyone to the same information.

Also taking place in the late 1990s was the rise in use of client/server e-mail systems, like Microsoft Exchange (version 5.0 was released in 1997). In many cases, Microsoft Exchange or Lotus Notes became the standard e-mail system, providing the first reliable peer-to-peer enterprise communication system.

Deploying these e-mail systems was fairly straightforward, and success soon followed since most everyone already knew, or was quickly learning, how to send and receive e-mail. The first experiments in using the web were similarly, although more narrowly, successful. These private applications of popular Internet technologies demonstrated to us that this stuff worked.

However, for many enterprises these early successes came from the IT infrastructure group. This may be the source of the problems we have today, with intranets that don’t seem to add any value (other than being technologies which connect our office computers to the Internet). Since these efforts grew out of IT infrastructure groups many of our intranet efforts stayed within them and were also considered IT infrastructure.

I’m not blaming IT infrastructure groups (I worked in one for ten years). But, intranets should no longer be treated as infrastructure. Infrastructure is technology that is well understood and could be considered a commodity (paint it black, who cares?). Intranet technologies are far from commodities. E-mail, yes, might be considered a commodity. But, for example, the use of collaborative workspaces is still quite immature and not at all close to being commoditized.

Rather, intranets should be treated as a portfolio of applications that is owned and funded by an organization and has a roadmap for improvements based on real, documented needs. Intranet technologies are used by people, and understanding how people work is a touchy-feely sort of thing that infrastructure groups aren’t good at doing (for that matter, it’s one reason why some people work in an infrastructure group: to get away from the touchy-feely).

What do you think? Are intranets infrastructure or applications?

February 20, 2008

Microsoft's Vision for FAST Search

Blogger: Guy Creese

I'm currently attending the FAST Search user conference (FASTforward 08), and yesterday Jared Spataro from Microsoft explained Microsoft's reasoning for buying FAST. (The shareholders have approved the deal, but it has not yet been completed.) He noted that the questions he gets always fall into three categories. Here are my notes from his speech:

  • What’s in it for Microsoft: 18 months ago, Microsoft thought search was a set of features. Microsoft finally got religion during the SharePoint 2007 project. Consequently, a specialized team was formed to target search, and was made part of the SharePoint team. FAST came up to Microsoft right after FASTforward 2007 and demoed their capabilities. So Microsoft has been aware of FAST and its capabilities for over a year. Microsoft saw three differentiators with FAST (vision, people, technology), which is why it bought FAST rather than other vendors.
  • What’s in it for FAST: Combine FAST’s depth (visionary innovations, passionate people, best-in-class technologies) and Microsoft's breadth (SharePoint momentum, complementary infrastructure technologies, sales and marketing engine) to gain faster adoption of FAST’s technologies.
  • What’s in it for customers: FAST will continue to pursue both monetization (customer-facing, revenue-producing search applications) and enterprise (employee-facing, productivity-enhancing applications) segments. Customer-led innovation will continue. Customers can expect cross-platform support and innovation to continue.

I thought the most interesting part was the last bullet: Microsoft's comments that it does not plan to disrupt FAST's current segmentation strategy, that high-touch customer engagement will continue, and that it will support and extend search running on non-Windows platforms.

January 08, 2008

Microsoft Buys FAST; Last Year It Was BI, This Year It's Search

Blogger: Guy Creese

Microsoft announced today that it was buying Fast Search & Transfer, the Norwegian enterprise search firm, for approximately $1.2 billion. It looks like pretty much a done deal, in that FAST's Board of Directors is recommending the acquisition and the two largest shareholders are on board (per a ZDNet blog post).

FAST went into an operational meltdown last year (see Forbes article), with writedowns, layoffs, and the exiting of many longtime U.S.-based employees. This probably helped decrease the purchase price, and Microsoft seized the moment. While the operational wheels fell off, the FAST technology is strong at its core.

That FAST would be acquired is not surprising; Bjorn Olstad, the CTO, commented in a meeting I was at last year that the infrastructure players (IBM, Microsoft, Oracle) would increasingly encroach on FAST's space. In short, FAST was well aware of the challenges ahead, and sounded like it was amenable to being acquired. What surprised me was that Microsoft bought FAST; I always thought it would be Oracle, for a variety of reasons.

Last year, we saw the infrastructure players absorb business intelligence (Oracle bought Hyperion, SAP bought Business Objects, IBM bought Cognos); this year it will be search. At this point, Autonomy is the only large best-of-breed player left standing, and it will have a hard time going it alone.

This is a huge coup for Microsoft in the enterprise search space. After futzing around for years, Microsoft finally started to get serious with search in SharePoint 2007. It's not perfect--clients have started to tell me the boundary conditions they're running into--but it's a lot better than search was in SharePoint 2003. If you split the search market into three sectors: (1) cheap and OK, (2) relatively inexpensive and an 80% solution, and (3) expensive and sophisticated, Microsoft is targeting tier two with SharePoint Search. Microsoft Search Server 2008 Express is its answer to tier one (see my previous blog post here), and the FAST acquisition is its answer to tier three.

Strategically, Microsoft now has all the bases covered (and, as a nice side benefit, prevented IBM and Oracle from adding FAST to their arsenal). Now, of course, it has to execute, which is always easier said than done.

If you go down the list of competitors, they're now coming up short:

  • Autonomy: Starting to look a bit stranded as the remaining, large, best-of-breed vendor, and strong only in tier three. Great for publishers and specialized niche search, but too expensive for general deployment.
  • Oracle: Has a solid tier two offering with Secure Enterprise Search, but nothing for tiers one and three.
  • IBM: A lot of strong technology, but oriented more for integrators (think IBM Global Services) than for an off-the-shelf purchase. It doesn't help that there are internal organizational walls to overcome. Search falls within the Information Management division at IBM, while collaboration is controlled by the IBM Lotus division.
  • Google: A strong offering for tier one (Google Search Appliance), but nothing for tiers two and three.

To sum up, Microsoft just became a one-stop shop vendor for enterprise search.

November 06, 2007

Microsoft's Free Enterprise Search: Game Change in Progress

Blogger: Guy Creese

Today Microsoft announced two new search servers: Microsoft Search Server 2008 and Microsoft Search Server 2008 Express.  The interesting one is the Express version, since it's free. (A download is available -- warning: it's beta software, although Microsoft says it's close to final -- at http://www.microsoft.com/enterprisesearch/.) Furthermore, the only difference between it and Search Server is that Search Server can be run on multiple servers for load balancing.

This announcement is a game-changer, in terms of Microsoft's relation to its arch competitor, Google, as well as the market in general. In several calls I had with reporters, I was asked, "Is this Microsoft playing catch up to the Google Search Appliance?" Yes, but with a twist -- while Microsoft is not offering a search appliance itself, it certainly expects its partners to do so. However, the Microsoft offering is a bit more nuanced than the Google Search Appliance. The Google appliance pretty much comes in any color you want, as long as it's black. The static nature of the appliance often becomes an issue over time -- at least according to the clients I talk to -- because as they get more sophisticated about search they want to have greater control over the search results. Google has recognized this issue and continues to roll out more tunable knobs as a way to counteract this problem. However, the reality is you have to take the appliance the way Google has configured it.

Microsoft, by depending on partners, can offer an infinitely variable set of appliances: tweaking this knob and adding these connectors before delivering an appliance to a customer. In my view, it will not be long before the market sees a search appliance for law firms or a search appliance for manufacturers, all based on Microsoft Search Server Express. Furthermore, the appliance will be supported by a local partner with a hand you can shake. In my view, this combination of customization and high touch will allow Microsoft to get broader penetration than Google has been able to get with a static appliance and low touch (which, to be fair, is still thousands of installations).

In terms of the market at large, this is Microsoft commoditizing the market, something it has done before. Back in 1998, Microsoft entered the BI market with products such as SQL Server 7 and OLAP Services (now called Analysis Services), going against best-of-breed vendors such as Business Objects and Cognos. While the best-of-breed vendors remain (although Business Objects was just bought by SAP), Microsoft has carved out a good chunk of this market. In 2006, the BI portion of Microsoft's business generated revenue of $480 million and was growing at 28%.

A decade later, Microsoft is doing the same thing with search, offering Microsoft Search Server 2008 Express as the entry-level product, as well as a migration path (Microsoft Search Server 2008 for load balancing, and SharePoint Search for SharePoint installations).

This set of products will be a boon to Microsoft partners and SMBs, and an alternative for enterprises that thought the Google Search Appliance was the only low-cost enterprise search solution offered by a major vendor. While even Microsoft admits it will be a long slog getting enterprise search into most businesses, it has taken its first step -- and based on past history, figures slow and steady will win the race.
