Big Data companies come in many shapes and flavors. In fact, any list of Big Data companies necessarily contains vendors with highly contrasting strategies: the analytics market is clearly in rapid flux.
Standards? Kind of. Not exactly. Depends who you ask.
It has been just seven years since Yahoo introduced Hadoop, but the concept behind it, Big Data, has exploded in popularity as more and more firms launch pilot programs to gain insight from the massive amounts of data at their disposal.
Big Data has matured differently than most technologies, however. First, no single leader has emerged after nearly a decade. The analytics industry is still in growth mode, and leaders emerge when an industry consolidates.
Second, the big names got into the market early and in a big way. That's also unprecedented, because established vendors have traditionally been notoriously slow to embrace a new technology. But already, IBM, Microsoft, SAP, HP, and Oracle are in the game.
So, which tools and platforms should you choose? Here are 25 of the top companies to consider in the Big Data world.
Please note: this list is NOT a ranking – the strategies are too different. So company number 7, for instance, is not a “better” Big Data vendor than company number 20.
Big Data Companies: The Leaders
1) Tableau
Originally spun out of Stanford University as a research project, Tableau started out by offering visualization techniques for exploring and analyzing relational databases and data cubes and has expanded to include Big Data research. Unlike some visualization products that only work with certain sources, it visualizes data from any source, from Hadoop to Excel files, and it works on everything from a PC to an iPhone.
2) New Relic
New Relic uses a SaaS model for real-time monitoring of Web and mobile applications that run in the cloud, on-premises, or in a hybrid mix. It uses more than 50 plug-ins from technology partners to connect to its monitoring dashboard. The plug-ins cover PaaS/cloud services, caching, databases, Web servers and queuing. Its Insights analysis software works across the entire New Relic product line, and the company offers a product called Insights Data Explorer that is designed to make it easier for everyone on a software team to explore Insights events.
3) Alation
Alation crawls an enterprise to catalog every bit of information it finds and then centralizes the organization's knowledge of data, automatically capturing information on what the data describes, where the data comes from, who's using it and how it's used. In other words, it turns all your data into metadata, and allows for fast searches using English words and not computer strings. The company's products provide collaborative analytics for faster insight, offer a unified means of search, give the company's data a more optimized structure, and assist in better data governance.
4) Teradata
Teradata has built a portfolio of Big Data apps into what it calls its Unified Data Architecture, which includes Teradata QueryGrid, Teradata Listener, Teradata Unity and Teradata Viewpoint. QueryGrid provides a seamless data fabric across new and existing analytic engines, including Hadoop. Listener is the primary ingestion framework for organizations with multiple data streams. Unity is a portfolio of four integrated products for managing data flow throughout the process. Viewpoint is a custom Web-based dashboard of tools to manage the Teradata environment.
5) VMware
VMware has incorporated Big Data into its flagship virtualization product, vSphere, through VMware vSphere Big Data Extensions (BDE). BDE is a virtual appliance that enables administrators to deploy and manage Hadoop clusters under vSphere. It supports a number of Hadoop distributions, including Apache, Cloudera, Hortonworks, MapR and Pivotal.
6) Splunk
Splunk Enterprise started out as a log analysis tool but has since expanded to machine data analytics, with the aim of making the information usable by anyone. It can monitor online end-to-end transactions, study customer behavior and usage of services in real time, monitor for security threats, and spot trends and analyze sentiment on social platforms.
7) IBM
Besides its mainframe and Power systems, IBM offers cloud services for massive compute scale through its SoftLayer subsidiary. On the software side, its DB2, Informix and InfoSphere database software all support Big Data analytics, while its Cognos and SPSS analytics software specialize in BI and data insight. IBM also offers InfoSphere as the basic platform for building the data integration and data warehousing used in a Big Data scenario.
8) Striim
Formerly known as WebAction, Striim is a real-time, data streaming analytics software platform that reads in data from multiple sources such as databases, log files, applications and IoT sensors and allows customers to react instantly. Enterprises can filter, transform, aggregate and enrich data as it is coming in, organizing it in-memory before it ever lands on disk.
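To make the filter, transform, enrich and aggregate steps concrete, here is a minimal sketch of an in-memory streaming pipeline built from plain Python generators. It is a generic illustration, not Striim's API: the event fields (`sensor`, `value`), the `SENSOR_LOCATIONS` lookup table and every function name are assumptions invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator

# Hypothetical lookup table used for enrichment; not part of any vendor API.
SENSOR_LOCATIONS = {"s1": "plant-a", "s2": "plant-b"}


def filter_valid(events: Iterable[dict]) -> Iterator[dict]:
    """Filter: drop events that carry no reading."""
    for event in events:
        if event.get("value") is not None:
            yield event


def to_celsius(events: Iterable[dict]) -> Iterator[dict]:
    """Transform: convert each reading from Fahrenheit to Celsius."""
    for event in events:
        celsius = (event["value"] - 32) * 5 / 9
        yield {**event, "value": round(celsius, 1)}


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Enrich: attach a location looked up from the sensor id."""
    for event in events:
        location = SENSOR_LOCATIONS.get(event["sensor"], "unknown")
        yield {**event, "location": location}


def average_by_location(events: Iterable[dict]) -> Dict[str, float]:
    """Aggregate: average reading per location, held only in memory."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for event in events:
        totals[event["location"]] += event["value"]
        counts[event["location"]] += 1
    return {loc: totals[loc] / counts[loc] for loc in totals}


def run_pipeline(events: Iterable[dict]) -> Dict[str, float]:
    """Chain the stages; each event flows through one at a time."""
    return average_by_location(enrich(to_celsius(filter_valid(events))))
```

Because each stage is a generator, events pass through one at a time and nothing is written to disk along the way. A real streaming platform would add windowing, fault tolerance and connectors, which this sketch leaves out.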
9) SAP
SAP's main Big Data tool is its HANA in-memory relational database, which the company says can run analytics on 80 terabytes of data and integrates with Hadoop. Although HANA is a row-and-column database, it can perform advanced analytics such as predictive analytics, spatial data processing, text analytics, text search, streaming analytics and graph data processing, and it has ETL (Extract, Transform, and Load) capabilities.
While some companies specialize in one or a few sources of data, SAP deals with data from a wide range of sources, including data from sensors, machine logs and other equipment, as well as human-generated data: social, point of sale (POS), ERP, emails, documents and other things that make up enterprise data.
10) Alpine Data Labs
A creation of Greenplum employees, Alpine Data Labs puts an easy-to-use advanced analytics interface on Apache Hadoop to provide a collaborative, visual environment for building analytics workflows and predictive models that anyone can use, rather than requiring a high-priced data scientist to program the analytics.
11) Oracle
Oracle's Big Data Appliance combines an Intel server with a number of software products. They include Oracle NoSQL Database, Apache Hadoop, Oracle Data Integrator with Application Adapter for Hadoop, Oracle Loader for Hadoop, Oracle Linux, Oracle Java HotSpot Virtual Machine and the Oracle R Enterprise tool, which uses the R programming language and software environment for statistical computing and publication-quality graphics.
12) Alteryx
Alteryx calls itself the leader in self-service data analytics, and its software is meant for the business user rather than the data scientist. It allows users to blend data from multiple and potentially disparate sources, analyze it and share it so that actions can be taken. Queries can draw on anything from a history of sales transactions to social media activity.
13) Splice Machine
Splice Machine bills itself as the provider of the only Hadoop relational database management system (RDBMS). It can act as a general-purpose database that replaces Oracle, MySQL or SQL Server databases for various workloads on Hadoop. The latest version, 2.0, added Spark, which does all analytics in memory instead of on disk. Version 2.0 also added the ability to route work to one of two processing engines, either OLTP or OLAP.
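The idea of routing work to one of two engines can be sketched in a few lines. The rule below is purely an assumption for illustration: Splice Machine's actual routing logic is not described here, and the `Query` type, the `estimated_rows` field and the `OLAP_ROW_THRESHOLD` value are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the real product's routing criteria are not known here.
OLAP_ROW_THRESHOLD = 100_000


@dataclass
class Query:
    sql: str
    estimated_rows: int  # assumed to come from a query planner's estimate


def choose_engine(query: Query) -> str:
    """Send large scans to the analytic engine, small lookups to the transactional one."""
    if query.estimated_rows >= OLAP_ROW_THRESHOLD:
        return "OLAP"
    return "OLTP"
```

Under this assumed rule, a single-row lookup would go to the OLTP engine, while a report that scans millions of rows would go to the OLAP engine, so short transactions are not stuck waiting behind long analytic jobs.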