Hadoop has become nearly synonymous with Big Data. It’s the open-source framework that distributes storage and processing workloads across clusters of machines in a way that makes Big Data possible. Okay, that’s a major oversimplification, but you get the general idea.
Hadoop’s power is represented by the top-flight Big Data startups using it, such as Cloudera, Hortonworks, and MapR, which all offer commercial distributions. Hadoop is at the core of important projects at major companies, such as those at Facebook, Yahoo!, and Amazon. In fact, Hadoop providers claim that more than half of the Fortune 50 is already using Hadoop.
Of course, Hadoop isn’t the only game in town. There are Big Data alternatives like Disco, Storm, and proprietary systems from Software AG, LexisNexis, ParStream, and others. Anytime a technology takes off the way Hadoop has, there will be kinks and pain points along the way, opening the door for even more innovation.
But Hadoop is getting the buzz, and many IT professionals wonder what all the fuss is about. Is Hadoop really that big of a deal?
The short answer is a resounding “yes.” Hadoop is a driving force in Big Data infrastructure, and many of the alternatives basically build on what Hadoop has already achieved, solving various headaches that you may or may not care about, depending on how you intend to use Hadoop.
The Big Data space may eventually evolve away from Hadoop, but, either way, no one can deny that Hadoop played a starring role in triggering the Big Data revolution.
Here are three examples where Hadoop’s impact could actually improve people’s lives:
The Climate Corporation leverages Hadoop to help farmers cope with climate change
Unless you’re a climate-change denier (and probably also think the moon landing was staged in Hollywood), it’s pretty obvious that farmers worldwide will need to adapt to climate change quickly. If you live in California, this fact is made even clearer by our record drought.
The Climate Corporation is building out a system on MapR’s distribution of Hadoop that will, hopefully, better predict weather patterns for the coming years. The system creates weather projections for the next two years for every 2.5-by-2.5-kilometer grid cell across the U.S. The company has mapped out the 10,000 most likely outcomes per location, using different variations of likely patterns, to create a probabilistic view of weather.
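To make the idea of a "probabilistic view" concrete, here is a minimal Python sketch of the general pattern: simulate many outcomes per grid cell, then summarize them. This is not Climate Corp.'s model. The gamma rainfall distribution, its parameters, the drought threshold, the cell names, and the function names `simulate_cell` and `summarize` are all assumptions made up for illustration.

```python
import random
from statistics import mean


def simulate_cell(cell_id, n_outcomes=10_000, seed=0):
    """Draw n_outcomes hypothetical seasonal rainfall totals (mm) for one grid cell."""
    # Seed per cell so repeated runs give the same numbers.
    rng = random.Random(f"{seed}-{cell_id}")
    # Assumption: rainfall follows a gamma distribution with invented parameters.
    return [rng.gammavariate(4.0, 100.0) for _ in range(n_outcomes)]


def summarize(outcomes, drought_threshold_mm=200.0):
    """Collapse the simulated outcomes into a small probabilistic summary."""
    dry = sum(1 for total in outcomes if total < drought_threshold_mm)
    return {"mean_mm": mean(outcomes), "p_drought": dry / len(outcomes)}


# Hypothetical grid cells; a real grid over the U.S. would have far more.
cells = ["cell_0_0", "cell_0_1"]
view = {cell: summarize(simulate_cell(cell)) for cell in cells}

for cell, summary in view.items():
    print(cell, summary)
```

Each cell is simulated independently of the others, which is the kind of work that spreads naturally across a cluster: one task per cell (or batch of cells), followed by a step that gathers the summaries.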
I should note here that Climate Corp. was acquired by Monsanto in 2013 for more than $1 billion, so any green-washing critiques you want to make certainly have some legs, but that doesn’t minimize what Climate Corp. is doing.
“We are proud of using Hadoop to provide a class of weather insurance for farmers never before available and to do it in a way where, with index-based weather insurance, farmers have access to an independently sold product that changes how they manage risk. Since 85 percent of farmers’ risks are weather related, this is our impact on the world,” said Andy Mutz, director of engineering for Climate Corp.
Climate Corp. uses Hadoop to help simulate weather and to create risk portfolios that it sells to risk/underwriting partners. The goal is to help farmers understand the risks of their practices and reduce those risks, both by changing their practices and by underwriting weather insurance against adverse effects. Hadoop is central to the weather simulation process and to the process of aggregating financial data into the risk portfolios that Climate Corp. then sells to partners.
The Durkheim Project combats suicide in the military
Suicide is an issue the U.S. military has struggled with for years. In 2012, a record 349 military suicides took place, far exceeding the number of American combat deaths in Afghanistan that year. The rate of military suicides is roughly double that of adults in the general U.S. population.
To get better insight into the problem, predictive analytics firm Patterns and Predictions (P&P) created the Durkheim Project. Built on top of Cloudera’s distribution of Hadoop, the project uses an array of advanced analytics, real-time predictive modeling, and machine learning, all of which work in concert to identify critical correlations between veterans’ communications and suicide risk.
“One of the promises of Big Data in this case is that you can shorten the distance between the people who need help and the system that can get them help,” said P&P founder Chris Poulin.