Hadoop is an open source software that is gaining popularity
because of its ability to handle massive amounts of data. A mathematician hired
by Facebook was asked to look at data from the network to get a better
understanding of what people are doing on Facebook and how it can be improved. Since
this network is extremely large, they struggled to find a software to handle
that much data, however Hadoop could. Hadoop utilizes several different
computer servers to store and process data, allowing it to work with extremely large datasets. Many technology companies began using this software along with
others, but in the future, it will probably be used on its own at companies once their data is not fit for relational databases anymore. Hadoop has the potential to be very influential in the business world.
When it was first developed Hadoop used a “batch system”, which was inefficient and could take days to respond if the program was complex. Eventually Greenplum, a data analytics company, developed a “query engine” that was expected to enhance performance significantly. Unfortunately when using a query engine, if one machine is lost the whole query must start over. One bonus of the initial model was that when servers failed, the job could continue. However, even though the query must restart whenever a machine breaks, the extent of the heightened performance makes using a query engine is worth it. Companies will continue to keep traditional databases around, but Hadoop will likely be a new addition for many.