Day 1 Reflection

by Amber Frazier

Hadoop is open-source software that is gaining popularity because of its ability to handle massive amounts of data. A mathematician hired by Facebook was asked to examine data from the network to better understand what people do on Facebook and how the site could be improved. Because the network is extremely large, the company struggled to find software that could handle that much data; Hadoop could. Hadoop spreads the storage and processing of data across many servers, which allows it to work with extremely large datasets. Many technology companies have begun using Hadoop alongside other software, but in the future companies will probably use it on its own once their data no longer fits in relational databases. Hadoop has the potential to be very influential in the business world.
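
To make the idea of spreading work across servers more concrete, here is a minimal sketch in plain Python. It does not use Hadoop itself; the function names, the tiny activity log, and the choice of three "servers" are all assumptions made up for illustration. Each chunk stands in for the share of data one server would hold, each chunk is tallied on its own, and the partial tallies are then merged.

```python
from collections import Counter

def split_into_chunks(records, num_servers):
    """Deal records out round-robin, one chunk per (pretend) server."""
    return [records[i::num_servers] for i in range(num_servers)]

def count_actions(chunk):
    """Work done independently on one server: tally the actions in its chunk."""
    return Counter(chunk)

def combine(partial_counts):
    """Merge the per-server tallies into one overall result."""
    total = Counter()
    for partial in partial_counts:
        total += partial
    return total

# Hypothetical activity log; a real one would be far too large for one machine.
actions = ["like", "post", "like", "share", "like", "post"]

chunks = split_into_chunks(actions, num_servers=3)
partials = [count_actions(chunk) for chunk in chunks]  # could run in parallel
totals = combine(partials)
print(totals["like"], totals["post"], totals["share"])
```

Because no chunk depends on any other, adding more servers lets the same approach cover more data, which is roughly why this style of processing scales to very large datasets.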

When it was first developed, Hadoop used a “batch system”, which was inefficient and could take days to respond if the program was complex. Eventually Greenplum, a data analytics company, developed a “query engine” that was expected to improve performance significantly. One advantage of the original batch model was that a job could continue when servers failed. Unfortunately, with a query engine, the whole query must start over if a single machine is lost. Even so, the performance gain is large enough to make a query engine worth using. Companies will continue to keep traditional databases around, but Hadoop will likely be a new addition for many.
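
The difference in how the two approaches handle a failed machine can be sketched with a toy simulation. This is only an illustration under simple assumptions: the function names are hypothetical, each task costs one unit of work, and a failing machine fails exactly once. It counts task attempts only and does not model the speed advantage a query engine has when nothing fails.

```python
def run_batch(num_tasks, fails_once):
    """Batch style: a task on a failed server is rerun alone; finished tasks are kept."""
    work_done = 0
    for task in range(num_tasks):
        work_done += 1          # first attempt at this task
        if task in fails_once:
            work_done += 1      # rerun only this task on another server
    return work_done

def run_query(num_tasks, fails_once):
    """Query-engine style: any failure discards progress and restarts the whole query."""
    work_done = 0
    pending_failures = set(fails_once)
    while True:
        restarted = False
        for task in range(num_tasks):
            work_done += 1
            if task in pending_failures:
                pending_failures.remove(task)  # the machine fails only once
                restarted = True
                break                          # throw away progress, start over
        if not restarted:
            return work_done

# Five tasks, with the machine running task 3 failing on its first attempt.
print(run_batch(5, {3}))   # 6 attempts: five tasks plus one rerun
print(run_query(5, {3}))   # 9 attempts: four wasted, then five on the restart
```

With no failures both functions report the same amount of work, so in this toy model the restart penalty only matters when a machine is actually lost.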