1. What is Hadoop and why is it a big deal?
Hadoop is an open source software framework for storing and processing very large data sets across clusters of computers. Because it is open source, users can write and modify code to fit their own big data analytics needs. It is an Apache project developed by a worldwide community of developers. It is based on earlier work published by Google on MapReduce, a programming model that splits a job into a map step, which processes pieces of the data in parallel, and a reduce step, which combines the partial results, so that large data sets can be processed on a cluster of computers. Hadoop is a big deal because it is an open source project that lets businesses write custom code to analyze complex data sets that would otherwise be hard to make sense of using standard data tables. Many large companies already use Hadoop, but most businesses are not ready to use it yet because of the high level of analytic expertise and training it requires.
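To make the MapReduce idea concrete, here is a minimal single-machine sketch in Python of the classic word count job. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration. On a real cluster, the map and reduce steps would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Here the documents are mapped one after another; on a cluster each
    # document (or block of a file) would be mapped on a different machine.
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data about data"]))
```

The point of the split is that map_phase only ever looks at one document and reduce_phase only ever looks at one key, so both steps can be spread over many machines without the pieces needing to talk to each other.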
2. Who are Cloudera?
Cloudera is a company that specializes in Apache Hadoop software and enterprise-level support services around it. It also contributes to Apache projects related to Hadoop. Cloudera offers two products: Cloudera Enterprise and Cloudera's Distribution including Apache Hadoop.
3. What is PIG?
Pig is a high-level data flow platform used in conjunction with Hadoop. Its language is called Pig Latin. Pig Latin is a scripting language of its own, and Pig compiles its scripts into MapReduce jobs that run on Hadoop, which allows for fast ad hoc analysis of data sets. Users can also create their own functions, for example in Java, for special-purpose data processing.
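A data flow script is a chain of steps where each step takes the output of the one before it. The sketch below imitates that shape in plain Python: load, filter, group, count. It is only an analogy, not Pig Latin and not the Pig API. The record layout (user, page, seconds) and the function long_visit, which stands in for a user-defined function, are assumptions made up for this example.

```python
from collections import defaultdict


def load(lines):
    """Stand-in for a load step: parse comma-separated lines into records."""
    fields = ("user", "page", "seconds")
    return [dict(zip(fields, line.split(","))) for line in lines]


def long_visit(record, threshold=30):
    """Hypothetical user-defined function: is the visit longer than threshold?"""
    return int(record["seconds"]) > threshold


def group_by(records, key):
    """Stand-in for a group step: collect records that share the same key."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return groups


def visits_per_page(lines):
    # Each line below is one step of the flow, feeding the next one.
    records = load(lines)
    filtered = [record for record in records if long_visit(record)]
    grouped = group_by(filtered, "page")
    return {page: len(members) for page, members in grouped.items()}


if __name__ == "__main__":
    log = ["ann,home,45", "bob,home,10", "cat,search,90", "dan,home,31"]
    print(visits_per_page(log))
```

Writing the analysis as a short chain of steps like this is what makes ad hoc questions quick to answer: changing the question usually means changing one step, not the whole program.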
4. What is HIVE?
HIVE functions as a data warehouse that allows for query-based analysis of large data sets. It uses a SQL-like language for its queries. It works alongside the Hadoop file system and, just like Hadoop, it is open source and developed under Apache.
5. What is Cassandra?
Cassandra is an Apache open source database management system. It can handle large volumes of data spread across many different servers. It started out as a way for Facebook to power its inbox search function. It is a NoSQL database, an approach chosen because traditional SQL-based databases can be slow when dealing with big data sets.
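One common way to spread data across many servers is to hash each record's key and let the hash decide which server stores it. The sketch below shows that general idea with a simple hash ring in Python. It is a simplified illustration, not Cassandra's actual implementation: the class HashRing, the node names, and the use of MD5 are all assumptions chosen for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy hash ring: every key is stored on the next node clockwise from it."""

    def __init__(self, nodes):
        # Place each node on the ring at the position given by hashing its name.
        self._ring = sorted((self._token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(value):
        # Turn any string into a large integer position on the ring.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # Find the first node at or after the key's position, wrapping around
        # to the first node when the key falls past the last one.
        index = bisect.bisect_left(self._tokens, self._token(key))
        return self._ring[index % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", ring.node_for(key))
```

Because the location is computed from the key itself, any server can work out where a record lives without asking a central lookup table, which is part of what lets this kind of database keep growing by adding servers.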
6. What is Mahout?
Mahout is a suite of machine learning libraries designed to be scalable and robust. It is another Apache open source project that is designed to work with Hadoop. Hadoop is associated with big data, and a mahout is a person who drives an elephant. The elephant is Hadoop, and Mahout wants to be the driving force behind it without leading the development of Hadoop itself.