hadoopecosystemtable.github.io - The Hadoop Ecosystem Table

Description: Hadoopecosystemtable.github.io : This page is a summary to keep the track of Hadoop related project, and relevant projects around Big Data scene focused on the open source, free software enviroment.

Example domain paragraphs

The Apache Apex platform is supplemented by Apache Apex-Malhar, which is a library of operators that implement common business logic functions needed by customers who want to quickly develop applications. These operators provide access to HDFS, S3, NFS, FTP, and other file systems; Kafka, ActiveMQ, RabbitMQ, JMS, and other message systems; MySql, Cassandra, MongoDB, Redis, HBase, CouchDB and other databases along with JDBC connectors. The library also includes a host of other common business logic patterns

In addition to H2O’s point and click Web-UI, its REST API allows easy integration into various clients. This means explorative analysis of data can be done in a typical fashion in R, Python, and Scala; and entire workflows can be written up as automated scripts.

Sparkling Water combines two open source technologies: Apache Spark and H2O - a machine learning engine.  It makes H2O’s library of Advanced Algorithms including Deep Learning, GLM, GBM, KMeans, PCA, and Random Forest accessible from Spark workflows. Spark users are provided with the options to select the best features from either platforms to meet their Machine Learning needs.  Users can combine Sparks’ RDD API and Spark MLLib with H2O’s machine learning algorithms, or use H2O independent of Spark in the m