- Austin, TX
- http://www.gpfreitas.net
Stars
OpenRefine is a free, open-source power tool for working with messy data and improving it
Example code from Learning Spark book
Stream summarizer and cardinality estimator.
A platform for visualization and real-time monitoring of data workflows
A Java package to automatically detect anomalies in large-scale time-series data
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
Hadoop library for large-scale data processing, now an Apache Incubator project
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
A sample Maven-enabled Pig project, complete with examples of unit tests on UDFs using JUnit and on Pig scripts using PigUnit.