storm-applications

A collection of real-time applications built with Apache Storm.

Applications

Wordcount (WC)

The classic example of big data applications, the wordcount application was extracted from storm-starter. It is composed of one bolt that splits sentences into words and another that counts the occurrences of each word in a hash map.
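The two steps described above can be sketched in plain Java. This is a minimal illustration and not the repository's bolts: the class name `WordCountSketch`, the lowercasing, and the tokenization on non-word characters are all assumptions, and no Storm classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the split and count steps (hypothetical names, no Storm API).
class WordCountSketch {
    // Step 1 (what the split bolt does): break a sentence into words.
    // Lowercasing and splitting on non-word characters are assumptions.
    static List<String> split(String sentence) {
        List<String> words = new ArrayList<>();
        for (String token : sentence.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Step 2 (what the count bolt does): keep a running count per word in a hash map.
    static Map<String, Long> count(List<String> sentences) {
        Map<String, Long> counts = new HashMap<>();
        for (String sentence : sentences) {
            for (String word : split(sentence)) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("the quick fox", "The lazy dog")));
    }
}
```

In a real topology the words would typically be grouped by value so that every occurrence of a word reaches the same counting task; the sketch sidesteps this by using a single map.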

Bargain Index (BI)

This application was taken from papers about System S (IBM InfoSphere Streams). First, the VWAP (Volume Weighted Average Price) is calculated from a stream of trades. Another bolt then receives both the VWAP and a stream of quotes and calculates a bargain index, which indicates whether the offered quote is worth buying and how good a deal it is.
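As a rough sketch of the computation, the VWAP is the sum of price times volume divided by the sum of volume. The index formula below (ask size times the exponential of the gap between VWAP and ask price, and zero when the ask is not below the VWAP) is an assumption chosen for illustration; the repository's formula may differ. The class `BargainIndexSketch` is hypothetical, tracks a single symbol, and applies no windowing.

```java
// Plain-Java sketch for one stock symbol (hypothetical names, no Storm API).
class BargainIndexSketch {
    private double priceVolumeSum = 0.0; // running sum of price * volume
    private double volumeSum = 0.0;      // running sum of volume

    // Fold one trade into the running VWAP state.
    void onTrade(double price, double volume) {
        priceVolumeSum += price * volume;
        volumeSum += volume;
    }

    // VWAP = sum(price * volume) / sum(volume); 0 if no trade has been seen.
    double vwap() {
        return volumeSum == 0.0 ? 0.0 : priceVolumeSum / volumeSum;
    }

    // Assumed index: positive only when the ask price is below the VWAP,
    // growing with both the price gap and the offered size.
    double onQuote(double askPrice, double askSize) {
        double v = vwap();
        return v > askPrice ? askSize * Math.exp(v - askPrice) : 0.0;
    }

    public static void main(String[] args) {
        BargainIndexSketch sketch = new BargainIndexSketch();
        sketch.onTrade(10.0, 100.0);
        sketch.onTrade(20.0, 300.0);
        System.out.println("VWAP: " + sketch.vwap());
        System.out.println("Index: " + sketch.onQuote(16.5, 10.0));
    }
}
```

A value of zero means the quote is not a bargain; larger values mean a better deal under this assumed formula.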

Fraud Detection in Credit Card Transactions (FD)

Outlier Detection in Computer Network (MO)

Spike Detection in Sensor Network (SD)

Tracks measurements from a set of sensor devices, calculates the moving average of these measurements, and checks whether the current reading is above a certain threshold relative to the moving average. If so, an alert is emitted.
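The check can be sketched as follows for a single sensor. Several details are assumptions made for illustration: the fixed-size window, the inclusion of the current reading in the average, and the interpretation of the threshold as a relative deviation from the moving average in either direction. `SpikeDetectorSketch` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plain-Java sketch for one sensor (hypothetical names, no Storm API).
class SpikeDetectorSketch {
    private final int windowSize;
    private final double threshold; // relative deviation, e.g. 0.5 = 50%
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    SpikeDetectorSketch(int windowSize, double threshold) {
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    // Adds the reading to the window, then returns true when it deviates
    // from the moving average by more than threshold * average.
    boolean onReading(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > windowSize) {
            sum -= window.removeFirst(); // evict the oldest reading
        }
        double movingAverage = sum / window.size();
        return Math.abs(value - movingAverage) > threshold * movingAverage;
    }

    public static void main(String[] args) {
        SpikeDetectorSketch detector = new SpikeDetectorSketch(3, 0.5);
        for (double reading : new double[] {10.0, 10.0, 10.0, 100.0}) {
            System.out.println(reading + " spike=" + detector.onReading(reading));
        }
    }
}
```

With several devices, one such detector per device ID would be kept, for example in a map keyed by device.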

Sentiment Analysis for Twitter (SA)

Calculates the sentiment score for each tweet and produces a summary per state. Uses a very basic algorithm that counts occurrences of good and bad words in the message to calculate the score.
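A word-counting scorer of this kind might look like the sketch below. The word lists are tiny placeholders, and the scoring rule (positive matches minus negative matches) and the per-state sum are assumptions; the repository may use different lists and a different aggregation. `SentimentSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java sketch (hypothetical names and word lists, no Storm API).
class SentimentSketch {
    static final Set<String> POSITIVE = new HashSet<>(Arrays.asList("good", "great", "love", "happy"));
    static final Set<String> NEGATIVE = new HashSet<>(Arrays.asList("bad", "awful", "hate", "sad"));

    // Score = number of positive words minus number of negative words.
    static int score(String tweet) {
        int score = 0;
        for (String token : tweet.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(token)) {
                score++;
            } else if (NEGATIVE.contains(token)) {
                score--;
            }
        }
        return score;
    }

    // Sums the scores per state; each input row is {state, tweetText}.
    static Map<String, Integer> perState(String[][] stateAndTweet) {
        Map<String, Integer> totals = new HashMap<>();
        for (String[] row : stateAndTweet) {
            totals.merge(row[0], score(row[1]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] tweets = {{"CA", "love it"}, {"CA", "hate it"}, {"NY", "great day"}};
        System.out.println(perState(tweets));
    }
}
```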

VoIPSTREAM (Spam Detection in VoIP) (VS)

VoIPSTREAM is an application composed of a set of filters and modules that are used to detect telemarketing spam in Call Detail Records (CDRs). A detailed description of the application can be found in the paper that describes an on-demand time-decaying bloom filter.

Ads Analytics

Calculates the current Click-Through Rate (CTR) for each query and ad pair. Predicts the probability of an ad being clicked given a set of features, such as the query, the position of the ad, and the advertiser.
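The CTR part can be sketched as clicks divided by impressions for each (query, ad) pair. The sketch below covers only that counting step, not the click prediction; the class `CtrSketch`, its method names, and the `query|adId` key format are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch (hypothetical names, no Storm API).
class CtrSketch {
    // Per pair: counts[0] = impressions, counts[1] = clicks.
    private final Map<String, long[]> counts = new HashMap<>();

    // Assumed composite key; a real implementation might use a dedicated key type.
    private static String key(String query, String adId) {
        return query + "|" + adId;
    }

    void onImpression(String query, String adId) {
        counts.computeIfAbsent(key(query, adId), k -> new long[2])[0]++;
    }

    void onClick(String query, String adId) {
        counts.computeIfAbsent(key(query, adId), k -> new long[2])[1]++;
    }

    // CTR = clicks / impressions; 0 when the pair has no impressions yet.
    double ctr(String query, String adId) {
        long[] c = counts.get(key(query, adId));
        return (c == null || c[0] == 0) ? 0.0 : (double) c[1] / c[0];
    }

    public static void main(String[] args) {
        CtrSketch sketch = new CtrSketch();
        for (int i = 0; i < 4; i++) {
            sketch.onImpression("shoes", "ad-1");
        }
        sketch.onClick("shoes", "ad-1");
        System.out.println(sketch.ctr("shoes", "ad-1"));
    }
}
```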

Reinforcement Learner (RL)

Reinforcement learning in the context of ads can be employed to maximize the CTR by choosing the ad or ads with the highest profit. Over time, an ad may be replaced by other ads in response to a decreasing CTR.
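One simple way to illustrate this behavior is an epsilon-greedy selector. This is a swapped-in, generic technique and not necessarily the learner the repository implements; it also uses the observed CTR as the reward, which is an assumption where the text speaks of profit. All names below are hypothetical.

```java
import java.util.Random;

// Plain-Java epsilon-greedy sketch (hypothetical names, no Storm API).
class EpsilonGreedyAdSelector {
    private final long[] impressions;
    private final long[] clicks;
    private final double epsilon; // probability of exploring a random ad
    private final Random random;

    EpsilonGreedyAdSelector(int adCount, double epsilon, long seed) {
        this.impressions = new long[adCount];
        this.clicks = new long[adCount];
        this.epsilon = epsilon;
        this.random = new Random(seed);
    }

    // With probability epsilon pick a random ad, otherwise the ad with the best observed CTR.
    int selectAd() {
        if (random.nextDouble() < epsilon) {
            return random.nextInt(impressions.length);
        }
        int best = 0;
        for (int i = 1; i < impressions.length; i++) {
            if (ctr(i) > ctr(best)) {
                best = i;
            }
        }
        return best;
    }

    // Feed back whether the shown ad was clicked.
    void recordOutcome(int ad, boolean clicked) {
        impressions[ad]++;
        if (clicked) {
            clicks[ad]++;
        }
    }

    double ctr(int ad) {
        return impressions[ad] == 0 ? 0.0 : (double) clicks[ad] / impressions[ad];
    }

    public static void main(String[] args) {
        EpsilonGreedyAdSelector selector = new EpsilonGreedyAdSelector(2, 0.1, 42L);
        selector.recordOutcome(0, true);
        selector.recordOutcome(1, false);
        System.out.println("Selected ad: " + selector.selectAd());
    }
}
```

Because the choice always follows the latest counts, an ad whose CTR decays is eventually overtaken by another ad, which matches the replacement behavior described above.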

Spam Filter for Emails (SF)

Log Processing (LP)

Click Analytics (CA)

Datasets

Application	Source	Size
WC	Project Gutenberg	~8GB
BI	Yahoo Finance, Google Finance	—
SD	Intel Berkeley Research Lab	150MB
TT, SA	Twitter Streaming	—
MO	Google Cluster Traces	36GB (compressed)
CA, LP	1998 World Cup Web Site HTTP Logs	104GB
SF	TREC 2007 Public Spam Corpus	547MB (labeled)
SF	SPAM Archive by Bruce Guenter	~1.2GB (spam only)
SF	Enron Email Dataset	2.6GB (raw)
SF	Enron Spam Dataset	50MB (labeled)


positivepsycho/storm-applications
