Tuesday, December 31, 2013

How to build the Netflix RSS Reader Example on CentOS 6 with Amazon AWS

Instructions for installing Netflix RSS Reader on Amazon EC2

In the last week of December 2013, I built and installed the Netflix example RSS Reader application by following the instructions on the Netflix recipes-rss wiki.  See also the Netflix Tech Blog for an overview.

There were a lot of dead ends and rabbit warrens along the way, because there are many components to get up and running: infrastructure, networking, JDK, Gradle, source code, Tomcat and Jetty.

Hopefully these step-by-step instructions make it easier for someone.

It's worth noting two default assumptions that aren't immediately obvious but make life easier:
  1. All 3 services (Eureka, RSS Middletier, RSS Edge) are installed on the same host - you don't need to create 3 separate machines and network them together
  2. The RSS Reader example does not require Cassandra; it uses in-memory data storage by default (connecting to Cassandra is optional)

The first set of instructions below gets it running on a single instance.  The follow-on instructions describe how to scale it out to separate clusters of nodes - which is where Turbine/Hystrix gets interesting.

On to the instructions...

Basic Single-Instance Installation


# Instructions prefixed with "###" are dead-ends I went down - may save you time to skip them.

# Create the machine

# Create an EC2 Instance in the AWS console with the following:
#   AMI base image: CentOS 6 x86_64 with updates
#   Instance: m1.small (WARNING: t1.micro's 600M RAM is insufficient)
#   Create a security group and save the key. Log in to your new instance as root with the downloaded key.



# Setup the networking configuration to allow the services to talk to each other and allow you to browse to them:

# Configure AWS Security Group:
#   Open inbound TCP ports: 22, 80, 9090, 9092, 9191, 9192
#   Ideally but optionally expose them only to your IP address instead of the whole world (0.0.0.0/0)
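# The same Security Group rules can be scripted. This is only a sketch, assuming
# the AWS CLI is installed and configured; the group id and CIDR in the example
# are placeholders. The function just prints the commands so they can be
# reviewed before anything is changed.

```shell
#!/bin/bash
# Print one "authorize-security-group-ingress" command per port.
# Usage: sg_ingress_commands GROUP_ID CIDR PORT...
sg_ingress_commands() {
  local group_id=$1 cidr=$2 port
  shift 2
  for port in "$@"; do
    echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port $port --cidr $cidr"
  done
}

# Example (placeholder group id and address); pipe the output to "sh" to apply:
# sg_ingress_commands sg-00000000 203.0.113.10/32 22 80 9090 9092 9191 9192
```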

# Configure iptables:
# Flush all existing rules
iptables -F
# Block null recon packets
iptables -A INPUT -p tcp --tcp-flags ALL NONE -j DROP
# Drop new TCP connections that do not start with a SYN packet
iptables -A INPUT -p tcp ! --syn -m state --state NEW -j DROP
# Allow loopback for internal services
iptables -A INPUT -i lo -j ACCEPT
# Open ports 22 (ssh) & 80 (http)
iptables -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
# Open ports 9090 & 9092 for RSS Reader Edge webserver
iptables -A INPUT -p tcp -m tcp --dport 9090 -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 9092 -j ACCEPT
# Open ports 9191 & 9192 for RSS Reader Middletier webserver
iptables -A INPUT -p tcp -m tcp --dport 9191 -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 9192 -j ACCEPT
# Allow inbound packets that belong to established/related connections (e.g. replies to our outgoing requests)
iptables -I INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow all outgoing connections
iptables -P OUTPUT ACCEPT
# Drop everything else
iptables -P INPUT DROP

iptables -L -n
service iptables save
service iptables restart

# For better security, consider leaving port 80 closed and forwarding requests on port 80 to port 8080. Then, in the Tomcat instructions below, we could leave Tomcat on the default 8080 port without requiring "root" user

# iptables -A PREROUTING -t nat -i eth1 -p tcp --dport 80 -j REDIRECT --to-port 8080
# iptables -A INPUT -p tcp -m tcp --dport 8080 -j ACCEPT



# Install JDK

### I shouldn't have done this first step... it only installs the JRE... Gradle needs the JDK
### yum install -y java-1.7.0-openjdk.x86_64

# Install Oracle JDK as per: http://parijatmishra.wordpress.com/2013/03/09/oraclesun-jdk-on-ec2-amazon-linux/
### Remove OpenJDK. Hopefully not required if you didn't do the above "yum install java-1.7.0-openjdk.x86_64"
### rpm --erase --nodeps java-1.7.0-openjdk java-1.7.0-openjdk-devel
yum install -y wget
# There's probably a better way to get the most recent JDK - but these instructions made it easy
wget --no-check-certificate --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2Ftechnetwork%2Fjava%2Fjavase%2Fdownloads%2Fjdk-7u3-download-1501626.html;" http://download.oracle.com/otn-pub/java/jdk/7u25-b15/jdk-7u25-linux-x64.rpm
# The AuthParam suffix on the downloaded file differs for every download, so match it with a glob
mv jdk-7u25-linux-x64.rpm\?AuthParam\=* jdk-7u25-linux-x64.rpm
yum install -y jdk-7u25-linux-x64.rpm 

for i in /usr/java/jdk1.7.0_25/bin/* ; do \
 f=$(basename $i); echo $f; \
 sudo alternatives --install /usr/bin/$f $f $i 20000 ; \
 sudo update-alternatives --config $f ; \
done

cd /etc/alternatives
ln -sfn /usr/java/jdk1.7.0_25 java_sdk
cd /usr/lib/jvm
ln -sfn /usr/java/jdk1.7.0_25/jre jre
# JAVA_HOME must be set for Gradle to work
echo "export JAVA_HOME=/usr/java/jdk1.7.0_25" >> ~/.bashrc
. ~/.bashrc
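
# Since the JRE-only install above was a dead end, it can help to confirm that
# JAVA_HOME really points at a full JDK before running Gradle. A small sketch;
# the function name is my own and it only checks for an executable javac.

```shell
#!/bin/bash
# Succeeds only if DIR looks like a full JDK (has an executable javac),
# not just a JRE. DIR defaults to $JAVA_HOME.
is_jdk() {
  local dir=${1:-$JAVA_HOME}
  [ -n "$dir" ] && [ -x "$dir/bin/javac" ]
}

# Example:
# is_jdk /usr/java/jdk1.7.0_25 && echo "JDK found" || echo "JRE only or missing"
```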


### Install Gradle - may not be required as the Netflix build steps below use self-contained "gradlew" script (which downloads Gradle)
### curl -O "http://downloads.gradle.org/distributions/gradle-1.10-all.zip"
### yum install -y unzip
### cd /opt
### unzip ~/gradle-1.10-all.zip
### cd
### echo "export PATH=$PATH:/opt/gradle-1.10/bin" >> .bashrc
### . ~/.bashrc 


# Build RSS Reader Middletier and Edge webapps

yum install -y git

git clone https://github.com/Netflix/recipes-rss.git
cd recipes-rss
./gradlew clean build

### This was required to fix error "Error compiling file: /tmp//org/apache/jsp/jsp/rss_jsp.java org.apache.jasper.JasperException: PWC6033: Unable to compile class for JSP"
### javac /tmp/org/apache/jsp/jsp/rss_jsp.java -cp /root/recipes-rss/rss-edge/build/libs/rss-edge-0.1.0-SNAPSHOT.jar:/tmp -source 1.5 -target 1.5

# Install Tomcat 6 (didn't bother with Tomcat 7 as it wasn't available in the default CentOS 6 yum repo, as per "yum search tomcat")
yum install -y tomcat6
sed -i 's/port=\"8080\"/port=\"80\"/g' /etc/tomcat6/server.xml
# Set TOMCAT_USER to "root"
vim /etc/tomcat6/tomcat6.conf
    # Replace TOMCAT_USER setting with:
    TOMCAT_USER="root"
# End of edit
cd
echo "export TOMCAT_HOME=/usr/share/tomcat6" >> .bashrc
. ~/.bashrc

# Build and deploy Eureka
git clone https://github.com/Netflix/eureka.git
cd eureka/
./gradlew clean build
cp ./eureka-server/build/libs/eureka-server-XXX-SNAPSHOT.war $TOMCAT_HOME/webapps/eureka.war
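
# The XXX in the war name above depends on the version that was built. One way
# to avoid typing it is to look the file up with a glob and refuse to continue
# if the match is ambiguous. A sketch only; find_single is a hypothetical helper.

```shell
#!/bin/bash
# Print the path of the one file in DIR matching PATTERN; fail if there are
# zero or several matches, so a stale artifact is not deployed by accident.
find_single() {
  local dir=$1 pattern=$2 matches count
  matches=$(find "$dir" -maxdepth 1 -type f -name "$pattern")
  # Count non-empty lines; "|| true" keeps an empty result from aborting
  # the function when the caller uses "set -e".
  count=$(printf '%s' "$matches" | grep -c . || true)
  [ "$count" -eq 1 ] && printf '%s\n' "$matches"
}

# Example:
# war=$(find_single ./eureka-server/build/libs 'eureka-server-*-SNAPSHOT.war') \
#   && cp "$war" "$TOMCAT_HOME/webapps/eureka.war"
```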

service tomcat6 start

# Make sure there are no errors
grep "ERROR" /usr/share/tomcat6/logs/catalina.out | less -S

# (takes around 2 minutes to start up; expect some startup errors due to Eureka not running in an established cluster)
# Browse to http://[IP ADDRESS]/eureka/   <-- trailing slash required
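
# Rather than refreshing the browser during startup, the URL can be polled from
# the shell. A minimal sketch, assuming the page answers with HTTP 200 once
# Eureka is up; the defaults (30 tries, 5 seconds apart) are my own choice.

```shell
#!/bin/bash
# Poll URL until it answers with HTTP 200, or give up after TRIES attempts.
# Usage: wait_for_http URL [TRIES] [DELAY_SECONDS]
wait_for_http() {
  local url=$1 tries=${2:-30} delay=${3:-5} code i
  for i in $(seq 1 "$tries"); do
    # -m 5 caps each attempt at 5 seconds; "|| true" tolerates refused connections
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url" || true)
    [ "$code" = "200" ] && return 0
    sleep "$delay"
  done
  return 1
}

# Example (trailing slash required, as noted above):
# wait_for_http "http://localhost/eureka/" && echo "Eureka is up"
```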


# In another terminal session
# Start RSS Middletier Webserver
export APP_ENV=dev
cd recipes-rss
java -Xmx128m -XX:MaxPermSize=32m -jar rss-middletier/build/libs/rss-middletier-*SNAPSHOT.jar

# Test via Admin port: Browse to: http://[IP ADDRESS]:9192


# In another terminal session
# Start RSS Edge Webserver
export APP_ENV=dev
cd recipes-rss
java -Xmx128m -XX:MaxPermSize=32m -jar rss-edge/build/libs/rss-edge-*SNAPSHOT.jar

# Test via Admin port: Browse to: http://[IP ADDRESS]:9092
# Browse to http://[IP ADDRESS]:9090/jsp/rss.jsp
# Add the following RSS feeds:
# http://rss.cnn.com/rss/edition.rss
# http://feeds.washingtonpost.com/rss/politics
# http://news.yahoo.com/rss/us
# http://rss.cnn.com/rss/money_autos.rss


# Optional Extras...


# Install Hystrix: https://github.com/Netflix/recipes-rss/wiki/Hystrix-Metrics-%28Optional%29

# Open port 7979 in AWS Security Group

iptables -A INPUT -p tcp -m tcp --dport 7979 -j ACCEPT
iptables -L -n
service iptables save
service iptables restart
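
# The open-port/save/restart sequence is repeated for every extra service, so it
# could be wrapped in a function. This sketch prints the commands by default and
# only runs them when DRY_RUN=0 is set (DRY_RUN is a name I made up for this).

```shell
#!/bin/bash
# Open one inbound TCP port and persist the rule.
# Prints the commands unless DRY_RUN=0, so nothing changes by accident.
open_tcp_port() {
  local port=$1 run=echo
  [ "${DRY_RUN:-1}" = "0" ] && run=
  $run iptables -A INPUT -p tcp -m tcp --dport "$port" -j ACCEPT
  $run service iptables save
  $run service iptables restart
}

# Example:
# open_tcp_port 7979             # show what would be run
# DRY_RUN=0 open_tcp_port 7979   # actually apply it (as root)
```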

git clone https://github.com/Netflix/Hystrix.git
cd Hystrix/hystrix-dashboard
../gradlew jettyRun

# Browse to: http://[IP ADDRESS]:7979/hystrix-dashboard
# Enter http://[IP ADDRESS]:9090/hystrix.stream to see the Hystrix metrics show up in the dashboard. You will have to send a few transactions from the Edge service to have the Hystrix metrics loaded.

# Stress the Edge webserver and watch the circuit trip into "Open" state:
for i in {1..100}; do curl -s -o /dev/null -w "%{http_code} %{url_effective}\\n" "http://[IP ADDRESS]:9090/jsp/rss.jsp" & done
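
# With 100 lines of output it is easier to see the effect as a tally per status
# code. A small sketch that reads the "code url" lines produced by the curl
# format string above; count_status_codes and EDGE_URL are hypothetical names.

```shell
#!/bin/bash
# Read "HTTP_CODE URL" lines on stdin and print "CODE COUNT" per status code.
count_status_codes() {
  awk '{ count[$1]++ } END { for (code in count) print code, count[code] }' | sort
}

# Example (EDGE_URL is a placeholder for your Edge rss.jsp URL):
# { for i in $(seq 1 100); do
#     curl -s -o /dev/null -w '%{http_code} %{url_effective}\n' "$EDGE_URL" &
#   done; wait; } | count_status_codes
```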




# Hystrix Example application: https://github.com/Netflix/Hystrix/tree/master/hystrix-examples-webapp

# Open port 8989 in AWS Security Group

iptables -A INPUT -p tcp -m tcp --dport 8989 -j ACCEPT
iptables -L -n
service iptables save
service iptables restart

cd Hystrix/hystrix-examples-webapp
../gradlew jettyRun

# Browse to: http://[IP ADDRESS]:8989/hystrix-examples-webapp

# View it on Hystrix dashboard
# Browse to: http://[IP_ADDRESS]:7979/hystrix-dashboard/
# Enter: http://[IP_ADDRESS]:8989/hystrix-examples-webapp/hystrix.stream
# Click "Monitor Stream"

# See the metrics change with this in one window:
curl [IP_ADDRESS]:8989/hystrix-examples-webapp/hystrix.stream
# And this in another window:
while true ; do curl "[IP_ADDRESS]:8989/hystrix-examples-webapp/"; done   # <-- The trailing "/" is required



# Install Turbine: https://github.com/Netflix/Hystrix/wiki/Dashboard
curl -L -O https://github.com/downloads/Netflix/Turbine/turbine-web-1.0.0.war
cp turbine-web-1.0.0.war $TOMCAT_HOME/webapps/turbine.war

# Configure Turbine (using Archaius)
vi /root/rss-edge-turbine.properties
    # From https://github.com/Netflix/Hystrix/wiki/Dashboard
    # Hystrix stream for RSS Edge webapp
    turbine.ConfigPropertyBasedDiscovery.default.instances=localhost
    turbine.instanceUrlSuffix=:9090/hystrix.stream
# End edit

# Add Archaius config to Tomcat "archaius.configurationSource.additionalUrls"
vi /etc/tomcat6/tomcat6.conf
    # Archaius properties for Turbine and Netflix
    JAVA_OPTS="${JAVA_OPTS} -Darchaius.configurationSource.additionalUrls=file:///root/rss-edge-turbine.properties"
# End edit


service tomcat6 restart
# Should see this in the logs: "URLs to be used as dynamic configuration source: [file:/root/rss-edge-turbine.properties]"
grep rss-edge /usr/share/tomcat6/logs/catalina.out

# Browse to: http://[IP_ADDRESS]:7979/hystrix-dashboard/
# Enter: http://[IP_ADDRESS]/turbine/turbine.stream
# Click "Monitor Stream"


# Install hystrix-dashboard in Tomcat
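# Quote the URL (it contains "?") and name the output file explicitly
curl -L -o hystrix-dashboard-1.3.8.war "http://search.maven.org/remotecontent?filepath=com/netflix/hystrix/hystrix-dashboard/1.3.8/hystrix-dashboard-1.3.8.war"
# Then deploy it the same way as the Turbine war above
cp hystrix-dashboard-1.3.8.war $TOMCAT_HOME/webapps/hystrix-dashboard.war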

# Install hystrix-examples-webapp in Tomcat
cd Hystrix/hystrix-examples-webapp
../gradlew build
cp build/libs/hystrix-examples-webapp-1.3.9-SNAPSHOT.war /usr/share/tomcat6/webapps/hystrix-examples-webapp.war

Cluster Install

Turbine really only starts to shine once there are clusters involved.

Let's set up a cluster of dedicated Edge and Middletier instances, a dedicated Eureka instance (on Tomcat together with Hystrix Dashboard + Turbine) and with ELB in front of the Edge cluster. Something like this:

         Internet                           Internet
             |                                  |
 ------------|----------------------------------|---------------------------------------------
             |                                  v
             |                               AWS ELB
             |                                  |
  /----------|------------------------\        /|\
  |          v                        |        v v v
  | Hystrix Dashboard  <-- Turbine  <-----  RSS Edge (x3) 
  |                                   |    ^  ^ ^ ^
  |                                   |   /    \|/
  |                       Eureka <--------      |
  |                          ^        |         |
  \--------------------------|--------/        /|\
                             |                v v v
                             ---------- RSS Middletier (x3)


# To make things easier, before we scale out we'll set up some convenience 
# hostnames and scripts - in reality we'd come back and automate this 
# properly later on with Puppet/Chef/Ansible/Salt/Baked-into-image

# Add an entry with the Private IP address of the server in /etc/hosts e.g.:
127.0.0.1       eureka

vim "recipes-rss/rss-edge/src/main/resources/edge.properties"
    eureka.serviceUrl.default=http://eureka/eureka/v2/
# End edit

vim "recipes-rss/rss-middletier/src/main/resources/middletier.properties"
    eureka.serviceUrl.default=http://eureka/eureka/v2/
# End edit

# Rebuild both jars
cd /root/recipes-rss && ./gradlew build

# Create the following scripts in root's homedir:

# change_eureka_host.sh
#!/bin/bash
if [ $# -lt 1 ]; then
    echo "Usage: $0 IP_ADDRESS"
    exit 1
fi
sed -i "s/.*eureka/$1    eureka/g" /etc/hosts
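
# The sed above rewrites any line that contains "eureka", which would also hit
# names such as "my-eureka-backup". A slightly stricter variant is sketched
# here; set_host_entry is a hypothetical helper and the file argument exists
# mainly so it can be tried on a copy of /etc/hosts first.

```shell
#!/bin/bash
# Point NAME at IP in a hosts-format FILE: drop every line whose last field
# is NAME, then append the new mapping. FILE defaults to /etc/hosts.
set_host_entry() {
  local ip=$1 name=$2 file=${3:-/etc/hosts} tmp
  tmp=$(mktemp) || return 1
  awk -v name="$name" '$NF != name' "$file" > "$tmp" || return 1
  printf '%s    %s\n' "$ip" "$name" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example:
# cp /etc/hosts /tmp/hosts.test && set_host_entry 172.31.0.10 eureka /tmp/hosts.test
```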

# start_rss-edge.sh
#!/bin/bash
cd /root/recipes-rss
nohup java -Xmx128m -XX:MaxPermSize=32m -jar rss-edge/build/libs/rss-edge-*SNAPSHOT.jar &
echo "Output is logged to /root/recipes-rss/logs/rss-edge.log"

# start_rss-middletier.sh
#!/bin/bash
cd /root/recipes-rss
nohup java -Xmx128m -XX:MaxPermSize=32m -jar rss-middletier/build/libs/rss-middletier-*SNAPSHOT.jar &
echo "Output is logged to /root/recipes-rss/logs/rss-middletier.log"


# tail_rss-edge.sh
#!/bin/bash
tail -100f /root/recipes-rss/logs/rss-edge.log

# tail_rss-middletier.sh
#!/bin/bash
tail -100f /root/recipes-rss/logs/rss-middletier.log


# Save the instance as an AMI
# Now create 6 more instances based off this AMI - they can all be t1.micro's (in fact we can reprovision our main node as a t1.micro and only run Tomcat on it for Eureka, Hystrix Dashboard, & Turbine)
# Optional: Name them in the EC2 console as rss-edge-N, rss-middletier-N, rss-eureka
# Create 2 more security groups: rss-edge & rss-middletier
# Ensure 'rss-edge' security group has ports 9090 & 9092 open
# Ensure 'rss-middletier' security group has ports 9191 & 9192 open
# Create a Load Balancer named 'RssEdgeLoadBalancer' containing all 3 
# rss-edge-* nodes.  Attach ports 9090 & 9092 to the same ports.
# Optional: add a health check on:
#   Protocol: HTTP
#   Port: 9092
#   Path: /adminres/webadmin/index.html
#   Timeout: 5s
#   Interval: 0.5min
#   Unhealthy Threshold: 2
# Optional: Add cloudwatch alarm in Monitoring tab
#   UnHealthyHostCount >= 1 for 1 minute
#   Send message to topic "NotifyMe"
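
# The health check above can also be set from the command line. A sketch,
# assuming the AWS CLI is installed and configured. The ELB API also requires a
# HealthyThreshold, which is not given above, so the default of 2 below is an
# assumption. The function only prints the command.

```shell
#!/bin/bash
# Print the AWS CLI command for the health check described above
# (HTTP on port 9092, 5s timeout, 30s interval, unhealthy threshold 2).
# HEALTHY_THRESHOLD defaults to 2, which is an assumption.
# Usage: elb_health_check_command LB_NAME [HEALTHY_THRESHOLD]
elb_health_check_command() {
  local lb_name=$1 healthy_threshold=${2:-2}
  local target="HTTP:9092/adminres/webadmin/index.html"
  echo "aws elb configure-health-check --load-balancer-name $lb_name" \
       "--health-check Target=$target,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=$healthy_threshold"
}

# Example:
# elb_health_check_command RssEdgeLoadBalancer
```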

# On the Eureka node add the Private IP addresses of the nodes to /etc/hosts
# e.g. (the addresses below are placeholders; substitute your own):
127.0.0.1       eureka
172.31.111.333  rss-edge-1
172.31.111.444  rss-edge-2
172.31.111.555  rss-edge-3
172.31.111.666  rss-middletier-1
172.31.111.777  rss-middletier-2
172.31.111.888  rss-middletier-3

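# On the Eureka node - list the cluster's instances in all-turbine.properties
# (Tomcat only loads the file named in archaius.configurationSource.additionalUrls in /etc/tomcat6/tomcat6.conf, so make sure that setting points at this file)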
vi /root/all-turbine.properties
    turbine.ConfigPropertyBasedDiscovery.rss-edge.instances=rss-edge-1,rss-edge-2,rss-edge-3
# End edit
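
# If the number of Edge nodes changes, the instance list can be generated
# instead of typed. A sketch that assumes hosts are named CLUSTER-1..CLUSTER-N
# as in the /etc/hosts example; turbine_instances_line is a hypothetical helper.

```shell
#!/bin/bash
# Print the Turbine instance list for a cluster whose hosts are named
# CLUSTER-1 .. CLUSTER-COUNT.
# Usage: turbine_instances_line CLUSTER COUNT
turbine_instances_line() {
  local cluster=$1 count=$2 hosts="" i
  for i in $(seq 1 "$count"); do
    # Append with a comma separator, except before the first host
    hosts="${hosts:+$hosts,}$cluster-$i"
  done
  echo "turbine.ConfigPropertyBasedDiscovery.$cluster.instances=$hosts"
}

# Example:
# turbine_instances_line rss-edge 3 >> /root/all-turbine.properties
```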


# Create a script to load test the Edge nodes

# hammer_rss_vip.sh
#!/bin/bash

# Usage: hammer_rss_vip.sh [pause_seconds] 
#   e.g. hammer_rss_vip.sh 0.1   - will pause for 100ms between requests
#   e.g. hammer_rss_vip.sh       - no pause: fire requests as fast as we can fork processes in the background

RSS_EDGE_VIP=rssedgeloadbalancer-NNNNNNNNNN.xx-region-N.elb.amazonaws.com
DELAY=${1-0}

while true ; do
  curl -s -o /dev/null http://$RSS_EDGE_VIP:9090/jsp/rss.jsp &
  sleep $DELAY
done

# Now watch Turbine via the Hystrix Dashboard while you load test the Edge via the ELB and see the difference for different timings
./hammer_rss_vip.sh 1
./hammer_rss_vip.sh 0.1