Snippets and thoughts from a passionate software developer interested in all things DevOps, Continuous Delivery, Functional Programming, Distributed Systems. Also on Twitter as @mcallana.
Wednesday, May 30, 2012
Geppetto: Puppet IDE
http://puppetlabs.com/blog/geppetto-a-puppet-ide/ (old but good example screenshots)
https://github.com/cloudsmith/geppetto
http://cloudsmith.github.com/geppetto/
"As usual, the team ignored Management and did whatever they damn well pleased, working on Geppetto in secret through 15 long months of nights, weekends and vacations"
http://blog.cloudsmith.com/?p=803
Thursday, May 24, 2012
Lifehacker: Focussing goals
http://lifehacker.com/5912971/focus-your-ambitions-with-the-lifehacker-hierarchy-of-goals
By Thorin Klosowski
Level 1: The Primary Goals
- Base of all other goals - one or two things you aspire to do before you die - that will truly matter to you in 20 years.
- Everything above this bottom level should help you to one day complete these goals.
- Should only have two or three high-stakes goals
Level 2: Long Term Goals
- Major goals that are required to get to the primary goals.
- E.g. Sustainable habits you need to form over the years
- E.g. Achievements you want to reach
Level 3: Short Term Goals
- Weeks or months out.
- Short term, not short-sighted.
- What can you do now to make a primary goal happen?
Level 4: Recurring Goals
- What you want to do daily/weekly/monthly regardless of what else is going on.
- These aren't quite the same as short-term goals because they're to form a habit.
- Ask yourself what you need to do on a daily basis to make that happen in the long run.
Level 5: Immediate Goals
- Goals and to-dos that you can and want to accomplish right this second.
- Ever-changing but necessary part of the pyramid: lets you measure your daily duties to see how they affect your overall life goals.
- Could be as simple as making a phone call.
- How do they affect other aspects of the pyramid both positively and negatively?
Weed Out Junk and Accomplish Your Goals
- "Life consists of what you pay attention to" - you can structure your goals the same way.
- When you have too many goals conflicting with each other your attention is shifted too often.
- Trim away junk goals to get things done and find an actionable path.
- The goal of the pyramid is to ensure that every aspect of your goals works together:
- Can see where your ideas fail and aren't coalescing.
- Start at the bottom of your pyramid and draw lines up through goals that match each other. The line should move through each level and hit one or two different goals along the way.
- Do this with all your goals moving upwards through the pyramid.
- When you're done you'll probably have a few outliers scattered about.
- Ask yourself: Why do I want this?
- Does this relate to anything else I want?
- If you don't have a good answer, cut them from the list.
- If you want to keep a goal, refocus it so that it helps you with another goal.
- Go back through your levels and see what goals you can outsource to other people.
Focus only on the goals that matter, break them into smaller steps, and start work immediately.
Tuesday, May 22, 2012
HTTP Status Decision Graph
Just saw a demo of webmachine's decision graph by @TheColonial:
http://blog.beerriot.com/2009/03/20/http-decision-graph-comes-to-life/
Inspired by:
http://http-headers-status.googlecode.com/files/http-headers-status%20v3%20draft.png
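The idea behind the graph is that a request walks a fixed sequence of yes/no questions, and the first question that fails decides the status code. A minimal Python sketch of that idea, covering only a handful of the early nodes for a GET-style request; the `Resource` fields and the node order are my own simplification, not webmachine's actual callbacks or its full diagram.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    # Hypothetical answers a resource gives for one request
    # (illustrative names, not webmachine's callback API).
    service_available: bool = True
    known_method: bool = True
    uri_too_long: bool = False
    method_allowed: bool = True
    malformed: bool = False
    authorized: bool = True
    forbidden: bool = False
    exists: bool = True

# Each node: (question, answer that stops the walk, status code to return).
DECISIONS = [
    ("service_available", False, 503),
    ("known_method", False, 501),
    ("uri_too_long", True, 414),
    ("method_allowed", False, 405),
    ("malformed", True, 400),
    ("authorized", False, 401),
    ("forbidden", True, 403),
    ("exists", False, 404),
]

def decide(resource: Resource) -> int:
    """Walk the nodes in order; the first one that trips decides the status."""
    for attr, trip_value, status in DECISIONS:
        if getattr(resource, attr) == trip_value:
            return status
    return 200  # every check passed: serve the representation

print(decide(Resource()))                                       # 200
print(decide(Resource(authorized=False)))                       # 401
print(decide(Resource(service_available=False, exists=False)))  # 503
```

Keeping the checks in an ordered table makes the point of the diagram explicit: you can see exactly which question produced a given status.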
Saturday, May 12, 2012
Puppet Learning VM
http://docs.puppetlabs.com/learning/
Download latest VMX from http://downloads.puppetlabs.com/learning/
Follow instructions:
http://docs.puppetlabs.com/learning/#get-equipped
Log in as root, with the password puppet.
Success
To view the Puppet Enterprise web console, navigate to https://(your VM’s IP address) (note the https).
Log in as puppet@example.com, with the password learningpuppet.
Doesn't work with LearningVM v2.0.1
Downloading LearningVM v2.5.1...
It works!
Sunday, May 6, 2012
Friday, May 4, 2012
Julian Porter: MapReduce as a Monad in Haskell
- http://jpembeddedsolutions.wordpress.com/mapreduce-in-haskell/
- http://www.haskell.org/haskellwiki/MapReduce_as_a_monad/
- http://jpembeddedsolutions.wordpress.com/2011/10/30/distributed-storage-in-haskell/
- http://jpembeddedsolutions.wordpress.com/2011/10/31/a-datastore-is-a-monad-transformer/
Simon Peyton-Jones - Cloud Haskell
Thursday, May 3, 2012
Klint Finley: How an Established Company Can Start Doing DevOps
http://siliconangle.com/blog/2012/03/20/how-an-established-company-can-start-doing-devops/
Find a discrete project that members of both the development and operations teams can collaborate on.
He suggests that developers who want to get started with DevOps start by reaching out to ops and getting themselves on the call list, so they can see what sort of issues ops has to deal with every day. If you're in infrastructure, he suggests reaching out to development and getting in on some build meetings to learn how that process works and the pressures developers are under.
Small pilot programs and small, specific steps are a great starting point. Even if a "pure" agile/DevOps methodology can never be adopted at a company, proven technologies like configuration management and practices like putting developers on the call rotation can improve productivity and collaboration.
Notes from Cloudera Hadoop talk
Cloudera Hadoop
Notes from http://www.meetup.com/qldjvm/events/58332582/
Hadoop
- Scalable
- Fault tolerant
- Open source
HDFS + MapReduce
- HDFS: like a normal filesystem, but distributed (highly scalable)
- MapReduce: compute framework
- Can process the data where it resides
- Direct-attached storage at individual nodes
Concepts
- Apps are written in high-level code
- Shared-nothing architecture (between nodes)
- Data is spread among machines in advance
Schema on read
- SerDe (Serialiser/Deserialiser)
- DB: fast reads, standards
- Hadoop: fast loads, flexibility
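Schema on read means raw data is loaded as-is and a schema is applied only when it's read (in Hive, that's the SerDe's job). A minimal Python sketch of the idea, assuming comma-separated text records; `deserialize`, `serialize` and the sample rows are hypothetical, not Hive's SerDe interface.

```python
from datetime import date

# Loaded as-is: nothing validates these lines at load time (hypothetical data).
RAW_LINES = [
    "2012-05-02,alice,3",
    "2012-05-03,bob,oops",  # bad record, only noticed when it is read
    "2012-05-04,carol,7",
]

def deserialize(line):
    """Apply the schema (day: date, user: str, count: int) at read time.

    Returns None for records that don't fit, rather than failing the read."""
    try:
        day, user, count = line.split(",")
        return {"day": date.fromisoformat(day), "user": user, "count": int(count)}
    except ValueError:
        return None

def serialize(record):
    """Inverse direction: typed record back to a text line."""
    return f"{record['day'].isoformat()},{record['user']},{record['count']}"

records = [r for r in map(deserialize, RAW_LINES) if r is not None]
print(sum(r["count"] for r in records))  # 10
```

The trade-off from the notes shows up here: loading is just appending lines, but every reader pays for parsing and has to decide what to do with bad records.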
Enterprise data warehouse architecture
- Sqoop & Flume --> Hadoop
- Hive & ODBC
- Oozie (workflow coordinator)
- Sqoop out to BI reports
HDFS
- NameNode: holds all metadata for HDFS
  - Needs to be a highly reliable machine (RAID 10, dual power supplies, bonded dual network cards)
  - The more memory the better: 48GB-96GB (depending on the number of files/blocks)
  - v1.0: if the NameNode disappears, the cluster is down
- Secondary NameNode: checkpointing for the NameNode (same hardware)
- DataNodes
  - Hardware will depend on specific
  - JBOD (just a bunch of disks): no RAID, no SAN, no virtualisation
  - Maximise for IO throughput
  - Direct-attached disks
  - Data on S3 with EMR instances: throughput is capped by how quickly EMR can read from S3
  - You pay a price for virtualisation
- E.g. DataNode: 12 x 3TB drives, 2 x 4-6 core CPUs, 24-96GB RAM
MapReduce
- JobTracker takes a job submitted to the cluster
- TaskTracker
  - takes the job and tries to run it on the same machine where the data resides (possible ~90% of the time)
  - works on multiple files
  - one map task per block
  - files are split into multiple blocks
  - you don't want massive blocks: a 2GB file is split into smaller pieces
  - 128MB chunks
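Because there's roughly one map task per block, the block size decides how many map tasks a job gets. A small Python sketch of that arithmetic, assuming a 128MB block size (Hadoop 1.x shipped with a 64MB default, and the size is configurable); the function names are hypothetical.

```python
MB = 1024 * 1024

def num_blocks(file_size, block_size=128 * MB):
    """Blocks needed for one file: ceiling division, the last block may be partial."""
    return (file_size + block_size - 1) // block_size

def num_map_tasks(file_sizes, block_size=128 * MB):
    """Roughly one map task per block, summed over all input files."""
    return sum(num_blocks(size, block_size) for size in file_sizes)

two_gb = 2048 * MB
print(num_blocks(two_gb))                      # 16
print(num_blocks(two_gb, block_size=64 * MB))  # 32
print(num_map_tasks([two_gb, 200 * MB]))       # 18
```

Many small files push the count up too: each non-empty file occupies at least one block (and one map task), which is part of why HDFS prefers fewer, larger files.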
- Input is split and fed in parallel to the map tasks, which process the data given to them and write intermediate data to their local storage.
- Shuffle and sort (network intensive).
- Reduce tasks pick up, from the map tasks' local storage, the data for the key range they are responsible for. They perform the reduce operation and write the output to HDFS.
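How does a reduce task know which keys are its responsibility? Hadoop's default partitioner hashes each key and takes it modulo the number of reduce tasks, so the same key from every map task ends up at the same reducer (range partitioning is also possible). A Python sketch of that shuffle step, with `zlib.crc32` standing in for Java's `hashCode()` because Python's built-in `hash()` of strings changes between runs; `partition` and `shuffle` are hypothetical names.

```python
import zlib
from collections import defaultdict

def partition(key: str, num_reducers: int) -> int:
    """Pick the reduce task for a key: hash(key) mod number of reducers.

    crc32 is an assumption here, chosen only because it is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(map_outputs, num_reducers):
    """Route (key, value) pairs from all map tasks to reducers, grouping by key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_outputs:
        reducers[partition(key, num_reducers)][key].append(value)
    # Each reducer sees its keys in sorted order, as after the sort phase.
    return [dict(sorted(groups.items())) for groups in reducers]

pairs = [("the", 1), ("cat", 1), ("the", 1), ("sat", 1)]
for reducer_id, groups in enumerate(shuffle(pairs, num_reducers=2)):
    print(reducer_id, groups)
```

Whatever the hash, the property that matters is determinism: every occurrence of a key goes to exactly one reducer.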
HDFS is write-once, read-many: good for e.g. log data or DB snapshots, not for a system of record with multiple writes. You can only append to a file.
Word count example:
- Mapping: (The, 1) (Cat, 1) (Sat, 1) (On, 1) (The, 1) (Mat, 1) (The, 1) (Aardvark, 1) (Sat, 1) (On, 1) (The, 1) (Sofa, 1)
- Shuffling: (Aardvark, [1]) (Cat, [1]) (Mat, [1]) (On, [1, 1]) (Sat, [1, 1]) (Sofa, [1]) (The, [1, 1, 1, 1])
- Reducing: (Aardvark, 1) (Cat, 1) (Mat, 1) (On, 2) (Sat, 2) (Sofa, 1) (The, 4)
- Final result: Aardvark 1, Cat 1, Mat 1, On 2, Sat 2, Sofa 1, The 4
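The example is a word count over two sentences which, from the Mapping column, appear to be "The cat sat on the mat" and "The aardvark sat on the sofa". A single-process Python sketch of the same three stages; it only mimics the data flow (nothing is distributed), and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (word, 1) for every word; words are capitalised to match the notes."""
    return [(word.capitalize(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group values by key and sort by key, like the shuffle-and-sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["The cat sat on the mat", "The aardvark sat on the sofa"]
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = reduce_phase(grouped)
print(grouped["The"])  # [1, 1, 1, 1]
print(counts)  # {'Aardvark': 1, 'Cat': 1, 'Mat': 1, 'On': 2, 'Sat': 2, 'Sofa': 1, 'The': 4}
```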
Hive
- SQL-like language for writing MapReduce jobs
- Supports SELECT, JOIN, GROUP BY, etc.
- Can support very large datasets through partitioning, sampling and bucketing
Sqoop (SQL to Hadoop)
- Pull data in, push data out
- Connectors
- Generates a MapReduce job to do a parallel import from, or export to, an RDBMS
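For a parallel import, Sqoop splits the table on a column (the primary key by default, or whatever `--split-by` names) by looking at its minimum and maximum values and giving each mapper a sub-range; `--num-mappers` defaults to 4. A Python sketch of that range-splitting idea; it is not Sqoop's actual splitter, and the `orders`/`id` names are hypothetical.

```python
def split_ranges(lo, hi, num_mappers=4):
    """Divide the inclusive key range [lo, hi] into contiguous, non-overlapping
    sub-ranges of near-equal size, one per mapper."""
    if hi < lo:
        return []
    total = hi - lo + 1
    num_mappers = max(1, min(num_mappers, total))  # never create empty splits
    base, extra = divmod(total, num_mappers)
    ranges, start = [], lo
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

def split_queries(table, column, lo, hi, num_mappers=4):
    """One bounded query per mapper (hypothetical table/column names)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
        for a, b in split_ranges(lo, hi, num_mappers)
    ]

for query in split_queries("orders", "id", 1, 10):
    print(query)
```

If the split column's values are unevenly distributed, the ranges are even but the row counts per mapper aren't, which is why the choice of split column matters.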
Flume
- Brings in log (or other) data
- Multiple agents: agents can talk to other agents
- Sources: log, Twitter, Avro, netcat, exec
Oozie
- Workflow/coordination service to manage data processing jobs for Hadoop
- Chain jobs together (e.g. every 15 mins)
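Oozie itself describes a workflow as a DAG of actions in XML, with coordinators triggering it on a schedule or when data arrives. The Python sketch below only illustrates the two ideas in the notes, running jobs in dependency order and firing every 15 minutes; the job names are hypothetical and none of this is Oozie's API.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline: job -> jobs that must finish before it starts.
PIPELINE = {
    "sqoop_import": set(),
    "flume_logs": set(),
    "hive_aggregate": {"sqoop_import", "flume_logs"},
    "sqoop_export_reports": {"hive_aggregate"},
}

def run_order(pipeline):
    """A valid order where every job comes after its dependencies
    (raises graphlib.CycleError if the jobs form a cycle)."""
    return list(TopologicalSorter(pipeline).static_order())

def trigger_minutes(interval, start, end):
    """Minutes in [start, end) at which a fixed-interval schedule would fire."""
    return list(range(start, end, interval))

print(run_order(PIPELINE))
print(trigger_minutes(15, 0, 60))  # [0, 15, 30, 45]
```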
Pipes and Streaming
- Write native-code MR in C++ (Pipes) or in arbitrary scripting languages (Streaming)
- stdin -> stdout: map -> reduce -> final result
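With Streaming, the mapper and reducer are just programs that read lines on stdin and write tab-separated key/value lines to stdout; the framework sorts the mapper output by key before the reducer sees it. A minimal word-count sketch in Python; the file name `wordcount.py` and its `map`/`reduce` argument are my own convention.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map step: emit 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(lines):
    """Reduce step: lines arrive sorted by key, so equal words are adjacent
    and can be summed with groupby."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python wordcount.py map   or   python wordcount.py reduce
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

You can try it without a cluster by letting `sort` stand in for the shuffle: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. On a cluster, the same script is handed to the Streaming jar through its `-mapper` and `-reducer` options.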
FUSE-DFS
- Allows mounting HDFS volumes via Linux FUSE
HBase
- Real-time (not batch) column-oriented datastore (modelled on Bigtable)
- Handles billions of rows, petabytes
- Facebook: 1 million writes/s for likes, 1.6 million/s for Insights, log data
- Profile data in MySQL with memcache on top, moving to HBase
- Gave up on developing Cassandra (eventual consistency better for static/write-only) in favour of HBase
- Mobile devices
- Strong consistency
- Not an RDBMS: no joins, no indexes, no SQL
- Columns can be added on the fly and can store any kind of data
- Keeps 3 versions of column cells, write-ahead log, sequential writes/reads -> append-type model
- Jonathan Gray HadoopWorld videos
- Row key only (no secondary index), otherwise you scan the whole dataset
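Since lookups go through the row key, HBase keeps rows sorted by key: a get is direct and a key-prefix scan reads one contiguous range, while searching on any other column means touching every row. A toy in-memory Python sketch of that, also keeping 3 versions per cell as in the notes; `ToyTable` and the sample keys are hypothetical and have nothing to do with the real HBase client API.

```python
from bisect import bisect_left, insort

MAX_VERSIONS = 3  # from the notes: 3 versions kept per cell

class ToyTable:
    """In-memory sketch of an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self):
        self.keys = []  # row keys, kept sorted
        self.rows = {}  # row key -> {column: [(timestamp, value), ...], newest first}

    def put(self, row, column, timestamp, value):
        if row not in self.rows:
            insort(self.keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[MAX_VERSIONS:]  # drop anything beyond the version limit

    def get(self, row, column):
        """Direct lookup by row key: latest value, or None if absent."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan_prefix(self, prefix):
        """Sorted keys make a key prefix one contiguous range: seek, then read on."""
        i = bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i]
            i += 1

    def find_by_value(self, column, value):
        """No secondary index: matching on a column value means scanning every row."""
        return [key for key in self.keys if self.get(key, column) == value]

table = ToyTable()
for ts in range(1, 5):
    table.put("user#001", "info:city", ts, f"city-{ts}")
table.put("user#002", "info:city", 1, "Brisbane")
table.put("order#001", "info:total", 1, "42")
print(table.get("user#001", "info:city"))            # city-4
print(list(table.scan_prefix("user#")))              # ['user#001', 'user#002']
print(table.find_by_value("info:city", "Brisbane"))  # ['user#002']
```

Designing the row key around the access pattern (here, a `user#` prefix) is what turns a would-be full scan into a short range read.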
CDH
- Cloudera’s Distribution: an enterprise-ready distribution of Hadoop
Pig
- Dataflow language for MapReduce
OpenTSDB (time series DB) http://opentsdb.net/
- Interesting example: metrics for thousands of machines
YCSB (Yahoo Cloud Serving Benchmark)
- Test cloud and big data technologies with realistic loads to see how they will perform before making expensive decisions based on little data.
- Trigger-like “coprocessors” attached to data nodes
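YCSB drives a store with a configurable mix of operations over a key space; for example, its core workload A is a 50/50 read/update mix and workload B is 95/5. A tiny Python sketch of that idea against a plain dict, with uniformly chosen keys (YCSB also supports skewed distributions such as zipfian); `workload` and `run` are hypothetical names and the dict stands in for a real datastore.

```python
import random
import time

def workload(num_ops, read_proportion, num_keys, seed=0):
    """Generate (op, key) pairs: 'read' with probability read_proportion, else 'update'.

    Keys are drawn uniformly; a seeded RNG keeps runs repeatable."""
    rng = random.Random(seed)
    return [
        ("read" if rng.random() < read_proportion else "update",
         f"user{rng.randrange(num_keys)}")
        for _ in range(num_ops)
    ]

def run(ops, store):
    """Apply the ops to a dict-like store; return op counts and elapsed seconds."""
    counts = {"read": 0, "update": 0}
    start = time.perf_counter()
    for op, key in ops:
        if op == "read":
            store.get(key)
        else:
            store[key] = "value"
        counts[op] += 1
    return counts, time.perf_counter() - start

counts, seconds = run(workload(10_000, 0.5, 1_000), {})
print(counts, f"{seconds:.4f}s")
```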
Wednesday, May 2, 2012
Elasticsearch vs SOLR
http://stackoverflow.com/questions/10213009/solr-vs-elasticsearch
http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage/2288211#2288211
- Wanted to make Compass distributed.
- Kimchy started working on a distributed Compass by integrating with data grid solutions like GigaSpaces, Coherence and Terracotta. At its core, a distributed Lucene solution needs to be sharded.
- Also, with HTTP and JSON now ubiquitous as APIs, a search server exposed over them can easily be used by many different systems written in different languages.
- It has a very advanced distributed model, speaks JSON natively, and exposes many advanced search features, all seamlessly expressed through a JSON DSL.
- Solr is also a solution for exposing an indexing/search server over HTTP, but I would argue that ElasticSearch provides a much superior distributed model and ease of use.
Cloudera Hadoop
Download CDH3 VM: https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM
Once you launch the VM, you are automatically logged in as the cloudera user.
The account details are:
- username: cloudera
- password: cloudera
There is a Hadoop Tutorial available from the Cloudera website.
You can access status through the browser at the following URLs:
- NameNode status (localhost:50070)
- JobTracker status (localhost:50030)
- The Hue user interface (localhost:8088)
- The HBase web UI (localhost:60010)
Try some commands: https://ccp.cloudera.com/display/CDHDOC/Installing+CDH3+on+a+Single+Linux+Node+in+Pseudo-distributed+Mode