A Herd Of Rabbits Part 2: RabbitMQ Data Pipelines

2020 Mar31
RabbitMQ is a powerful message broker that allows engineers to implement complex messaging topologies with relative ease. At the day job we used RabbitMQ as the backbone of our real-time data infrastructure. In the previous post we set up a simple PostgreSQL trigger to send change capture messages to a RabbitMQ exchange. Conceptually, this is where we left off:

In this early stage, we basically have a fire-hose that we can selectively tap into. But we have no way to control the flow of data.

To recap a bit before we get too deep, we had a simple and manual way of handling real-time operations. Effectively, we just baked all of the logic into the specific application code path.

Read More

Exactly Once Execution In A Distributed System

2017 Sep04
Skyring is a distributed system for managing timers, or delayed execution, similar to `setTimeout` in JavaScript. The difference is that it is handled in a reliable and fault-tolerant way. `setTimeout` in JavaScript is transient: if the running application is restarted or crashes, any pending timers are lost forever. The one guarantee that Skyring provides is that a timer will execute after the specified delay, and that it executes only once. Exactly-once execution is an interesting challenge in distributed systems, and Skyring makes use of a number of mechanisms at the node level to achieve this. From a high level, this is what the behavior on individual nodes looks like.

Skyring Node Behavior

Shared Nothing

Skyring follows the shared nothing mantra

Read More

Custom Transports For Skyring

2017 May29
Skyring is a distributed system for managing timers. When a timer lapses, a message is delivered to the destination that you have defined. *How* that message is delivered is configurable. Out of the box, Skyring comes with an `HTTP` transport, and there is an official package enabling TCP delivery of messages with connection pooling. Transports are pretty easy to write, and you can use any of the tools you are already used to.

STDOUT Transport

To illustrate the process, we're going to make a simple transport handler that writes the data to stdout. Basically speaking, a transport is just a Node.js module that exports a named function.
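A minimal sketch of what such a module could look like is shown here. The argument list (`payload`, `id`, `out`) is an assumption made for illustration and is not taken from Skyring's documentation; the `out` parameter exists only so the output stream can be swapped out when testing.

```javascript
'use strict'

// Hypothetical stdout transport.
// NOTE: the parameters here are illustrative assumptions, not Skyring's
// documented transport signature.
function stdout(payload, id, out = process.stdout) {
  // Write one line per lapsed timer so the output is easy to follow.
  const line = `[timer ${id}] ${payload}\n`
  out.write(line)
  return line
}

// The module exports a single named function.
module.exports = stdout
```

Passing an object with a `write` method as the third argument captures the output instead of printing it, which keeps the handler easy to exercise without a running cluster.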

Module [ˈmäjo͞ol] -n., --noun

any of a number of distinct

Read More
filed under:  zmq skyring timers node.js

Build JSON API Responses With Postgres CTEs

2017 Apr30
Pagination is a recurring problem that developers have to deal with when implementing data access layers for APIs. It can be particularly tricky with the more traditional RDBMSs like MySQL or PostgreSQL. For example, let's say we had an API endpoint that allowed consumers to search a database of movies. We could search by title, director, starring actors, etc. Our database has millions of movies, and we know we don't want to return all of the potential matches for every search request.

We only want to return the top 25 or so records and indicate in the response that there are more results to query for:

{
  meta: {
    total: 12000
  , limit: 25
  , next: <URL TO NEXT PAGE>
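
For illustration, a response envelope of this shape could be assembled in application code roughly as follows. This is only a sketch: the `offset`, `previous` and `data` fields, the `buildPage` helper and the `/api/v1/movies` path are assumptions, and it says nothing about how the rows and the total are actually queried.

```javascript
'use strict'

// Hypothetical helper: wraps one page of rows in a meta envelope.
// `total` is the number of matching records across all pages.
function buildPage(rows, total, limit, offset, basePath) {
  const nextOffset = offset + limit
  const prevOffset = Math.max(offset - limit, 0)
  const link = (o) => `${basePath}?limit=${limit}&offset=${o}`

  return {
    meta: {
      total,
      limit,
      offset,
      // Only advertise a next page if records remain beyond this one.
      next: nextOffset < total ? link(nextOffset) : null,
      // The first page has nothing before it.
      previous: offset > 0 ? link(prevOffset) : null
    },
    data: rows
  }
}

const page = buildPage([{ title: 'Example' }], 12000, 25, 0, '/api/v1/movies')
console.log(page.meta.next) // /api/v1/movies?limit=25&offset=25
```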
Read More
filed under:  sql postgres node.js

Distributed Timers With Node.js and Skyring

2016 Dec28
Working with timers in a distributed system is a really nasty problem that pops up more often than most people would like. Something as simple and useful as setTimeout / clearTimeout becomes brittle, unreliable and a bottleneck in today's stateless, scalable server mindset. Basically, I need to be able to set a timer somewhere in a cluster without knowing or caring about which server. And conversely, I need to be able to cancel that timer **without** having to know where in the cluster that timer lives. But before we can start to understand possible solutions, let's dive into a use case to understand the problem and why existing solutions aren't suitable replacements.

Scenarios

Un-send an email  - A simple

Read More

Timeseries APIs on a dime with Node, Tastypie and MySQL

2016 Mar11
Time series data is quickly becoming all the rage in big data circles. The primary use case for large amounts of time series data tends to be visualization of collected metrics. This could be the temperature of your house, CPU usage of a remote server, the oil levels of your car, etc. In a nutshell, time series data is:

  • Data over a continuous time interval
  • Data contains successive measurements across that interval
  • Data uses equal spacing between every two consecutive measurements
  • Each time unit within the interval has at most one data point

It might look something like this:

[
    {
        time: '2016-01-01 00:00:00', // minute 0
        value: 1
    },{
        time: '2016-01-01 00:01:00', // minute 1
        value: 2
    }
]
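
The properties listed above can be checked mechanically. The sketch below assumes timestamps in the `YYYY-MM-DD HH:mm:ss` format used in the sample and treats them as UTC; the helper names `toMillis` and `isRegularSeries` are made up for this example.

```javascript
'use strict'

// Parse 'YYYY-MM-DD HH:mm:ss' into UTC milliseconds.
function toMillis(time) {
  return Date.parse(time.replace(' ', 'T') + 'Z')
}

// True when every pair of consecutive points is exactly `stepMs` apart.
// Equal, positive spacing also implies ascending order and at most one
// data point per time unit.
function isRegularSeries(points, stepMs) {
  for (let i = 1; i < points.length; i++) {
    const gap = toMillis(points[i].time) - toMillis(points[i - 1].time)
    if (gap !== stepMs) return false
  }
  return true
}

const series = [
  { time: '2016-01-01 00:00:00', value: 1 },
  { time: '2016-01-01 00:01:00', value: 2 }
]

console.log(isRegularSeries(series, 60 * 1000)) // true
```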

The

Read More

Override Nested Dependencies With NPM 3

2016 Jan25
Npm is one of the primary reasons that the node community is so strong today. It makes it easy to write, package and publish code. This is primarily because of how it solves the package version and dependency crisis: every package has a version and its own set of dependencies, which are organized into a directory tree. It sounds so simple, but after more than twenty years of developers pulling their hair out over package manager dependency soup, it is a wonder it hadn't been done sooner. Even more so, NPM's package manifest is a simple JSON file that lets you fine-tune the specificity of the modules in your package.

However, there is one thing that can

Read More
filed under:  modules npm packages node.js

Dockerizing Node Services

2016 Jan21
If you haven't jumped onto the Docker bandwagon just yet, you are missing the boat. It has quickly become the de facto way of building and deploying applications of all types and sizes. And it should be. It's easy to learn and makes deploying and scaling applications significantly easier. Linux containers are lightweight, start up very quickly, and are "throw away" resources. Most of all, Node.js applications are a breeze to get running in containers.

Set Up An App

The most common, and easiest, way to create Docker images is to use Dockerfiles. Much like a Makefile, Rakefile, Jakefile, etc., a Dockerfile is a simple set of instructions that is used to create the base image for you

Read More
filed under:  api docker node.js

Configure Node Apps with Nconf and ETCD

2015 Nov02

Recently, I have been working with, and learning a lot about, the new distributed operating system CoreOS. It is really interesting and makes managing micro-service architectures quite a bit easier than manually SSHing into each machine and dealing with every node individually. At the heart of CoreOS sits ETCD, a distributed key / value store. Internally, CoreOS uses it for node discovery, communication and orchestration. Unlike other key / value stores, ETCD feels a bit more like a file system with directories and files. A directory can contain multiple directories and files, whereas a file contains a single value. For example, you might store the name of the environment as /company/metadata/environment = staging. It is a data hierarchy, which means it

Read More

Throttling Endpoints With Node-Tastypie & Hapijs

2015 Aug01
When you decide to open up parts of your API to the public, you will need to prepare for bad citizens, or consumers that may abuse your API. One way to safeguard against this is to throttle certain endpoints, restricting them to a certain number of requests per second.

Throttle ['Thraudl] -n --noun., -v --verb

  1. a device controlling the flow
  2. to choke or suffocate in any way.
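
As a rough illustration of the general idea, independent of Tastypie's own throttle implementations, a fixed-window counter is one simple way to cap requests per consumer. The class name `FixedWindowThrottle` and its `allow` method are made up for this sketch, and the `now` argument is there only so the clock can be controlled in tests.

```javascript
'use strict'

// Hypothetical fixed-window throttle: allows at most `limit` requests
// per `windowMs` milliseconds for each key (e.g. an API key or IP).
class FixedWindowThrottle {
  constructor(limit, windowMs) {
    this.limit = limit
    this.windowMs = windowMs
    this.hits = new Map() // key -> { start, count }
  }

  // Returns true if the request may proceed, false if it is throttled.
  allow(key, now = Date.now()) {
    const entry = this.hits.get(key)

    // First request for this key, or the previous window has expired.
    if (!entry || now - entry.start >= this.windowMs) {
      this.hits.set(key, { start: now, count: 1 })
      return true
    }

    if (entry.count < this.limit) {
      entry.count += 1
      return true
    }

    return false
  }
}

const throttle = new FixedWindowThrottle(2, 1000)
console.log(throttle.allow('consumer-1', 0))  // true
console.log(throttle.allow('consumer-1', 10)) // true
console.log(throttle.allow('consumer-1', 20)) // false
```

This keeps state in process memory, so it only limits requests seen by a single server; a shared store would be needed to enforce a limit across several instances.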

Tastypie's base resource has hooks for easily implementing throttling behaviors. The default implementations are mostly for testing and debugging, but provide just such a behavior, allowing you to define the number of requests allowed during a given time frame. Setting one up is very easy, and looks something like this:

Define

Read More
filed under:  tastypie REST hapi node.js