Weaver is a distributed graph store that provides horizontal scalability, high performance, and strong consistency.
Weaver enables users to execute transactional graph updates and queries through a simple Python API.
For example, you can create a user (node) and a link (edge) of a specified type in a single transaction:
graph.begin_tx()
graph.create_node('ayush')
graph.create_edge('ayush', 'egs')
graph.set_edge_property('type', 'friend')
graph.end_tx()
You can query a user's friends-of-friends in another transaction:
two_hop_friends = graph.traverse('ayush'). \
out_edge([('type', 'friend')]).node(). \
out_edge([('type', 'friend')]).node().execute()
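To make the semantics of this two-hop query concrete, here is a minimal plain-Python sketch of what it computes. It uses an in-memory adjacency dictionary rather than a Weaver cluster; the `edges` structure and the `out_neighbors` and `two_hop` helpers are hypothetical names chosen for illustration and are not part of the Weaver API.

```python
# Illustrative only: an in-memory stand-in for the graph, not Weaver itself.
# Maps each node to a list of (neighbor, edge_properties) pairs.
edges = {
    'ayush': [('egs', {'type': 'friend'}), ('paper1', {'type': 'author'})],
    'egs': [('robert', {'type': 'friend'}), ('ayush', {'type': 'friend'})],
    'robert': [],
}

def out_neighbors(graph, node, key, value):
    """Return nodes reached from `node` over out-edges whose property matches."""
    return [dst for dst, props in graph.get(node, []) if props.get(key) == value]

def two_hop(graph, start, key='type', value='friend'):
    """Follow matching out-edges twice, mirroring the shape of the query above."""
    result = set()
    for friend in out_neighbors(graph, start, key, value):
        result.update(out_neighbors(graph, friend, key, value))
    return result

print(sorted(two_hop(edges, 'ayush')))
```

In this toy graph the start node appears in its own result because 'egs' links back to 'ayush'. Whether a real Weaver traversal reports the start node in that case is not stated here, so treat that detail as an assumption of the sketch.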
Weaver dynamically migrates portions of the graph across shards to maintain graph locality and minimize communication.
Weaver also enables users to cache results of graph computation at the nodes.
You can learn more about the API by reading the tutorial.
Install HyperDex Warp.
Run the following commands as root for Ubuntu 14.04:
$ wget -O - http://weaver.systems/repos/apt/ubuntu/weaver.gpg.key | apt-key add -
$ wget -O /etc/apt/sources.list.d/weaver.list http://weaver.systems/repos/apt/ubuntu/trusty.list
$ apt-get update
$ apt-get install weaver
Dependencies:
$ apt-get install autoconf autoconf-archive build-essential \
pkg-config libtool python-dev cython libyaml-dev libpopt-dev \
libgoogle-glog-dev libpugixml-dev sparsehash gcc-4.8 g++-4.8
Dependencies from HyperDex repository:
$ apt-get install libpo6-dev libe-dev libbusybee-dev \
libhyperleveldb-dev replicant librsm-dev libreplicant-dev \
libhyperdex-dev-warp libhyperdex-client-dev-warp hyperdex-warp
$ git clone https://github.com/dubey/weaver.git
$ cd weaver
$ autoreconf -i
$ ./configure
$ make && make install
Weaver has a pre-built Docker image that makes it easier to deploy the system. To get the image:
$ docker pull weaver/weaver
Weaver needs running instances of HyperDex, the Weaver timeline oracle, and the Weaver coordinator. To start all of these, get the code and execute the start_weaver.sh script. This script will ssh into the machines that will host the various processes, so you need a running SSH server on those machines.
$ git clone https://github.com/dubey/weaver.git
$ cd weaver/
$ startup_scripts/start_weaver.sh
The script takes a single argument: the path of the weaver.yaml config file, which defaults to /etc/weaver.yaml or /usr/local/etc/weaver.yaml. The config file specifies the IP addresses and ports to which the HyperDex coordinator, the timeline oracle, and the Weaver coordinator will bind.
To start a Weaver shard:
$ weaver shard
To start a Weaver timestamper:
$ weaver timestamper
You must start as many timestampers as listed in the weaver.yaml config file, at which point you have a working cluster. You should additionally start backup timestampers and shards by appending the -b option to weaver shard and weaver timestamper. You need f backups for each shard and timestamper to tolerate f failures. For additional startup options, such as specifying the IP address and port to bind to, see the output of weaver shard --help and weaver timestamper --help.
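As a quick planning aid, the backup rule above can be turned into arithmetic. The sketch below is not a Weaver tool; `processes_needed` is a hypothetical helper that counts how many shard and timestamper processes to start for a given f, assuming every shard and every timestamper gets exactly f backups.

```python
def processes_needed(num_shards, num_timestampers, f):
    """Count processes to start so each shard and timestamper has f backups."""
    if f < 0:
        raise ValueError('f must be non-negative')
    primaries = num_shards + num_timestampers
    backups = f * primaries  # f backups for every primary process
    return {'primaries': primaries, 'backups': backups,
            'total': primaries + backups}

# Example: 4 shards and 2 timestampers, tolerating 1 failure of each.
plan = processes_needed(num_shards=4, num_timestampers=2, f=1)
print(plan['total'])  # 6 primaries + 6 backups = 12
```

The count excludes HyperDex, the timeline oracle, and the Weaver coordinator, which are started separately by start_weaver.sh.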
A simple sanity-check program that creates and traverses a small graph is given in tests/python/correctness/simple_test.py. Give it a try by running:
$ python tests/python/correctness/simple_test.py
Each of the timestampers and shards must be killed individually on the machines where they are running. You can use the kill_weaver.sh script to kill HyperDex, the timeline oracle, and the coordinator.
$ startup_scripts/kill_weaver.sh
The weaver/weaver Docker image includes a Weaver installation and scripts that set up a local Weaver instance. To run Weaver on Docker:
$ docker run --net="host" -itd weaver/weaver /start_weaver.sh
$ docker run --net="host" -itd weaver/weaver /weaver_shard.sh
$ docker run --net="host" -itd weaver/weaver /weaver_timestamper.sh
$ docker run --net="host" -it weaver/weaver python weaver/tests/python/correctness/simple_test.py
To kill the cluster, simply kill all the Docker containers. To set up the cluster with another configuration, you will have to modify the /etc/config.yaml file by running an interactive Docker container with a shell program. You can learn more by reading the Docker documentation.
A typical Weaver program creates a Weaver client by providing the IP address and port of the Weaver coordinator.
from weaver import client
c = client.Client('127.0.0.1', 2002)
Weaver clients provide functions to create nodes and edges, and to assign them properties. All such updates take place in the context of a transaction. Multiple clients can issue concurrent transactions, which are guaranteed to execute in isolation.
from weaver import client
c = client.Client('127.0.0.1', 2002)
c.begin_tx()
c.create_node('ayush')
new_node = c.create_node() # weaver creates a unique node handle
new_edge = c.create_edge('ayush', new_node)
c.set_node_property(node='ayush', key='type', value='person')
c.set_node_property(node=new_node, key='type', value='content')
c.set_edge_property(node='ayush', edge=new_edge, key='rel', value='likes')
c.end_tx()
c.begin_tx()
c.delete_edge(edge=new_edge, node='ayush')
c.delete_node('ayush')
c.end_tx()
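The same calls can be wrapped in a small helper that loads a whole edge list inside one transaction. The sketch below uses only the client methods shown above (begin_tx, create_node, create_edge, set_edge_property, end_tx). `RecordingClient` is a hypothetical stand-in that just records calls so the example runs without a cluster, and `load_edges` is an illustrative name, not part of Weaver. It assumes none of the nodes exist yet; what Weaver does when a handle already exists is not covered here.

```python
class RecordingClient(object):
    """Hypothetical stub that records calls; not the real Weaver client."""
    def __init__(self):
        self.calls = []
        self._next_edge = 0

    def begin_tx(self):
        self.calls.append(('begin_tx',))

    def end_tx(self):
        self.calls.append(('end_tx',))

    def create_node(self, handle):
        self.calls.append(('create_node', handle))
        return handle

    def create_edge(self, src, dst):
        self._next_edge += 1
        self.calls.append(('create_edge', src, dst))
        return 'e%d' % self._next_edge  # made-up edge handle

    def set_edge_property(self, node, edge, key, value):
        self.calls.append(('set_edge_property', node, edge, key, value))


def load_edges(c, edge_list, rel):
    """Create all endpoints and edges of edge_list in a single transaction."""
    c.begin_tx()
    seen = set()
    for src, dst in edge_list:
        for handle in (src, dst):
            if handle not in seen:  # create each node only once
                c.create_node(handle)
                seen.add(handle)
        edge = c.create_edge(src, dst)
        c.set_edge_property(node=src, edge=edge, key='rel', value=rel)
    c.end_tx()
    return len(seen)


stub = RecordingClient()
created = load_edges(stub, [('ayush', 'egs'), ('egs', 'robert')], rel='friend')
print(created)  # number of distinct nodes created
```

To try this against a running cluster, pass a client created with client.Client('127.0.0.1', 2002) in place of the stub.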
Apart from creating the graph, you can also bulk-load an existing graph while starting the Weaver shards.
weaver shard -l 127.0.0.1 -p 5201 \
--graph-file=data.txt --graph-format=snap
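If you need a small input file to experiment with, the sketch below writes edges in the edge-list layout commonly used by SNAP datasets: optional comment lines starting with '#', then one tab-separated "source destination" pair per line. That Weaver's snap loader accepts exactly this layout is an assumption here, so check the loader in the source before relying on it. `to_snap_edge_list` is a hypothetical helper name.

```python
def to_snap_edge_list(edge_list, comment=None):
    """Render (src, dst) integer pairs as SNAP-style edge-list text."""
    lines = []
    if comment:
        lines.append('# ' + comment)
    for src, dst in edge_list:
        lines.append('%d\t%d' % (src, dst))
    return '\n'.join(lines) + '\n'

text = to_snap_edge_list([(0, 1), (0, 2), (1, 2)], comment='toy graph')
print(text)
```

Saving the returned text as data.txt gives a file you can pass to --graph-file in the command above.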
Weaver clients also provide query methods that trigger the execution of snippets of code called node programs. Each query is fully specified by the client and is executed entirely at the shards as a transaction. The binaries include a number of precompiled node programs, such as get node neighbors, get node properties, get edge properties, count edges, BFS traversal, reachability request, and computation of clustering coefficient.
from weaver import client
c = client.Client('127.0.0.1', 2002)
# read color of nodes
prog_params = c.ReadNodePropsParams(keys=['color']) # prepare program parameters
response = c.read_node_props([('ayush',prog_params)])
print response.node_props
# syntactic sugar for traversals
links = c.traverse('egs').out_edge().execute() # list all out-links from node egs
friends = c.traverse('egs').out_edge().node().execute() # list all of egs's friends
links_and_friends = c.traverse('egs').out_edge().node().collect() # links + friends
There are a number of sample Python client programs in the tests/python/* directory in the source.
If you are interested in implementing your own node program, take a look at the node_prog/ directory.
RoboBrain: A knowledge base for robots and machine learning applications.
ResearchTrends: Visualizations of the Computer Science research corpus.
For what kind of data is Weaver designed?
Weaver is designed to store dynamic graphs. You can perform transactions on rapidly evolving graph-structured data with high throughput.
What can I do with transactions on a graph?
Weaver transactions ensure that any updates and graph operations (such as traversals) execute atomically and in isolation from other transactions on a consistent snapshot of the graph. This ensures, for example, that a traversal on the graph will never return a path that never existed.
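A toy illustration of the "path that never existed" problem, in plain Python rather than Weaver: suppose one transaction deletes the edge a->b and adds the edge b->c. The path a->b->c exists neither before nor after that update, yet a traversal that reads its first hop from the old state and its second hop from the new state would report it. The `two_hop_paths` helper and the dictionaries below are hypothetical and model only the anomaly, not Weaver's protocol.

```python
import copy

def two_hop_paths(first_hop_view, second_hop_view, start):
    """List start->mid->end paths, reading each hop from the given graph view."""
    return [(start, mid, end)
            for mid in first_hop_view[start]
            for end in second_hop_view[mid]]

before = {'a': ['b'], 'b': [], 'c': []}
after = copy.deepcopy(before)
after['a'].remove('b')  # the update deletes a->b ...
after['b'].append('c')  # ... and adds b->c, as one transaction

# Isolated reads: both hops see the same snapshot, so no path is found.
print(two_hop_paths(before, before, 'a'))  # []
print(two_hop_paths(after, after, 'a'))    # []

# Non-isolated read: first hop sees the old state, second hop the new one.
print(two_hop_paths(before, after, 'a'))   # [('a', 'b', 'c')]
```

Running every traversal against a single consistent snapshot, as described above, rules out the third outcome.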
What are some examples of 'dynamic graphs' for which Weaver is a good fit?
Think online social networks, WWW, knowledge graphs, Bitcoin transaction graphs, biological interaction networks, etc. If your application manipulates graph-structured data similar to these examples, you should try Weaver out!
How high is 'high throughput'?
Our preliminary experiments show that Weaver achieves over 12x higher throughput than Titan on an online social network workload similar to that of TAO. Weaver also achieves 4x lower latency than GraphLab on an offline graph traversal workload.
How does Weaver compare to {GraphLab/Titan/Neo4j}?
Weaver is a high-performance, transactional, and scalable online data store for graphs.
On what platforms does Weaver run? Do you support {CentOS/Debian/Fedora/OS X}?
The pre-release includes binaries for Ubuntu 14.04. For all other platforms, you will have to compile from source. Future releases may include pre-compiled binaries for other platforms.
Which bindings does Weaver provide?
In addition to the native C++ binding, Weaver provides a Python client that users can program against.
Can I use Weaver in production?
No! Weaver is currently in pre-release and we are in the process of making the system more robust. Please submit bugs and feature requests on the discussion list.
How can I contribute?
You can contribute in a number of ways! Please take a look at the source code, which is available on GitHub. The code includes a number of node programs based on graph exploration. If you feel that your custom node program is generic enough to be included in the repository, please submit a pull request. If there is a feature that Weaver should include, or a bug in the implementation, please let us know on the discussion list.