Gremlin
Gremlin is the name of the graph traversal and query language provided by TinkerPop.
Gremlin console
Create TinkerGraph:
graph = TinkerGraph.open()
Load from GraphML:
graph.io(graphml()).readGraph('air-routes.graphml')
To suppress output, append ;[] to the statement:
a=g.V().has('code','AUS').out().toList();[]
Query
g = graph.traversal()
has and hasLabel:
g.V().hasLabel('airport').has('code','DFW')
g.V().has('airport','code','DFW')
next() - a terminal step:
g.V().has('airport','code','DFW').next().getClass()
Ends the graph traversal and returns a concrete object that you can work with further in your application.
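A minimal sketch of terminal steps, assuming the Gremlin Console (where TinkerGraph is imported by default) and a throwaway one-vertex graph instead of air-routes. The variable names are illustrative only.

```groovy
// Hypothetical one-vertex graph, built only for this illustration
graph = TinkerGraph.open()
g = graph.traversal()
g.addV('airport').property('code', 'DFW').iterate()

// next() ends the traversal and hands back the Vertex itself
v = g.V().has('airport', 'code', 'DFW').next()
code = v.value('code')

// other terminal steps: toList() collects every result,
// hasNext() only tests whether there is at least one
codes = g.V().values('code').toList()
found = g.V().has('code', 'XYZ').hasNext()
```

iterate() is also a terminal step: it runs the traversal for its effect and returns nothing useful, which is why it is used after addV() here.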
has():
g.V().has('region')
g.V().hasNot('region')
g.V().not(has('region'))
groupCount():
g.V().groupCount().by(label)
g.V().label().groupCount()
g.V().group().by(label).by(count())
g.V().hasLabel('airport').groupCount().by('country')
g.V().hasLabel('country').group().by('code').by(out().count())
path - visited vertices and edges:
g.V().has('airport','code','LCY').outE().inV().path()
g.V().has('airport','code','AUS').out().as('a').out().as('b').
  path().by('code').from('a').to('b').limit(10)
Check whether an edge exists:
g.V().has('code','AUS').out('route').has('code','DFW').hasNext()
Select edge between two vertices:
g.V().has('code','MIA').outE().as('e').inV().has('code','DFW').select('e')
Limit:
g.V().hasLabel('airport').limit(20).values('code')
g.V().hasLabel('airport').tail(20).values('code')
g.V().hasLabel('airport').range(0,20).values('code')
Locate by id:
g.V().hasId(8).values('code')
g.V().has(id,8).values('code')
g.V().hasId(between(1,6))
g.V().has(id,between(1,6))
g.V(3).values('code')
g.V(3,6,8,15).values('code')
Labels:
g.V().where(label().is(eq('airport'))).count()
g.V().has(label,'airport').count()
g.V().hasLabel('airport').count()
g.V().has(label,neq('airport')).count()
g.V().where(label().is(neq('airport'))).count()
g.V().not(hasLabel('airport')).count()
Equal:
g.V().has('runways',eq(3)).count()
g.V().has('runways',3).count()
g.V().values('runways').is(3).count()
Starts with:
g.V().hasLabel('airport').has('city',between('Dal','Dam')).values('city')
g.V().hasLabel('airport').
  filter{ it.get().value('city').startsWith('Dal')}.
  values('city')
Boolean:
g.V().and(has('code','AUS'),has('icao','KAUS'))
g.V().has('code','AUS').and().has('icao','KAUS')
g.V().hasLabel('airport').
  where(out().count().is(lt(100).and(gt(94)))).
  group().by('code').by(out().count())
Where:
g.V().has('runways',gt(5))
// is equivalent to
g.V().where(values('runways').is(gt(5)))
g.V().hasLabel('airport').where(out('route').count().is(gt(60))).count()
Finding two vertices in one query:
g.V().has('code','NCE').values('region').as('r').
  V().hasLabel('airport').as('a').values('region').
  where(eq('r')).by().
  local(select('a').values('city','code','region').fold())
If/then/else (choose):
g.V().has('region','US-TX').choose(values('longest').is(gt(12000)), values('code'), values('desc'))
Case/switch (option):
g.V().hasLabel('airport').choose(values('code')).
  option('DFW',values('desc')).
  option('AUS',values('region')).
  option('LAX',values('runways'))
Union and group:
g.V().has('airport','code','DFW').as('a').
  union(select('a'),out().count()).fold()
[v[8],221]
g.V().has('airport','code','DFW').
  group().by().by(out().count())
[v[8]:221]
sideEffect:
g.V(3).sideEffect(out().count().store('a')).
  out().out().count().as('b').select('a','b')
aggregate (temporary collection):
g.V().has('code','AUS').out().aggregate('nonstop').
  out().where(without('nonstop')).dedup().count()
coalesce - return the result of the first traversal that produces one:
g.V(3).coalesce(out('fly'),__.in('contains')).valueMap()
simplePath - do not visit the same vertex more than once on a path:
g.V().has('code','AUS').
  repeat(out().simplePath()).
  until(has('code','AGR')).
  path().by('code').limit(10)
Create if not exist:
g.V().has('code','XYZ').fold().coalesce(unfold(),addV().property('code','XYZ'))
Delete:
// edges
g.V().has('code','AUS').outE().as('e').inV().has('code','LHR').select('e').drop()
// properties
g.V().has('code','SFO').properties('desc').drop()
Properties:
g.V().has('code','AUS').property(list,'code','ABIA')
g.V().has('code','AUS').properties().hasValue('ABIA').drop()
g.V(3).property(set,'hw',"hello").property(set,'hw','world')
sack - a value that each traverser carries along with it:
g.V().has('code','AUS').sack(assign).by('runways').
  V().has('code','SAF').out().
  sack(minus).by('runways').sack().fold()
Profile:
g.V().has('region','US-TX').out().has('region','US-CA').
  out().has('country','DE').profile()
Walk a graph
One vertex is considered to be adjacent to another vertex if there is an edge connecting them.
A vertex and an edge are considered incident if they are connected to each other.
out - Outgoing adjacent vertices.
in - Incoming adjacent vertices.
both - Both incoming and outgoing adjacent vertices.
outE - Outgoing incident edges.
inE - Incoming incident edges.
bothE - Both outgoing and incoming incident edges.
outV - Outgoing vertex.
inV - Incoming vertex.
otherV - The vertex that was not the vertex we came from.
All except outV, inV, otherV can accept labels as parameters.
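The walking steps above can be tried on a tiny graph. This is a sketch for the Gremlin Console; the three airports and two routes (AUS to DFW, LHR to AUS) are made up for the illustration and are not taken from air-routes.

```groovy
// Hypothetical graph: AUS -route-> DFW and LHR -route-> AUS
graph = TinkerGraph.open()
g = graph.traversal()
aus = g.addV('airport').property('code', 'AUS').next()
dfw = g.addV('airport').property('code', 'DFW').next()
lhr = g.addV('airport').property('code', 'LHR').next()
g.addE('route').from(aus).to(dfw).iterate()
g.addE('route').from(lhr).to(aus).iterate()

// vertex to vertex
outgoing   = g.V(aus).out().values('code').toList()    // DFW
incoming   = g.V(aus).in().values('code').toList()     // LHR
neighbours = g.V(aus).both().values('code').toList()   // DFW and LHR

// vertex to edge to vertex; only the edge step takes a label
viaEdges = g.V(aus).outE('route').inV().values('code').toList()   // DFW
others   = g.V(aus).bothE().otherV().values('code').toList()      // DFW and LHR
```

out() is shorthand for outE().inV(); the longer form is useful when the edge itself has to be inspected or labelled with as() on the way.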
Predicates
eq - Equal to.
neq - Not equal to.
gt - Greater than.
gte - Greater than or equal to.
lt - Less than.
lte - Less than or equal to.
inside - Inside a lower and upper bound, neither bound is included.
outside - Outside a lower and upper bound, neither bound is included.
between - Between two values: the lower bound is included, the upper bound is excluded.
within - Must match at least one of the values provided. Can be a range or a list.
without - Must not match any of the values provided. Can be a range or a list.
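The bound handling of the range predicates is easy to mix up, so here is a small sketch for the Gremlin Console (where the predicates are imported by default). The six vertices with runways 1 to 6 are invented for the illustration.

```groovy
// Hypothetical data: six vertices with 'runways' set to 1..6
graph = TinkerGraph.open()
g = graph.traversal()
(1..6).each { n -> g.addV('airport').property('runways', n).iterate() }

btw = g.V().has('runways', between(2, 4)).values('runways').order().toList()  // 2, 3
ins = g.V().has('runways', inside(2, 4)).values('runways').order().toList()   // 3
outs = g.V().has('runways', outside(2, 4)).values('runways').order().toList() // 1, 5, 6
win = g.V().has('runways', within(1, 6)).values('runways').order().toList()   // 1, 6
wout = g.V().has('runways', without(1, 6)).values('runways').order().toList() // 2, 3, 4, 5
```

Note that within(1, 6) lists two values to match, not a range from 1 to 6.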
Best practices
Start a traversal by looking up vertices rather than edges: a graph usually has fewer vertices than edges.
Labels
As useful as labels are, in larger graph deployments, where indexing technology such as Solr or Elasticsearch is often used to speed up traversing the graph, vertex labels typically do not get indexed. It is therefore recommended to walk the graph using an actual vertex property that can be indexed rather than relying on the vertex label. This matters most with large graphs, where performance can become an issue.
Double underscore
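__ - the class from which anonymous traversals are spawned; an anonymous traversal works on whatever the previous step passes to it. Prefix a step with __. when its name clashes with a reserved word (in is reserved in Groovy), for example __.in('contains').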
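A short sketch of where the __. prefix is needed, assuming the Gremlin Console and a made-up two-vertex graph (a country that contains one airport):

```groovy
// Hypothetical graph: US -contains-> AUS
graph = TinkerGraph.open()
g = graph.traversal()
us  = g.addV('country').property('code', 'US').next()
aus = g.addV('airport').property('code', 'AUS').next()
g.addE('contains').from(us).to(aus).iterate()

// after a dot, in() is an ordinary method call and needs no prefix
direct = g.V(aus).in('contains').values('code').toList()

// at the start of an anonymous traversal a bare in(...) would hit the
// Groovy keyword, so it is written as __.in(...); out(...) needs no prefix
nested = g.V(aus).coalesce(out('fly'), __.in('contains')).values('code').toList()
```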
Vocabulary
Traversal
A graph query is often referred to as a traversal.
Traversals that do not start with a V or E step are referred to as "anonymous traversals".
Modulator step
A modulator is a step that influences the behavior of the step that it is associated with. Examples of such modulator steps are by and as.
Vertex degree
Vertex degree is used when discussing the number of edges coming into a vertex (in degree), going out from a vertex (out degree) or potentially both coming in and going out (degree).
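Degrees can be computed by counting incident edges. The following is a sketch for the Gremlin Console on a made-up graph with two routes (AUS to DFW, LHR to AUS); the variable names are illustrative.

```groovy
// Hypothetical graph: AUS -route-> DFW and LHR -route-> AUS
graph = TinkerGraph.open()
g = graph.traversal()
aus = g.addV('airport').property('code', 'AUS').next()
dfw = g.addV('airport').property('code', 'DFW').next()
lhr = g.addV('airport').property('code', 'LHR').next()
g.addE('route').from(aus).to(dfw).iterate()
g.addE('route').from(lhr).to(aus).iterate()

outDegree = g.V(aus).outE().count().next()    // edges leaving AUS
inDegree  = g.V(aus).inE().count().next()     // edges arriving at AUS
degree    = g.V(aus).bothE().count().next()   // both directions

// degree of every vertex, keyed by airport code
degrees = g.V().group().by('code').by(bothE().count()).next()
```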
Fold and unfold
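The fold step collects all results of the traversal so far into a single list. The unfold step does the reverse: it turns a list back into its individual items, and a HashMap into a series of HashMap.Node elements (its entries).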
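A minimal sketch of both steps, assuming the Gremlin Console (where label refers to T.label) and a made-up graph of two airports:

```groovy
// Hypothetical graph with two airport vertices
graph = TinkerGraph.open()
g = graph.traversal()
g.addV('airport').property('code', 'AUS').iterate()
g.addV('airport').property('code', 'DFW').iterate()

// fold(): many results become one List
folded = g.V().values('code').fold().next()

// unfold(): the List is streamed back out item by item
unfolded = g.V().values('code').fold().unfold().toList()

// groupCount() yields one Map; unfold() turns it into its entries
totals  = g.V().groupCount().by(label).next()
entries = g.V().groupCount().by(label).unfold().toList()
```

Each element of entries exposes key and value, which makes it possible to order or filter the groups with further steps.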
Side effect
A side effect can store values during a traversal but has no effect on what is passed on to the next step.
OLTP vs OLAP
OLTP - Online Transaction Processing
OLAP - Online Analytical Processing