Thursday, November 19, 2015

2015 with Neo4J: A Summary

Next up in my series on the things I used in 2015 and what I felt about them is Neo4J.

Probably the most "visible" graph database out there, and also the oldest, Neo4J has been around since 2007 and hit 1.0 in 2010. As a result it occupies prime real estate in the enterprise market, and this shows in its editions: the vendor can get away with not providing clustering or high availability in the free community edition.

This blog post isn't about Neo Technology's licensing policies, so I'll get back on topic and summarize my experience here.


Experience

  • Used up to version 2.2
  • Used in production for powering a social graph
  • Wouldn't trust it as the master copy of the data it contains, so make sure a more reliable storage backend holds the data and can backfill Neo4J if it ever crashes on you
  • Until version 2.3 it lacked range indices, so taking the user nodes linked to a celebrity user node via incoming follow relationships and sorting them by user_name was terribly inefficient. Unfortunately I haven't had the time to revisit Neo4J 2.3 to see how the problem is solved now.
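
To make the backfill idea concrete, here is a minimal Go sketch that turns follow edges read from the primary store into batched, idempotent Cypher statements. Everything named here (the Follow type, the User label, the id property, the FOLLOWS relationship type, the batch size) is an assumption for illustration and not the schema I actually ran; the statements are only built, not sent, and the query has not been verified against a running server.

```go
package main

import "fmt"

// Follow is one edge of the social graph as kept in the primary store.
// The type and its field names are hypothetical.
type Follow struct {
	Follower string // id of the user who follows
	Followed string // id of the user being followed
}

// backfillQuery recreates users and FOLLOWS relationships with MERGE, so
// replaying a batch does not duplicate anything. It uses the {param}
// placeholder syntax of Cypher 2.x. Label and property names are assumed.
const backfillQuery = `UNWIND {rows} AS row
MERGE (a:User {id: row.follower})
MERGE (b:User {id: row.followed})
MERGE (a)-[:FOLLOWS]->(b)`

// Batch is one parameterised statement ready to hand to a driver.
type Batch struct {
	Query  string
	Params map[string]interface{}
}

// BuildBackfill splits edges into batches of at most size rows each.
func BuildBackfill(edges []Follow, size int) []Batch {
	if size < 1 {
		size = 1
	}
	var batches []Batch
	for start := 0; start < len(edges); start += size {
		end := start + size
		if end > len(edges) {
			end = len(edges)
		}
		rows := make([]map[string]string, 0, end-start)
		for _, e := range edges[start:end] {
			rows = append(rows, map[string]string{
				"follower": e.Follower,
				"followed": e.Followed,
			})
		}
		batches = append(batches, Batch{
			Query:  backfillQuery,
			Params: map[string]interface{}{"rows": rows},
		})
	}
	return batches
}

func main() {
	edges := []Follow{{"alice", "celeb"}, {"bob", "celeb"}, {"carol", "alice"}}
	for i, b := range BuildBackfill(edges, 2) {
		rows := b.Params["rows"].([]map[string]string)
		fmt.Printf("batch %d: %d rows\n", i, len(rows))
	}
}
```

Each Batch would then go through whichever driver you use. Small batches keep individual transactions short, which matters given the locking behaviour described under Dislikes.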

Likes

  • A lot of data "naturally" gets modeled as a graph
  • Cypher is a pleasure to write graph queries in
  • Full ACID support if you need that sort of thing, and no dirty reads by design.
  • The web based workbench is extremely user friendly and gets the job done quickly. It spoils you.
  • https://github.com/jmcvetta/neoism and https://github.com/thingdom/node-neo4j are amazingly well-written drivers for Go and Node.js respectively (the two environments I wrote code in during 2015)
  • Great for bootstrapping the prototyping phase, as you can model a great many things in it while easily growing the data model with more and more relationships. Sort of like using PostgreSQL and joining the living daylights out of every table, but with a friendlier query language and web interface (obviously, neither scales much).
  • Lucene-based full-text indexing is available (but, as a negative, poorly documented)
  • Free startup and open source licenses are available for the enterprise edition, making things easier on the wallet.

Dislikes

  • Proprietary clustering and high availability. I am a poor citizen of the third world.
  • Poor performance under heavy read/write load due to mandatory ACID enforcement: forget about reading from a node you visit while it is being updated.
  • The above issue makes it a bad idea to store frequently updated data like counters and profile information on a user node in the graph if that node is part of relationships you regularly traverse.
  • No built-in sharding or distribution: all your data needs to fit on one machine (so restrict its use to containable data like a social graph).
  • For pagination you are restricted to SKIP and LIMIT, which is very inefficient. I assume the range index support in Neo4J 2.3 mitigates this, but I have yet to investigate.
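
For the pagination point, the usual alternative to SKIP and LIMIT is keyset pagination: remember the last user_name you returned and ask only for names after it. Below is a minimal Go sketch of the idea. The two Cypher strings use hypothetical label, property and parameter names in 2.x {param} syntax, and I have not checked whether Neo4J 2.3 actually serves the keyset form from its range index, so the logic is demonstrated on an in-memory sorted slice instead. It also assumes user_name is unique and never empty.

```go
package main

import (
	"fmt"
	"sort"
)

// Offset pagination: the server still has to produce and discard the
// first {offset} rows on every page. Names here are assumptions.
const skipLimitQuery = `MATCH (u:User)-[:FOLLOWS]->(:User {id: {celebrity}})
RETURN u.user_name ORDER BY u.user_name SKIP {offset} LIMIT {limit}`

// Keyset pagination: the client passes the last name it saw as {after}.
const keysetQuery = `MATCH (u:User)-[:FOLLOWS]->(:User {id: {celebrity}})
WHERE u.user_name > {after}
RETURN u.user_name ORDER BY u.user_name LIMIT {limit}`

// PageAfter returns up to limit names strictly greater than after.
// sorted must be in ascending order. This mimics what a range-indexed
// lookup would do: seek to the cursor, then read limit entries.
func PageAfter(sorted []string, after string, limit int) []string {
	if limit < 0 {
		limit = 0
	}
	// Binary search for the first name greater than the cursor.
	i := sort.Search(len(sorted), func(j int) bool { return sorted[j] > after })
	end := i + limit
	if end > len(sorted) {
		end = len(sorted)
	}
	return sorted[i:end]
}

func main() {
	names := []string{"erin", "alice", "dave", "bob", "carol"}
	sort.Strings(names)

	cursor := "" // empty cursor means "start from the beginning"
	for {
		page := PageAfter(names, cursor, 2)
		if len(page) == 0 {
			break
		}
		fmt.Println(page)
		cursor = page[len(page)-1] // the last name seen becomes the next cursor
	}
}
```

The cost of a page no longer grows with how deep into the list you are, provided the store can seek to the cursor, which is exactly the part a range index is supposed to make cheap.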

Future

  • Probably still the most mature free/open source (at least partially) graph database out there. For enterprise use, I doubt there are going to be competitors for a long time.
  • As a developer, don't be that guy who models every single thing as a graph and then starts crying about performance and scalability.
  • With range indexes and a lot of stability added in version 2.3 (see this and this), the project is heading in the right direction.
  • Personally though, I'll try OrientDB and TitanDB (probably in that order) before taking another look at Neo4J, because these two are fully open source and considerably more feature rich.
