Playing with Couchbase: A Case Study

NoSQL in the context of Social Gaming Data

Created by Marc Trudel / @stelcheck

Intro

My apologies, but...

About Wizcorp

  • Mobile Social Games
  • Internal middleware
  • Open-source projects

About me

  • Started as a Developer
  • Became System Administrator
  • Now Chief Technology Officer

About this story

  • It's about love
  • It will scare you
  • There is a lot to be learned

Part 1

Couchbase for Social Games

The Early Days

  • PHP
  • MySQL as our prime datastore
  • Memcached
  • Lots of HTML5 goodness

But!

  • Real-time is hard
  • Asynchronous is hard
  • High Write/Read ratio
  • Scalability became an issue

Couchbase to the rescue!

  • Just like Memcached
  • Simple to operate
  • The future is bright

And most importantly

http://www.slideshare.net/renatko/couchbase-performance-benchmarking


It's fast

The Global Strategy

  • Couchbase + Node.js
  • node-memcached + Moxi
  • Store scalar values
  • Structured key patterns
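
A minimal sketch of what "store scalar values" under "structured key patterns" could look like in Node.js. The `<type>:<id>:<field>` layout, the `:` separator, and the helper names `makeKey`, `parseKey`, and `toEntries` are assumptions made for illustration; the slides do not show the actual key scheme.

```javascript
// Hypothetical key scheme: <type>:<id>:<field>, with one scalar value per key.
const SEP = ':';

// Build a key from its three segments, rejecting segments that would
// make the key ambiguous to parse back.
function makeKey(type, id, field) {
  const parts = [String(type), String(id), String(field)];
  for (const part of parts) {
    if (part.length === 0 || part.includes(SEP)) {
      throw new Error(`invalid key segment: "${part}"`);
    }
  }
  return parts.join(SEP);
}

// Split a key back into its segments (all segments come back as strings).
function parseKey(key) {
  const parts = key.split(SEP);
  if (parts.length !== 3) {
    throw new Error(`unexpected key format: "${key}"`);
  }
  const [type, id, field] = parts;
  return { type, id, field };
}

// Flatten an object into [key, scalar] pairs, one entry per field.
function toEntries(type, id, obj) {
  return Object.entries(obj).map(
    ([field, value]) => [makeKey(type, id, field), value]
  );
}
```

With this scheme, `toEntries('user', 42, { gold: 10, level: 3 })` produces one entry for `user:42:gold` and one for `user:42:level`, each of which would be written with its own memcached-style `set`.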

The Strategy

  • Puppet
  • Graylog2
  • Nagios + Observium
  • Operate with admin panel

Part 2

Playtime's over, folks

It works!

  • Incredible performance
  • Very nice monitoring
  • Developers were happy

Then...

  • Key length/memory issues
  • Stat data does not persist
  • Weird errors with Moxi
  • node-memcached failure

And...

  • App-related data corruption
  • One node down, the whole game is down
  • Rebalance failures
  • Downtime and data recovery

Whaaaat??

What happened?

  • Rebalance bugs (1.8)
  • Crash/recovery strategies
  • Bad key/value storage
  • Persistent monitoring data

So, some bugs aside...

We didn't do our homework properly

We assumed it would be simple

It was

We assumed it would be easy

It wasn't... quite

Part 3

Strategies and Tools

Data storage

  • Reduce number of entries
  • Scalar values >> Documents
  • Better key generation strategy
  • Control over vbucket hash
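
A rough sketch of how the number of entries might be reduced, reading "Scalar values >> Documents" as a move from one key per field to one document per entity (an interpretation; the slide does not spell it out). The `type:id:field` key layout and the helper names `groupIntoDocuments` and `keyBytes` are hypothetical.

```javascript
// Group flat "type:id:field" scalar entries into one document per "type:id".
// Fewer, shorter keys means less key data for the cluster to hold.
function groupIntoDocuments(entries) {
  const docs = new Map();
  for (const [key, value] of entries) {
    const idx = key.lastIndexOf(':');
    if (idx < 0) {
      throw new Error(`unexpected key format: "${key}"`);
    }
    const docKey = key.slice(0, idx);   // e.g. "user:42"
    const field = key.slice(idx + 1);   // e.g. "gold"
    if (!docs.has(docKey)) {
      docs.set(docKey, {});
    }
    docs.get(docKey)[field] = value;
  }
  return docs;
}

// Total size in bytes of a collection of keys, for a before/after comparison.
function keyBytes(keys) {
  let total = 0;
  for (const key of keys) {
    total += Buffer.byteLength(key, 'utf8');
  }
  return total;
}
```

Comparing `keyBytes` over the original scalar keys with `keyBytes` over `docs.keys()` gives a quick estimate of how much key data the grouping saves. The trade-off is that updating a single field now means rewriting the whole document.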

Client library

  • Bug fixes to node-memcached
  • node-couchbase contributions
  • Integrity check/repair scripts
  • node-archivist

Operations

  • Polyglot Persistence
  • Load test
  • Hot backups
  • Better rebalance strategy
  • NagiosForCouchbase

In Conclusion

What we love

  • Simple at every step
  • Easy sharding/rebalance
  • Fast. Like, really fast
  • Great community

What we learned

  • Test your use cases
  • Prepare for the worst
  • Don't leave monitoring to your datastore

What we hope

  • Production-ready node-couchbase
  • More robust rebalance
  • Bigger community

Hope it wasn't too scary

You can feel safe with Couchbase

Do your homework

Any Questions?

Thank you!

  • wizcorp.jp/CouchConf2013/
  • github.com/Wizcorp