Jane Project, Part Two: Data Storage

Part One of this series went into some detail on the components that make up this virtual assistant. I've hit an interesting inflection point that I wanted to talk about. As I've been digging into this project, I've realized that how I store the data needed by, and generated by, this project is actually a really important early decision. Over the many years I've been coding systems that require data storage, I've learned the hard way on several professional projects that an early wrong choice can lead to dead ends and bog things down. That said, it's also important to realize that how data is stored can evolve over time.

When I first started this project, I thought I'd want/need something quick and dirty, easy, fast, and light, so I thought SQLite would be perfect. As I began to design the core, however, I quickly realized that a SQL database is going to be too limiting: for one, I'm not a SQL wizard, and for two, this data isn't going to be all that structured. I think a NoSQL document store or key-value store is the way to go.

The challenge with NoSQL solutions is that they can be resource intensive for a device like a Raspberry Pi. My favorite NoSQL database, MongoDB, is hard to get working on a Pi. My second favorite is AWS DynamoDB, and because of the nature of this project, it's a non-starter. (There is a local version of DynamoDB available, but it's really meant for development, not production, and it's not open source. I am only using open-source projects with no cloud storage requirements.)

At the moment, CouchDB is my front runner. But I have decided for now to abstract the database layer and start with simply saving and retrieving JSON documents. I'm sure I'll hit a wall pretty quickly using that approach, but it seemed the best starting place: I don't have to make this decision right now, and can make it later when I know a lot more about what I'll need.
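
As a rough sketch of that idea, assuming Python and a one-file-per-document layout (neither is settled for this project): the names `DocumentStore`, `JsonFileStore`, `save`, and `load` below are hypothetical and only illustrate the shape of the abstraction. The rest of the code would talk to the interface, so a CouchDB-backed class could later replace the file-backed one.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, Optional


class DocumentStore(ABC):
    """Hypothetical storage interface; callers depend only on this."""

    @abstractmethod
    def save(self, doc_id: str, doc: Dict[str, Any]) -> None:
        """Create or overwrite the document stored under doc_id."""

    @abstractmethod
    def load(self, doc_id: str) -> Optional[Dict[str, Any]]:
        """Return the document, or None if it does not exist."""


class JsonFileStore(DocumentStore):
    """Assumed starting point: each document lives in <root>/<doc_id>.json."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, doc_id: str) -> Path:
        # Reject ids that could point outside the root directory.
        if not doc_id or "/" in doc_id or "\\" in doc_id or doc_id.startswith("."):
            raise ValueError(f"invalid document id: {doc_id!r}")
        return self.root / f"{doc_id}.json"

    def save(self, doc_id: str, doc: Dict[str, Any]) -> None:
        path = self._path(doc_id)
        # Write to a temporary file first, then rename it into place,
        # so a crash mid-write does not leave a half-written document.
        tmp = path.parent / (path.name + ".tmp")
        tmp.write_text(json.dumps(doc, indent=2), encoding="utf-8")
        tmp.replace(path)

    def load(self, doc_id: str) -> Optional[Dict[str, Any]]:
        path = self._path(doc_id)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Demo against a throwaway directory; the document contents are made up.
    with tempfile.TemporaryDirectory() as tmp_dir:
        store: DocumentStore = JsonFileStore(Path(tmp_dir))
        store.save("reminder-1", {"text": "water the plants", "done": False})
        print(store.load("reminder-1"))
        print(store.load("missing"))
```

The obvious wall with a layout like this is querying: finding documents by anything other than their id means reading every file, which is the kind of limit that would push the decision toward a real document store.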
