This looks very interesting. I guess that this would be the next-best thing afte...

newscracker · on Feb 24, 2016

> When a client performs a query, it asks the server to return buckets of the tree as it traverses the index remotely and incrementally, as in Fig. 2b. The client fetches and decrypts buckets in order to figure out which buckets in the next level of the tree to fetch next. Frequently accessed buckets can be cached client-side so that subsequent queries do not make unnecessary network calls.
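
To make the quoted description concrete, here is a minimal sketch of a client walking an encrypted tree index one bucket at a time, with a client-side cache. Everything in it is an assumption for illustration: the `Server` and `Client` classes, the JSON bucket layout, and especially `toy_crypt`, which is a SHA-256-based XOR stand-in and not a real cipher. It is not ZeroDB's actual code or API.

```python
import hashlib
import json
from bisect import bisect_right


def toy_crypt(key: bytes, bucket_id: int, data: bytes) -> bytes:
    """Toy XOR stream 'cipher' (NOT secure); a stand-in for real encryption.
    Applying it twice with the same key and bucket_id restores the input."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = key + bucket_id.to_bytes(8, "big") + counter.to_bytes(8, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class Server:
    """Hypothetical server: stores opaque blobs and never sees the key."""

    def __init__(self):
        self.blobs = {}
        self.requests = 0  # counts simulated network round trips

    def fetch(self, bucket_id: int) -> bytes:
        self.requests += 1
        return self.blobs[bucket_id]


def upload(server: Server, key: bytes, buckets: dict) -> None:
    """Encrypt each bucket on the client side before storing it."""
    for bucket_id, bucket in buckets.items():
        plaintext = json.dumps(bucket).encode()
        server.blobs[bucket_id] = toy_crypt(key, bucket_id, plaintext)


class Client:
    """Hypothetical client: fetches, decrypts, and caches buckets."""

    def __init__(self, server: Server, key: bytes, root_id: int):
        self.server, self.key, self.root_id = server, key, root_id
        self.cache = {}  # decrypted buckets, keyed by bucket id

    def _load(self, bucket_id: int) -> dict:
        if bucket_id not in self.cache:  # one round trip per cache miss
            blob = self.server.fetch(bucket_id)
            self.cache[bucket_id] = json.loads(toy_crypt(self.key, bucket_id, blob))
        return self.cache[bucket_id]

    def lookup(self, term: str):
        bucket = self._load(self.root_id)
        while not bucket["leaf"]:
            # Only the client can read the keys, so only it can pick the child.
            i = bisect_right(bucket["keys"], term)
            bucket = self._load(bucket["children"][i])
        return bucket["items"].get(term)


KEY = b"demo key that stays on the client"
server = Server()
upload(server, KEY, {
    0: {"leaf": False, "keys": ["m"], "children": [1, 2]},
    1: {"leaf": True, "items": {"apple": [1, 4], "kiwi": [2]}},
    2: {"leaf": True, "items": {"mango": [3], "pear": [5]}},
})
client = Client(server, KEY, root_id=0)

print(client.lookup("apple"), server.requests)  # [1, 4] 2  (root + one leaf)
print(client.lookup("kiwi"), server.requests)   # [2] 2     (both buckets cached)
print(client.lookup("mango"), server.requests)  # [3] 3     (one new leaf fetched)
```

The second lookup costs no extra requests because both buckets on its path are already cached, which is the effect the quote attributes to caching frequently accessed buckets.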

>Which would suggest that it might be a bit slow to do many round-trips for each query.

I was curious and found some information through a search (key point italicized) [1]: "On the performance aspect, with a real world use case of 1GB index, just 150KB of data must be transferred on average over three requests to fetch the results back. In full text search terms, 250MB of data can be queried in around 500msec which even though slow, may not be prohibitively slow for some use cases. Insert queries also may take around the same time. The number of requests needed to fetch the query results grows logarithmically with the data size."

[1]: http://www.infoq.com/news/2015/04/Encrypt-Database-CryptDB-Z...
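
As a rough illustration of the "grows logarithmically" claim in that quote: with a cold cache, a lookup needs about one request per tree level, and the number of levels is the logarithm of the entry count in the base of the fanout. The sketch below assumes a hypothetical fanout of 1000 entries per bucket; that number is made up for the example and is not taken from ZeroDB.

```python
def tree_depth(num_entries: int, fanout: int) -> int:
    """Smallest depth such that fanout**depth >= num_entries (at least 1)."""
    depth, capacity = 1, fanout
    while capacity < num_entries:
        capacity *= fanout
        depth += 1
    return depth


# Hypothetical fanout; worst-case (cold cache) requests equal the depth.
for n in (10**3, 10**6, 10**9):
    print(n, tree_depth(n, fanout=1000))
# 1000 1
# 1000000 2
# 1000000000 3
```

Under this assumption, a thousandfold increase in data adds only one more request per uncached lookup.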

>I'm guessing there could be some specific use-cases where this is not so relevant?

They claim the performance degradation is not much, but it would get worse as the number of clients grows. The clients would also need a reasonably decent amount of RAM and CPU power to handle this.

Related: CryptDB from MIT - http://css.csail.mit.edu/cryptdb/

michwill · on Feb 24, 2016

Yep, relevant pieces, thanks for bringing this up!

Our performance has become a little better since then. We'll also update our full-text search algorithm to use Lucene's practical scoring function, which makes it more scalable and will also help with performance.

As the number of clients grows, things actually become better for read queries (as long as we support MVCC). However, multiple writing clients can create congestion if they write data to the same place, which can indeed affect performance.

saganus · on Feb 24, 2016

Do you have a specific use case in mind for this?

I ask because apparently there would be a practical limit on the size of the DB.

So considering that you would probably need to restrict the DB use to something specific, what would that be? (at least while performance gets better/practical for very big datasets)

michwill · on Feb 24, 2016

I don't think it's so much of a size issue. I'd use it for applications where you don't have too many simultaneous users of the same dataset (in the future, that constraint would apply only to simultaneous writing users).

When users have their own private datasets, though, it's fine to scale up the number of users.

This limitation comes from a) invalidation requests and b) limited ability to do server-side conflict resolution.

saganus · on Feb 24, 2016

Nice find. Thanks!