Hacker News

This looks very interesting.

I guess this would be the next-best thing after FHE (fully homomorphic encryption)?

The catch is the performance, right? I mean, the paper says:

"In order to make the database end-to-end encrypted yet still capable of performing queries, the client encrypts the buckets (at the time of creation or modification). The server, which stores the buckets, never knows the encryption key used. The objects referenced by the leaf nodes of the B-Tree indexes are also encrypted client-side. As a result, the server doesn’t know how individual objects are organized within the B-Tree or whether they belong to an index at all. Since ZeroDB is encryption agnostic, probabilistic encryption can ensure that the server cannot even compare objects for equality. When a client performs a query, it asks the server to return buckets of the tree as it traverses the index remotely and incrementally, as in Fig. 2b. The client fetches and decrypts buckets in order to figure out which buckets in the next level of the tree to fetch next. Frequently accessed buckets can be cached client-side so that subsequent queries do not make unnecessary network calls."

That suggests it might be a bit slow, since every query needs several round-trips to the server.
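A minimal Python sketch of the traversal described in the quote, mainly to count round-trips. Everything here is hypothetical: the `Server` class, `build_tree`, `lookup`, and especially the toy SHA-256/XOR "cipher", which only stands in for real probabilistic encryption and is not secure. It is not ZeroDB's API. The server only ever stores and returns opaque blobs; the client decrypts each bucket to decide which child bucket to request next, optionally keeping decrypted buckets in a local cache.

```python
import bisect
import hashlib
import json
import os

# Toy cipher (hypothetical, NOT secure): XOR with a SHA-256 counter-mode
# keystream plus a random nonce, standing in for real probabilistic encryption.
def _keystream(seed: bytes, n: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key: bytes, obj) -> bytes:
    nonce = os.urandom(16)
    data = json.dumps(obj).encode()
    return nonce + bytes(a ^ b for a, b in zip(data, _keystream(key + nonce, len(data))))

def decrypt(key: bytes, blob: bytes):
    nonce, body = blob[:16], blob[16:]
    return json.loads(bytes(a ^ b for a, b in zip(body, _keystream(key + nonce, len(body)))))

class Server:
    """Stores opaque encrypted buckets by id; never sees the key."""
    def __init__(self):
        self.buckets = {}
        self.requests = 0  # number of fetch round-trips served

    def put(self, bucket_id: int, blob: bytes) -> None:
        self.buckets[bucket_id] = blob

    def get(self, bucket_id: int) -> bytes:
        self.requests += 1
        return self.buckets[bucket_id]

def build_tree(server: Server, key: bytes, items, fanout: int = 4) -> int:
    """Client-side bulk load of sorted (key, value) pairs into an encrypted
    B+-tree. Returns the id of the root bucket."""
    next_id = 0
    level = []  # (smallest key under this bucket, bucket id)
    for i in range(0, len(items), fanout):
        chunk = items[i:i + fanout]
        leaf = {"leaf": True, "keys": [k for k, _ in chunk], "values": [v for _, v in chunk]}
        server.put(next_id, encrypt(key, leaf))
        level.append((chunk[0][0], next_id))
        next_id += 1
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            # Separator keys: the smallest key of every child except the first.
            node = {"leaf": False, "keys": [k for k, _ in group[1:]],
                    "children": [b for _, b in group]}
            server.put(next_id, encrypt(key, node))
            parents.append((group[0][0], next_id))
            next_id += 1
        level = parents
    return level[0][1]

def lookup(server: Server, key: bytes, root_id: int, k, cache=None):
    """Walk from the root, fetching and decrypting one bucket per level."""
    bucket_id = root_id
    while True:
        if cache is not None and bucket_id in cache:
            node = cache[bucket_id]  # cache hit: no network call
        else:
            node = decrypt(key, server.get(bucket_id))  # one round-trip
            if cache is not None:
                cache[bucket_id] = node
        if node["leaf"]:
            i = bisect.bisect_left(node["keys"], k)
            if i < len(node["keys"]) and node["keys"][i] == k:
                return node["values"][i]
            return None
        # Only the client, after decrypting, knows which child to fetch next.
        bucket_id = node["children"][bisect.bisect_right(node["keys"], k)]
```

With 64 items and `fanout=4` the tree is three levels deep, so a cold `lookup` costs three requests; repeating the same lookup with a warm `cache` dict costs none.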

I'm guessing there could be some specific use cases where this matters less?

I'd love for someone more knowledgeable to comment on this, as it would be really neat to be able to start using a DB with these features.



> When a client performs a query, it asks the server to return buckets of the tree as it traverses the index remotely and incrementally, as in Fig. 2b. The client fetches and decrypts buckets in order to figure out which buckets in the next level of the tree to fetch next. Frequently accessed buckets can be cached client-side so that subsequent queries do not make unnecessary network calls.

>Which would suggest that it might be a bit slow to do many round-trips for each query.

I was curious and found some information through a search [1]: "On the performance aspect, with a real world use case of 1GB index, just 150KB of data must be transferred on average over three requests to fetch the results back. In full text search terms, 250MB of data can be queried in around 500msec which even though slow, may not be prohibitively slow for some use cases. Insert queries also may take around the same time. The number of requests needed to fetch the query results grows logarithmically with the data size."

[1]: http://www.infoq.com/news/2015/04/Encrypt-Database-CryptDB-Z...
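The logarithmic growth mentioned in the quote follows from the tree's depth: with a cold cache, a lookup needs one request per level. A small Python helper to see how slowly that grows; the fanout of 100 entries per bucket used below is an assumed figure for illustration, not ZeroDB's actual bucket size.

```python
def btree_levels(n_items: int, fanout: int) -> int:
    """Levels (= cold-cache round-trips per lookup) in a B-tree whose buckets
    hold `fanout` entries each. Integer arithmetic avoids float log rounding."""
    levels, capacity = 1, fanout
    while capacity < n_items:
        capacity *= fanout
        levels += 1
    return levels
```

With the assumed fanout of 100, a thousand items need 2 levels, a million need 3, and a billion need 5, so a thousandfold increase in data adds only a couple of round-trips (fewer once the top levels are cached client-side).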

>I'm guessing there could be some specific use-cases where this is not so relevant?

They claim the performance degradation is small, but it would get worse as the number of clients grows. The clients would also need a fair amount of RAM and CPU power to handle this.

Related: CryptDB from MIT - http://css.csail.mit.edu/cryptdb/


Yep, relevant pieces, thanks for bringing this up!

Our performance has gotten a little better since then. We'll also update our full-text search algorithm to use Lucene's practical scoring function, which makes it more scalable and should also help with performance.
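For reference, Lucene's practical scoring function (the classic TF-IDF formula documented for Lucene's `TFIDFSimilarity`) is score(q, d) = coord(q, d) · queryNorm(q) · Σ over t in q of tf(t in d) · idf(t)² · boost(t) · norm(t, d). Below is a simplified Python sketch, assuming every boost is 1 and documents are plain token lists; it only illustrates the formula and says nothing about how ZeroDB will combine it with its encrypted index.

```python
import math
from collections import Counter

def idf(term: str, docs: list[list[str]]) -> float:
    # Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1))
    doc_freq = sum(1 for d in docs if term in d)
    return 1.0 + math.log(len(docs) / (doc_freq + 1))

def practical_score(query: list[str], doc: list[str], docs: list[list[str]]) -> float:
    """Simplified Lucene practical score with every boost set to 1."""
    terms = list(dict.fromkeys(query))  # de-duplicate, keep order
    if not terms or not doc:
        return 0.0
    counts = Counter(doc)
    idfs = {t: idf(t, docs) for t in terms}
    # queryNorm = 1 / sqrt(sum of squared query-term weights)
    query_norm = 1.0 / math.sqrt(sum(w * w for w in idfs.values()))
    # With boosts = 1, norm(t, d) reduces to lengthNorm = 1 / sqrt(field length)
    length_norm = 1.0 / math.sqrt(len(doc))
    matched = [t for t in terms if counts[t] > 0]
    coord = len(matched) / len(terms)  # fraction of query terms the doc contains
    # tf = sqrt(term frequency); idf enters squared (once for query, once for doc)
    total = sum(math.sqrt(counts[t]) * idfs[t] ** 2 * length_norm for t in matched)
    return coord * query_norm * total
```

Documents matching more of the query terms, and rarer ones, score higher; `coord` is what penalizes partial matches.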

As the number of clients grows, things actually get better for read queries (since we support MVCC). However, multiple writing clients can create congestion if they write data to the same place, which does affect performance.
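A minimal sketch of why reads scale while writes to the same place contend, assuming optimistic multi-version concurrency control: readers see a fixed snapshot and never block, while a commit is rejected if another client already committed a newer version of any object it touches. `MVCCStore` and `ConflictError` are hypothetical names for illustration; this is not ZeroDB's code.

```python
class ConflictError(Exception):
    """Two clients committed changes to the same object concurrently."""

class MVCCStore:
    """Hypothetical optimistic multi-version store (not ZeroDB's code)."""

    def __init__(self):
        self._versions = {}  # oid -> [(commit_tid, value), ...] in commit order
        self.last_tid = 0

    def snapshot(self) -> int:
        """A transaction starts from the latest committed transaction id."""
        return self.last_tid

    def read(self, oid, snapshot_tid: int):
        """Newest value committed at or before snapshot_tid; readers never block."""
        for tid, value in reversed(self._versions.get(oid, [])):
            if tid <= snapshot_tid:
                return value
        raise KeyError(oid)

    def commit(self, snapshot_tid: int, writes: dict) -> int:
        """Apply {oid: value} atomically, or raise if any oid changed since the snapshot."""
        for oid in writes:
            history = self._versions.get(oid)
            if history and history[-1][0] > snapshot_tid:
                raise ConflictError(oid)
        tid = self.last_tid + 1
        for oid, value in writes.items():
            self._versions.setdefault(oid, []).append((tid, value))
        self.last_tid = tid
        return tid
```

If two clients take a snapshot and both write object `a`, only the first commit succeeds; the second raises `ConflictError` and has to retry. A reader holding an older snapshot keeps reading its own version, unaffected by either writer.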


Do you have a specific use case in mind for this?

I ask because it seems there would be a practical limit on the size of the DB.

So, given that you would probably need to restrict the DB to something specific, what would that be? (At least until performance becomes practical for very big datasets.)


I don't think it's so much a size issue. I'd use it for applications where there aren't too many simultaneous users of the same dataset (in the future, that constraint would apply only to users writing simultaneously).

When users have their own private datasets, though, it's fine to scale up the number of users.

This limitation comes from a) invalidation requests and b) the limited ability to do server-side conflict resolution.
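A rough sketch of point a), assuming clients keep cached buckets as the paper describes: every write has to notify each other connected client so it drops its stale copy, so the number of messages grows with writes × clients. Because the stored blobs are encrypted, the server also cannot look inside them to merge conflicting writes, which is point b). `InvalidatingServer` and `CachingClient` are hypothetical names; a real client would cache decrypted buckets, while plain bytes stand in for them here.

```python
class InvalidatingServer:
    """Hypothetical server that pushes invalidations to every connected client."""

    def __init__(self):
        self.blobs = {}          # bucket id -> opaque (encrypted) bytes
        self.clients = []
        self.messages_sent = 0   # invalidation messages pushed so far

    def connect(self, client) -> None:
        self.clients.append(client)

    def store(self, writer, bucket_id, blob: bytes) -> None:
        self.blobs[bucket_id] = blob
        # Any other client may hold this bucket in its cache, so each is told.
        for client in self.clients:
            if client is not writer:
                client.invalidate(bucket_id)
                self.messages_sent += 1

class CachingClient:
    def __init__(self, server: InvalidatingServer):
        self.server = server
        self.cache = {}
        server.connect(self)

    def fetch(self, bucket_id) -> bytes:
        if bucket_id not in self.cache:  # network call only on a cache miss
            self.cache[bucket_id] = self.server.blobs[bucket_id]
        return self.cache[bucket_id]

    def write(self, bucket_id, blob: bytes) -> None:
        self.cache[bucket_id] = blob
        self.server.store(self, bucket_id, blob)

    def invalidate(self, bucket_id) -> None:
        self.cache.pop(bucket_id, None)  # drop the stale copy, if any
```

With five clients sharing a dataset, each write to a shared bucket costs four invalidation messages. If each user instead has a private dataset, presumably only that user's own clients need notifying, which fits the point above about scaling the number of such users.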


Nice find. Thanks!



