Make Query Limit Results Configurable #56
Currently, _all_docs and view results are practically unlimited unless a user passes in a defined limit. This change allows an administrator/operator to set a limit which takes priority over the user-defined limit. If this property is not set, then the default value 16#10000000 is used. BugzId: 67462
|
+1 |
|
-1, the default must be the current behaviour. A partial result from _all_docs is not the same as a full one. What's the use case for forcibly clipping a response? |
|
and there's no JIRA ticket associated with this change either. |
|
@rnewson: I'm using the original value here: https://github.com/cloudant/couchdb-couch-mrview/blob/92575f448bbba34466e64ab595aa3ec8af3c96b6/include/couch_mrview.hrl#L76. Isn't that the current default behavior? |
|
hrm, very bleh. Ok, please make that a Imo the default behaviour for _all_docs and views is that, unless constrained by startkey/endkey, you could receive any number of rows. Since we stream them, there's no reason to put an artificial cap here, no matter how generous. |
|
updated with macros and created corresponding jira ticket |
COUCHDB-3130
|
Good points, @rnewson. Changing mine to -1. I think this needs wider discussion. |
|
If this is still planned to be accepted, then for the least surprise there should be:
|
include/couch_mrview.hrl
DEFAULT_MAX_QUERY_LIMIT, I'm sure a user could specify a larger value, right?
or just DEFAULT_LIMIT, this ain't Java. :D
|
hrm, this still isn't quite right. current behaviour, limit defaults to the new behaviour should preserve the same default What it does right now is always impose an upper limit which defaults to |
|
@rnewson: thx for pointing that out. I think my new change should adhere to the behavior we want:
|
|
|
    end, Args1, Props),
    Limit = Args2#mrargs.limit,
    case config:get_integer("couch_db", " default_query_limit", -1) of
this should be max_query_limit, right?
max_query_limit is confusing. Is it the max value I can put into the limit query parameter?
But the section name couch_db is strange. We don't have any such section.
I just mean it needs to match with the setting in the test, but yes, that's the point, right? An optional forced upper value of ?limit=X.
Good catch on the section name; it should be couch_mrview imo.
I see. My point was about the subtle difference between default_query_limit and max_query_limit. The first implies that you can specify any value to override the default, even a greater one. The second implies (at least for me) that I cannot exceed the value in the config file, so ?limit=100 with max_query_limit=10 will still return 10 records.
Hm..I should read the code probably, lol. The min(Limit, ConfigDefault1) is quite specific on the intentions. So everything is correct here, you're right.
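For readers following along, here is a minimal sketch of the clamping that the min(Limit, ConfigDefault1) call implies. It is not the code from this PR: the module name limit_sketch and the helper effective_limit/2 are hypothetical, and treating -1 as "not configured" is an assumption based on the -1 default passed to config:get_integer in the diff above.

```erlang
-module(limit_sketch).
-export([effective_limit/2]).

%% Hypothetical helper, not taken from the PR.
%% RequestLimit: the limit carried in the query args.
%% ConfigLimit:  the value read from the config file; -1 is assumed
%%               to mean "the operator did not set a limit".
effective_limit(RequestLimit, -1) ->
    %% No configured cap: the request limit is used unchanged.
    RequestLimit;
effective_limit(RequestLimit, ConfigLimit)
        when is_integer(ConfigLimit), ConfigLimit >= 0 ->
    %% A configured cap always wins when it is the smaller value.
    min(RequestLimit, ConfigLimit).
```

With this sketch, effective_limit(100, 10) returns 10 and effective_limit(5, 10) returns 5, which is the "maximum" reading of the setting rather than the "default" reading.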
|
still some more issues but we're getting closer. Note that this cannot merge until 2.0 is cut, we're in feature freeze. |
That's a good position as long as requesting indexes is not cheap. Say, there are some O(N) operations which can cause a DoS when a client requests a small part of a large data set. Otherwise, or if we can stream results chunk by chunk à la the changes feed without any harm to the server, there is no need for such limits, even configurable ones, as they only complicate things, while it's the client who will DoS itself by requesting more than it can process. I would like to look at this feature from that angle. Would the proposed limit save the server from some kind of resource drain, or is it about protecting the clients? Can we make all our indexes stream-like and encourage people to use them instead? Will that help everyone solve the problem of requesting unlimited data?
|
I still think the default behaviour of _all_docs, _changes and _view should be to return everything that the parameters dictate (that is, we should fix the default, the longstanding large-but-finite default is also a bug). If we all agree to change that, then it needs to be abundantly clear to the user that they did not get all results. One suggestion is, instead of changing the default, we introduce an optional maximum value for limit and then, if set, reject all requests without an explicit limit parameter. So that's a 400 Bad Request if limit is too high or if it's missing. |
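The suggestion above (an optional maximum, with a 400 when limit is missing or too high) could be sketched roughly as follows. Everything here is an assumption for illustration: the module limit_check_sketch, the function check_limit/2, the use of undefined for "not supplied" / "not configured", and the shape of the error tuples are all hypothetical and not part of CouchDB.

```erlang
-module(limit_check_sketch).
-export([check_limit/2]).

%% check_limit(RequestLimit, MaxLimit)
%% RequestLimit: undefined when the request has no ?limit=,
%%               otherwise a non-negative integer.
%% MaxLimit:     undefined when the operator has not set a maximum,
%%               otherwise an integer.
check_limit(undefined, undefined) ->
    %% No maximum configured: old behaviour, return everything.
    {ok, unlimited};
check_limit(Limit, undefined) when is_integer(Limit), Limit >= 0 ->
    {ok, Limit};
check_limit(undefined, Max) when is_integer(Max) ->
    %% A maximum is set, so an explicit limit becomes mandatory.
    {error, {bad_request, <<"limit parameter is required">>}};
check_limit(Limit, Max)
        when is_integer(Limit), is_integer(Max), Limit >= 0, Limit =< Max ->
    {ok, Limit};
check_limit(Limit, Max)
        when is_integer(Limit), is_integer(Max), Limit > Max ->
    {error, {bad_request, <<"limit exceeds configured maximum">>}}.
```

The point of this shape is that a client never silently receives a clipped result: it either gets what it asked for or an explicit error it has to react to.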
|
@kxepal: I'm inclined to think that this change helps to protect servers from clients. However, at the same time, we don't want to make it a configurable change that gives users different result sets without forcing them to respond to it. They might not notice a different result set. I like @rnewson's idea of requiring an explicit limit parameter and fixing the default value to actually become infinity. This way, the old behavior stands, but if something changes, the user is made aware. Now we need to consider how this "maximum" field integrates with mango. The original idea for mango was to make limits for both json and text indexes the same across the board (similar to rnewson's PR for mango). But this change introduces another configurable value. What happens when this "maximum" limit is set to 10, but the mango limit is still the default of 25? Do we simply ignore this "maximum" and use the mango limit of 25, or do we enforce the 10? Also, the maximum limit for text indexes is 200, and text indexes have bookmarks. So all these factors must be considered. Personally, I'm leaning towards making everything configurable, but as its own separate entity.
|
@tonysun83 On second thought, limits don't help protect servers, because it all depends on the size of the documents you can include in the response. So I think it's better to try streaming views / mango without any limits instead. You can stop at any time, remember the offset / seq, and start over again. And no one will consume more resources than the cost of a single view result unless someone does buffering. So these are my thoughts. I'm drifting a bit away from the initial idea of just setting a limit, toward a new feature that can solve the same problem in a somewhat better fashion. Wouldn't alternative solutions be better in the mid/long term? We have had unlimited view results throughout our history, and there seems to be no urgency to rush a decision here.