This repository was archived by the owner on May 25, 2021. It is now read-only.

Make Query Limit Results Configurable#56

Open
tonysun83 wants to merge 4 commits into apache:master from cloudant:67462-add-configurable-limit

Conversation

@tonysun83

Currently, _all_docs and view results are practically unlimited
unless a user passes in a defined limit. This change allows an
administrator/operator to set a limit which takes priority over the
user-defined limit. If this property is not set, then the default
value 16#10000000 is used.

BugzId: 67462
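
For reference, this is the shape the setting would take in local.ini. The fragment is hypothetical: the section and key names below are the ones used in the patch as posted, and both the section and the name are debated later in this thread:

```ini
; Hypothetical local.ini fragment; section/key names follow the patch
; under review. A non-positive or absent value means no cap is applied.
[couch_db]
default_query_limit = 5000
```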
@kxepal (Member) commented Sep 5, 2016

+1

@rnewson (Member) commented Sep 5, 2016

-1, the default must be the current behaviour. A partial result from _all_docs is not the same as a full one. What's the use case for forcibly clipping a response?

@rnewson (Member) commented Sep 5, 2016

and there's no JIRA ticket associated with this change either.

@tonysun83 (Author)

@rnewson: I'm using the original value here: https://github.com/cloudant/couchdb-couch-mrview/blob/92575f448bbba34466e64ab595aa3ec8af3c96b6/include/couch_mrview.hrl#L76. Isn't that the current default behavior?

@rnewson (Member) commented Sep 5, 2016

hrm, very bleh. Ok, please make that a -define and update in both places; we'll have to come back to this. A default finite limit for _all_docs seems like a contract-breaking detail to me (even if the actual value is ~268 million). (cc @kocolosk @davisp).

Imo the default behaviour for _all_docs and views is that, unless constrained by startkey/endkey, you could receive any number of rows. Since we stream them, there's no reason to put an artificial cap here, no matter how generous.

@tonysun83 (Author)

Updated with macros and created the corresponding JIRA ticket.

@kxepal (Member) commented Sep 5, 2016

Good points, @rnewson. Changing mine to -1. I think this needs a wider discussion.

@kxepal (Member) commented Sep 5, 2016

If this is still planned to be accepted, then for the least surprise there should be:

  1. A way to restore the good old behaviour.
  2. A warning in the logs that we now limit results by default, so admins won't lose time figuring out the problem by diving into Erlang.

Member (review comment):

DEFAULT_MAX_QUERY_LIMIT, I'm sure a user could specify a larger value, right?

Member (review comment):

or just DEFAULT_LIMIT, this ain't Java. :D

@rnewson (Member) commented Sep 5, 2016

hrm, this still isn't quite right.

current behaviour, limit defaults to 16#10000000 (268,435,456 in decimal), but a user could specify a higher value and it would work (@tonysun83, right?)

the new behaviour should preserve the same default 16#10000000 and allow a separate and optional upper limit.

What it does right now is always impose an upper limit which defaults to 16#10000000.

@tonysun83 (Author)

@rnewson: thx for pointing that out. I think my new change should adhere to the behavior we want:

  1. If there is no config value, or if it's a non-positive integer, then we just use whatever is provided by the user OR the original default. In this case, a user can give a limit higher than 16#10000000 and it will be accepted.

  2. If a positive config value is there, then it takes precedence over the user defined limit and we use that value instead.
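
The two cases above amount to a small clamping rule. Here is an illustrative sketch in Python (the actual patch is Erlang inside couch_mrview; all names here are hypothetical, and the cap is applied as a min(), matching the min(Limit, ConfigDefault1) call discussed later in the review):

```python
# Illustrative sketch only -- the real implementation is Erlang.
DEFAULT_LIMIT = 0x10000000  # 16#10000000 in Erlang, i.e. 268,435,456

def effective_limit(user_limit=None, config_limit=-1):
    """Resolve the limit actually applied to a query.

    user_limit:   value of ?limit=X, or None if absent
    config_limit: operator setting; non-positive means "not set"
    """
    requested = user_limit if user_limit is not None else DEFAULT_LIMIT
    if config_limit <= 0:
        # Case 1: no valid config -- honour the user's value,
        # even if it exceeds the default.
        return requested
    # Case 2: a positive config value caps whatever was requested.
    return min(requested, config_limit)
```

Note that using min() means a user limit below the cap still wins; only requests above the configured value get clamped.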


end, Args1, Props),
Limit = Args2#mrargs.limit,
case config:get_integer("couch_db", "default_query_limit", -1) of
Member (review comment):

this should be max_query_limit, right?

Member (review comment):

max_query_limit is confusing. Is it the max value I can put into the limit query parameter?

Member (review comment):

But the section name couch_db is strange. We don't have any such section.

Member (review comment):

I just mean it needs to match the setting in the test, but yes, that's the point, right? An optional forced upper value of ?limit=X.

Good catch on the section name; it should be couch_mrview imo.

Member (review comment):

I see. My point was about the thin difference between default_query_limit and max_query_limit. The first name suggests you can specify any value to override the default, even a greater one. The second implies (at least to me) that I cannot beat that value from the config file, so ?limit=100 with max_query_limit=10 will still return only 10 records.

Member (review comment):

Hm.. I should probably read the code, lol. The min(Limit, ConfigDefault1) is quite specific about the intention. So everything is correct here, you're right.

@rnewson (Member) commented Sep 6, 2016

still some more issues but we're getting closer. Note that this cannot merge until 2.0 is cut, we're in feature freeze.

@tonysun83 (Author)

@kxepal @rnewson @davisp: now that CouchDB 2.0 is released, I wanted to bring this back up for discussion. I'm open to suggestions for various configuration names, but I still think we should make all our index limits configurable, starting with views.

@kxepal (Member) commented Oct 3, 2016

@tonysun83

I still think we should make all our index limits configurable, starting with views.

That's a good position as long as requesting indexes is not cheap. Say, there are some O(N) operations which could cause a DoS just by requesting a bit of big data. Otherwise, or if we can stream results chunk by chunk à la the changes feed without any harm to the server, there is no need for such limitations, even configurable ones, as they only complicate things, while it's the client who causes the DoS on themselves by requesting more than they can process.

I would like to look at this feature from that angle. Would the proposed limit save the server from some kind of resource drain, or is it about protecting the clients? Can we make all our indexes stream-like and encourage people to use them instead? Would that help everyone solve the problem of requesting unlimited data?

@rnewson (Member) commented Oct 3, 2016

I still think the default behaviour of _all_docs, _changes and _view should be to return everything that the parameters dictate (that is, we should fix the default, the longstanding large-but-finite default is also a bug).

If we all agree to change that, then it needs to be abundantly clear to the user that they did not get all results. One suggestion is, instead of changing the default, we introduce an optional maximum value for limit and then, if set, reject all requests without an explicit limit parameter. So that's a 400 Bad Request if limit is too high or if it's missing.
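
The suggestion above amounts to request validation rather than silent clamping. A sketch in Python (nothing like this exists in the patch; all names are hypothetical):

```python
# Sketch of the proposed behaviour: if an operator configures a maximum,
# every request must carry an explicit limit, and it must not exceed the
# maximum -- otherwise the request fails with 400 Bad Request.
class BadRequest(Exception):
    """Maps to HTTP 400 Bad Request."""

def validate_limit(params, max_limit=None):
    """Return the limit to use, or raise BadRequest."""
    if max_limit is None:
        # No operator cap configured: current behaviour, limit optional.
        return params.get("limit")
    if "limit" not in params:
        raise BadRequest("limit parameter is required")
    if params["limit"] > max_limit:
        raise BadRequest("limit exceeds configured maximum")
    return params["limit"]
```

The point of raising instead of clamping is that the client can never silently receive a partial result set; it either asked for a permitted amount or got an error.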

@tonysun83 (Author) commented Oct 4, 2016

@kxepal: I'm inclined to think this change helps protect servers from clients. However, at the same time, we don't want a configurable change that silently gives users a different result set without forcing any response from them; they might not notice the result set is different. I like @rnewson's idea of forcing a required limit parameter on them and fixing the default value to actually become infinity. This way, the old behaviour stands, but if something changes, the user is made aware.

Now we need to consider how this "maximum" field integrates with mango. The original idea for mango was to make limits for both json and text indexes the same across the board (similar to rnewson's PR for mango). But this change introduces another configurable value. What happens when this "maximum" limit is set to 10, but the mango limit is still the default of 25? Do we simply ignore this "maximum" and use the mango limit of 25, or do we enforce the 10? Also, the maximum limit for text indexes is 200, and they have bookmarks. So all these factors must be considered. Personally, I'm leaning towards making everything configurable, but as its own separate entity.

@kxepal (Member) commented Oct 5, 2016

@tonysun83
If we force a limit query parameter, we should provide a clean and simple way to do pagination with these requests. Otherwise people will just send limit=100500 because they don't care and want to get their data without all these complications. And since pagination is not a simple thing, I fear the latter case will be very popular.

On the other hand, limits don't help protect servers, because it all depends on the size of the documents you include in the response. So limit=1000 may be fine for 1KiB docs, but not for 100MiB ones, while include_docs=true is so easy and helpful. Anyway, that's quite a strange protection to put in the user API, because it doesn't solve any user problem. Why should they care about server issues?

I think it's better instead to try to stream views / mango without any limits. You can stop at any time, remember the offset / seq, and start over again. And no one will consume more resources than a single view result costs, unless someone does buffering.

So these are my thoughts. I'm drifting a bit away from the initial idea of just setting some limit, towards a new feature that can solve the same problem in a better fashion. Wouldn't alternative solutions be better in the mid/long term? We have had unlimited view results throughout our history, and it seems there is no urgency to make a rushed decision here.
