[BREAKING] Extend functionality of Range (to cover factorial variants) and make subclass of Matrix#3568
[BREAKING] Extend functionality of Range (to cover factorial variants) and make subclass of Matrix#3568gwhitney wants to merge 26 commits intojosdejong:v16from
Conversation
|
The most "controversial" change in this PR is that I couldn't wrap my mind around the following: previously, if you took the exact same string, say On the other hand, I left |
|
Also I just noticed that the extensions to Range in this PR include all of the functionality of |
b602743 to
255c778
Compare
b34d256 to
95cd4c2
Compare
|
Thanks for working out this POC Glen!
You're hitting the confusing differences between the JS API and the expression parser API, see docs. In the expression parser, the matrix indexes are one-based and the upper-bound of ranges is included. These differences (requiring the transform functions) are definitely confusing. We're trying to serve two different audiences which is problematic: mathematicians used to one based indexes and included range end versus programmers which are used to zero-based indexes and excluded range ends. Some first thoughts on this POC:
|
Yes, absolutely, I understand that the two are distinct with distinct rules of interpretation, and that is exactly why I felt the behavior of Range.parse() needed to change, because it is called So another possible route is to deprecate Range.parse() (but it would still execute with its old behavior, just issuing a console.warn() when called) and introduce a synonymous Range.fromString() or some other name you may like. If that would not be considered a breaking change, it would make this whole PR non-breaking to the best of my knowledge.
Yup, but actually other than this example I think the distinction is handled clearly and cleanly.
Thanks!
Actually benchmarking factorial(number) won't be informative: factorial overflows number at such a low value for the argument that the time it takes is essentially always trivial for any useful cases. In fact, if you look at stdlib.js, the implementation of factorial is solely a table lookup, since there are so few valid values. My personal thoughts so far are that the fact this PR gets
Not sure about the weights, but here's my list of evaluation points for this PR (leaving aside whether it is a breaking change, since hopefully that can be adjusted either way you like): Cons:
Pros:
Difference (that I don't see as either pro or con):
See above; I think this PR is only breaking because of the one change to Range.parse(), so if it is important it be non-breaking, hopefully we can finesse that. Otherwise, the new Range is intended to simply be a conservative extension of the behavior/capabilities of the old Range, so I don't see enough motivation to deprecate and replace. I mean, there are cases in which Range threw before because it couldn't handle Fractions, say, but I don't think extensions to capabilities are generally considered breaking. Oh, hmmm, I guess maybe there are also some cases in which Range formerly did some automatic conversion internally, like from bigint to number, that it doesn't do anymore. I handled any such cases in the indexing code, coercing the result of fetching an entry of a Range into number when necessary, rather than making all ranges be a range of numbers. That's on a fine line to me of whether it is a breaking change or not. I guess there might be some cases that did not arise in the unit tests in which before you could construct a Range with bigints (or maybe even BigNumbers) but then its entries would come out as regular JavaScript numbers -- now the entries are the same type as the Range was constructed with, for sure. Not sure if this change is significant, and in fact I am not 100% certain it did change; as I said, I don't see anywhere it affected the unit tests. Looking forward to your call on this. |
Ahh, yes I understand. Ok this makes sense. How about we just deprecate I think though that it will be best to mark this PR as a breaking change anyway, since function
Ok that makes sense.
That sounds good to me. It's very flexible to use Thanks for writing out the pros/cons. You've convinced me that the pros outweigh the cons, so I it would be awesome if you can work out this WIP further! I still have to get used a bit to the new property names |
OK, I will modify this PR to leave the behavior of Range.parse alone but mark it as deprecated (meaning in part that the first call to it will issue a console.warn of the deprecation, correct?) Do we need a "deprecation schedule" anywhere that shows all currently deprecated features, when they were deprecated, and when they are slated for discontinuation?
OK, as you like. I will change this to be a merge into the v16 branch (which is no real change at the moment, since I just created that branch), and add it to the tracking list.
OK, I will adjust the documentation of factorial accordingly, and consider this PR to resolve the need for variant factorials in #3583.
Another case of "programmers vs. mathematicians" -- this sort of thing comes up for mathematicians mostly in stuff like So how about a series of synonyms? The code already supports reading the first value of a
In addition, I would propose to make Ranges immutable in that all of the above properties would be read-only once constructed. How does that sound? In the meantime til you have a chance to weigh in, as I have time available, I will proceed with the the items listed above, together with the necessary tests and documentation to be ready for merging -- but if you want to adjust any of the above, please just let me know and I will amend what I've done to conform. Thanks! |
95cd4c2 to
7862b56
Compare
|
A related question: Right now, the new attribute-based specification of a Range (e.g. ( |
|
Thanks Glen! About the deprecation of Thanks for creating a I prefer having one set of property names and not multiple ways to do the same thing (which can give confusion). If you prefer Good idea to make |
Oops, I already made a deprecation status doc page, you will see it when I push the next commit and let me know what you think.
OK, we shall have just one set of properties.
No, I only changed to be sure I had gotten all of the instances. Now that I have, it's safe to switch to whatever combination is preferred. I think you like the "CS-ish" set
If we are being more conservative with property names, I think there is no need for this. No one will spontaneously start expecting |
|
There was an idea (from the post-closure discussion in #3581) that we could perhaps broaden the applicability of ranges with empty end to the argument of However, looking at it a bit more, I see that it's not really just the missing last index that's supported in e.g. How could we support that notation for e.g., A) Make both B) Parse There is also a more lightweight compromise option that doesn't support C) Outside of square-bracket indexing, Of course, there is also (D) Do nothing about the inconsistency betwee Option (D) is what you originally suggested in #3581. I am game to do any one of these options in this PR, just let me know. I am not going to do any of them until I hear back. If you choose (D), I will move (A)-(C) to a new design ideas Discussion so that maybe one, or something else one of them inspires, could be implemented in the future to remove the inconsistency. Basically, the problem is that we have two very different syntaxes for exactly the same semantics: E) Simplify Node by eliminating IndexNode and AccessorNode altogether, using a FunctionNode with function To be frank, I personally like (E) the best -- reducing the number of Node subtypes is a big win, in my book (fewer switch options to go through when traversing ASTs, etc. etc.) Anyhow, as I said, let me know how you'd like to proceed and I won't do anything until I hear. |
|
Going with Your idea (E) is interesting, I had never thought about such an approach. You're right that making the code base simpler and more unified is always good. In this case I'm in doubt though whether it would be helpful or confusing for users. It is technically possible to make I think this is a choice between (1) keeping this notation a special syntax handled by the expression parser or (2) having a special context introduced by certain functions like All in all I don't see a practical need for it. I think I prefer to keep it as it is right now: only support this syntax in the expression parser inside the index brackets |
|
Hmm, another quandary: There is a unit test in range.test.js (line 79) that says that the Array value of So, for this new systematization of Range, the previous unit test of The problem is that it's not necessarily clear which should change. addScalar.js defines only that number plus number is number, and bigint plus bigint is bigint. So the outcome of number plus bigint must be derived from the typed-function automatic conversion rules. So I conclude that the conversion bigint -> number is preferred over number -> bigint. And that choice seems fairly arbitrary: many bigints lose precision when converted to number, and many numbers lose precision when converted to bigint. So one could make the case that there should be no automatic conversion, but adopting that principle would break a bunch of things that are expecting that you can operate on mixed numbers and bigints. So I guess looking at the sort of small values that humans "typically" use, you're "more likely" to lose precision converting number -> bigint than bigint -> number. Hence the current preference. So frankly I don't think v16 should be second-guessing the v14 choice to prefer bigint -> number conversion. So my recommendation would be to change the defined "correct" behavior for |
|
OK, just push a progress update, but note I need to revert to just the {start, step, length, last, end} set of attributes, and stop supporting the {from, by, ...} synonyms. That will happen in the next commit, as soon as I can get to it. |
That said, option (D) is certainly the path of least resistance and since this PR is pretty massive already, I will go ahead with the leave-it-be strategy and just open a discussion. |
|
The broken browser test has uncovered subtle bugs in typed-function. So PR 171 there is now a blocker for this PR, and if/when that change to typed-function is adopted (perhaps with changes per review), this PR will have to include updating to the new version of typed-function. Thanks. |
Yes, totally agree! Good point. That will make the behavior consistent with other functions like |
|
OK, upgrading to typescript 4.2.2, the browser tests now pass. However, the node tests now fail with the tree-shaking bundle size at 135123 bytes, or 123 bytes over the limit in the test. I am very unsure how to address this. @josdejong are you able to provide any guidance on how to get the bundle size back down? The test file has this advice: I would be very happy to apply this advice, and I apologize if I am being dense, but I am at a loss as to how to figure out which code needs to be moved into a separate file and/or what variables might need the PURE annotation. Thanks for any further help you can provide. |
|
If the increase in bundle size is in line with the amount added code, then there is nothing wrong and we just have to increment this limit a bit. Only if the size "suddenly" jumps to a much larger value then something with tree shaking may be broken. |
|
I think you are saying that in this case it is ok just to change the limit on the test to say 137000 bytes and move on. I will do that when I get a chance. Assuming there were no other problems, I believe all that remains to mark this ready for review is to add (lots of) new tests for the new functionality and to remove the from, for, by,... synonyms. |
|
Yes indeed, you can increase the limit a bit. Thanks, this sounds like everything is falling into place. |
|
OK, eliminated from, for, by, to, til... As far as I know, all that is needed is a bunch of tests for all of the new features. I will get those in as soon as I can. |
|
This is an impressive piece of work Glen, thanks!
That is huge 😎 I haven't gone thought the PR in full detail, but here some first thoughts/questions:
|
|
With this commit, Range is now a full-fledged implementation of Matrix,
representing an arbitrary arithmetic sequence (of any type supported
by mathjs) in a "lazy" way, i.e. its entries are generated on demand.
(Thus, even infinite Ranges are supported.)
Note that Range therefore has many new dependencies, and the `matrix()`
convertor/constructor function must be able to create Ranges. Therefore,
to avoid a circular dependency on `matrix`, many mathjs methods have
their dependence on `matrix()` removed. Often this dependence was
very superficial, and removing it was essentially trivial; in all other
cases, the dependence could be shifted to DenseMatrix rather than matrix,
removing the possibility of circularity.
In addition, as it seemed very confusing (almost to the point of a bug)
that `math.parse('2:5')` produced a representation of 2, 3, 4, 5 while
`Range.parse('2:5')` produced a representation of 2, 3, 4. Therefore, this
commit changes the interpretation so that both `parse` command include the
endpoint.
Finally, this commit fixes the remaining doc examples tests in the
src/type directory.
…et()
* A single number is now interpreted as the entire "layer" at that position;
adds `layer` method to Matrix for efficient support of this convention.
* An array is just passed to index
* A string form of a Range is now allowed in the index; the lower and upper
limits may be elided, allowing `':'` as a wildcard for an entire dimension.
* Also removes unnecessary dependence of set functions on Index.
* Now passes all doc example testing for `subset`
The prod function now first reduces its argument to a 1d vector, then uses binary splitting without allocating any new matrices or arrays. Also implements the previously unimplemented second "dimension" argument to prod (albeit not as efficiently). As these changes added new dependencies to `prod`, some other minor refactors were necessary to avoid circular dependencies and/or keep unwanted dependencies from the number bundle. Also fixes a couple more instances of doc example tests.
* Update documentation as thoroughly as practicable.
* Attempt to make sure that string Ranges will work properly in
JavaScript API and mathjs expression language
* Support 'range' storage type in math.zeros() and math.ones()
* Resolve circular dependency caused by the preceding item
* Define Range-scalar arithmetic appropriately for subtract, multiply, and
divide in addition to add
* make math.range() just forward to new Range() as much as possible
* unify all of the places in which string notations for range are parsed,
to provide for consistent behavior
* update TypeScript types
* Add two synonyms for each Range attribute (start ~= from, for example)
But NOTE this is to be reverted in the next commit, per review feedback
128411e to
6a814c5
Compare
Achieving this and getting all tests to pass required a series of small
bugfixes with associated added tests, to wit:
- Ensure that zero(unit) has the same units and zero value as the input unit
- Shortcut zero(range) to directly produce a range of all zeros of the
proper length, without having to write out all those zeros as an array.
- Make deepEqual on units revert to equal (since they are scalars in this
sense), rather than calling their `.valueOf()` method that converts them
to string, producing misleading results.
- Derive the default step from the start or last or end of a sequence
so that it will add one to all entries of a collection value, or be
one of the corresponding unit for a unit value.
- Allow lengths to be induced between two collections with a scalar step
when the difference of the two collections has all identical entries.
- Check whether a step value is zero not via isZero (which operates
elementwise when the step is a collection, so is always truthy) but by
checking that every entry is zero.
- Check whether a Range is multidimensional based on the start, rather than
the step.
- Correct a typo in inferring a Range from its array value.
- Fixing Range.toString(). Note that some Ranges cannot be expressed in
`start:step:end` form, so these are now converted to strings like
`Range{start: s, step: t, length: l}` which can encode any Range. However,
Ranges that have a valid `start:step:end` form are still converted to
that sort of string.
|
OK:
Looking forward to further review. I would actually suggest you review and merge #3606 first into develop, as then I will rebase this and add per-function HISTORY for all of the many functions touched by this PR, which will get us to a pretty reasonably high coverage of functions with their own HISTORY. |
This is a work-in-progress, proof-of-concept PR that allows one to calculate e.g. the 10th rising factorial of bignumber 8.5 by
math.prod(math.range(math.bignumber(8.5), math.bignumber(19.5)). The major highlights include:Making Range an implementation of Matrix. (This change breaks important ground to pave the way for alternate Matrix implementations besides DenseMatrix and SparseMatrix, given that there have been discussions of flat matrices and ones based on JavaScript's type arrays.)
Allowing Range to encode arbitrary arithmetic progressions of arbitrary mathjs entities.
Complete reimplementation of prod.
Several indexing extensions, and addition of new scalarDivide(V, W), one(x), and zero(x) methods.
In addition, there is a significant reduction in the number of failing doc example tests.
This PR is being submitted for feedback on the approach and scope. It needs at least extensive addition of tests and documentation before being merge-ready.