Skip to content

Assignment 5 - code review#8

Open
karineek wants to merge 24 commits intoassignment5_basefrom
master
Open

Assignment 5 - code review#8
karineek wants to merge 24 commits intoassignment5_basefrom
master

Conversation

@karineek
Copy link
Owner

@karineek karineek commented Jul 9, 2017

No description provided.

{
/* Get the current order by prefix size, starting from 0 */
dict = getBasicAlphaBeit(alienWords, 0); /* Basic Alphabeit */
/* A,R,C from the example! */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline the declaration of dict and return null outside the if statement. Alternatively, have an if statement at the top if(alienWords == null || alienWords.length == 0) return null; to get rid of one layer of curly brackets

Copy link
Owner Author

@karineek karineek Jul 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll split it to two methods, to avoid this.

* Note: Assume the input is valid here!
*/
private List<Character> getBasicAlphaBeit(String[] words, int prefixSize)
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name of the parameter is misleading I would use index

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For which of these would you use index? are the words are the index or the current prefix size?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefixSize as it sort of implies that you are getting the alphabet for prefixes rather than position, does it make sense?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a prefix, since you compare all the prefix up to some size...

{
return 0;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a private method so you control what gets passed - I would be tempted to get rid of the parameter check

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, np

}

tempDict.add(0,alienWords[i]);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if condition is used to determine if the loop should terminate and hence should be added to within the for loop signature i.e. for (int i=curr+1; i < alienWords.length && alienWords[i].startsWith(prefix); i++)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer cleaner loops, is this part of the code standards of google?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh no, just a personal vendetta against break that's it :) so happy if you just not agree :)

}

return ret;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than the for loop this can be return tempDict.toArray(new String[tempDict.size()]);

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, np

* @author Karine
*/
public class kemUnknownLanguage
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class names should start with a capital letter and the code should adhere to some styling guideline e.g. curly brackets should open on the same line in java

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, np

}
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to find some counter-examples. For example
["ART", "RAT", "CTT", "CTA", "CRA"]

In this example, the first iteration gives a dict = [A,R,C], then when you go into the second letter T needs to be before R (CTA > CRA) so dict = [A, T, R, C] but in the third iteration T needs to be before A (CTT > CTA)

Also, I seem to be getting an IndexOutOfBoundsException when words are followed by shorter words

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give the example of indexOutOfBound? tests 3,4,5 are with different lengths (longer after shorter and shorter after longer), so I am a bit confused...

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No sure we shall continue for the third iteration...
Also what happen in this case:
["ART", "RAT", "CTTA", "CTTT", "CTA", "CRA"]
I think the "legal input" was a bit of an open question here...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh the example for the IndexOutOfBounds might be open ended as well"AC", "DR", "D", "TAR", "TAC", "TAD"

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Copy link
Collaborator

@simon-frankau simon-frankau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I don't understand the algorithm you're writing, and would like to see it rewritten/re-commented so that it's clearer, please.

}

return null;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more usual structure is to have early-out error conditions in the if statement, rather than the main logic inside the block with a 'return null' outside. This allows people to read the main body having got the edge-case-checks out of the way.

I generally try to structure my code to work well in the presence of corner cases, without requiring special handling. So, if you can write your code to still work without the "alienWords.length > 0" check, that's usually better style.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was trying to fix it several time. I will revert it to the original code.
The other reviewer also comment on this section, why don't you comment together on the same place? else it is very hard (for me) to get an agreement between the two reviewers...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You made changes to the code after the other reviewer reviewed. When I review the latest version of the code, I don't see the comments that were attached to previous versions.

In this specific case, I agree with Aimee's comment - her suggestion to use "if(alienWords == null || alienWords.length == 0) return null;" is exactly what I'm suggesting, too.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm confused. Which version is better?

if(alienWords == null || alienWords.length == 0) return null;
=== VS. ===
List alphabet = null; if ((alienWords != null) && (alienWords.length > 0)) { .... } return alphabet;

Or do you have another idea?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the two, the first one is better. Both Aimee and I prefer this.

If you can write your code so that it reads "if (alienWords == null) return null;" and the case alienWords.length == 0 is naturally handled by the code body, that's even better.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll change it to that


private List<Character> extractAlphabeitFromWords(String[] alienWords) {
/* Get the current order by prefix size, starting from 0 */
List<Character> dict = getBasicAlphaBeit(alienWords, 0); /* Basic Alphabeit */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"dict" is a somewhat confusing name - it is neither a dictionary in the lexicographical sense, not in the computer science sense. "alphabet" might be better.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, np

}

return dict;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I just don't understand the algorithm you're attempting here, so I'm stopping the review at this point.

Reading code shouldn't be hard work, even if it's harder to write to make it easy to read.

I have a strong suspicion that the code will have cases that don't work - what I can make out doesn't suggest an algorithm that I think will work in all corner cases (having done a couple of code reviews with nasty corner case failures for other mentees already).

What I suggest is to extend the commenting on the code to express what you're expecting at each stage of the algorithm, any invariants you might be wanting to maintain, etc. If you can break your algorithm into a clear pipeline of steps, that will also help. Take a look at the code written by the others to get ideas on how to cleanly express your intent.

Copy link
Owner Author

@karineek karineek Jul 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do really try to learn what is the right level of comments. Before, you've said that "the code shall document itself", now, that there shall be comments describing the code.

It is for sure, will not help me to learn.

Regarding the bugs, I would expect some suggestion of cases (and not to base your answer on feelings). However, what I can do, is to add all other team members cases to see if it works well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding comments, it is important to comment the things that are not obvious, and not comment the things that are obvious. Putting a comment on a constructor saying it's a constructor is not helpful. Having a tricky method not make clear what its effect is supposed to be is also unhelpful. It's much easier to see if a method does what the method comment says it's supposed to do than to try to work out what it does.

I would say a good rule of thumb for method comments is that there should be a clear method comment for any method that it's not obvious what it's doing from a quick glance. I think you are putting those comments in, but I'm not finding them helpful.

I'll add some code review comments to your method comments to try to make it clearer what I'm expecting.

As for bugs, I've found bugs in everyone else's implementation of this assignment. It is tricky. Perhaps we should concentrate on making the code understandable first, and return to looking for bugs later?

Copy link
Owner Author

@karineek karineek Jul 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wish me to learn something, you have to be more specific.
I changed the code, and did try to add comments, if there is still a mysterious method for you, please point it out...

Obvious or not obvious things is a very personal matter. Do you more like comments at the top of the method? on each line? before a condition? I do really try to generalise a rule.

@karineek
Copy link
Owner Author

I committed the recent changes

Copy link
Collaborator

@simon-frankau simon-frankau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to be more concrete about the method comments, and also dig a little bit into why I suspect that there are corner cases that don't work, and why the algorithm is unclear to me.

}

return null;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the two, the first one is better. Both Aimee and I prefer this.

If you can write your code so that it reads "if (alienWords == null) return null;" and the case alienWords.length == 0 is naturally handled by the code body, that's even better.

/*
* Input: a dictionary (a list of words in lexicographic order) of all words in an unknown/invented language,
* Output: alphabet (an ordered list of characters) of that language, letters we cannot decide are appended in the end.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you can express this purely in terms of "input" and "output". I think it's better to say what it's doing and then express the input/output in terms of that. e.g.

Infer an alphabet (ordered list of characters) from a dictionary (list of words in lexicographic order).
Input: ...
Output: ...

Note that "letters we cannot decide are appended in the end" is confusing - it sounds like all ambiguous characters get put at the end, but ambiguity can occur anywhere in the alphabet. D, XA, XC, XD, YA, YB, YD would mean that we know that B and C are between A and D (i.e. can't be appended at the end), but we don't know the relative ordering of B and C.

*
* Note: Assume the input is valid here!
*/
private List<Character> getBasicAlphaBeit(String[] words, int prefixSize) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the clearer methods, but it's still a bit messy.

Perhaps call it "inferAlphabetFromCommonPrefix"?
/* Given an array of words that all match up to a common prefix length, infer an alphabetical ordering of the characters based on the order of the first differing character. No check is made that the common prefix really is the same across the list. */

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sure np.

So what is missing here, is using the same terms that the text in the assignment used?

}
}

private boolean addToDictByPrefix(String[] alienWords, List<Character> dict) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is the core of the algorithm, but it's not commented. What's the return value mean? How does it decide when it's done? Even noting that it's modifying dict in-place helps people understand what's going on.

}

/*
* Gives the prefix of a char that isn't in the dictionary
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be clearer to either "Give the index of the first character not in the dictionary", or "Gives the length of the longest prefix that consists of dictionary characters".

/*
* Input: the original set of words, current word we are working on, its prefix size
* Output: the set of the words with this prefix
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention of this method is actually pretty clear to me, although I'd still prefer to have it explained in words, rather than via "Input: ... Output: ...".

"Output: the set of the words with this prefix" is mildly misleading - "set" implies a lack of ordering, while in this context order is very important. "Ordered list of words with this prefix" is less ambiguous.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It helps me to remember the technical part of it. But I added the whole description as what is the algorithm or process. Thanks!

private boolean addToDictByPrefix(String[] alienWords, List<Character> dict) {
boolean hasReachMaxPrefix = true;
for (int i=0; i < alienWords.length; i++) {
int prefixSize = getSizeOfPrefix(alienWords[i], dict);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you build a prefix up to the first unknown letter? Even if you've seen a letter before, its next usage may still give you more information.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changed quite a lot, I tried as possible to make it more clear and more general (you had some comment earlier regarding this). The idea here was to only pass on letters which we don't know their location.

It wasn't such a good idea, since anyhow we need to go over all the combination and then we need an outer loop, in addition it can be buggy in case we need to move the letter to be before, if you already push it to the end (if I remember right). It is a bit too complicated. I agree :-)

In the new version I removed it completely and remove the outer while loop...

* Get two dictionary with common latters and merge them into the first dict,
* if none, cannot know, add all these cases later on
*/
private void mergeDict(List<Character> dict1, List<Character> dict2) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are potential alphabets, not dictionaries. I guess the comment could be:

/* Given two sub-alphabets, generates an (arbitrary) merged alphabet which is consistent with both of them. */

I think this method is the core of my problem with the algorithm. Given two sub-alphabets (e.g. ABD, ACD) there may be multiple potential alphabets that satisfy both (e.g. ABCD, ACBD). This algorithm will choose one, and then a later pass may reveal whether the ordering is actually BC or CB.

In other words, this method takes two orderings and produces a potential ordering, but it might be the wrong one based on information revealed later.

Your code may do subtle things to make this work, but I see no comments or structures in the code to suggest you've addressed this issue.

return null;
}

private List<Character> extractAlphabeitFromWords(String[] alienWords) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the lack of comment here is pretty reasonable, given it's called just once from a nearby wrapper method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants