Assignment 5 - code review by karineek · Pull Request #8 · karineek/KEM_CodeU_Assignments

karineek · 2017-07-09T09:58:50Z

No description provided.

AimeeBorda · 2017-07-09T11:57:34Z

CodeU_Assignement_5/src/codeu_assignement_5/kemUnknownLanguage.java

+        {
+            /* Get the current order by prefix size, starting from 0 */
+            dict = getBasicAlphaBeit(alienWords, 0); /* Basic Alphabeit */ 
+            /* A,R,C from the example! */


Inline the declaration of dict and return null outside the if statement. Alternatively, put an if statement at the top, if(alienWords == null || alienWords.length == 0) return null;, to get rid of one layer of curly brackets.

Yes, I'll split it to two methods, to avoid this.

AimeeBorda · 2017-07-09T11:58:13Z

CodeU_Assignement_5/src/codeu_assignement_5/kemUnknownLanguage.java

+     * Note: Assume the input is valid here!
+    */
+    private List<Character> getBasicAlphaBeit(String[] words, int prefixSize)
+    {


The name of the parameter is misleading; I would use index.

For which of these would you use index? Are the words the index, or the current prefix size?

prefixSize, as it sort of implies that you are getting the alphabet for prefixes rather than for a position. Does that make sense?

Yes, it is a prefix, since you compare all the prefixes up to some size...

AimeeBorda · 2017-07-09T11:58:25Z

CodeU_Assignement_5/src/codeu_assignement_5/kemUnknownLanguage.java

+        {
+            return 0;
+        }
+


This is a private method so you control what gets passed - I would be tempted to get rid of the parameter check

AimeeBorda · 2017-07-09T11:58:39Z

CodeU_Assignement_5/src/codeu_assignement_5/kemUnknownLanguage.java

+            }
+
+            tempDict.add(0,alienWords[i]); 
+        }


The if condition is used to determine whether the loop should terminate, and hence should be moved into the for loop header, i.e. for (int i=curr+1; i < alienWords.length && alienWords[i].startsWith(prefix); i++)

I prefer cleaner loops. Is this part of Google's coding standards?

Oh no, just a personal vendetta against break, that's it :) So I'm happy if you just don't agree :)
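
For comparison, the two loop shapes discussed in this thread can be written side by side. This is only a sketch: the class and method names (PrefixLoop, collectWithBreak, collectWithCondition) are hypothetical, and the reviewed code's tempDict.add(0, ...) insertion order is not reproduced here; matching words are simply appended.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixLoop {
    // Variant with an explicit break, similar in shape to the reviewed code.
    static List<String> collectWithBreak(String[] alienWords, int curr, String prefix) {
        List<String> result = new ArrayList<>();
        for (int i = curr + 1; i < alienWords.length; i++) {
            if (!alienWords[i].startsWith(prefix)) {
                break; // first word without the prefix ends the run
            }
            result.add(alienWords[i]);
        }
        return result;
    }

    // Variant with the termination test folded into the loop header.
    static List<String> collectWithCondition(String[] alienWords, int curr, String prefix) {
        List<String> result = new ArrayList<>();
        for (int i = curr + 1;
             i < alienWords.length && alienWords[i].startsWith(prefix);
             i++) {
            result.add(alienWords[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input chosen only to exercise both variants.
        String[] words = {"ART", "CTT", "CTA", "CRA", "RAT"};
        System.out.println(collectWithBreak(words, 1, "CT"));
        System.out.println(collectWithCondition(words, 1, "CT"));
    }
}
```

Both variants visit the same elements and stop at the same word, so the choice between them is a matter of style, as the thread concludes.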

AimeeBorda · 2017-07-09T11:58:52Z

CodeU_Assignement_5/src/codeu_assignement_5/kemUnknownLanguage.java

+        }
+
+        return ret;
+    }


Rather than the for loop this can be return tempDict.toArray(new String[tempDict.size()]);
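
A small sketch of the suggestion above. The copy loop shown is an assumption about what the reviewed code looked like (the diff hunk only shows its tail); the class name ToArrayDemo and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ToArrayDemo {
    // Assumed shape of the original: copy the list into an array element by element.
    static String[] copyWithLoop(List<String> tempDict) {
        String[] ret = new String[tempDict.size()];
        for (int i = 0; i < tempDict.size(); i++) {
            ret[i] = tempDict.get(i);
        }
        return ret;
    }

    // The reviewer's one-line replacement using List.toArray(T[]).
    static String[] copyWithToArray(List<String> tempDict) {
        return tempDict.toArray(new String[tempDict.size()]);
    }

    public static void main(String[] args) {
        List<String> tempDict = new ArrayList<>();
        tempDict.add("CTT");
        tempDict.add("CTA");
        System.out.println(String.join(",", copyWithLoop(tempDict)));
        System.out.println(String.join(",", copyWithToArray(tempDict)));
    }
}
```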

AimeeBorda · 2017-07-09T12:01:10Z

CodeU_Assignement_5/src/codeu_assignement_5/kemUnknownLanguage.java

+ * @author Karine
+ */
+public class kemUnknownLanguage 
+{


Class names should start with a capital letter, and the code should adhere to a style guideline, e.g. curly brackets should open on the same line in Java.

AimeeBorda · 2017-07-09T12:18:44Z

CodeU_Assignement_5/src/codeu_assignement_5/kemUnknownLanguage.java

+                }
+            }
+        }
+


I managed to find some counter-examples. For example:
["ART", "RAT", "CTT", "CTA", "CRA"]

In this example, the first iteration gives dict = [A, R, C]. Then, when you go to the second letter, T needs to be before R (CTA precedes CRA), so dict = [A, T, R, C]. But in the third iteration T needs to be before A (CTT precedes CTA).

Also, I seem to be getting an IndexOutOfBoundsException when words are followed by shorter words.

Can you give the example for the IndexOutOfBoundsException? Tests 3, 4 and 5 use different lengths (longer after shorter and shorter after longer), so I am a bit confused...

Not sure we should continue to the third iteration...
Also, what happens in this case:
["ART", "RAT", "CTTA", "CTTT", "CTA", "CRA"]
I think the "legal input" was a bit of an open question here...

tbh the example for the IndexOutOfBoundsException might be open-ended as well: "AC", "DR", "D", "TAR", "TAC", "TAD"
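
One way to avoid indexing past the end of a shorter word is to bound every comparison by the shorter length. The sketch below is not the reviewed code; FirstDifference and firstDifference are hypothetical names, and returning -1 for the "one word is a prefix of the other" case is an assumed convention.

```java
class FirstDifference {
    // Returns the index of the first position where a and b differ,
    // or -1 if one word is a prefix of the other (including equal words).
    static int firstDifference(String a, String b) {
        int limit = Math.min(a.length(), b.length()); // never read past the shorter word
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The word list from the review comment above.
        String[] words = {"AC", "DR", "D", "TAR", "TAC", "TAD"};
        for (int i = 0; i + 1 < words.length; i++) {
            System.out.println(words[i] + " / " + words[i + 1] + " -> "
                    + firstDifference(words[i], words[i + 1]));
        }
    }
}
```

For the pair "DR", "D" the helper returns -1 instead of throwing. Whether such a pair counts as legal input is the open question raised in the thread; the sketch only shows that the comparison itself can be made safe.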


simon-frankau

I'm afraid I don't understand the algorithm you're writing, and would like to see it rewritten/re-commented so that it's clearer, please.

simon-frankau · 2017-07-18T20:53:26Z

CodeU_Assignement_5/src/codeu_assignement_5/KemUnknownLanguage.java

+        }
+
+        return null;
+    }


The more usual structure is to have early-out error conditions in the if statement, rather than the main logic inside the block with a 'return null' outside. This allows people to read the main body having got the edge-case-checks out of the way.

I generally try to structure my code to work well in the presence of corner cases, without requiring special handling. So, if you can write your code to still work without the "alienWords.length > 0" check, that's usually better style.

Yes, I tried to fix it several times. I will revert it to the original code.
The other reviewer also commented on this section. Why don't you comment together in the same place? Otherwise it is very hard (for me) to get agreement between the two reviewers...

You made changes to the code after the other reviewer reviewed. When I review the latest version of the code, I don't see the comments that were attached to previous versions.

In this specific case, I agree with Aimee's comment - her suggestion to use "if(alienWords == null || alienWords.length == 0) return null;" is exactly what I'm suggesting, too.

Now I'm confused. Which version is better?

if(alienWords == null || alienWords.length == 0) return null;
=== VS. ===
List alphabet = null; if ((alienWords != null) && (alienWords.length > 0)) { .... } return alphabet;

Or do you have another idea?

Of the two, the first one is better. Both Aimee and I prefer this.

If you can write your code so that it reads "if (alienWords == null) return null;" and the case alienWords.length == 0 is naturally handled by the code body, that's even better.

OK, I'll change it to that
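
A sketch of the structure agreed on in this thread: a single early return for null, with the empty array handled by the main body without a special case. The names GuardClause and findAlphabet are hypothetical, and the body is a deliberate placeholder (it only collects distinct letters in first-seen order); it is not the assignment's inference algorithm.

```java
import java.util.ArrayList;
import java.util.List;

class GuardClause {
    static List<Character> findAlphabet(String[] alienWords) {
        // Early-out for the one case the body cannot handle.
        if (alienWords == null) {
            return null;
        }

        // Placeholder body (assumption): collect distinct letters in first-seen order.
        // An empty array simply skips the loop and yields an empty alphabet.
        List<Character> alphabet = new ArrayList<>();
        for (String word : alienWords) {
            for (char c : word.toCharArray()) {
                if (!alphabet.contains(c)) {
                    alphabet.add(c);
                }
            }
        }
        return alphabet;
    }

    public static void main(String[] args) {
        System.out.println(findAlphabet(null));
        System.out.println(findAlphabet(new String[0]));
        System.out.println(findAlphabet(new String[] {"AB", "BA"}));
    }
}
```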

simon-frankau · 2017-07-18T20:57:00Z

CodeU_Assignement_5/src/codeu_assignement_5/KemUnknownLanguage.java

+
+    private List<Character> extractAlphabeitFromWords(String[] alienWords) {
+        /* Get the current order by prefix size, starting from 0 */
+        List<Character> dict = getBasicAlphaBeit(alienWords, 0); /* Basic Alphabeit */ 


"dict" is a somewhat confusing name - it is neither a dictionary in the lexicographical sense, not in the computer science sense. "alphabet" might be better.

simon-frankau · 2017-07-18T22:21:05Z

CodeU_Assignement_5/src/codeu_assignement_5/KemUnknownLanguage.java

+        }
+
+        return dict;
+    }


I'm afraid I just don't understand the algorithm you're attempting here, so I'm stopping the review at this point.

Reading code shouldn't be hard work, even if it's harder to write to make it easy to read.

I have a strong suspicion that the code will have cases that don't work - what I can make out doesn't suggest an algorithm that I think will work in all corner cases (having done a couple of code reviews with nasty corner case failures for other mentees already).

What I suggest is to extend the commenting on the code to express what you're expecting at each stage of the algorithm, any invariants you might be wanting to maintain, etc. If you can break your algorithm into a clear pipeline of steps, that will also help. Take a look at the code written by the others to get ideas on how to cleanly express your intent.

I really do try to learn what the right level of comments is. Before, you said that "the code shall document itself"; now, that there should be comments describing the code.

That, for sure, will not help me to learn.

Regarding the bugs, I would expect some suggested cases (rather than an answer based on feelings). However, what I can do is add all the other team members' cases to see if it works well.

Regarding comments, it is important to comment the things that are not obvious, and not comment the things that are obvious. Putting a comment on a constructor saying it's a constructor is not helpful. Having a tricky method not make clear what its effect is supposed to be is also unhelpful. It's much easier to see if a method does what the method comment says it's supposed to do than to try to work out what it does.

I would say a good rule of thumb for method comments is that there should be a clear method comment for any method whose behaviour is not obvious from a quick glance. I think you are putting those comments in, but I'm not finding them helpful.

I'll add some code review comments to your method comments to try to make it clearer what I'm expecting.

As for bugs, I've found bugs in everyone else's implementation of this assignment. It is tricky. Perhaps we should concentrate on making the code understandable first, and return to looking for bugs later?

If you wish me to learn something, you have to be more specific.
I changed the code and did try to add comments. If there is still a mysterious method for you, please point it out...

What is obvious or not obvious is a very personal matter. Do you prefer comments at the top of the method? On each line? Before a condition? I really am trying to generalise a rule.


karineek · 2017-07-19T12:38:38Z

I committed the recent changes

simon-frankau

I've tried to be more concrete about the method comments, and also dig a little bit into why I suspect that there are corner cases that don't work, and why the algorithm is unclear to me.

simon-frankau · 2017-07-19T13:17:00Z

CodeU_Assignement_5/src/codeu_assignement_5/KemUnknownLanguage.java

+    /*
+     * Input: a dictionary (a list of words in lexicographic order) of all words in an unknown/invented language, 
+    *  Output: alphabet (an ordered list of characters) of that language, letters we cannot decide are appended in the end.
+    */


While you can express this purely in terms of "input" and "output", I think it's better to say what it's doing and then express the input/output in terms of that, e.g.

Infer an alphabet (ordered list of characters) from a dictionary (list of words in lexicographic order).
Input: ...
Output: ...

Note that "letters we cannot decide are appended in the end" is confusing - it sounds like all ambiguous characters get put at the end, but ambiguity can occur anywhere in the alphabet. D, XA, XC, XD, YA, YB, YD would mean that we know that B and C are between A and D (i.e. can't be appended at the end), but we don't know the relative ordering of B and C.
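
The ambiguity in that example can be made concrete by listing the ordering constraints that the word list actually yields. This is an illustrative sketch, not the reviewed code: OrderingConstraints and constraints are hypothetical names, and it assumes each adjacent pair of words contributes at most one constraint, taken from the first position where the two words differ.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class OrderingConstraints {
    // Collects "x<y" strings: for each adjacent pair of words, the first
    // differing characters x (earlier word) and y (later word) give x before y.
    static Set<String> constraints(String[] words) {
        Set<String> result = new LinkedHashSet<>();
        for (int w = 0; w + 1 < words.length; w++) {
            String a = words[w];
            String b = words[w + 1];
            int limit = Math.min(a.length(), b.length());
            for (int i = 0; i < limit; i++) {
                if (a.charAt(i) != b.charAt(i)) {
                    result.add(a.charAt(i) + "<" + b.charAt(i));
                    break; // only the first difference carries information
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"D", "XA", "XC", "XD", "YA", "YB", "YD"};
        // Prints [D<X, A<C, C<D, X<Y, A<B, B<D]
        System.out.println(constraints(words));
    }
}
```

The output places both B and C between A and D, but contains neither B<C nor C<B, which is exactly the situation the comment describes: the undecided letters are in the middle of the alphabet, not at the end.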

simon-frankau · 2017-07-19T13:22:00Z

CodeU_Assignement_5/src/codeu_assignement_5/KemUnknownLanguage.java

+     *
+     * Note: Assume the input is valid here!
+    */
+    private List<Character> getBasicAlphaBeit(String[] words, int prefixSize) {


This is one of the clearer methods, but it's still a bit messy.

Perhaps call it "inferAlphabetFromCommonPrefix"?
/* Given an array of words that all match up to a common prefix length, infer an alphabetical ordering of the characters based on the order of the first differing character. No check is made that the common prefix really is the same across the list. */
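
A possible shape for a method matching that comment is sketched below. It is an assumption about the intent rather than the reviewed implementation; in particular, skipping words that are too short to have a character at prefixSize is a choice made here, not something the review specifies.

```java
import java.util.ArrayList;
import java.util.List;

class CommonPrefixAlphabet {
    /* Given an array of words that all match up to a common prefix length,
     * infer an alphabetical ordering of the characters based on the order of
     * the first differing character. No check is made that the common prefix
     * really is the same across the list. */
    static List<Character> inferAlphabetFromCommonPrefix(String[] words, int prefixSize) {
        List<Character> alphabet = new ArrayList<>();
        for (String word : words) {
            if (word.length() <= prefixSize) {
                continue; // assumption: words with no character at this index are skipped
            }
            char c = word.charAt(prefixSize);
            if (!alphabet.contains(c)) {
                alphabet.add(c); // first appearance fixes the relative order
            }
        }
        return alphabet;
    }

    public static void main(String[] args) {
        String[] words = {"ART", "RAT", "CTT", "CTA", "CRA"};
        // Prints [A, R, C], matching the first iteration described earlier in the review.
        System.out.println(inferAlphabetFromCommonPrefix(words, 0));
    }
}
```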

Yes, sure, np.

So what is missing here is using the same terms that the text of the assignment used?

simon-frankau · 2017-07-19T13:26:24Z

CodeU_Assignement_5/src/codeu_assignement_5/KemUnknownLanguage.java

+        }
+    }
+
+    private boolean addToDictByPrefix(String[] alienWords, List<Character> dict) {


This method is the core of the algorithm, but it's not commented. What does the return value mean? How does it decide when it's done? Even noting that it's modifying dict in-place helps people understand what's going on.

simon-frankau · 2017-07-19T13:28:22Z

CodeU_Assignement_5/src/codeu_assignement_5/KemUnknownLanguage.java

+    }
+
+    /*
+     * Gives the prefix of a char that isn't in the dictionary


It would be clearer to either "Give the index of the first character not in the dictionary", or "Gives the length of the longest prefix that consists of dictionary characters".
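
The second wording maps directly onto a short loop. The sketch below assumes that reading of the method; KnownPrefix and lengthOfKnownPrefix are hypothetical names and the body is not taken from the reviewed code.

```java
import java.util.Arrays;
import java.util.List;

class KnownPrefix {
    /* Gives the length of the longest prefix of word that consists only of
     * characters already in alphabet. Equivalently, the index of the first
     * character not in alphabet, or word.length() if every character is known. */
    static int lengthOfKnownPrefix(String word, List<Character> alphabet) {
        int i = 0;
        while (i < word.length() && alphabet.contains(word.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        List<Character> alphabet = Arrays.asList('A', 'C');
        System.out.println(lengthOfKnownPrefix("CAB", alphabet)); // 2: 'B' is unknown
        System.out.println(lengthOfKnownPrefix("AC", alphabet));  // 2: whole word is known
    }
}
```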

simon-frankau · 2017-07-19T13:30:49Z

CodeU_Assignement_5/src/codeu_assignement_5/KemUnknownLanguage.java

+    /*
+     * Input: the original set of words, current word we are working on, its prefix size
+     * Output: the set of the words with this prefix
+    */


The intention of this method is actually pretty clear to me, although I'd still prefer to have it explained in words, rather than via "Input: ... Output: ...".

"Output: the set of the words with this prefix" is mildly misleading - "set" implies a lack of ordering, while in this context order is very important. "Ordered list of words with this prefix" is less ambiguous.

It helps me to remember the technical part of it. But I added a full description of what the algorithm or process is. Thanks!

simon-frankau · 2017-07-19T13:33:32Z

CodeU_Assignement_5/src/codeu_assignement_5/KemUnknownLanguage.java

+    private boolean addToDictByPrefix(String[] alienWords, List<Character> dict) {
+        boolean hasReachMaxPrefix = true;
+        for (int i=0; i < alienWords.length; i++) {
+            int prefixSize = getSizeOfPrefix(alienWords[i], dict);


Why do you build a prefix up to the first unknown letter? Even if you've seen a letter before, its next usage may still give you more information.

The code changed quite a lot; I tried as much as possible to make it clearer and more general (you had a comment earlier regarding this). The idea here was to go over only the letters whose location we don't know yet.

It wasn't such a good idea, since we need to go over all the combinations anyway, and then we need an outer loop. In addition, it can be buggy when a letter has to be moved earlier after it has already been pushed to the end (if I remember right). It is a bit too complicated, I agree :-)

In the new version I removed it completely and removed the outer while loop...

simon-frankau · 2017-07-19T13:42:56Z

CodeU_Assignement_5/src/codeu_assignement_5/KemUnknownLanguage.java

+     * Get two dictionary with common latters and merge them into the first dict,
+     * if none, cannot know, add all these cases later on
+    */
+    private void mergeDict(List<Character> dict1, List<Character> dict2) {


These are potential alphabets, not dictionaries. I guess the comment could be:

/* Given two sub-alphabets, generates an (arbitrary) merged alphabet which is consistent with both of them. */

I think this method is the core of my problem with the algorithm. Given two sub-alphabets (e.g. ABD, ACD) there may be multiple potential alphabets that satisfy both (e.g. ABCD, ACBD). This algorithm will choose one, and then a later pass may reveal whether the ordering is actually BC or CB.

In other words, this method takes two orderings and produces a potential ordering, but it might be the wrong one based on information revealed later.

Your code may do subtle things to make this work, but I see no comments or structures in the code to suggest you've addressed this issue.
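
The point about multiple valid merges can be checked mechanically. The sketch below is illustrative only: MergeAmbiguity and isConsistent are hypothetical names, alphabets are represented as strings of distinct letters for brevity, and "consistent" is taken to mean that the sub-alphabet's letters appear in the candidate in the same relative order.

```java
class MergeAmbiguity {
    // True if every letter of subAlphabet appears in candidate in the same
    // relative order (i.e. subAlphabet is a subsequence of candidate).
    // Assumes each alphabet contains distinct letters.
    static boolean isConsistent(String candidate, String subAlphabet) {
        int pos = 0; // next letter of subAlphabet still to be found
        for (int i = 0; i < candidate.length() && pos < subAlphabet.length(); i++) {
            if (candidate.charAt(i) == subAlphabet.charAt(pos)) {
                pos++;
            }
        }
        return pos == subAlphabet.length();
    }

    public static void main(String[] args) {
        String[] candidates = {"ABCD", "ACBD", "ADBC"};
        for (String candidate : candidates) {
            boolean ok = isConsistent(candidate, "ABD") && isConsistent(candidate, "ACD");
            System.out.println(candidate + " consistent with ABD and ACD: " + ok);
        }
    }
}
```

Both ABCD and ACBD pass the check, so a merge step that commits to one of them has to be able to revise that choice if a later pass reveals the order of B and C.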

simon-frankau · 2017-07-19T13:43:39Z

CodeU_Assignement_5/src/codeu_assignement_5/KemUnknownLanguage.java

+        return null;
+    }
+
+    private List<Character> extractAlphabeitFromWords(String[] alienWords) {


I think the lack of comment here is pretty reasonable, given it's called just once from a nearby wrapper method.

Karine and others added 7 commits July 5, 2017 19:01

assignment 5, second commit

1ed2e91

remove env. files

ce33dca

remove env. files

a40d4e2

remove env. files

101c856

remove env. files

7d5e18a

remove env. files

32c04d4

remove env. files

637ef21

karineek requested review from 0xlbr, AimeeBorda, NataliaDymnikova, alice006, simon-frankau and veronikakolejak July 9, 2017 09:59

AimeeBorda requested changes Jul 9, 2017


Karine and others added 10 commits July 11, 2017 11:32

assignment 5, fix after code review

3875d6f

Merge origin/master

8b9c545

assignment 5, fix build

a35bdb9

assignment 5, fix build - second try

95b497a

assignment 5, fix build - second try

dd83412

brute force fix of commits

9966ad7

problem with rename of a file

brute force fix of commits

e029e22

problem with rename of a file

brute force fix of commits

9f039b5

problem with rename of a file

brute force fix of commits

c1d102c

problem with rename of a file

assignment 5, fix error from code review

435aae2

simon-frankau requested changes Jul 18, 2017


Karine added 3 commits July 19, 2017 08:53

assignment 5, revert back to 1ed2e91

5094dfe

for the public methods of the algorithm

rename dict to alphabet

e927bb8

fix for recent code review

fd92daa

simon-frankau reviewed Jul 19, 2017


Karine added 4 commits July 19, 2017 21:06

first commit, 6

3921b2e

Basic working project for summit

defbadf

code review's fix

edade50

code review's fix

a077a09
