
Escape sequences in string arguments - unicode and others #4721

@Eli-Siegel1

Description


Describe the bug
Handling of Unicode escapes has some bugs and odd behaviors:

  1. Unicode escapes with doubled backslashes are parsed anyway, which makes it impossible to pass a string containing the literal pattern "\uXXXX".
  2. This applies to every escape sequence, not only Unicode: strings within triple quotes (block strings) are still parsed the same as single-quoted strings (possibly Ruby-related?).
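
Per the GraphQL spec, quoted strings process escape sequences exactly once, while block strings leave them untouched (only \""" is special). A minimal Ruby sketch of that distinction, with hypothetical helper names, not graphql-ruby's actual lexer:

```ruby
# Sketch of the GraphQL spec rules (hypothetical helpers, not graphql-ruby's lexer).
# Quoted strings ("...") process escapes such as \n and \uXXXX exactly once;
# block strings ("""...""") keep every escape intact except \""".

def unescape_quoted(body)
  body.gsub(/\\(u\h{4}|.)/) do
    esc = Regexp.last_match(1)
    case esc
    when /\Au\h{4}\z/ then [esc[1..].to_i(16)].pack("U") # \uXXXX -> one character
    when "n" then "\n"
    when "r" then "\r"
    when "t" then "\t"
    when "b" then "\b"
    when "f" then "\f"
    when '"', "\\", "/" then esc                         # \\ consumes the backslash
    else raise ArgumentError, "invalid escape: \\#{esc}" # e.g. the \u006 case below
    end
  end
end

def unescape_block(body)
  body.gsub('\\"""', '"""') # the only escape block strings recognize
end
```

With this split, the source characters \\u0064 come out of a quoted string as the literal text \u0064, and a block string returns \u0064 unchanged.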

Versions

graphql version: 2.0.19
rails (or other framework): 6.0.4.7
other applicable versions (graphql-batch, etc): none

GraphQL schema

In both examples below I used the following simple resolver.
Note that the result shown is the response rendered to the client, so escape characters are added there.
I also printed the string passed to the resolver character by character to make sure I'm not making mistakes:

class Resolvers::GetString < GraphQL::Schema::Resolver
  argument :string, String, "the name to return"
  type String, null: false

  def resolve(**arguments)
    puts "***"
    puts "string.length: #{arguments[:string].length}"
    puts "string value: \"#{arguments[:string]}\""
    # printing the chars one by one
    i = 0
    arguments[:string].each_char do |c|
      puts "string[#{i}] = \"#{c}\""
      i += 1
    end
    puts "***"
    arguments[:string]
  end
end

class Types::QueryType < Types::BaseObject
  field :get_string, resolver: Resolvers::GetString, description: "Just return the given string"
end


class ApplicationSchema < GraphQL::Schema
  query Types::QueryType
end

GraphQL query


Example for bug 1:

query bug1 {
  example1: get_string(string: "\u0064") # correct: should return "d"
  example2: get_string(string: "\\u0064") # incorrect: should return "\\u0064"
  # example3: get_string(string: "\u006") # correct: validation error
  example4: get_string(string: "\\u006") # correct: should return "\\u006"
}

The log:

***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 5
string value: "\u006"
string[0] = "\"
string[1] = "u"
string[2] = "0"
string[3] = "0"
string[4] = "6"
***

The result:

{
  "data": {
    "example1": "d",
    "example2": "d",
    "example4": "\\u006"
  }
}

Example for bug 2:

query bug2 {
  # example1: get_string(string: """\a""") # incorrect (fails): should be "\\a"
  # example2: get_string(string: """\u006""") # incorrect (fails): should be "\\u006"
  example3: get_string(string: """\n""") # incorrect: should be "\\n"
  example4: get_string(string: """\u0064""") # incorrect: should be "\\u0064"
}

The log:

***
string.length: 1
string value: "
"
string[0] = "
"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***

The result:

{
  "data": {
    "example3": "\n",
    "example4": "d"
  }
}

Steps to reproduce

Run the two queries above against the resolver and schema shown in the GraphQL schema section.

Expected behavior

parsing string arguments:

  1. "\\u0064" => "\u0064"
  2. """\u0064""" => "\u0064"

Actual behavior

parsing string arguments:

  1. "\\u0064" => "d" (the \\ is consumed as an escaped backslash, then the resulting \u0064 is parsed again)
  2. """\u0064""" => "d" (escape sequences are processed inside the block string)
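
One plausible mechanism for this symptom (an illustrative sketch, not graphql-ruby's actual code): if escapes are replaced in separate passes, the backslash produced by the \\ pass is re-read by the \uXXXX pass, so both "\u0064" and "\\u0064" collapse to "d".

```ruby
# Illustrative only: a multi-pass unescape reproduces the reported symptom.
# Pass 1 turns \\ into \; pass 2 then sees that backslash again as the
# start of a \uXXXX escape, so the doubled form is decoded twice.

def buggy_unescape(body)
  body.gsub('\\\\') { '\\' }                           # pass 1: \\ -> \
      .gsub(/\\u(\h{4})/) { [$1.to_i(16)].pack("U") }  # pass 2: \uXXXX -> char
end
```

Both the source characters \u0064 and \\u0064 come out of this as "d", matching example1 and example2 above.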
Exception backtrace: n/a

Additional context
I saw an old PR changing the behavior of \\u back and forth, stating it was reverted because it failed another parser's test suite:
#372
It mentions a test suite in graphql-js at this URL (it has moved; this is the updated link).
Note: in those tests, the string literals are first parsed by the host language (JS). So, for example, the JS literal '\u{1234}' reaches the lexer as the single character ሴ, while '\\u{1234}' reaches it as the characters \, u, {, etc.

The tests there behave according to the expected behavior I described. Examples:

  1. In this test the lexer is given the characters "unicode \u1234\u5678\u90AB\uCDEF" and it returns the characters: unicode ሴ噸邫췯
    expect(lexOne('"unicode \\u1234\\u5678\\u90AB\\uCDEF"')).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 34,
      value: 'unicode \u1234\u5678\u90AB\uCDEF',
    });

graphql-ruby behaves the same way here.

  2. In this test the lexer is given the characters "escaped \n\r\b\t\f" and returns "escaped " followed by a newline, carriage return, etc.
    expect(lexOne('"escaped \\n\\r\\b\\t\\f"')).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 20,
      value: 'escaped \n\r\b\t\f',
    });

graphql-ruby matches this behavior as well.

  3. In this test the lexer is given the characters """unescaped \n\r\b\t\f\u1234""" and returns the characters unescaped \n\r\b\t\f\u1234, with the escapes left intact.
    expect(lexOne('"""unescaped \\n\\r\\b\\t\\f\\u1234"""')).to.contain({
      kind: TokenKind.BLOCK_STRING,
      start: 0,
      end: 32,
      line: 1,
      column: 1,
      value: 'unescaped \\n\\r\\b\\t\\f\\u1234',
    });

In graphql-ruby the escape sequences are converted to control characters, the same as in case 2 - which is the bug.

  4. To check escaped backslashes before Unicode sequences, I cloned the repo and added another local test.
    In the test I passed the characters "escaped unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE" and the result is the characters escaped unicode \u{1234}\u{5678}\u{90AB}\u{CDE: the Unicode is not parsed, and the escaping of the \ before the u works.
    expect(
      lexOne('"escaped unicode \\\\u{1234}\\\\u{5678}\\\\u{90AB}\\\\u{CDE"'),
    ).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 52,
      value: 'escaped unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE',
    });

In graphql-ruby the first three Unicode sequences are parsed and the last one is not; the escaping of the \ before the u does not work.
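
The single-pass behavior that the graphql-js tests rely on can be sketched character by character in Ruby (a hypothetical scan_string helper supporting only the escapes used here): once a backslash consumes its escape, the following u is an ordinary character.

```ruby
# Character-by-character scan in the style of graphql-js's lexer: each
# backslash consumes exactly one escape sequence, so the "u" after an
# escaped backslash is a plain character. Illustrative sketch only,
# limited to the \\, \n, and fixed-width \uXXXX escapes.

def scan_string(body)
  out = +""
  i = 0
  while i < body.length
    c = body[i]
    if c == "\\"
      esc = body[i + 1]
      case esc
      when "\\" then out << "\\"       # consume both backslashes at once
      when "n"  then out << "\n"
      when "u"
        hex = body[i + 2, 4]
        raise ArgumentError, "invalid \\u escape" unless hex =~ /\A\h{4}\z/
        out << [hex.to_i(16)].pack("U")
        i += 4                          # skip the four hex digits
      else
        raise ArgumentError, "unsupported escape: \\#{esc}"
      end
      i += 2                            # skip the backslash and the escape char
    else
      out << c
      i += 1
    end
  end
  out
end
```

Because \\ advances past both characters, the input characters \\u{1234} produce the literal text \u{1234}, exactly as in the graphql-js test above.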
