Describe the bug
Handling of Unicode escape sequences has some bugs and surprising behaviors:
- Double-escaped Unicode sequences are parsed incorrectly, which makes it impossible to pass strings containing the literal pattern "\uXXXX".
- This affects every escape sequence, not only Unicode: strings within triple quotes (block strings) are still parsed the same way as single-quoted strings (possibly Ruby-related?).
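For reference, the spec requires escapes in quoted strings to be consumed left to right in a single pass, so `\\` swallows both backslashes before a following `u` can start a Unicode escape. A minimal sketch of that rule (illustrative names, not the graphql-ruby API):

```ruby
# A minimal, spec-style unescaper for GraphQL quoted strings.
# Escapes are consumed left to right in ONE pass, so the input
# backslash-backslash-u-0-0-6-4 yields the literal text \u0064,
# never the character "d".
ESCAPES = {
  '"' => '"', "\\" => "\\", "/" => "/",
  "b" => "\b", "f" => "\f", "n" => "\n", "r" => "\r", "t" => "\t",
}.freeze

def unescape_graphql_string(body)
  out = +""
  i = 0
  while i < body.length
    c = body[i]
    if c == "\\"
      nxt = body[i + 1]
      if nxt == "u" && (hex = body[i + 2, 4]) =~ /\A\h{4}\z/
        out << [hex.to_i(16)].pack("U")  # \uXXXX with exactly 4 hex digits
        i += 6
      elsif ESCAPES.key?(nxt)
        out << ESCAPES[nxt]  # "\\" consumes BOTH backslashes here,
        i += 2               # so a following "u0064" is plain text
      else
        raise ArgumentError, "invalid escape \\#{nxt}"  # e.g. \u006
      end
    else
      out << c
      i += 1
    end
  end
  out
end

unescape_graphql_string('\u0064')    # backslash-u-0064      => "d"
unescape_graphql_string('\\\u0064')  # backslash-backslash-u0064 => the text \u0064
```

With this single-pass rule, "\u006" raises (incomplete escape) while "\\u006" is the literal text \u006, matching the expected behavior described below.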
Versions
graphql version: 2.0.19
rails (or other framework): 6.0.4.7
other applicable versions (graphql-batch, etc): none
GraphQL schema
Both examples below use the following simple resolver.
Note that the result shown is the response rendered to the client, so escape characters are added there.
I also printed the string passed to the resolver character by character to make sure I'm not making mistakes:
class Resolvers::GetString < GraphQL::Schema::Resolver
  argument :string, String, "the name to return"
  type String, null: false

  def resolve(**arguments)
    puts "***"
    puts "string.length: #{arguments[:string].length}"
    puts "string value: \"#{arguments[:string]}\""
    # printing the chars one by one
    i = 0
    arguments[:string].each_char do |c|
      puts "string[#{i}] = \"#{c}\""
      i += 1
    end
    puts "***"
    arguments[:string]
  end
end
class Types::QueryType < Types::BaseObject
  field :get_string, resolver: Resolvers::GetString, description: "Just return the given string"
end

class ApplicationSchema < GraphQL::Schema
  query Types::QueryType
end

GraphQL query
Example GraphQL query and response (if query execution is involved)
example for bug 1:
query bug1 {
  example1: get_string(string: "\u0064")  # correct: should return "d"
  example2: get_string(string: "\\u0064") # incorrect: should return "\\u0064"
  # example3: get_string(string: "\u006") # correct: validation error
  example4: get_string(string: "\\u006")  # correct: should return "\\u006"
}

the log:
***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 5
string value: "\u006"
string[0] = "\"
string[1] = "u"
string[2] = "0"
string[3] = "0"
string[4] = "6"
***
the result:

{
  "data": {
    "example1": "d",
    "example2": "d",
    "example4": "\\u006"
  }
}

example for bug 2:
query bug2 {
  # example1: get_string(string: """\a""")    # incorrect (fails): should be "\\a"
  # example2: get_string(string: """\u006""") # incorrect (fails): should be "\\u006"
  example3: get_string(string: """\n""")      # incorrect: should be "\\n"
  example4: get_string(string: """\u0064""")  # incorrect: should be "\\u0064"
}

the log:
***
string.length: 1
string value: "
"
string[0] = "
"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***
the result:

{
  "data": {
    "example3": "\n",
    "example4": "d"
  }
}

Steps to reproduce
Steps to reproduce the behavior
Expected behavior
parsing string arguments:
"\\u0064" => "\u0064""""\u0064""" => "\u0064"
Actual behavior
parsing string arguments:
"\\u0064" => "n""""\u0064""" => "n"
Click to view exception backtrace
n/a

Additional context
I saw an old version changing the behavior of \\u back and forth; it was reverted because it failed another parser's test suite:
#372
That issue mentions a test suite in graphql-js at this URL (the file has since moved; this is the updated link).
Note: in those tests, the string literals that appear in the source are first parsed by the host language (JS). So, for example, the literal '\u{1234}' is actually passed to the lexer as ሴ, while '\\u{1234}' is passed as the characters \, u, {, etc.
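The same host-language effect can be seen in Ruby, where double-quoted literals interpret \u but single-quoted literals do not (a quick sketch for comparison):

```ruby
# Double-quoted: Ruby itself interprets the \u escape, so the lexer under
# test would receive the single character ሴ.
"\u1234".length  # => 1

# Single-quoted: the backslash survives, so the lexer receives the six
# characters \, u, 1, 2, 3, 4.
'\u1234'.length  # => 6
```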
The tests there behave as in the expected behavior I described. Examples:
- In this test the parser is given the characters "unicode \u1234\u5678\u90AB\uCDEF", and it returns the characters: unicode ሴ噸邫췯
expect(lexOne('"unicode \\u1234\\u5678\\u90AB\\uCDEF"')).to.contain({
kind: TokenKind.STRING,
start: 0,
end: 34,
value: 'unicode \u1234\u5678\u90AB\uCDEF',
});
this works the same as ruby.
- In this test the parser is given the characters "escaped \n\r\b\t\f" and returns "escaped " followed by newline, carriage return, etc.
expect(lexOne('"escaped \\n\\r\\b\\t\\f"')).to.contain({
kind: TokenKind.STRING,
start: 0,
end: 20,
value: 'escaped \n\r\b\t\f',
});
this works the same as ruby.
- In this test the parser is given the characters """unescaped \n\r\b\t\f\u1234""" and returns the characters: unescaped \n\r\b\t\f\u1234
expect(lexOne('"""unescaped \\n\\r\\b\\t\\f\\u1234"""')).to.contain({
kind: TokenKind.BLOCK_STRING,
start: 0,
end: 32,
line: 1,
column: 1,
value: 'unescaped \\n\\r\\b\\t\\f\\u1234',
});
In Ruby the characters are converted to special characters, the same as in the previous example, which is a bug.
- To check the escaping of Unicode sequences, I cloned the repo and added another local test.
In the test I passed the characters "escaped unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE" and the result is the characters: escaped unicode \u{1234}\u{5678}\u{90AB}\u{CDE. The Unicode is not parsed, and the escape of the \ before the u works.
expect(
lexOne('"escaped unicode \\\\u{1234}\\\\u{5678}\\\\u{90AB}\\\\u{CDE"'),
).to.contain({
kind: TokenKind.STRING,
start: 0,
end: 52,
value: 'escaped unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE',
});
In Ruby the first 3 Unicode sequences are parsed and the last one is not; the escape of the \ before the u does not work.
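For completeness, the spec's rule for block strings is that they recognize only one escape sequence, \""", and every other backslash passes through verbatim. A minimal sketch of that rule (hypothetical helper name; block-string dedenting is omitted for brevity):

```ruby
# Block strings ("""...""") must NOT interpret \n, \u, etc.; the only
# escape is \""" which becomes """. Everything else is kept verbatim,
# so """\n""" yields the two characters backslash + n.
def unescape_graphql_block_string(body)
  body.gsub('\\"""', '"""')  # '\\"""' in Ruby source is the 4 chars \"""
end
```

Under this rule, example3 above would return the text \n and example4 the text \u0064, matching the expected behavior.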