
Escape sequences in string arguments - unicode and others #4721

@Eli-Siegel1

Description


Describe the bug
Handling of Unicode escapes has some bugs and odd behaviors:

  1. Unicode escapes with doubled backslashes are parsed anyway, which makes it impossible to pass a string containing the literal pattern "\uXXXX".
  2. This applies to every escape sequence, not only Unicode: strings within triple quotes (block strings) are still parsed the same as single-quoted strings (possibly Ruby-related?).
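
Per the GraphQL spec, quoted strings process escape sequences exactly once, while block strings leave them untouched (only \""" is special). A minimal Ruby sketch of that distinction, with hypothetical helper names, not graphql-ruby's actual lexer:

```ruby
# Sketch of the GraphQL spec rules (hypothetical helpers, not graphql-ruby's lexer).
# Quoted strings ("...") process escapes such as \n and \uXXXX exactly once;
# block strings ("""...""") keep every escape intact except \""".

def unescape_quoted(body)
  body.gsub(/\\(u\h{4}|.)/) do
    esc = Regexp.last_match(1)
    case esc
    when /\Au\h{4}\z/ then [esc[1..].to_i(16)].pack("U") # \uXXXX -> one character
    when "n" then "\n"
    when "r" then "\r"
    when "t" then "\t"
    when "b" then "\b"
    when "f" then "\f"
    when '"', "\\", "/" then esc                         # \\ consumes the backslash
    else raise ArgumentError, "invalid escape: \\#{esc}" # e.g. the \u006 case below
    end
  end
end

def unescape_block(body)
  body.gsub('\\"""', '"""') # the only escape block strings recognize
end
```

With this split, the source characters \\u0064 come out of a quoted string as the literal text \u0064, and a block string returns \u0064 unchanged.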

Versions

graphql version: 2.0.19
rails (or other framework): 6.0.4.7
other applicable versions (graphql-batch, etc): none

GraphQL schema

In both examples below I used the following simple resolver.
Note that the result shown is the response rendered to the client, so escape characters are added there.
I also printed the string passed to the resolver character by character to make sure I'm not making mistakes:

class Resolvers::GetString < GraphQL::Schema::Resolver
  argument :string, String, "the name to return"
  type String, null: false

  def resolve(**arguments)
    puts "***"
    puts "string.length: #{arguments[:string].length}"
    puts "string value: \"#{arguments[:string]}\""
    # printing the chars one by one
    i = 0
    arguments[:string].each_char do |c|
      puts "string[#{i}] = \"#{c}\""
      i += 1
    end
    puts "***"
    arguments[:string]
  end
end

class Types::QueryType < Types::BaseObject
  field :get_string, resolver: Resolvers::GetString, description: "Just return the given string"
end


class ApplicationSchema < GraphQL::Schema
  query Types::QueryType
end

GraphQL query


Example for bug 1:

query bug1 {
  example1: get_string(string: "\u0064") # correct: should return "d"
  example2: get_string(string: "\\u0064") # incorrect: should return "\\u0064"
  # example3: get_string(string: "\u006") # correct: validation error
  example4: get_string(string: "\\u006") # correct: should return "\\u006"
}

The log:

***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 5
string value: "\u006"
string[0] = "\"
string[1] = "u"
string[2] = "0"
string[3] = "0"
string[4] = "6"
***

The result:

{
  "data": {
    "example1": "d",
    "example2": "d",
    "example4": "\\u006"
  }
}

Example for bug 2:

query bug2 {
  # example1: get_string(string: """\a""") # incorrect (fails): should be "\\a"
  # example2: get_string(string: """\u006""") # incorrect (fails): should be "\\u006"
  example3: get_string(string: """\n""") # incorrect: should be "\\n"
  example4: get_string(string: """\u0064""") # incorrect: should be "\\u0064"
}

The log:

***
string.length: 1
string value: "
"
string[0] = "
"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***

The result:

{
  "data": {
    "example3": "\n",
    "example4": "d"
  }
}

Steps to reproduce

Run the two queries above against the resolver and schema shown in the GraphQL schema section.

Expected behavior

parsing string arguments:

  1. "\\u0064" => "\u0064"
  2. """\u0064""" => "\u0064"

Actual behavior

parsing string arguments:

  1. "\\u0064" => "d" (the \\ is consumed as an escaped backslash, then the resulting \u0064 is parsed again)
  2. """\u0064""" => "d" (escape sequences are processed inside the block string)
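
One plausible mechanism for this symptom (an illustrative sketch, not graphql-ruby's actual code): if escapes are replaced in separate passes, the backslash produced by the \\ pass is re-read by the \uXXXX pass, so both "\u0064" and "\\u0064" collapse to "d".

```ruby
# Illustrative only: a multi-pass unescape reproduces the reported symptom.
# Pass 1 turns \\ into \; pass 2 then sees that backslash again as the
# start of a \uXXXX escape, so the doubled form is decoded twice.

def buggy_unescape(body)
  body.gsub('\\\\') { '\\' }                           # pass 1: \\ -> \
      .gsub(/\\u(\h{4})/) { [$1.to_i(16)].pack("U") }  # pass 2: \uXXXX -> char
end
```

Both the source characters \u0064 and \\u0064 come out of this as "d", matching example1 and example2 above.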
Exception backtrace: n/a

Additional context
I saw an old PR changing the behavior of \\u back and forth, stating it was reverted because it failed another parser's test suite:
#372
It mentions a test suite in graphql-js at this URL (it has moved; this is the updated link).
Note: in those tests, the string literals are first parsed by the host language (JS). So, for example, the JS literal '\u{1234}' reaches the lexer as the single character ሴ, while '\\u{1234}' reaches it as the characters \, u, {, etc.

The tests there behave according to the expected behavior I described. Examples:

  1. In this test the lexer is given the characters "unicode \u1234\u5678\u90AB\uCDEF" and it returns the characters: unicode ሴ噸邫췯
    expect(lexOne('"unicode \\u1234\\u5678\\u90AB\\uCDEF"')).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 34,
      value: 'unicode \u1234\u5678\u90AB\uCDEF',
    });

graphql-ruby behaves the same way here.

  2. In this test the lexer is given the characters "escaped \n\r\b\t\f" and returns "escaped " followed by a newline, carriage return, etc.
    expect(lexOne('"escaped \\n\\r\\b\\t\\f"')).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 20,
      value: 'escaped \n\r\b\t\f',
    });

graphql-ruby matches this behavior as well.

  3. In this test the lexer is given the characters """unescaped \n\r\b\t\f\u1234""" and returns the characters unescaped \n\r\b\t\f\u1234, with the escapes left intact.
    expect(lexOne('"""unescaped \\n\\r\\b\\t\\f\\u1234"""')).to.contain({
      kind: TokenKind.BLOCK_STRING,
      start: 0,
      end: 32,
      line: 1,
      column: 1,
      value: 'unescaped \\n\\r\\b\\t\\f\\u1234',
    });

In graphql-ruby the escape sequences are converted to control characters, the same as in case 2 - which is the bug.

  4. To check escaped backslashes before Unicode sequences, I cloned the repo and added another local test.
    In the test I passed the characters "escaped unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE" and the result is the characters escaped unicode \u{1234}\u{5678}\u{90AB}\u{CDE: the Unicode is not parsed, and the escaping of the \ before the u works.
    expect(
      lexOne('"escaped unicode \\\\u{1234}\\\\u{5678}\\\\u{90AB}\\\\u{CDE"'),
    ).to.contain({
      kind: TokenKind.STRING,
      start: 0,
      end: 52,
      value: 'escaped unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE',
    });

In graphql-ruby the first three Unicode sequences are parsed and the last one is not; the escaping of the \ before the u does not work.
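
The single-pass behavior that the graphql-js tests rely on can be sketched character by character in Ruby (a hypothetical scan_string helper supporting only the escapes used here): once a backslash consumes its escape, the following u is an ordinary character.

```ruby
# Character-by-character scan in the style of graphql-js's lexer: each
# backslash consumes exactly one escape sequence, so the "u" after an
# escaped backslash is a plain character. Illustrative sketch only,
# limited to the \\, \n, and fixed-width \uXXXX escapes.

def scan_string(body)
  out = +""
  i = 0
  while i < body.length
    c = body[i]
    if c == "\\"
      esc = body[i + 1]
      case esc
      when "\\" then out << "\\"       # consume both backslashes at once
      when "n"  then out << "\n"
      when "u"
        hex = body[i + 2, 4]
        raise ArgumentError, "invalid \\u escape" unless hex =~ /\A\h{4}\z/
        out << [hex.to_i(16)].pack("U")
        i += 4                          # skip the four hex digits
      else
        raise ArgumentError, "unsupported escape: \\#{esc}"
      end
      i += 2                            # skip the backslash and the escape char
    else
      out << c
      i += 1
    end
  end
  out
end
```

Because \\ advances past both characters, the input characters \\u{1234} produce the literal text \u{1234}, exactly as in the graphql-js test above.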
