-
Notifications
You must be signed in to change notification settings - Fork 77
Add internal DataFrame.toCode utility to help convert examples #1603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| schema().print() | ||
| } | ||
|
|
||
| fun DataFrame<*>.toCode(variableName: String = "df"): String = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Could this return a
CodeString? similar togenerateInterfaces()etc. - I was a bit worried about the name, but now I look in the autocomplete; among
toHtml(),toList(),toJson(), "toCode()" is not that cluttering or weird :) (actually, it's thegenerateX()functions that stand out...) - I do think that the
val %variableName =in the beginning should be omitted. We probably can't assume that people who want to use it will use it solely as an assignment. Maybe they want to use the generated dataframe-creating code to be an expression, like:
exec("val list = listOf(${df.toCode()})")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this return a CodeString? similar to generateInterfaces() etc.
I think yes. I'd also consider CodeString.copyToClipboard()
I do think that the val %variableName = in the beginning should be omitted.
What if nullable parameter: variableName: String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea for both :)
| } | ||
|
|
||
| private fun StringBuilder.appendColumn(column: DataColumn<Any?>, pad: String) { | ||
| append("$pad\"${column.name()}\" to columnOf(") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's safer to include the type in columnOf explicitly :) It solves a lot of the .toLiteral() cases already, like columnOf<Long>(1, 2, 3), but even ones you don't yet catch, like columnOf<Short>(1, 2, 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add Short and other literals, would prefer to avoid redundant type parameter so it's easy to copy-paste as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as we can somehow guarantee that the provided literals/class initializations are always the expected type, it will be fine. Maybe we could also parameterize this option if we actually want to promote this method to the public API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll setup extensive testing with Kotest https://kotest.io/docs/proptest/property-based-testing.html and Jupyter Kernel to execute generated code and compare resulting instance to original + compilation
| append("),\n") | ||
| } | ||
|
|
||
| private fun Any?.toLiteral(): String = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can supply a KType to the function, which saves instance checks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think instance check is very cheap on JVM and we have a smartcast as a result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you prefer cheap or free? ;P but yeah, the smart casts are nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How free? :o Maybe i don't get the idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's keep instance checks, i added a lot of branches where smart casts are useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's free because you get the values from a DataColumn where the KType is already known. So if we would supply this KType to toLiteral() it's simply an equality check instead of an instance check.
| is Char -> "'$this'" | ||
| is Long -> "${this}L" | ||
| is Float -> "${this}f" | ||
| else -> toString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lists might be another nice (and commonly used) addition. Maybe maps?
Actually, that does point out a limitation of the current implementation; it's not extensible and easy to break with unsupported types.
Maybe we could add a lambda to this function so people can supply their own 'toCode renderer'. Something like
typealias ValueToCodeRenderer = (value: Any?, type: KType) -> CodeString
val toStringRenderer: ValueToCodeRenderer = { value, _ -> value.toString() }
fun DataFrame<*>.toCode(customRenderer: ValueToCodeRenderer = toStringRenderer): CodeStringwdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List and Map seem reasonable by default. Code renderer too
dataFrameOf is compiler plugin-friendly way to serialize dataframes without FrameColumns, but converting existing row-based dataFrameOf(names)(values), etc, is bothersome. This API could help to quickly go from notebook to gradle project or from tests to project
I put it in tests module with intention to maybe later expose it as API