Add internal DataFrame.toCode utility to help convert examples #1603
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,9 @@ | ||
| package org.jetbrains.kotlinx.dataframe | ||
|
|
||
| import org.jetbrains.kotlinx.dataframe.api.getColumns | ||
| import org.jetbrains.kotlinx.dataframe.api.print | ||
| import org.jetbrains.kotlinx.dataframe.api.schema | ||
| import org.jetbrains.kotlinx.dataframe.columns.ColumnGroup | ||
| import org.jetbrains.kotlinx.dataframe.io.renderToString | ||
| import org.jetbrains.kotlinx.dataframe.types.UtilTests | ||
| import java.net.URL | ||
|
|
@@ -28,3 +30,45 @@ fun <T : DataFrame<*>> T.alsoDebug(println: String? = null, rowsLimit: Int = 20) | |
| print(borders = true, title = true, columnTypes = true, valueLimit = -1, rowsLimit = rowsLimit) | ||
| schema().print() | ||
| } | ||
|
|
||
| fun DataFrame<*>.toCode(variableName: String = "df"): String = | ||
| buildString { | ||
| append("val $variableName = dataFrameOf(\n") | ||
| appendColumns(this@toCode, indent = 1) | ||
| append(")") | ||
| } | ||
|
|
||
| private fun StringBuilder.appendColumns(df: DataFrame<*>, indent: Int) { | ||
| val pad = " ".repeat(indent) | ||
| df.getColumns { colsAtAnyDepth().simplify() }.forEach { col -> | ||
| if (col is ColumnGroup<*>) { | ||
| append("$pad\"${col.name()}\" to columnOf(\n") | ||
| appendColumns(col, indent + 1) | ||
| append("$pad),\n") | ||
| } else { | ||
| appendColumn(col, pad) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| private fun StringBuilder.appendColumn(column: DataColumn<Any?>, pad: String) { | ||
| append("$pad\"${column.name()}\" to columnOf(") | ||
|
Collaborator
maybe it's safer to include the type in
Collaborator
Author
I'll add Short and other literals. I would prefer to avoid a redundant type parameter so the output is easy to copy-paste as is.
Collaborator
As long as we can somehow guarantee that the provided literals/class initializations are always the expected type, it will be fine. Maybe we could also parameterize this option if we actually want to promote this method to the public API?
Collaborator
Author
I'll set up extensive testing with Kotest (https://kotest.io/docs/proptest/property-based-testing.html) and the Jupyter kernel to execute the generated code, compare the resulting instance to the original, and check that it compiles. |
||
| append(column.values().joinToString(", ") { it.toLiteral() }) | ||
| append("),\n") | ||
| } | ||
|
|
||
| private fun Any?.toLiteral(): String = | ||
|
Collaborator
You can supply a KType to the function, which saves instance checks :)
Collaborator
Author
I think an instance check is very cheap on the JVM, and we get a smart cast as a result.
Collaborator
Would you prefer cheap or free? ;P But yeah, the smart casts are nice.
Collaborator
Author
How free? :o Maybe I don't get the idea.
Collaborator
Author
Let's keep the instance checks. I added a lot of branches where smart casts are useful.
Collaborator
It's free because you get the values from a |
||
| when (this) { | ||
| null -> "null" | ||
| is String -> "\"${escape()}\"" | ||
| is Char -> "'$this'" | ||
| is Long -> "${this}L" | ||
| is Float -> "${this}f" | ||
| else -> toString() | ||
|
Collaborator
Lists might be another nice (and commonly used) addition. Maybe maps? Actually, that does point out a limitation of the current implementation: it's not extensible and is easy to break with unsupported types. Maybe we could add a lambda to this function so people can supply their own 'toCode renderer'. Something like:
typealias ValueToCodeRenderer = (value: Any?, type: KType) -> CodeString
val toStringRenderer: ValueToCodeRenderer = { value, _ -> value.toString() }
fun DataFrame<*>.toCode(customRenderer: ValueToCodeRenderer = toStringRenderer): CodeString
wdyt?
Collaborator
Author
List and Map seem reasonable by default. Code renderer too. |
||
| } | ||
|
|
||
| private fun String.escape() = | ||
| replace("\\", "\\\\") | ||
| .replace("\"", "\\\"") | ||
| .replace("\n", "\\n") | ||
| .replace("\t", "\\t") | ||
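
The literal rendering raised in the review threads (Short and other literals, List and Map support, a user-supplied renderer) can be sketched without the DataFrame dependency. This is a minimal illustration, not the code from this PR: `renderLiteral`, `escapeForLiteral`, and `charLiteral` are hypothetical names, and `ValueToCodeRenderer` is simplified here to return `String?` (with `null` meaning "fall back to the built-in rules") instead of the `(value, type) -> CodeString` shape proposed in the thread. It also escapes `$`, `\r`, and special characters in `Char` values, which the `escape()` helper in the diff does not cover.

```kotlin
// Hypothetical sketch; names and signatures are assumptions, not the PR's API.
// A custom renderer returns null to defer to the built-in rules below.
typealias ValueToCodeRenderer = (value: Any?) -> String?

// Escape a string so it can sit inside a Kotlin "..." literal.
// Backslashes go first so later replacements are not escaped twice.
fun escapeForLiteral(s: String): String =
    s.replace("\\", "\\\\")
        .replace("\"", "\\\"")
        .replace("\$", "\\\$") // a bare $ would start a string template
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")

// Render a Char literal, escaping the characters that cannot appear raw.
fun charLiteral(c: Char): String = when (c) {
    '\'' -> "'\\''"
    '\\' -> "'\\\\'"
    '\n' -> "'\\n'"
    '\r' -> "'\\r'"
    '\t' -> "'\\t'"
    else -> "'$c'"
}

// Special values (NaN, infinities, Long.MIN_VALUE) are not handled in this sketch.
fun renderLiteral(value: Any?, custom: ValueToCodeRenderer = { null }): String =
    custom(value) ?: when (value) {
        null -> "null"
        is String -> "\"${escapeForLiteral(value)}\""
        is Char -> charLiteral(value)
        is Long -> "${value}L"
        is Float -> "${value}f"
        // Parentheses keep negative values intact: -5.toShort() would negate after converting.
        is Short -> "($value).toShort()"
        is Byte -> "($value).toByte()"
        is List<*> -> value.joinToString(", ", "listOf(", ")") { renderLiteral(it, custom) }
        is Map<*, *> -> value.entries.joinToString(", ", "mapOf(", ")") { (k, v) ->
            "${renderLiteral(k, custom)} to ${renderLiteral(v, custom)}"
        }
        else -> value.toString()
    }

fun main() {
    println(renderLiteral(listOf(1L, null, 'x'))) // prints listOf(1L, null, 'x')
    println(renderLiteral(mapOf("price" to (-5).toShort())))

    // A custom renderer handles a type the built-in rules do not know about.
    val dates: ValueToCodeRenderer = { v ->
        if (v is java.time.LocalDate) "LocalDate.parse(\"$v\")" else null
    }
    println(renderLiteral(java.time.LocalDate.of(2024, 1, 2), dates))
}
```

Letting the custom renderer run first means callers can override the defaults as well as extend them. Whether that is desirable for a public API is a design choice the thread leaves open.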
CodeString? similar to generateInterfaces() etc.
toHtml(), toList(), toJson(), "toCode()" is not that cluttering or weird :) (actually, it's the generateX() functions that stand out...)
val $variableName = in the beginning should be omitted. We probably can't assume that people who want to use it will use it solely as an assignment. Maybe they want to use the generated dataframe-creating code as an expression, like: exec("val list = listOf(${df.toCode()})")

I think yes. I'd also consider CodeString.copyToClipboard()
What about a nullable parameter: variableName: String?

Yes, good idea for both :)
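
The two suggestions above (a nullable `variableName` and a `CodeString` result with `copyToClipboard()`) could look roughly like the sketch below. Everything here is an assumption made for illustration: `CodeString` is a local stand-in rather than the library's own type, `asCode` is a hypothetical helper that takes an already rendered expression, and the clipboard call uses the standard `java.awt` API, which fails in a headless environment.

```kotlin
import java.awt.Toolkit
import java.awt.datatransfer.StringSelection

// Local stand-in for illustration only; not the library's CodeString.
@JvmInline
value class CodeString(val value: String) {
    override fun toString(): String = value
}

// Hypothetical extension: copies the code to the system clipboard.
// Throws java.awt.HeadlessException when no display is available.
fun CodeString.copyToClipboard() {
    val selection = StringSelection(value)
    Toolkit.getDefaultToolkit().systemClipboard.setContents(selection, selection)
}

// Hypothetical helper: a null variableName yields a bare expression,
// any other value yields a `val` declaration.
fun asCode(expression: String, variableName: String? = "df"): CodeString =
    CodeString(if (variableName == null) expression else "val $variableName = $expression")

fun main() {
    val expr = "dataFrameOf(\n    \"a\" to columnOf(1, 2),\n)"

    // Default: an assignment that can be pasted into a script as is.
    println(asCode(expr))

    // Null name: an expression that can be embedded in other generated code.
    println("val list = listOf(${asCode(expr, variableName = null)})")
}
```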