Skip to content

Conversation

@koperagen
Copy link
Collaborator

dataFrameOf is compiler plugin-friendly way to serialize dataframes without FrameColumns, but converting existing row-based dataFrameOf(names)(values), etc, is bothersome. This API could help to quickly go from notebook to gradle project or from tests to project
I put it in tests module with intention to maybe later expose it as API

@koperagen koperagen added this to the 1.0.0-Beta5 milestone Dec 3, 2025
@koperagen koperagen requested a review from Jolanrensen December 3, 2025 14:34
@koperagen koperagen self-assigned this Dec 3, 2025
schema().print()
}

fun DataFrame<*>.toCode(variableName: String = "df"): String =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Could this return a CodeString? similar to generateInterfaces() etc.
  • I was a bit worried about the name, but now I look in the autocomplete; among toHtml(), toList(), toJson(), "toCode()" is not that cluttering or weird :) (actually, it's the generateX() functions that stand out...)
  • I do think that the val %variableName = in the beginning should be omitted. We probably can't assume that people who want to use it will use it solely as an assignment. Maybe they want to use the generated dataframe-creating code to be an expression, like:
exec("val list = listOf(${df.toCode()})")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this return a CodeString? similar to generateInterfaces() etc.

I think yes. I'd also consider CodeString.copyToClipboard()

I do think that the val %variableName = in the beginning should be omitted.

What if nullable parameter: variableName: String?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea for both :)

}

private fun StringBuilder.appendColumn(column: DataColumn<Any?>, pad: String) {
append("$pad\"${column.name()}\" to columnOf(")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's safer to include the type in columnOf explicitly :) It solves a lot of the .toLiteral() cases already, like columnOf<Long>(1, 2, 3), but even ones you don't yet catch, like columnOf<Short>(1, 2, 3)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add Short and other literals, would prefer to avoid redundant type parameter so it's easy to copy-paste as is

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we can somehow guarantee that the provided literals/class initializations are always the expected type, it will be fine. Maybe we could also parameterize this option if we actually want to promote this method to the public API?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll setup extensive testing with Kotest https://kotest.io/docs/proptest/property-based-testing.html and Jupyter Kernel to execute generated code and compare resulting instance to original + compilation

append("),\n")
}

private fun Any?.toLiteral(): String =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can supply a KType to the function, which saves instance checks :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think instance check is very cheap on JVM and we have a smartcast as a result

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you prefer cheap or free? ;P but yeah, the smart casts are nice

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How free? :o Maybe i don't get the idea

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep instance checks, i added a lot of branches where smart casts are useful

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's free because you get the values from a DataColumn where the KType is already known. So if we would supply this KType to toLiteral() it's simply an equality check instead of an instance check.

is Char -> "'$this'"
is Long -> "${this}L"
is Float -> "${this}f"
else -> toString()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lists might be another nice (and commonly used) addition. Maybe maps?

Actually, that does point out a limitation of the current implementation; it's not extensible and easy to break with unsupported types.

Maybe we could add a lambda to this function so people can supply their own 'toCode renderer'. Something like

typealias ValueToCodeRenderer = (value: Any?, type: KType) -> CodeString
val toStringRenderer: ValueToCodeRenderer = { value, _ -> value.toString() }

fun DataFrame<*>.toCode(customRenderer: ValueToCodeRenderer = toStringRenderer): CodeString

wdyt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List and Map seem reasonable by default. Code renderer too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants