Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 44 additions & 0 deletions core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/Utils.kt
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
package org.jetbrains.kotlinx.dataframe

import org.jetbrains.kotlinx.dataframe.api.getColumns
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.schema
import org.jetbrains.kotlinx.dataframe.columns.ColumnGroup
import org.jetbrains.kotlinx.dataframe.io.renderToString
import org.jetbrains.kotlinx.dataframe.types.UtilTests
import java.net.URL
Expand All @@ -28,3 +30,45 @@ fun <T : DataFrame<*>> T.alsoDebug(println: String? = null, rowsLimit: Int = 20)
print(borders = true, title = true, columnTypes = true, valueLimit = -1, rowsLimit = rowsLimit)
schema().print()
}

fun DataFrame<*>.toCode(variableName: String = "df"): String =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Could this return a CodeString? similar to generateInterfaces() etc.
  • I was a bit worried about the name, but now I look in the autocomplete; among toHtml(), toList(), toJson(), "toCode()" is not that cluttering or weird :) (actually, it's the generateX() functions that stand out...)
  • I do think that the val %variableName = in the beginning should be omitted. We probably can't assume that people who want to use it will use it solely as an assignment. Maybe they want to use the generated dataframe-creating code to be an expression, like:
exec("val list = listOf(${df.toCode()})")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this return a CodeString? similar to generateInterfaces() etc.

I think yes. I'd also consider CodeString.copyToClipboard()

I do think that the val %variableName = in the beginning should be omitted.

What if nullable parameter: variableName: String?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea for both :)

buildString {
append("val $variableName = dataFrameOf(\n")
appendColumns(this@toCode, indent = 1)
append(")")
}

private fun StringBuilder.appendColumns(df: DataFrame<*>, indent: Int) {
val pad = " ".repeat(indent)
df.getColumns { colsAtAnyDepth().simplify() }.forEach { col ->
if (col is ColumnGroup<*>) {
append("$pad\"${col.name()}\" to columnOf(\n")
appendColumns(col, indent + 1)
append("$pad),\n")
} else {
appendColumn(col, pad)
}
}
}

private fun StringBuilder.appendColumn(column: DataColumn<Any?>, pad: String) {
append("$pad\"${column.name()}\" to columnOf(")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's safer to include the type in columnOf explicitly :) It solves a lot of the .toLiteral() cases already, like columnOf<Long>(1, 2, 3), but even ones you don't yet catch, like columnOf<Short>(1, 2, 3)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add Short and other literals, would prefer to avoid redundant type parameter so it's easy to copy-paste as is

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we can somehow guarantee that the provided literals/class initializations are always the expected type, it will be fine. Maybe we could also parameterize this option if we actually want to promote this method to the public API?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll setup extensive testing with Kotest https://kotest.io/docs/proptest/property-based-testing.html and Jupyter Kernel to execute generated code and compare resulting instance to original + compilation

append(column.values().joinToString(", ") { it.toLiteral() })
append("),\n")
}

private fun Any?.toLiteral(): String =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can supply a KType to the function, which saves instance checks :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think instance check is very cheap on JVM and we have a smartcast as a result

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you prefer cheap or free? ;P but yeah, the smart casts are nice

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How free? :o Maybe i don't get the idea

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep instance checks, i added a lot of branches where smart casts are useful

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's free because you get the values from a DataColumn where the KType is already known. So if we would supply this KType to toLiteral() it's simply an equality check instead of an instance check.

when (this) {
null -> "null"
is String -> "\"${escape()}\""
is Char -> "'$this'"
is Long -> "${this}L"
is Float -> "${this}f"
else -> toString()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lists might be another nice (and commonly used) addition. Maybe maps?

Actually, that does point out a limitation of the current implementation; it's not extensible and easy to break with unsupported types.

Maybe we could add a lambda to this function so people can supply their own 'toCode renderer'. Something like

typealias ValueToCodeRenderer = (value: Any?, type: KType) -> CodeString
val toStringRenderer: ValueToCodeRenderer = { value, _ -> value.toString() }

fun DataFrame<*>.toCode(customRenderer: ValueToCodeRenderer = toStringRenderer): CodeString

wdyt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List and Map seem reasonable by default. Code renderer too

}

private fun String.escape() =
replace("\\", "\\\\")
.replace("\"", "\\\"")
.replace("\n", "\\n")
.replace("\t", "\\t")
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@ import io.kotest.matchers.shouldBe
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.DataRow
import org.jetbrains.kotlinx.dataframe.api.columnOf
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.group
import org.jetbrains.kotlinx.dataframe.api.into
import org.jetbrains.kotlinx.dataframe.documentation.UnifyingNumbers
import org.jetbrains.kotlinx.dataframe.impl.asArrayAsListOrNull
import org.jetbrains.kotlinx.dataframe.impl.commonParent
Expand All @@ -17,6 +20,7 @@ import org.jetbrains.kotlinx.dataframe.impl.isArray
import org.jetbrains.kotlinx.dataframe.impl.isPrimitiveArray
import org.jetbrains.kotlinx.dataframe.impl.nothingType
import org.jetbrains.kotlinx.dataframe.impl.replaceGenericTypeParametersWithUpperbound
import org.jetbrains.kotlinx.dataframe.toCode
import org.junit.Test
import java.io.Serializable
import java.math.BigDecimal
Expand All @@ -25,6 +29,7 @@ import kotlin.reflect.KClass
import kotlin.reflect.KType
import kotlin.reflect.typeOf

@Suppress("ktlint:standard:argument-list-wrapping")
class UtilTests {

@OptIn(ExperimentalUnsignedTypes::class)
Expand Down Expand Up @@ -462,4 +467,31 @@ class UtilTests {
// Edge case with null
getUnifiedNumberClassOrNull(null, Int::class) shouldBe Int::class
}

@Test
fun `get dataFrameOf constructor`() {
val df = dataFrameOf("firstName", "lastName", "age", "city", "weight", "isHappy")(
"Alice", "Cooper", 15, "London", 54, true,
"Bob", "Dylan", 45, "Dubai", 87, true,
"Charlie", "Daniels", 20, "Moscow", null, false,
"Charlie", "Chaplin", 40, "Milan", null, true,
"Bob", "Marley", 30, "Tokyo", 68, true,
"Alice", "Wolf", 20, null, 55, false,
"Charlie", "Byrd", 30, "Moscow", 90, true,
).group("firstName", "lastName").into("name")

df.toCode() shouldBe
"""
val df = dataFrameOf(
"name" to columnOf(
"firstName" to columnOf("Alice", "Bob", "Charlie", "Charlie", "Bob", "Alice", "Charlie"),
"lastName" to columnOf("Cooper", "Dylan", "Daniels", "Chaplin", "Marley", "Wolf", "Byrd"),
),
"age" to columnOf(15, 45, 20, 40, 30, 20, 30),
"city" to columnOf("London", "Dubai", "Moscow", "Milan", "Tokyo", null, "Moscow"),
"weight" to columnOf(54, 87, null, null, 68, 55, 90),
"isHappy" to columnOf(true, true, false, true, true, false, true),
)
""".trimIndent()
}
}