[SPARK-52012][CORE][SQL] Restore IDE Index with type annotations #50798
Conversation
@yaooqinn Thank you for the effort you've invested in resolving this issue. I've also been struggling with this problem for quite a while. After this change, has the issue been fully resolved, or have we only achieved partial mitigation? Also cc @hvanhovell, do you have a more effective solution to address this issue? The core obstacle stems from IntelliJ's inability to properly distinguish and utilize class/object definitions with identical names between the
Technically, it covers all cases. However, I haven't applied it everywhere yet due to the huge LOC involved.
Local verification results are as follows:
It is uncertain whether the above conclusions hold in other developers' IDE environments. To ensure robustness, it would be prudent to involve additional contributors in verifying the actual outcomes.
lazy val fileConstantMetadataColumns: Seq[AttributeReference] = output.collect {
  // Collect metadata columns to be handled outside of the scan by appending constant columns.
  case FileSourceConstantMetadataAttribute(attr) => attr
}

override def vectorTypes: Option[Seq[String]] =

protected def sessionState: SessionState = relation.sparkSession.sessionState
While this approach may resolve the immediate issue, it could incur long-term development costs: validating whether newly added code triggers "red highlighting" in the IDE for every PR submission would be impractical, and this might lead to a proliferation of similar follow-up tasks or recurring issues.
We can add a style check later.
If the long-term maintenance cost remains low, I'm +1 on this.
@@ -44,8 +45,9 @@ private[sql] class UDTFRegistration private[sql] (tableFunctionRegistry: TableFu
       | udfDeterministic: ${udtf.udfDeterministic}
    """.stripMargin)

val sessionState: SessionState = SparkSession.getActiveSession.get.sessionState
what's the actual change here? the type annotation?
Yes. Honestly, I have no clue why explicit type annotations work for the IDE; I just tried them and saw that they helped.
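To make the shape of the change concrete, here is a minimal sketch based on the UDTFRegistration hunk above (the comments are my reading of the symptom, not a confirmed explanation of IntelliJ's internals):

```scala
// Before: the result type is inferred. With identically named shim
// classes on the classpath, IntelliJ's resolver can pick the wrong
// definition, leaving downstream references highlighted in red.
val sessionState = SparkSession.getActiveSession.get.sessionState

// After: the explicit annotation pins the type for the IDE's indexer.
// The compiled bytecode is the same either way; only inference is bypassed.
val sessionState: SessionState = SparkSession.getActiveSession.get.sessionState
```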
Sorry, but I'm a little reluctant to bring SparkContext back into the code (even internal usages), @yaooqinn.

val sc: SparkContext = spark.sparkContext
val broadcastedConf = sc.broadcast(new SerializableConfiguration(hadoopConf))

Do you think you can propose this without SparkContext-related changes?
Hi @dongjoon-hyun, do you have any suggestions to make it work without SparkContext-related changes?
Technically, I don't have a suggestion to make it work. However, I can see this PR could have an undesirable side effect not only for now but forever, because this PR seems unintentionally to advert
BTW, I'll leave the final decision to you and the other reviewers, @yaooqinn.
I think we need to first understand why explicit type annotation works. The current approach looks a bit hacky to me.
@dongjoon-hyun, I will make the following change to address the main SparkContext calls, and restore the rest to the state before this PR.
Hi @cloud-fan, I don't think the approach here is hacky, because type annotations for declarations are a common language feature. The hacky part might belong to IDEA itself, in how it understands implicit definitions in the shims approach we applied in Spark.
@@ -20,7 +20,9 @@ import java.io.{ObjectInputStream, ObjectOutputStream}

import org.apache.hadoop.conf.Configuration

import org.apache.spark.SparkContext
Except for changes in the hive-thriftserver module, this is the only place doing import SparkContext now, and it's in Spark core.
Thank you, @yaooqinn.
@@ -53,16 +53,14 @@ case class ParquetScanBuilder(
  override protected val supportsNestedSchemaPruning: Boolean = true

  override def pushDataFilters(dataFilters: Array[Filter]): Array[Filter] = {
    val sqlConf = sparkSession.sessionState.conf
    if (sqlConf.parquetFilterPushDown) {
The original code looks okay to me. Could you confirm that this part also had broken type annotations before this PR? Or is this simply revised to reuse conf instead of sqlConf at line 56?
This PR has multiple types of improvements. Although I understand these were all changed to serve the same IDE Type Annotations goal, if you don't mind, I'd like to propose splitting this PR into more narrowly scoped ones to be safer. For example:
- A PR to change ClassLoader usage (withClassLoader):

  - // Always use the latest class loader provided by executionHive's state.
  - val executionHiveClassLoader = session.sharedState.jarClassLoader
  - Thread.currentThread().setContextClassLoader(executionHiveClassLoader)

  protected def withClassLoader(f: => Unit): Unit = {
    val executionHiveClassLoader = session.sharedState.asInstanceOf[SharedState].jarClassLoader
    Thread.currentThread().setContextClassLoader(executionHiveClassLoader)
    f
  }

- A PR to add new methods and use them:

  final protected def sessionState: SessionState = session.sessionState
  final protected def sparkContext: SparkContext = session.sparkContext
  private def sparkSessionState: SparkSessionState = SparkSQLEnv.sparkSession.sessionState

- A PR to change variable creation to reuse existing variables.

- A PR for explicit type casting with asInstanceOf.

WDYT?
@@ -52,7 +52,7 @@ private[sql] object AvroUtils extends Logging {
      spark: SparkSession,
      options: Map[String, String],
      files: Seq[FileStatus]): Option[StructType] = {
-   val conf = spark.sessionState.newHadoopConfWithOptions(options)
+   val conf = spark.sessionState.asInstanceOf[SessionState].newHadoopConfWithOptions(options)
Changes like this are not very reasonable. They are only needed to work around the current IntelliJ limitation of not recognizing the dependency exclusion.
I'm a bit hesitant to merge this PR. If we need a temporary fix for IntelliJ support, shall we use Maven profiles, so that the default profile does not include connect-shim?
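For illustration, such a profile might look roughly like the following in the relevant pom.xml (an untested sketch; the profile id, artifactId pattern, and placement are assumptions on my part, not taken from the actual build):

```xml
<profiles>
  <!-- Opt-in profile: the connect-shims dependency is only added when
       this profile is activated (e.g. with -Pconnect-shims), so a
       default IDE import would not see the shimmed duplicate
       definitions at all. -->
  <profile>
    <id>connect-shims</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-connect-shims_${scala.binary.version}</artifactId>
        <version>${project.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

One caveat with any variant of this idea: modules that compile against the shimmed definitions would need real replacements in the default profile, otherwise a plain import would fail to compile.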
This can be experimentally tested in IntelliJ by manually ignoring the connect-shims module. Doing so causes the sql-api module to be marked with compilation errors; for instance, the definitions of JavaRDD and RDD cannot be found when working with classes like org.apache.spark.sql.DataFrameReader.
@yaooqinn @cloud-fan @dongjoon-hyun I've noticed that the For example, the I'm currently testing this idea in #50815
Thank you, @LuciferYang.
Thank you, @dongjoon-hyun and @cloud-fan. I would like to put this pull request on hold and wait for @LuciferYang's #50815.
Currently, we have not shaded the
Therefore, although the tests for #50815 may pass, I still recommend maintaining the current definitions in
Thank you for the input, @LuciferYang.
What changes were proposed in this pull request?
Restore IDE Index with type annotations
Why are the changes needed?
Restore IDE indexing for code navigation, to improve the Spark contribution experience.
Does this PR introduce any user-facing change?
No, this is a dev-only change.
How was this patch tested?
GA
Was this patch authored or co-authored using generative AI tooling?
No.