Commit 7d2cf42

LogicalRelation Leaf Logical Operator et al.

1 parent 50429c5 commit 7d2cf42

4 files changed: 86 additions & 35 deletions

docs/logical-analysis-rules/ResolveDataSource.md

Lines changed: 7 additions & 0 deletions

@@ -0,0 +1,7 @@
+---
+title: ResolveDataSource
+---
+
+# ResolveDataSource Logical Analysis Rule
+
+`ResolveDataSource` is...FIXME

docs/logical-operators/ExposesMetadataColumns.md

Lines changed: 4 additions & 3 deletions
@@ -4,11 +4,11 @@ title: ExposesMetadataColumns
 
 # ExposesMetadataColumns Logical Operators
 
-`ExposesMetadataColumns` is an [extension](#contract) of the [LogicalPlan](LogicalPlan.md) abstraction for [logical operators](#implementations) that can [withMetadataColumns](#withMetadataColumns).
+`ExposesMetadataColumns` is an [extension](#contract) of the [LogicalPlan](LogicalPlan.md) abstraction for [logical operators](#implementations) that can [add extra metadata columns to output columns](#withMetadataColumns).
 
 ## Contract
 
-### <span id="withMetadataColumns"> withMetadataColumns
+### Add Metadata Columns to Output Columns { #withMetadataColumns }
 
 ```scala
 withMetadataColumns(): LogicalPlan
@@ -18,6 +18,7 @@ See:
 
 * [DataSourceV2Relation](DataSourceV2Relation.md#withMetadataColumns)
 * [LogicalRelation](LogicalRelation.md#withMetadataColumns)
+* `StreamingRelationV2` ([Spark Structured Streaming]({{ book.structured_streaming }}/logical-operators/StreamingRelationV2/#withMetadataColumns))
 
 Used when:
 
@@ -27,4 +28,4 @@ Used when:
 
 * [DataSourceV2Relation](DataSourceV2Relation.md)
 * [LogicalRelation](LogicalRelation.md)
-* `StreamingRelation` ([Spark Structured Streaming]({{ book.structured_streaming }}/logical-operators/StreamingRelation))
+* `StreamingRelationV2` ([Spark Structured Streaming]({{ book.structured_streaming }}/logical-operators/StreamingRelationV2))
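The `withMetadataColumns` contract in this file's diff can be sketched in plain Scala. This is a toy model with simplified stand-ins for Spark's `LogicalPlan` and `Attribute` types (the names mirror Spark SQL, but none of this is Spark's actual implementation):

```scala
// Toy model of the ExposesMetadataColumns contract (simplified stand-ins,
// not Spark's actual classes).
case class Attribute(name: String, isMetadata: Boolean = false)

trait LogicalPlan {
  def output: Seq[Attribute]
}

trait ExposesMetadataColumns extends LogicalPlan {
  // Contract: return a copy of this operator with extra metadata columns
  // added to the output columns
  def withMetadataColumns(): LogicalPlan
}

// A relation that adds its metadata columns only when there are any
case class Relation(
    output: Seq[Attribute],
    metadataOutput: Seq[Attribute]) extends ExposesMetadataColumns {

  override def withMetadataColumns(): LogicalPlan =
    if (metadataOutput.isEmpty) this
    else copy(output = output ++ metadataOutput)
}

val r = Relation(
  output = Seq(Attribute("id"), Attribute("name")),
  metadataOutput = Seq(Attribute("_metadata", isMetadata = true)))
println(r.withMetadataColumns().output.map(_.name))
// List(id, name, _metadata)
```

This mirrors the shape of the contract only; the real implementations ([DataSourceV2Relation](DataSourceV2Relation.md#withMetadataColumns), [LogicalRelation](LogicalRelation.md#withMetadataColumns)) decide per data source which metadata columns exist.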

docs/logical-operators/LogicalPlan.md

Lines changed: 1 addition & 1 deletion
@@ -208,7 +208,7 @@ resolved: Boolean
 ??? note "Lazy Value"
     `resolved` is a Scala **lazy value** to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
 
-## Metadata Output Attributes { #metadataOutput }
+## Metadata Output Columns { #metadataOutput }
 
 ```scala
 metadataOutput: Seq[Attribute]
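The "Lazy Value" guarantee quoted in the diff's context lines can be demonstrated with plain Scala (a hypothetical `Plan` class, not Spark code):

```scala
// Demonstrates the Scala lazy-val guarantee the note refers to: the
// initializer runs once, on first access, and the value is cached afterwards.
class Plan {
  var initCount = 0
  lazy val resolved: Boolean = {
    initCount += 1 // observable side effect of initialization
    true
  }
}

val plan = new Plan
println(plan.initCount) // 0: not initialized until first access
plan.resolved
plan.resolved
println(plan.initCount) // 1: initializer ran exactly once
```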

docs/logical-operators/LogicalRelation.md

Lines changed: 74 additions & 31 deletions
@@ -6,6 +6,8 @@ title: LogicalRelation
 
 `LogicalRelation` is a [leaf logical operator](LeafNode.md) that represents a [BaseRelation](#relation) in a [logical query plan](LogicalPlan.md).
 
+`LogicalRelation` is an [ExposesMetadataColumns](ExposesMetadataColumns.md).
+
 `LogicalRelation` is a [MultiInstanceRelation](MultiInstanceRelation.md).
 
 ## Creating Instance
@@ -17,9 +19,9 @@
 * <span id="catalogTable"> Optional [CatalogTable](../CatalogTable.md)
 * <span id="isStreaming"> `isStreaming` flag
 
-`LogicalRelation` is created using [apply](#apply) factory.
+`LogicalRelation` is created using [apply](#apply) utility.
 
-## <span id="apply"> apply Utility
+## Create LogicalRelation { #apply }
 
 ```scala
 apply(
@@ -41,44 +43,51 @@ val baseRelation: BaseRelation = ???
 val data = spark.baseRelationToDataFrame(baseRelation)
 ```
 
+---
+
 `apply` is used when:
 
-* `SparkSession` is requested for a [DataFrame for a BaseRelation](../SparkSession.md#baseRelationToDataFrame)
 * [CreateTempViewUsing](CreateTempViewUsing.md) command is executed
 * `FallBackFileSourceV2` logical resolution rule is executed
-* [ResolveSQLOnFile](../logical-analysis-rules/ResolveSQLOnFile.md) and [FindDataSourceTable](../logical-analysis-rules/FindDataSourceTable.md) logical evaluation rules are executed
+* `FileStreamSource` ([Spark Structured Streaming]({{ book.structured_streaming }}/datasources/file/FileStreamSource/#getBatch)) is requested to `getBatch`
 * `HiveMetastoreCatalog` is requested to [convert a HiveTableRelation](../hive/HiveMetastoreCatalog.md#convertToLogicalRelation)
-* `FileStreamSource` ([Spark Structured Streaming]({{ book.structured_streaming }}/connectors/file/FileStreamSource/)) is requested to `getBatch`
+* [ResolveDataSource](../logical-analysis-rules/ResolveDataSource.md) logical analysis rule is executed (to [resolve a V1BatchSource](../logical-analysis-rules/ResolveDataSource.md#loadV1BatchSource))
+* [ResolveSQLOnFile](../logical-analysis-rules/ResolveSQLOnFile.md) and [FindDataSourceTable](../logical-analysis-rules/FindDataSourceTable.md) logical evaluation rules are executed
+* `SparkSession` is requested for a [DataFrame for a BaseRelation](../SparkSession.md#baseRelationToDataFrame)
 
-## <span id="refresh"> refresh
+## Refresh (Files of HadoopFsRelation) { #refresh }
 
-```scala
-refresh(): Unit
-```
+??? note "LogicalPlan"
+
+    ```scala
+    refresh(): Unit
+    ```
 
-`refresh` is part of [LogicalPlan](LogicalPlan.md#refresh) abstraction.
+    `refresh` is part of [LogicalPlan](LogicalPlan.md#refresh) abstraction.
 
 `refresh` requests the [FileIndex](../files/HadoopFsRelation.md#location) (of the [HadoopFsRelation](#relation)) to refresh.
 
-!!! note
+??? note "HadoopFsRelation Supported Only"
     `refresh` does the work for [HadoopFsRelation](../files/HadoopFsRelation.md) relations only.
 
-## <span id="simpleString"> Simple Text Representation
+## Simple Text Representation { #simpleString }
 
-```scala
-simpleString(
-  maxFields: Int): String
-```
+??? note "QueryPlan"
 
-`simpleString` is part of the [QueryPlan](../catalyst/QueryPlan.md#simpleString) abstraction.
+    ```scala
+    simpleString(
+      maxFields: Int): String
+    ```
+
+    `simpleString` is part of the [QueryPlan](../catalyst/QueryPlan.md#simpleString) abstraction.
 
 `simpleString` is made up of the [output schema](#output) (truncated to `maxFields`) and the [relation](#relation):
 
 ```text
 Relation[[output]] [relation]
 ```
 
-### <span id="simpleString-demo"> Demo
+### Demo { #simpleString-demo }
 
 ```text
 val q = spark.read.text("README.md")
@@ -88,22 +97,56 @@ scala> println(logicalPlan.simpleString)
 Relation[value#2] text
 ```
 
-## <span id="computeStats"> computeStats
+## Statistics { #computeStats }
 
-```scala
-computeStats(): Statistics
-```
+??? note "LeafNode"
 
-`computeStats` is part of the [LeafNode](LeafNode.md#computeStats) abstraction.
+    ```scala
+    computeStats(): Statistics
+    ```
 
----
+    `computeStats` is part of the [LeafNode](LeafNode.md#computeStats) abstraction.
 
 `computeStats` takes the optional [CatalogTable](#catalogTable).
 
 If available, `computeStats` requests the `CatalogTable` for the [CatalogStatistics](../CatalogTable.md#stats) that, if available, is requested to [toPlanStats](#toPlanStats) (with the `planStatsEnabled` flag enabled when either [spark.sql.cbo.enabled](../SQLConf.md#cboEnabled) or [spark.sql.cbo.planStats.enabled](../SQLConf.md#planStatsEnabled) is enabled).
 
 Otherwise, `computeStats` creates a [Statistics](../cost-based-optimization/Statistics.md) with the `sizeInBytes` only to be the [sizeInBytes](../BaseRelation.md#sizeInBytes) of the [BaseRelation](#relation).
 
+## Metadata Output Columns { #metadataOutput }
+
+??? note "LogicalPlan"
+
+    ```scala
+    metadataOutput: Seq[AttributeReference]
+    ```
+
+    `metadataOutput` is part of the [LogicalPlan](LogicalPlan.md#metadataOutput) abstraction.
+
+`metadataOutput` checks whether this [BaseRelation](#relation) is a [HadoopFsRelation](../files/HadoopFsRelation.md).
+If so, `metadataOutput` requests the [FileFormat](../files/HadoopFsRelation.md#fileFormat) (of this [BaseRelation](#relation)) for [metadata columns](../files/FileFormat.md#createFileMetadataCol).
+
+Otherwise, `metadataOutput` returns no metadata columns (`Nil`).
+
+??? note "Lazy Value"
+    `metadataOutput` is a Scala **lazy value** to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
+
+    Learn more in the [Scala Language Specification]({{ scala.spec }}/05-classes-and-objects.html#lazy).
+
+## Add Metadata Columns to Output Columns { #withMetadataColumns }
+
+??? note "ExposesMetadataColumns"
+
+    ```scala
+    withMetadataColumns(): LogicalRelation
+    ```
+
+    `withMetadataColumns` is part of the [ExposesMetadataColumns](ExposesMetadataColumns.md#withMetadataColumns) abstraction.
+
+`withMetadataColumns` determines whether there are any extra [metadata columns](#metadataOutput) to be added to the [output columns](#output).
+
+If so, `withMetadataColumns` creates a new `LogicalRelation` with the extra [metadata columns](#metadataOutput) added. Otherwise, `withMetadataColumns` does nothing.
+
 ## Demo
 
 The following are two logically-equivalent batch queries described using different Spark APIs: Scala and SQL.
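The `computeStats` fallback described in the hunk above (catalog statistics if available, otherwise the relation's size) can be sketched with simplified stand-ins for Spark's `CatalogTable`, `CatalogStatistics` and `Statistics` types. Note that the real `toPlanStats` also takes the output attributes and the `planStatsEnabled` flag, which this sketch omits:

```scala
// Sketch of computeStats' fallback logic (simplified stand-ins, not
// Spark's actual types or signatures).
case class Statistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

case class CatalogStatistics(sizeInBytes: BigInt, rowCount: Option[BigInt]) {
  def toPlanStats: Statistics = Statistics(sizeInBytes, rowCount)
}

case class CatalogTable(stats: Option[CatalogStatistics])

def computeStats(
    catalogTable: Option[CatalogTable],
    relationSizeInBytes: BigInt): Statistics =
  catalogTable.flatMap(_.stats) match {
    // Catalog statistics available: convert them to plan statistics
    case Some(catalogStats) => catalogStats.toPlanStats
    // Otherwise fall back to the BaseRelation's sizeInBytes only
    case None => Statistics(sizeInBytes = relationSizeInBytes)
  }

println(computeStats(None, relationSizeInBytes = BigInt(1024)))
// Statistics(1024,None)
```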
@@ -114,23 +157,23 @@ val path = "../datasets/people.csv"
 ```
 
 ```scala
-val q = spark
+val loadQuery = spark
   .read
-  .option("header", true)
   .format(format)
+  .option("header", true)
   .load(path)
 ```
 
 ```text
-scala> println(q.queryExecution.logical.numberedTreeString)
-00 Relation[id#16,name#17] csv
+scala> println(loadQuery.queryExecution.logical.numberedTreeString)
+00 UnresolvedDataSource format: csv, isStreaming: false, paths: 1 provided
```
 
 ```scala
-val q = sql(s"select * from `$format`.`$path`")
+val selectQuery = sql(s"select * from `$format`.`$path`")
 ```
 
 ```text
-scala> println(q.queryExecution.optimizedPlan.numberedTreeString)
-00 Relation[_c0#74,_c1#75] csv
+scala> println(selectQuery.queryExecution.optimizedPlan.numberedTreeString)
+00 Relation [_c0#75,_c1#76] csv
 ```
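The `Relation[[output]] [relation]` format shown in the Simple Text Representation section can be sketched as a standalone function. The truncation style is an assumption for illustration, not Spark's exact `truncatedString` implementation:

```scala
// Toy rendering of LogicalRelation.simpleString: output schema truncated
// to maxFields, followed by the relation (stand-in strings, not Spark types).
def simpleString(output: Seq[String], relation: String, maxFields: Int): String = {
  val truncated =
    if (output.length > maxFields)
      output.take(maxFields - 1) :+ s"... ${output.length - maxFields + 1} more fields"
    else output
  s"Relation[${truncated.mkString(",")}] $relation"
}

println(simpleString(Seq("value#2"), relation = "text", maxFields = 25))
// Relation[value#2] text
```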
