
Commit 48e90c3

Merge branch 'apache:master' into fix/commons-beanutils_exclude
2 parents 6d3d3d5 + 9773cc8 commit 48e90c3

File tree: 9 files changed, +242 −4 lines changed

docs/content.zh/docs/deployment/advanced/job_status_listener.md

Lines changed: 82 additions & 0 deletions

@@ -0,0 +1,82 @@
---
title: "Job Status Changed Listener"
nav-title: job-status-listener
nav-parent_id: advanced
nav-pos: 5
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

## Job Status Changed Listener

Flink provides a pluggable interface for users to register custom logic that handles job status changes, in which lineage information about sources and sinks is provided. This enables users to implement their own Flink lineage reporter that sends lineage information to third-party data lineage systems such as DataHub and OpenLineage.

The job status changed listener is triggered every time a status change happens for the application. The data lineage information is included in the JobCreatedEvent.

### Implement a plugin for your custom job status changed listener

To implement a custom JobStatusChangedListener plugin, you need to:

- Add your own JobStatusChangedListener by implementing the {{< gh_link file="/flink-core/src/main/java/org/apache/flink/core/execution/JobStatusChangedListener.java" name="JobStatusChangedListener" >}} interface.

- Add your own JobStatusChangedListenerFactory by implementing the {{< gh_link file="/flink-core/src/main/java/org/apache/flink/core/execution/JobStatusChangedListenerFactory.java" name="JobStatusChangedListenerFactory" >}} interface.

- Add a Java service entry: create a file `META-INF/services/org.apache.flink.core.execution.JobStatusChangedListenerFactory` that contains the class name of your job status changed listener factory (see the [Java Service Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html) docs for more details); a sketch of this file's content is shown right after this list.
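
For illustration, the service file is just a plain-text list of fully qualified factory class names, one per line. A minimal sketch using the factory class from the examples below:

```
org.apache.flink.test.execution.TestingJobStatusChangedListenerFactory
```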

Then, create a JAR that includes your `JobStatusChangedListener`, `JobStatusChangedListenerFactory`, `META-INF/services/`, and all external dependencies.
Create a directory under `plugins/` in your Flink distribution with an arbitrary name, e.g. "job-status-changed-listener", and put the JAR into this directory.
See [Flink Plugin]({{< ref "docs/deployment/filesystems/plugins" >}}) for more details.
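
For example, the resulting layout might look as follows (the JAR name here is hypothetical):

```
flink-dist/
└── plugins/
    └── job-status-changed-listener/
        └── my-job-status-changed-listener.jar
```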

JobStatusChangedListenerFactory example:

``` java
package org.apache.flink.test.execution;

import org.apache.flink.core.execution.JobStatusChangedListener;
import org.apache.flink.core.execution.JobStatusChangedListenerFactory;

public class TestingJobStatusChangedListenerFactory
        implements JobStatusChangedListenerFactory {

    @Override
    public JobStatusChangedListener createListener(Context context) {
        return new TestingJobStatusChangedListener();
    }
}
```

JobStatusChangedListener example:

``` java
package org.apache.flink.test.execution;

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.core.execution.JobStatusChangedEvent;
import org.apache.flink.core.execution.JobStatusChangedListener;

public class TestingJobStatusChangedListener implements JobStatusChangedListener {

    // Collects the received events; a real listener would report them
    // to an external system instead.
    private final List<JobStatusChangedEvent> statusChangedEvents = new ArrayList<>();

    @Override
    public void onEvent(JobStatusChangedEvent event) {
        statusChangedEvents.add(event);
    }
}
```

### Configuration

Flink components load JobStatusChangedListener plugins at startup. To make sure all implementations of JobStatusChangedListener are loaded, all class names should be defined in [execution.job-status-changed-listeners]({{< ref "docs/deployment/config#execution.job-status-changed-listeners" >}}).
If this configuration is empty, no listener will be started. For example:

```
execution.job-status-changed-listeners = org.apache.flink.test.execution.TestingJobStatusChangedListenerFactory
```

{{< top >}}

docs/content.zh/docs/internals/data_lineage.md

Lines changed: 56 additions & 0 deletions

@@ -0,0 +1,56 @@
---
title: Data Lineage
weight: 12
type: docs
aliases:
- /zh/internals/data_lineage.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Native Lineage Support

Data lineage is becoming increasingly important in the data ecosystem. As Apache Flink is widely used for data ingestion and ETL in streaming data lakes, we need an end-to-end lineage solution for scenarios including but not limited to:
- `Data Quality Assurance`: Identifying and rectifying data inconsistencies by tracing data errors back to their origin within the data pipeline.
- `Data Governance`: Establishing clear data ownership and accountability by documenting data origins and transformations.
- `Data Compliance`: Ensuring adherence to data privacy and compliance regulations by tracking data flow and transformations throughout its lifecycle.
- `Data Optimization`: Identifying redundant data processing steps and optimizing data flows to improve efficiency.

To meet this community need, Apache Flink provides native lineage support: an internal lineage data model and the [Job Status Listener]({{< ref "docs/deployment/advanced/job_status_listener" >}}) that developers can use to integrate lineage metadata into external systems such as [OpenLineage](https://openlineage.io).
When a job is created in the Flink runtime, a JobCreatedEvent containing the lineage graph metadata is sent to the job status listeners.

# Lineage Data Model

Flink's native lineage interfaces are defined in two layers. The first layer is the generic interface for all Flink jobs and connectors, and the second layer defines extended interfaces for Table and DataStream separately. The relationships between the interfaces and classes are shown in the diagram below.

{{< img src="/fig/lineage_interfaces.png" alt="Lineage Data Model" width="80%">}}

By default, Table-related lineage interfaces and classes are used inside the Flink Table runtime, so Flink users do not need to touch them. The Flink community will gradually support all
common connectors, such as Kafka, JDBC, Cassandra, and Hive. If you have defined a custom connector, your custom source/sink needs to implement the LineageVertexProvider interface.
Within a LineageVertex, a list of Lineage Datasets is defined as the metadata for a Flink source/sink.

```java
@PublicEvolving
public interface LineageVertexProvider {
    LineageVertex getLineageVertex();
}
```
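
For illustration only, a custom source might expose its lineage metadata roughly as follows. This is a minimal sketch assuming the FLIP-314 shapes of `LineageVertex#datasets()` and `LineageDataset`; the package names, the class `MyCustomSource`, and the dataset values are hypothetical, and the rest of the source implementation is omitted:

```java
package com.example.connector; // hypothetical package

import java.util.Collections;
import java.util.Map;

// The lineage interfaces' package location follows FLIP-314 and may
// differ between Flink versions.
import org.apache.flink.streaming.api.lineage.LineageDataset;
import org.apache.flink.streaming.api.lineage.LineageDatasetFacet;
import org.apache.flink.streaming.api.lineage.LineageVertex;
import org.apache.flink.streaming.api.lineage.LineageVertexProvider;

public class MyCustomSource implements LineageVertexProvider {

    @Override
    public LineageVertex getLineageVertex() {
        LineageDataset dataset =
                new LineageDataset() {
                    @Override
                    public String name() {
                        return "my_topic"; // hypothetical dataset name
                    }

                    @Override
                    public String namespace() {
                        return "kafka://broker:9092"; // hypothetical storage location
                    }

                    @Override
                    public Map<String, LineageDatasetFacet> facets() {
                        return Collections.emptyMap(); // e.g. schema or config facets
                    }
                };
        // A LineageVertex simply exposes the datasets this source reads.
        return () -> Collections.singletonList(dataset);
    }
}
```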

For interface details, please refer to [FLIP-314](https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener).

{{< top >}}

docs/content/docs/deployment/advanced/job_status_listener.md

Lines changed: 3 additions & 1 deletion
@@ -28,7 +28,7 @@ This enables users to implement their own flink lineage reporter to send lineage
 
 The job status changed listeners are triggered every time status change happened for the application. The data lineage info is included in the JobCreatedEvent.
 
-### Implement a plugin for your custom enricher
+### Implement a plugin for Job status changed listener
 
 To implement a custom JobStatusChangedListener plugin, you need to:
 
@@ -79,3 +79,5 @@ Flink components loads JobStatusChangedListener plugins at startup. To make sure
 ```
 execution.job-status-changed-listeners = org.apache.flink.test.execution.TestingJobStatusChangedListenerFactory
 ```
+
+{{< top >}}

docs/content/docs/internals/data_lineage.md

Lines changed: 59 additions & 0 deletions

@@ -0,0 +1,59 @@
---
title: Data Lineage
weight: 12
type: docs
aliases:
- /internals/data_lineage.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Native Lineage Support

As organisations look to govern their data ecosystems, understanding data lineage, i.e. where data is coming from and going to, becomes critical. As Apache Flink is widely used for data ingestion and ETL in streaming data lakes, we need
an end-to-end lineage solution for scenarios including but not limited to:
- `Data Quality Assurance`: Identifying and rectifying data inconsistencies by tracing data errors back to their origin within the data pipeline.
- `Data Governance`: Establishing clear data ownership and accountability by documenting data origins and transformations.
- `Regulatory Compliance`: Ensuring adherence to data privacy and compliance regulations by tracking data flow and transformations throughout its lifecycle.
- `Data Optimization`: Identifying redundant data processing steps and optimizing data flows to improve efficiency.

Apache Flink provides native lineage support through an internal lineage data model and the [Job Status Listener]({{< ref "docs/deployment/advanced/job_status_listener" >}}) that
developers can use to integrate lineage metadata into external lineage systems, for example [OpenLineage](https://openlineage.io). When a job is created in the Flink runtime, the JobCreatedEvent
contains the lineage graph metadata and is sent to the job status listeners.
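
As a concrete illustration, a listener can pick the lineage graph off the creation event and hand it to an external reporter. The sketch below assumes the FLIP-314 event shape (a `JobCreatedEvent` exposing `lineageGraph()`); the package names, `MyLineageReporter`, and `reportToExternalSystem` are hypothetical:

```java
package com.example.lineage; // hypothetical package

// The event and lineage types' package locations follow FLIP-314 and may
// differ between Flink versions.
import org.apache.flink.core.execution.JobCreatedEvent;
import org.apache.flink.core.execution.JobStatusChangedEvent;
import org.apache.flink.core.execution.JobStatusChangedListener;
import org.apache.flink.streaming.api.lineage.LineageGraph;

public class MyLineageReporter implements JobStatusChangedListener {

    @Override
    public void onEvent(JobStatusChangedEvent event) {
        // Only the job-creation event carries the lineage graph.
        if (event instanceof JobCreatedEvent) {
            LineageGraph graph = ((JobCreatedEvent) event).lineageGraph();
            // Hypothetical hand-off, e.g. to an OpenLineage client.
            reportToExternalSystem(graph);
        }
    }

    private void reportToExternalSystem(LineageGraph graph) {
        // Map the graph's sources and sinks to the external system's model here.
    }
}
```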

# Lineage Data Model

Flink native lineage interfaces are defined in two layers. The first layer is the generic interface for all Flink jobs and connectors, and the second layer defines
the extended interfaces for Table and DataStream independently. The interface and class relationships are shown in the diagram below.

{{< img src="/fig/lineage_interfaces.png" alt="Lineage Data Model" width="80%">}}

By default, Table-related lineage interfaces and classes are used in the Flink Table environment, so Flink users do not need to touch these interfaces. The Flink community will gradually support all
of the common connectors, such as Kafka, JDBC, Cassandra, and Hive. If you have a customized connector, you need customized source/sink implementations of the LineageVertexProvider interface.
Within a LineageVertex, a list of Lineage Datasets is defined as metadata for the Flink source/sink.

```java
@PublicEvolving
public interface LineageVertexProvider {
    LineageVertex getLineageVertex();
}
```

For the interface details, please refer to [FLIP-314](https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener).

{{< top >}}

fig/lineage_interfaces.png (new image, 127 KB)

flink-python/pyflink/datastream/slot_sharing_group.py

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ def __lt__(self, other: 'MemorySize'):
         return self._j_memory_size.compareTo(other._j_memory_size) == -1
 
     def __le__(self, other: 'MemorySize'):
-        return self.__eq__(other) and self.__lt__(other)
+        return self.__eq__(other) or self.__lt__(other)
 
     def __str__(self):
         return self._j_memory_size.toString()

flink-python/pyflink/datastream/tests/test_slot_sharing_group.py

Lines changed: 15 additions & 0 deletions
@@ -64,3 +64,18 @@ def test_build_slot_sharing_group_without_all_required_config(self):
                 .set_cpu_cores(1.0) \
                 .set_task_off_heap_memory_mb(10) \
                 .build()
+
+
+class MemorySizeTests(PyFlinkTestCase):
+
+    def test_le_method(self):
+        """Test the __le__ method of MemorySize."""
+        m1 = MemorySize.of_mebi_bytes(100)
+        m2 = MemorySize.of_mebi_bytes(100)
+        m3 = MemorySize.of_mebi_bytes(200)
+        self.assertEqual(m1, m2)
+        self.assertTrue(m1 <= m2)
+        self.assertTrue(m2 <= m1)
+
+        self.assertTrue(m1 <= m3)
+        self.assertFalse(m3 <= m1)

flink-python/pyflink/datastream/tests/test_window.py

Lines changed: 25 additions & 1 deletion
@@ -33,7 +33,7 @@
 from pyflink.datastream.tests.test_util import DataStreamTestSinkFunction, \
     SecondColumnTimestampAssigner
 from pyflink.java_gateway import get_gateway
-from pyflink.testing.test_case_utils import PyFlinkStreamingTestCase
+from pyflink.testing.test_case_utils import PyFlinkStreamingTestCase, PyFlinkTestCase
 from pyflink.util.java_utils import get_j_env_configuration
 
 
@@ -668,3 +668,27 @@ def process(self,
                 context: 'ProcessAllWindowFunction.Context',
                 elements: Iterable[tuple]) -> Iterable[tuple]:
         return [(context.window().start, context.window().end, len([e for e in elements]))]
+
+
+class TestTimeWindow(PyFlinkTestCase):
+
+    def test_le_method(self):
+        """Test the __le__ method of TimeWindow."""
+        # Create test windows
+        w1 = TimeWindow(100, 200)
+        w2 = TimeWindow(100, 200)
+        w3 = TimeWindow(150, 250)
+        w4 = TimeWindow(50, 150)
+        w5 = TimeWindow(100, 180)
+
+        self.assertTrue(w1 <= w2)
+        self.assertTrue(w2 <= w1)
+
+        self.assertTrue(w1 <= w3)
+        self.assertFalse(w3 <= w1)
+
+        self.assertTrue(w4 <= w1)
+        self.assertFalse(w1 < w4)
+
+        self.assertTrue(w5 <= w1)
+        self.assertFalse(w1 <= w5)

flink-python/pyflink/datastream/window.py

Lines changed: 1 addition & 1 deletion
@@ -171,7 +171,7 @@ def __lt__(self, other: 'TimeWindow'):
         return self.start == other.start and self.end < other.end or self.start < other.start
 
     def __le__(self, other: 'TimeWindow'):
-        return self.__eq__(other) and self.__lt__(other)
+        return self.__eq__(other) or self.__lt__(other)
 
     def __repr__(self):
         return "TimeWindow(start={}, end={})".format(self.start, self.end)
