简介:
当客户希望梳理每个桌面应用下面有哪些页面时,如果完全依赖人工逐个点开查看,效率较低,也不利于后续复用。
这类需求的关键点在于:桌面应用内部的文件夹、页面和外链结构,通常保存在 web_portal.structure 中,因此更推荐通过“元数据库输入数据集 + ETL 解析结构 JSON”的方式,输出一张可复用的桌面应用页面明细表。
建议最终输出字段包括:
应用ID
应用
文件夹
页面ID
页面名称
页面类型
外链地址
涉及数据集:
本次方案建议使用 1 个元数据输入数据集和 1 个 ETL 输出数据集。
元数据输入数据集直接读取 web_portal,保留 structure 字段,供 ETL 后续解析。参考 SQL 如下:
SELECT
app_id,
name AS app_name,
app_type,
is_release,
parent_dir_id,
ctime,
utime,
structure
FROM web_portal
WHERE dom_id = '<目标dom_id>'
AND IFNULL(is_del, 0) = 0
AND app_type = 1
ETL 输出的是一张桌面应用页面明细表,建议保留字段如下:
应用ID
应用
文件夹
页面ID
页面名称
页面类型
外链地址
数据处理:
这套方案的核心思路是:先从元数据库的 web_portal 表中识别桌面应用,再从 structure 中解析应用内部的文件夹、页面和外链结构,最终输出一张可复用的明细表。
如果统计对象是“桌面应用”,应从 web_portal 出发,而不是从 page + directory 出发。对于 MySQL 5.7 这类不适合直接展开复杂 JSON 的环境,更推荐通过 ETL 实现。
ETL 处理时,建议完成以下动作:
- 使用
from_json 解析 web_portal.structure
- 展开多层
children
- 拼接应用内文件夹路径
- 输出真实页面节点与外链节点
- 最终整理为业务字段
桌面应用内部建议按如下口径识别节点:
- 文件夹节点:通常
fold = true,或存在 children
- 真实页面节点:通常
type = 'PAGE' 或 type = 'LARGE_SCREEN'
- 外链节点:
type = 'LINK',并从 content.url 中提取跳转地址
其中外链节点建议采用以下输出口径:
页面类型 保留为 LINK
页面ID 置空
外链地址 保留实际跳转地址
本次实际使用的 SQL 节点如下:
WITH app_source AS (
SELECT
app_id,
app_name,
is_release,
parent_dir_id AS app_parent_dir_id,
ctime AS app_ctime,
utime AS app_utime,
from_json(
structure,
'array<
struct<
name:string,
id:string,
pgId:string,
type:string,
content:struct<
url:string
>,
fold:boolean,
children:array<
struct<
name:string,
id:string,
pgId:string,
type:string,
content:struct<
url:string
>,
fold:boolean,
children:array<
struct<
name:string,
id:string,
pgId:string,
type:string,
content:struct<
url:string
>,
fold:boolean,
children:array<
struct<
name:string,
id:string,
pgId:string,
type:string,
content:struct<
url:string
>,
fold:boolean
>
>
>
>
>
>
>
>'
) AS pages
FROM input1
),
l0 AS (
SELECT
app_id,
app_name,
is_release,
app_parent_dir_id,
app_ctime,
app_utime,
p.name AS node_name,
p.pgId AS pg_id,
p.type AS node_type,
p.content.url AS external_url,
coalesce(p.fold, false) AS is_folder,
p.children AS children,
cast(null as string) AS folder_path
FROM app_source
LATERAL VIEW OUTER explode(pages) exploded AS p
),
l1 AS (
SELECT
app_id,
app_name,
is_release,
app_parent_dir_id,
app_ctime,
app_utime,
c.name AS node_name,
c.pgId AS pg_id,
c.type AS node_type,
c.content.url AS external_url,
coalesce(c.fold, false) AS is_folder,
c.children AS children,
l0.node_name AS folder_path
FROM l0
LATERAL VIEW OUTER explode(l0.children) exploded AS c
WHERE l0.is_folder = true
),
l2 AS (
SELECT
app_id,
app_name,
is_release,
app_parent_dir_id,
app_ctime,
app_utime,
c.name AS node_name,
c.pgId AS pg_id,
c.type AS node_type,
c.content.url AS external_url,
coalesce(c.fold, false) AS is_folder,
c.children AS children,
concat_ws('/', l1.folder_path, l1.node_name) AS folder_path
FROM l1
LATERAL VIEW OUTER explode(l1.children) exploded AS c
WHERE l1.is_folder = true
),
l3 AS (
SELECT
app_id,
app_name,
is_release,
app_parent_dir_id,
app_ctime,
app_utime,
c.name AS node_name,
c.pgId AS pg_id,
c.type AS node_type,
c.content.url AS external_url,
coalesce(c.fold, false) AS is_folder,
cast(null as array<string>) AS children,
concat_ws('/', l2.folder_path, l2.node_name) AS folder_path
FROM l2
LATERAL VIEW OUTER explode(l2.children) exploded AS c
WHERE l2.is_folder = true
),
page_nodes_l0 AS (
SELECT
app_id,
app_name,
is_release,
app_parent_dir_id,
app_ctime,
app_utime,
'(应用根目录)' AS folder_path,
pg_id,
node_name,
node_type,
external_url
FROM l0
WHERE coalesce(is_folder, false) = false
AND pg_id IS NOT NULL
),
page_nodes_l1 AS (
SELECT
app_id,
app_name,
is_release,
app_parent_dir_id,
app_ctime,
app_utime,
coalesce(folder_path, '(应用根目录)') AS folder_path,
pg_id,
node_name,
node_type,
external_url
FROM l1
WHERE coalesce(is_folder, false) = false
AND pg_id IS NOT NULL
),
page_nodes_l2 AS (
SELECT
app_id,
app_name,
is_release,
app_parent_dir_id,
app_ctime,
app_utime,
coalesce(folder_path, '(应用根目录)') AS folder_path,
pg_id,
node_name,
node_type,
external_url
FROM l2
WHERE coalesce(is_folder, false) = false
AND pg_id IS NOT NULL
),
page_nodes_l3 AS (
SELECT
app_id,
app_name,
is_release,
app_parent_dir_id,
app_ctime,
app_utime,
coalesce(folder_path, '(应用根目录)') AS folder_path,
pg_id,
node_name,
node_type,
external_url
FROM l3
WHERE coalesce(is_folder, false) = false
AND pg_id IS NOT NULL
),
page_nodes AS (
SELECT * FROM page_nodes_l0
UNION ALL
SELECT * FROM page_nodes_l1
UNION ALL
SELECT * FROM page_nodes_l2
UNION ALL
SELECT * FROM page_nodes_l3
)
SELECT
n.app_id AS `应用ID`,
n.app_name AS `应用`,
n.folder_path AS `文件夹`,
CASE
WHEN n.node_type = 'LINK' THEN cast(null as string)
ELSE n.pg_id
END AS `页面ID`,
n.node_name AS `页面名称`,
n.node_type AS `页面类型`,
n.external_url AS `外链地址`
FROM page_nodes n
;
输出结果:
ETL 输出后的字段结构建议如下:
应用ID:桌面应用 ID
应用:桌面应用名称
文件夹:页面所在的应用内文件夹路径
页面ID:真实页面 ID;如果是外链则置空
页面名称:页面名称或外链名称
页面类型:例如 PAGE、LARGE_SCREEN、LINK
外链地址:如果是外链则保留 URL,否则为空
这张表可以直接用于:
- 查看每个桌面应用下有哪些页面
- 查看页面位于应用内哪个文件夹
- 区分真实页面与外链
- 作为后续卡片或治理分析的数据源
总结:
对于“提取桌面应用下的页面清单”这类需求,推荐采用“元数据库输入数据集 + ETL 解析结构 JSON”的实现方式。
其中最关键的几点是:
- 先确认需求对象是桌面应用,而不是普通页面目录树
- 从
web_portal 和 structure 出发理解数据模型
- 在 ETL 中统一完成结构展开、节点识别和字段整理
- 对外链节点保留
外链地址,并将 页面ID 置空
如果数据库版本较低,或桌面应用内部结构层级较复杂,优先使用 ETL 会比单 SQL 更稳定,也更适合后续复用。 |