Skip to content

feat(sql lab): display presto and trino tracking url#20799

Merged
ktmud merged 3 commits intoapache:masterfrom
airbnb:presto-tracking-url
Jul 27, 2022
Merged

feat(sql lab): display presto and trino tracking url#20799
ktmud merged 3 commits intoapache:masterfrom
airbnb:presto-tracking-url

Conversation

@ktmud
Copy link
Copy Markdown
Member

@ktmud ktmud commented Jul 21, 2022

SUMMARY

Enable tracking URL for Presto and Trino (previously it is only available for Hive queries) and also change TRACKING_URL_TRANSFORMER to run at runtime (as opposed to when queries are fetched) so that we can display different tracking URL based on query age (most tracking URLs will expire after a certain amount of time---in Trino/Preso, this is configured by query.min-expire-age.

Had to update query execution logics to add end time for failed executions, too.

While debugging, I also realized the TIMED_OUT status isn't actually in use---and now the frontend also cannot handle that status. We should probably add it back, but let's save it for another PR.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

When queries are still running:

image

We will now display the tracking url even for failed jobs, this is useful for users to debug their failed queries:

Xnip2022-07-20_14-46-55

TESTING INSTRUCTIONS

  1. Connect to a Presto or Trino dataset and run some long-running or erroneous queries
  2. Optionally, update Superset config to test customize TRACKING_URL_TRANSFORMER. For example, following setting will only display tracking urls for queries finished within the last 5 minutes:
     def TRACKING_URL_TRANSFORMER(url: str, query: "Query") -> Optional[str]:
         if not query.end_time or (
             now_as_float() - float(query.end_time)
             < timedelta(minutes=5).total_seconds() * 1000
         ):
             return re.sub(
                 r"^https?://([^/]+)/", "https://trino.datalake.example.org/", url
             )
         return None

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@ktmud ktmud force-pushed the presto-tracking-url branch from 67595f3 to fe967b8 Compare July 21, 2022 00:39
@codecov
Copy link
Copy Markdown

codecov Bot commented Jul 21, 2022

Codecov Report

Merging #20799 (ee37fbc) into master (1b577d1) will increase coverage by 0.01%.
The diff coverage is 55.88%.

@@            Coverage Diff             @@
##           master   #20799      +/-   ##
==========================================
+ Coverage   66.23%   66.24%   +0.01%     
==========================================
  Files        1757     1757              
  Lines       66978    67031      +53     
  Branches     7117     7118       +1     
==========================================
+ Hits        44360    44408      +48     
- Misses      20801    20806       +5     
  Partials     1817     1817              
Flag Coverage Δ
hive 53.22% <40.62%> (-0.03%) ⬇️
javascript 51.91% <0.00%> (-0.01%) ⬇️
mysql 81.00% <39.06%> (-0.08%) ⬇️
postgres 81.07% <39.06%> (-0.08%) ⬇️
presto 53.12% <57.81%> (+<0.01%) ⬆️
python 81.55% <59.37%> (+0.01%) ⬆️
sqlite 79.68% <37.50%> (?)
unit 50.23% <32.81%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...frontend/src/SqlLab/components/ResultSet/index.tsx 51.38% <0.00%> (-0.36%) ⬇️
superset-frontend/src/components/Button/index.tsx 100.00% <ø> (ø)
superset/sqllab/exceptions.py 67.39% <0.00%> (ø)
superset/utils/dates.py 100.00% <ø> (ø)
superset/views/core.py 75.58% <0.00%> (+0.54%) ⬆️
superset/db_engine_specs/trino.py 73.40% <15.00%> (-15.93%) ⬇️
superset/db_engine_specs/presto.py 87.97% <58.82%> (-1.06%) ⬇️
superset/config.py 91.47% <100.00%> (ø)
superset/db_engine_specs/hive.py 86.87% <100.00%> (+0.99%) ⬆️
superset/models/sql_lab.py 78.90% <100.00%> (+1.52%) ⬆️
... and 8 more

Help us with your feedback. Take ten seconds to tell us how you rate us.

@ktmud ktmud force-pushed the presto-tracking-url branch 2 times, most recently from 6068fee to 6f74d4a Compare July 21, 2022 01:15
Comment thread UPDATING.md Outdated
Comment thread superset/config.py Outdated
Comment thread superset/db_engine_specs/presto.py Outdated
Comment thread superset/db_engine_specs/presto.py Outdated
Comment thread superset/db_engine_specs/presto.py Outdated
Comment thread superset/db_engine_specs/presto.py Outdated
Comment thread superset/db_engine_specs/trino.py Outdated
Comment thread superset/db_engine_specs/trino.py Outdated
Comment thread superset/models/sql_lab.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be a method called get_tracking_url and then you wouldn't need to rename the column virtually or have a getter/setter.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want the FAB-based API to always return the transformed url but I couldn't find a clean way to do it without adding an (unnecessary) new field.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpgaspar @betodealmeida do you have any recommendation on what's the best practice for such use case?

Comment thread superset/sql_lab.py Outdated
Comment thread superset/db_engine_specs/trino.py Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info_uri is only available in the latest version of the Trino client, which we haven't upgrade to yet.

Comment thread superset/db_engine_specs/presto.py Outdated
Comment thread superset/models/sql_lab.py Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want the FAB-based API to always return the transformed url but I couldn't find a clean way to do it without adding an (unnecessary) new field.

Comment thread superset/sql_lab.py Outdated
@ktmud ktmud force-pushed the presto-tracking-url branch 3 times, most recently from afe9f3e to 07f3687 Compare July 22, 2022 20:38
Comment thread superset/sqllab/exceptions.py Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bycatch: wrong import

@ktmud ktmud force-pushed the presto-tracking-url branch from 07f3687 to a7cb2fd Compare July 22, 2022 20:50
@ktmud
Copy link
Copy Markdown
Member Author

ktmud commented Jul 22, 2022

@john-bodley I addressed your comments and added some integration tests. Can you take another look?

@ktmud ktmud force-pushed the presto-tracking-url branch from a7cb2fd to 83d165f Compare July 22, 2022 22:14
Comment thread superset/db_engine_specs/presto.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pass
return None

Copy link
Copy Markdown
Member Author

@ktmud ktmud Jul 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final return is expected to be explicit if you have a return statement earlier. I think there is a pylint or mypy warning for this.

Comment thread superset/db_engine_specs/presto.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return None

Comment thread superset/db_engine_specs/trino.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment.

Comment thread superset/db_engine_specs/hive.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud per you PR description this logic has changed. Can we confirmed that the previous behavior is preserved? I just want to confirm this isn't a breaking change.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud ktmud requested a review from john-bodley July 26, 2022 17:43
@ktmud ktmud force-pushed the presto-tracking-url branch from 83d165f to 11a0658 Compare July 26, 2022 20:54
Comment thread tests/integration_tests/sqllab_tests.py Outdated
Copy link
Copy Markdown
Member Author

@ktmud ktmud Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bycatch: this test is flaky. [QUERY_1, QUERY_2, QUERY_3] are executed consequently, with supposedly increasing start_time.

Since the original code int(...) basically rounded down the float epoch, the actual query results returned [QUERY_1, QUERY_2]---even though we filtered second_query_time by QUERY_3. But this still sometimes fail because the rounded down time can be anywhere between QUERY_1 start time and QUERY_3 start time, so if QUERY_1 and QUERY_2 are executed within the same second, we wouldn't have QUERY_2 in the filtered results. E.g. when the start time for the queries are 1.1, 1.5, 2---we'd be filtered to between [1, 1].

@ktmud ktmud force-pushed the presto-tracking-url branch from 84ded7a to d400ac4 Compare July 26, 2022 22:05
Comment thread tests/integration_tests/sqllab_tests.py Outdated
resp = self.get_resp("/superset/search_queries?" + "&".join(params))
data = json.loads(resp)
self.assertEqual(2, len(data))
# pylint: disable=line-too-long
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add this at the end of line #420, otherwise I think it disables the error for the entire block.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reformatted it so pylint disable isn't necessary.

@ktmud ktmud merged commit 77db065 into apache:master Jul 27, 2022
@john-bodley john-bodley deleted the presto-tracking-url branch February 17, 2023 22:16
@mistercrunch mistercrunch added the 🚢 2.1.3 First shipped in 2.1.3 label Feb 18, 2024
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 First shipped in 2.1.0 labels Mar 13, 2024
@mistercrunch mistercrunch removed the 🚢 2.1.3 First shipped in 2.1.3 label Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.1.0 First shipped in 2.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants