0 votes
367 views

Hi, 

I am trying to do the following query  in Astro Data Lab jupyter notebook. 

query = """
select TOP 10 a.designation, a.ra, a.dec, a.sigra, a.sigdec, 
a.w1mpro, a.w1sigmpro, a.w1snr, a.w2mpro, a.w2sigmpro, a.w2snr, a.w3mpro, a.w3sigmpro, a.w3snr, a.w4mpro,a.w4sigmpro, a.w4snr,
a.ext_flg, a.var_flg as var_flag, a.ph_qual

from allwise.source a, mydb://wise_new_objs u
where a.designation = u.designation
"""

result = qc.query (sql=query, timeout=600)

I get the following error.

ReadTimeout: HTTPSConnectionPool(host='datalab.noirlab.edu', port=443): Read timed out. (read timeout=600)

When I only select the top five rows in the table wise_new_objs, there is no such error and the query works perfectly. Selecting more than five rows results in the timeout error as above. 

wise_new_objs table consists of 200k rows. I want to select 1000s of rows or all 200k rows at once instead of just five rows to X-match with allwise.source. What is the most effective way to do so? 

asked May 2, 2022 by anonymous | 367 views

1 Answer

0 votes

Hi, thanks for reaching out. The reason for the slowness and timeouts is that the matching column you are using, 'designation', is not indexed in our database. Indexed columns can be orders of magnitudes faster.

You can check which columns are indexed in our schema browser. All column names printed in bold font are indexed:
https://datalab.noirlab.edu/query.php?name=allwise.source

Would another column work for your case? For instance 'source_id' ?

query = """select ...
where a.source_id = u.source_id"""

This runs in under 5 seconds for a mydb table that is 200K rows long.

Please let us know if this was helpful,

Robert

answered May 4, 2022 by robertdemo (5,160 points)
Great!! I can download the data very fast using the 'source_id' column.
Thank you for your help.

359 questions

372 answers

385 comments

2,451 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.