I am trying to download a large table from mydb, but am having some difficulties. I cannot do a synchronous download because it exceeds the 600s time limit. When I attempt to do an asynchronous download, it takes a long time, but the resulting table is incorrect and very small. Is there another way to access my large table?

Example:

>>> from dl import queryClient as qc
>>> tab = qc.query(sql='SELECT * FROM mydb://hsc_dr2', fmt='table', async_=False, timeout=60000)

Output:

>>> tab
<Table length=1>
n20ja9v7ywkcx4d
     str16      
----------------
n20ja9v7ywkcx4d
asked Apr 20, 2021 by ctheissen

1 Answer


Hi, thank you for reaching out.

I cannot reproduce exactly the issue you are reporting, but I can confirm that I, too, cannot query a mydb table in async mode right now (I'm getting a different error). We are working on a fix now.

Your query quoted above is in sync mode, but the output you quoted is the jobID that is returned when you launch an async query.
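For reference, that job ID is what you pass back to the service to check on the job and fetch its results. Below is a minimal polling sketch. It assumes the client offers a status call and a results call. In Data Lab's `queryClient` I believe these are `qc.status(jobid)` and `qc.results(jobid)`, but please verify against the current documentation. It also assumes the status strings follow the UWS phase names such as `EXECUTING`, `COMPLETED` and `ERROR`. The helper name `wait_for_results` is hypothetical. The two calls are passed in as parameters so the sketch does not depend on any particular client.

```python
import time
from typing import Any, Callable


def wait_for_results(
    job_id: str,
    status_fn: Callable[[str], str],
    results_fn: Callable[[str], Any],
    poll_interval: float = 5.0,
    max_wait: float = 3600.0,
) -> Any:
    """Poll an async query job until it finishes, then return its results.

    status_fn and results_fn stand in for the client's status/results calls
    (assumed here to be something like qc.status and qc.results).
    """
    deadline = time.monotonic() + max_wait
    while True:
        # Assumed UWS-style phase names, e.g. EXECUTING, COMPLETED, ERROR.
        state = status_fn(job_id).upper()
        if state == "COMPLETED":
            return results_fn(job_id)
        if state in ("ERROR", "ABORTED"):
            raise RuntimeError(f"job {job_id} ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state} after {max_wait} s")
        time.sleep(poll_interval)
```

Once async queries on mydb work again, usage would look like `jobid = qc.query(sql='SELECT * FROM mydb://hsc_dr2', async_=True)` followed by `tab = wait_for_results(jobid, qc.status, qc.results)`.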

Until we have fixed querying of mydb tables in async mode, I'd like to suggest a few workarounds:

- If the table in your mydb was created by, for instance, a query to Data Lab's catalogs, and you want to download that table, you could instead direct the output of the query to a file in your VOSpace and then download it from there, i.e. something like this:

qc.query('SELECT ... FROM ...', fmt='csv', out='vos://mytable001.csv')

- If you need to get it from your mydb, and your hsc_dr2 table can be split up (e.g. into slices of RA), you could download it in pieces, for instance:

df1 = qc.query('SELECT * FROM mydb://hsc_dr2 WHERE RA>=0 AND RA<=90', timeout=600, fmt='pandas')

df2 = qc.query('SELECT * FROM mydb://hsc_dr2 WHERE RA>90 AND RA<=180', timeout=600, fmt='pandas')

etc., up to RA=360. Not ideal, but each RA slice could then stay under the 600 s limit.
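Both workarounds can be wrapped in small helpers. This sketch is illustrative only. `query_via_vospace`, `ra_slice_queries` and `download_in_slices` are hypothetical names. `query_fn` stands for `qc.query` as used above, and `get_fn` stands for a VOSpace download call. As far as I know, Data Lab's `storeClient` has a `get` function taking a source and a destination, but please check its documentation. The RA helper assumes the column is named `RA` and holds degrees in [0, 360]. Its first slice includes RA = 0, and every later slice starts just above the previous upper bound, so no row is skipped or fetched twice.

```python
from pathlib import Path
from typing import Any, Callable, List


def query_via_vospace(
    query_fn: Callable[..., Any],
    get_fn: Callable[[str, str], Any],
    sql: str,
    vos_path: str,
    local_path: str,
) -> Path:
    """Workaround 1: write results to VOSpace, then copy the file locally."""
    if not vos_path.startswith("vos://"):
        raise ValueError("vos_path should be a vos:// URI")
    # The server writes the result straight to VOSpace, so the large table
    # does not have to come back through the query response.
    query_fn(sql, fmt="csv", out=vos_path)
    # Download the finished file (assumed: a storeClient-style get(src, dst)).
    get_fn(vos_path, local_path)
    return Path(local_path)


def ra_slice_queries(table: str, n_slices: int = 4, ra_col: str = "RA") -> List[str]:
    """Workaround 2: one SELECT per RA slice, covering 0 <= RA <= 360 degrees.

    Slices are (lo, hi], except the first, which also includes RA = 0.
    """
    if n_slices < 1:
        raise ValueError("n_slices must be >= 1")
    width = 360.0 / n_slices
    queries = []
    for i in range(n_slices):
        lo, hi = i * width, (i + 1) * width
        lower_op = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {ra_col}{lower_op}{lo:g} AND {ra_col}<={hi:g}"
        )
    return queries


def download_in_slices(
    query_fn: Callable[..., Any], table: str, n_slices: int = 4
) -> List[Any]:
    """Run each slice as its own sync query so each call can stay under 600 s."""
    return [
        query_fn(sql, fmt="pandas", timeout=600)
        for sql in ra_slice_queries(table, n_slices)
    ]
```

For example, `parts = download_in_slices(qc.query, 'mydb://hsc_dr2', n_slices=8)` followed by `pd.concat(parts, ignore_index=True)` would give a single DataFrame. If a slice still times out, increase `n_slices`.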

We'll post back here when the underlying issue has been fixed.

Thank you for reporting the problem, and for your patience.

Robert for the DL team

answered Apr 23, 2021 by anonymous
Even if I do:

df1 = qc.query(sql='SELECT TOP 5 * FROM mydb://hsc_dr2', fmt='pandas', async_=False, timeout=600)

It still times out.


Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.