0 votes
697 views
How can I download/export a table on mydb to my own machine?
asked Jun 12, 2021 by mattia (120 points) | 697 views

1 Answer

0 votes

Hi Mattia, the easiest way is to have the datalab CLI installed on your local computer:

pip install --ignore-installed --no-cache-dir noaodatalab

Then log in once:

datalab login # enter DL username and password

and finally download the table:

datalab query --sql="select * from mydb://mytable" --out=mytable.csv

Let us know if you run into trouble.
Cheers,
Robert

answered Jun 12, 2021 by robertdemo (5,160 points)
edited Jun 12, 2021 by robertdemo
Thanks Robert, this is very useful.

Are there some memory/table size limitations I should be aware of?

If I e.g. run

datalab query --sql="select * from mydb://nsc_dr2_object_adfs" -—fmt="csv" --out=nsc_dr2_object_adfs.csv

I get

Error:

or

Error: out of memory

I've also tried the --timeout argument to no avail.

Cheers - Mattia
Hi Mattia, indeed your query eats up our memory right now. The table seems to be 2e6 rows, but has the full width of the nsc_dr2.object table.

As a work-around, could you select only some rows?

If all fails (let us know), we can dump your table for you as a one-off and put it somewhere where you can download it.

We'll fix this issue down the road, but it requires a bit of plumbing.

Sorry for the troubles,
Robert
Thanks Robert, I see, sorry, didn't quite know what to make of the error message!

Will work around it, but more in general, would using vos be more efficient for my purpose here?

I went for mydb because of my familiarity with the SciServer, but I think your docs/notebooks mostly point to vos these days, and I wonder if that amounts to a recommendation on your side?
Hi Mattia, sorry for the delay in responding further. You could try to output to a file in your VOSpace "vos://myfile.csv". Depending on the table size (both nrows and ncols), but also on what else is going on on the servers, this might work better.
Finally, some found that writing the results of a query to a file straight in your notebook directory is fastest. For instance, this:

q = "select * from gaia_edr3.gaia_source limit 1000000"
res = qc.query(sql=q,out='./myfile_gaia_1e6.csv')

takes about 2m40s and writes out a file of almost 1GB to your NB directory. From there, you can e.g. download it through the browser (right-click, Download).

Hope this is somewhat helpful.
Best,
Robert

359 questions

372 answers

385 comments

2,451 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.