+1 vote
149 views

Dear colleagues,

I have encountered an issue about the search algorithm. Here is the story: I have a sample of ~500 nearby galaxies, which are extracted from SDSS spectra survey. I would like to extract the objects within 0.25 deg of the target using the function of "q3c_join" and meet the criteria of "ls.dered_mag_r < 18 AND ls.dered_mag_r > 15". However, it takes too long to finish the search in the database and has the issue of timeout. I have already created the database using the 500 target galaxies and below is the full script of the SQL:

sql = '''SELECT  lsls.ra, lsls.dec, lsls.dered_mag_g as LS_mag_g, lsls.dered_mag_r as LS_mag_r, 
               lsls.dered_mag_z as LS_mag_z, lsls.snr_z as LS_snr_z,
               lsls.dered_mag_w1 as mag_w1, lsls.dered_mag_w2 as mag_w2, 
               q3c_dist(ng.ra,ng.dec,lsls.ra,lsls.dec) as dist_deg 
         FROM mydb://nearbygalaxyz009 AS ng
         LEFT JOIN LATERAL (
               SELECT ls.* 
                    FROM 
                        ls_dr9.tractor AS ls
                    WHERE
                        q3c_join(ng.ra, ng.dec, ls.ra, ls.dec, 0.25) AND ls.dered_mag_r < 18 
                        AND ls.dered_mag_r > 15
                    ORDER BY
                        q3c_dist(ng.ra,ng.dec,ls.ra,ls.dec)
                    ASC LIMIT 1
               ) as lsls ON true;'''

  I am writing here to ask for any help. Maybe there is some other function which could resolve this issue. Thanks.

asked Dec 7, 2022 by anonymous | 149 views

1 Answer

+1 vote

Firstly, q3c_join() is the correct function to use in this case, however a 0.25deg radius seems rather large.  You say you want objects within that radius but then use a "LIMIT 1" constraint to make is just a nearest-neighbor search.  Remove the LIMIT statement if you want all objects, otherwise a much smaller search radius should suffice and run faster.

Even though your mydb table is only ~500 objects, the inner query with the magnitude constraints is executed for each of those objects.  A more efficient method would be to create a second mydb table with just the objects satisfying the magnitude constraint (reducing the total size to ~31M rows) and use that in the query.  For the q3c_join() to work efficiently, however, the table needs to have a Q3C index which you can create with the mydb_index() method (see below).  Your script would then look something like:

from dl import queryClient as qc

# Create the magnitude-constrained tractor table and indexes
stat = qc.query ('select * from ls_dr9.tractor where dered_mag_r between 15.0 an
d 18.0', out='mydb://ls_dered_mag_r')
qc.mydb_index(token,'mydb://ls_dered_mag_r','ra')
qc.mydb_index(token,'mydb://ls_dered_mag_r','dec')
qc.mydb_index(token,'mydb://ls_dered_mag_r','', q3c='ra,dec', cluster=True)

sql = """
       SELECT  lsls.*, q3c_dist(ng.ra,ng.dec,lsls.ra,lsls.dec) as dist_deg
            FROM mydb://nearbygalaxyz009 AS ng
            LEFT JOIN LATERAL (
                  SELECT ls.*
                       FROM mydb://ls_dered_mag_r as ls
                       WHERE
                           q3c_join(ng.ra, ng.dec, ls.ra, ls.dec, 0.25)
                       ORDER BY
                           q3c_dist(ng.ra,ng.dec,ls.ra,ls.dec)
                  ) as lsls ON true;
      """

# Execute the query.
res = qc.query(sql=sql)


Creating the mydb://ls_dered_mag_r table will still take some time and may need to be run async separately, but the final query should execute in a few seconds and produce ~4200 rows. 

Hope this helps, let us know if you continue to have problems.

answered Dec 12, 2022 by datalab (19,200 points)
Thank you so much for the help. I tried that, but still had the read timeout issue for the Creation of the magnitude-constrained tractor table. The token I have is like this "token = ac.login ('huanian',getpass.getpass('Account password: '))". Not sure whether that is the token you needed. The error is "timeout: The read operation timed out". How can I avoid that?
The creation of the mydb://ls_dered_mag_r table should probably then be done as an async call with the indexing and such being done once it is complete.  For example, something like

import time
job = qc.query('select * from .....', out='mydb://ls_dered_mag_r',async_=True)
stat = qc.status(job)
while stat != 'COMPLETED'
     time.sleep(60)
     stat = qc.status(job)

I checked and your account already has the table in your mydb, so even though the query timed out is appears to have completed on the backend.  I've gone ahead and added the indexes so try moving ahead with the crossmatches.

359 questions

372 answers

385 comments

2,451 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.