Ying Zu's homepage > Spear Howto

Getting ready

Here we briefly explain how to use SPEAR to caculate the line lags for the AGN hosted by an imaginary Loopdeloop galaxy. Every file and script referred to here can be found inside the code package.

We strongly suggest that you read the paper before using the code. Essentially all the steps and visualizations are presented inside the paper. Also please read the README file supplied with the code. This HOWTO assumes that you have already read both the paper and the README file.

Prerequisites

SPEAR relies on LAPACK and BLAS to solve systems of linear equations. It requires no extra effort to install them as many systems either come with LAPACK and BLAS pre-installed (MAC), or have them conveniently in software repositaries (Linux distributions). LAPACK and BLAS are usually bundled with each other, and like BLAS it has spawned many different variants as well, especially ATLAS if you want your SPEAR to be multithreading in a multi-core environment.

For more information about LAPACK/BLAS and ATLAS, please read LAPACK FAQ and ATLAS FAQ.

The code is written in Fortran 90. To compile, please make sure you have a valid fortran compiler in your system, then edit the "Compilation options" section in the "Makefile", specifically,

    # @file  Makefile
    # Compilation Options
    F90     = ifort      # Your Fortran compiler, e.g., gfortran and ifort, etc
    LIBDIR  = pathtolib  # Path to your LAPACK/BLAS/ATLAS libraries, e.g., /usr/lib/atlas-base/
    BINDIR = $(HOME)/bin # Place where you want to put the spear executable file,
                         # please make sure it is in your system PATH, e.g., ~/bin

Prepare the data and input files

After modifying the "Makefile", you can type "make" in your favorate terminal to get the executable file "spear".

SPEAR requires a specific format for the light curve data file. You can find example data files "lc_loopdeloop_*.dat" in the "loopdeloop/single" and "loopdeloop/tophat_h*" directories. Imagine you want to fit two light curves, the first one should always be the continuum light curves and the second one be the line light curve. If the continuum light curve has 5 data points while the line light curve has 4, the data file should be like (texts after # are comments, not part of the file)

    2                       # number of light curves, continuum first
    5                       # number of data points in the first light curve
    8461.5  22.48    0.36   # each light curve entry consists of "Date", "Flux", and  "Flux_Uncertainty"
    8490.6  20.30    0.30
    8520.3  19.59    0.56
    8545.8  20.11    0.15
    8769.6  21.12    1.20
    4                       # number of data points in the second light curve
    8545.8   9.82    0.23
    8890.4  11.86    0.58
    8949.4  10.55    0.87
    8988.6  11.06    0.27

You can either tweak the code for your own format or generate it with the perl script "GenLC.pl" . If you enter "loopdeloop" directory, there are three separate 3-column light curve files "loopdeloop_con.dat", "loopdeloop_ha.dat", and "loopdeloop_hb.dat", run:

    [yingzu@loopdeloop]$ GenLC.pl
    USAGE: gen_lc.pl OBJECT NCURVE NAME_LINE01 [NAME_LINE02] ...
    
    [yingzu@loopdeloop]$ GenLC.pl loopdeloop 1 con
    ******************************
    Object: loopdeloop
    # of Lines: 1
    Names: con
    OUTPUT: lc_loopdeloop_con.dat
    ******************************
    Opening: loopdeloop_con.dat
    number of data points in cont: 36
    DONE.

which will generate "lc_loopdeloop_con.dat" that works with the code in convolution mode "single", you can then copy this file to the "single" directory.

    [yingzu@loopdeloop]$ cp lc_loopdeloop_con.dat ./single/

The example input files can be found as "input.*" files under "loopdeloop/single" and "loopdeloop/tophat_h*". If you feed the input by STDIN, the program will prompt you whenever an input is needed. Generally it asks for the light curve data file, the convmode, the initial parameters, which of them are to be fixed during fitting, whether you want the outputs of simulated light curves ("error snakes"), whether you want to do MCMC run, and if so, where do you want to save the MCMC output.

We implemented 6 convolution modes in the main code, and we refer to them in the code as

    0 DELTAFUNC    = use a delta function as transfer function, expect 2 light curves
    1 TOPHAT       = use a tophat function as transfer function, expect 2 light curves
    2 SINGLE       = model a single unsmoothed light curve, for modeling continuum line curve only
    3 PHOTOECHO    = model two broad band photometric light curves
    4 DOUBLETOPHAT = use multiple tophats for different emission lines or velocity bins, expect 3 or more light curves
    5 BERET        = use luminosity dependent transfer function, expect 2 light curves

Continuum Fitting

The first step is to model the continuum light curve alone to determine the prior on tau and sigma. Enter the "single" directory, and run:

    [yingzu@loopdeloop]$ cd ./single/
    [yingzu@single]$ vi input.single
    lc_loopdeloop_con.dat
    2                                        # single mode
    0.794328234724282 25.1188643150958       # \hat{\sigma} \tau
    1 1                                      # vary everything
    1                                        # run ameoba
    1                                        # run prediction
    1                                        # run MCMC
    mcmc_con.dat
    [yingzu@single]$ spear < input.single

here we assume the executable "spear" always exists in the system variable `$PATH`. The output files are:

    [yingzu@single]$ ls
    best3.new  fort.13  fort.25  mcmc_con.dat

where "best3.new" contains the best-fit parameters from ameoba, "mcmc_con.dat" is the output from an MCMC chain, "fort.13" are the predicted continuum light curve at unmesured time, and "fort.25" records accepted MCMC steps before "burn-in" for the test of mixing and convergence. The "unmeasured time" light curve simply samples the period covered by the data at 10 times the mean temporal resolution.

From the MCMC output, we can get the prior required for the fitting of lines:

    [yingzu@loopdeloop]$ FindCL.pl -s mcmc_con.dat -t stats.dat

The output "stats.dat" will be needed for the combined fitting of continuum and lines. It has only one line with 6 numbers asfollows,

    Log10(Tau) Log10(Tau_lo) Log10(Tau_hi) Log10(Sig) Log10(Sig_lo) Log10(Sig_hi)

Log10 means the logarithm to the base 10. "Tau" and "Sig" are the center values of \tau and \hat{sigma}, respectively. subscript "lo" and "hi" indicate the lowest and highest values enclosing 68.3% of distribution.

Although the ameoba routine only finds the local maximum of likelihood, it does not matter at this stage, because all AGNs occupy a very distinct region in the sigma-tau parameter plane. The MCMC run will quickly converge to the best-fit parameter pairs in any case. However, you can always initialize tau with a reasonable initial value from its scaling with optical luminosity (see Figure 3 of our paper), or run a grid likelihood calculation to isolate the best-fit parameter pairs as peaks on the grid, which we have implemented in an automated script "SinWrapper.pl":

    [yingzu@single]$ SinWrapper.pl -h
    usage: SinWrapper.pl -f file [-h -v -w -a -r -p -m rootname -z redshift -s spear -g s1_s2_ds_t1_t2_dt -d grid.dat]
     -h         : this (help) message
     -v         : turn on verbose mode for script output, defaut: off
     -w         : turn on verbose mode for spear output, default: off
     -a         : show an ASCII plot of the likelihood grid on sigma-tau plane, default: off
     -f foo.dat           : light curve data (only the continuum)
     -r rootname          : root name for all the output files, default: null string
     -z redshift: redshift of the object for converting all time to rest-frame, default: 0
     -s pathtospear       : path to the spear executable, default: spear
     -g s1_s2_ds_t1_t2_dt : specify your own grid geometry in a '_' concatenated string, default: -3_1_41_0_4_41
     -d pathtogrid        : use existing grid file without running calculation, default: none
     -p         : predict hires light curves at the peak, default: off
     -m         : run a MCMC chain, default: off
    example: SinWrapper.pl -f lc_loopdeloop_con.dat -v -p -m -a

Only the [-f] option is required that you have to indicate the continuum light curve data file. By default, "SinWrapper.pl" will do a grid likelihood calculation on a tau-sigma plane and identify the most significant peak of likelihood as the input for subsequent light curve prediction and MCMC confidence level estiamtion. So if you run:

    [yingzu@single]$ SinWrapper.pl -f loopdeoop_con.dat -v -p -m -a

the code will write a "grid.dat" as the result of grid likelihood calculation, a "hires.dat" as the predicted light curves from the best fit parameters recorded in "best3.txt" found by searching the grid , a "mcmc.dat" as the MCMC chain around the peak, and a "stats.dat" as prior input for model parameters in lag determination.

Joint Fitting of Continuum and Line Light Curves

The lag-likelihood surface frequently has multiple peaks, so simply running a minimization routine ("ameoba" here) may not find the global peak likelihood. Assuming it is close to the BLR radius-luminosity relationship could introduce dangerous biases. At present, we generate a grid of solutions by repeatedly running the code using a PERL script.

For example, to search for a global maximum of the likelihood function for both lag and width of the Tophat transfer function, we can calculate the likelihood on a 2-D grid and run an MCMC around the peak identified from the grid calculation.

The grid calculation can be done with "TopWrapper.pl". Return to the parent directory and generate the light curve file for the joint fit of continuum and H-alpha light curves,

    [yingzu@single]$ cd ..
    [yingzu@loopdeloop]$ GenLC.pl loopdeloop 2 con ha
    ******************************
    Object: loopdeloop 
    # of Lines: 2 
    Names: con ha
    OUTPUT: lc_loopdeloop_con_ha.dat
    ******************************
    Opening: loopdeloop_con.dat
    number of data points in cont: 36
    Opening: loopdeloop_ha.dat
    number of data points in cont: 18
    DONE.

Copy "lc_loopdeloop_con_ha.dat" and enter the "tophat_ha" directory,

    [yingzu@loopdeloop]$ cd ./tophat_ha/

and try:

    [yingzu@tophat_ha]$ TopWrapper.pl -h
    usage: ./TopWrapper.pl -f file [-h -v -w -a -r rootname -s spear -t stats.dat -g t1_t2_dt_w1_w2_dw -p -d grid.dat]
     -h         : this (help) message
     -v         : verbose mode for script output
     -w         : verbose mode for spear output
     -a         : show an ASCII plot of the grid and peaks found
     -f foo.dat : light curve data (continuum + line)
     -r rootname          : root name for all the output files
     -s pathtospear       : path to the spear executable
     -t statsfile         : path to the prior file (if other than ./stats.dat 
                            or ../single/stats.dat)
    
     -g t1_t2_dt_w1_w2_dw : specify your own grid geometry in a '_' concatenated string
     -d pathtogrid        : use existing grid file without running calculation
     -p         : predict hires light curves at the peak
    
    example: ./TopWrapper.pl -f lc_loopdeloop_con_ha.dat -v -a -g 0_50_1_0.01_10.01_1

"TopWrapper.pl" does similar things as "SinWrapper.pl". It assumes a TopHat transfer function for the BLR and creates a lag-width grid whose geometry can be set by the "-g" option. After finishing the grid calculation, it uses a Coonnected Component Labeling algothrim to identify isolated peaks. Unlike "SinWrapper.pl", it try to identify multiple, isolated solutions because there could be multiple solutions of similar significance thus need to be kept for further analysis. For the H-alpha line of Loopdeloop, we have only one solution and you can find it by running,

    [yingzu@tophat_ha]$ TopWrapper.pl loopdeloop_con_ha.dat -v -a -g 0_50_1_0.01_10.01_10

The "-g 0_50_1_0.01_10.01_1" indicates the geometry of the 2-d grid (concatenated by "_" to avoid perl parse error), corresponding to "lagmin lagmax lagstep widmin widmax widstep".

where "lagmin" and "lagmax" give the range of possible lags, which usually start from 0 and run to around 2/3 of the length of the light curves, while "widmin" and "widmax" give the range of transfer function width, starting from small values (but above 0) to no more than 15 days. "lagstep" and "widstep" indicate the step size of the grid in two axes. You can choose their values according to the limit on your computational resources, and what resolution you need to resolve each peak in your lag likelihood distribution function.

Note that you also need "stats.dat" file generated by the continuum fitting and put it in either current directory or "../single" for the joint fit to work, or assign the postion by using "-t" option. The most important output of "TopWrapper.pl" is "rootname.peaks.grid" (rootname is set by using "-r" option), which records the information of all the solutions found by the grid calculation, with formats like,

    logL lag width sigma tau scale

where "logL" is the log likelihood value of this solution of "lag" + "width", "sigma tau" are the best fit parameters for the model parameters \hat{\sigma} and \tau, and "scale" is a nuisance parameter. Once you finish the grid calculation and find out there is only one peak spitted out, then you can eye-ball check whether this solution is an artefact (could be a false peak induced by a large grid, or could be the light curve is too noisy that SPEAR favors a extreme long width transfer function to smear out variabilities). If everything seems all right, congratulations, you can run another MCMC around this peak to get the confidence limits on your lag estimation,

    [yingzu@tophat_ha]$ vi input.tophat.mcmc
    lc_loopdeloop_con_ha.dat
    1                                        # tophat mode  
    0.838097 34.480781 15 0.607393 1.001     # directly from "*.peaks.grid"
    1 1 1 1 1                                # vary all 5 parameters
    0 do amoeba                              # no ameoba
    0 predict light curves                   # no prediction
    1 do mcmc                                # run MCMC
    mcmc_ha.dat
    
    [yingzu@tophat_ha]$ spear < input.tophat.mcmc

Run "FindCL.pl" on the MCMC output "mcmc_ha.dat" ("-c 9" as the 9th column is the lag, please refer to the source code for the column keys),

    [yingzu@tophat_ha]$ FindCL.pl -m mcmc_ha.dat -c 9
    Using column: 9
    CL:     14.400810     13.575960     15.229258

thus the H-alpha lag estimate is 14.4 days, with 1-sigma range [13.57 days - 15.23 days].

Joint Fitting of Continuum and Line Light Curves

What if two or more peaks are found after the joint fit? You have to resort to a third light curve, or pray for better light curves to come. If you apply the above procedures to the H-beta line in the same directory, you will find two H-beta lag solutions of the similar significance. Nonetheless we know from above that the H-alpha lag estimation has only a single solution, thus we can gain extra constraining power by fitting the H-beat line together with H-alpha line as if the H-alpha light curve is a lagged and smoothed version of the underlying continuum with different sampling epoches as the "true" one.

Examples for DOUBLETOPHAT mode and BERET mode will be added soon.