Incorporating Stata into reproducible documents

Hua Peng@StataCorp

https://huapeng01016.github.io/reptalk/

Reproducible research and reproducible documents

Stata is good at reproducible research

  • Manually performed data management and analysis can be easily turned into scripts (do-files)
  • Scripts from 30 years ago still run and produce the same results today and will do the same in the future
  • Datasets created 30 years ago can be read today and in the future

Stata 15 added commands to automate report generation

  • dyndoc - convert dynamic Markdown documents to web pages
  • putdocx - create Word documents
  • putpdf - create PDF files

A dyndoc example

dyndoc fuel.txt, replace    

Why dynamic documents?

  • Eliminate manual steps such as hand-editing documents
  • Include outputs, saved results, and graphs

A hands-on session

  • load and examine data
  • run analysis
  • save commands to a script
  • run script
  • write report
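
The first four steps above can be sketched as a short do-file. This is a minimal sketch, assuming Stata 15 or newer and the auto dataset that ships with Stata; the file name fuel.do is hypothetical. The version statement at the top asks later releases to interpret the script under the rules of the release it was written for.

```stata
// fuel.do (hypothetical name): the interactive steps saved as a script
version 15                      // interpret the script under Stata 15 rules
sysuse auto, clear              // load the example data shipped with Stata
describe mpg weight             // examine the variables of interest
summarize mpg weight
generate fuel = 100/mpg         // gallons per 100 miles
regress fuel weight             // run the analysis
```

Saved as fuel.do, the script is rerun with `do fuel.do`.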

Dynamic tags

dd_do for a block of Stata code

<<dd_do>>
sysuse auto, clear
generate fuel = 100/mpg
label variable fuel "Fuel consumption (Gallons per 100 Miles)"
regress fuel weight
<</dd_do>>

. sysuse auto, clear
(1978 Automobile Data)

. generate fuel = 100/mpg

. label variable fuel "Fuel consumption (Gallons per 100 Miles)"

. regress fuel weight

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =    194.71
       Model |  87.2964969         1  87.2964969   Prob > F        =    0.0000
    Residual |  32.2797639        72  .448330054   R-squared       =    0.7300
-------------+----------------------------------   Adj R-squared   =    0.7263
       Total |  119.576261        73  1.63803097   Root MSE        =    .66957

------------------------------------------------------------------------------
        fuel |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |    .001407   .0001008    13.95   0.000      .001206    .0016081
       _cons |   .7707669   .3142571     2.45   0.017     .1443069    1.397227
------------------------------------------------------------------------------

Attributes change a tag's behavior

<<dd_do:quietly>>
matrix eb = e(b)
<</dd_do>>

dd_display for inline Stata results

  • For every unit increase in weight, a <<dd_display:%9.4f eb[1,1]>> unit increase in fuel consumption is predicted.
  • For every unit increase in weight, a 0.0014 unit increase in fuel consumption is predicted.
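
The text after the colon in dd_display is the kind of directive one would give to Stata's display command, so the rendered number can be checked interactively. A minimal sketch, assuming the auto dataset and the fuel regression shown earlier:

```stata
sysuse auto, clear
generate fuel = 100/mpg
quietly regress fuel weight
matrix eb = e(b)                // coefficient row vector: weight, _cons
display %9.4f eb[1,1]           // slope on weight, shown as 0.0014
```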

dd_graph

<<dd_do>>
scatter fuel weight, mcolor(%50)
<</dd_do>>
<<dd_graph:sav(sc_gp100m_weight.png) replace>>
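
For comparison, producing the same image file by hand takes an explicit export after the graph is drawn. A sketch of that manual step, assuming the auto dataset; dd_graph additionally places the image link in the generated page:

```stata
sysuse auto, clear
generate fuel = 100/mpg
scatter fuel weight, mcolor(%50)             // 50% opacity markers (Stata 15+)
graph export sc_gp100m_weight.png, replace   // manual step that dd_graph automates
```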

More dynamic tags

Display contents based on condition

<<dd_skip_if: ("`details'"=="")>>
The following explains why Gallons per 100 Miles is a better 
measurement than Miles per Gallon. Going from a 10 Miles per Gallon 
car to a 20 Miles per Gallon car saves 5 Gallons per 100 Miles when 
Miles per Gallon increases 10. Going from a 20 Miles per Gallon car 
to a 40 Miles per Gallon car *only* saves 2.5 Gallons per 100 Miles 
when Miles per Gallon increases 20.      
<<dd_skip_end>>
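
The arithmetic in that passage can be checked directly, since gallons per 100 miles is 100 divided by miles per gallon:

```stata
// gallons saved per 100 miles when moving between MPG levels
display 100/10 - 100/20     // 10 -> 20 MPG: 5
display 100/20 - 100/40     // 20 -> 40 MPG: 2.5
```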

Include a text file

<<dd_include: /path/file>>

Some Markdown syntax

Headings

# H1
## H2
### H3

Fenced code block

Use "~~~~" or "````" for a fenced code block

Emphasis

Use asterisks (*) or underscores (_) for emphasis.

Image

![Alt text](/path/to/img.jpg "Optional title")

A longer example

dyndoc fuel_consumption.txt, replace 

".do files on steroids"

  • macros work the same as in a do-file
  • accept arguments

Use arguments in dyndoc

Produce a set of different HTML pages from one dynamic document with different arguments
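
Arguments are typed after the source file name and are available inside the document as the local macros `1', `2', and so on. The sketch below is an illustration under stated assumptions, not the talk's own example: the file name args_demo.txt and its contents are hypothetical, and the script writes that file itself so that it is self-contained.

```stata
// Write a small hypothetical dynamic document, args_demo.txt.
// The backslash in \`1' keeps this script from expanding the macro,
// so the literal text `1' reaches the file for dyndoc to fill in.
tempname fh
file open `fh' using args_demo.txt, write text replace
file write `fh' "<<dd_do>>" _n
file write `fh' "sysuse auto, clear" _n
file write `fh' "generate fuel = 100/mpg" _n
file write `fh' "regress fuel \`1'" _n
file write `fh' "<</dd_do>>" _n
file close `fh'

// One source document, one HTML page per argument
foreach x in weight displacement {
    dyndoc args_demo.txt `x', saving(args_demo_`x'.html) replace
}
```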

Community-contributed software

Some community-contributed commands on SSC for producing dynamic documents, several of which use pandoc to convert Markdown:

  • dynpandoc
  • markstat
  • markdoc
  • webdoc

Use pandoc instead of Stata's markdown command

From a single dynamic document, we may produce a web page, a Word document, and a PDF file.

The commands used are:

// web page
dynpandoc fuel_cc.txt, saving(fuel_pandoc.html) ///
        from(markdown) replace
// docx
dynpandoc fuel_cc.txt, saving(fuel_pandoc.docx) ///
        from(markdown) replace                  ///
        pargs("--reference-doc=reference.docx")
// PDF
dynpandoc fuel_cc.txt, saving(fuel_pandoc.pdf)  ///
        from(markdown) replace

putdocx

Generate tables from saved results

From estimation command

regress fuel weight
putdocx table tbl_reg = etable
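
The table snippets in this section assume a document is already open in memory. A minimal end-to-end sketch, assuming Stata 15 or newer and a fresh session with no document open; the output name fuel_report.docx is hypothetical:

```stata
sysuse auto, clear
generate fuel = 100/mpg

putdocx begin                                   // open a new document in memory
putdocx paragraph
putdocx text ("Fuel consumption regressed on weight")
regress fuel weight
putdocx table tbl_reg = etable                  // coefficient table from regress
putdocx save fuel_report.docx, replace          // write the .docx file to disk
```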

From margins

regress fuel weight i.foreign i.rep78
margins foreign rep78 
putdocx table tbl_marg = etable

From estimates table

quietly regress fuel weight gear turn
estimates store model1
quietly regress fuel weight gear turn foreign
estimates store model2
estimates table model1 model2, b(%7.4f) stats(N r2 r2_a) star
putdocx table tbl_est = etable

From dataset

putdocx table tbl_data = data(_all)

Change table styles and layout

// add a table without borders
putdocx table tbl_data_1 = data(_all), border(all,nil)
// add a double-line border at the bottom of the first row
putdocx table tbl_data_1(1,.), border(bottom,double)
// add a dotted border at the bottom of the third row
putdocx table tbl_data_1(3,.), border(bottom,dotted)
// make the first cell of the first row span 3 columns
putdocx table tbl_data_1(1,1), colspan(3) halign(center)
// add dotted and double-line borders at the top of rows 14 and 17
putdocx table tbl_data_1(14,.), border(top,dotted)
putdocx table tbl_data_1(17,.), border(top,double)
// make the first cells of rows 17 and 18 span 3 columns,
// with left-aligned, italic contents
putdocx table tbl_data_1(17,1), colspan(3) halign(left) italic
putdocx table tbl_data_1(18,1), colspan(3) halign(left) italic

After changing table styles and layout

Nested table

// a table may be created in memory using the memtable option
regress fuel weight if foreign
putdocx table tbl_f = etable, memtable
regress fuel weight if !foreign
putdocx table tbl_d = etable, memtable
// add the tables in memory to cells of another table
putdocx table tbl_l = (2, 2)
putdocx table tbl_l(1, 1) = ("Foreign"), halign(center)
putdocx table tbl_l(1, 2) = ("Domestic"), halign(center)
putdocx table tbl_l(2, 1) = table(tbl_f)
putdocx table tbl_l(2, 2) = table(tbl_d)

Resulting nested table

Community-contributed commands based on putdocx

  • sum2docx
  • reg2docx
  • t2docx
  • corr2docx

Recap

dyndoc to generate web pages

  • Include outputs, saved results, and graphs in a text file using dynamic tags
  • Use Markdown syntax to format the text file and produce web pages

putdocx and putpdf

  • Include saved results and graphs in Word and PDF documents
  • Easily create tables
  • Easily modify table styles and layouts

Thanks!