Published in Evolve You

Troubleshooting With ROOT Macros

Problem solving with CERN’s analysis language in C++

Written in April of 2020.

ROOT is a data analysis framework created at CERN and used by many research scientists across the world. Although it is very powerful and multipurpose, many of its users will agree that ROOT has a bit of a learning curve.

In this article, I document the learning process that I went through to write my first complete C++ macro and the solutions to the errors that I came across. I write these troubleshooting articles not only for the benefit of others with the same issues, but also so that future-me can remember the basics. Although the documentation is a great resource, I found that it lacked an organic, learn-as-you-go user guide, so I hope to fill that niche as well.

Table of contents:

A. Intro

  1. The Goal
  2. Starting ROOT
  3. Understanding how to use ROOT
  4. Understanding how to use macros

B. Functions and errors

  1. Storing the names of .root files in the directory
  2. Creating a histogram from data and integrating it
    a. Warning in <Fit>: Fit data is empty
    b. no known conversion from…
    c. ROOT crashing with no error messages
  3. Getting the starting time and ending time of a data run
  4. Plotting the x and y axis
    a. Graph shows up blank
    b. Using SetTimeDisplay()
  5. Putting all functions together
    a. Memory leaks

C. General advice

A. Intro

1. The Goal

It’s useful to know what your goal is before writing any code. My goal was to open a .root file, create a histogram with the data, integrate a portion of the histogram, and repeat with many files, eventually plotting the integrated value as a function of file. Each .root file represented a day of radon monitor data, so my goal was basically plotting radon concentration as a function of time.

2. Starting ROOT

After installing ROOT and setting environment variables, you can enter the ROOT command line by typing

root

in a command prompt or terminal.

I learned that you can start ROOT without the startup splash screen using

root -l

or run ROOT in batch mode, without any graphics windows popping up, with

root -b

3. Understanding how to use ROOT

The first important realization was that ROOT’s command line is a beefed-up C++ interpreter (called Cling). This means that on top of calculator inputs and ROOT-specific commands, the ROOT terminal accepts C++. For example, you can declare variables and write for loops:

(ROOT command line)
root [0] int x = 5
(int) 5
root [1] for(int i = 0; i<x; i++){std::cout<<i<<std::endl;}
0
1
2
3
4
root [2]

What ROOT adds on top of the C++ foundation is a large library of classes and datatypes.

The first useful class that I found was the TBrowser. The browser lets you see the .root files in your directory and explore their contents. It also lets you plot histograms by double clicking the contents of .root files, which helped me know when I got the histogram right.

To use it, you type

TBrowser b;

in the ROOT terminal. When you type this, you are declaring a new TBrowser called b, which automatically opens the browser and lets you navigate through your files. Besides visualizing the available files, the browser is also useful for checking that a data file isn’t empty. For my job, I was given .root files, which made this a very nice way to inspect what was given to me.

Two other essential classes are TFile, which lets you handle files, and TCanvas, which lets you draw things programmatically.

As for datatypes, the common ones that came up are Double_t and Int_t. I had been told they were more powerful than the standard types, but they are simply ROOT’s aliases (typedefs) for double and int, kept for portability across platforms, so you can use them and the standard C++ double and int interchangeably.

All these C++ inputs, object declarations, etc. can be written directly in the command line, or written in a pre-packaged bundle called a macro.

4. Understanding how to use macros

Up to now, we’ve only used the ROOT command line. Another way to execute code is to write macros. Macros are essentially programs written in the C++/ROOT language. They are preferred for repetitive tasks, because you get to execute hundreds or thousands of lines of code by running the macro instead of typing them all into the ROOT command line individually.

To write a macro, name the file macroName.cc (ROOT also accepts .C, .cxx and .cpp) and give it a function with the same name as the file. That function is what ROOT runs when you execute the macro.

For example, a file named plot.cc should contain

void plot() {
//your code
}

As is the case with most programming languages, it would make sense to break up the goal into a few smaller functions that each get a macro. For my goal, I might want a macro that gets a list of the .root files, and another macro that integrates the histogram given a file.

It is possible to call macros from other macros (several files), but it’s not easy to pass parameters into them, so instead you can write many functions in one macro (one file), and call them in the main macro. The pseudo-code for the latter looks like this:

File named rnAnalysis.cc:

void rnAnalysis(){
//code using function1 and function 2
}
double function1(){
//code addressing part of the problem
}
vector<string> function2(double d){
//code addressing another part of the problem
}

If you arrange the macro as I’ve laid it out, the compiler (and probably your code editor) will complain that function1 and function2 are not declared, because C++ requires a function to be declared before it is used. The fix is to add function prototypes for function1 and function2 at the top of the file or in a header file.
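Filled in just enough to compile as plain C++, the pseudo-code skeleton looks like this with prototypes added. The bodies and return values of function1 and function2 are made-up placeholders; only the layout (prototypes first, then the main macro function, then the definitions) is the point.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Prototypes: tell the compiler these functions exist before rnAnalysis() uses them.
double function1();
std::vector<std::string> function2(double d);

// Main function, named after the file (rnAnalysis.cc).
void rnAnalysis() {
    double d = function1();
    std::vector<std::string> names = function2(d);
    (void)names; // a real macro would do something with the result
}

// The definitions can now come after their first use.
double function1() {
    return 2.0; // placeholder value
}

std::vector<std::string> function2(double d) {
    // placeholder: one dummy entry per whole unit of d
    return std::vector<std::string>(static_cast<std::size_t>(d), "placeholder.root");
}
```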

To put your macro in action, you can either run

root macroName.cc

in your command line/terminal, or

.x macroName.cc 

in the ROOT command line.

Now that we know the basics of macros, I’ll lay out the functions that I made and the problems that came with them.

B. Functions and Errors

Here, I’ll talk about the functions that I wrote. It helps a lot if you have C++ or C experience heading into ROOT, but I’ll admit a lot of it is reading documentation and copying forum solutions.

1. Storing the names of .root files in the directory

I had no real issues here; I essentially copied this forum answer and adapted it to output to a vector instead of just printing.

Code for listFiles:

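// Needs the globals pathToRootFiles (const char *) and publishFileDetails
// to be declared above this function.
std::vector<std::string> listFiles(const char *dirname = pathToRootFiles, const char *ext = ".root") {
    std::cout << "Looking for .root files..." << std::endl;
    // store the names of the .root files
    std::vector<std::string> arr;
    TSystemDirectory dir(dirname, dirname);
    // GetListOfFiles() returns a new TList that we must delete ourselves
    TList *files = dir.GetListOfFiles();
    if (files) {
        TSystemFile *file;
        TString fname;
        TIter next(files);
        // walk over every entry in the directory
        while ((file = (TSystemFile *) next())) {
            fname = file->GetName();
            // keep regular files whose names end in ext
            if (!file->IsDirectory() && fname.EndsWith(ext)) {
                // bare file names, not full paths: TFile::Open() only finds
                // them if ROOT's working directory is dirname
                arr.push_back(fname.Data());
            }
        }
    }
    if (arr.empty()) {
        std::cout << "No .root files found. Change the path in rnAnalysis.cc" << std::endl;
    }
    if (publishFileDetails) {
        for (const auto &x : arr)
            std::cout << x << "\n";
    }
    delete files;
    return arr;
}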

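I only understand how it works on a surface level, so I can’t explain this function in detail. To use it, I declared a const char* variable above it:

const char *pathToRootFiles = "C:/path/to/root/files";

This is the default value of listFiles’ first parameter, so calling listFiles() with no arguments searches that folder. listFiles also reads a global flag, publishFileDetails, that decides whether the file names get printed; it isn’t shown in this article, but a bool declared next to pathToRootFiles fits how it is used.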

2. Creating a histogram from data and integrating it

Since we want to create a histogram for every file and do calculations on each one, it’s useful to have a function that takes a file name and does the integration of the associated histogram.

This is the complete histIntegrateRN function:

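double histIntegrateRN(std::string filename) {
    gROOT->cd();
    // Turn string into char* (see section b below)
    char *cstr = new char[filename.length() + 1];
    strcpy(cstr, filename.c_str());
    // Histogram named after the file, created while gROOT is the current directory
    TH1D *h = new TH1D(cstr, "fadc;channel;#entries", 8039, 0., 4096.);
    TFile *f = TFile::Open(cstr);
    TTree *t = nullptr;
    TCanvas *c = new TCanvas("c", "titles", 600, 600);
    // TFile::Open returns nullptr (or a "zombie" file) if the file can't be read
    if (f && !f->IsZombie())
        f->GetObject("r", t);
    // Opening the file made it the current directory; switch back to gROOT
    gROOT->cd();
    if (t) {
        // fill the histogram named cstr (i.e. h) with the fadc_channel branch
        t->Project(cstr, "fadc_channel");
        // quiet ("Q") gaussian fit between channels 1850 and 2050
        h->Fit("gaus", "Q", "", 1850, 2050);
    } else {
        std::cout << "Error in histIntegrateRN: could not read tree r from " << filename << std::endl;
    }
    TF1 *func = h->GetFunction("gaus");
    double ans = 0.0;
    // Check that the fit worked
    if (func == nullptr) {
        std::cout << "Fit did not work, check fit bounds and histogram" << std::endl;
    } else {
        ans = func->Integral(1850, 2050);
    }
    std::cout << "Counts found from integrated fit: " << ans << std::endl;
    // garbage collection
    delete[] cstr;
    delete c;
    delete f; // also deletes the tree t, which belongs to the file
    delete h; // also deletes func, which belongs to h
    return ans;
}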

I won’t go too far into the code, but at a basic level, I:

  1. Change the string filename parameter to a more appropriate char * called cstr
  2. Create a histogram and canvas to receive the data
  3. Read the tree “r” from the .root file and project its fadc_channel branch onto the histogram with TTree::Project(). Project() only fills the histogram; it doesn’t draw anything.
  4. Fit the histogram with a gaussian function and integrate said function. The gaussian function is automatically shown on the canvas.
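Step 4 can be sanity-checked by hand. ROOT’s built-in “gaus” is f(x) = Constant * exp(-0.5 * ((x - Mean) / Sigma)^2), and its integral from a to b is Constant * Sigma * sqrt(pi/2) * [erf((b - Mean) / (Sigma * sqrt(2))) - erf((a - Mean) / (Sigma * sqrt(2)))]. Below is a plain C++ sketch of that formula; gausIntegral and entriesFromIntegral are hypothetical helpers, not ROOT functions, and in the macro TF1::Integral() computes the number for you. One thing the formula makes visible: a fit to a histogram models entries per bin, so the integral is an area in entries times channels, and dividing by the bin width (4096/8039, about 0.51 channels for the histogram above) turns it into a number of entries. Since every file uses the same binning, this only rescales the final plot.

```cpp
#include <cmath>

// Integral from a to b of f(x) = constant * exp(-0.5 * ((x - mean) / sigma)^2),
// the same unnormalized shape as ROOT's built-in "gaus" (Constant, Mean, Sigma).
double gausIntegral(double constant, double mean, double sigma, double a, double b) {
    const double pi = std::acos(-1.0);
    const double s = sigma * std::sqrt(2.0);
    return constant * sigma * std::sqrt(pi / 2.0) *
           (std::erf((b - mean) / s) - std::erf((a - mean) / s));
}

// A fit to a histogram models entries per bin, so dividing the integral by the
// bin width turns the area into a number of entries.
double entriesFromIntegral(double integral, double xmin, double xmax, int nbins) {
    const double binWidth = (xmax - xmin) / nbins;
    return integral / binWidth;
}
```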

Here are the errors that I had to fix:

a. Warning in <Fit>: Fit data is empty

One of the bigger problems was that I kept getting “Warning in <Fit>: Fit data is empty” for any histograms past the first one when I tried calling rnAnalysis on many files in succession. What alleviated this problem was putting

gROOT->cd(); 

after opening the file, and once at the beginning to be safe.

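What gROOT->cd() does is make ROOT’s top-level, in-memory directory (gROOT) the current directory again. This matters because opening a .root file makes that file the current directory (gDirectory), so new histograms get attached to the file, and lookups by name, like the one TTree::Project() does for the histogram it should fill, search the file instead of memory. That is most likely why the fit kept finding an empty histogram. The way I figured out the problem was by putting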

gDirectory->pwd();

after every line of code. This function prints out the current directory, which allowed me to see that I was in the wrong directory when making the next file’s canvas. gDirectory->pwd() is a very useful debugging tool.

b. no known conversion from ‘std::string’ (aka ‘basic_string<char, char_traits<char>, allocator<char> >’) to ‘const char *’ for 1st argument

Many functions in ROOT (and C++) take a const char *, which is different from a std::string. A const char * is a pointer that “points” to the first character of a sequence of characters ending in a null character ('\0'), which is how C stores text. I find it useful to work with strings because the file names are easy to translate into strings, but it means that I have to convert the strings to char *s before handing them to those functions.

This is done here:

char *cstr = new char[filename.length() + 1];
strcpy(cstr, filename.c_str());

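The first line allocates a char array one character longer than the string, leaving room for the terminating '\0', and the second one copies filename (which comes from the function parameter) into it. cstr is then used everywhere else instead of the filename variable, and it has to be freed with delete[] at the end. A simpler option is filename.c_str(), which already returns a const char * that stays valid as long as filename is alive and unchanged, so no copy or delete[] is needed.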
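Both ways of getting a C string, side by side in plain C++. takesCString is a hypothetical stand-in for any function that only accepts a const char * (TFile::Open is one real example); the copy version holds its buffer in a std::unique_ptr<char[]> so that delete[] happens automatically.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for an API that only accepts a C string.
std::size_t takesCString(const char *name) {
    return std::strlen(name);
}

// Option 1: pass c_str() directly. Nothing to allocate or free.
std::size_t viaCStr(const std::string &filename) {
    return takesCString(filename.c_str());
}

// Option 2: copy into a heap buffer, as histIntegrateRN does.
// length() + 1 leaves room for the terminating '\0';
// unique_ptr<char[]> calls delete[] when it goes out of scope.
std::size_t viaCopy(const std::string &filename) {
    std::unique_ptr<char[]> cstr(new char[filename.length() + 1]);
    std::strcpy(cstr.get(), filename.c_str());
    return takesCString(cstr.get());
}
```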

c. ROOT crashing with no error messages

This problem was very puzzling: normally, when ROOT crashes, it gives you some error messages and a stack trace that you can interpret to find where the problem was. However, for this problem, my program kept crashing without any error messages or stack trace.

The biggest clue was that it always crashed after a set number of files processed. This made me think that it was a memory problem.

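After trying many things, the solution was to do “garbage collection”, which here simply means deleting the objects a function created with new (TCanvases, TFiles, etc.) before the function returns. C++ itself has no garbage collector: anything created with new stays in memory until you delete it, so a function that creates new objects for every file eventually runs your computer out of memory. An example of garbage collection can be seen at the bottom of the histIntegrateRN function:

//garbage collection
delete[] cstr;
delete c;
delete f;
delete h;

Notice that not every ROOT object is deleted: when I tried deleting some objects (e.g. TTrees), ROOT crashed. The likely reason is ownership. A TTree read from a file belongs to that TFile, so delete f already deletes it, and deleting it yourself as well is a double delete. Likewise, the fitted TF1 returned by GetFunction() belongs to the histogram, so delete h takes care of it. My methodology was to try to delete everything and then remove the delete lines that weren’t working until the program ran, but asking which object owns which is a more reliable guide.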
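The same ideas in plain C++, with a counter that makes leaks visible. Tracked, makeRaw, safe and Owner are hypothetical names for illustration. A std::unique_ptr deletes its object when it goes out of scope, and Owner mimics the way a TFile owns the trees read from it: the caller gets a plain pointer and must not delete it. ROOT objects you create yourself, such as the TFile from TFile::Open(), can be held in a std::unique_ptr the same way, as long as nothing else owns them.

```cpp
#include <memory>
#include <vector>

// Counts how many Tracked objects are alive, so leaks become visible.
int liveObjects = 0;

struct Tracked {
    Tracked() { ++liveObjects; }
    ~Tracked() { --liveObjects; }
};

// Raw new, like the objects in the macro: the caller must remember to delete it.
Tracked *makeRaw() {
    return new Tracked();
}

// std::unique_ptr deletes its object automatically at the end of the scope.
void safe() {
    std::unique_ptr<Tracked> t = std::make_unique<Tracked>();
}

// An owner that deletes its children, the way a TFile deletes the trees read
// from it. Callers get a plain pointer and must NOT delete it themselves.
struct Owner {
    std::vector<std::unique_ptr<Tracked>> children;
    Tracked *adopt() {
        children.push_back(std::make_unique<Tracked>());
        return children.back().get();
    }
};
```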

3. Getting the starting time and ending time of a data run

Two functions are covered here: extractDates and runTimeInHours

I didn’t have major issues with either function.

extractDates takes the list of file names and returns the date each run started:

std::vector<double> extractDates(std::vector<std::string> input) {
    std::cout << "Extracting dates..." << std::endl;
    std::vector<double> datesOnly;
    for (const auto &x : input) {
        char *cstr = new char[x.length() + 1];
        strcpy(cstr, x.c_str());
        TFile *f = TFile::Open(cstr);
        TTree *t = nullptr;
        f->GetObject("r", t);
        // earliest ftimestamp of the run = its start time (Unix seconds)
        double tmin = t->GetMinimum("ftimestamp");
        datesOnly.push_back(tmin);
        // garbage collection, once per file (deleting f also deletes t)
        delete[] cstr;
        delete f;
    }
    if (publishFileDetails) {
        for (auto y : datesOnly)
            std::cout << y << "\n";
    }
    return datesOnly;
}

It uses the GetMinimum function of the TTree class on the ftimestamp branch of the .root file, which returns the smallest timestamp, i.e. the first recorded time.

This time is used later as the x axis for the plot.

runTimeInHours uses GetMinimum again, together with its counterpart GetMaximum, to find out how many hours a run lasted:

double runTimeInHours(std::string filename) {
    char *cstr = new char[filename.length() + 1];
    strcpy(cstr, filename.c_str());
    TFile *f = TFile::Open(cstr);
    TTree *t = nullptr;
    f->GetObject("r", t);
    // first and last timestamps of the run, in Unix seconds
    double tmin = t->GetMinimum("ftimestamp");
    double tmax = t->GetMaximum("ftimestamp");
    double dt = tmax - tmin;
    // 3600 seconds per hour
    std::cout << "dt = " << dt / 3600.0 << " [hr]" << std::endl;
    //garbage collection
    delete[] cstr;
    delete f;
    return dt / 3600.0;
}

This run time is used in conjunction with the integrated value to find a “concentration”, essentially how many counts there are per hour of data collection.

There aren’t any errors to discuss here, but note the garbage collection at the end of both functions.
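The arithmetic behind runTimeInHours and the concentration, as plain C++ on a vector of timestamps instead of a TTree. runTimeHours and countsPerHour are hypothetical helpers; the guard against zero-length runs is my addition, since amount / timeInHours in the macro would give inf (or nan) for a file whose timestamps are all identical.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical helper mirroring runTimeInHours: timestamps are Unix seconds
// (like the ftimestamp branch), and the run time is (latest - earliest) / 3600.
double runTimeHours(const std::vector<double> &timestamps) {
    if (timestamps.empty()) {
        throw std::invalid_argument("no timestamps");
    }
    auto range = std::minmax_element(timestamps.begin(), timestamps.end());
    return (*range.second - *range.first) / 3600.0;
}

// "Concentration" as used in this article: integrated counts per hour of data.
// Refuses zero-length runs, where counts / hours would give inf or nan.
double countsPerHour(double counts, double hours) {
    if (hours <= 0.0) {
        throw std::invalid_argument("run time must be positive");
    }
    return counts / hours;
}
```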

4. Plotting the x and y axis

The desired y axis is the concentration of Radon, and the x axis is the date found with extractDates. To graph these, I made a plot function that takes a vector of doubles for concentration and for date (date is stored in Unix epoch time, which can be a double).

The code for the plot function is here:

void plot(std::vector<double> conc, std::vector<double> date) {
    // sanity check: every concentration needs a matching date
    if (conc.size() != date.size()) {
        std::cout << "Concentration data and date data do not have the same length" << std::endl;
        return;
    }
    auto c = new TCanvas("c", "c", 1200, 500);
    // x axis: run start times (Unix seconds); y axis: concentrations.
    // TGraph copies the points, so the vectors' buffers can be passed directly.
    auto graph = new TGraph(static_cast<int>(conc.size()), date.data(), conc.data());
    c->SetLogy();
    graph->SetMarkerStyle(20);
    // "A" draws the axes, "L" joins the points with a line, "P" draws markers
    graph->Draw("ALP");
    graph->SetTitle("AV Cover Gas Radon Monitor Concentration vs Time;Date (dd/mm/yr);Concentration (counts/hour);");
    // show the x axis as dates instead of raw seconds since 1970
    graph->GetHistogram()->GetXaxis()->SetTimeDisplay(1);
    graph->GetHistogram()->GetXaxis()->SetTimeFormat("%d/%m/%y%F1970-01-01 00:00:00");
    c->Print("c.pdf");
    conc.clear();
    date.clear();
}

I had a few errors in this final function.

a. Graph shows up blank

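When I started, the canvas output by this function was always blank. I believe the fix was to make sure graph->Draw() had an “A” option, which tells ROOT to draw the axes. Without it, the graph is most likely drawn into whatever coordinate frame the pad already has, and a fresh canvas only spans 0 to 1, so points with x values around 1.5 billion (Unix seconds) land far off-screen. You can combine options in ROOT, so graph->Draw("ALP") draws the axes, a line through the points, and a marker at each point.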

b. Using SetTimeDisplay()

The ftimestamp branch records the date in Unix epoch time (seconds since 1970), which on its own makes for an ugly display: the x axis shows raw numbers around 1.5 billion instead of dates.

The documentation for SetTimeDisplay is very sparse, but the way I’ve used it seems to work pretty well:

graph->GetHistogram()->GetXaxis()->SetTimeDisplay(1);
graph->GetHistogram()->GetXaxis()->SetTimeFormat("%d/%m/%y%F1970-01-01 00:00:00");

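SetTimeFormat() takes a single string. The part before %F sets how the date is printed (%d/%m/%y gives day/month/year), and the part after %F is the time offset, which in my case is the start of Unix time, 1970-01-01 00:00:00. The offset matters because ROOT’s default offset is 1995-01-01, so without it every date would come out 25 years late.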
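To check what the axis should show for a given timestamp, the same %d/%m/%y pattern can be applied with the C standard library. formatEpochDate is a hypothetical helper; it formats in UTC, and depending on your settings ROOT may label the axis in local time instead, so dates close to midnight could differ by a day.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: print Unix epoch seconds (like ftimestamp) as dd/mm/yy
// in UTC, using the same %d/%m/%y pattern that was passed to SetTimeFormat().
std::string formatEpochDate(double epochSeconds) {
    const std::time_t t = static_cast<std::time_t>(epochSeconds);
    const std::tm *utc = std::gmtime(&t); // static storage: use it right away
    if (utc == nullptr) {
        return "";
    }
    char buf[16];
    if (std::strftime(buf, sizeof buf, "%d/%m/%y", utc) == 0) {
        return "";
    }
    return std::string(buf);
}
```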

5. Putting all functions together

As mentioned before, I wanted to break up the problem into a few functions and then put them all together. This is the “main” function that the file is named for that calls all the other functions.

// named after the file, rnAnalysis.cc
void rnAnalysis() {
    std::vector<std::string> files = listFiles();
    std::vector<double> datesOnly = extractDates(files);
    std::cout << "Found " << files.size() << " files" << std::endl;
    std::vector<double> rnConcentrations;
    for (std::size_t x = 0; x < files.size(); x++) {
        std::cout << "File # " << x + 1 << ": " << files.at(x) << std::endl;
        // integrated counts under the fitted peak
        double amount = histIntegrateRN(files.at(x));
        // length of the run in hours
        double timeInHours = runTimeInHours(files.at(x));
        // concentration = counts per hour of data taking
        rnConcentrations.push_back(amount / timeInHours);
        std::cout << "Concentration for " << datesOnly.at(x) << ": " << amount / timeInHours
                  << " [counts/hour]" << std::endl << std::endl;
    }
    plot(rnConcentrations, datesOnly);
}

The function starts by getting a vector of the file names using listFiles, a vector of the starting times for each file using extractDates, and then printing out the number of files found.

In a loop, it uses histIntegrateRN and runTimeInHours for every file listed by listFiles, calculating a concentration by dividing the results.

Finally, plot takes the y and x data and plots it.
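One thing worth checking before plot(): the “L” in "ALP" joins points in the order they are stored, and the order in which TSystemDirectory lists files is up to the operating system, so the dates aren’t guaranteed to be chronological. Here is a minimal sketch that sorts the two vectors together by date; sortByDate is a hypothetical helper. TGraph also has a Sort() method that should achieve the same on the graph itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reorder date and conc together so date is ascending,
// keeping each concentration paired with its own date.
void sortByDate(std::vector<double> &date, std::vector<double> &conc) {
    if (date.size() != conc.size()) {
        throw std::invalid_argument("date and conc must have the same length");
    }
    // indices 0, 1, 2, ... sorted by the date they point to
    std::vector<std::size_t> order(date.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&date](std::size_t a, std::size_t b) { return date[a] < date[b]; });
    std::vector<double> sortedDate, sortedConc;
    sortedDate.reserve(order.size());
    sortedConc.reserve(order.size());
    for (std::size_t i : order) {
        sortedDate.push_back(date[i]);
        sortedConc.push_back(conc[i]);
    }
    date.swap(sortedDate);
    conc.swap(sortedConc);
}
```

In the main function, it would be called as sortByDate(datesOnly, rnConcentrations); right before plot(rnConcentrations, datesOnly);.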

I only had one error at the scope of the entire macro: memory leaks.

a. Memory leaks

I can no longer recreate the error message, and all I kept was a cut-off screenshot of it.

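When you get an error like this, it’s best to read through it to get some hints. What made the error go away for me was clearing the vectors I was using, as seen at the bottom of plot():

conc.clear();
date.clear();

Looking back, I doubt this was the real fix. plot() receives conc and date by value, so they are copies that get destroyed when the function returns, and std::vector frees its own memory; clearing them just before that happens shouldn’t change anything. A more likely culprit is an object created with new and never deleted, which is what the garbage collection described in section B.2.c deals with.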
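For reference, here is what clear() does to a vector received by value, the way plot() receives conc and date. This is plain C++ with hypothetical helper names; a const reference (const std::vector<double> &) avoids copying the data in the first place.

```cpp
#include <cstddef>
#include <vector>

// Takes its argument by value, like plot(): v is a private copy, so clear()
// only empties the copy, which is freed anyway when the function returns.
std::size_t clearCopy(std::vector<double> v) {
    v.clear();
    return v.size();
}

// Takes a const reference instead: no copy is made and the caller's data
// cannot be modified.
double sumByRef(const std::vector<double> &v) {
    double total = 0.0;
    for (double x : v) {
        total += x;
    }
    return total;
}
```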

My code is in bits and pieces all over this article, so for the full macro, check out my GitHub.

C. General advice

I suspect that most people will be reading this article because of a specific error that they had. However, if you’re here trying to get a grasp of ROOT, here is my general advice:

  • Use Google to look for solutions to your problems.
  • Add “CERN” to your ROOT Google searches to improve your odds of finding the right answer (e.g. “root cern open file”).
  • If you can’t find any solutions to your problem, ask the question yourself on the ROOT Forum. There are many helpful experts that answer within a few hours, and I wouldn’t have made the progress that I did without their help.
  • Don’t give up!

My final output looked like this:

Radon concentration as a function of time in a Radon Monitor
