The Fresco files contained here are part of a project at Purdue University, conducted by Saurabh Bagchi, Carol Song, et al., to collect and analyze failure data on supercomputer clusters.
The files here break Torque logs down into tables to be imported into ELK, where they will be joined with TACC Stats.
Certain assumptions are made about the environment. The parent directory is referred to here as "Fresco", and the presence of the child directories "anon", "csv", and "log" is assumed. The Torque statistics portion consists of two separate Python files, anon.py and csvfiles.py, which are located in the Fresco directory.
The anon.py script reads the Torque logs and anonymizes them by removing any data that could be used to identify a user or Purdue University. The anonymized files are written to the Fresco/anon directory with the same name as the data file plus the ".anon" extension. Gzipped files are decompressed in this process.
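As a rough illustration of the anonymization step, the sketch below rewrites identifying key=value fields in one log line. It is not the code in anon.py: the list of keys, the function names, and the salt are all hypothetical, the example assumes Python 3 rather than the Python 2.6 used for development, and it swaps in salted hashing (stable pseudonyms) where anon.py is described as removing the data.

```python
import hashlib
import re

# Hypothetical list of keys treated as identifying; the list that
# anon.py actually uses is not documented in this README.
SENSITIVE_KEYS = ("user", "group", "owner", "account")

# Matches key=value pairs whose key is one of the sensitive keys.
_FIELD = re.compile(r"\b(%s)=(\S+)" % "|".join(SENSITIVE_KEYS))


def pseudonym(value, salt):
    """Return a short, stable, non-reversible token for value."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def anonymize_line(line, salt="change-me"):
    """Replace the value of every sensitive key in one log line."""
    def replace(match):
        return "%s=%s" % (match.group(1), pseudonym(match.group(2), salt))
    return _FIELD.sub(replace, line)
```

Because the same input always maps to the same token, jobs from one user can still be grouped after anonymization, provided the salt is kept secret and unchanged between runs.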
The output files from anon.py are then used, in a separate process, as input to csvfiles.py. csvfiles.py analyzes the anonymized Torque logs and converts them to CSV files, each with a top row of column headings followed by lines of data values. Those .csv files are placed in the csv directory, along with other CSV files used to hold data from a number of Python dictionaries.
It is assumed, though it has not been tested, that those CSV files are ready to be imported into ELK.
The Python scripts were developed in Python 2.6.6 using some backports from newer versions of Python. Version 2.6.6 was chosen as the development version because it is available on all clusters.
The parameter for anon.py is the path containing the Torque logs to be anonymized. Every file in that path that has no extension is treated as a log to process. Any compressed files in the .gz format are uncompressed and then processed. The anonymized output files are written to the directory anon/, where they keep the same filename with the extension .anon added.
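The file selection rules above can be sketched as follows. This is an illustration only, written for Python 3 and not taken from anon.py; the helper names are hypothetical, and dropping the .gz suffix from the output name is an assumption, since the text does not say how compressed inputs are named on output.

```python
import gzip
import os


def is_torque_log(filename):
    """Return True for names with no extension or a .gz extension."""
    ext = os.path.splitext(filename)[1]
    return ext in ("", ".gz")


def anon_output_path(log_path, anon_dir="anon"):
    """Map an input log path to its output path under anon_dir."""
    name = os.path.basename(log_path)
    if name.endswith(".gz"):
        # Assumption: the output is uncompressed, so drop the .gz suffix.
        name = name[:-3]
    return os.path.join(anon_dir, name + ".anon")


def open_log(log_path):
    """Open a plain or gzipped log for reading as text."""
    if log_path.endswith(".gz"):
        return gzip.open(log_path, "rt")
    return open(log_path, "r")
```

With these helpers, a log named 20170101 or 20170101.gz would both map to anon/20170101.anon, while a file such as notes.txt would be skipped.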
*** Note: The Torque logs should be found in /depot/sbagchi/data.
*** Afterthought: It would probably be a good idea to change anon.py so that it accepts the output directory as the second parameter since the amount of input from a depot directory could fill a normal directory.
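One possible shape for that change is sketched here. The argument names and the parse_args helper are hypothetical, not part of anon.py. The sketch uses argparse, which is in the standard library from Python 2.7 onward; on Python 2.6 it would have to be installed separately, and whether it is among the backports already in use is not known.

```python
import argparse


def parse_args(argv=None):
    """Parse a required input directory and an optional output directory."""
    parser = argparse.ArgumentParser(
        description="Anonymize Torque logs (hypothetical interface).")
    parser.add_argument(
        "log_dir",
        help="directory containing the Torque logs to anonymize")
    parser.add_argument(
        "out_dir", nargs="?", default="anon",
        help="directory for the .anon files (default: anon)")
    return parser.parse_args(argv)
```

Called with one argument, out_dir falls back to anon, so existing invocations would keep working; a second argument redirects the output to a location with more space.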
The parameter for csvfiles.py is the directory holding the files previously anonymized by anon.py. By default that directory is anon. csvfiles.py converts the anonymized files to CSV files formatted as tables, each with a first row of headings. The CSV files should be ready for import into ELK.
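The conversion can be pictured with the following sketch. It is not the code in csvfiles.py. It assumes Python 3 and assumes each record is in the semicolon-delimited form "timestamp;type;job id;key=value key=value ...", with no spaces inside values. The function names and the column names timestamp, type, and job_id are hypothetical.

```python
import csv
import io


def parse_record(line):
    """Split one log record into a dictionary of column values."""
    timestamp, rec_type, job_id, message = line.rstrip("\n").split(";", 3)
    row = {"timestamp": timestamp, "type": rec_type, "job_id": job_id}
    for field in message.split():
        key, sep, value = field.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            row[key] = value
    return row


def write_csv(lines, out, columns):
    """Write a heading row, then one CSV row per non-blank record."""
    writer = csv.DictWriter(out, fieldnames=columns,
                            extrasaction="ignore", restval="")
    writer.writeheader()
    for line in lines:
        if line.strip():
            writer.writerow(parse_record(line))


if __name__ == "__main__":
    sample = ["01/02/2017 03:04:05;E;123.host;queue=batch Exit_status=0\n"]
    buf = io.StringIO()
    write_csv(sample, buf,
              ["timestamp", "type", "job_id", "queue", "Exit_status"])
    print(buf.getvalue())
```

Fixing the column list up front keeps every row aligned with the heading row: keys not in the list are dropped, and keys missing from a record are written as empty cells.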
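*** Note: The clumsy name csvfiles.py was adopted after discovering that the name csv.py caused problems for anon.py, which imports csv. The likely cause is that a csv.py in the same directory is found ahead of the standard library csv module.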
*** Obvious Afterthought: The same modifications for anon.py apply to csvfiles.py.