Need a Python programmer to write a script in Python for the following 2 jobs.
Job 1:
Beginning at 7 PM, the job checks for the text/zip file or table it should process. The job goes to sleep once the file is found and processed, or at 7 AM if the file still has not been downloaded. If the file was not downloaded by 7 AM, update the email notification that the file was not processed.
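A minimal polling-loop sketch for this, assuming the job is launched at 7 PM and that the caller supplies the actual check/process/notify steps (the check interval is an assumed value):

```python
import time
from datetime import datetime, timedelta

CHECK_INTERVAL = 15 * 60   # seconds between checks; an assumed value

def wait_and_process(check_ready, process, on_missed):
    """Poll from 7 PM until 7 AM the next morning.

    check_ready/process/on_missed are callables supplied by the caller:
    check whether the source is ready, process it, and flag it for the
    7 AM notification email, respectively.
    """
    # Assumes the job is launched at 7 PM, i.e. before midnight.
    deadline = (datetime.now().replace(hour=7, minute=0, second=0, microsecond=0)
                + timedelta(days=1))
    while datetime.now() < deadline:
        if check_ready():
            process()
            return True
        time.sleep(CHECK_INTERVAL)
    on_missed()    # 7 AM reached without the file; note it for the email job
    return False
```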
For a table, check that the modified date on the table(s) is today and that the row count is not changing (to make sure the data is not in the middle of being loaded).
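A sketch of that check against any DB-API connection; the modified-date lookup is database-specific, so here it is passed in by the caller:

```python
import time
from datetime import date

def table_is_ready(conn, table, modified_date, pause=60):
    """True if the table was modified today and its row count has stopped changing.

    modified_date is looked up by the caller (the query is DBMS-specific);
    conn is any DB-API 2.0 connection.
    """
    if modified_date != date.today():
        return False
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    first = cur.fetchone()[0]
    time.sleep(pause)                       # wait, then count again
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0] == first       # unchanged => load is not mid-flight
```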
For zip and txt files, once the file is found, keep checking the file size until it stops changing; this indicates the FTP from the mainframe is complete.
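A sketch of the size-stability check, assuming the file is visible on a local or mounted path:

```python
import os
import time

def wait_until_transfer_complete(path, pause=30):
    """Block until the file's size stops changing (FTP from the mainframe is done)."""
    size = -1
    while True:
        current = os.path.getsize(path)
        if current == size:          # two identical readings => transfer complete
            return current
        size = current
        time.sleep(pause)
```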
If it is a zip file, unzip it and move the zip to a backup directory.
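A sketch using the standard zipfile and shutil modules; the directory arguments are placeholders to be taken from the job configuration:

```python
import shutil
import zipfile
from pathlib import Path

def unzip_and_archive(zip_path, extract_dir, backup_dir):
    """Extract a zip into the source directory, then move the zip to backup."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extract_dir)
    shutil.move(str(zip_path), str(Path(backup_dir) / Path(zip_path).name))
```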
Check that the file size is not below the low rec bound, which would indicate there was a problem with the FTP. If it is below the low rec bound, move the file to the badfile directory.
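A sketch of that check; it interprets the low rec bound as a minimum record (line) count read from DLMAST, which is one reasonable reading of "file size below the low rec bound" — confirm against the actual metadata:

```python
import shutil
from pathlib import Path

def check_low_bound(txt_path, low_bound, badfile_dir):
    """Return the record count if it meets the low bound, else move the file to badfiles."""
    with open(txt_path) as f:
        reccount = sum(1 for _ in f)          # assumes one record per line
    if reccount < low_bound:
        shutil.move(str(txt_path), str(Path(badfile_dir) / Path(txt_path).name))
        return None
    return reccount
```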
If the file size is within bounds, update RUNLOG that the job is starting and run the data flow.
Update DLMAST with the reccount, the last time the job was run, and when the next run will be.
Move the text file to backup.
Update RUNLOG that the job ended.
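A sketch of the bookkeeping around one data-flow run. The DB-API connection, qmark parameter style, and the RUNLOG/DLMAST column names are assumptions; the callable that actually launches the data flow and the next-run offset are supplied by the caller:

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

def run_job(conn, jobname, txt_path, reccount, backup_dir, run_data_flow,
            frequency_days=1):
    """RUNLOG start entry, data flow, DLMAST update, backup move, RUNLOG end entry."""
    cur = conn.cursor()
    cur.execute("INSERT INTO RUNLOG (DATETIME, JOBNAME, STARTEND) "
                "VALUES (?, ?, 'Start')", (datetime.now(), jobname))
    conn.commit()
    run_data_flow()                            # launch the data flow (caller-supplied)
    # DLMAST: reccount, last run, next run (offset comes from the daily/weekly/monthly column)
    cur.execute("UPDATE DLMAST SET RECCOUNT = ?, LAST_UPDATED = ?, NEXT_RUN = ? "
                "WHERE JOBNAME = ?",
                (reccount, datetime.now(),
                 datetime.now() + timedelta(days=frequency_days), jobname))
    shutil.move(str(txt_path), str(Path(backup_dir) / Path(txt_path).name))
    cur.execute("INSERT INTO RUNLOG (DATETIME, JOBNAME, STARTEND) "
                "VALUES (?, ?, 'End')", (datetime.now(), jobname))
    conn.commit()
```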
Job 2:
At 7 AM, the email notification job is kicked off. It accomplishes the following (a sketch follows the list):
1) List of txt files left in $GV_SRC_FILE_LOC
2) List of files moved to Badfiles directory
3) List of files in the Errfiles directory that contain data, meaning errors were written to them.
4) Any table whose reccount was not within the high and low rec bounds.
5) Tables that were due but were not updated (the job didn't run even though it should have).
Items 4 and 5 can be a Crystal Report run off DLMAST.
6) List of errors from the ERRLOG file. Move ERRLOG to ERRHIST and clear ERRLOG.
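A sketch of the notification job covering items 1-3: it assembles the file lists into a plain-text body and sends it with smtplib. The directory paths, SMTP host, and addresses are placeholders; items 4-5 would come from DLMAST queries or the Crystal Report, and item 6 from ERRLOG before it is rotated:

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path

def build_report(src_dir, badfile_dir, errfile_dir):
    """Assemble the lists of leftover, bad, and error files into one text body."""
    lines = ["Text files left in source directory:"]
    lines += [f"  {p.name}" for p in Path(src_dir).glob("*.txt")]
    lines.append("Files moved to Badfiles:")
    lines += [f"  {p.name}" for p in Path(badfile_dir).iterdir()]
    lines.append("Errfiles with data in them:")
    lines += [f"  {p.name}" for p in Path(errfile_dir).iterdir()
              if p.stat().st_size > 0]
    return "\n".join(lines)

def send_report(body):
    msg = EmailMessage()
    msg["Subject"] = "Nightly load report"          # placeholder subject
    msg["From"] = "etl@example.com"                 # placeholder addresses
    msg["To"] = "ops@example.com"
    msg.set_content(body)
    with smtplib.SMTP("mailhost.example.com") as smtp:   # placeholder SMTP host
        smtp.send_message(msg)
```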
Tables needed for these jobs:
DLMAST contains a row per table with the following columns (this is the 'metadata'; it was created based on our old dBASE table). A possible DDL sketch follows the list:
1) Is the source a text/zip/table
2) Name of source (ex: [login to view URL])
3) Target table name
4) Data flow/job name
5) Is the job active (sometimes we have to deactivate a job to debug a problem)
6) High and low bounds of record count
7) Current reccount
8) Daily/monthly/weekly file
9) Last date it was updated
10) Next run date
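One possible DDL sketch for DLMAST; the column names and types below are assumptions mapped from the list above, not a fixed requirement:

```python
DLMAST_DDL = """
CREATE TABLE DLMAST (
    SOURCE_TYPE   VARCHAR(10),   -- text / zip / table
    SOURCE_NAME   VARCHAR(255),  -- name of the source file or table
    TARGET_TABLE  VARCHAR(255),
    JOBNAME       VARCHAR(255),  -- data flow / job name
    ACTIVE        CHAR(1),       -- jobs can be deactivated for debugging
    LOW_BOUND     INTEGER,       -- low bound of record count
    HIGH_BOUND    INTEGER,       -- high bound of record count
    RECCOUNT      INTEGER,       -- current reccount
    FREQUENCY     VARCHAR(10),   -- daily / weekly / monthly
    LAST_UPDATED  DATE,
    NEXT_RUN      DATE
)
"""
# e.g. conn.cursor().execute(DLMAST_DDL) against whatever database holds the metadata
```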
RUNLOG is a historical table that contains the following columns:
1)Datetime
2)Jobname
3)Start/End
ERRLOG:
1) Datetime
2) Error message
(I'm not sure of the format of the DI errors, but include whatever else would be helpful in debugging.)
ERRHIST is a historical file:
Same as ERRLOG. Each day, ERRLOG should be moved to ERRHIST.
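A sketch of the daily ERRLOG-to-ERRHIST rotation, treating both as tables in the same database; if they turn out to be flat files instead, the same effect is a shutil.move followed by truncating ERRLOG:

```python
def rotate_errlog(conn):
    """Copy ERRLOG rows into ERRHIST, then clear ERRLOG for the next day."""
    cur = conn.cursor()
    cur.execute("INSERT INTO ERRHIST SELECT * FROM ERRLOG")
    cur.execute("DELETE FROM ERRLOG")
    conn.commit()
```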