Must-have primary skills: Cloudera (Hadoop), Spark with Scala or Spark with Java, and SQL.
Resources should also have a good understanding of Hive and Aerospike.
Resources should have strong analytical skills.
Scope of work
Persistent resources will take knowledge transfer (KT) from the current team members on the framework they have developed.
The team will need to work on the following:
Documenting data lineage according to the existing template.
Understanding the data quality (DQ) rules from the data science team.
Onboarding new incremental datasets, including configuring their DQ rules and related settings.
Performing data validation checks.
Copying production data into test environments (to be confirmed).
Reporting on DQ issues.
From:
Asanraj,
Vysystems
asanraj@vysystems.com
Reply to: asanraj@vysystems.com