Apache NiFi - a powerful way to design your workflow instead of WinSCP/FTP
Unpack the archive to any drive (I am using the C drive) and run bin/run-nifi.bat. It might take a couple of minutes for NiFi to start.
Start Apache NiFi from: nifi-1.8 >> bin >> run-nifi.bat
Drag and drop the processors onto the canvas.
Drag all the processors shown in the image above, namely: GetFile, UnpackContent, SplitAvro, ConvertAvroToJson.
- Add the UnpackContent processor (repeat steps 4 to 8) and change the Packaging Format to zip in the Properties tab.
- In the configuration of the UnpackContent processor, go to the Settings tab and check the “original” checkbox in the “Automatically Terminate Relationships” group (this way the processor will not pass on the original file). Apply the changes.
- Add the SplitAvro processor and repeat step 11 for it.
- Add the ConvertAvroToJson processor.
- Check the “unmatched” checkbox in the Settings tab (we are not interested in the files that do not match the query).
- In the Properties tab, add a new property by clicking the + button. This property is used as the filename when we store the results. The name of the property is “filename”, and the value is $.messageId (the messageId is stored at the root of the JSON).
- Add the PutFile processor. In the Settings tab, terminate the relationship if the file was successfully stored (check the “success” checkbox in the “Automatically Terminate Relationships” group).
Set the Directory in the Properties tab to the folder where you want to store the results. Note: do not choose the same folder where you put source.zip!
- Add the LogAttribute processor and terminate its “success” relationship in the same way.
- The last step is connecting all the processors.
For each processor, drag the arrow from the middle of the processor and connect the line to the next processor. At the end you should get:
- That is it. In about 10 minutes we built a data pipeline without writing any code! We can run it by clicking the Start button (make sure that no processors are selected, otherwise only the selected processors will be started).
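For comparison, the same flow can be sketched in plain Python. This is only an illustration of what the pipeline does, not how NiFi implements it: it assumes the records inside source.zip are already JSON (in the real flow, SplitAvro and ConvertAvroToJson handle the Avro-to-JSON step), and the names source.zip, messageId, and the output folder simply follow the example above.

```python
import json
import zipfile
from pathlib import Path

def run_pipeline(source_zip: str, out_dir: str) -> list:
    """Unpack the archive, read each JSON record, and store it
    under a file named after its messageId (like the PutFile step)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(source_zip) as archive:          # GetFile + UnpackContent
        for name in archive.namelist():
            record = json.loads(archive.read(name))       # ConvertAvroToJson stand-in
            target = out / f"{record['messageId']}.json"  # $.messageId -> filename
            target.write_text(json.dumps(record))         # PutFile
            written.append(target.name)
    return written
```

A record without a messageId would raise a KeyError here; in the NiFi flow, the “unmatched” relationship covers that case.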
Apache NiFi
Definition - most organizations are likely to design their business as three blocks:
- data acquisition
- data storage
- data analytics
All three blocks should run sequentially and without any user intervention.
To automate the data flow and handle its challenges (data in motion, mapping, transformation, date and time handling, renaming, etc.), we need Apache NiFi.
Only the exact data that is needed should reach the analytics stage.
Data flow:

Producer -->> Transform -->> Entrance -->> Error     -->> Error log
                                      -->> Calculate -->> Reporting
IoT - millions of devices sending data that requires real-time processing.
USE CASES
NiFi is very strong in organizing data pipelines. In our experience, it is one of the best tools for it. For instance, if you need to get data from Kafka, transform it, filter it, and upload it into HDFS and ElasticSearch, NiFi is one of the best candidates. We recommend going through the list of the standard processors to see if your data flow can fit.
NiFi also integrates easily with other technologies. We have successfully used it with Apache Spark Streaming and Apache Storm.
One scenario where NiFi should not be used is very complex logic, analytics, and machine learning. NiFi is a data flow tool; for very complex processing, technologies like Spark or Ignite are a better fit.
LIMITATIONS OF NIFI
NiFi is a young platform. As it has been developed under Hortonworks for only two years, it still has a long way to go before programmers can do most of their work without writing code. Over the past year we have seen development ramp up: new processors were added, and many existing processors were significantly improved.
The current limitations that we noticed:
- Sharing workflows is not straightforward. If you have a team of developers working on the same data pipeline, sharing workflows can be challenging.
- Limited support for Kafka. The native Java client for Kafka is very rich in features and configuration options. For example, the Kafka consumer allows you to fully control offset commits to guarantee at-least-once processing. The NiFi processor only supports automatic commits. The consequence: NiFi reads data from Kafka and commits the offsets; if anything then happens to the node, the records will be processed only after the node is fully restored.
- Limited support for ElasticSearch. The standard ElasticSearch processor does not support POST querying of ElasticSearch (the body is the query, and this is the only way to access all of ElasticSearch's querying capabilities).
- Performance considerations. We will discuss how NiFi processes data in future posts. There are advantages and disadvantages in how NiFi manages data pipelines.
- Limited support of … Depending on the requirements, the standard processors might lack required functionality.
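To illustrate the ElasticSearch limitation above, the sketch below builds the kind of POST _search request (where the body is the query) that the standard processor cannot issue. The host, index name, and query are made up for the example, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint and query - adjust to your own cluster and index.
url = "http://localhost:9200/messages/_search"
query = {"query": {"match": {"status": "error"}}, "size": 100}

request = urllib.request.Request(
    url,
    data=json.dumps(query).encode("utf-8"),  # the body IS the query
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would execute it against a live cluster.
```

The full query DSL (match, bool, aggregations, etc.) is only reachable through a request body like this, which is exactly what the standard processor lacks.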