Pipeline Integrity Engineer learning R (The Beginning)

Purnomo Setyawendha
3 min read · Dec 19, 2020
Photo by Mike Benna on Unsplash

I am a Pipeline Integrity Engineer, not a developer, programmer, or any other kind of IT person. So when I decided to learn a programming language three years ago, I chose R, because I understood that R can do machine learning and handle geospatial data, and I did not care about production, deployment, object orientation, or any other stuff that I did not understand.

Yes, Python also came into the picture, but I got stuck for a while when I had to decide which IDE/environment to start with, whereas for R the choice was quite straightforward: RStudio.

Author’s RStudio screen setup

What can we do in R?

As a Pipeline Integrity Engineer, your first project should be anything related to IP data. Yes, IP as in Intelligent Pigging, NOT Internet Protocol as people always assume. If you are not currently using a proprietary Pipeline Integrity Management System (PIMS) or any client-based IP software, then I guess you are using Excel. In R (and in other languages such as Python and Julia), an object that looks like an Excel spreadsheet is called a DataFrame.

In R, a DataFrame is a table, or two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one set of values from each column, similar to an Excel spreadsheet. In R the DataFrame is a native class in base R (data.frame), whereas in other languages you need a library: pandas in Python and DataFrames.jl in Julia.
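As a minimal sketch of what such a DataFrame looks like, the snippet below builds a tiny one by hand. The column names LogDistance, WallLoss and Orientation follow the ones used in the plots in this article; the Surface column, the 0 to 12 clock scale, and every value are made up for illustration and are not real inspection data.

```r
# Hypothetical IP data: every value below is invented for illustration
IPData <- data.frame(
  LogDistance = c(12.5, 250.1, 1033.7, 4820.0),  # distance along the pipeline (m)
  WallLoss    = c(12, 35, 8, 22),                # defect depth (% wall loss)
  Orientation = c(6.0, 3.5, 11.0, 5.5),          # clock position (assumed 0-12 scale)
  Surface     = c('Internal', 'External', 'Internal', 'External')  # assumed column
)

str(IPData)    # one line per column: name, type, first values
nrow(IPData)   # number of defects (rows)

# Keep only the rows with more than 20 % wall loss, much like an Excel filter
deep_defects <- IPData[IPData$WallLoss > 20, ]
deep_defects
```

Each column is accessed with the $ operator (for example IPData$WallLoss), which is the same notation the plotting examples in this article rely on.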

The Beginning

IP data in Excel format can easily be loaded into R as a DataFrame. It is best practice to clean up the data before processing it further; you can do this in Excel (while you are still new to R) or later in R, which, as a data language, has various tools for the job. I would also recommend saving your data as a CSV file instead of an Excel file: because CSV is a text-based format, it is easier to edit, modify, or reformat very quickly with various text editor tools.

Once you have the IP data in R, you can start exploring various statistics and visualizations, for example:
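A rough sketch of loading and cleaning a CSV in base R could look like the following. The file content here is invented and written to a temporary file only so the example runs on its own; the column names and the two clean-up rules (drop duplicate rows, drop rows without a wall loss value) are assumptions, so adapt them to your own IP report.

```r
# Write a small, made-up CSV so the example is self-contained;
# in practice you would point read.csv() at your own file instead
csv_path <- tempfile(fileext = '.csv')
writeLines(c(
  'LogDistance,WallLoss,Orientation,Surface',
  '12.5,12,6.0,Internal',
  '250.1,,3.5,External',
  '250.1,,3.5,External',
  '1033.7,8,11.0,Internal'
), csv_path)

# Empty numeric fields are read as NA
IPData <- read.csv(csv_path, stringsAsFactors = FALSE)

# Basic clean-up: drop exact duplicate rows, then rows with no wall loss value
IPData <- unique(IPData)
IPData <- IPData[!is.na(IPData$WallLoss), ]

summary(IPData$WallLoss)  # quick check of the cleaned column
```

Doing the clean-up in R rather than by hand in Excel has the advantage that the same steps can be rerun unchanged on the next inspection report.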

# Histogram of defect depth (% wall loss)
hist(IPData$WallLoss,
     main = 'Defects Depth Distribution',
     xlab = '% Wall Loss')
Histogram of IP Data — Wall Loss (%)
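# 'color' needs one colour per defect. Here it is assumed to come from a
# hypothetical Surface column holding 'Internal' or 'External'
color <- ifelse(IPData$Surface == 'Internal', 'red', 'blue')
# Scatter plot of defect position: distance along the pipe vs clock orientation
plot(IPData$LogDistance, IPData$Orientation,
     main = 'Defects Distribution',
     ylab = 'Defects Orientation',
     xlab = 'Log Distance (m)',
     pch = 19, col = color)
legend('topright', c('Internal', 'External'),
       col = c('red', 'blue'), pch = 19)
grid()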
Scatter Plot for Defect Distribution (Log Distance (m) vs Clock Orientation)
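On the statistics side, a few base R functions already go a long way. The sketch below uses invented values, an assumed Surface column, and an arbitrary 30 % wall loss threshold chosen only for illustration; it is not an acceptance criterion from any code or standard.

```r
# Hypothetical IP data, invented for illustration only
IPData <- data.frame(
  WallLoss = c(12, 35, 8, 22, 41, 15),
  Surface  = c('Internal', 'External', 'Internal',
               'External', 'External', 'Internal')
)

summary(IPData$WallLoss)   # min, quartiles, mean, max
table(IPData$Surface)      # number of defects per surface

# Deepest defect on each surface
deepest <- tapply(IPData$WallLoss, IPData$Surface, max)
deepest

# Defects at or above an assumed 30 % wall loss threshold
deep <- IPData[IPData$WallLoss >= 30, ]
nrow(deep)
```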

What Next?

There are tons of resources to help you start learning R: eBooks, tutorial videos, bookdown books, and so on. But the only way to learn how to code is to start writing code.

Once you are familiar with the syntax, then functions, then libraries, you can start doing modeling and predictive analytics, and so on, all the way up to a machine learning implementation for IP data.
