I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience
3+ years at Mu Sigma
2 years at OYO
1 year and counting at The Data Monk
I am an active trader and a logically sarcastic idiot :)
data.table is an extended version of data.frame and offers several features beyond
those provided by a data.frame. Some of the advantages of using data.table are as follows:
1) It loads data much faster than a data.frame does.
2) Reading and writing files is much faster with data.table's fread() and fwrite() than with read.csv() and write.csv().
3) Inbuilt features like automatic indexing make processing large data sets faster.
4) It is often faster than the dplyr package for data manipulation.
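As a rough illustration of point 2, here is a minimal sketch of a write/read round trip with fwrite() and fread(), next to the base R read.csv(). The sample columns (id, value) are made up for the example, and nothing here is a benchmark; it only shows which function returns which kind of object.

```r
library(data.table)

# Hypothetical sample data; the column names are illustrative only.
DT <- data.table(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9))

# Write to a temporary CSV with fwrite(), then read it back with fread().
path <- tempfile(fileext = ".csv")
fwrite(DT, path)
DT2 <- fread(path)

# Base R equivalent, for comparison.
DF <- read.csv(path)

print(class(DT2))  # "data.table" "data.frame"
print(class(DF))   # "data.frame"

unlink(path)  # clean up the temporary file
```

To compare speed on your own files, wrap each call in system.time().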
A data.frame is a list of vectors of the same length, with a few extra attributes such as column names.
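You can check the "list of vectors" description yourself. A small sketch with made-up columns (name, score):

```r
# Hypothetical data; the column names are illustrative only.
DF <- data.frame(name = c("a", "b", "c"), score = c(10, 20, 30))

print(is.list(DF))            # TRUE: a data.frame is a list under the hood
print(lengths(DF))            # every column (vector) has the same length, 3 here
print(names(attributes(DF)))  # the extra attributes, including names and row.names
```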
data.table:
1. Allows columns to be assigned or modified by reference.
2. Allows data tables to be quickly aggregated or summarized using keys.
3. Provides a syntax that is easier to read, leading to less repetition and fewer mistakes.
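To make "by reference" concrete, here is a minimal sketch using the := operator. The column names (price, double_price) are assumptions for the example. Note that alias is not a copy: both names point to the same table, so a column added through DT also shows up through alias.

```r
library(data.table)

# Hypothetical data; the column names are illustrative only.
DT <- data.table(id = 1:3, price = c(10, 20, 30))

alias <- DT  # a second name for the same object, not a copy

# := adds the column in place; there is no "DT <- ..." reassignment.
DT[, double_price := price * 2]

print(names(alias))  # "id" "price" "double_price"
```

If you want an independent table instead, take an explicit copy with copy(DT) before modifying.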
Data frames are lists of vectors of equal length, while data tables (data.table) inherit from data frames. Therefore data tables are data frames, but data frames are not necessarily data tables. The data.table package and its data.table() function were written to speed up indexing, ordered joins, assignment, grouping, listing columns and so on.
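The inheritance is visible in the class attribute. A short sketch, with a made-up one-column table:

```r
library(data.table)

DF <- data.frame(x = 1:3)   # hypothetical data
DT <- as.data.table(DF)

print(class(DT))            # "data.table" "data.frame"
print(is.data.frame(DT))    # TRUE: a data table is a data frame
print(is.data.table(DF))    # FALSE: a data frame is not a data table
```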
Its objectives:
Allow columns to be assigned or modified by reference.
Allow data tables to be quickly aggregated or summarized using keys.
Create a framework for syntax which is easier to read, leading to less repetition and fewer mistakes.
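A minimal sketch of the key-based part of these objectives. The State values and Capacity numbers are invented for the example; setkey() sorts the table by the key column, after which you can subset on the key and aggregate by it.

```r
library(data.table)

# Hypothetical data; the values are illustrative only.
DT <- data.table(
  State    = c("KA", "MH", "KA", "MH", "TN"),
  Capacity = c(10, 20, 30, 40, 50)
)

setkey(DT, State)   # sorts DT by State and marks State as the key
print(key(DT))      # "State"

print(DT["KA"])     # keyed subset: the rows where State is "KA"

# Summarize by the key column.
totals <- DT[, .(total = sum(Capacity)), by = State]
print(totals)
```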
# DT is a dummy data table with the columns Code, Capacity and State
DT[Code == "C", mean(Capacity), by = State]
I asked data.table to filter the rows whose Code is C, and then to calculate the mean Capacity of those rows for each State separately.
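Here is a self-contained sketch of that query that you can run. The column names come from the query itself, but the rows are made up, since the dummy table's definition is not shown here.

```r
library(data.table)

# Hypothetical rows; only the column names come from the query above.
DT <- data.table(
  Code     = c("C", "C", "A", "C", "B", "C"),
  State    = c("KA", "KA", "KA", "MH", "MH", "MH"),
  Capacity = c(100, 200, 999, 50, 999, 150)
)

# i: keep rows with Code "C"; j: mean of Capacity; by: one result per State.
res <- DT[Code == "C", mean(Capacity), by = State]
print(res)  # KA -> 150, MH -> 100 (the unnamed result column is called V1)

# Same query with a named result column.
res_named <- DT[Code == "C", .(mean_capacity = mean(Capacity)), by = State]
print(res_named)
```

The rows with Code A and B never reach mean(), because the filter in i runs before the grouping.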