Rather than creating Series or DataFrame structures from scratch, the most typical use of pandas is to load data from files or other sources for further exploration.
To read a file in Python, we use the built-in open() function. Its only required argument is the path to the file, and it returns a file object.
filepath = 'btc-market-price.csv'
# Open the file for reading; the with block closes it automatically
with open(filepath, 'r') as reader:
    print(reader)
Once the file is open, we can read its content as follows:
filepath = 'btc-market-price.csv'
with open(filepath, 'r') as reader:
    for index, line in enumerate(reader.readlines()):
        # Print just the first 10 lines
        if index < 10:
            print(index, line)
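When only the first few lines are needed, loading the whole file with readlines() is unnecessary. As a minimal sketch, the helper below (read_head is a hypothetical name, not part of the original material) uses itertools.islice to stop after n lines. The sample rows are made up so that the snippet runs without btc-market-price.csv.

```python
import itertools
import os
import tempfile


def read_head(path, n=10):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, 'r') as reader:
        # islice stops iterating after n lines; trailing newlines are stripped
        return [line.rstrip('\n') for line in itertools.islice(reader, n)]


# Made-up sample data standing in for the real CSV file (assumption)
sample_rows = ['2017-04-%02d 00:00:00,%d.0' % (day, 1000 + day) for day in range(1, 21)]
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('\n'.join(sample_rows) + '\n')
    sample_path = tmp.name

head = read_head(sample_path, 10)
for index, line in enumerate(head):
    print(index, line)

# Clean up the temporary file
os.remove(sample_path)
```

Because the file object is itself an iterator over lines, islice pulls only as many lines as requested, which matters for large log files or exports.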
Reading files is probably one of the most recurrent tasks in data analysis: public data sources, logs, historical tables, and exports from databases. The pandas library offers functions to read and write files in multiple formats, such as CSV, JSON, XML, and Excel's XLSX.
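As a rough illustration of these read and write functions, the sketch below round-trips a small DataFrame through CSV and JSON text. The data is made up, and in-memory buffers (io.StringIO) stand in for real files, so nothing here depends on a particular path.

```python
import io

import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({'asset': ['btc', 'eth', 'ltc'], 'price': [1099.5, 48.25, 9.75]})

# CSV round trip: to_csv returns a string when no path is given
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON round trip, serialized as one object per row
json_text = df.to_json(orient='records')
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

The same calls accept a file path in place of the buffer, for example df.to_csv('out.csv', index=False), where out.csv is a hypothetical file name.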
Commonly used parameters of the pandas read_csv() function:
Parameter | Description |
---|---|
filepath_or_buffer | Path of the file to be read |
sep | Character(s) used as the field separator in the file |
header | Index of the row containing the column names |
index_col | Index of the column, or sequence of indexes, to use as the row index of the data |
names | Sequence containing the column names (used together with header = None) |
skiprows | Number of rows, or sequence of row indexes, to ignore when loading |
na_values | Sequence of values that, if found in the file, should be treated as NaN |
dtype | Dictionary in which the keys are column names and the values are the NumPy types to which their content must be converted |
parse_dates | Flag indicating whether pandas should try to parse date-like data as dates. A list of column names to parse as dates can also be given |
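The parameters in the table can be combined in a single call. The sketch below assumes they are passed to pandas.read_csv and uses made-up content held in an in-memory buffer: the separator, the '?' missing-value marker, and the column names Timestamp and Price are illustrative choices, not a description of btc-market-price.csv.

```python
import io

import pandas as pd

# Made-up content: one leading line to skip, no header row,
# ';' as separator, and '?' marking a missing value
raw = (
    "exported by a hypothetical tool\n"
    "2017-04-02;1099.17\n"
    "2017-04-03;?\n"
    "2017-04-04;1133.08\n"
)

df = pd.read_csv(
    io.StringIO(raw),              # a file path can be passed here instead
    sep=';',                       # field separator
    header=None,                   # the data has no header row
    names=['Timestamp', 'Price'],  # so column names are supplied explicitly
    skiprows=1,                    # ignore the first line
    na_values=['?'],               # treat '?' as NaN
    dtype={'Price': 'float64'},    # force the column type
    parse_dates=['Timestamp'],     # parse this column as dates
    index_col='Timestamp',         # and use it as the row index
)

print(df)
print(df.index.dtype)
```

Parsing the date column and using it as the index in the same call makes time-based selection and resampling available right after loading.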