Let us now introduce a pager. As we know, a pager is responsible for reading pages from and writing pages to the disk, and for caching pages for later use. The pager we will create is relatively simple: it receives a filename to read from and write to, and it has two methods, get_page and pager_flush.
Let us start by setting some constants. These are the same as the ones we used for tables:
import os
import struct
import sys

COLUMN_USERNAME_SIZE = 32
COLUMN_EMAIL_SIZE = 255
# "<" = little-endian, no padding: 4-byte unsigned id + fixed-size username and email
ROW_FORMAT = f"<I{COLUMN_USERNAME_SIZE}s{COLUMN_EMAIL_SIZE}s"
ROW_SIZE = struct.calcsize(ROW_FORMAT)
PAGE_SIZE = 4096
TABLE_MAX_PAGES = 100
ROWS_PER_PAGE = PAGE_SIZE // ROW_SIZE
TABLE_MAX_ROWS = ROWS_PER_PAGE * TABLE_MAX_PAGES
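To make these numbers concrete, here is a small sketch that works through the arithmetic. It uses only the constants above; unused_bytes_per_page is a name introduced here purely for illustration.

```python
import struct

COLUMN_USERNAME_SIZE = 32
COLUMN_EMAIL_SIZE = 255
ROW_FORMAT = f"<I{COLUMN_USERNAME_SIZE}s{COLUMN_EMAIL_SIZE}s"
ROW_SIZE = struct.calcsize(ROW_FORMAT)            # 4 + 32 + 255 = 291 bytes
PAGE_SIZE = 4096
TABLE_MAX_PAGES = 100
ROWS_PER_PAGE = PAGE_SIZE // ROW_SIZE             # 14 whole rows fit in one page
TABLE_MAX_ROWS = ROWS_PER_PAGE * TABLE_MAX_PAGES  # 1400

# Hypothetical illustration value: bytes at the end of each page that no row uses.
unused_bytes_per_page = PAGE_SIZE - ROWS_PER_PAGE * ROW_SIZE  # 22

print(ROW_SIZE, ROWS_PER_PAGE, TABLE_MAX_ROWS, unused_bytes_per_page)
```

Rows never straddle a page boundary, which is why a few bytes at the end of every page are left unused.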
The next step is to create a Pager class with the following attributes:
class Pager:
    def __init__(self, filename):
        self.filename = filename
        self.file_descriptor = os.open(filename, os.O_RDWR | os.O_CREAT)
        self.file_length = os.lseek(self.file_descriptor, 0, os.SEEK_END)
        self.pages = [None] * TABLE_MAX_PAGES
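When the object is instantiated, the file named by filename is opened in read-write mode, or created if it does not exist yet. Seeking to the end of the file gives us its length in bytes, and the page cache starts out empty.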
<aside> 📖
Let us take a step back to understand the os functions: why we use them and how they differ from the usual with open approach.
Normally, when we use the with open context manager in text mode, Python creates a TextIOWrapper. This wrapper does a bunch of things, like UTF-8 or ASCII conversions (turning raw bytes into strings) and handling line endings (\r\n vs \n) automatically.
By using os.open, we are using low-level I/O. We are dealing with file descriptors (FDs), simple integers that the kernel uses to track open files. Since os.read and os.write deal only in raw bytes, we must convert our data to bytes (binary) before saving.
os.lseek allows you to treat the file like a giant array where you can jump to any index (byte offset) directly. In a regular text editor, you read a file from start to finish. In a database, that is too slow. If you want "User #500," you don't want to read Users 1 through 499 first.
If you know each user record is exactly 100 bytes, you can calculate the exact location: $$Position = ID \times RecordSize$$
By calling os.lseek(fd, 50000, os.SEEK_SET), you move the file offset directly to byte 50,000, so the next read or write starts there. This is called random access, and it is a big part of why databases are fast.
</aside>
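The random-access idea from the aside can be sketched as follows. This is a minimal illustration, not part of the pager: the 100-byte RECORD_SIZE, the read_record helper, and the users.bin file name are all assumptions made for the example.

```python
import os
import tempfile

RECORD_SIZE = 100  # assumed fixed record size, matching the example in the aside


def read_record(fd, record_id):
    """Jump straight to a record by its byte offset and read it."""
    os.lseek(fd, record_id * RECORD_SIZE, os.SEEK_SET)
    return os.read(fd, RECORD_SIZE)


# Build a throwaway file holding 600 fixed-size records.
path = os.path.join(tempfile.mkdtemp(), "users.bin")
fd = os.open(path, os.O_RDWR | os.O_CREAT)
for record_id in range(600):
    # Each record is encoded to bytes and padded with zero bytes up to RECORD_SIZE.
    os.write(fd, f"user-{record_id}".encode("utf-8").ljust(RECORD_SIZE, b"\x00"))

record = read_record(fd, 500)  # records 0..499 are never read
name = record.rstrip(b"\x00").decode("utf-8")
os.close(fd)
print(name)  # user-500
```

Note that the strings have to be encoded by hand before writing and decoded after reading, which is exactly the work TextIOWrapper would otherwise do for us.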
Now for the get_page method, which must return a page. If page_num is out of bounds (page_num >= TABLE_MAX_PAGES), we exit the program. Otherwise we look the page up in the cache and return it. If the page is not in the cache, we create a bytearray of PAGE_SIZE bytes, then check how many pages the file already stores. If the requested page already exists on disk, we read it from disk into the in-memory page; otherwise the page stays empty (all zeros). Either way, the page is stored in the cache before it is returned.
    def get_page(self, page_num):
        if page_num >= TABLE_MAX_PAGES:
            print(f"Tried to fetch page number out of bounds. {page_num} >= {TABLE_MAX_PAGES}")
            sys.exit(1)
        if self.pages[page_num] is None:
            # Cache miss: start from a zero-filled page.
            page = bytearray(PAGE_SIZE)
            num_pages = self.file_length // PAGE_SIZE
            # A partial page at the end of the file still counts as a page.
            if self.file_length % PAGE_SIZE != 0:
                num_pages += 1
            if page_num < num_pages:
                # The page exists on disk: copy its bytes into the in-memory page.
                os.lseek(self.file_descriptor, page_num * PAGE_SIZE, os.SEEK_SET)
                bytes_read = os.read(self.file_descriptor, PAGE_SIZE)
                page[:len(bytes_read)] = bytes_read
            self.pages[page_num] = page
        return self.pages[page_num]
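The page-count step in get_page rounds up, so a file that ends with a partial page still counts that page as present. Here is a minimal standalone sketch of just that calculation; pages_in_file is a hypothetical helper name and not part of the Pager class.

```python
PAGE_SIZE = 4096


def pages_in_file(file_length):
    """Pages occupied by a file of file_length bytes, counting a trailing partial page."""
    num_pages = file_length // PAGE_SIZE
    if file_length % PAGE_SIZE != 0:
        num_pages += 1
    return num_pages


for length in (0, 291, 4096, 4387):
    print(length, pages_in_file(length))
```

For an empty file the result is 0, so every requested page is treated as new and stays zero-filled.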
Next we implement the pager_flush method. It simply moves the file offset to the page's position in the file and writes the first size bytes of the cached bytearray to disk.
    def pager_flush(self, page_num, size):
        if self.pages[page_num] is None:
            print("Tried to flush null page")
            sys.exit(1)
        # Seek to where this page starts in the file, then write its first `size` bytes.
        os.lseek(self.file_descriptor, page_num * PAGE_SIZE, os.SEEK_SET)
        os.write(self.file_descriptor, self.pages[page_num][:size])
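To see what the size argument does, here is a rough standalone sketch of the same seek-then-write pattern on a temporary file. The flush_page function and the flush_demo.db file name are made up for this example; the sketch mirrors the logic of pager_flush without the Pager class.

```python
import os
import tempfile

PAGE_SIZE = 4096


def flush_page(fd, page, page_num, size):
    """Write the first `size` bytes of an in-memory page at that page's offset in the file."""
    os.lseek(fd, page_num * PAGE_SIZE, os.SEEK_SET)
    os.write(fd, page[:size])


path = os.path.join(tempfile.mkdtemp(), "flush_demo.db")
fd = os.open(path, os.O_RDWR | os.O_CREAT)

page = bytearray(PAGE_SIZE)
page[:5] = b"hello"

flush_page(fd, page, 0, PAGE_SIZE)  # a full page
flush_page(fd, page, 1, 5)          # a partial page: only 5 bytes are written

file_length = os.lseek(fd, 0, os.SEEK_END)  # 4096 + 5 = 4101
os.lseek(fd, PAGE_SIZE, os.SEEK_SET)
second_page = os.read(fd, PAGE_SIZE)        # only the bytes that exist come back
os.close(fd)
print(file_length, second_page)
```

Writing a partial last page keeps the file no longer than it needs to be, which is why get_page copies only len(bytes_read) bytes into the zero-filled page.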
Now we must make certain changes to the Table class:
class Table:
    def __init__(self, pager, num_rows):
        self.pager = pager
        self.num_rows = num_rows

    def row_slot(self, row_num):
        page_num = row_num // ROWS_PER_PAGE
        page = self.pager.get_page(page_num)
        row_offset = row_num % ROWS_PER_PAGE
        byte_offset = row_offset * ROW_SIZE
        return page, byte_offset
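The arithmetic in row_slot can be tried on its own. In the sketch below, locate_row is a hypothetical helper that returns the page number instead of the page itself, and the struct.pack_into call is only an assumption about how a row might be written into the returned slot.

```python
import struct

ROW_FORMAT = "<I32s255s"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 291
PAGE_SIZE = 4096
ROWS_PER_PAGE = PAGE_SIZE // ROW_SIZE   # 14


def locate_row(row_num):
    """Return (page_num, byte_offset) for a row, using the same math as row_slot."""
    page_num = row_num // ROWS_PER_PAGE
    byte_offset = (row_num % ROWS_PER_PAGE) * ROW_SIZE
    return page_num, byte_offset


for row_num in (0, 13, 14, 30):
    print(row_num, locate_row(row_num))  # (0, 0), (0, 3783), (1, 0), (2, 582)

# Assumed usage: pack a row into a page at the computed offset, then read it back.
page = bytearray(PAGE_SIZE)
_, offset = locate_row(13)
struct.pack_into(ROW_FORMAT, page, offset, 7, b"alice", b"alice@example.com")
row_id, username, email = struct.unpack_from(ROW_FORMAT, page, offset)
print(row_id, username.rstrip(b"\x00"), email.rstrip(b"\x00"))
```

Row 13 is the last row of page 0 and row 14 is the first row of page 1, so rows never cross a page boundary.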
First, the table now stores the pager as an attribute. The cache has moved into the pager, so row_slot asks the pager for the page instead of looking it up itself. The rest remains pretty much the same.
We also need to create two methods that will make sense in a moment.
def db_open(filename):
    pager = Pager(filename)
    num_rows = pager.file_length // ROW_SIZE
    return Table(pager, num_rows)
This function instantiates a pager, works out from the file length how many rows the file already holds, and returns a table that uses that pager.
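One detail worth keeping in mind: each page holds 14 rows of 291 bytes, which leaves 22 unused bytes at the end of every full page. If full pages are flushed as PAGE_SIZE bytes and only the last, partial page is flushed with its exact row bytes (an assumption about how flushing will be used, not something shown above), then file_length // ROW_SIZE can over-count once the file spans many pages. The sketch below compares that formula with a padding-aware alternative; rows_from_length is a hypothetical helper, not part of the tutorial's code.

```python
import struct

ROW_SIZE = struct.calcsize("<I32s255s")  # 291
PAGE_SIZE = 4096
ROWS_PER_PAGE = PAGE_SIZE // ROW_SIZE    # 14


def rows_from_length(file_length):
    """Count rows assuming full pages occupy PAGE_SIZE bytes and the tail holds whole rows."""
    full_pages = file_length // PAGE_SIZE
    extra_bytes = file_length % PAGE_SIZE
    return full_pages * ROWS_PER_PAGE + extra_bytes // ROW_SIZE


one_page_plus_one_row = PAGE_SIZE + ROW_SIZE
fifteen_full_pages = 15 * PAGE_SIZE

print(one_page_plus_one_row // ROW_SIZE, rows_from_length(one_page_plus_one_row))  # 15 15
print(fifteen_full_pages // ROW_SIZE, rows_from_length(fifteen_full_pages))        # 211 210
```

For small files the two formulas agree, so the simple division in db_open is fine while the table stays within a handful of pages.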