# Mount source code
Let's look at OpenBSD's mounting, as present in `ext2fs_vfsops.c`.
The first two function declarations prove quite useful:
```c
int ext2fs_sbupdate(struct ufsmount *, int);
static int e2fs_sbcheck(struct ext2fs *, int);
```
The first thing we are concerned with, of course, is the superblock. With that being said, let's keep reading.
Below that, we have a struct called `ext2fs_vfsops` which just contains values that are defined as certain constants (e.g. `.vfs_mount = ext2fs_mount`), but we don't care about that. Then we have the inode pool, the `ext2gennumber`, and below that, we have the initializer of the inode pool, which probably needs to be tweaked for 64-bit as well. No wonder nobody has done it 😅
At this point, we should see where in the mounting process ext4 fails. Create an ext4 filesystem using qemu and mount it; make sure that the virtual machine manager marks it as a readonly filesystem before mounting it to ensure that it doesn't get corrupted (at least, not yet).
Finally, on line 106 we have the `ext2fs_mountroot` function. This is where the bulk of our analysis will start.
```c
	struct m_ext2fs *fs;
	struct mount *mp;
	struct proc *p = curproc;	/* XXX */
	struct ufsmount *ump;
	int error;

	/*
	 * Get vnodes for swapdev and rootdev.
	 */
	if (bdevvp(swapdev, &swapdev_vp) || bdevvp(rootdev, &rootvp))
		panic("ext2fs_mountroot: can't setup bdevvp's");
```
So here we have to figure out what exactly the bdevvp function is.
From the OpenBSD man page:
> bdevvp() will create a vnode for a block device, and is used for the root device and swap areas, among other things.
I guess our question starts with: what exactly is a virtual node (vnode)? It's nothing more than "an abstract layer on top of a more concrete filesystem." (Wikipedia).
In other words, it is the layer that lets `ls`, `cd`, `rm`, `mkdir`, etc. behave the same no matter which filesystem sits underneath. Abstractions are necessary.
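To make the abstraction idea concrete, here is a minimal userland sketch of dispatching through a table of function pointers. Every name in it (`sketch_vnode`, `sketch_vops`, `sketch_vop_name`) is made up for illustration; the real OpenBSD vnode structures carry far more state and are not reproduced here.

```c
#include <assert.h>
#include <string.h>

struct sketch_vnode;

/* Hypothetical operations table: one function pointer per operation. */
struct sketch_vops {
	const char *(*vop_name)(const struct sketch_vnode *);
};

/* Hypothetical vnode: it only remembers which table to dispatch through. */
struct sketch_vnode {
	const struct sketch_vops *ops;
};

static const char *
sketch_ext2_name(const struct sketch_vnode *vp)
{
	(void)vp;
	return "ext2fs";
}

static const char *
sketch_ffs_name(const struct sketch_vnode *vp)
{
	(void)vp;
	return "ffs";
}

static const struct sketch_vops sketch_ext2_ops = { sketch_ext2_name };
static const struct sketch_vops sketch_ffs_ops = { sketch_ffs_name };

/* Callers use this one entry point and never need to know the filesystem. */
static const char *
sketch_vop_name(const struct sketch_vnode *vp)
{
	assert(vp != NULL && vp->ops != NULL);
	return vp->ops->vop_name(vp);
}
```

The caller only ever touches `sketch_vop_name`; which filesystem answers is decided by the table the vnode points at.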
```c
	if ((error = ext2fs_mountfs(rootvp, mp, p)) != 0) {
		vfs_unbusy(mp);
		vfs_mount_free(mp);
		vrele(rootvp);
		return (error);
	}
```
Looks like something that would cause a critical error, making the kernel tear down everything it has set up so far. It's inviting me to take a look! Let's dive into that function as well.
```c
int
ext2fs_mountfs(struct vnode *devvp, struct mount *mp, struct proc *p)
```
We have a vnode (`devvp` most likely stands for "device vnode pointer"), a mount point, and `p`, a `struct proc` describing the process performing the mount rather than anything to do with the device driver.
```c
	struct ufsmount *ump;
	struct buf *bp;
	struct ext2fs *fs;
	dev_t dev;
	int error, ronly;
	struct ucred *cred;

	dev = devvp->v_rdev;
	cred = p ? p->p_ucred : NOCRED;
```
`*bp` appears to be a pointer to a buffer-cache buffer, which will hold the blocks read from the device.
Now this is interesting. We have a `dev_t`, which is a device number rather than a pointer; it is taken from the vnode's `v_rdev` field and identifies the block device we are mounting.
`cred` means credentials: the process's credentials if there is a process, and `NOCRED` otherwise.
Below we have comments that say that this is logic for multiple mounts. Let's skip it.
```c
	/*
	 * Read the superblock from disk.
	 */
	error = bread(devvp, (daddr_t)(SBOFF / DEV_BSIZE), SBSIZE, &bp);
	if (error)
		goto out;
	fs = (struct ext2fs *)bp->b_data;
	error = e2fs_sbcheck(fs, ronly);
	if (error)
		goto out;

	ump = malloc(sizeof *ump, M_UFSMNT, M_WAITOK | M_ZERO);
	ump->um_e2fs = malloc(sizeof(struct m_ext2fs), M_UFSMNT,
	    M_WAITOK | M_ZERO);
```
Alright, finally some logic we can break apart! `bread` from `buffercache(9)` reads a block into the specified buffer, in this case the superblock. So we read the superblock in and check it for errors, good. Then malloc is called with some fancy flags, hmmm... it's to hold the superblock in memory, but what are those?
This is kernel malloc after all, so we would be looking at `malloc(9)`. After looking at the man page, I can conclude that `M_UFSMNT` is the allocation type, which tags the memory as belonging to a UFS mount structure, while `M_WAITOK` and `M_ZERO` mean that the code is ok with sleeping until the memory is allocated, and that it will be zeroed out. Makes sense. `um_e2fs` is the field of the `ufsmount` that points to the in-memory ext2 filesystem structure (`struct m_ext2fs`), so it's just allocating the in-memory resources for us to use.
I'm starting to think that this part of the code is a bit irrelevant to us. But we do eventually have to read all (or at least the majority) of this code to gain a better understanding, so let's keep going.
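The arithmetic in the `bread` call is easier to see in userland. The sketch below assumes the usual values (a 512-byte `DEV_BSIZE`, and a 1024-byte superblock starting at byte 1024) and uses an in-memory disk image instead of the buffer cache; the `SKETCH_*` macros and both helper functions are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values; check them against the headers in your tree. */
#define SKETCH_DEV_BSIZE	512	/* device sector size in bytes */
#define SKETCH_SBOFF		1024	/* byte offset of the superblock */
#define SKETCH_SBSIZE		1024	/* size of the superblock in bytes */

/* The sector number handed to bread(): byte offset / sector size. */
static long
sketch_sb_sector(void)
{
	return SKETCH_SBOFF / SKETCH_DEV_BSIZE;
}

/*
 * Copy the raw superblock out of an in-memory disk image.
 * Returns 0 on success, -1 if the image is too small to contain one.
 */
static int
sketch_read_sb(const uint8_t *image, size_t image_len, uint8_t *sb)
{
	assert(image != NULL && sb != NULL);
	if (image_len < SKETCH_SBOFF + SKETCH_SBSIZE)
		return -1;
	memcpy(sb, image + SKETCH_SBOFF, SKETCH_SBSIZE);
	return 0;
}
```

Under those assumptions the superblock starts at sector 2, which is what `(daddr_t)(SBOFF / DEV_BSIZE)` would evaluate to.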
```c
	/*
	 * Copy in the superblock, compute in-memory values
	 * and load group descriptors.
	 */
	e2fs_sbload(fs, &ump->um_e2fs->e2fs);
	if ((error = e2fs_sbfill(devvp, ump->um_e2fs)) != 0)
		goto out;
	brelse(bp);
	bp = NULL;
	fs = &ump->um_e2fs->e2fs;
	ump->um_e2fs->e2fs_ronly = ronly;
	ump->um_fstype = UM_EXT2FS;
```
`e2fs_sbload` holds the bulk of our logic here, together with `e2fs_sbfill`, which computes the in-memory values. Everything else is just releasing the buffer, setting kernel flags, checking if the filesystem is clean, and so forth.
For the rest of the function, all we have is setting other ext2 flags within the UFS mount point and setting other kernel parameters to work with the filesystem. With that, it's time to look at `e2fs_sbload`. This, alongside `e2fs_sbcheck`, will probably be one of the functions we have to modify. We will also have to analyze the Linux code for the same functions, although there we can skim and look for 64-bit mode. As for licensing, I don't believe our use of the Linux code can go further than one or two big quotes. We don't want the GPL conflicting with the BSD license that we already have to follow for the OpenBSD source.
Turns out, after some quick digging, `e2fs_sbload` is the macro we relied upon earlier; on little-endian machines it appears to just copy the superblock bytes over with `memcpy`. That's why we have to use `e2fs_sbcheck`: we have to validate the superblock before we move it over.
```c
/* This is called before the superblock is copied. Watch out for endianity! */
static int
e2fs_sbcheck(struct ext2fs *fs, int ronly)
{
	u_int32_t mask, tmp;
	int i;

	tmp = letoh16(fs->e2fs_magic);
	if (tmp != E2FS_MAGIC) {
		printf("ext2fs: wrong magic number 0x%x\n", tmp);
		return (EIO);		/* XXX needs translation */
	}
```
`mask` is not used in this excerpt; judging by its name, it most likely holds feature-flag bits that are tested further down in the function. Then we have `letoh16(3)`, which converts a 16-bit little-endian value to host byte order so that it can be compared against the magic number. The good news is that we can also use these functions, and we don't have to worry much about how the underlying logic works.