Even more

So at this point, I think it's important we look at the 64-bit mode and metadata checksum definition from the kernel docs. We can learn the following:

By default a filesystem can contain 2^32 blocks; if the 64bit feature is enabled, then a filesystem can have 2^64 blocks. The location of structures is stored in terms of the block number the structure lives in and not the absolute offset on disk.

But if we glance back at the superblock:

	u_int32_t  e2fs_bcount;		/* blocks count */

It's an unsigned 32-bit integer, which on its own cannot hold a 64-bit block count, so the superblock must be modified for 64-bit support as well. Exactly which variables are modified, we can't know until we look at the ext4 source code for Linux or FreeBSD.
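To put a number on the 32-bit limit: since structures are located by block number, a byte offset is just the block number times the block size. Here is a minimal sketch, assuming a 4 KiB block size (a common choice, not something the docs quoted above state). Both function names are made up for illustration.

```c
#include <stdint.h>

/*
 * Byte offset of a block on disk. Hypothetical helper: the block number
 * is already 64 bits wide, so the multiplication does not wrap at 32 bits.
 */
static uint64_t
block_to_offset(uint64_t blkno, uint32_t block_size)
{
	return blkno * (uint64_t)block_size;
}

/* Largest filesystem, in bytes, addressable with 32-bit block numbers. */
static uint64_t
max_bytes_32bit(uint32_t block_size)
{
	return ((uint64_t)1 << 32) * block_size;
}
```

With a block size of 4096, max_bytes_32bit returns 2^44 bytes, which is 16 TiB. Anything larger needs block numbers wider than 32 bits.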

Additionally,

Starting in early 2012, metadata checksums were added to all major ext4 and jbd2 data structures. The associated feature flag is metadata_csum. The desired checksum algorithm is indicated in the superblock, though as of October 2012 the only supported algorithm is crc32c. Some data structures did not have space to fit a full 32-bit checksum, so only the lower 16 bits are stored. Enabling the 64bit feature increases the data structure size so that full 32-bit checksums can be stored for many data structures.

They were added to "all major ext4 and jbd2 data structures." This could mean inodes, directories, probably the superblock, and the journal as implied. We have to implement support for generating this checksum from a seed; for that we need to look into the Linux source code, because it's likely there is a different algorithm we have to implement.
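For reference, here is a minimal sketch of the textbook CRC-32C (Castagnoli) algorithm the docs name, plus the "lower 16 bits" truncation they describe. This is the standard bitwise variant, not the kernel's code. How ext4 seeds the checksum and which bytes of each structure it covers are assumptions we still have to verify against the Linux source. The function names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32C update using the reflected polynomial 0x82F63B78.
 * Takes and returns the raw running value, with no initial or final XOR.
 */
static uint32_t
crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Standard CRC-32C: initial value 0xFFFFFFFF and a final XOR of 0xFFFFFFFF. */
static uint32_t
crc32c(const void *buf, size_t len)
{
	return crc32c_update(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}

/* Keep only the lower 16 bits, for structures with no room for all 32. */
static uint16_t
csum_lo16(uint32_t csum)
{
	return (uint16_t)(csum & 0xFFFFu);
}
```

The usual check value applies: crc32c("123456789", 9) gives 0xE3069283. Splitting the update step from the initial and final XOR is deliberate, since a seeded checksum would presumably feed its own starting value into crc32c_update.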

That being said, a quick grep -r "csum" . in OpenBSD's ext2 code shows us that there is no functionality implemented around checksum seeds at all, either; only in read-only mode is its existence acknowledged, and its use is hidden once more.

Since FreeBSD's port of ext4fuse is also under the GPL, we can just look directly at the Linux code. Hopefully there are no legal issues around incorporating fragments for educational purposes, but if there are, we will relicense this writeup under the GPL.

If you haven't already, git clone --depth 1 https://github.com/torvalds/linux. Unless you want gigabytes of data downloaded directly, don't omit the --depth 1.

On line 1304 of https://github.com/torvalds/linux/blob/master/fs/ext4/ext4.h, we can look at the superblock source code.

/*00*/	__le32	s_inodes_count;		/* Inodes count */
	__le32	s_blocks_count_lo;	/* Blocks count */
	__le32	s_r_blocks_count_lo;	/* Reserved blocks count */
	__le32	s_free_blocks_count_lo;	/* Free blocks count */

Unsurprisingly, the fields at the beginning hold only the low-order bits. They get merged 🙃

Now if we look directly for the hi bits, we can probably get a better idea of where the other 64-bit stuff is implemented. Additionally, we would have to rewrite all the functions in OpenBSD src to use the s_blocks_count_lo and s_blocks_count_hi together.

	__le32	s_jnl_blocks[17];	/* Backup of the journal inode */
	/* 64bit support valid if EXT4_FEATURE_INCOMPAT_64BIT */
/*150*/	__le32	s_blocks_count_hi;	/* Blocks count */
	__le32	s_r_blocks_count_hi;	/* Reserved blocks count */
	__le32	s_free_blocks_count_hi;	/* Free blocks count */
	__le16	s_min_extra_isize;	/* All inodes have at least # bytes */
	__le16	s_want_extra_isize; 	/* New inodes should reserve # bytes */

Just to show where it is in context. The _hi fields are the only ones that matter to us here; everything else is already in the OpenBSD src. Let's pull that up for reference:

	u_int32_t  e2fs_journal_backup[17];

	u_int32_t  e2fs_bcount_hi;	/* high bits of blocks count */
	u_int32_t  e2fs_rbcount_hi;	/* high bits of reserved blocks count */
	u_int32_t  e2fs_fbcount_hi;	/* high bits of free blocks count */
	u_int16_t  e2fs_min_extra_isize; /* all inodes have some bytes */
	u_int16_t  e2fs_want_extra_isize;/* inodes must reserve some bytes */

And it turns out they are defined here as well! So we technically already have the 64-bit fields baked into the filesystem; we just thought they were accessing the high bits earlier for whatever reason. I guess now all that needs to be done is to change every hit for grep -r e2fs_bcount . so that it checks whether the 64-bit flag is set and, if so, adds in the high bits, right?

Well, the matter doesn't appear to be that simple.

➜  writeup git:(trunk) ✗ grep -r "s_blocks_count" .
./ext4/ext4.h:  __le32  s_blocks_count_lo;      /* Blocks count */
./ext4/ext4.h:/*150*/   __le32  s_blocks_count_hi;      /* Blocks count */
./ext4/ext4.h:  return ext4_read_incompat_64bit_val(es, s_blocks_count);
./ext4/ext4.h:  es->s_blocks_count_lo = cpu_to_le32((u32)blk);
./ext4/ext4.h:  es->s_blocks_count_hi = cpu_to_le32(blk >> 32);
./ext4/mballoc-test.c:  es->s_blocks_count_lo = cpu_to_le32(layout->blocks_per_group *
➜  writeup git:(trunk)

If anything, my shell output here shows that s_blocks_count_lo and s_blocks_count_hi aren't merged together at each access site. Something else is happening under the hood. However, ext4_read_incompat_64bit_val seems like an interesting function we should take a look at.

#define ext4_read_incompat_64bit_val(es, name) \
	(((es)->s_feature_incompat & cpu_to_le32(EXT4_FEATURE_INCOMPAT_64BIT) \
		? (ext4_fsblk_t)le32_to_cpu(es->name##_hi) << 32 : 0) | \
		le32_to_cpu(es->name##_lo))

It checks if the 64-bit flag is set, and if so it merges the high bits with the low bits; otherwise it returns the low bits alone. Precisely the kind of functionality we were looking for. There is no repeated code, just one macro. With that being said, let's try our grep again:
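Translated to the OpenBSD field names we saw earlier, the same idea might look like the sketch below. It assumes a cut-down stand-in struct rather than the real superblock, ignores byte order (the fields are taken to be in host order already), and uses a made-up macro name for the flag. The value 0x0080 matches Linux's EXT4_FEATURE_INCOMPAT_64BIT, and the e2fs_features_incompat field name is also an assumption.

```c
#include <stdint.h>

/* Hypothetical name; 0x0080 is the value of the 64bit incompat flag. */
#define SKETCH_INCOMPAT_64BIT	0x0080

/* Stand-in for the superblock, holding only the fields this sketch needs. */
struct sb_sketch {
	uint32_t e2fs_bcount;			/* low 32 bits of blocks count */
	uint32_t e2fs_features_incompat;	/* incompatible feature flags */
	uint32_t e2fs_bcount_hi;		/* high 32 bits of blocks count */
};

/* Full block count: the high bits only count if the 64bit flag is set. */
static uint64_t
sb_bcount(const struct sb_sketch *sb)
{
	uint64_t n = sb->e2fs_bcount;

	if (sb->e2fs_features_incompat & SKETCH_INCOMPAT_64BIT)
		n |= (uint64_t)sb->e2fs_bcount_hi << 32;
	return n;
}
```

Casting to uint64_t before the shift matters: shifting a 32-bit value left by 32 is undefined behaviour in C. The feature check matters too, because without the flag the _hi field is not guaranteed to hold anything meaningful.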

➜  writeup git:(trunk) ✗ grep -r "ext4_read_incompat_64bit_val" .
./ext4/ext4.h:#define ext4_read_incompat_64bit_val(es, name) \
./ext4/ext4.h:  return ext4_read_incompat_64bit_val(es, s_blocks_count);
./ext4/ext4.h:  return ext4_read_incompat_64bit_val(es, s_r_blocks_count);
./ext4/ext4.h:  return ext4_read_incompat_64bit_val(es, s_free_blocks_count);
➜  writeup git:(trunk)

Only four hits, all in the ext4.h file: the macro definition itself and three uses. We don't have to dig too deep here, then.

static inline ext4_fsblk_t ext4_blocks_count(struct ext4_super_block *es)
{
	return ext4_read_incompat_64bit_val(es, s_blocks_count);
}

static inline ext4_fsblk_t ext4_r_blocks_count(struct ext4_super_block *es)
{
	return ext4_read_incompat_64bit_val(es, s_r_blocks_count);
}

static inline ext4_fsblk_t ext4_free_blocks_count(struct ext4_super_block *es)
{
	return ext4_read_incompat_64bit_val(es, s_free_blocks_count);
}

And they're all used right here 😂 In that case, we need to figure out how blocks are read (and written). It's time to return to the OpenBSD source for a 32-bit explanation; let's see if we can adapt it to 64-bit.
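For the write direction, the first s_blocks_count grep showed two assignments storing (u32)blk and blk >> 32. A minimal sketch of that split and its inverse follows. The struct and function names are made up, byte order is ignored, and the feature-flag check from the read macro is left out for brevity.

```c
#include <stdint.h>

/* Hypothetical pair of on-disk halves of a 64-bit count. */
struct count_pair {
	uint32_t lo;	/* low 32 bits */
	uint32_t hi;	/* high 32 bits */
};

/* Split a 64-bit block count into its two 32-bit halves. */
static struct count_pair
split_count(uint64_t blk)
{
	struct count_pair c;

	c.lo = (uint32_t)blk;		/* truncation keeps the low half */
	c.hi = (uint32_t)(blk >> 32);	/* shift brings the high half down */
	return c;
}

/* Inverse of split_count: rebuild the 64-bit value. */
static uint64_t
join_count(struct count_pair c)
{
	return ((uint64_t)c.hi << 32) | c.lo;
}
```

Any OpenBSD code that updates a block count would need to write both halves like this instead of assigning e2fs_bcount alone.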