
Recent News

(That is, news written by me. For others' news, please see Planet Horms.)

Sculpture by the Sea

[A Sculpture by the Sea]

Today was a beautiful pre-summer day in Sydney and Chiz and I went to see Sculpture by the Sea, which is held on the coastal walk between Bondi and Tamarama Beaches.

photos

Sun, 19 Oct 2008 21:32:53 +1100


Perisher Ski Trip

[Tree in the Snow] Some photos from our ski trip to Perisher in August. more...

Mon, 13 Oct 2008 21:56:15 +1100


Struan and Genny's Wedding

[Struan and Genny]

Yesterday Chiz and I had the pleasure to gather together with Struan and Genny's friends and family for their wedding in the Royal Botanical Gardens in Sydney. It was a wet morning but the weather cleared just in time for the event. I took a few snaps of the occasion. more...

I also have some snaps of the Skydiving that took place in Picton two weeks ago as part of Struan's bucks party. more...

Sun, 05 Oct 2008 13:20:59 +1100


Xen Netfront/Netback Transmit Flow-Control

Flow-control is implemented from netfront all the way to the end of life for a packet, which is typically its transmission on the destination interface.

Introduction

This post explains briefly how flow-control is implemented in netfront/netback. The paths examined are for the case where dom0 bridges traffic from netback to the destination interface. The same logic should also hold for packets from netback that are routed by dom0.

Netback/Netfront is used to provide networking to paravirtualised domUs. Thus, much of this discussion is not applicable to fully-virtualised domains, also known as HVM domains.

The code reference is from the linux-2.6.18-xen tree on xenbits.xensource.com, around the time of the Xen 3.3 release.

This analysis is based upon the notion of domU sending a stream of packets as fast as it can. There is no congestion control, no packet filtering, and the CPU(s) of domU are fast enough to process packets as fast as the physical interface can send them without dropping any packets. Furthermore, there are no return packets. In short, the notion is of UDP packets being blindly sent by domU.

The Netfront/Netback Ring-Buffer

Packets are received by netback from netfront using a ring-buffer that is implemented as a grant table. Typically there are 256 slots in this buffer, and once it becomes full netfront can no longer send packets.

The relevant netfront code in the reference netfront implementation is in drivers/xen/netfront/netfront.c:network_start_xmit().

	if (!netfront_tx_slot_available(np))
		netif_stop_queue(dev);

For each packet transmitted by netfront to netback, one ring-buffer slot is used for header information and one additional slot is used per fragment. So in the common case of a packet with one fragment (that is, an unfragmented packet), two slots will be used.

Assuming that the number of packets to be sent is significantly greater than the number of slots in the ring-buffer, the rate at which netfront can send packets will be the same as the rate at which netback frees slots in the ring-buffer. So the ring-buffer provides a mechanism for netback to enforce flow-control on netfront.

Strictly speaking the above logic applies to fragments not packets. But as it is common to have unfragmented packets it seems more comfortable to talk in terms of packets.
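The slot accounting described above can be sketched as a small user-space model. This is only an illustration and not the netfront code: struct tx_ring_model, model_xmit() and model_release() are hypothetical names, and the stop and restart conditions are simplified assumptions that count slots and nothing else.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SLOTS 256	/* typical ring size mentioned in the text */

/* Hypothetical model of the transmit ring: it only counts slots in use. */
struct tx_ring_model {
	unsigned int in_use;	/* slots held by packets not yet released */
	bool queue_stopped;	/* stands in for the effect of netif_stop_queue() */
};

/* One slot for header information plus one slot per fragment. */
static unsigned int slots_for_packet(unsigned int nr_frags)
{
	return 1 + nr_frags;
}

/* Netfront side: try to queue a packet, return true if it went on the ring. */
static bool model_xmit(struct tx_ring_model *r, unsigned int nr_frags)
{
	unsigned int need = slots_for_packet(nr_frags);

	if (r->queue_stopped || r->in_use + need > RING_SLOTS) {
		r->queue_stopped = true;
		return false;
	}
	r->in_use += need;
	/* Assumption: stop once another single-fragment packet would not fit. */
	if (r->in_use + slots_for_packet(1) > RING_SLOTS)
		r->queue_stopped = true;
	return true;
}

/* Netback side: hand slots back, which allows the queue to restart. */
static void model_release(struct tx_ring_model *r, unsigned int slots)
{
	assert(slots <= r->in_use);
	r->in_use -= slots;
	if (r->in_use + slots_for_packet(1) <= RING_SLOTS)
		r->queue_stopped = false;
}
```

With single-fragment packets the model accepts 128 packets before stopping, and it accepts another only after model_release() has returned at least two slots. That is the sense in which netback paces netfront.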

Packet processing in Netback

Once all fragments for a packet have been received by netback from netfront via the ring-buffer it packs them into an sk_buff, the structure that the Linux networking code uses to represent packets. This work is done by drivers/xen/netback/netback.c:net_tx_action().

At this point the slot used in the ring-buffer for header information is queued to be released for reuse by netfront. This is done by calling drivers/xen/netback/netback.c:netif_idx_release(). The release queue is processed by calling net_tx_action_dealloc() the next time that net_tx_action() is called.

The ring-buffer slot associated with each fragment contains a page which in turn contains the data for the fragment. This page is associated with a fragment within the sk_buff created for the packet. This work is done by drivers/xen/netback/netback.c:netbk_fill_frags() which is called by net_tx_action(). The critical line is:

	frag->page = virt_to_page(idx_to_kaddr(pending_idx));

Once all fragments have been added to the sk_buff it is passed to the Linux networking core using net/core/dev.c:netif_rx(). At this point the packet has been received by the netback device driver.

Usually netif_rx() will simply move the sk_buff onto a receive queue and schedule processing of the queue if it is not scheduled already. In any case, the sk_buff will end up being destroyed, either after being transmitted or after being dropped, for example because of packet filtering rules.

Notification of the Completion of Packet Processing

At this point a packet is in the Linux networking core in the form of an sk_buff. The pages that were associated with the fragments of each sk_buff in netback by netbk_fill_frags() have a particular property that is of relevance to flow-control. These pages are members of drivers/xen/netback/netback.c:mmap_pages[] which is set up in drivers/xen/netback/netback.c:netback_init(). Each member of mmap_pages[] is marked as foreign and has drivers/xen/netback/netback.c:netif_page_release() set as its destructor using include/linux/page-flags.h:SetPageForeign().

	SetPageForeign(page, netif_page_release);

When an sk_buff is freed, the destructor is called for each page that has both been associated with a fragment within the sk_buff and marked as foreign using SetPageForeign(). This is done in mm/page_alloc.c:free_hot_cold_page().

#ifdef CONFIG_XEN
	if (PageForeign(page)) {
		PageForeignDestructor(page);
		return;
	}
#endif

sk_buffs are freed using kfree_skb(). The call path from there to free_hot_cold_page() is:

net/core/skbuff.c:kfree_skb()
⟶ net/core/skbuff.c:__kfree_skb()
 ⟶ net/core/skbuff.c:kfree_skbmem()
  ⟶ net/core/skbuff.c:skb_release_data()
   ⟶ mm/swap.c:put_page()
    ⟶ mm/swap.c:__page_cache_release()
     ⟶ mm/page_alloc.c:free_hot_page()
      ⟶ mm/page_alloc.c:free_hot_cold_page()

Thus when an sk_buff is freed, for any reason, netif_page_release() will be called. It simply calls netif_idx_release(), which as discussed above queues a ring-buffer slot up to be released for reuse by netfront. This freeing will typically take place in the physical NIC's driver, when the packet is transmitted on the physical interface.
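The destructor mechanism can be illustrated with a user-space sketch. Nothing here is kernel code: struct model_page, struct model_skb and the model_*() functions are hypothetical stand-ins for struct page, sk_buff, put_page(), netif_page_release() and netif_idx_release(). Resetting the reference count inside the destructor is an assumption of the model, made so that the page can be reused.

```c
#include <assert.h>
#include <stddef.h>

struct model_page;
typedef void (*page_dtor_t)(struct model_page *page);

/* Hypothetical stand-in for struct page. */
struct model_page {
	int refcount;
	page_dtor_t foreign_dtor;	/* non-NULL plays the role of PageForeign() */
	unsigned int pending_idx;	/* ring-buffer slot backed by this page */
};

#define MAX_PENDING 256
static unsigned int dealloc_ring[MAX_PENDING];	/* slots queued for release */
static unsigned int dealloc_prod;

/* Stand-in for netif_idx_release(): queue the slot, do not free it yet. */
static void model_idx_release(unsigned int pending_idx)
{
	dealloc_ring[dealloc_prod++ % MAX_PENDING] = pending_idx;
}

/* Stand-in for netif_page_release(), the destructor of a foreign page. */
static void model_page_release(struct model_page *page)
{
	page->refcount = 1;	/* assumption: re-arm the page for its next use */
	model_idx_release(page->pending_idx);
}

/* Stand-in for put_page() together with the check in free_hot_cold_page(). */
static void model_put_page(struct model_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount > 0)
		return;			/* still referenced elsewhere */
	if (page->foreign_dtor) {
		page->foreign_dtor(page);
		return;			/* foreign pages never reach the allocator */
	}
	/* An ordinary page would be returned to the allocator here. */
}

#define MODEL_MAX_FRAGS 18
struct model_skb {
	unsigned int nr_frags;
	struct model_page *frags[MODEL_MAX_FRAGS];
};

/* Stand-in for kfree_skb(): drop the reference held on every fragment page. */
static void model_kfree_skb(struct model_skb *skb)
{
	unsigned int i;

	for (i = 0; i < skb->nr_frags; i++)
		model_put_page(skb->frags[i]);
	skb->nr_frags = 0;
}
```

Freeing a model_skb whose fragment pages are foreign leaves one entry per page in dealloc_ring[], regardless of why the skb was freed. A page with more than one reference is only queued when the last reference is dropped.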

Netfront to Destination Interface Flow-Control

As sk_buffs are generally freed as packets are transmitted on the physical interface, the rate at which slots are freed relates to the rate at which packets are sent. Thus the rate at which netfront can send packets is effectively the rate at which the destination interface, usually a physical ethernet interface, can send packets, assuming the dom0 CPU can keep up.

The following diagram shows the usage cycle of ring-buffer slots. They are consumed by domU. Slots used for meta-data are added to the free list immediately by netback. Slots used for fragments are added to the free list once the skb with which they are associated is freed. Slots that are placed on the free list become available for reuse in the ring-buffer by domU.

It is worth noting that if packets are dropped, for instance by packet filtering, before they make it to the destination device, then the ring-buffer slot will be freed. This should allow for faster transmission by netfront. That is, packets that get dropped don't take as long to transmit and thus netback can transmit at a faster rate.
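The steady-state behaviour can be checked with a toy simulation. It is a sketch under simplifying assumptions that are mine and are not taken from the code: time advances in discrete ticks, every packet uses two slots, the destination interface completes at most drain_per_tick packets per tick, and netfront refills every free slot immediately. simulate() is a hypothetical name.

```c
#include <assert.h>

#define RING_SLOTS 256
#define SLOTS_PER_PACKET 2	/* header slot plus one fragment slot */

/* Returns how many packets netfront managed to queue over `ticks` ticks. */
static unsigned long simulate(unsigned int ticks, unsigned int drain_per_tick)
{
	const unsigned int capacity = RING_SLOTS / SLOTS_PER_PACKET;
	unsigned int in_flight = 0;	/* packets holding ring-buffer slots */
	unsigned long sent = 0;
	unsigned int t;

	for (t = 0; t < ticks; t++) {
		/* The destination transmits packets, freeing their slots. */
		unsigned int done = in_flight < drain_per_tick ?
				    in_flight : drain_per_tick;
		unsigned int room;

		in_flight -= done;

		/* Netfront sends as fast as it can and fills every free slot. */
		room = capacity - in_flight;
		in_flight += room;
		sent += room;
	}
	return sent;
}
```

After the initial burst that fills the ring, the number of packets queued per tick equals drain_per_tick. So in this model doubling the drain rate of the destination interface doubles the long-run send rate of netfront.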

Thu, 25 Sep 2008 21:45:19 +1000


Just Practicing

[Helicopter above Pitt St]

The Army has been practicing running two helicopters up Pitt St. this afternoon. So far they have done about half a dozen laps, down from Circular Quay to about Town Hall where they do some maneuver that I can't quite see out of my window before looping back North over the Domain. They seem to take about a 15-30 minute rest between each lap. Not sure where they go during this time. Perhaps up to Palm Beach. Or perhaps they drop in for scones with the PM at Kirribilli House. In any case, I'm glad that they are keeping themselves busy and giving the office workers something to peer at out of their windows.

Mon, 18 Aug 2008 16:19:55 +1000