Wireshark is the proverbial hammer that makes all networking problems look like nails. Even if there's a more specific tool available there's a good chance I can swing Wireshark at the problem and figure it out.
It continues to blow my mind how many people there are in the world who consider themselves networking professionals but have never used or do not understand Wireshark. It is possibly the most important tool for actually understanding what's really happening in your network; without it you're effectively blind to so many things.
Just yesterday I used it to troubleshoot a weird behavior in a recently upgraded Asterisk/FreePBX system which would have probably taken me days to guess my way through without packet captures, but with them I was able to see clearly what was happening on the network and then track that back from there.
Congrats to everyone on the Wireshark team on 25 years of making network troubleshooting infinitely easier! I would 100% not be where I am today without it.
I can confirm, I didn't know about strace until this very moment. Looking at it, it basically only intercepts system calls? How often is that useful? What do people use it for?
It answers, or at least gives the definitive first clue to, a huge number of slowdowns or apparent hangs, given how many of those are actually blocking resource waits or retry loops gone mad.
It's probably most useful to sysadmins working with binaries, but even if you do have the source, it's usually a shorter path to the solution for any app/OS interaction problem.
It's useful for certain classes of optimisation and tuning, because it will give timings and aggregate timings.
I'll use it for things as simple as "where is this program reading its config files" - often useful when doco is poor and/or there are multiple config locations selected by conditional logic.
There's an "ltrace" as well, for shared library tracing, although I've personally found that less useful - the bugs it shows are more likely to be code/logic problems rather than OS/infrastructure interaction - which is to say, usually outside my job scope.
On commercial unix, the equivalent to strace is truss, and it's been around forever.
Like for many others, Wireshark and strace/truss are my go-to tools for a huge amount of troubleshooting.
Program shits itself randomly during execution, crashes, and doesn't tell you why. strace it. Oh, it turns out it's trying to execve() a binary that doesn't exist on the system, which wasn't documented as a dependency, so I didn't install it. Fixed.
Lots of little things like that. Why is this program acting slow at startup when it should be fast? Oh, because it's opening and timing out on a socket connection with an unusually long timeout. Et cetera...
The most fun I've had with strace was debugging a 3-process deadlock. An snmp daemon was blocked waiting for a cli child process to finish, the cli was waiting for a response to a message on a socket it had open with a routing protocol daemon, which was waiting for a response from the snmp daemon.
It is also a great way to figure out why programs without useful debug output die. I.e., after a program opens and reads a config file it doesn't like, it starts cleaning up and exits.
I recently fired it up to quickly check which headers a crosscompiler used on a specific compilation unit. strace, grep, sort, done. I also use it as first check if something seems to hang. Sometimes you can see lock files trying to be acquired or access to wrong paths.
I've used it on occasions to try and find why some app was erroring; typically the app would catch an exception (or ALL of them), and just die with something like "no."
For example if the open call fails with ENOENT, the file it's looking for doesn't exist, and strace will (also) tell you what file it's trying to open.
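To make the ENOENT point concrete, here is a minimal sketch that picks failed open()/openat() calls out of saved strace output. The sample lines, the paths in them, and the helper name `failed_opens` are made up for illustration, and the regex only covers the common single-line output format, so treat it as a starting point rather than a full parser.

```python
import re

# Matches lines such as:
#   openat(AT_FDCWD, "/etc/x.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
# Group 2 is the path, group 3 is the errno name.
FAILED_OPEN = re.compile(
    r'\b(open|openat)\((?:[^,"]+, )?"([^"]*)".*\)\s+= -1 (E[A-Z]+)'
)

def failed_opens(strace_text):
    """Return (path, errno_name) for every open/openat call that failed."""
    hits = []
    for line in strace_text.splitlines():
        match = FAILED_OPEN.search(line)
        if match:
            hits.append((match.group(2), match.group(3)))
    return hits

# Hypothetical strace output; the paths are invented for this example.
SAMPLE = '''\
openat(AT_FDCWD, "/etc/myapp/myapp.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/.myapprc", O_RDONLY) = 3
open("/var/lib/myapp/state", O_RDONLY) = -1 EACCES (Permission denied)
read(3, "key=value\\n", 4096) = 10
'''

for path, err in failed_opens(SAMPLE):
    print(f"{err:8} {path}")
```

The successful open and the read are skipped, so what is left is the short list of files the program wanted and could not get.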
I use it constantly. I can't even imagine debugging some failures without strace. It's great for servers that don't log things but fail to load some config file (which can be debugged by inspecting the return codes of open() calls).
There is also ltrace, for library calls, although I find it less useful.
It lets you see what a program is doing, where doing means any “effects” of the program that touch the system. Want to see what's happening to files? You can strace for certain operations to certain files. Weird shitting the bed involving IO? Strace will illuminate the problem.
As well as all the other uses, it can be great for a quick way to see what files a program is trying to access. E.g. some undocumented binary where it's not clear where its config should be, strace will quickly show what it tries to access.
It's useful when everything else you've tried has failed and you have no clue what is going on. It's extremely helpful in figuring out why a program hangs or crashes. It's a good tool to have in the toolbox.
Do you have any favorite sources for really understanding Wireshark? I’m not a networking professional per se, but I’m network-adjacent and I’ve dabbled in Wireshark from time to time. I can see the power, but it’s also one of those tools that’s totally overwhelming when I first approach it unless I have a very small, very specific problem. Or is it one of those tools that you learn as you need it?
Unfortunately I can't really help there, I'm a "learn by doing" type of person who just jumps in the deep end and hopes he figures out how to swim.
Most of my learning was just "capture the problem happening, capture what happens when it works right if possible, open up the relevant RFCs, then try to understand what's different and why."
I work in the VoIP industry so I'm dealing with a lot of NAT problems (insert rant here about lazy ISPs that still haven't enabled IPv6 on their networks) and my main protocol (SIP) is heavily inspired by HTTP and as a result is more or less human readable plaintext, so it was a relatively easy learning curve to just have Wireshark open on one side of the screen and the relevant RFCs on the other side.
All I can really say is have a problem you want to solve and start from there.
> Just yesterday I used it to troubleshoot a weird behavior in a recently upgraded Asterisk/FreePBX system which would have probably taken me days to guess my way through without packet captures
Do you mind sharing with us what was the problem and how you solved it with packet captures, if you have time? A blog post would be very interesting too.
For something like this Firefox bug [1], getting down to pcaps helps determine where the problem is. A client spinning on a request that the server doesn't know about could be a server problem, a client problem, or a network-in-the-middle problem.
In this case, the problem was the client wasn't actually sending the request, and with a sizable request that's visible even without decoding the https; although to be totally clear on what was happening, decoding was needed.
I've also debugged issues in remote networks where, iirc, connections were being reset by some equipment local to the user. Seq/ack sequencing showed the resets were in response to a specific client-sent packet, and the timestamps showed it was impossible for that to have come from anywhere but equipment near the user.
For this bug [2], it took a lot of luck and patience to get a good capture, but once I did, the immediate problem became obvious: the machine I controlled was getting an ICMP "needs frag but DF set" at the same MTU it was already using, and responding by sending the whole sendqueue at once, packetized to the new MTU that was the same as the old one. There are actually three problems here: a) there's no reason for the other side to send this packet (I found this is an already fixed Linux bug with forwarding and large receive offload, but had no way to contact the administrator of that router), b) our side shouldn't resend the whole sendqueue when the MTU changes, c) if the MTU didn't change, then there's no need to take any action. We only fixed c, but that solved the major problem: these resends would trigger more resends and we'd have periods of unavailability as the network was really busy.
This is pretty common when looking at wireshark; unless you work somewhere with full control of all clients and servers and a very network aware developer team, you're going to find lots of non-optimal or semi-broken stuff, and you've got to ignore it and focus on the majorly broken bit.
I had to troubleshoot an issue where several network routers restarted as a group without apparent cause, but only when connected to the big WAN. The problem was network discovery software which, when poorly configured, would send ssh connection attempts to the management interface, and a bug in the specific firmware would crash the router.
I'm not much of a blogger, but here's the short version. If anyone happened to be on #freepbx yesterday morning they might already have seen this.
I had just upgraded and migrated one of my clients from an on premise FreePBX system that was a few years out of date and running on a repurposed desktop computer with a failing fan to a brand new instance running on a VPS. Everything was working fine with basic phone functionality, but their main ring group was taking a few seconds to stop ringing when answered. Calls would ring in to all phones effectively simultaneously as expected, but when someone answered the call certain phones kept ringing for almost four full seconds after that point.
In the past I had seen similar behaviors on AT&T DSL caused by their mandatory modem/router device having an anti-flood filter enabled by default which saw a bunch of nearly identical UDP packets hitting at once and dropped them after the first few. This site has cable internet through a dumb modem so I knew it wasn't that, but they had recently had their IT side taken over by a new company who put in a new firewall so that was a plausible answer.
Their IT however had been taken over from us so I wasn't about to go accusing them of getting it wrong without strong evidence. I'm also just that kind of person, I hate when someone blames me or my gear for problems we're not causing so I do my best to never be that guy either. I'll waste an extra few hours of mine any day of the week to be sure I'm not accusing someone else of getting it wrong without a reason.
I fired up sngrep on the server, waited for a call to come in, and saved all the SIP sessions that resulted. Download that file, load it up in Wireshark, and I see that while the INVITE messages to start ringing all went out more or less simultaneously (27 phones in ~5ms) the CANCEL messages that stop them from ringing once one answered were sent out sequentially, with the PBX waiting for the first one to respond and confirm it had stopped ringing before sending the next. Clearly this wasn't right, and it obviously wasn't a problem with the firewall either.
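The comparison described above (INVITEs leaving in a burst, CANCELs trickling out one by one) boils down to measuring the time spread per SIP method. A rough sketch of that measurement follows; the function name and the timestamps are invented for illustration and are not the values from the actual capture.

```python
from collections import defaultdict

def send_spread_ms(events):
    """events: iterable of (timestamp_seconds, sip_method) for outgoing requests.

    Returns {method: milliseconds between the first and last request sent}.
    """
    times = defaultdict(list)
    for timestamp, method in events:
        times[method].append(timestamp)
    return {m: (max(t) - min(t)) * 1000.0 for m, t in times.items()}

# Invented timestamps: INVITEs go out in a burst, CANCELs go out sequentially.
EVENTS = [
    (0.000, "INVITE"), (0.002, "INVITE"), (0.005, "INVITE"),
    (10.0, "CANCEL"), (11.3, "CANCEL"), (13.9, "CANCEL"),
]

for method, spread in send_spread_ms(EVENTS).items():
    print(f"{method}: {spread:.1f} ms between first and last")
```

A small spread for one method and a large one for another, both measured at the server, points at the server rather than at anything in between.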
At that point I started looking at the Asterisk logs and saw that an AGI script was being run for each line that was ringing which wasn't there previously. That script was associated with a new FreePBX module for missed call notifications which was installed but unconfigured on the new server. It didn't indicate it was doing anything in the UI, but it sure seemed to be doing something in the logs.
I uninstalled that module and the next call all the CANCEL messages went out in ~5ms just like the INVITEs. I then filed a bug with FreePBX documenting what happened because I'm pretty sure it's not expected or desired for simply having that module installed to cause massive delays in ring groups.
---
In this case the packet captures demonstrated conclusively that the problem was on the server itself and not in the network. If the capture at the server had looked reasonable my next step would have been to have the IT vendor capture traffic on their firewall at the same time as I was capturing at the server so we could compare and see if it's getting messed with along the way, but here it was not necessary.
Like toast0 mentioned, captures help you narrow down where the problem is.
Forget something as specific and hardcore as networking. I had a week to build a nodejs poc of a legacy spring/java app/service in Amazon that was doing a bunch of service to service auth with some Tibco messaging. I couldn't find any open implementations of Tibco clients (around 2013) and the frugality leadership principle meant getting an official spec would be almost impossible. I just needed a few details of the packet structure on a couple of requests. You can guess which tool saved the day for me here! The Principal Eng at the time was surprised such a tool existed!
Back in 1983, when Ethernet was still fat coax and vampire taps, I was working at a military contractor in Silicon Valley. We had built an Ethernet bridge product that linked the DECnet LANs at DSCS (Defense Satellite Communications System) ground stations around the world (over 9600 bps encrypted circuits).
As part of the code, I wrote a packet dumper that put the Ethernet card (a Multibus card from a company called Excelan) into promiscuous mode. It didn't have a dissector like Wireshark, but just being able to dump raw packets in hex to a terminal was a huge advantage for debugging networks.
I love Wireshark and it's one of the first things I install on a new system.
How did Ethernet go from fat cables and vampire taps to RJ45? Is it still exactly the same protocol (is that even the right word?) as it was back then?
> How did Ethernet go from fat cables and vampire taps to RJ45?
Layers. The physical signaling is vastly different but the content that rides on it can remain the same. If you study the OSI model (https://en.wikipedia.org/wiki/OSI_model) you will know more about it than me.
I don't know how faithfully modern (or ancient) Ethernet follows this model - it might predate this work. Some layers might be blended for the sake of efficiency, but there are definitely layers.
There are a lot of L1 layers that have been used under Ethernet. There were several revisions of the coaxial cable used for Ethernet before we switched to the RJ45-terminated twisted copper cable everyone thinks of when you say "ethernet". Again, there are several iterations of the twisted copper physical layer.
More recently, we also run Ethernet over optical fibers. The variance within the fiber family of cables is probably greater than the variance in either copper or coax.
At 10 Mbps half duplex, the protocol is nearly the same. Twisted pair just uses differential pair signalling, while coax uses a shared ground and high or low on the center conductor relative to the shield (IIRC). And twisted pair relies on a hub to create a bus. From there you go to 10M/full duplex where the rx and tx pairs are fully separated so collision detection can be disabled.
100base-tx increased the symbol rate, added speed and duplex negotiation (layered into the existing link pulse signaling), but otherwise kept things the same; you can even run a 100base-tx hub.
1000Base-T is a wide departure at the signalling level; all 4 pairs are used simultaneously, bidirectionally, the symbol rate is the same as 100base-tx, but each symbol carries more bits. But the ethernet frames are pretty much the same. (Larger frames started appearing around the same time as gigE, as I recall, but that might not be accurate)
> How did Ethernet go from fat cables and vampire taps to RJ45?
At least, at university: students like me that got hired cheaply and rewired everything. :)
That was not a fun summer, but I learned a lot.
> Is it still exactly the same protocol (is that even the right word?) as it was back then?
I would be surprised, given that coax is equivalent to 3 conductors, and catX cables have 8. And that's before we get into fibre. I would expect they have some high-level protocol (frames etc.) that gets mapped onto the physical signaling, but I don't know much about that (resource suggestions welcome!).
I do know that going from a broadcast medium to switched point-to-point is a lot more efficient etc.
Plus the taps were notoriously unreliable (variable connection quality). And would cause reflections in the cable as well, which is fun.
Yes it is the same protocol. (Minor revisions change details, jumbo frames etc)
The cables form the physical connection; on top of that, Ethernet defines a way to determine who may send a message (essentially anybody can send while the line is quiet, and if a collision is detected everybody retries after a random time).
The big thing which changed is that we are often using switched networks, instead of all nodes attaching to the same cable, but that's a change in a higher layer.
Ethernet's designers were smart not to tie the spec to the properties of a specific transport material, but to an abstract ether where signals travel.
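For the "everybody retries after a random time" part: classic half-duplex Ethernet uses truncated binary exponential backoff, where the random wait range doubles with each successive collision up to a cap. Below is a small sketch of that rule as I understand it from the 802.3 CSMA/CD description; the function and constant names are mine, and it models only the wait calculation, not carrier sensing or the jam signal.

```python
import random

SLOT_TIME_US = 51.2   # one slot = 512 bit times, which is 51.2 microseconds at 10 Mbps
BACKOFF_LIMIT = 10    # the random range stops doubling after the 10th collision
ATTEMPT_LIMIT = 16    # the frame is dropped once it has collided 16 times

def backoff_slots(collisions, rng=random):
    """Number of slot times to wait after the given number of collisions on one frame."""
    if collisions < 1:
        raise ValueError("collisions must be at least 1")
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions, frame dropped")
    k = min(collisions, BACKOFF_LIMIT)
    # Pick uniformly from 0 .. 2^k - 1 slot times.
    return rng.randint(0, 2 ** k - 1)

for n in (1, 2, 3, 10, 15):
    slots = backoff_slots(n)
    print(f"collision {n:2}: wait {slots} slots ({slots * SLOT_TIME_US:.1f} us at 10 Mbps)")
```

Because each station draws its own random number, two stations that just collided are unlikely to pick the same wait, and the widening range copes with busier segments.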
Every once in a while you'll hear the odd story about someone tracking down a bottleneck in their network and finding an old 10 or 100mbps ethernet link somewhere. I doubt it happens much anymore, but your 10 gigabit gear should still be able to talk to your 10 megabit gear no problem, which I do find impressive.
I've done it deliberately. I had to test a cellular device from somewhere in Asia, but I'm in the US. The Asian provider had sent us a femtocell with developer's firmware that bypassed the GPS check at startup, which would create a little bubble of their coverage in our RF test chamber, and we could put the DUT in the same chamber and do the testing.
Trouble is, the femtocell wanted a network connection, and our RF chamber didn't have an RJ45 passthrough. Some emails got sent, the chamber vendor could sell us a new passthrough module but it was on backorder, ETA two months or something.
So the following evening, I swung by the e-waste recycler where I used to volunteer years prior, which meant I could just give the proprietor a wave and then let myself into the back room and pick the pile. And sure enough, I found a couple of 8-port 10base-T ethernet hubs, with 10base-2 connections on the back for connection to a coax segment. I talked him up to twenty bucks so I'd have an expense to submit; the company did not deserve to get this for free.
Back in the RF lab the following day, it was a trivial matter to convert the BNC connector on the hubs to the N connector in the chamber wall, locate one of the hubs inside the chamber, and connect the femtocell to it. The one outside got the internet connection, which had been running at gigabit speeds but now found itself negotiating at 10/half! (I wonder if the campus networking folks get alerts when that happens. Because it's almost surely not what's intended, unless I'm around.)
The younger techs in the lab were MYSTIFIED at this exotic hardware that could send Ethernet signals over coaxial cable! That must be expensive! How did you come up with it so fast! Whoever made that must've had this application in mind, but what a niche application! Amazing!
Thanks for all the troubles shot with Wireshark over the years!
Sadly, I've largely stopped using it because it appears to be unable to keep up with the data rates typically seen from servers these days. I believe the analysis is single-threaded, and doesn't seem to cache anything either. It struggles with captures "mere" gigabytes in size, which is just seconds on a 10 Gbps link.
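The "gigabytes in seconds" arithmetic is easy to check. This is a back-of-the-envelope sketch that ignores capture file headers and framing overhead; the function name and the 4 GB figure are just examples.

```python
def seconds_to_fill(capture_bytes, link_bits_per_second, utilization=1.0):
    """Seconds of traffic needed to produce a capture of the given size."""
    return capture_bytes * 8 / (link_bits_per_second * utilization)

# A 4 GB capture on a fully loaded 10 Gbps link, then on one that is half loaded.
print(seconds_to_fill(4e9, 10e9))        # full line rate
print(seconds_to_fill(4e9, 10e9, 0.5))   # 50% utilization
```

10 Gbps is 1.25 GB per second, so even a half-loaded link produces a multi-gigabyte file in well under ten seconds.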
Thank you. It has been an indispensable part of my work from the very beginning. Tcpdump was fine, but being able to right-click on a packet and do "follow TCP stream", then see the entire conversation in a second, was a game changer. Same with the "right click->filter out this stream".
Also, the fact that Ethereal/Wireshark could read files saved by tcpdump meant I could ssh onto a remote server, fire up tcpdump, run Wireshark on a client, and when something failed I was able to look at the network stream "from both ends". It saved me hours and hours, from dodgy ISP NAT being evident at first glance, to misconfigured MPLS networks being provable (no more could the routing team just say "it looks good to us"). No, there was proof... I bet countless people continue having the same experience with this software :-)
However, I have to correct one statement made in the article. Ethereal wasn't the first free GUI network packet analyzer. There was a Microsoft tool I forgot the name of that was available even in Windows NT days, perhaps "netmon"? It was a long time ago. It was free and it predates Ethereal. It only worked on Windows and it used its own file format.
>There was a Microsoft tool I forgot the name of that was available even in Windows NT days, perhaps "netmon"?
Network Monitor, also called netmon (or Bloodhound internally), which actually had a documented (maybe unsupported IIRC, but still easy to tap into) API. I wrote a tcpdump wrapper around it, before Ethereal was a thing. The API, and hence netmon, became invalid with the "next-gen" TCP stack of Longhorn/Vista.
Eventually, MSNA (Microsoft Network Analyzer) came along, which worked on ETW and was able to analyze network and other ETW traces. You could write handlers for any protocol in a supported DSL. You could even make it parse log files and filter/analyze the data.
The New Microsoft being what they are, they killed MSNA because it was too powerful and useful to Windows developers. It probably wasn't used by a lot of people, but if you knew how to use it it was one of the most powerful analysis tools of its time.
Edit: Microsoft Message Analyzer, not Network Analyzer.
I believe it was Message Analyzer, and what was super cool was its ability to correlate ETW stuff. So you could literally see the interplay between... say... a webserver log, an OS level NIC driver log, and a network capture.
I still don't get why MS stopped its public distribution, although I do know it was pretty buggy as released...
And yeah, netmon is great. I still use it when I want to filter Windows captures on PID, since Wireshark won't do that. (Even though netsh or pktmon -- built in Windows tools for recording captures -- have it in the header...)
Wireshark is like a multimeter in electronics world.... the whole world can run without having one, but once something fails, without it, you're fucked.
It's even better once you learn how to run a tcpdump session over ssh dumped into a pipe, and then use Wireshark locally via the pipe to get a nice GUI for the remote capture.
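For anyone who hasn't seen the ssh pipe trick, here is a sketch of it driven from Python. It relies on tcpdump's `-U` (packet-buffered output) and `-w -` (write pcap to stdout) and on Wireshark's `-k -i -` (start capturing immediately, reading from stdin). The host name, interface, and helper names are placeholders I made up, the remote user needs permission to capture, and the default filter excludes port 22 so the ssh session does not capture itself.

```python
import shlex
import subprocess

def remote_capture_commands(host, interface="eth0", capture_filter="not port 22"):
    """Build the two halves of the pipeline: remote tcpdump and local Wireshark."""
    remote = (
        f"tcpdump -U -i {shlex.quote(interface)} -w - "
        f"{shlex.quote(capture_filter)}"
    )
    ssh_cmd = ["ssh", host, remote]
    wireshark_cmd = ["wireshark", "-k", "-i", "-"]
    return ssh_cmd, wireshark_cmd

def run_remote_capture(host, interface="eth0", capture_filter="not port 22"):
    """Stream the remote capture into a local Wireshark until Wireshark is closed."""
    ssh_cmd, wireshark_cmd = remote_capture_commands(host, interface, capture_filter)
    ssh = subprocess.Popen(ssh_cmd, stdout=subprocess.PIPE)
    wireshark = subprocess.Popen(wireshark_cmd, stdin=ssh.stdout)
    ssh.stdout.close()   # Wireshark now holds the only read end of the pipe
    wireshark.wait()
    ssh.terminate()

# Example call (placeholder host, not run here):
# run_remote_capture("user@example-host", interface="eth0")
```

The same thing works as a plain shell pipeline; wrapping it in a function just makes the host, interface, and filter easy to swap.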
This tool (Ethereal at the time) was absolutely invaluable to my job as Senior Tech Support for the WebLogic family of products. I even got clients to run it and was able to provide solutions like "your large JDBC connection pool had all its connections silently dropped by a network firewall (that the client was not aware of) and that's why you're having a 1 hour transaction delay on the first one in the morning. Every pooled connection had to time out and reset". And "Internet Explorer would abort a TCP connection for an already-cached resource and that generates non-standard network-level errors on your IBM server's WebLogic installation".
I lost half of my hair on that job. Without Ethereal, I am sure I would have lost all of it and a lot more of my sanity too.
I used wireshark every day for 10+ years supporting load balancers in customer networks. Between pcaps and core dumps, it was some of the most interesting data to work with. Learning libpcap and eventually writing my own version enabled me to pivot out of tech support for the product into development. I joined as a support engineer and left as a principal software engineer writing the code I was previously supporting. Wireshark and gdb let me teach myself so much, I never had to go to college.
Such a great tool and 100% free. Use it often to debug network issues and see where devices connect to. Like someone else said: a multimeter for networks.
Also used it to learn about WiFi connection setup with an access point. You can see all the beacon packets and WiFi packets.
I was a very early adopter of Ethereal. My team had a Sniffer PC, but either it was being used by someone else or it didn't adequately decode protocols.
Absolutely terrific software. One good memory is being stuck at a client site (20 years ago) trying to figure out an interop issue with our network equipment. In the two weeks I was there I found it helpful to write a protocol decoder plugin and it was easy work. In the end it was our bug, a bitmask applied for select() was not removed when the implementation changed to epoll() ... in essence a 1-bit memory corruption error that could have very delayed consequences. Funny what memories stand out.
I admire your ability to find joy in these things.
At the beginning of my career, I once spent a week in a secure facility trying to understand an annoying network bug using tcpdump because we weren’t allowed to install wireshark. The whole thing turned out to be a combination of the worst bug I have ever seen in a standard library in our decade old version of GNAT (Ada lib - admittedly it had been corrected seven years before) and an ARP misconfiguration.
The whole week was awful and largely responsible for me moving on to greener pastures. It takes a special kind of character to enjoy these things.
Gosh, I feel old. I remember when Ethereal was released and it got me excited. I've sure learned a ton since then thanks to it, and solved a /lot/ of problems. It really changed the world of network traffic analysis, moving network captures away from the world of special laptops and tools (needing to ask the network team to schedule and do a capture) to something that any competent tech could grab.
It's not something I use often, but the value of it to me is that I can "hook it up" and immediately see what's going on. The generally intuitive interface and the way it decomposes packets make it so easy to pick up and use.
25 years of effort has produced a really useful tool.
Both for learning about networks and in my work as a system and network admin, I found Wireshark (and Ethereal before that) one of the most useful tools around. I once diagnosed a networking problem a friend had by getting him to install it on his laptop and record a packet dump of his traffic, then send it to me via email.
As a network engineer I've used Wireshark weekly for most of my career, but not as much anymore as we moved to the cloud. The ISP I worked for paid for a Wireshark training [0], though I didn't learn anything new we did set up a profile that helps a lot with troubleshooting, still use it 8 years later.
Wanted to learn Go so recently started working on a CLI packet capture tool like tcpdump that parses packets received on a raw socket. Got support for ethernet, ipv4, icmp, arp and udp so far.
A few folks and I built one in C++ for our capstone project at Stevens Tech in 2009. Frontend was GTK. It was much slower than Wireshark, but I was surprised by how easy it was to parse the packets (for normal packets speaking the usual protocols anyway)
It's a fun way to learn networking or a language, as you have to do low-level parsing and deal with things like the endianness of larger header fields. It does get tedious to write types and parsers for each protocol, and you're not learning anything new after doing a few of them, so I started using ChatGPT to generate code and tests for me, which works surprisingly well (I also paid for Copilot but haven't found it very useful so far).
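To show what the endianness point looks like in practice, here is a minimal sketch that parses an Ethernet header and a fixed 20-byte IPv4 header from raw bytes. The `!` prefix in the `struct` format strings means network (big-endian) byte order. The sample frame is hand-built for illustration, with documentation addresses and a zeroed checksum; IPv4 options, VLAN tags, and checksum verification are left out.

```python
import socket
import struct

def parse_ethernet(frame):
    """Split an Ethernet II frame into addresses, EtherType, and payload."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": dst.hex(":"),
        "src": src.hex(":"),
        "ethertype": ethertype,
        "payload": frame[14:],
    }

def parse_ipv4(packet):
    """Parse the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,   # IHL counts 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hand-built sample: broadcast frame, EtherType 0x0800, UDP from 192.0.2.1 to 192.0.2.2.
SAMPLE_FRAME = bytes.fromhex(
    "ffffffffffff" "020000000001" "0800"
    "4500001c" "00004000" "40110000" "c0000201" "c0000202"
)

eth = parse_ethernet(SAMPLE_FRAME)
if eth["ethertype"] == 0x0800:
    print(parse_ipv4(eth["payload"]))
```

Reading the two length bytes with `<H` instead of `!H` would give a wildly wrong total length, which is the usual first bug in a hand-written parser.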
Also fun debugging VoIP traffic on the local network, and seeing DNS queries from colleagues' PCs on the network, then asking them: Are you visiting xyz-webpage, and seeing their reaction!?