
Network Troubleshooting Primer


It is the sine qua non of software developers that we have to be prepared to debug the programs we write. No program is perfect the first time, and being able to debug a program–one you’ve written or one you’ve inherited–is a necessary skill.

“Testing proves a programmer’s failure. Debugging is the programmer’s vindication.”

Boris Beizer

There are many debugging tools and methods–IDEs often have breakpointing features and context information available that makes it possible to find code- or logic-specific bugs with a high level of efficiency. These tools and methods work well when the problem is within the program (intrinsic), but often fail to help when the problem is extrinsic.

There are many types of extrinsic bugs–database access, API access, etc–but a specific category of extrinsic bugs is often difficult for many programmers to troubleshoot: networking issues.

This is understandable. While on the surface networking seems simple enough, it can actually be quite complex under the hood. Programmers have enough on their plates to be expert in, and so networking–protocols, behaviors, and so on–is not their strong suit.

In that context, I want to offer a few simple things that can help enormously in a programmer’s network troubleshooting–perhaps not allowing the programmer to directly solve the problem, but gaining enough information to direct those who can solve the problem–e.g., Enterprise IT–to find and fix it more quickly and easily.

Let’s use a simple example of a program encountering a network problem as the basis for discussing the tools and procedures to use in narrowing down its source.

Imagine you have a program that attempts to connect to a database that is on a local or remote network to read some data from a table. The program references the database endpoint by FQDN: say, “accounts.example.com” and provides a port number for the connection–say, 3306.

Running the program results in a connection failure with “Error establishing a database connection”. Other failures will, of course, provide more information–e.g., “User authentication failure” is much more instructive–but receiving this generic failure message is not uncommon.

What do we do now?

We could reach out to the Enterprise IT personnel and ask for help, but the uninformative nature of the error message is just as likely to baffle them as it does you. We need to be more informed before calling them.

So, what to do?

Let’s start by understanding how a network connection works.

In the example given here, we are attempting a TCP connection to a named host and port. Why might it have failed?

Domain Name Service

First: let’s check to make sure the hostname we used for the connection is translatable to an IP address–in other words, let’s check the DNS process that takes place as the first step of this connection attempt.

On the host on which the program is running we will need to do a DNS lookup on the hostname “accounts.example.com”. If we can obtain a shell on the host, great. We can use one of the DNS tools–nslookup, dig, etc.–to see if the hostname can be transformed to an IP address.

So, on a Linux system, we might do:

dig accounts.example.com

If we get back a result such as this:

<!-- wp:code -->
<pre class="wp-block-code"><code>; &lt;&lt;>> DiG 9.16.30-RH &lt;&lt;>> accounts.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 37258
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;accounts.example.com.			IN	A

;; ANSWER SECTION:
accounts.example.com.		165	IN	A	104.18.26.120
accounts.example.com.		165	IN	A	104.18.27.120

;; Query time: 12 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sun Dec 21 14:04:40 CST 2025
</code></pre>
<!-- /wp:code -->

we know the hostname is valid and can be translated by the local DNS system.

If, instead, we get something like this:

; <<>> DiG 9.16.30-RH <<>> accounts.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 8474
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;accounts.example.com.			IN	A

;; AUTHORITY SECTION:
com.			30	IN	SOA	a.gtld-servers.net. nstld.verisign-grs.com. 1766347559 1800 900 604800 900

;; Query time: 67 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sun Dec 21 14:06:26 CST 2025
;; MSG SIZE  rcvd: 116

then the local network does not know a host by the name “accounts.example.com”. We have found a source of the error (at least the proximate source, as there may be others): we have the wrong hostname, the hostname is not registered with the DNS server, or we are using the wrong DNS server.

Providing the output of the DNS lookup to the Enterprise IT group should help them get right to the source of the problem, with a quick resolution.

If it is not possible to get a shell on the host on which your program is running, you can do the equivalent of this DNS lookup programmatically. For instance, Python has the dnspython library, and examples of how to run a simple DNS query abound online.
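As a minimal sketch, here is the same check using only Python's standard library (rather than dnspython). The hostname is just the example from above; an empty result is the programmatic equivalent of dig's NXDOMAIN:

```python
import socket

def lookup(hostname: str) -> list[str]:
    """Resolve a hostname to its IPv4 addresses; return [] if resolution fails."""
    try:
        # getaddrinfo performs the same resolution step a connect() would
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        # Name could not be resolved (wrong name, missing DNS entry, wrong server)
        return []
    return sorted({info[4][0] for info in infos})

print(lookup("accounts.example.com"))  # an empty list points to a DNS problem
```

Note that this uses the host's normal resolver configuration, just as the failing program does, so it tests the same path.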

There is a second item to note here: even if the hostname is translated to an IP address by the local DNS server, the IP address itself may not be correct–this would imply either an incorrect hostname or an incorrect entry in the DNS server’s tables. Either one can be investigated and fixed by the appropriate network engineer.

Connection issues

If the hostname is not our problem, what next?

Well, the first step a network connection has to take after resolving the hostname is to establish an end-to-end TCP connection to the specific host.

There are many things that could prevent such a connection: firewall rules, network permissions, and so on–but they all fall into a single category: establishing a connection from the local host to an endpoint.

Here is where another tool comes into play: ping.

Again, in a shell if possible (and programmatically, if not), we can run the command:

ping <ipaddress>

If we get a successful response:

PING 23.46.216.147 (23.46.216.147) 56(84) bytes of data.
64 bytes from 23.46.216.147: icmp_seq=1 ttl=54 time=1.24 ms
64 bytes from 23.46.216.147: icmp_seq=2 ttl=54 time=1.23 ms
64 bytes from 23.46.216.147: icmp_seq=3 ttl=54 time=1.23 ms
64 bytes from 23.46.216.147: icmp_seq=4 ttl=54 time=1.21 ms
64 bytes from 23.46.216.147: icmp_seq=5 ttl=54 time=1.24 ms

we know the host is up, and a route to it is available.

Otherwise:

PING 1.2.3.4 (1.2.3.4) 56(84) bytes of data.
.
.
.

with no response means we have a connection problem.

There are many reasons that a connection is not possible, and getting to the bottom of those is a bit more complex, and beyond the scope of this primer, but at least you can tell Enterprise IT that you can’t reach the host, and they should be able to take it from there.
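If no shell is available, one option is to invoke the system ping utility from code. This is a sketch, not part of the original program, and it assumes the ping binary is installed; ICMP is also blocked on some networks, so a False result here is a hint rather than proof of unreachability:

```python
import platform
import subprocess

def host_responds_to_ping(host: str, count: int = 3) -> bool:
    """Return True if `host` answers ICMP echo requests, using the system ping."""
    # Windows spells the count flag -n; most Unix-like systems use -c
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, str(count), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False  # ping hung, or the ping binary is not installed
    return result.returncode == 0  # 0 means at least one reply came back

print(host_responds_to_ping("127.0.0.1"))
```

Capturing the command's output (rather than discarding it) would also let you pass the raw ping transcript along to Enterprise IT.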

But let’s say you successfully pinged the host–what now? After all, our program still doesn’t connect.

This is where a powerful tool comes into play: Nmap.

Nmap is a network tool with many features, but one simple usage of it can help further delineate the source of the connection problem.

Again, in a shell if possible (and programmatically, if not), we can run the command:

nmap <ipaddress>

This form of the Nmap command probes the 1,000 most commonly used TCP ports and reports the state of each: “open” (ready to accept a connection), “closed”, or “filtered”. Only the open ports are listed individually in this form of the command.

We might see an output like this:

Nmap scan report for <hostname> (<ipaddress>)
Host is up (0.052s latency).
Not shown: 995 filtered tcp ports (no-response)
PORT     STATE SERVICE
22/tcp   open  ssh
80/tcp   open  http
443/tcp  open  https
3000/tcp open  ppp
9000/tcp open  cslistener

Nmap done: 1 IP address (1 host up) scanned in 5.48 seconds

We now know that ports 22, 80, 443, 3000, and 9000 are open for connections. We also know that these ports are conventionally used for ssh, http, https, ppp, and cslistener. (The programs behind them may not actually be using each port for its conventional purpose–all we care about is that the port is open.)

If, instead, we see something like this:

Nmap scan report for <hostname> (<ipaddress>)
Host is up (0.052s latency).
All 1000 scanned ports on <hostname> (<ipaddress>) are filtered (no-response)

Then we know no ports are open.

In our case, since we are trying to connect to port 3306, we now know this port is not open and receiving connections. Why could this be?

Either the program that normally uses that port (usually MySQL for 3306) is not running on the remote host, or the TCP port is blocked by a network firewall rule. How do we tell the difference?
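One way to tell these two cases apart from code is a plain TCP connection attempt, because the failure mode differs: a “connection refused” means the host answered but nothing is listening (the service is down), while silence until a timeout usually means a firewall is dropping the packets. A sketch (the classification is heuristic–some firewalls actively reject rather than silently drop):

```python
import socket

def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP port as 'open', 'closed' (refused), or 'filtered' (no answer)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"      # something accepted the connection
    except ConnectionRefusedError:
        return "closed"        # host sent back RST: reachable, but service is down
    except OSError:
        return "filtered"      # timeout or unreachable: likely a firewall rule

print(probe_port("127.0.0.1", 3306))
```

The exception order matters: ConnectionRefusedError and the timeout error are both subclasses of OSError, so the more specific case must be caught first.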

Here’s where the most powerful networking tool comes into play: Wireshark.

Wireshark (and its non-UI, terminal-only counterpart tshark) is a complex and heavyweight application so, depending on your tolerance for complexity and your ability to install the tool on the localhost, you may choose to forego this next step and leave it to the experts to figure out.

But, let’s assume you’re up for adventure. We will assume you’re using tshark for the following work–Wireshark is GUI-oriented, but has the same features.

Once tshark is installed, we can run a live packet capture to see what’s going on when we attempt, and fail, to make the desired connection from our program.

We start tshark and have it listen for traffic to and from port 3306 on the remote host (ignoring other traffic so we don’t get a cluttered output) with the following command:

tshark -i <interface name> -f "tcp port 3306"

We should see the tshark program output something like this:

Running as user "root" and group "root". This could be dangerous.
Capturing on 'eth0'
 ** (tshark:89717) 14:47:31.330867 [Main MESSAGE] -- Capture started.
 ** (tshark:89717) 14:47:31.331040 [Main MESSAGE] -- File: "/tmp/wireshark_eth0JKA4H3.pcapng"

This is the indication that the program is running, capturing packets on the “eth0” interface, and looking only for packets coming from or going to TCP port 3306.

(Note: we are doing only simple filtering in this example–tshark can also apply more complex filters, which can include an IP address, limit the total number of packets captured, and so on. Typing “tshark --help” or reading the online documentation provides more details.)

Now, we start our program and let it run and fail with the connection error.

If things are working properly, we would see something like this:

Capturing on 'Loopback: lo'
 ** (tshark:93028) 15:37:38.444422 [Main MESSAGE] -- Capture started.
 ** (tshark:93028) 15:37:38.444718 [Main MESSAGE] -- File: "/tmp/wireshark_lo4255H3.pcapng"
    1 0.000000000    127.0.0.1 → 127.0.0.1    MySQL 116 Request Query
    2 0.002171291    127.0.0.1 → 127.0.0.1    MySQL 80 Response  OK 
    3 0.002289567    127.0.0.1 → 127.0.0.1    TCP 66 40782 → 3306 [ACK] Seq=51 Ack=15 Win=512 Len=0 TSval=1301090850 TSecr=1301090849
    4 0.305042025    127.0.0.1 → 127.0.0.1    MySQL 148 Request Query
    5 0.309678250    127.0.0.1 → 127.0.0.1    MySQL 85 Response  OK 
    6 0.309760804    127.0.0.1 → 127.0.0.1    TCP 66 40770 → 3306 [ACK] Seq=83 Ack=20 Win=512 Len=0 TSval=1301091157 TSecr=1301091157
    7 1.528887608    127.0.0.1 → 127.0.0.1    MySQL 150 Request Query
    8 1.534546258    127.0.0.1 → 127.0.0.1    MySQL 85 Response  OK 
    9 1.534634645    127.0.0.1 → 127.0.0.1    TCP 66 40770 → 3306 [ACK] Seq=167 Ack=39 Win=512 Len=0 TSval=1301092382 TSecr=1301092382
^C   10 3.727212615    127.0.0.1 → 127.0.0.1    MySQL 151 Request Query
   11 3.731105058    127.0.0.1 → 127.0.0.1    MySQL 85 Response  OK 
   12 3.731149223    127.0.0.1 → 127.0.0.1    TCP 66 40770 → 3306 [ACK] Seq=252 Ack=58 Win=512 Len=0 TSval=1301094578 TSecr=1301094578

The details of the output above are a bit too complex for this discussion, but note that packets 1 and 2 show a MySQL request and its response–we established a connection to the remote host and had an SQL interaction with it (packet 3 is the TCP acknowledgement).

However, we would now need to look into the actual SQL request sent to the server to see if something is wrong there. We could inspect the captured packets or use debugging tools on the program itself to determine what it sent.

If, instead, we see this:

Capturing on 'Loopback: lo'
 ** (tshark:93028) 15:37:38.444422 [Main MESSAGE] -- Capture started.
 ** (tshark:93028) 15:37:38.444718 [Main MESSAGE] -- File: "/tmp/wireshark_lo4255H3.pcapng"

with no further output after running the program, then our program never established a connection with the remote host on port 3306.

This, then, would indicate some type of networking issue beyond the scope of this primer–or, general network debugging for that matter. Time to reach out to Enterprise IT with this info.

However, if we see this:

Capturing on 'Loopback: lo'
 ** (tshark:93028) 15:37:38.444422 [Main MESSAGE] -- Capture started.
 ** (tshark:93028) 15:37:38.444718 [Main MESSAGE] -- File: "/tmp/wireshark_lo4255H3.pcapng"
    1 0.000000000    127.0.0.1 → 127.0.0.1    MySQL 116 Request Query

with no acceptance of the connection request, we’re dealing with a problem on the remote server. The same is true if the packet capture shows a rejection of the request.

At this point–and at earlier points as we’ve seen–we can gather enough information about the network interaction of our program to determine the source of the problem, and provide details to the networking personnel who can track it down and fix it.

What can one do next?

Well, learning more about networking protocols and how to use Wireshark/tshark would make it possible to get even more detail on problems like this, making them even easier to track down and fix.

Enjoy!

Categories
SogetiLabs Posted

Functional Fixedness

When the average person thinks of what an IT consultant does (as often as they are likely to think about it) they generally picture what they see in movies: a somewhat nerdy guy (it’s usually a guy, but Sandra Bullock did break that mold) who is sitting in front of multiple monitors, typing away in an attempt to solve a cliff-hanger problem that requires coding a complex algorithm.

It’s all about the coding.

Photo by Lukas: https://www.pexels.com/photo/person-encoding-in-laptop-574071/

Those of us who work in the profession see it a bit differently, of course. Coding is a small part of what we do.

There are the meetings, the discussions, the frustrating attempts to set up development environments–everything but actually putting keystrokes to screen to produce a thing of Pythonic beauty.

And there’s one other task that is the predecessor to actual coding, be it a functional UI, or business logic, or a REST interface.

Design.

The design phase is where the real value of a good developer shines. Almost anyone can learn a programming language. But putting that language to work solving a problem or meeting a need? That’s design, and it’s a skill that is of immense value.

Design requires abstract thinking, the ability to take a concrete requirement–“put this logo on that web page only if the user logged in from a private account”–and translate it to an abstract representation–“logos will be held in a database table with this schema, indexed by corporate name, with a separate table with that schema with mappings from the user’s account type to a group number representing private/public accounts”.

The latter allows the developer to implement, in the chosen coding language, the bridge between what the code can do and what the end-result is intended to be.

None of this is a new idea to developers–we do it all the time, often without thinking about it.

Sometimes, however, we get stuck in the design phase.

We can’t quite figure out how to do the “mapping” to code within the restrictions of what we have to work with.

There is a concept in psychology called “functional fixedness” that is often the hindrance to this part of the design process.

Functional fixedness is a cognitive bias that limits a person to use an object only in the way it is traditionally used. […] Karl Duncker defined functional fixedness as being a mental block against using an object in a new way that is required to solve a problem.

Wikipedia

The standard example of functional fixedness is that of the “candle box”.

Photo by SEPpics: https://www.freeimages.com/photo/candle-light-1170871

A participant is given a candle, a box of thumbtacks, and a book of matches, and asked to attach the candle to the wall in a way that will prevent melted wax from dripping on the floor or table.

Most people who are given this task will try to use melted wax or thumbtacks to attach the candle to the wall. This may or may not work, and doesn’t really deal with the request to prevent the melted wax from dripping.

The “best” solution requires thinking outside the box–literally.

Most participants looked at the thumbtack box as just a container for the thumbtacks.

A successful solution required a participant to break through this limited view of the thumbtack box.

The more expansive view? Empty the thumbtacks from the box, and use the thumbtack box as a platform for the candle, held to the wall with a thumbtack.

The ability to overcome functional fixedness was contingent on having a flexible representation of the word box which allows students to see that the box can be used when attaching a candle to a wall.

Wikipedia

This process of “stepping back” from our preconceived notions of the definition (“a container…”) and uses (“…to hold the thumbtacks” ) to something more expansive (“it can hold something other than the thumbtacks and is rigid enough to hold a candle”) is important, and often very difficult.

There are ways in which functional fixedness can be overcome when in the design phase of a new IT system, for instance. One such approach I like to use is the “generic parts technique”.

In this approach, the designer begins by subdividing the components available for the solution. In the candle example, the designer would first define the thumbtack box component as “a box for holding thumbtacks”. Asking the question “does this definition imply a use?” the answer would be “yes”: this definition implies its use as a “thumbtack box”. Then, ask if this definition can be broken down into a new set of components, or modified to remove the usage implication.

In this case, it might be that “a box for holding thumbtacks” is transformed to “a box”, which can be used for “holding something“.

With that in mind, it’s a simple leap to that something being the candle.

In real life it’s not as simple as this. Individuals tend to get hung up at the functional fixedness stage far too easily. The solution: consider making the process a group process, with the context of the group interaction being “can we find new uses for the components that might help us solve the stated problem?”

This is only one way in which to break through functional fixedness–there are many others. A good source of information on this issue and methods for getting past it can be found here.

Enjoy your new-found freedom in solving design problems!


GIGO as applied to AI

copyright James Cornehlsen

The concept of “GIGO”–Garbage In, Garbage Out–has been around almost as long as computer programming itself.

GIGO is the idea that, no matter how well written and definitive a computer program or algorithm is, if you feed it bad data the resulting output will be “bad”–i.e., have no useful meaning or, at worst, misleading meaning.

Nothing surprising here–as programmers we are well aware of this problem and often take great pains to protect an algorithm implementation against “Garbage In”.

It’s not possible to protect against all such cases, of course, human nature being what it is.

Which brings us to the story behind this blog posting: the improper use of Generative AI to “make decisions” in ways that are impactful in the most damaging ways.

The starting point for this story: the state of Iowa in the United States is one of several states that have recently passed laws aimed at protecting young students from exposure to “inappropriate” materials in the school setting.

Senate File 496 includes limitations on school and classroom library collections, requiring that every book available to students be “age appropriate” and free of any “descriptions or visual depictions of a sex act” according to Iowa Code 702.17.

The Gazette

The Gazette (a daily newspaper in Cedar Rapids, Iowa) has the story of a school district in its area that has chosen to use AI (Machine Learning) to determine which books may run afoul of this new law.

Their reasoning for using AI? Assistant Superintendent of Curriculum and Instruction Bridgette Exman told The Gazette that it was “simply not feasible to read every book…”

Sounds reasonable, right?

Well, the school district chose to generate the list of proscribed books by “feeding it a list of proscribed books [provided from other sources]” and seeing if the resulting output list presented “any surprises” to a staff librarian.

See the problem here? As noted in a blog about the news story:

The district didn’t run every book through the process, only the “commonly challenged” ones; if the end result was a list of commonly challenged books and no books that aren’t commonly challenged, well, there you go.

Daily Kos

It appears that people who don’t understand how to use Machine Learning misused it–GIGO?–and now have a trained AI that they think will allow them to filter out inappropriate books without having a human read and judge them.

Regardless of whether or not any of the titles do or do not contain said content, ChatGPT’s varying responses highlight troubling deficiencies of accuracy, analysis, and consistency. A repeat inquiry regarding The Kite Runner, for example, gives contradictory answers. In one response, ChatGPT deems Khaled Hosseini’s novel to contain “little to no explicit sexual content.” Upon a separate follow-up, the [Large Language Model] affirms the book “does contain a description of a sexual assault.”

Popular Science

This misuse of AI/ML is not uncommon–we’ve seen cases where law enforcement has trained facial recognition programs in a way which creates serious racial bias, for instance.

We, as IT professionals, need to be aware of and on the lookout for such misuses, as we are in the best position to spot these situations and understand how to avoid them.


We Work Against the Universe

Crystal structure of hexagonal ice, Wikimedia Commons

There is a concept in physics called entropy.

The simple definition of entropy–the reality is much more complex–is the state of order (or disorder) of a system. It can also be described as the state of information embedded in a system: lower entropy means more information.

An example often used to explain entropy is that of a system that starts as an ice cube. An ice cube is a highly ordered state of water–the individual water molecules are arrayed in a regular, repeating crystalline pattern. This pattern can be easily described. Each molecule is locked in place with no freedom to move. It is highly ordered.

Apply heat to the ice cube. It melts. The individual molecules are now free to move in the resulting liquid–water–and so the overall pattern can no longer be easily described. It is more disordered.

The water, in going from solid to liquid, has increased its entropy. The application of heat has made this change possible.

Of course, we can move in the opposite direction: we can remove heat from the liquid water to return it to its highly-ordered, low entropy state.

The universe, as a whole, moves from a state of low entropy to high entropy–stars are running down, galaxies collapsing.

(For an interesting sci-fi take on this concept, read “The Last Question” by Isaac Asimov.)

Only in small, local environments–the freezer compartment of your refrigerator, for instance–can the general trend towards increased entropy be reversed.

To summarize one version of entropy: a low entropy system contains more information than a high entropy system.

What does this have to do with Information Technologists like ourselves?

We are agents of entropy change.

Think about a content delivery system that we might be developing. Certainly there are many ways to describe the purpose of the system–to deliver data to the end-user, to allow new concepts to be generated, and the like.

All of those purposes can be summed up in one simple description:

We have created a system that permits a local decrease in entropy by adding and collecting information. We can create these systems to be used by anyone, anywhere in the world, to increase knowledge and thereby decrease entropy.

With great power comes great responsibility.
The idea—similar to the 1st century BC parable of the Sword of Damocles and the medieval principle of noblesse oblige—is that power cannot simply be enjoyed for its privileges alone but necessarily makes its holders morally responsible both for what they choose to do with it and for what they fail to do with it.

Wikipedia

We can work against the general trend of the universe. This is an amazing power to hold. We can use it–or allow it to be used–for good or evil purposes.

Let’s all chose wisely.


Generative AI: a Warning

Attribute: Image by storyset on Freepik

Artificial intelligence has exploded upon the world in the form of the generative AI chatbot known as ChatGPT.

Only five days after its launch to the general public it had garnered one million users, far outpacing the uptake–at least by that metric–of any other program or social media system introduced since the dawn of the Internet.

And that amazing pace of uptake has not slowed. By the end of the second month after its introduction, it had shot up to 100 million users.

The astonishing rise of ChatGPT reveals both its usefulness in helping with a wide range of tasks and a general overflowing curiosity about human-like machines.

Time magazine Feb 2023

Others have spent much blog space on examining the why and how of the generative AI revolution that seems to be taking place. And much of that narrative extols the transcendent possibilities of the future of humanity in partnership with this new form of machine intelligence.

I want to take a somewhat different view here–one that is more admonitory and intentional in nature.

As is the case for any new technology, we, as IT professionals, will be one of the cheerleading groups for the wider use of generative AI in society–though it appears that little help is needed there.

It is also incumbent on us to serve the role of technology guardian on behalf of the society we inhabit. Most users of this new technology will not have the in-depth knowledge we have about the shortcomings of this new technology, and so cannot make fully informed judgements about its safe and proper use.

Some technology experts have warned of apocalyptic and even existential crises attendant upon the widespread use of ChatGPT and similar technologies. This is well and good–we need adverse voices to make us aware of potential problems to society.

I want to point out another pitfall that appears to await us as we rush to use generative AI: the fact that, in one way, generative AI seems to mimic humans all too well.

We, and they, are able to lie with sincerity and authenticity.

If we treated AI with the same sense of skepticism with which we treat other humans–who we are aware harbor the same darker impulses that we are capable of–this would not be a major issue.

But, interacting with AI, we seem more willing to suspend this skeptical viewpoint. This seems natural: we do not expect machines to deceive us in the way people might, and there are few non-verbal cues we can rely on to determine veracity.

This is made worse by the fact that ChatGPT’s goal is to mimic human behavior and language, and can do so with astonishing ease and rapidity.

So, we are led to consider a new “threat” from ChatGPT: that it can appear to provide definitive and truthful answers that can be taken at face value. And in some cases, those deceptive answers can do great harm.

One such case is where ChatGPT invented a sexual harassment scandal where none actually existed. And there are others.

Over the past couple of years, OpenAI and others have shown that AI algorithms trained on huge amounts of images or text can be capable of impressive feats. But because they mimic human-made images and text in a purely statistical way, rather than actually learning how the world works, such programs are also prone to making up facts and regurgitating hateful statements and biases—problems still present in ChatGPT.

Wired magazine Dec 7 2022

Does this mean that we need to call an immediate halt to the widespread use of ChatGPT as some groups have already done? For instance, Italy has already banned the use of ChatGPT. Legislation has been introduced in the US Congress to regulate its use (interestingly, the legislation itself was written by ChatGPT).

I think banning or severely restricting may be a step too far. Pausing may be a better step to take as we grapple with the downsides of this new technology.

Even that, however, may be seen as too much.

I would like to suggest another alternative: that we use our unique position as IT leaders and thinkers to cultivate in our clients, our friends, and ourselves a healthy sense of skepticism about the trustworthiness of this new tool.

Much like most of us already do with social media, we need to critically examine the claims generative AI makes when we interact with it. ChatGPT and the like are only as good as the people who train them and the material chosen for that training.

ChatGPT is not an infallible Oracle of Delphi. It’s a tool, trained by humans to interact with humans in a “human” manner.

With all the good and bad that implies.


The Myth of “Lost Technology”

Attribution: Marsyas

As my wife and I were watching the coverage of the end of the first NASA Orion program Artemis capsule return, I mentioned to her that at the end of the 1970’s Apollo program I never imagined it would be half a century before we returned to the Moon.

After a pause she asked a question: “Did this program use any of the hardware of the original Apollo program?”

I was a bit taken aback–I often forget that people who are not space enthusiasts like me wouldn’t know such things–but told her that this was all new hardware, and that the original Apollo hardware and their designs were long gone.

Which reminded me of a trope that is common when it comes to the Apollo program–the myth of “Lost Technology”.

What is “Lost Technology”?

The definition often used by “The Lost Technology of XXX” TV programs is any process or product produced in the past that we no longer understand and do not have the original process to reproduce.

Now, in strict terms this may be true in a few cases. We do not know how Damascus Steel was produced in the Near East beginning in the 3rd century CE, a process that was no longer in use by the early 19th century CE. Does this mean this technology is lost?

Modern artisans have produced an equivalent to Damascus Steel, so while we do not know how the ancients produced it, we can make its replacement today using modern processes and materials.

Does this mean the production Damascus Steel is a “Lost Technology”? Yes, in the sense that we do not know how it was originally produced. No, in the sense that we can produce its equivalent today, but using different techniques.

I would argue that this definition of “Lost Technology” has little useful meaning. While there is certainly value in knowing how ancient civilizations accomplished a specific task or produced a specific product, the fact that we can use modern techniques to accomplish the same outcome says we never lost the ability to create the end-product.

What we did lose was the institutional knowledge that underpinned the technology in its original form.

Every industry has something called institutional, or tribal, knowledge. Knowledge crucial to the industry which is never written down, either because it’s so basic that it’s not worth writing down or because it’s not something that can easily be written down.

Michael B, Quora

(I would add to this definition that some knowledge was never written down to keep it secret–this seems to have been the case for Damascus Steel.)

This is what happened with the Apollo program processes and designs. While we still have many of the original designs in blueprint or document form, the institutional knowledge is almost completely gone–those who had it are no longer with us, and the few that are still around probably can no longer remember.

So, is the Apollo project technology “lost”? In a very narrow sense, yes. We can no longer produce a Saturn V rocket in the same form in which it existed 50 years ago–we don’t have the skilled craftsmen who could, for instance, do the hand-drilling of the rocket engine injector baffle plates or hand-weld the propellant piping seams.

But this is where the definition of “Lost Technology” becomes meaningless.

Why, with knowledge and processes advanced by 50 years, would we want to produce the same rocket engines the same way they were made then? We can do far better with what we have learned since, with the systems we now have.

Except for those–I am one–who would love to see that lovely old beast back in operation for one more flight, the fact that we can no longer produce it exactly as it was means little. Today, we can actually do better.

IT processes and products hardly seem old enough to fall prey to this “Lost Technology” syndrome, but computer technology changes much faster than the technologies of old.

And yet, we do see some of the effects of technology obsolescence that are close to producing “lost technologies”.

  • Quite a few institutions still rely on decades-old programs written in COBOL, a language no longer actively taught and for which few tools still exist.
  • The Defense Department’s Strategic Automated Command and Control System (DDSACCS), which is used to send and receive emergency action messages to US nuclear forces, runs on a 1970s IBM computing platform. It still uses 8-inch floppy disks to store data. “Replacement parts for the system are difficult to find because they are now obsolete.”
  • Whatever you may have, it’s no doubt more current than the system that air traffic controllers use to tell pilots about weather conditions at Paris’s Orly Airport: Windows 3.1. That’s not a typo – these flight-critical systems use an operating system that came out in 1992. When the machines went down in November 2015, planes were grounded while the airport had to find an IT guy who could deal with computers that ancient.
  • Sparkler Filters of Conroe, Texas, prides itself on being a leader in the world of chemical process filtration. If you buy an automatic nutsche filter from them, though, they’ll enter your transaction on a “computer” that dates from 1948. Sparkler’s IBM 402 is not a traditional computer, but an automated electromechanical tabulator that can be programmed (or more accurately, wired) to print out certain results based on values encoded into stacks of 80-column Hollerith-type punched cards.

All of these, of course, represent situations in which the product or system could be updated using more modern techniques, so they are not truly “lost”, except insofar as the original technologies are no longer in common use, and the users would be hard-pressed to make substantial changes or updates.

And therein, to me, lies the beauty of computer technology, its history, and its likely future.

We IT practitioners work in a world where nothing truly disappears or is lost. We keep old systems alive where appropriate, and we use the latest techniques to build new systems better than the old.

The myth of “Lost Technology” is just that–a myth.

Although I am glad that “lost” technologies are kept around in some form for us to see how far we’ve come, and to appreciate the amazing accomplishments of those who came before us.

Categories
SogetiLabs Posted

Long-distance Networking for IoT

In the early days of networking, copper was king.

First was Ethernet over coax–initially thicknet (10Base5) and then thinnet (10Base2). Both used a bus topology and both were limited in the distance over which they could be deployed–about 1,640 feet (500 meters) per segment for thicknet and about 600 feet (185 meters) for thinnet. Because of those limitations, these versions of Ethernet did not make deep inroads into the market.

In the late 1980s, the invention of a version of Ethernet carried over twisted-pair cabling, and which used a star topology, kicked off a land-rush to connect computers and devices together. Combined with the invention of network bridges, routers, and other devices allowing connection of local Ethernet networks to the burgeoning Internet, wired networking became the dominant model.

While not ideal for some applications, this wired model served the market well until the invention of WiFi in the late 1990s. (It’s interesting to note that radio-based networking predated even Ethernet. The ALOHAnet radio network was launched in 1971 and actually provided the template for the Ethernet protocol.)

WiFi met an emerging market need, driven by the desire to interconnect networkable devices in places where wiring was not possible or not cost-effective. Short distances–typically up to several hundred feet–could be bridged, allowing devices to be movable or placed in hard-to-reach spots.

Aside from improvements in WiFi speeds and encryption mechanisms, little changed over the following years in terms of the distances over which WiFi could be used. Some systems were built using high-gain antennas and specialized receivers and transmitters that extended the range up to several miles, but these were costly and required modified protocol stacks to deal with error conditions unique to radio links.

The rise of IoT drove the need for a new, low-cost wireless network that could provide connectivity over the distances some IoT sensors required. Soil humidity sensors on farms; engine sensors on mobile machinery; monitoring systems on drones. Now the distances needing to be covered could be several miles. Combined with the need to keep power consumption low, existing WiFi systems were not up to the challenge.

For a while, cellular data systems filled the need, but those required high power budgets and were typically expensive.

And so, LoRa entered the scene.

LoRa (short for “long range”) is a low-power radio protocol–and accompanying hardware–that provides networking capability that is exactly what is needed for the new IoT world.

LoRa provides a mechanism which allows the user to determine the desired tradeoff between power, distance, and data rate. Of course, these are not independent of one another, but within limits they can be reasonably determined. And LoRa is not without its own limitations–packet size is small, though for most IoT uses it suffices.

As an example, a LoRa system can be set up to provide a data rate in the range of hundreds to thousands of bits per second over distances of several kilometers with ease. Distances of 700 kilometers and more have been achieved in experimental systems; small satellites (cubesats) using LoRa easily communicate with simple ground stations on a daily basis. While the data rates may seem low, they are adequate for most remotely-positioned IoT devices.
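
The power/distance/data-rate tradeoff above is set mostly by the spreading factor: each step up roughly doubles time-on-air (improving range) while halving throughput. A minimal sketch of the commonly cited nominal bit-rate formula, Rb = SF × (BW / 2^SF) × CR:

```python
# Nominal LoRa data rate as a function of spreading factor (SF),
# channel bandwidth (BW, in Hz), and coding rate (CR, e.g. 4/5).
# Illustrative only: real deployments add preamble/header overhead.

def lora_bitrate(sf: int, bw_hz: int = 125_000, cr: float = 4 / 5) -> float:
    """Nominal LoRa PHY data rate in bits per second."""
    return sf * (bw_hz / 2 ** sf) * cr

for sf in range(7, 13):
    print(f"SF{sf}: {lora_bitrate(sf):7.1f} bps")
```

At 125 kHz bandwidth this spans roughly 5.5 kbps at SF7 down to about 293 bps at SF12–exactly the “hundreds to thousands of bits per second” range described above.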

And when not transmitting data the LoRa hardware can be shut down (as IoT sensors tend to be episodic in their data delivery) lowering power requirements to the range in which small batteries can power devices for months at a time.
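
A back-of-the-envelope calculation shows why this duty cycling matters so much. The currents and battery capacity below are illustrative assumptions, not measurements of any particular LoRa module:

```python
# Rough battery-life estimate for a duty-cycled sensor that wakes
# briefly each hour to transmit, then sleeps. All figures assumed.

def battery_life_days(capacity_mah: float, tx_ma: float, sleep_ua: float,
                      tx_seconds_per_hour: float) -> float:
    """Estimated battery life in days given an hourly transmit burst."""
    duty = tx_seconds_per_hour / 3600          # fraction of time transmitting
    avg_ma = tx_ma * duty + (sleep_ua / 1000) * (1 - duty)
    return capacity_mah / avg_ma / 24

# e.g., 1000 mAh cell, 120 mA during TX, 2 uA asleep, one 2 s burst per hour
print(f"{battery_life_days(1000, 120, 2, 2):.0f} days")
```

Under these assumptions the average draw is well under 0.1 mA, stretching a small cell to well over a year–the “months at a time” regime described above.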

LoRa is, at its most basic, a point-to-point network, but the introduction of the LoRaWAN standard, and the use of gateway devices, makes it possible to have widely distributed devices that can be interconnected in much the same manner as WiFi provides.

And LoRaWAN has taken off amazingly in the last few years. Estimates are that over 170 million IoT devices are connected using LoRaWAN in 100 countries. In 2016 the Netherlands became the first country to have a nation-wide LoRaWAN network. Other countries have quickly followed this trend.

For those, like me, who are hackers at heart, the availability of inexpensive LoRa hardware (in the range of $5 – $10) and open-source software across a range of inexpensive platforms is heaven.

In fact, a coalition of enthusiasts has used this intersection of open-source software and low-cost hardware to set up open LoRaWAN networks worldwide. The Things Network boasts more than 20,000 LoRaWAN gateways in operation in 151 countries, all available for any member of the public to use.

To see an example of what can be done with LoRa, check out Tiny GS, a group of enthusiasts that set up low-cost satellite ground stations and receive telemetry from cubesats. Info on what my ground station has received can be found by logging into the Tiny GS website, selecting “Stations” from the hamburger menu, and searching for “Fall”.

Learning about LoRa and LoRaWAN by implementing one yourself is a great introduction to this networking concept, and will prepare you for interactions and projects with our IoT-using clients.

Enjoy!

Categories
SogetiLabs Posted

Speech isn’t free, but it can cost less

In our current peri-COVID world, we all now have far more experience than we could ever have imagined in remote working.

Our homes are now our offices; dress codes have become more relaxed; we can work somewhat more flexible hours to accommodate our personal lives.

This has all come at a cost, of course. The biggest, in my opinion, is the need for higher bandwidth and more reliable Internet connections to our homes. In many cases, Internet Service Providers (ISPs) have been hard-pressed to provide new pipes, and “last mile” service installations have lagged.

The Internet core network has similarly been stressed–in analyses comparing pre- and peri-COVID data in several cities around the world, backbone data usage has risen by as much as 40% year-over-year.

Much of this “need for speed” has been driven by widespread use of teleconferencing software. Zoom, Microsoft Teams, Skype, Chime and others are in constant use around the world. Even with clever bandwidth-saving measures, the massively increased use of teleconferencing has created a demand for bandwidth that will probably remain with us post-COVID.

One of the contributors to the need for higher bandwidth in teleconferencing is the requirement to transmit timely and clear representations of speech in digital form. Audio is generally resistant to general-purpose compression–it’s too full of unpredictable data patterns–and with noise added in it becomes even more of a problem.

A number of coder/decoder algorithms have been invented for the problem of transforming speech, in particular, to a digital form. Some are very clever, making use of models of speech generation to build compression schemes that are reasonably efficient in time and bandwidth. The models are made much more complex by the need to cover a wide range of languages–many of which have substantial differences in their phonemes. Add in accents, speaking rate, and other variables and the models become extremely complex.

With the long history of language coder/decoder research, it would be easy to believe that there would be nothing new under the sun.

And that would be wrong.

Google has announced a new speech coding algorithm that appears to use much less bandwidth than existing algorithms, while preserving speech clarity and “normalness” better.

The new algorithm, named “Lyra”, is based on research done on new models for speech coding, generative models.

These shortcomings have led to the development of a new generation of high-quality audio generative models that have revolutionized the field by being able to not only differentiate between signals, but also generate completely new ones.

One of the major issues with using these generative models is their computational complexity. Google has offered a solution to that problem and the solution appears to offer better performance, at lower bandwidth, and with better apparent normalness to the sound quality.

Lyra is currently designed to operate at 3kbps and listening tests show that Lyra outperforms any other codec at that bitrate and compares favorably to Opus at 8kbps, thus achieving more than a 60% reduction in bandwidth. Lyra can be used wherever the bandwidth conditions are insufficient for higher bitrates and existing low-bitrate codecs do not provide adequate quality.
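
The quoted “more than a 60% reduction” follows directly from the two bitrates. A quick sketch of the arithmetic, along with what that means for an hour-long call:

```python
# Bandwidth saved by moving a voice stream from Opus at 8 kbps
# to Lyra at 3 kbps (figures taken from the quote above).

def reduction(old_kbps: float, new_kbps: float) -> float:
    """Fractional bandwidth reduction when switching codecs."""
    return (old_kbps - new_kbps) / old_kbps

def megabytes_per_hour(kbps: float) -> float:
    """Data volume of one hour of audio at the given bitrate."""
    return kbps * 1000 * 3600 / 8 / 1_000_000

print(f"{reduction(8, 3):.1%}")            # 62.5%
print(f"{megabytes_per_hour(3):.2f} MB/hour at 3 kbps")
```

At 3 kbps an hour of speech fits in well under 2 MB, which is why Google positions Lyra for connections where higher-bitrate codecs are infeasible.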

The Google webpage announcing this news has examples of their algorithm in action compared to existing, widely used algorithms. The results are quite impressive.

What impacts will this have on teleconferencing? Google predicts that it will make teleconferencing possible over lower bandwidth connections, and provide an algorithm that can be incorporated into existing and new applications.

Google plans to continue work in this area, most importantly to provide implementations that can be accelerated through GPUs and TPUs.

Be sure to listen for more exciting developments in speech coding, no matter what algorithm you use….

Categories
SogetiLabs Posted

Apple’s New iPod? A New AI Weakness Revealed

Artificial Intelligence–AI–has come far since its first incarnation in 1956 as a theorem-proving program.

Most recently OpenAI, a machine learning research organization, announced the availability of CLIP, a general-purpose vision system based on neural networks. CLIP outperforms many existing vision systems on many of the most difficult test datasets.

[These datasets] stress test the model’s robustness to not just simple distortions or changes in lighting or pose, but also to complete abstraction and reconstruction—sketches, cartoons, and even statues of the objects.

https://openai.com/blog/multimodal-neurons/

It’s been known for several years from work by brain researchers that there exist “multimodal neurons” in the human brain, capable of responding not just to a single stimulus (e.g., vision) but to a variety of sensory inputs (e.g., vision and sound) in an integrated manner. These multimodal neurons permit the human brain to categorize objects in the real world.

The first example found of these multimodal neurons was the “Halle Berry neuron”, found by a team of researchers in 2005, which responds to pictures of the actress–including somewhat distorted ones, such as caricatures–and even to typed letter sequences of her name.

[P]ictures of Halle Berry activated a neuron in the right anterior hippocampus, as did a caricature of the actress, images of her in the lead role of the film Catwoman, and a letter sequence spelling her name.

Many more such neurons have been found since this seminal discovery.

The existence of multimodal neurons in artificial neural networks has been suspected for a while. Now, within the CLIP system, the existence of multimodal neurons has been demonstrated.

One such neuron, for example, is a “Spider-Man” neuron (bearing a remarkable resemblance to the “Halle Berry” neuron) that responds to an image of a spider, an image of the text “spider,” and the comic book character “Spider-Man” either in costume or illustrated.

OpenAI.org

This evidence for the same structures in both the human brain and neural networks provides a powerful tool for understanding the functioning of both, and for better developing and training AI systems that use neural networks.

The degree of abstraction found in the CLIP networks, while a powerful investigative tool, also exposes one of its weaknesses.

CLIP’s multimodal neurons generalize across the literal and the iconic, which may be a double-edged sword.

OpenAI.org

As a result of the multimodal sensory input nature of CLIP, it’s possible to fool the system by providing contradictory inputs.

For instance, showing the system a picture of a standard poodle results in correct identification of the object in a substantial percentage of cases. However, there appears to exist in CLIP a “finance neuron” that responds to pictures of piggy banks and “$” text characters. Forcing this neuron to fire by placing “$” characters over the image of the poodle causes CLIP to identify the dog as a piggy bank, with an even higher percentage of confidence.

This discovery leads to the understanding that a new attack vector exists in CLIP, and presumably other similar neural networks. It’s been called the “typographic attack”.

This appears to be more than an academic observation–the attack is simple enough to be done without special tools, and thus may appear easily “in the wild”.

As an example of this, the CLIP researchers showed the network a picture of an apple. CLIP easily identified the apple correctly, even going so far as to identify the type of the apple–a Granny Smith–with high probability.

Adding a handwritten note to the apple with the word “iPod” on it caused CLIP to identify the item as an iPod with an even higher probability.

The more serious issues here are easy to see: with the increased use of vision systems in the public sphere it would be very easy to fool such a system into making a biased categorization.

There’s certainly humor in being able to fool an AI vision system so easily, but the real lesson here is two-fold.

  • The identification of multimodal neurons in AI systems can be a powerful tool to understanding and improving their behavior.
  • With this power comes the need to understand and prevent the misuse of this power in ways that can seriously undermine the system’s accuracy.

We believe that these tools of interpretability may aid practitioners [in] the ability to preempt potential problems, by discovering some of these associations and ambiguities ahead of time.

OpenAI.org

With great power comes great responsibility, as Spider-Man has said.

Categories
SogetiLabs Posted

It’s in the Water–Poor Security Has Real-Life Consequences

As IT professionals, we are all painfully aware of the need for high-quality security in the systems we work with and deliver.

We know that if a system containing sensitive user information, such as bank account numbers, is not properly protected we risk exposure of that data to hackers and the resultant financial losses.

Encryption of data in flight and at rest; database input sanitizing; array bounds checking; firewalls; intrusion detection systems. All these, and more, are familiar security standards that we daily apply to the systems we design, implement, and deploy. eCommerce websites; B2B communications networks; public service APIs. These are the systems to which we apply these best practices.

If we do not take due care, we risk the public’s confidence in the banking system, the services sector, and even the Internet itself.

Even the widespread issues that could result from breaches of these systems pale in comparison, I believe, to those in systems that are more pervasive and that bear more directly on our everyday lives.

Much of our modern world is dependent on the workings of its vast infrastructure. Roadways, power plants, airports, shipping ports–all of these are fundamental to our existence. Infrastructure security is such an important issue that the United States government has an agency dedicated to it: the Cybersecurity & Infrastructure Security Agency–CISA.

Here in the US we just had a reminder of how important this topic is.

Just yesterday there was an intrusion into a water treatment plant in Oldsmar, Florida in which the attacker attempted to raise the amount of sodium hydroxide by a factor of 100, from pipe-protecting levels to an amount that is potentially harmful to humans.

The good news is that the change was noticed by an attentive administrator, who then reversed the change before it could take effect. The system in question has been taken offline until the intrusion is investigated and proper steps taken.

It’s unclear at this point whether the attacker was a bored teenager or a nation-state, or something in-between, but the effect would have been the same: danger to 15,000 people and a resulting lack of trust in the water delivery system.

As of the writing of this blog post there is little detail about how the hack was accomplished, though it appears that the hacker gained the use of credentials permitting remote access to the water treatment management system. From there, it was only a matter of the hacker poking around to find something of interest to “adjust”.

The Florida Governor has called this incident a “national security threat”, and in this case I don’t believe he is indulging in hyperbole.

CISA considers the US water supply one of the most critical infrastructure elements, and devotes an entire team of specialists to this topic.

Safe drinking water is a prerequisite for protecting public health and all human activity. Properly treated wastewater is vital for preventing disease and protecting the environment. Thus, ensuring the supply of drinking water and wastewater treatment and service is essential to modern life and the Nation’s economy.

CISA website

What should we take as a lesson from this?

I believe this incident is a cogent example of how brittle our national infrastructure is to bad actors. Further, I believe that this incident makes abundantly clear that we need a renewed focus on updating, securing, and minimizing the attack surface of existing infrastructure control systems.

As IT professionals it is our responsibility to lend our expertise and unique viewpoint to inform our leaders in government and industry of the issues, their importance, and their potential solutions. To do so actively, and to do so regularly.

Computing professionals’ actions change the world. To act responsibly, they should reflect upon the wider impacts of their work, consistently supporting the public good.

ACM Code of Ethics and Professional Conduct, preamble