In our current peri-COVID world, we all now have far more experience than we could ever have imagined in remote working.
Our homes are now our offices; dress codes have become more relaxed; we can work somewhat more flexible hours to accommodate our personal lives.
This has all come at a cost, of course. The biggest, in my opinion, is the need for higher bandwidth and more reliable Internet connections to our homes. In many cases, Internet Service Providers (ISPs) have been hard-pressed to provide new pipes, and “last mile” service installations have lagged.
The Internet core network has similarly been stressed–in analyses comparing pre- and peri-COVID data in several cities around the world, backbone data usage has gone up by as much as 40% year-over-year.
Much of this “need for speed” has been driven by widespread use of teleconferencing software. Zoom, Microsoft Teams, Skype, Chime and others are in constant use around the world. Even with clever bandwidth-saving measures, the massively increased use of teleconferencing has created a demand for bandwidth that will probably remain with us post-COVID.
One of the contributors to the need for higher bandwidth in teleconferencing is the requirement to transmit timely and clear representations of speech in a digital format. Generally, audio is highly resistant to general-purpose compression techniques–it’s too full of unpredictable data patterns, and added noise makes it even more of a problem.
A number of coder/decoder algorithms have been invented for the problem of transforming speech, in particular, to a digital form. Some are very clever, making use of models of speech generation to build compression schemes that are reasonably efficient in both computation time and bandwidth. The models are made much more complex by the need to handle a wide range of languages–many of which have substantial differences in their phonemes. Add in accents, speaking rate, and other variables and the models become extremely complex.
With the long history of language coder/decoder research, it would be easy to believe that there would be nothing new under the sun.
And that would be wrong.
Google has announced a new speech coding algorithm that appears to use much less bandwidth than existing algorithms, while preserving speech clarity and “normalness” better.
The new algorithm, named “Lyra”, is based on research into a new class of speech-coding models: generative models.
One of the major issues with using these generative models is their computational complexity. Google has offered a solution to that problem, one that appears to deliver better performance at lower bandwidth, with better apparent normalness of sound quality.
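To get a feel for what a codec like this means in practice, here is some back-of-the-envelope arithmetic. Google's announcement gives 3 kbps as Lyra's operating bitrate; the 32 kbps figure below is a typical speech bitrate for a widely used codec such as Opus, assumed here purely for comparison.

```python
# Rough bandwidth comparison between a conventional speech codec and Lyra.
# The 3 kbps figure comes from Google's announcement; 32 kbps is an
# assumed, typical bitrate for an existing codec, used for illustration.

def kilobytes_per_minute(bitrate_kbps: float) -> float:
    """Convert a bitrate in kilobits/second to kilobytes/minute."""
    return bitrate_kbps * 60 / 8

conventional_kbps = 32.0  # assumed typical speech bitrate
lyra_kbps = 3.0           # Lyra's announced operating bitrate

print(f"Conventional: {kilobytes_per_minute(conventional_kbps):.0f} KB/min")
print(f"Lyra:         {kilobytes_per_minute(lyra_kbps):.1f} KB/min")
print(f"Reduction:    {(1 - lyra_kbps / conventional_kbps):.0%}")
```

Even if the assumed baseline is off by a factor of two, the order-of-magnitude reduction is what makes teleconferencing plausible on very constrained links.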
The Google webpage announcing this news has examples of their algorithm in action compared to existing, widely used algorithms. The results are quite impressive.
What impacts will this have on teleconferencing? Google predicts that it will make teleconferencing possible over lower-bandwidth connections, and provide an algorithm that can be incorporated into existing and new applications.
Google plans to continue work in this area, most importantly to provide implementations that can be accelerated through GPUs and TPUs.
Be sure to listen for more exciting developments in speech coding, no matter what algorithm you use….
Most recently, OpenAI, a machine learning research organization, announced the availability of CLIP, a general-purpose vision system based on neural networks. CLIP outperforms many existing vision systems on some of the most difficult test datasets.
It’s been known for several years from work by brain researchers that there exist “multimodal neurons” in the human brain, capable of responding not just to a single stimulus (e.g., vision) but to a variety of sensory inputs (e.g., vision and sound) in an integrated manner. These multimodal neurons permit the human brain to categorize objects in the real world.
The first example found of these multimodal neurons was the “Halle Berry neuron”, found by a team of researchers in 2005, which responds to pictures of the actress–including somewhat distorted ones, such as caricatures–and even to typed letter sequences of her name.
Many more such neurons have been found since this seminal discovery.
The existence of multimodal neurons in artificial neural networks has been suspected for a while. Now, within the CLIP system, the existence of multimodal neurons has been demonstrated.
This evidence for the same structures in both the human brain and neural networks provides a powerful tool for understanding the functioning of both, and for better developing and training AI systems built on neural networks.
The degree of abstraction found in the CLIP networks, while a powerful investigative tool, also exposes one of its weaknesses.
As a result of the multimodal sensory input nature of CLIP, it’s possible to fool the system by providing contradictory inputs.
For instance, showing the system a picture of a standard poodle results in correct identification of the object in a substantial percentage of cases. However, there appears to exist in CLIP a “finance neuron” that responds both to pictures of piggy banks and to “$” text characters. Forcing this neuron to fire by placing “$” characters over the image of the poodle causes CLIP to identify the dog as a piggy bank–with even higher confidence.
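The mechanics of the fooling are easier to see with a toy model. CLIP classifies by comparing an image embedding against text-label embeddings using cosine similarity, with the nearest label winning. The tiny vectors below are invented for illustration (real CLIP embeddings are learned and hundreds of dimensions wide), but they show how boosting one "finance-like" feature is enough to flip the winning label.

```python
import math

# Toy sketch of CLIP-style zero-shot classification. The feature axes
# and all vector values are invented for illustration only:
# axes = [dog-ness, finance-ness, round-shape]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def classify(image_vec, labels):
    """Return the label whose embedding is nearest the image embedding."""
    return max(labels, key=lambda name: cosine(image_vec, labels[name]))

labels = {
    "standard poodle": [1.0, 0.0, 0.2],
    "piggy bank":      [0.1, 1.0, 0.8],
}

poodle_photo = [0.9, 0.05, 0.3]
print(classify(poodle_photo, labels))   # standard poodle

# Pasting "$" glyphs over the photo drives the finance feature up,
# which is enough to flip the nearest label.
defaced_photo = [0.9, 2.05, 0.3]
print(classify(defaced_photo, labels))  # piggy bank
```

The attack works not by changing the dog but by injecting a strong, unrelated signal into a shared embedding space.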
This discovery leads to the understanding that a new attack vector exists in CLIP, and presumably other similar neural networks. It’s been called the “typographic attack”.
This appears to be more than an academic observation–the attack is simple enough to be done without special tools, and thus may appear easily “in the wild”.
As an example of this, the CLIP researchers showed the network a picture of an apple. CLIP easily identified the apple correctly, even going so far as to identify the type of the apple–a Granny Smith–with high probability.
Adding a handwritten note to the apple with the word “iPod” on it caused CLIP to identify the item as an iPod with an even higher probability.
The more serious issues here are easy to see: with the increased use of vision systems in the public sphere it would be very easy to fool such a system into making a biased categorization.
There’s certainly humor in being able to fool an AI vision system so easily, but the real lesson here is two-fold.
The identification of multimodal neurons in AI systems can be a powerful tool to understanding and improving their behavior.
With this power comes the need to understand and prevent the misuse of this power in ways that can seriously undermine the system’s accuracy.
With great power comes great responsibility, as Spider-Man has said.
As IT professionals, we are all painfully aware of the need for high-quality security in the systems we work with and deliver.
We know that if a system containing sensitive user information, such as bank account numbers, is not properly protected we risk exposure of that data to hackers and the resultant financial losses.
Encryption of data in flight and at rest; database input sanitizing; array bounds checking; firewalls; intrusion detection systems. All these, and more, are familiar security standards that we daily apply to the systems we design, implement, and deploy. eCommerce websites; B2B communications networks; public service APIs. These are the systems to which we apply these best practices.
If we do not take due care, we risk the public’s confidence in the banking system, the services sector, and even the Internet itself.
Yet even the widespread damage that could result from breaches of these systems pales in comparison, I believe, to breaches of systems that are more pervasive and that more directly impact our everyday lives.
Much of our modern world is dependent on the workings of its vast infrastructure. Roadways, power plants, airports, shipping ports–all of these are fundamental to our existence. Infrastructure security is such an important issue that the United States government has an agency dedicated to it: the Cybersecurity & Infrastructure Security Agency (CISA).
Here in the US we just had a reminder of how important this topic is.
Just yesterday there was an intrusion into a water treatment plant in Oldsmar, Florida, in which the attacker attempted to raise the concentration of sodium hydroxide by a factor of 100, from pipe-protecting levels to an amount that is potentially harmful to humans.
The good news is that the change was noticed by an attentive administrator, who then reversed the change before it could take effect. The system in question has been taken offline until the intrusion is investigated and proper steps taken.
It’s unclear at this point whether the attacker was a bored teenager or a nation-state, or something in-between, but the effect would have been the same: danger to 15,000 people and a resulting lack of trust in the water delivery system.
As of the writing of this blog post there is little detail about how the hack was accomplished, though it appears that the hacker gained the use of credentials permitting remote access to the water treatment management system. From there, it was only a matter of the hacker poking around to find something of interest to “adjust”.
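One obvious hardening step, regardless of how the credentials were stolen, is a software interlock that refuses implausible setpoint changes. The sketch below is hypothetical: the chemical name, limits, and threshold are invented for illustration, and real plants implement such checks in PLC/SCADA logic rather than application code. But the principle, namely that a 100x jump should never be a single accepted command, is general.

```python
# Hypothetical safety interlock for a treatment-plant setpoint change.
# All names and limit values are invented for illustration.

SAFE_LIMITS = {
    # chemical: (min, max) in parts per million -- illustrative values
    "sodium_hydroxide_ppm": (50.0, 200.0),
}

MAX_RELATIVE_CHANGE = 0.25  # reject single-step jumps of more than 25%

def validate_setpoint(chemical: str, current: float, requested: float) -> bool:
    """Accept a setpoint change only if it falls inside the hard safety
    band AND is not an implausibly large single-step jump."""
    low, high = SAFE_LIMITS[chemical]
    if not (low <= requested <= high):
        return False  # outside the hard safety band
    if abs(requested - current) > current * MAX_RELATIVE_CHANGE:
        return False  # too large a jump for one command
    return True

# An Oldsmar-style 100x jump is rejected on both counts;
# a modest operational adjustment is accepted.
print(validate_setpoint("sodium_hydroxide_ppm", 100.0, 11100.0))  # False
print(validate_setpoint("sodium_hydroxide_ppm", 100.0, 110.0))    # True
```

A check like this does not stop an intruder, but it turns a one-command catastrophe into something that requires sustained, detectable manipulation.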
The Florida Governor has called this incident a “national security threat”, and in this case I don’t believe he is indulging in hyperbole.
CISA considers the US water supply one of the most critical infrastructure elements and devotes an entire team of specialists to the topic.
What should we take as a lesson from this?
I believe this incident is a cogent example of how brittle our national infrastructure is to bad actors. Further, I believe that this incident makes abundantly clear that we need a renewed focus on updating, securing, and minimizing the attack surface of existing infrastructure control systems.
As IT professionals it is our responsibility to lend our expertise and unique viewpoint to inform our leaders in government and industry of the issues, their importance, and their potential solutions. To do so actively, and to do so regularly.
Over the last few years I’ve seen a number of articles on how, as IT professionals, we can work to build users’ trust in the systems we produce. Clearly this is important, as a system that is not trusted by its targeted users will not be used, or will be used inefficiently.
This seems an obvious topic of interest to IT professionals.
For instance, if a bank’s customers do not trust the mobile app that allows them to interact with their funds to accurately complete requested actions, it won’t be used.
But there’s a flip side to this trust coin that is not often talked about or studied: how do we design systems that we can be sure will not be trusted by all-too-trusting humans when it is inappropriate or unsafe to do so.
We actually experience this in our everyday lives, often without thinking about what it really means.
One example: compared to Google Maps on my phone, I have lower trust in my car’s navigation system to get me to the destination by the quickest route. As an IT professional, I know that Google Maps has access to real-time traffic information that the built-in system does not, and so I will rely on it more if getting to my destination in a timely manner is important.
My wife, who is not in the IT business, has almost complete trust in the vehicle navigation system to get her where she wants to go without making serious mistakes.
In a case like this, it’s not really of monumental importance which one of us can be accused of misplaced trust in a system. But there are cases where it’s very important.
For instance, the driver-assistance systems currently available to the general public are SAE Level 2, which means they must be monitored by a human who is ready to intervene should it be necessary. If a Tesla’s computer cannot find the lane markings, it notifies the driver and hands over control.
But how many reports have we seen of Tesla drivers who treat the system as though it can take care of all situations, thereby making it safe for them to engage fully in other activities from which they cannot easily be interrupted?
One could say “there will always be stupid people”, but this just sweeps the important problem under the rug: how do we design systems that instill an appropriate level of trust in the user? Clearly the Tesla system in these cases, or the context of its use, instilled too much trust on the part of the user.
Unsurprisingly, one study of user trust in automation found that a user’s opinion of the technology is the biggest factor determining trust in the product. More surprisingly, it also found that users who held either a strongly positive or a strongly negative opinion of the technology tended to have higher levels of trust.
This makes something clear: if we are to design systems that are trusted appropriately, we must understand that the relationship between the user’s knowledge, mood, and opinion of the system is more complex than we might imagine. We need to take into account not just the level of trust we can instill through the system’s interaction with the human, but also confounding factors: age, gender, education. How to elicit and use this information in a manner that is not intrusive and doesn’t itself generate distrust is not currently clear–more study is needed.
As IT professionals, we must be aware that instilling a proper level of trust in the systems we build is important and focus on how to achieve that.
I have a fondness for watching documentaries about aviation disasters.
Now, before you judge me as someone with a psychological disorder–we all slow down when we see an accident on the highway, but planes crashing into each other or the ground?–let me explain why I watch these depressing films and what it has to do with IT work.
I should start by noting that, as a private pilot, I have a direct interest in why aviation accidents happen. Learning from others’ mistakes is an important part of staying safe up there.
But, then, there’s another reason I watch these documentaries that’s only recently become clear to me: seeing how mistakes are made in a domain where mistakes can kill can be generalized to understand how some mistakes can be avoided in other domains where the results, while less catastrophic to human life, are still of high concern.
In my case, and likely in anyone’s case who is reading this, that’s the domain of IT work.
The most important fact I take away from the aviation disaster stories is that disasters are rarely the result of a single mistake; they result from a chain of mistakes, any one of which, if caught, would have prevented the negative outcome.
Let me give an example of one such case and see what we, as IT professionals, might learn from it.
On the night of July 1, 2002, two aircraft collided over Überlingen, Germany, resulting in the death of 71 people onboard the two aircraft.
The accident investigation that followed determined that the following chain of events led to the disaster:
The Air Traffic Controller in charge of the safety of both planes was overloaded as the result of the temporary departure of another controller in the center.
An optical collision warning system was out of service for maintenance but the controller had not been informed of this.
A phone system used by controllers to coordinate with other ATC centers had been taken down for service during his shift.
A change to the TCAS (Traffic Collision Avoidance System) on both aircraft that would have helped–and which was derived from a similar accident months earlier–had not yet been implemented.
The training manuals for both airplanes provided confusing information about whether TCAS or the ATC’s instructions should take priority if they conflicted.
Another change to TCAS, which would have informed the controller of the conflict between their instructions and TCAS instructions was not yet deployed.
Many issues led to the disaster (all of which, thankfully, have since been resolved)–but the important thing to note is that if any one of these issues had not arisen, the accident would likely not have happened.
That being true, what can we learn from this?
I would argue that, in each case, those responsible for the “system” of air traffic control, airplane systems design, and crew training could have recognized that each individual issue might lead to a disaster, and should have dealt with it in a timely manner. This is true even though each issue, by itself, could have been (and probably was) dismissed as being of little importance.
In other words, having a mindset that any single issue should be addressed as soon as possible without detailed analysis of how it could contribute to a negative outcome might have made all the difference here.
And here is where I think we can apply some lessons from this accident, and many others, to our work on IT projects.
We should assume, absent evidence to the contrary, that any single issue during a project could have negative implications that are not immediately obvious, and it should be addressed and remediated as soon as practicable.
The difficult part of implementing this advice is deciding whether a single issue could affect the entire project, and weighing the cost of immediate remediation against the cost of eventual failure. There is no easy answer. I tend to believe that unless there is a strong argument showing why a single event cannot become part of a failure chain, it becomes something that should be fixed now. Alternatively, if the cost of immediate remediation clearly exceeds the expected cost of failure, the issue can be safely put aside–but not ignored–for the time being.
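The triage rule above can be made explicit. The function below is not a formula with any empirical backing; the inputs are judgment calls, and the code merely pins down the decision procedure so a team can apply it consistently.

```python
# A sketch of the fix-now triage rule described above. All inputs are
# human judgment calls; the code only makes the decision order explicit.

def fix_now(can_rule_out_failure_chain: bool,
            remediation_cost: float,
            expected_failure_cost: float) -> bool:
    """Return True if the issue should be remediated immediately."""
    if not can_rule_out_failure_chain:
        return True  # cannot prove it's harmless: fix it now
    # Provably isolated issues: fix now only if fixing is the cheaper bet.
    return remediation_cost <= expected_failure_cost

# Can't rule the issue out of a failure chain -> fix regardless of cost:
print(fix_now(False, 5_000, 1_000))   # True
# Provably isolated and expensive to fix relative to the risk -> defer:
print(fix_now(True, 50_000, 2_000))   # False
```

Note that the burden of proof sits on "can_rule_out_failure_chain": the default, when in doubt, is to fix.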
To put this into perspective in our line of work:
Let’s imagine a system to be delivered that provides web-based consumer access to a catalog of items.
Let’s further imagine that the following are true:
The catalog data is loaded into the system database using a CSV export of data from another system of ancient vintage.
Some of the data imported goes into text fields.
Those text fields are directly used by the services layer.
Some of those text fields determine specific execution paths through the service layer code.
That service code assumes the execution paths can be completely specified at design time.
The UI layer is designed assuming that delivery of catalog data for display will be “browser safe”–i.e., no characters that will not display as intended.
This is a simple example, and over-constrained, but I think you can see where this is going.
If the source data destined for the target system’s text fields contains characters that are not properly handled by the services layer and/or the UI layer, bad outcomes are likely to result.
For instance, some older systems accept text documents produced in MS Word, which promotes raw single- and double-quote characters to “curly” versions, and they store the resulting Unicode data in raw form. Downstream, this might result in failures within the service layer or improper display in the UI layer.
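The mitigation at the import step is small. A sketch, assuming a CSV feed like the one described above (the field names here are invented for illustration):

```python
import csv
import io

# Minimal sanitizer for the CSV-import step: replace the "curly" quote
# characters MS Word tends to produce with their ASCII equivalents
# before the data reaches the services layer.

CURLY_TO_ASCII = {
    "\u2018": "'", "\u2019": "'",   # left/right single curly quotes
    "\u201C": '"', "\u201D": '"',   # left/right double curly quotes
}

def sanitize(text: str) -> str:
    """Map curly quote characters to their plain ASCII equivalents."""
    for curly, plain in CURLY_TO_ASCII.items():
        text = text.replace(curly, plain)
    return text

# Illustrative import: hypothetical 'name' and 'description' columns.
raw_csv = io.StringIO(
    'name,description\nWidget,\u201CDeluxe\u201D model \u2018A\u2019\n'
)
for row in csv.DictReader(raw_csv):
    clean = {key: sanitize(value) for key, value in row.items()}
    print(clean["description"])
```

The same few lines could just as well live in the services layer; the point of the argument below is that each layer should assume the others haven't done it.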
Most of us, as experienced IT professionals, would likely never let this happen. We would sanitize the data at some point in the process, and/or provide protections in the service/UI layers to prevent such data from producing unacceptable outcomes.
But, for a moment, I want you to think of this as less than an argument for “defense in depth” programming. I want you to think of it as taking each step of the process outlined above as a separate item without knowing how each builds to the ultimate, undesirable outcome, and deciding to mitigate it on the basis of the simple possibility that it might cause a problem.
For example, if the engineer responsible for coding the CSV import process says “the likelihood of having problems with bad data can be ignored or taken care of in the services layer”, my suggested answer would be “you cannot be sure of that, and if we cannot be sure it won’t happen, you need to code against it”.
And, I would give the same answer to the services layer engineer who says “the CSV process will deal with any such issues”. You need to code against it.
It may sound like I’m simply suggesting that “defensive coding” is a good idea–and it is. But–and perhaps the example given is too easy–the broader point I am suggesting is that you need a mindset that removes each and every item in a possible failure chain without knowing, for certain, that it would ever cause a problem.
This suggestion is not without its drawbacks, and I would encourage you to provide your thoughts, pro or con, in the comments section of this blog.
In the meantime, I’ll be over here watching another disaster documentary….
The advent of autonomous vehicles, and in particular for personal use, has already had measurable impacts on our societies.
The impacts can be seen in a number of areas:
New road construction, and updates to existing road construction, now take into account the need to provide supporting infrastructure for autonomous vehicles.
Most US states, and many countries, have put into place and continue to update standards, regulations, and programs to support the use of autonomous vehicles on public roads.
The work in these areas is well covered on a number of websites and posts: see the notes at the end of this blog for some links.
I am interested here in sharing some thoughts that may not have been well covered in the literature, but which interest me in terms of the positive and negative impacts autonomous vehicles may have on infrastructure, on culture, and on how we handle the new legal issues that will arise.
Infrastructure should change
The current roadway infrastructure in most of the world is predicated on the behaviors expected of human drivers, and attempts to minimize the opportunity for accidents while maximizing “throughput” of the system. This is best seen in the age-old problem of dealing with intersecting roadways, where traffic control of some kind must be instituted to avoid collisions and allow pedestrians to cross safely (where appropriate). This is usually accomplished by some combination of stop signs, yield signs, traffic signals, and traffic regulations controlling expected behavior at such intersections.
With such an existing infrastructure, massive when considered on a world-wide basis, and the fact that autonomous vehicles must mix with human-driven vehicles, it’s not surprising that most autonomous vehicles are programmed to live within existing infrastructures and rules.
For example, current autonomous vehicles that operate on surface streets are expected to recognize and properly respond to traffic signals and traffic flow control signage. They must, as there are human drivers that flow with them and abide by the same rules.
Further, as current autonomous vehicles are generally below SAE Level 5 (“full driving automation”–i.e., no steering wheel or manual brakes), until Level 5 vehicles make up the majority (or all) of on-the-road vehicles, this is not likely to change.
But let’s imagine for a moment what could (and perhaps should) change if no vehicles other than Level 5 were allowed on some, most, or all public roadways.
At this point, the restrictions that apply when there are non-autonomous vehicles in the mix may be completely eliminated or severely restricted in location or extent.
For instance, intersections would no longer need infrastructure-related flow control mechanisms–the vehicles themselves could contain this information. In-vehicle street maps would contain specifics on the details of an intersection (width, number of lanes, turn vs straight-ahead lanes, etc.) that would be needed to make stop/go/speed decisions. Additionally, with the inclusion of between-vehicle networking, the cars and trucks could decide dynamically who should have the right-of-way, at what speed the intersection should be traversed, and so on to maintain the optimal flow rate of the traffic.
Some interesting work has been done in this area by a team at Cornell University, in which they modeled the idea of having “platoons” of vehicles–tightly clustered cars that travel together–that are allowed to pass through an intersection while cross-traffic is held. This differs from human-piloted traffic in that the cycle time of the stoplights can be shorter, providing shorter hold times for the cross-traffic. Throughput increases of up to 138% appear possible with this approach.
Let’s go a little further with this idea. What if the following were true?
Traffic lights were removed/deprecated at an intersection.
All autonomous vehicles approaching an intersection were in constant communication.
Common decision algorithms based on dynamic inputs (other vehicles, weather, desired flow rate, existence of approaching pedestrians, etc.) were used by all vehicles.
Platoon sizes as low as one vehicle were permitted.
I can foresee–though I have no direct proof, through modeling for instance–that even larger throughput increases could be achieved.
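The key property the bullet list above demands is determinism: every vehicle must reach the same conclusion from the same broadcast inputs. A toy sketch of what such a shared decision algorithm might look like follows; the ranking rule, vehicle names, and fields are all invented for illustration, not drawn from any real V2X standard.

```python
from dataclasses import dataclass

# Toy model of a shared right-of-way algorithm: every vehicle nearing
# the intersection broadcasts its state, runs this same deterministic
# ranking, and therefore computes the identical crossing order without
# any roadside controller. The ranking rule is invented for illustration.

@dataclass(frozen=True)
class Vehicle:
    vehicle_id: str
    eta_seconds: float       # estimated arrival at the intersection
    is_emergency: bool = False

def crossing_order(vehicles):
    """Emergency vehicles first, then earliest arrival; ties broken by
    vehicle ID so every participant computes the same order."""
    return sorted(
        vehicles,
        key=lambda v: (not v.is_emergency, v.eta_seconds, v.vehicle_id),
    )

approaching = [
    Vehicle("car-42", eta_seconds=4.0),
    Vehicle("truck-7", eta_seconds=3.5),
    Vehicle("ambulance-1", eta_seconds=5.0, is_emergency=True),
]

print([v.vehicle_id for v in crossing_order(approaching)])
# ['ambulance-1', 'truck-7', 'car-42']
```

A real protocol would need to handle packet loss, clock skew, and malicious or malfunctioning participants, which is exactly where the standards and legal questions listed above come in.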
Certainly one obvious advantage of such an approach would be removing the need to install and maintain traffic control mechanisms–bulbs that need to be replaced, signs that must be replaced after being damaged, etc. Infrastructure costs could be reduced or, at the least, be redirected to more meaningful efforts such as the maintenance of the roadways themselves.
This approach is not without its downsides or costs, of course.
There must be industry-wide agreement on decision standards for the dynamic flow control of such vehicles.
There cannot be more than a few, if any, non-fully-autonomous vehicles on the roadway.
Legal issues around where responsibility would lie for any problems would have to be resolved and codified.
But, all in all, I would argue that conversion of the flow control systems in public roadways to an in-vehicle, shared system would permit maximal use of public roadways in the world in which human-operated vehicles are the exception rather than the rule.
Many countries are establishing standards for roadway markings, for instance, to meet the needs of autonomous vehicles. These include reflectivity standards for paints, minimum widths for lane markers, etc. See this interesting article for some background.