Software Engineering for Data Scientists: Getting Better
Is there any other way to improve? Yes, there is content on design itself. But to learn best, you must actively apply this content to an actual problem. Many design principles, methods, and patterns won’t make sense initially and go stale quickly.
As said, design is about human factors. So it can be stated cognitively (e.g., written in a book) but often must be experienced to be genuinely understood. You can’t read the spec for a touch screen in a world of mice and keyboards and get how intuitive it feels. Likewise, you won’t really understand a design principle or pattern until you can apply it. You won’t be able to use anything in outlining until you get good enough at the practice itself. Then, and only then, will you feel comfortable adding one new idea at a time.
Let’s assume you’ve been outlining for a few weeks or months. While you’re still seeing things you can improve, you’re starting to feel comfortable; your rate of change has slowed. What content should you read as “food for thought” on how to continue to improve?
It’s easiest to start with the bad because bad is usually easy to spot. You may even be doing bad things and not really understand the negative consequences; thus, you don’t know what’s holding you back.
I’d start with one idea at a time. Some patterns are more challenging to spot than others. So try starting with patterns that seem apparent.
Code smells are going to introduce you to a host of refactorings. Now, we’ve used that word somewhat loosely – basically, changing the design of code without changing its functionality. However, there are more formal sets of refactorings: known, named common changes to code.
Some of these are automatable in an IDE, and some are not. All can be useful at different times.
Looking at refactoring as a set of repeatable steps to take on code is itself a layer of abstraction over your design work. You no longer think of coding as merely typing on a keyboard. Instead, you think of coding as repeatable patterns and transformations on those patterns. Usually, we would try to hide such patterns away in a library. However, some idioms and patterns don’t inherently fit in a library as reusable code, so they remain merely reusable techniques.
When you look at code smells, you might try and find one to focus on at a time. Find it in your code; apply the recommended refactoring. Then, once you get a feel for it, you will learn about a new code smell. Similarly, you want to look at refactorings one at a time. Find a place where they may apply. Try them out; only when you get them do you move on.
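To make “changing the design without changing the functionality” concrete, here is a minimal sketch of one named refactoring, Extract Function. The function names and the tax multiplier are made up for illustration; the point is only that the two versions return the same result.

```python
TAX_MULTIPLIER = 1.2  # hypothetical value, for illustration only


def report_before(prices: list[float]) -> str:
    """Original version: the tax rule is buried inside the loop."""
    total = 0.0
    for price in prices:
        total += price * TAX_MULTIPLIER
    return f"Total with tax: {total:.2f}"


def with_tax(price: float) -> float:
    """Extracted function: the tax rule now has a name of its own."""
    return price * TAX_MULTIPLIER


def report_after(prices: list[float]) -> str:
    """Refactored version: same output, but the intent reads directly."""
    total = sum(with_tax(price) for price in prices)
    return f"Total with tax: {total:.2f}"
```

A quick check that behavior did not change is to call both versions on the same input and compare the results before deleting the old one.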
We’ve discussed a few principles so far: YAGNI and KISS. There are a handful of others. There are some attempts at lists, but I can’t find anything comprehensive where each principle is helpful in a general situation. For instance, there is the “SOLID” collection. However, it appears more valuable as an acronym than as a set of universally applicable rules. (Though many of the letters are helpful, so we’ll start there.)
Single Responsibility: don’t use the term “and” in your class or function names.
Is there one logical thing each function does and each class represents? Does it tend to stick to the same level of abstraction? Or does it use a lot of high domain language, then suddenly jump into low-level database code?
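As a rough illustration (the names here are hypothetical), imagine a function called something like parse_and_filter_and_average. The “and” in the name is the hint. One possible split keeps the low-level parsing, the domain rule, and the high-level step apart:

```python
from statistics import mean


def parse_scores(raw: str) -> list[float]:
    """Low-level detail: turn a comma-separated string into numbers."""
    return [float(item) for item in raw.split(",") if item.strip()]


def drop_outliers(scores: list[float], limit: float) -> list[float]:
    """Domain rule: ignore scores above the limit."""
    return [score for score in scores if score <= limit]


def average_score(raw: str, limit: float = 100.0) -> float:
    """High-level step: reads in domain terms, with no parsing details."""
    return mean(drop_outliers(parse_scores(raw), limit))
```

Each function now does one thing and stays at one level of abstraction, so `average_score("10, 20, 300")` reads as a sentence about scores rather than about strings.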
Prefer writing new code to modifying old code
This is called the “open-closed” principle in SOLID; however, it’s somewhat hard to understand when framed that way. Designs that allow you to extend them easily are better than those that must have code modified.
Extension means “writing new code,” usually in “new files.” Modifying means changing “old code” in “old files.” Passing functions around as values (called the “strategy” pattern) or using interfaces are both ways to create “hooks” in old code that new code can plug into.
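Here is a small sketch of a function-passing hook, assuming a feature-scaling step as the thing that varies. All names are invented for the example.

```python
from typing import Callable

# A "strategy" here is just any function from a list of numbers to a list of numbers.
Scaler = Callable[[list[float]], list[float]]


def min_max(values: list[float]) -> list[float]:
    """One strategy: rescale values into the range 0..1."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [(v - lo) / span for v in values]


def prepare_features(values: list[float], scale: Scaler) -> list[float]:
    """The "old code": it never changes when a new scaling rule appears."""
    return scale(values)


# The "new code": written later, in a new place, without touching the above.
def center(values: list[float]) -> list[float]:
    """Another strategy: subtract the mean from every value."""
    avg = sum(values) / len(values)
    return [v - avg for v in values]
```

Calling `prepare_features(data, center)` extends the behavior without editing `prepare_features` itself, which is the whole point of the hook.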
Think in terms of interfaces and abstractions
When different classes have similar methods, you may actually have an interface. If two classes share the names and arguments in methods, consider using an abstract base class to denote those shared methods. What is the code telling you when two classes look very similar?
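As a sketch of that idea, suppose two hypothetical model classes both ended up with `fit` and `predict` methods. Python’s `abc` module lets you write the shared shape down once:

```python
from abc import ABC, abstractmethod


class Forecaster(ABC):
    """The interface both classes were hinting at (hypothetical name)."""

    @abstractmethod
    def fit(self, xs: list[float], ys: list[float]) -> None: ...

    @abstractmethod
    def predict(self, x: float) -> float: ...


class MeanModel(Forecaster):
    """Always predicts the mean of the training targets."""

    def __init__(self) -> None:
        self._mean = 0.0

    def fit(self, xs: list[float], ys: list[float]) -> None:
        self._mean = sum(ys) / len(ys)

    def predict(self, x: float) -> float:
        return self._mean


class LastValueModel(Forecaster):
    """Always predicts the last training target it saw."""

    def __init__(self) -> None:
        self._last = 0.0

    def fit(self, xs: list[float], ys: list[float]) -> None:
        self._last = ys[-1]

    def predict(self, x: float) -> float:
        return self._last


def evaluate(model: Forecaster, xs: list[float], ys: list[float]) -> float:
    """Mean absolute error on the training data; works with any Forecaster."""
    model.fit(xs, ys)
    return sum(abs(model.predict(x) - y) for x, y in zip(xs, ys)) / len(ys)
```

Once the interface exists, code like `evaluate` can be written against it, and a class that forgets one of the methods fails at instantiation rather than deep inside a run.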
Don’t repeat yourself (DRY)
Repetitive code is prone to error when it’s updated or maintained. Try and refactor repeated code into reused code. You’ll see more and more repeated code when you think in terms of abstractions.
Repeating yourself twice is sometimes okay. It’s often best to wait until you’ve repeated yourself three times. That way, you know the general outline of the code that needs to be reusable and the parts that vary.
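A toy example of the rule of three, with invented data: after writing the same summary line for a third column, the shape of the reusable function becomes obvious, as do the parts that vary.

```python
# Before: three near-identical lines had accumulated, for example
#   f"temp: min={min(temps)}C, max={max(temps)}C"
#   f"humidity: min={min(humidity)}%, max={max(humidity)}%"
#   f"pressure: min={min(pressure)}hPa, max={max(pressure)}hPa"


def describe(name: str, values: list[float], unit: str) -> str:
    """The shared outline; name, values, and unit are the parts that varied."""
    return f"{name}: min={min(values)}{unit}, max={max(values)}{unit}"


def describe_all(columns: dict[str, tuple[list[float], str]]) -> list[str]:
    """Apply the same summary to every column (hypothetical helper)."""
    return [describe(name, values, unit) for name, (values, unit) in columns.items()]
```

With only two copies, it would have been harder to tell whether the unit was a real parameter or a coincidence.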
Cohesion describes how closely related the pieces of code that live together are. Code can be cohesive or not cohesive. Logical cohesion means these things “feel” like they need to live in the same file, which is often pretty good. That being said, the goal of cohesion is that change should cluster. If you make changes in one file, other changes, if they’re needed, should occur in that same file!
Reducing coupling is a broader principle than “think in terms of abstractions” and is similar to cohesion. Coupling is a kind of relationship code has with other code. When a change cascades through code unexpectedly, we say that code is coupled. One common way to reduce coupling is encapsulation: hiding details. If the only thing we see of other code is a clean interface, and that interface hasn’t changed, then our code shouldn’t need to change either.
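A minimal sketch of encapsulation reducing coupling, using a made-up sensor-readings class: callers only ever touch `add` and `latest`, so the storage behind them can change freely.

```python
class Readings:
    """Hides how readings are stored; callers only see add() and latest()."""

    def __init__(self) -> None:
        # Private detail: could later become a database or a DataFrame.
        self._by_sensor: dict[str, list[float]] = {}

    def add(self, sensor: str, value: float) -> None:
        self._by_sensor.setdefault(sensor, []).append(value)

    def latest(self, sensor: str) -> float:
        return self._by_sensor[sensor][-1]


def is_overheating(readings: Readings, sensor: str, limit: float) -> bool:
    """Depends only on the interface, not on the dictionary underneath."""
    return readings.latest(sensor) > limit
```

If `is_overheating` had reached into `_by_sensor` directly, any change to the storage would have cascaded into it; that cascade is the coupling the text describes.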
You can see from many of these principles that it’s all about making change easy while maintaining correctness. Change is easy when code is easy to understand and extend. Are your designs easy to comprehend and grow? Think about these principles and introduce them to your process one by one - research them more thoroughly and try and find instances where they apply to your code as part of your improvement process.
So you’ve tried to incrementally understand and apply knowledge of code smells. You spot them in your older code and avoid them in new outlining and designs.
Then you tried to learn a bit more about refactoring and made many code transformations more “at hand.” Your designs began to look more similar and uniform, since more and more of them were the result of a refactoring process.
Then you tried to look at some of the more general principles behind the code smells – why is some code easy to change and extend while other code is not? Based on your ability to refactor, targeting these new principles has become within reach.
Finally, let’s talk about known “good” designs. We could have started here; but I wanted to recommend you try and understand why certain things are good first. Otherwise, you’re liable to go pattern crazy. Generally speaking, you don’t want to implement design patterns. You want to refactor to patterns.
Code, as always, is a conversation. Patterns are higher-level strategies. As with all strategies, knowing what to apply means understanding the strengths and weaknesses of any particular pattern in context.
That usually means you try not to go crazy on patterns during the discovery and design phase. Instead, be aware of what problems patterns solve. And similar to DRY’ing out code, apply the rule of three. Consider the pattern if you’ve changed or extended your solution three times or more in a direction that a pattern would make more manageable.
You can easily find a list of the “standard” patterns. These patterns go back to the 1990s (the original catalog used C++ and Smalltalk examples, and Java adopted them soon after), though many still apply. See if there are idiomatic ways (or libraries!) to implement these patterns in your language today. Thirty years is a long time to try and get patterns into libraries and have them be supported by the language. For instance, most languages have built-in or library support for the Observer pattern.
More modern pattern-oriented approaches can be found in Domain-Driven Design. For instance, the “value object” we described earlier is actually a pattern from that book.
There are also various other resources. You can always Google for your domain + “design patterns” and see what comes up. Patterns tend to be very industry-driven. There are patterns for high-frequency trading or video games, for example.
Finally, keep your ears perked for any new “buzzword” in the media. The microservice is now a pattern, as is the monolith (or maybe it’s an anti-pattern?). Event sourcing is a pattern, as well as a technology. The message queue became a pattern. Then it was a technology. Now it is an architecture reliant on the technology. These things change with time.
Look at one thing at a time, though you may want to do a broad overview first to prioritize what might be helpful and what might not. Find one thing that may be useful to you, and focus on that thing and only that thing. In particular, research idioms and frameworks that make that pattern effective in your language. Attempt to refactor to use it. Wash, rinse, repeat.
The OODA loop tells us that we can continually improve by observing our situation, orienting around it, deciding what to do with that orientation, then taking action to follow through with our decision.
It’s a great way to keep improving; but it doesn’t exactly tell us how to start.
Outlining is a great way to bootstrap your way into a design OODA loop – becoming an increasingly better software designer and gaining the benefits of well-designed code early in your process.
It fills a significant gap between journaling (like mini-requirements and project management) and drafting (which helps us get down the nuts and bolts of the code).
Outlining boils down to pretending that your design is perfect and already done: what objects would you have? What functions? It’s similar to acceptance test-driven development - we drive our ideal interface forward.
It provides that initial seed and bootstrap to continually improve the design based on the “feedback” the code gives you. Is it easy to extend? Is it helping you understand the domain? Is the coding exercise helping you to learn about the problem you’re trying to solve? Code is more than just a way to tell a computer to do something; it’s a rigorous language with which you can describe any problem. That language can give feedback based on: how it reads and feels, what flags a linter finds, whether it type checks, and so on.
It’s not always perfect. Using outlining to figure out the “right question” to ask can take a while. But it is methodical.
Describe your problem in your journal.
Find the nouns and verbs in your journal. Reify them.
Elaborate in your docstrings and add types to your arguments.
Reify the nouns and verbs in your docstrings.
Use the holes in your design to drive research.
Use feedback from peer reviews to drive research.
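To give a feel for what the steps above can produce, here is a small hypothetical outline for a made-up survey-analysis problem. Nothing is implemented yet: the nouns became types, the verbs became functions, and the docstrings record both the intent and the open questions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Survey:
    """A noun from the journal: one respondent's answers, keyed by question."""

    respondent_id: str
    answers: dict[str, int]


def load_surveys(path: str) -> list[Survey]:
    """A verb from the journal: read all surveys from a file.

    Hole in the design: which file format do we actually receive?
    """
    raise NotImplementedError("research the file format first")


def satisfaction_rate(surveys: list[Survey], question: str, threshold: int) -> float:
    """Share of respondents whose answer to `question` is at or above `threshold`.

    Hole in the design: how should missing answers be counted?
    """
    raise NotImplementedError("decide how to treat missing answers")
```

Even in this state, a type checker and a linter can already give feedback, and each `NotImplementedError` marks a question to take back to the journal or to a peer review.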
While our example stayed in outlining as long as possible, it’s not meant to be used that way. You should often switch between journaling, outlining, and drafting. Each school of techniques will give you some return per unit of effort. These rates of return will change over time based on what problem you’re trying to solve and where the code is. Over time, you’ll develop a sixth sense for where you’ll get the most bang for your buck (including keeping your energy and spirits high).
Each method can also be developed based on your interest. Journaling can learn more from project management, product management, and agile methodologies. Drafting from great programming books and tutorials. Finally, outlining can continue to grow and improve by reading and thinking about design in general.
They also all exhibit a dose-response effect. Even if outlining has given you nothing but occasionally adding a type to a method call, it should benefit you. You can incrementally and iteratively learn these methods; and at each step, you should see some value added after the learning period.
Good luck outlining!