Code Less, Benefits More: Tips from Python Data Science Handbook

Friday 17 July 2020, by Emma Coffinet

If you are looking for an essential data science book from which to learn Python, the Python Data Science Handbook is it. Although it does not cover Python’s basics, it’s ideal for people who want to understand how Python is used for data science.

Coding less and benefiting more is very achievable with some grounding in Python and data science. Here are 10 tips to help you be more efficient.

One abstraction level per function

To write readable code, every function should perform a single task. Rather than one very long, confusing function that does several things, it is preferable to have many small ones. The best way to make sure that each function does only one thing is to restrict it to one level of abstraction.

For example, a function that obtains data from the database and then preprocesses it mixes two levels of abstraction. Splitting it into one function that fetches the data and another that preprocesses it keeps each at a single level.
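
As a rough sketch of that idea, the example below separates fetching from preprocessing. The in-memory FAKE_DATABASE list and all three function names are hypothetical stand-ins chosen for illustration, not code from the handbook.

```python
from typing import Dict, List

# Hypothetical in-memory "database" used only for this illustration.
FAKE_DATABASE: List[Dict[str, object]] = [
    {"name": " Alice ", "age": "31"},
    {"name": "bob", "age": "27"},
]


def fetch_rows() -> List[Dict[str, object]]:
    """Only talk to the data source and return copies of its rows."""
    return [dict(row) for row in FAKE_DATABASE]


def preprocess_rows(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Only clean the rows: trim and title-case names, cast ages to int."""
    return [
        {"name": str(row["name"]).strip().title(), "age": int(str(row["age"]))}
        for row in rows
    ]


def load_clean_rows() -> List[Dict[str, object]]:
    """Top-level function that reads as a short list of steps."""
    return preprocess_rows(fetch_rows())
```

Calling load_clean_rows() returns the cleaned rows, and each helper can be read and tested without knowing how the other works.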

Improve readability with type hints

Since data science code in Python is mostly about manipulating data, functions may take many different types of data as arguments (pandas DataFrame, NumPy array, tuple, list, dict, etc.). It therefore helps to specify the type of each argument and of the returned object.

To do this, use type hints on each argument and on the return value. This way, the code is not only easier to read but also easier to maintain.
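
A minimal sketch of a hinted function follows. The function name and its row format are assumptions made for the example, and it uses only the standard library so that it runs without pandas or NumPy installed.

```python
from typing import Dict, List


def average_by_column(rows: List[Dict[str, float]], column: str) -> float:
    """Return the mean of `column` over `rows` (assumes rows is non-empty)."""
    values = [row[column] for row in rows]
    return sum(values) / len(values)
```

The signature alone now tells a reader that the function expects a list of dictionaries plus a column name and gives back a single float.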

Comments

Comments in Python are useful when used properly, but they cause information overload and confusion when misused. Use them where the code alone is not explicit enough.

There are good types of comments and bad ones. Good types include legal comments, informative comments, explanations of intent, clarifications, warnings of consequences, and amplifications.

Bad types include unclear comments, redundant comments, noise, and commented-out code.
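
To make the distinction concrete, here is a small sketch contrasting a redundant comment with one that explains intent. The function and the sensor scenario are invented for the example.

```python
def fahrenheit_to_celsius(temperatures):
    # Bad (redundant): "loop over the temperatures" only restates the code.
    # Good (intent): the sensors report Fahrenheit, but the rest of this
    # hypothetical pipeline expects Celsius.
    return [(t - 32) * 5 / 9 for t in temperatures]
```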

Tool for reformatting code

Python’s structure makes it essential to be meticulous so that the code is formatted properly. The number of blank lines to leave, for instance, varies with the kind of declaration in the code.

There are plenty of rules to follow, such as spacing rules, list formatting, and a maximum line length. An automatic formatter such as Black can apply them for you.
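
The sketch below illustrates a few of these rules as PEP 8, Python's style guide, describes them. The two functions are made up for the example, and the "before" line is kept as a comment for comparison.

```python
# Before formatting (valid Python, but hard to scan):
# def scale(values,factor = 2):return [v*factor for v in values]

# After formatting: a space after each comma, no spaces around "=" in a
# default argument, and one statement per line.
def scale(values, factor=2):
    return [v * factor for v in values]


# Two blank lines separate top-level definitions.
def total(values):
    return sum(values)
```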

Arguments

This might surprise you, but you should limit the number of arguments as much as you can.

Functions can be classified by their number of arguments: niladic (0 arguments), monadic (1 argument), dyadic (2 arguments), triadic (3 arguments), and polyadic (more than 3 arguments). Niladic and monadic functions are fine, dyadic functions should be avoided where possible, and triadic and polyadic functions are too much.

The first reason is that functions with many arguments are hard to understand: they demand conceptual effort, especially in pipelines with complex workflows. The second is unit testing. To test such a function thoroughly, you have to cover the possible combinations of its arguments, which makes the testing phase more complex.
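
One common way to reduce the argument count is to group related values into a single object. The sketch below assumes a hypothetical Window dataclass, and it is only one possible approach, not a technique prescribed by the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Window:
    """Hypothetical parameter object bundling three related values."""
    start: int
    stop: int
    step: int = 1


# Triadic version, shown for comparison only:
# def window_values(start, stop, step): ...

def window_values(window: Window) -> List[int]:
    """Monadic version: one argument carries the whole description."""
    return list(range(window.start, window.stop, window.step))
```

For instance, window_values(Window(0, 10, 5)) yields the values 0 and 5, and a test only has to build one object per case.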

Blocks and indenting

Keep functions small. Otherwise, how do you find an error in a function of 2000 lines? Keeping each block to a minimum number of lines makes the function easier to read.

Ideally, the body of each block is a single line that calls a function with a precise name. Limit yourself to two levels of indentation, and use the second only when there is no choice. Otherwise, the code’s readability suffers.
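
A short sketch of what this can look like in practice. The helper names are hypothetical, and the nesting stops at two levels (a loop and a condition), with the innermost block being a single function call.

```python
def is_valid(value):
    """Accept only values that are present and non-negative."""
    return value is not None and value >= 0


def normalise(value, maximum):
    """Scale a value relative to the given maximum."""
    return value / maximum


def normalise_valid(values, maximum):
    result = []
    for value in values:                              # first level of nesting
        if is_valid(value):                           # second level of nesting
            result.append(normalise(value, maximum))  # one line, one call
    return result
```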

Handling errors

Managing errors is an integral part of data science, even if it’s not the most exciting one. It is vital for supervising your pipeline executions and making the right decisions.

One way is to wrap the top-level script in a try block, because it defines the scope of your program’s execution. If execution fails, you can make an informed decision based on the error type, which you receive in the except clause.

You can also define your own exceptions.
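
Both ideas can be sketched together as follows. EmptyDatasetError, compute_mean, and main are names invented for this illustration; a real pipeline would decide for itself what to do in the except clause.

```python
class EmptyDatasetError(Exception):
    """Hypothetical custom exception raised when a step receives no rows."""


def compute_mean(values):
    if not values:
        raise EmptyDatasetError("cannot compute the mean of an empty dataset")
    return sum(values) / len(values)


def main(values):
    """Top-level entry point: the try block covers the whole run."""
    try:
        return f"mean={compute_mean(values)}"
    except EmptyDatasetError as error:
        # Known failure: choose what to do based on the error type.
        return f"skipped: {error}"
```

Because the exception has its own type, the top level can treat an empty dataset differently from an unexpected bug, which would still propagate.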

Use logger instead of print

While prints are useful for testing and debugging, a logger is more appropriate for production code. Print output is harder to format, less readable, and carries very little information.

A logger, on the other hand, gives you more information, formatting is straightforward, and log levels (INFO, WARNING, ERROR, etc.) let you identify how important each message is.
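
Here is a minimal sketch using the standard library's logging module. The make_logger helper and the "pipeline" logger name are assumptions for the example, and the output is written to an in-memory buffer only so that it is easy to inspect.

```python
import io
import logging
from typing import TextIO


def make_logger(name: str, stream: TextIO) -> logging.Logger:
    """Build a logger that writes INFO and above to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep the example's output in one place
    return logger


buffer = io.StringIO()
log = make_logger("pipeline", buffer)
log.debug("row-level detail")             # below INFO, so it is filtered out
log.info("loaded %d rows", 3)
log.warning("2 rows had missing values")
```

Each message now carries its level and the logger's name, and raising or lowering the threshold is a one-line change instead of a hunt for print calls.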

Docstring

A docstring is essential in Python and should appear at the start of every definition (function, method, or class). It documents the object and makes the code more explicit.

There are two types of docstrings: single-line and multi-line.

Single-line docstrings are used when the description fits on one line. They give a concise description that adds information to what the function name already says.

Multi-line docstrings are used in more complex cases. They start with a summary line, followed by a blank line and then a more elaborate description. The summary has to fit on one line, and only a blank line separates it from the rest of the docstring.
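
The two forms might look like this. Both functions are invented for the illustration, and the layout follows the usual convention of a summary line, a blank line, then the details.

```python
def square(number):
    """Return the square of number."""
    return number * number


def normalise(values):
    """Scale values so that they sum to 1.

    Each value is divided by the total of the list. The input list is
    not modified; a new list is returned.
    """
    total = sum(values)
    return [value / total for value in values]
```

Tools such as help(normalise) display the docstring, so the documentation travels with the code.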

Unit test

Testing in Python is a broad and complicated topic, but it is essential for keeping Python code clean. A good way to judge the quality of your tests is to check them against the F.I.R.S.T. principle and the 3 Laws of TDD (test-driven development).

FIRST Principle

Fast: tests have to run quickly, so that you can run them as often as you want.

Independent: each test has to stand on its own and must not set up the conditions for a later test.

Repeatable: tests must run in any environment and on any machine, whether in the prototyping or the production phase.

Self-Validating: each test must have a boolean output. It either passes or fails.

Timely: tests must be written at the right time, during development and testing, not once the code is in production.

Laws of TDD

  • Do not write production code until you have a failing unit test.
  • Do not write more of a unit test than is enough to fail, and not compiling counts as failing.
  • Do not write more production code than is needed to pass the currently failing unit test.
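
As a small illustration of these ideas, the sketch below uses the standard library's unittest module. The add function and the test names are hypothetical. In a TDD workflow the tests would be written first and would fail until add is implemented.

```python
import unittest


def add(a, b):
    """Production code: just enough to make the tests below pass."""
    return a + b


class TestAdd(unittest.TestCase):
    def test_adds_two_integers(self):
        # Self-validating: assertEqual either passes or fails.
        self.assertEqual(add(2, 3), 5)

    def test_adds_negative_and_positive(self):
        # Independent: builds its own inputs and shares no state.
        self.assertEqual(add(-1, 1), 0)
```

Saved in a file, these tests can be run with python -m unittest, and they are fast and repeatable because they depend on nothing outside the file.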

About The Author

Emma Coffinet produces content for websites, such as the Best essay writing service the UK, blogs, articles, white papers, and social media platforms. She is keen on capturing the attention of a target audience. Feel free to connect with her on Twitter.
